llvm-project

Author	SHA1	Message	Date
Matt Arsenault	c180fc80dc	AMDGPU: Replace unused permlane inputs with poison instead of undef (#131288 )	2025-03-18 17:37:44 +07:00
Matt Arsenault	052eca9ff7	AMDGPU: Replace unused update.dpp inputs with poison instead of undef (#131287 )	2025-03-18 17:33:58 +07:00
Matt Arsenault	8392573469	AMDGPU: Replace unused export inputs with poison instead of undef (#131286 )	2025-03-18 17:30:42 +07:00
Matt Arsenault	4a3ee4f72d	AMDGPU: Make fma_legacy intrinsic propagate poison (#131063 )	2025-03-14 11:42:47 +07:00
Matt Arsenault	37706894f8	AMDGPU: Make fmul_legacy intrinsic propagate poison (#131062 )	2025-03-14 11:39:47 +07:00
Matt Arsenault	a716459f2d	AMDGPU: Make ballot intrinsic propagate poison (#131061 )	2025-03-14 11:36:44 +07:00
Matt Arsenault	0d8a22d6ad	AMDGPU: Make fmed3 intrinsic propagate poison (#131060 )	2025-03-14 11:30:52 +07:00
Matt Arsenault	9b887f5277	AMDGPU: Make cvt_pknorm and cvt_pk intrinsics propagate poison (#131059 )	2025-03-14 11:27:50 +07:00
Matt Arsenault	0a78bd67b3	AMDGPU: Make frexp_exp and frexp_mant intrinsics propagate poison (#130915 )	2025-03-13 10:07:45 +07:00
Matt Arsenault	d8f17b3de1	AMDGPU: Make sqrt and rsq intrinsics propagate poison (#130914 )	2025-03-13 10:01:48 +07:00
Matt Arsenault	95ab95fd10	AMDGPU: Make rcp intrinsic propagate poison (#130913 )	2025-03-13 09:58:46 +07:00
Matt Arsenault	af755af200	AMDGPU: Handle demanded subvectors for readfirstlane (#128648 )	2025-03-07 17:54:15 +07:00
Jay Foad	78281fd12c	Revert "[AMDGPU] InstCombine llvm.amdgcn.ds.bpermute with uniform arguments (#129895 )" This reverts commit be5149a3158cbce3051629e450950ccb96926365. It caused build failures in the openmp-offload-amdgpu-runtime buildbot and others.	2025-03-06 15:05:19 +00:00
Jay Foad	be5149a315	[AMDGPU] InstCombine llvm.amdgcn.ds.bpermute with uniform arguments (#129895 )	2025-03-06 14:31:59 +00:00
Matt Arsenault	5c375c3283	AMDGPU: Fix worklist management in simplifyDemandedVectorEltsIntrinsic Fixes bot sanitizer error, but it does leave behind a dead instruction if there is a bundle for some reason.	2025-03-05 16:39:19 +07:00
Matt Arsenault	95c64b7ee6	AMDGPU: Reduce readfirstlane for single demanded vector element (#128647 ) If we are only extracting a single element, rewrite the intrinsic call to use the element type. We should extend this to arbitrary extract shuffles.	2025-03-05 08:35:56 +07:00
Matt Arsenault	d410f093da	AMDGPU: Simplify demanded vector elts of readfirstlane sources (#128646 ) Stub implementation of simplifyDemandedVectorEltsIntrinsic for readfirstlane.	2025-02-28 13:01:10 +07:00
Matt Arsenault	447abfcc09	AMDGPU: Fold bitcasts into readfirstlane, readlane, and permlane64 (#128494 ) We should handle this for all the handled readlane and dpp ops.	2025-02-27 20:59:11 +07:00
Matt Arsenault	5deb2aa9eb	AMDGPU: Make is.shared and is.private propagate poison (#128617 )	2025-02-25 12:56:43 +07:00
Fraser Cormack	c82a6a0251	[AMDGPU] Use correct vector elt type when shrinking mfma scale (#123043 ) This might be a copy/paste error. I don't think this is an issue in practice as the builtins/intrinsics are only legal with identical vector element types.	2025-01-15 14:28:42 +00:00
Ramkumar Ramachandra	4a0d53a0b0	PatternMatch: migrate to CmpPredicate (#118534 ) With the introduction of CmpPredicate in 51a895a (IR: introduce struct with CmpInst::Predicate and samesign), PatternMatch is one of the first key pieces of infrastructure that must be updated to match a CmpInst respecting samesign information. Implement this change to Cmp-matchers. This is a preparatory step in migrating the codebase over to CmpPredicate. Since no functional changes are desired at this stage, we have chosen not to migrate CmpPredicate::operator==(CmpPredicate) calls to use CmpPredicate::getMatching(), as that would have visible impact on tests that are not yet written: instead, we call CmpPredicate::operator==(Predicate), preserving the old behavior, while also inserting a few FIXME comments for follow-ups.	2024-12-13 14:18:33 +00:00
Matt Arsenault	c74e2232f2	AMDGPU: Simplify demanded bits on readlane/writelane index arguments (#117963 ) The main goal is to fold away wave64 code when compiled for wave32. If we have out of bounds indexing, these will now clamp down to a low bit which may CSE with the operations on the low half of the wave.	2024-12-06 10:31:14 -05:00
Alex Voicu	48ec59c234	[llvm][AMDGPU] Fold `llvm.amdgcn.wavefrontsize` early (#114481 ) Fold `llvm.amdgcn.wavefrontsize` early, during InstCombine, so that its concrete value is used throughout subsequent optimisation passes.	2024-11-25 10:29:50 +00:00
Matt Arsenault	0a6e8741dd	AMDGPU: Shrink used number of registers for mfma scale based on format (#117047 ) Currently the builtins assume you are using an 8-bit format that requires an 8 element vector. We can shrink the number of registers if the format requires 4 or 6.	2024-11-21 09:08:05 -08:00
Matt Arsenault	01c9a14ccf	AMDGPU: Define v_mfma_f32_{16x16x128|32x32x64}_f8f6f4 instructions (#116723 ) These use a new VOP3PX encoding for the v_mfma_scale_* instructions, which bundles the pre-scale v_mfma_ld_scale_b32. None of the modifiers are supported yet (op_sel, neg or clamp). I'm not sure the intrinsic should really expose op_sel (or any of the others). If I'm reading the documentation correctly, we should be able to just have the raw scale operands and auto-match op_sel to byte extract patterns. The op_sel syntax also seems extra horrible in this usage, especially with the usual assumed op_sel_hi=-1 behavior.	2024-11-21 08:51:58 -08:00
Matt Arsenault	ca1b35a6c8	AMDGPU: Add v_prng_b32 instruction for gfx950 (#116310 ) Rand num instruction for stochastic rounding.	2024-11-18 10:54:54 -08:00
Jay Foad	85c17e4092	[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706 ) Convert many instances of: Fn = Intrinsic::getOrInsertDeclaration(...); CreateCall(Fn, ...) to the equivalent CreateIntrinsic call.	2024-10-17 16:20:43 +01:00
Rahul Joshi	fa789dffb1	[NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752 ) Rename the function to reflect its correct behavior and to be consistent with `Module::getOrInsertFunction`. This is also in preparation for adding a new `Intrinsic::getDeclaration` that will have behavior similar to `Module::getFunction` (i.e., just lookup, no creation).	2024-10-11 05:26:03 -07:00
Jay Foad	8d13e7b8c3	[AMDGPU] Qualify auto. NFC. (#110878 ) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)	2024-10-03 13:07:54 +01:00
Jay Foad	d2d947b7e2	[AMDGPU] Fold llvm.amdgcn.cvt.pkrtz when either operand is fpext (#108237 ) This also generalizes the Undef handling and adds Poison handling.	2024-09-18 09:37:04 +01:00
Jay Foad	ff7eb1d0e9	[AMDGPU] Simplify API of matchFPExtFromF16. NFC. (#108223 )	2024-09-11 17:03:27 +01:00
Jay Foad	f142f8afe2	[AMDGPU] Improve uniform argument handling in InstCombineIntrinsic (#105812 ) Common up handling of intrinsics that are a no-op on uniform arguments. This catches a couple of new cases: readlane (readlane x, y), z -> readlane x, y (for any z, does not have to equal y). permlane64 (readfirstlane x) -> readfirstlane x (and likewise for any other uniform argument to permlane64).	2024-08-23 14:43:31 +01:00
Changpeng Fang	06ab30b574	[AMDGPU] Constant folding of llvm.amdgcn.trig.preop (#98562 ) If the parameters (the input and segment select) passed to the amdgcn.trig.preop intrinsic are compile-time constants, we pre-compute the output of amdgcn.trig.preop on the CPU and replace the uses with the computed constant. This work extends the patch https://reviews.llvm.org/D120150 to provide complete coverage. For the segment select, only src1[4:0] are used. A segment select is invalid if we are selecting the 53-bit segment beyond the [1200:0] range of the 2/PI table. 0 is returned when a segment select is not valid.	2024-07-18 09:40:37 -07:00
Jay Foad	18ec885a26	[RFC][AMDGPU] Remove old llvm.amdgcn.buffer.* and tbuffer intrinsics (#93801 ) They have been superseded by llvm.amdgcn.raw.buffer.* and llvm.amdgcn.struct.buffer.*.	2024-06-10 12:14:51 +01:00
Eli Friedman	f893dccbba	Replace uses of ConstantExpr::getCompare. (#91558 ) Use ICmpInst::compare() where possible, ConstantFoldCompareInstOperands in other places. This only changes places where either the fold is guaranteed to succeed, or the code doesn't use the resulting compare if we fail to fold.	2024-05-09 16:50:01 -07:00
Artem Tyurin	141145232f	[IRBuilder] Fold binary intrinsics (#80743 ) Fixes https://github.com/llvm/llvm-project/issues/61240.	2024-03-15 09:58:25 +01:00
Yingwei Zheng	930996e9e4	[ValueTracking][NFC] Pass `SimplifyQuery` to `computeKnownFPClass` family (#80657 ) This patch refactors the interface of the `computeKnownFPClass` family to pass `SimplifyQuery` directly. The motivation of this patch is to compute known fpclass with `DomConditionCache`, which was introduced by https://github.com/llvm/llvm-project/pull/73662. With `DomConditionCache`, we can do more optimization with context-sensitive information. Example (extracted from [fmt/format.h](`e17bc67547/include/fmt/format.h (L3555-L3566)`)): ``` define float @test(float %x, i1 %cond) { %i32 = bitcast float %x to i32 %cmp = icmp slt i32 %i32, 0 br i1 %cmp, label %if.then1, label %if.else if.then1: %fneg = fneg float %x br label %if.end if.else: br i1 %cond, label %if.then2, label %if.end if.then2: br label %if.end if.end: %value = phi float [ %fneg, %if.then1 ], [ %x, %if.then2 ], [ %x, %if.else ] %ret = call float @llvm.fabs.f32(float %value) ret float %ret } ``` We can prove the signbit of `%value` is always zero. Then the fabs can be eliminated.	2024-02-06 02:30:12 +08:00
Valery Pykhtin	b8025d1482	Reapply "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#80303 ) Reapply #71556 with added lit test constraint: `REQUIRES: amdgpu-registered-target`. This reverts commit 9791e5414960f92396582b9e9ee503ac15799312.	2024-02-02 13:09:25 +01:00
Matt Arsenault	65f486c45d	AMDGPU: Simplify else if to just else in AMDGPUInstCombineIntrinsic Fixes #79738	2024-01-30 08:17:03 +05:30
Valery Pykhtin	9791e54149	Revert "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (#78429 ) Reverts llvm/llvm-project#71556 Fixes failures: https://lab.llvm.org/buildbot/#/builders/188/builds/40541 https://lab.llvm.org/buildbot/#/builders/91/builds/21847 https://lab.llvm.org/buildbot/#/builders/98/builds/31671 https://lab.llvm.org/buildbot/#/builders/139/builds/57289	2024-01-17 14:12:07 +01:00
Valery Pykhtin	57b50ef017	[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode. (#71556 ) Substitute with zero-extended to i64 ballot.i32 intrinsic.	2024-01-17 17:02:05 +07:00
Mariusz Sikora	2b83ceee3d	[AMDGPU][GFX12] Default component broadcast store (#76212 ) For image and buffer stores the default behaviour on GFX12 is to set all unset components to the value of the first component. So if we pass only X component, it will be the same as XXXX, or XY same as XYXX. This patch simplifies the passed vector of components in InstCombine by removing components from the end that are equal to the first component. For image stores it also trims DMask if necessary. --------- Co-authored-by: Mateja Marjanovic <mmarjano@amd.com>	2024-01-12 08:26:08 +01:00
Nikita Popov	9d60e95bcd	[AMDGPU] Use poison instead of undef for non-demanded elements (#75914 ) Return poison instead of undef for non-demanded lanes in the AMDGPU demanded element simplification hook. Also bail out if dmask is 0, as this case has special semantics: > If DMASK==0, the TA overrides DMASK=1 and puts zeros in VGPR followed by > LWE status if exists. TFE status is not generated since the fetch is dropped.	2023-12-20 11:01:59 +01:00
Mariusz Sikora	966416b9e8	[AMDGPU][GFX12] Add new v_permlane16 variants (#75475 )	2023-12-15 10:14:38 +01:00
Nikita Popov	bc7ca9170f	[AMDGPUInstCombine] Avoid use of ConstantExpr::getSExt() (NFC) Let the IRBuilder handle the constant folding instead.	2023-10-02 11:12:04 +02:00
Matt Arsenault	edecb60481	Reapply "AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp" This reverts commit d9333e360a7c52587ab6e4328e7493b357fb2cf3.	2023-09-13 08:38:48 +03:00
Matt Arsenault	61c8af6792	AMDGPU: InstCombine amdgcn.sqrt.f16 to sqrt.f16 There's nothing special about f16 sqrt handling. https://reviews.llvm.org/D158090	2023-08-23 20:30:40 -04:00
Matt Arsenault	7c4aa3b37e	AMDGPU: InstCombine amdgcn.rcp(amdgcn.sqrt) -> amdgcn.rsq We currently have some wrong combines in the backend that approximately do this. https://reviews.llvm.org/D158002	2023-08-16 10:04:13 -04:00
Matt Arsenault	5ccfc4543d	AMDGPU: Fold away mbcnt.hi in wave32 mode This will allow libraries to drop some of the special casing based on wave size.	2023-06-30 15:04:03 -04:00
Matt Arsenault	d9333e360a	Revert "AMDGPU: Drop and auto-upgrade llvm.amdgcn.ldexp to llvm.ldexp" This reverts commit 1159c670d40e3ef302264c681fe7e0268a550874. Accidentally pushed wrong patch	2023-06-16 18:13:07 -04:00

107 Commits