llvm-project

Author	SHA1	Message	Date
Stanislav Mekhanoshin	a153e83e41	[AMDGPU] gfx1250 v_wmma_scale[16]_f32_16x16x128_f8f6f4 codegen (#152036 )	2025-08-04 19:16:34 -07:00
Changpeng Fang	d6094370cb	AMDGPU: Support v_wmma_f32_16x16x128_f8f6f4 on gfx1250 (#149684 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2025-07-21 10:09:42 -07:00
Pierre van Houtryve	f223411e2e	[InstCombine] PtrReplacer: Correctly handle select with unavailable operands (#148829 ) The testcase I added previously failed because a SelectInst with invalid operands was created (one side `addrspace(4)`, the other `addrspace(5)`). PointerReplacer needs to dig deeper if the true and/or false instructions of the select are not available. Fixes SWDEV-542957	2025-07-16 09:32:05 +02:00
Darren Wihandi	9f3931b659	[AMDGPU] Fold fmed3 when inputs include infinity (#144824 )	2025-06-24 21:44:17 +09:00
Wenju He	9d570d568b	[ValueTracking] Return true for AddrSpaceCast in canCreateUndefOrPoison (#144686 ) In our downstream GPU target, the following IR is valid before instcombine, although the second addrspacecast causes UB. define i1 @test(ptr addrspace(1) noundef %v) { %0 = addrspacecast ptr addrspace(1) %v to ptr addrspace(4) %1 = call i32 @llvm.xxxx.isaddr.shared(ptr addrspace(4) %0) %2 = icmp eq i32 %1, 0 %3 = addrspacecast ptr addrspace(4) %0 to ptr addrspace(3) %4 = select i1 %2, ptr addrspace(3) null, ptr addrspace(3) %3 %5 = icmp eq ptr addrspace(3) %4, null ret i1 %5 } We have a custom optimization that replaces invalid addrspacecast with poison, and the IR is still valid since `select` stops poison propagation. However, the instcombine pass optimizes `select` to `or`: %0 = addrspacecast ptr addrspace(1) %v to ptr addrspace(4) %1 = call i32 @llvm.xxxx.isaddr.shared(ptr addrspace(4) %0) %2 = icmp eq i32 %1, 0 %3 = addrspacecast ptr addrspace(1) %v to ptr addrspace(3) %4 = icmp eq ptr addrspace(3) %3, null %5 = or i1 %2, %4 ret i1 %5 The transform is invalid for our target. --------- Co-authored-by: Nikita Popov <github@npopov.com>	2025-06-24 08:43:47 +08:00
Anshil Gandhi	a314ac4d22	[Reland][InstCombine] Iterative replacement in PtrReplacer (#145410 ) This patch enhances the PtrReplacer as follows: 1. Users are now collected iteratively to be generous on the stack. In the case of PHIs with incoming values which have not yet been visited, they are pushed back into the stack for reconsideration. 2. Replace users of the pointer root in a reverse-postorder traversal, instead of a simple traversal over the collected users. This reordering ensures that the uses of an instruction are replaced before replacing the instruction itself. 3. During the replacement of PHI, use the same incoming value if it does not have a replacement. This patch specifically fixes the case when an incoming value of a PHI is addrspacecasted. This reland PR includes a fix for an assertion failure caused by https://github.com/llvm/llvm-project/pull/137215, which was reverted. The failing test involved a phi and gep depending on each other, in which case the PtrReplacer did not order them correctly for replacement. This patch fixes it by adding a check during the definition of `PostOrderWorklist`.	2025-06-23 20:35:40 -04:00
Anshil Gandhi	72979093e7	Revert "[Reland][InstCombine] Iterative replacement in PtrReplacer" (#145137 ) Reverts llvm/llvm-project#144626	2025-06-20 22:51:23 -04:00
Anshil Gandhi	94865edfa8	[Reland][InstCombine] Iterative replacement in PtrReplacer (#144626 ) This patch enhances the PtrReplacer as follows: 1. Users are now collected iteratively to be generous on the stack. In the case of PHIs with incoming values which have not yet been visited, they are pushed back into the stack for reconsideration. 2. Replace users of the pointer root in a reverse-postorder traversal, instead of a simple traversal over the collected users. This reordering ensures that the uses of an instruction are replaced before replacing the instruction itself. 3. During the replacement of PHI, use the same incoming value if it does not have a replacement. This patch specifically fixes the case when an incoming value of a PHI is addrspacecasted. This is a reland of https://github.com/llvm/llvm-project/pull/137215.	2025-06-20 18:03:54 -04:00
Harrison Hao	0defde8e06	[AMDGPU] Support D16 folding for image.sample with multiple extractelement and fptrunc users (#141758 ) Now we only support D16 folding for `image sample` instructions with a single user: a `fptrunc` to half. However, we can actually support D16 folding for image.sample instructions with multiple users, as long as each user follows the pattern of extractelement followed by fptrunc to half. For example: ``` %sample = call <4 x float> @llvm.amdgcn.image.sample %e0 = extractelement <4 x float> %sample, i32 0 %h0 = fptrunc float %e0 to half %e1 = extractelement <4 x float> %sample, i32 1 %h1 = fptrunc float %e1 to half %e2 = extractelement <4 x float> %sample, i32 2 %h2 = fptrunc float %e2 to half ``` This change enables D16 folding for such cases and avoids generating `v_cvt_f16_f32_e32` instructions.	2025-06-18 09:00:07 +08:00
Anshil Gandhi	c62a6138d9	Revert "[InstCombine] Iterative replacement in PtrReplacer" (#144394 ) Reverts llvm/llvm-project#137215 This commit caused a failure in the LLVM CI: https://lab.llvm.org/buildbot/#/builders/10/builds/7442	2025-06-16 13:05:31 -04:00
Anshil Gandhi	8bbef3d1c9	[InstCombine] Iterative replacement in PtrReplacer (#137215 ) This patch enhances the PtrReplacer as follows: 1. Users are now collected iteratively to be generous on the stack. In the case of PHIs with incoming values which have not yet been visited, they are pushed back into the stack for reconsideration. 2. Replace users of the pointer root in a reverse-postorder traversal, instead of a simple traversal over the collected users. This reordering ensures that the operands of an instruction are replaced before replacing the instruction itself. 3. During the replacement of PHI, use the same incoming value if it does not have a replacement. This patch specifically fixes the case when an incoming value of a PHI is addrspacecasted.	2025-06-16 12:46:54 -04:00
Jay Foad	c400fe24ae	[AMDGPU] Update failing test after #129897	2025-06-09 12:33:33 +01:00
Jay Foad	6b25f4439c	[AMDGPU] Detect trivially uniform arguments in InstCombine (#129897 ) Update one test to use an SGPR argument as the simplest way of getting a uniform value.	2025-06-09 12:06:03 +01:00
Matt Arsenault	fabbc40a36	AMDGPU: Make llvm.amdgcn.make.buffer.rsrc propagate poison (#141913 )	2025-05-29 15:38:29 +02:00
Pierre van Houtryve	2278f5e65b	[AMDGPU] Hoist readlane/readfirstlane through unary/binary operands (#129037 ) When a read(first)lane is used on a binary operator and the intrinsic is the only user of the operator, we can move the read(first)lane into the operand if the other operand is uniform. Unfortunately, IC doesn't let us access UniformityAnalysis, so we can't truly check uniformity; we have to make do with a basic uniformity check that only allows constants or trivially uniform intrinsic calls. We can also do the same for unary and cast operators.	2025-05-13 12:00:49 +02:00
Matt Arsenault	038d357dde	AMDGPU: Use minimumnum/maximumnum for fmed3 with amdgpu-ieee=0 (#139546) Try to respect the signaling nan behavior of the instruction, so also start the special case fold for src2.	2025-05-12 20:31:52 +02:00
Matt Arsenault	08dd0406c6	AMDGPU: Use minnum instead of maxnum for fmed3 src2-nan fold (#139531 ) By the pseudocode in the ISA manual, if any input is a nan it acts like min3, which will fold to min2 of the other operands. The other cases fold to min, I'm not sure how this one was wrong.	2025-05-12 20:26:29 +02:00
Matt Arsenault	83107e02ea	AMDGPU: Disable most fmed3 folds for strictfp (#139530 )	2025-05-12 20:21:02 +02:00
Matt Arsenault	e805d83487	AMDGPU: Add more tests for fmed3 instcombine folds (#139529 ) Add test with snan literals, and test with and without amdgpu-ieee	2025-05-12 20:18:07 +02:00
Matt Arsenault	4b89339899	AMDGPU: Reorganize fmed3 intrinsic instcombine tests (#139498 )	2025-05-12 10:05:03 +02:00
Alexander Richardson	ee13638362	[AMDGPU] Remove explicit datalayout from tests where not needed Since e39f6c1844fab59c638d8059a6cf139adb42279a opt will infer the correct datalayout when given a triple. Avoid explicitly specifying it in tests that depend on the AMDGPU target being present to avoid the string becoming out of sync with the TargetInfo value. Only tests with REQUIRES: amdgpu-registered-target or a local lit.cfg were updated to ensure that tests for non-target-specific passes that happen to use the AMDGPU layout still pass when building with a limited set of targets. Reviewed By: shiltian, arsenm Pull Request: https://github.com/llvm/llvm-project/pull/137921	2025-04-30 10:58:17 -07:00
Matt Arsenault	dadea96791	AMDGPU: Add range to wavefrontsize intrinsic declaration (#136303 )	2025-04-25 10:19:47 +02:00
Jay Foad	e3350a6263	[AMDGPU] InstCombine llvm.amdgcn.ds.bpermute with uniform arguments (#130133 ) Reland #129895 with a fix to avoid trying to combine bpermute of bitcast.	2025-04-10 10:36:38 +01:00
Juan Manuel Martinez Caamaño	0375ef07c3	[Clang][AMDGPU] Add __builtin_amdgcn_cvt_off_f32_i4 (#133741 ) This built-in maps to `V_CVT_OFF_F32_I4` which treats its input as a 4-bit signed integer and returns `0.0625f * src`. SWDEV-518861	2025-04-02 19:51:40 +02:00
Matt Arsenault	c180fc80dc	AMDGPU: Replace unused permlane inputs with poison instead of undef (#131288 )	2025-03-18 17:37:44 +07:00
Matt Arsenault	052eca9ff7	AMDGPU: Replace unused update.dpp inputs with poison instead of undef (#131287 )	2025-03-18 17:33:58 +07:00
Matt Arsenault	8392573469	AMDGPU: Replace unused export inputs with poison instead of undef (#131286 )	2025-03-18 17:30:42 +07:00
Matt Arsenault	4a3ee4f72d	AMDGPU: Make fma_legacy intrinsic propagate poison (#131063 )	2025-03-14 11:42:47 +07:00
Matt Arsenault	37706894f8	AMDGPU: Make fmul_legacy intrinsic propagate poison (#131062 )	2025-03-14 11:39:47 +07:00
Matt Arsenault	a716459f2d	AMDGPU: Make ballot intrinsic propagate poison (#131061 )	2025-03-14 11:36:44 +07:00
Matt Arsenault	0d8a22d6ad	AMDGPU: Make fmed3 intrinsic propagate poison (#131060 )	2025-03-14 11:30:52 +07:00
Matt Arsenault	9b887f5277	AMDGPU: Make cvt_pknorm and cvt_pk intrinsics propagate poison (#131059 )	2025-03-14 11:27:50 +07:00
Matt Arsenault	0a78bd67b3	AMDGPU: Make frexp_exp and frexp_mant intrinsics propagate poison (#130915 )	2025-03-13 10:07:45 +07:00
Matt Arsenault	d8f17b3de1	AMDGPU: Make sqrt and rsq intrinsics propagate poison (#130914 )	2025-03-13 10:01:48 +07:00
Matt Arsenault	95ab95fd10	AMDGPU: Make rcp intrinsic propagate poison (#130913 )	2025-03-13 09:58:46 +07:00
Changpeng Fang	fa45bf4300	InstCombine: Fix a crash in `PointerReplacer` when constructing a new PHI (#130256 ) When constructing a PHI node in `PointerReplacer::replace`, the incoming operands are expected to have already been replaced and to be in the replacement map. However, when one of the incoming operands is a load, the search of the map is unsuccessful, and a nullptr is returned from `getReplacement`. The reason is that, when a load is replaced, all the uses of the load have already been replaced by the new load, so inserting the original load into the map is useless. Instead, we should place the new load into the map to meet the expectation of the later map search. Fixes: SWDEV-516420	2025-03-09 20:21:36 -07:00
Matt Arsenault	4136395ddc	AMDGPU: Regenerate test checks in instcombine test Passed precheck before 844a1d52a8f5dff032cbf58288675ad1e678d609 started deleting the dead instructions	2025-03-07 19:25:18 +07:00
Matt Arsenault	af755af200	AMDGPU: Handle demanded subvectors for readfirstlane (#128648 )	2025-03-07 17:54:15 +07:00
Jay Foad	78281fd12c	Revert "[AMDGPU] InstCombine llvm.amdgcn.ds.bpermute with uniform arguments (#129895 )" This reverts commit be5149a3158cbce3051629e450950ccb96926365. It caused build failures in the openmp-offload-amdgpu-runtime buildbot and others.	2025-03-06 15:05:19 +00:00
Jay Foad	be5149a315	[AMDGPU] InstCombine llvm.amdgcn.ds.bpermute with uniform arguments (#129895 )	2025-03-06 14:31:59 +00:00
Jay Foad	757554ee26	Regenerate checks in test/Transforms/InstCombine/AMDGPU/. NFC.	2025-03-06 14:29:34 +00:00
Yingwei Zheng	844a1d52a8	[IR] Return correct memory effects for `convergencectrl` (#129874 ) `convergencectrl` doesn't imply any memory access. Closes https://github.com/llvm/llvm-project/issues/129856.	2025-03-05 22:14:47 +08:00
Matt Arsenault	5c375c3283	AMDGPU: Fix worklist management in simplifyDemandedVectorEltsIntrinsic Fixes bot sanitizer error, but it does leave behind a dead instruction if there is a bundle for some reason.	2025-03-05 16:39:19 +07:00
Matt Arsenault	95c64b7ee6	AMDGPU: Reduce readfirstlane for single demanded vector element (#128647 ) If we are only extracting a single element, rewrite the intrinsic call to use the element type. We should extend this to arbitrary extract shuffles.	2025-03-05 08:35:56 +07:00
Matt Arsenault	d410f093da	AMDGPU: Simplify demanded vector elts of readfirstlane sources (#128646 ) Stub implementation of simplifyDemandedVectorEltsIntrinsic for readfirstlane.	2025-02-28 13:01:10 +07:00
Matt Arsenault	2fa6c5265e	AMDGPU: Add baseline tests for simplify elts of readfirstlane (#128645 )	2025-02-28 12:56:58 +07:00
Matt Arsenault	447abfcc09	AMDGPU: Fold bitcasts into readfirstlane, readlane, and permlane64 (#128494 ) We should handle this for all the handled readlane and dpp ops.	2025-02-27 20:59:11 +07:00
Matt Arsenault	29c5e4289f	AMDGPU: Add baseline tests for bitcast + readlane intrinsics (#128493 )	2025-02-26 14:22:57 +07:00
Matt Arsenault	5deb2aa9eb	AMDGPU: Make is.shared and is.private propagate poison (#128617 )	2025-02-25 12:56:43 +07:00
Matt Arsenault	28002dd50f	AMDGPU: Replace some undef pointer uses in test	2025-02-25 09:34:15 +07:00

197 Commits