197 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin
a153e83e41
[AMDGPU] gfx1250 v_wmma_scale[16]_f32_16x16x128_f8f6f4 codegen (#152036) 2025-08-04 19:16:34 -07:00
Changpeng Fang
d6094370cb
AMDGPU: Support v_wmma_f32_16x16x128_f8f6f4 on gfx1250 (#149684)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-21 10:09:42 -07:00
Pierre van Houtryve
f223411e2e
[InstCombine]PtrReplacer: Correctly handle select with unavailable operands (#148829)
The testcase I added previously failed because a SelectInst with invalid
operands was created (one side `addrspace(4)`, the other
`addrspace(5)`).
PointerReplacer needs to dig deeper if the true and/or false
instructions of the select are not available.

Fixes SWDEV-542957
2025-07-16 09:32:05 +02:00
Darren Wihandi
9f3931b659
[AMDGPU] Fold fmed3 when inputs include infinity (#144824) 2025-06-24 21:44:17 +09:00
Wenju He
9d570d568b
[ValueTracking] Return true for AddrSpaceCast in canCreateUndefOrPoison (#144686)
In our downstream GPU target, the following IR is valid before instcombine,
although the second addrspacecast causes UB.
  define i1 @test(ptr addrspace(1) noundef %v) {
    %0 = addrspacecast ptr addrspace(1) %v to ptr addrspace(4)
    %1 = call i32 @llvm.xxxx.isaddr.shared(ptr addrspace(4) %0)
    %2 = icmp eq i32 %1, 0
    %3 = addrspacecast ptr addrspace(4) %0 to ptr addrspace(3)
    %4 = select i1 %2, ptr addrspace(3) null, ptr addrspace(3) %3
    %5 = icmp eq ptr addrspace(3) %4, null
    ret i1 %5
  }
We have a custom optimization that replaces an invalid addrspacecast with
poison, and the IR is still valid since `select` stops poison propagation.

However, the instcombine pass optimizes `select` to `or`:
    %0 = addrspacecast ptr addrspace(1) %v to ptr addrspace(4)
    %1 = call i32 @llvm.xxxx.isaddr.shared(ptr addrspace(4) %0)
    %2 = icmp eq i32 %1, 0
    %3 = addrspacecast ptr addrspace(1) %v to ptr addrspace(3)
    %4 = icmp eq ptr addrspace(3) %3, null
    %5 = or i1 %2, %4
    ret i1 %5
The transform is invalid for our target.
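
The difference between the two forms can be illustrated with a toy model of poison. This is a minimal Python sketch, not LLVM code: `POISON`, `select`, `or_i1` and `icmp_eq_null` are hypothetical names, and the model only assumes the general rule that `select` yields the chosen arm while a plain `or` is poison when either operand is poison.

```python
# Toy model of poison propagation; every name here is illustrative only.
POISON = object()  # sentinel standing in for a poison value


def select(cond, true_val, false_val):
    """select yields only the chosen arm, so poison in the other arm is ignored."""
    if cond is POISON:
        return POISON
    return true_val if cond else false_val


def or_i1(a, b):
    """A plain `or` is poison as soon as either operand is poison."""
    if a is POISON or b is POISON:
        return POISON
    return a or b


def icmp_eq_null(ptr):
    """Comparing a poison pointer yields poison; None models a null pointer."""
    return POISON if ptr is POISON else ptr is None


not_shared = True  # %2: the pointer is not in the shared address space
cast = POISON      # %3: the invalid addrspacecast, already replaced by poison

before = icmp_eq_null(select(not_shared, None, cast))  # select form
after = or_i1(not_shared, icmp_eq_null(cast))          # or form
```

In this model `before` is a well-defined `True`, while `after` is poison, which is the behaviour the message describes as invalid for the downstream target.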

---------

Co-authored-by: Nikita Popov <github@npopov.com>
2025-06-24 08:43:47 +08:00
Anshil Gandhi
a314ac4d22
[Reland][InstCombine] Iterative replacement in PtrReplacer (#145410)
This patch enhances the PtrReplacer as follows:

1. Users are now collected iteratively to be generous on the stack. In
the case of PHIs with incoming values which have not yet been visited,
they are pushed back into the stack for reconsideration.
2. Replace users of the pointer root in a reverse-postorder traversal,
instead of a simple traversal over the collected users. This reordering
ensures that the uses of an instruction are replaced before replacing
the instruction itself.
3. During the replacement of PHI, use the same incoming value if it does
not have a replacement.

This patch specifically fixes the case when an incoming value of a PHI
is addrspacecasted.

This reland PR includes a fix for an assertion failure caused by
https://github.com/llvm/llvm-project/pull/137215, which was reverted.
The failing test involved a phi and gep depending on each other, in
which case the PtrReplacer did not order them correctly for replacement.
This patch fixes it by adding a check during the definition of
`PostOrderWorklist`.
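
The general technique named in items 1 and 2 (an explicit stack instead of recursion, then a reverse-postorder walk) can be sketched as follows. This is a minimal Python illustration over a toy def-to-users map, not the PointerReplacer implementation; `reverse_postorder` and the node names are hypothetical, and the sketch only covers acyclic graphs, so it does not model the mutually dependent phi and gep case described above.

```python
def reverse_postorder(root, users):
    """Return the nodes reachable from root in reverse postorder, without recursion."""
    visited = {root}
    postorder = []
    # Each stack entry pairs a node with an iterator over its users,
    # so the walk can resume where it left off after visiting a user.
    stack = [(root, iter(users.get(root, ())))]
    while stack:
        node, remaining = stack[-1]
        for user in remaining:
            if user not in visited:
                visited.add(user)
                stack.append((user, iter(users.get(user, ()))))
                break
        else:
            # Every user of this node has been handled: the node is finished.
            postorder.append(node)
            stack.pop()
    return postorder[::-1]


# Toy def -> users graph: the phi is reachable both directly and through the gep.
users = {
    "alloca": ["phi", "gep"],
    "gep": ["phi"],
    "phi": ["load"],
}
order = reverse_postorder("alloca", users)
```

For an acyclic graph like this one, every node appears in `order` before any of the nodes that use it.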
2025-06-23 20:35:40 -04:00
Anshil Gandhi
72979093e7
Revert "[Reland][InstCombine] Iterative replacement in PtrReplacer" (#145137)
Reverts llvm/llvm-project#144626
2025-06-20 22:51:23 -04:00
Anshil Gandhi
94865edfa8
[Reland][InstCombine] Iterative replacement in PtrReplacer (#144626)
This patch enhances the PtrReplacer as follows:
1. Users are now collected iteratively to be generous on the stack. In the case of PHIs with incoming values which have not yet been visited, they are pushed back into the stack for reconsideration.
2. Replace users of the pointer root in a reverse-postorder traversal, instead of a simple traversal over the collected users. This reordering ensures that the uses of an instruction are replaced before replacing the instruction itself.
3. During the replacement of PHI, use the same incoming value if it does not have a replacement.

This patch specifically fixes the case when an incoming value of a PHI
is addrspacecasted.

This is a reland of https://github.com/llvm/llvm-project/pull/137215.
2025-06-20 18:03:54 -04:00
Harrison Hao
0defde8e06
[AMDGPU] Support D16 folding for image.sample with multiple extractelement and fptrunc users (#141758)
Currently, we only support D16 folding for `image.sample` instructions with a
single user: an `fptrunc` to half.
However, we can also support D16 folding for `image.sample` instructions with
multiple users, as long as each user follows the pattern of an
`extractelement` followed by an `fptrunc` to half.
For example:
```
  %sample = call <4 x float> @llvm.amdgcn.image.sample
  %e0 = extractelement <4 x float> %sample, i32 0
  %h0 = fptrunc float %e0 to half
  %e1 = extractelement <4 x float> %sample, i32 1
  %h1 = fptrunc float %e1 to half
  %e2 = extractelement <4 x float> %sample, i32 2
  %h2 = fptrunc float %e2 to half
```
This change enables D16 folding for such cases and avoids generating
`v_cvt_f16_f32_e32` instructions.
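
The user pattern described above can be sketched as a check over a toy instruction graph. This is a rough Python illustration, not the AMDGPU InstCombine code: `Inst`, `can_fold_to_d16` and `make_sample` are hypothetical, and the sketch assumes that every user of each `extractelement` must be an `fptrunc` to half.

```python
from dataclasses import dataclass, field


@dataclass
class Inst:
    """Toy instruction: an opcode, a result type, and the instructions using it."""
    opcode: str
    result_type: str
    users: list = field(default_factory=list)


def can_fold_to_d16(sample):
    """True if every user is an extractelement that is only used by fptrunc to half."""
    if not sample.users:
        return False
    for extract in sample.users:
        if extract.opcode != "extractelement" or not extract.users:
            return False
        for user in extract.users:
            if user.opcode != "fptrunc" or user.result_type != "half":
                return False
    return True


def make_sample(n_elements, trunc_type="half"):
    """Build a sample with n extractelement users, each truncated to trunc_type."""
    sample = Inst("image.sample", "<4 x float>")
    for _ in range(n_elements):
        trunc = Inst("fptrunc", trunc_type)
        sample.users.append(Inst("extractelement", "float", users=[trunc]))
    return sample


foldable = can_fold_to_d16(make_sample(3))                # pattern from the example
not_foldable = can_fold_to_d16(make_sample(3, "bfloat"))  # wrong truncation target
```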
2025-06-18 09:00:07 +08:00
Anshil Gandhi
c62a6138d9
Revert "[InstCombine] Iterative replacement in PtrReplacer" (#144394)
Reverts llvm/llvm-project#137215

This commit caused a failure in the LLVM CI:
https://lab.llvm.org/buildbot/#/builders/10/builds/7442
2025-06-16 13:05:31 -04:00
Anshil Gandhi
8bbef3d1c9
[InstCombine] Iterative replacement in PtrReplacer (#137215)
This patch enhances the PtrReplacer as follows:
1. Users are now collected iteratively to be generous on the stack. In
the case of PHIs with incoming values which have not yet been visited,
they are pushed back into the stack for reconsideration.
2. Replace users of the pointer root in a reverse-postorder traversal,
instead of a simple traversal over the collected users. This reordering
ensures that the operands of an instruction are replaced before
replacing the instruction itself.
3. During the replacement of PHI, use the same incoming value if it does
not have a replacement.

This patch specifically fixes the case when an incoming value of a PHI
is addrspacecasted.
2025-06-16 12:46:54 -04:00
Jay Foad
c400fe24ae [AMDGPU] Update failing test after #129897 2025-06-09 12:33:33 +01:00
Jay Foad
6b25f4439c
[AMDGPU] Detect trivially uniform arguments in InstCombine (#129897)
Update one test to use an SGPR argument as the simplest way of getting a
uniform value.
2025-06-09 12:06:03 +01:00
Matt Arsenault
fabbc40a36
AMDGPU: Make llvm.amdgcn.make.buffer.rsrc propagate poison (#141913) 2025-05-29 15:38:29 +02:00
Pierre van Houtryve
2278f5e65b
[AMDGPU] Hoist readlane/readfirstlane through unary/binary operands (#129037)
When a read(first)lane is used on a binary operator and the intrinsic is
the only user of the operator, we can move the read(first)lane into the
operand if the other operand is uniform.

Unfortunately, IC doesn't let us access UniformityAnalysis, so we can't
truly check uniformity. We have to make do with a basic uniformity check
which only allows constants or trivially uniform intrinsic calls.

We can also do the same for unary and cast operators.
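
Why the other operand has to be uniform can be seen in a small lane-level model. This is a minimal Python sketch under simplifying assumptions: a wave is a plain list with one value per lane, lane 0 is assumed active, and `readfirstlane` here is an illustrative stand-in, not the intrinsic.

```python
def readfirstlane(lanes):
    """Return the value held by the first lane (lane 0 is assumed active)."""
    return lanes[0]


x = [7, 3, 9, 1]  # divergent value, one entry per lane
uniform = 10      # same value in every lane, e.g. a constant

# Original form: add in every lane, then read lane 0.
before = readfirstlane([v + uniform for v in x])
# Hoisted form: read lane 0 first, then do a single add.
after = readfirstlane(x) + uniform

# With a divergent second operand, the hoisted form is a per-lane value
# that agrees with the original only in lane 0.
y = [1, 2, 3, 4]
original_divergent = readfirstlane([a + b for a, b in zip(x, y)])
hoisted_divergent = [readfirstlane(x) + b for b in y]
```

Here `before` and `after` agree, while `hoisted_divergent` is no longer a single uniform value.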
2025-05-13 12:00:49 +02:00
Matt Arsenault
038d357dde
AMDGPU: Use minimumnum/maximumnum for fmed3 with amdgpu-ieee=0 (#139546)

Try to respect the signaling nan behavior of the instruction,
so also start the special case fold for src2.
2025-05-12 20:31:52 +02:00
Matt Arsenault
08dd0406c6
AMDGPU: Use minnum instead of maxnum for fmed3 src2-nan fold (#139531)
By the pseudocode in the ISA manual, if any input is a nan it acts
like min3, which will fold to min2 of the other operands. The other
cases fold to min; I'm not sure how this one was wrong.
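
The behaviour described here can be modelled in a few lines. This is a hedged Python sketch based only on the description in this message: `minnum` and `fmed3` are illustrative helpers, `minnum` is assumed to ignore a NaN operand, and signaling NaNs, signed zeros and the `amdgpu-ieee` mode are not modelled.

```python
import math


def minnum(a, b):
    """Minimum that ignores a NaN operand (assumed minnum-style behaviour)."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)


def fmed3(a, b, c):
    """Median of three; with any NaN input it behaves like min3."""
    if math.isnan(a) or math.isnan(b) or math.isnan(c):
        return minnum(minnum(a, b), c)
    return sorted((a, b, c))[1]


nan = float("nan")
median = fmed3(5.0, 1.0, 3.0)        # no NaN: ordinary median
folded = fmed3(1.0, 5.0, nan)        # src2 is NaN: min of the other two
same_as_min2 = folded == minnum(1.0, 5.0)
```

Under these assumptions a NaN in src2 leaves the minimum of the other two operands, not the maximum.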
2025-05-12 20:26:29 +02:00
Matt Arsenault
83107e02ea
AMDGPU: Disable most fmed3 folds for strictfp (#139530) 2025-05-12 20:21:02 +02:00
Matt Arsenault
e805d83487
AMDGPU: Add more tests for fmed3 instcombine folds (#139529)
Add test with snan literals, and test with and without amdgpu-ieee
2025-05-12 20:18:07 +02:00
Matt Arsenault
4b89339899
AMDGPU: Reorganize fmed3 intrinsic instcombine tests (#139498) 2025-05-12 10:05:03 +02:00
Alexander Richardson
ee13638362
[AMDGPU] Remove explicit datalayout from tests where not needed
Since e39f6c1844fab59c638d8059a6cf139adb42279a, opt will infer the
correct datalayout when given a triple. Avoid explicitly specifying it
in tests that depend on the AMDGPU target being present to avoid the
string becoming out of sync with the TargetInfo value.
Only tests with REQUIRES: amdgpu-registered-target or a local lit.cfg
were updated to ensure that tests for non-target-specific passes that
happen to use the AMDGPU layout still pass when building with a limited
set of targets.

Reviewed By: shiltian, arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/137921
2025-04-30 10:58:17 -07:00
Matt Arsenault
dadea96791
AMDGPU: Add range to wavefrontsize intrinsic declaration (#136303) 2025-04-25 10:19:47 +02:00
Jay Foad
e3350a6263
[AMDGPU] InstCombine llvm.amdgcn.ds.bpermute with uniform arguments (#130133)
Reland #129895 with a fix to avoid trying to combine bpermute of
bitcast.
2025-04-10 10:36:38 +01:00
Juan Manuel Martinez Caamaño
0375ef07c3
[Clang][AMDGPU] Add __builtin_amdgcn_cvt_off_f32_i4 (#133741)
This built-in maps to `V_CVT_OFF_F32_I4` which treats its input as
a 4-bit signed integer and returns `0.0625f * src`.
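
As a quick illustration of the arithmetic, the conversion can be emulated on the host. This is a minimal Python sketch, not the built-in itself: `cvt_off_f32_i4` is a hypothetical helper, and taking the low 4 bits of the input as the signed value is an assumption of this sketch.

```python
def cvt_off_f32_i4(src):
    """Interpret the low 4 bits of src as a signed integer and scale by 0.0625."""
    nibble = src & 0xF                               # assumption: only bits [3:0] matter
    signed = nibble - 16 if nibble >= 8 else nibble  # two's complement, range -8..7
    return 0.0625 * signed


# All 16 possible inputs map to multiples of 1/16 in the range [-0.5, 0.4375].
table = {n: cvt_off_f32_i4(n) for n in range(16)}
```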

SWDEV-518861
2025-04-02 19:51:40 +02:00
Matt Arsenault
c180fc80dc
AMDGPU: Replace unused permlane inputs with poison instead of undef (#131288) 2025-03-18 17:37:44 +07:00
Matt Arsenault
052eca9ff7
AMDGPU: Replace unused update.dpp inputs with poison instead of undef (#131287) 2025-03-18 17:33:58 +07:00
Matt Arsenault
8392573469
AMDGPU: Replace unused export inputs with poison instead of undef (#131286) 2025-03-18 17:30:42 +07:00
Matt Arsenault
4a3ee4f72d
AMDGPU: Make fma_legacy intrinsic propagate poison (#131063) 2025-03-14 11:42:47 +07:00
Matt Arsenault
37706894f8
AMDGPU: Make fmul_legacy intrinsic propagate poison (#131062) 2025-03-14 11:39:47 +07:00
Matt Arsenault
a716459f2d
AMDGPU: Make ballot intrinsic propagate poison (#131061) 2025-03-14 11:36:44 +07:00
Matt Arsenault
0d8a22d6ad
AMDGPU: Make fmed3 intrinsic propagate poison (#131060) 2025-03-14 11:30:52 +07:00
Matt Arsenault
9b887f5277
AMDGPU: Make cvt_pknorm and cvt_pk intrinsics propagate poison (#131059) 2025-03-14 11:27:50 +07:00
Matt Arsenault
0a78bd67b3
AMDGPU: Make frexp_exp and frexp_mant intrinsics propagate poison (#130915) 2025-03-13 10:07:45 +07:00
Matt Arsenault
d8f17b3de1
AMDGPU: Make sqrt and rsq intrinsics propagate poison (#130914) 2025-03-13 10:01:48 +07:00
Matt Arsenault
95ab95fd10
AMDGPU: Make rcp intrinsic propagate poison (#130913) 2025-03-13 09:58:46 +07:00
Changpeng Fang
fa45bf4300
InstCombine: Fix a crash in PointerReplacer when constructing a new PHI (#130256)
When constructing a PHI node in `PointerReplacer::replace`, the incoming
operands are expected to have already been replaced and in the
replacement map. However, when one of the incoming operands is a load,
the search of the map is unsuccessful, and a nullptr is returned from
`getReplacement`. The reason is that, when a load is replaced, all the
uses of the load have actually been replaced by the new load. It is
useless to insert the original load into the map. Instead, we should
place the new load into the map to meet the expectation of the later map
search.

Fixes: SWDEV-516420
2025-03-09 20:21:36 -07:00
Matt Arsenault
4136395ddc AMDGPU: Regenerate test checks in instcombine test
Passed precheck before 844a1d52a8f5dff032cbf58288675ad1e678d609
started deleting the dead instructions
2025-03-07 19:25:18 +07:00
Matt Arsenault
af755af200
AMDGPU: Handle demanded subvectors for readfirstlane (#128648) 2025-03-07 17:54:15 +07:00
Jay Foad
78281fd12c Revert "[AMDGPU] InstCombine llvm.amdgcn.ds.bpermute with uniform arguments (#129895)"
This reverts commit be5149a3158cbce3051629e450950ccb96926365.

It caused build failures in the openmp-offload-amdgpu-runtime buildbot
and others.
2025-03-06 15:05:19 +00:00
Jay Foad
be5149a315
[AMDGPU] InstCombine llvm.amdgcn.ds.bpermute with uniform arguments (#129895) 2025-03-06 14:31:59 +00:00
Jay Foad
757554ee26 Regenerate checks in test/Transforms/InstCombine/AMDGPU/. NFC. 2025-03-06 14:29:34 +00:00
Yingwei Zheng
844a1d52a8
[IR] Return correct memory effects for convergencectrl (#129874)
`convergencectrl` doesn't imply any memory access.
Closes https://github.com/llvm/llvm-project/issues/129856.
2025-03-05 22:14:47 +08:00
Matt Arsenault
5c375c3283 AMDGPU: Fix worklist management in simplifyDemandedVectorEltsIntrinsic
Fixes bot sanitizer error, but it does leave behind a dead instruction
if there is a bundle for some reason.
2025-03-05 16:39:19 +07:00
Matt Arsenault
95c64b7ee6
AMDGPU: Reduce readfirstlane for single demanded vector element (#128647)
If we are only extracting a single element, rewrite the intrinsic call
to use the element type. We should extend this to arbitrary extract
shuffles.
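
The equivalence behind this rewrite can be checked in a toy lane model. This is a minimal Python sketch under simplifying assumptions: each lane holds a tuple standing in for a vector, lane 0 is assumed active, and `readfirstlane` is an illustrative stand-in for the intrinsic.

```python
def readfirstlane(lanes):
    """Return the value held by the first lane (lane 0 is assumed active)."""
    return lanes[0]


# Two lanes, each holding a 4-element vector.
vec = [(1.0, 2.0, 3.0, 4.0), (5.0, 6.0, 7.0, 8.0)]
index = 2  # the only element that is demanded

# Original form: read the whole vector, then extract one element.
wide = readfirstlane(vec)[index]
# Rewritten form: extract per lane first, then read a single scalar.
narrow = readfirstlane([v[index] for v in vec])
```

Both forms produce the same element, but the rewritten one only reads a value of the element type.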
2025-03-05 08:35:56 +07:00
Matt Arsenault
d410f093da
AMDGPU: Simplify demanded vector elts of readfirstlane sources (#128646)
Stub implementation of simplifyDemandedVectorEltsIntrinsic for
readfirstlane.
2025-02-28 13:01:10 +07:00
Matt Arsenault
2fa6c5265e
AMDGPU: Add baseline tests for simplify elts of readfirstlane (#128645) 2025-02-28 12:56:58 +07:00
Matt Arsenault
447abfcc09
AMDGPU: Fold bitcasts into readfirstlane, readlane, and permlane64 (#128494)
We should handle this for all the handled readlane and dpp ops.
2025-02-27 20:59:11 +07:00
Matt Arsenault
29c5e4289f
AMDGPU: Add baseline tests for bitcast + readlane intrinsics (#128493) 2025-02-26 14:22:57 +07:00
Matt Arsenault
5deb2aa9eb
AMDGPU: Make is.shared and is.private propagate poison (#128617) 2025-02-25 12:56:43 +07:00
Matt Arsenault
28002dd50f AMDGPU: Replace some undef pointer uses in test 2025-02-25 09:34:15 +07:00