9513 Commits

Author SHA1 Message Date
Jay Foad
fb49adb6ea
[AMDGPU] Another test for missing S_WAIT_XCNT (#166154) 2025-11-05 10:17:52 +00:00
Jan Patrick Lehr
833983918d
Revert "CodeGen: Record MMOs in finalizeBundle" (#166520)
Reverts llvm/llvm-project#166210

Buildbot failures in the libc on GPU bot:
https://lab.llvm.org/buildbot/#/builders/10/builds/16711
2025-11-05 11:11:08 +01:00
Nicolai Hähnle
304d2ff4d9
CodeGen: Record MMOs in finalizeBundle (#166210)
This allows more accurate alias analysis to apply at the bundle level.
This has a bunch of minor effects in post-RA scheduling that look mostly
beneficial to me, all of them in AMDGPU (the Thumb2 change is cosmetic).

The pre-existing (and unchanged) test in
CodeGen/MIR/AMDGPU/custom-pseudo-source-values.ll tests that MIR with a
bundle with MMOs can be parsed successfully.

v2:
- use cloneMergedMemRefs
- add another test to explicitly check the MMO bundling behavior

v3:
- use poison instead of undef to initialize the global variable in the
test
2025-11-05 06:56:19 +00:00
Matt Arsenault
849038cad1
AMDGPU: Do not infer implicit inputs for !nocallback intrinsics (#131759)

This isn't really the right check; we want to know that the intrinsic
does not perform a true function call to any code (in the module or
not). nocallback appears to be the closest thing to this property we
have now, though. Fixes theoretical miscompiles with intrinsics like
statepoint, which hide a call to a real function.

Also do the same for inferring no-agpr usage.
2025-11-05 04:53:42 +00:00
Vigneshwar Jayakumar
b5f200129a
[CodeGen] Register-coalescer remat fix subreg liveness (#165662)
This is a bugfix in rematerialization where the liveness of the subreg
mask was updated incorrectly, causing a crash in the scheduler.
2025-11-04 22:40:40 -06:00
Abhay Kanhere
d998f92a00
[CodeGen] MachineVerifier to check early-clobber constraint (#151421)
Currently, MachineVerifier does not verify the early-clobber operand
constraint. The only other machine operand constraint, TiedTo, is
already verified.
2025-11-04 18:39:31 -08:00
Nicolai Hähnle
d6fdfe0a27
CodeGen: Record tied virtual register operands in finalizeBundle (#166209)
This is in preparation for a future AMDGPU change where we are going to
create bundles before register allocation and want to rely on the
TwoAddressInstructionPass handling those bundles correctly.

v2:
- simplify the virtual register check and the test
2025-11-05 02:18:39 +00:00
Nicolai Hähnle
3974157929
AMDGPU: Pre-commit a test (#166414) 2025-11-05 01:21:37 +00:00
Matt Arsenault
2b4ac66297
AMDGPU: Cleanup and modernize limit-coalesce.mir test (#166465) 2025-11-04 23:57:39 +00:00
Syadus Sefat
ce091da5df
[AMDGPU] Mark WMMA machine instructions as convergent (#165602)
The WMMA MI(s) are missing the isConvergent flag. This causes incorrect
behavior in passes like machine-sink, where WMMA instructions get sunk
into divergent branches.

This patch fixes the issue by setting the isConvergent flag to 1 in the
VOP3PInstructions.td file.
2025-11-04 15:37:27 -06:00
choikwa
8f683c3e4b
[AMDGPU] NFC, delete promote-alloca testcase (#166297)
The previous merge did not delete it.
2025-11-04 14:34:54 -05:00
Jay Foad
dbce71382c
[AMDGPU] Skip debug instructions when eliminating S_SET_GPR_IDX_ON/OFF (#160715) 2025-11-04 12:03:16 +00:00
Jay Foad
f037f41350
[IR] Add new function attribute nocreateundeforpoison (#164809)
Also add a corresponding intrinsic property that can be used to mark
intrinsics that do not introduce poison, for example simple arithmetic
intrinsics that propagate poison just like a simple arithmetic
instruction.

As a smoke test this patch adds the new property to
llvm.amdgcn.fmul.legacy.
2025-11-04 12:00:44 +00:00
Jay Foad
99a1fcad5d
[UTC] Update AMDGPU asm regexp for private functions (#166169)
Since #163011 changed AMDGPU to use ELF mangling, the regexp failed to
match private functions because of the inconsistent presence/absence of
the .L prefix on the first line of the function e.g.:
```
.Lfoo:  ; @foo
```
2025-11-04 11:59:43 +00:00
Robert Imschweiler
c02bdd466a
[AMDGPU] Fix handling of FP in cs.chain functions (#161194)
In case there is a dynamic alloca / an alloca which is not in the entry
block, cs.chain functions do not set up an FP, but are reported to need
one. This results in a failed assertion in
`SIFrameLowering::emitPrologue()` (Assertion `(!HasFP || FPSaved) &&
"Needed to save FP but didn't save it anywhere"` failed). This commit
changes `hasFPImpl` so that the need for an SP in a cs.chain function
does not directly imply the need for an FP anymore.

This LLVM defect was identified via the AMD Fuzzing project.
2025-11-04 10:22:13 +01:00
Robert Imschweiler
a8ea7f4580
Reapply: [AMDGPU][UnifyDivergentExitNodes][StructurizeCFG] Add support for callbr instruction with inline-asm (#152161) (#166195)
Reapply #152161 with fixed 'changed' flags.
2025-11-03 20:59:48 +01:00
vangthao95
e8765401d4
[AMDGPU][GlobalISel] Add RegBankLegalize support for G_FENCE (#165939) 2025-11-03 09:36:49 -08:00
Robert Imschweiler
af68efc9c4
Revert "[AMDGPU][UnifyDivergentExitNodes][StructurizeCFG] Add support for callbr instruction with inline-asm" (#166186)
Reverts llvm/llvm-project#152161

Need to revert to fix changed logic for the expensive checks.
2025-11-03 16:33:20 +00:00
Robert Imschweiler
332f9b5eee
[AMDGPU][UnifyDivergentExitNodes][StructurizeCFG] Add support for callbr instruction with inline-asm (#152161)
Finishes adding inline-asm callbr support for AMDGPU, started by
https://github.com/llvm/llvm-project/pull/149308.
2025-11-03 16:09:12 +01:00
Aaditya
c8187f6539
[AMDGPU] Fix Xcnt handling between blocks (#165201)
For blocks with multiple predecessors, there may be `SMEM` and `VMEM`
events active at the same time. This patch handles these cases.
2025-11-01 16:48:48 +05:30
Matt Arsenault
f4f247f01e
AMDGPU/GlobalISel: Fix vgpr abs tests using SGPR return (#165965)
Fix the calling convention to use normal functions instead of amdgpu_cs.
2025-10-31 21:41:53 -07:00
Matt Arsenault
cf829cc11c
AMDGPU: Add baseline test for #161651 (#165921) 2025-10-31 21:50:17 +00:00
vangthao95
d1d635083d
[AMDGPU][GlobalISel] Clean up selectCOPY_SCC_VCC function (#165797)
Follow-up patch to address the comments in
https://github.com/llvm/llvm-project/pull/165355.
2025-10-31 13:17:44 -07:00
Stanislav Mekhanoshin
be2ae264dd
[AMDGPU] Record old VGPR MSBs in the high bits of s_set_vgpr_msb (#165035)
Fixes: SWDEV-562450
2025-10-31 12:21:59 -07:00
choikwa
4a5692d6b3
[AMDGPU] NFC, add testcase showing promote-alloca of array of vectors to a large vector (#165824)
A later patch will target a series of extractelement/insertelement pairs.
2025-10-31 14:43:35 -04:00
Stanislav Mekhanoshin
0d9c75be2d
[AMDGPU] Reset VGPR MSBs at the end of fallthrough basic block (#164901)
By convention, a basic block shall start with the MSBs set to zero. We
also need to know the previous mode in all cases, as SWDEV-562450 asks
to record the old mode in the high bits of the mode.
2025-10-31 10:58:22 -07:00
vangthao95
2837a4bdd7
[AMDGPU][GlobalISel] Add RegBankLegalize support for G_READCYCLECOUNTER (#165754) 2025-10-31 09:12:56 -07:00
Abhinav Garg
1057c63b24
[AMDGPU][GlobalISel] Add register bank legalization for G_FADD (#163407)
This patch adds register bank legalization support for G_FADD opcodes in
the AMDGPU GlobalISel pipeline.
Added new reg bank type UniInVgprS64.
This patch also adds combine logic for ReadAnyLane + Trunc + AnyExt.

---------

Co-authored-by: Abhinav Garg <abhigarg@amd.com>
2025-10-31 16:45:40 +05:30
Changpeng Fang
6b5afdc3ab
[AMDGPU] Support bfloat comparison for ballot intrinsic (#165495)
We do not have native instructions for direct bfloat comparisons.
However, we can expand bfloat to float, and do float comparison instead.

TODO: handle bfloat comparison for the ballot intrinsic on the GlobalISel path.

Fixes: SWDEV-563403
2025-10-30 09:44:25 -07:00
Anshil Gandhi
b1d5a2a156
[AMDGPU] Add regbankselect rules for G_ADD/SUB and variants (#159860)
Add legalization rules for G_ADD, G_UADDO, G_UADDE and their SUB counterparts.
2025-10-30 11:45:02 -04:00
vangthao95
ba5cde79aa
[AMDGPU][GlobalISel] Fix issue with copy_scc_vcc on gfx7 (#165355)
When selecting for G_AMDGPU_COPY_SCC_VCC, we use S_CMP_LG_U64 or
S_CMP_LG_U32 for wave64 and wave32 respectively. However, on gfx7 we do
not have the S_CMP_LG_U64 instruction. Work around this issue by using
S_OR_B64 instead.
2025-10-30 08:19:12 -07:00
Vigneshwar Jayakumar
469702c5d5
[LICM] Sink unused l-invariant loads in preheader. (#157559)
Unused loop invariant loads were not sunk from the preheader to the exit
block, increasing live range.

This commit moves the sinkUnusedInvariant logic from indvarsimplify to
LICM and also adds functionality to sink unused loads that are not
clobbered by the loop body.
2025-10-30 09:23:04 -05:00
Pankaj Dwivedi
4d7093b806
[AMDGPU] Enable "amdgpu-uniform-intrinsic-combine" pass in pipeline. (#162819)
This PR enables AMDGPUUniformIntrinsicCombine pass in the llc pipeline.
Also introduces the "amdgpu-uniform-intrinsic-combine" command-line flag
to enable/disable the pass.

See the PR: https://github.com/llvm/llvm-project/pull/116953
2025-10-30 12:32:32 +05:30
Stanislav Mekhanoshin
5f1813e826
[AMDGPU] Support true16 spill restore with sram-ecc (#165320) 2025-10-29 12:35:01 -07:00
Pankaj Dwivedi
20532c0aab
[AMDGPU] make AMDGPUUniformIntrinsicCombine a function pass (#165265)
There has been an issue (using a function analysis inside a module pass
in the OPM) integrating this pass into the LLC pipeline, which currently
lacks NPM support. I tried finding a way to get the per-function
analysis, but it seems that in the OPM we don't have that option.

So the best approach would be to make it a function pass.

Ref: https://github.com/llvm/llvm-project/pull/116953
2025-10-29 11:56:43 +05:30
Harrison Hao
d604ab6288
[AMDGPU] Support image atomic no return instructions (#150742)
Add support for no-return variants of image atomic operations
(e.g. IMAGE_ATOMIC_ADD_NORTN, IMAGE_ATOMIC_CMPSWAP_NORTN).
These variants are generated when the return value of the intrinsic is
unused, allowing the backend to select no-return instructions.
2025-10-29 10:42:15 +08:00
David Green
d51dcf929e
[GlobalISel] Combine away G_UNMERGE(G_IMPLICITDEF). (#119183)
This helps clean up some more legalization artefacts during
legalization, in a similar way to other operations, and helps some of
the DUP cases get through legalization successfully.
2025-10-28 09:57:31 +00:00
Carl Ritson
385c12134a
[AMDGPU] Rework GFX11 VALU Mask Write Hazard (#138663)
Apply additional counter waits to address VALU writes to SGPRs. Rework
expiry detection and apply wait coalescing to mitigate some of the
additional waits.
2025-10-28 16:09:28 +09:00
LU-JOHN
7d14733c12
[AMDGPU] Generate s_absdiff_i32 (#164835)
Generate s_absdiff_i32. Tested in absdiff.ll. Also update s_cmp_0.ll to
test that s_absdiff_i32 is foldable with an s_cmp_lg_u32 sX, 0.

---------

Signed-off-by: John Lu <John.Lu@amd.com>
2025-10-27 14:40:56 -05:00
Jeffrey Byrnes
30f2bf7558
[AMDGPU] Use implicit operand to preserve liveness of COPY (#164911)
When lowering spills / restores, we may end up partially lowering the
spill via copies and the remaining portion with loads/stores. In this
partial lowering case, the implicit-def operands added to the restore
load clobber the preceding copies, telling MachineCopyPropagation to
delete them. By also attaching an implicit operand to the load, the
COPYs have an artificial use and thus will not be deleted; this is the
same strategy taken in https://github.com/llvm/llvm-project/pull/115285

I'm not sure that we need implicit-def operands on any load restore, but
I guess it may make sense if it needs to be split into multiple loads
and some have been optimized out as containing undef elements.

These implicit / implicit-def operands continue to cause correctness
issues. A previous / ongoing long-term plan to remove them is being
addressed via:


https://discourse.llvm.org/t/llvm-codegen-rfc-add-mo-lanemask-type-and-a-new-copy-lanemask-instruction/88021
https://github.com/llvm/llvm-project/pull/151123
https://github.com/llvm/llvm-project/pull/151124
2025-10-27 10:47:11 -07:00
Gheorghe-Teodor Bercea
bce7f7cc22
[AMDGPU] Precommit test for sinking vector ops PR 162580 (#165050)
Pre-commit test for PR: https://github.com/llvm/llvm-project/pull/162580
2025-10-27 13:44:44 -04:00
Jay Foad
60f20ea465
[AMDGPU] Add target feature for waits before system scope stores. NFC. (#164993) 2025-10-27 10:31:37 +00:00
Yunqing Yu
059d90d08f
[Legalizer] Cache extracted element when lowering G_SHUFFLE_VECTOR. (#163893)
Cache extracted elements in lowerShuffleVector(). For example, when
lowering
```
%0:_(<2 x s32>) = G_BUILD_VECTOR %1(s32), %2(s32)
%3:_(<N x s32>) = G_SHUFFLE_VECTOR %0(<2 x s32>), %0, shufflemask(0, 0, 0, 0, ...)
```
we currently generate one `G_EXTRACT_VECTOR_ELT` for each of the `N`
elements in the shuffle mask. This is undesirable and bloats the code,
especially for larger vectors.

With this change, we only generate one `G_EXTRACT_VECTOR_ELT` from `%0`
and reuse it for all `N` result elements.
2025-10-25 10:26:11 -05:00
Mirko Brkušanin
bdec5bf69c
[AMDGPU][GlobalISel] Combine (or s64, zext(s32)) (#151519)
If we only deal with one part of a 64-bit value, we can just generate a
merge and an unmerge, which will either be combined away or selected
into a copy / mov_b32.
2025-10-24 17:25:00 +02:00
Mirko Brkušanin
fe5f49942e
[AMDGPU][GlobalISel] Lower G_FMINIMUM and G_FMAXIMUM (#151122)
Add GlobalISel lowering of G_FMINIMUM and G_FMAXIMUM following the same
logic as in SDag's expandFMINIMUM_FMAXIMUM.
Update AMDGPU legalization rules: pre-GFX12 targets now use the new
lowering method, and G_FMINNUM_IEEE and G_FMAXNUM_IEEE are made legal
to match SDag.
2025-10-24 14:48:27 +02:00
David Green
a1e59bdc17
[GlobalISel] Make scalar G_SHUFFLE_VECTOR illegal. (#140508)
I'm not sure if this is the best way forward or not, but we repeatedly
run into issues from forgetting that shuffle_vectors can be scalar.
(There is another example in the recently added known-bits code.) As a
scalar-dst shuffle vector is just an extract, and a scalar-source
shuffle vector is just a build vector, this patch makes scalar shuffle
vectors illegal and adjusts the IRBuilder to create the correct node as
required.

Most targets do this already through lowering or combines. Making scalar
shuffles illegal simplifies GISel as a whole; it just requires
transforms that create shuffles of new sizes to account for the scalar
shuffle being illegal (mostly IRBuilder and LessElements).
2025-10-24 08:21:35 +01:00
Stanislav Mekhanoshin
ef923f1b28
[AMDGPU] Change patterns for v_[pk_]add_{min|max} (#164881)
The intermediate result is in fact the add with saturation
regardless of the clamp bit.
2025-10-23 15:45:15 -07:00
paperchalice
c2b2a347bf
[AMDGPU][test] Remove unsafe-fp-math uses (NFC) (#164609)
Post cleanup for #164534.
2025-10-23 01:45:54 +00:00
Matt Arsenault
1d9f9ad531
CodeGen: Fix crash when no libcall is available for stackguard (#164211)
Not all the paths appear to be implemented for GlobalISel
2025-10-23 10:40:40 +09:00
Stanislav Mekhanoshin
9b5bc98743
[AMDGPU] Add intrinsics for v_[pk]_add_{min|max}_* instructions (#164731) 2025-10-22 17:46:33 -07:00