1023 Commits

Author SHA1 Message Date
LU-JOHN
87b1d3537a
[AMDGPU][NFC] Avoid copying MachineOperands (#166293)
Avoid copying machine operands.

Signed-off-by: John Lu <John.Lu@amd.com>
2025-11-04 23:18:40 -06:00
Kazu Hirata
902b0bd04a
[llvm] Remove "const" in the presence of "constexpr" (NFC) (#166109)
"const" is extraneous in the presence of "constexpr" for simple
variables and arrays.
2025-11-02 15:52:44 -08:00
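A minimal illustration of why the extra "const" is redundant for simple variables and arrays, but not for pointers. The variable names are hypothetical; only the language rule (a constexpr object declaration implies const) is being shown.

```cpp
#include <cassert>
#include <type_traits>

// Illustrative names only. A constexpr object declaration already implies
// const, so also spelling "const" does not change the declared type.
constexpr int kWaveSize = 64;               // declared type: const int
const constexpr int kWaveSizeVerbose = 64;  // same type; "const" is redundant
constexpr unsigned kLaneMasks[] = {1u, 3u}; // declared type: const unsigned[2]

static_assert(std::is_same_v<decltype(kWaveSize), decltype(kWaveSizeVerbose)>);
static_assert(std::is_same_v<decltype(kWaveSize), const int>);
static_assert(std::is_same_v<decltype(kLaneMasks), const unsigned[2]>);

// Counter-example: for a pointer, constexpr makes the pointer itself const,
// so a "const" on the pointee is not extraneous.
constexpr const char *kTarget = "gfx1250"; // declared type: const char *const
static_assert(std::is_same_v<decltype(kTarget), const char *const>);
```

This pointer case is presumably why the cleanup is limited to simple variables and arrays.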
LU-JOHN
7ed2f1b82b
[AMDGPU][NFC] Refactor SCC optimization (#165871)
Refactor SCC optimization

---------

Signed-off-by: John Lu <John.Lu@amd.com>
2025-10-31 22:26:28 -05:00
LU-JOHN
9abbec66bf
[AMDGPU] Reland "Remove redundant s_cmp_lg_* sX, 0" (#164201)
Reland PR https://github.com/llvm/llvm-project/pull/162352. Fixed by
excluding SI_PC_ADD_REL_OFFSET from the instructions that set SCC = DST != 0.
check-libc-amdgcn-amd-amdhsa now passes.

Distribution of the instructions that allowed a redundant S_CMP to be
deleted in the check-libc-amdgcn-amd-amdhsa tests:

```
S_AND_B32      485
S_AND_B64      47
S_ANDN2_B32    42
S_ANDN2_B64    277492
S_CSELECT_B64  17631
S_LSHL_B32     6
S_OR_B64       11
```

---------

Signed-off-by: John Lu <John.Lu@amd.com>
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-10-22 08:42:29 -05:00
Nicolai Hähnle
896d546cf3
AMDGPU: Refactor three-address conversion (NFC) (#162558)
Extract the core of the instruction rewriting into an implementation
method, and unify the LiveVariables / LiveIntervals updates in its
caller.

This is intended to help make future changes to three-address conversion
more robust.
2025-10-20 17:17:51 -07:00
Jan Patrick Lehr
023b1f6a8e
Revert "[AMDGPU] Remove redundant s_cmp_lg_* sX, 0 " (#164116)
Reverts llvm/llvm-project#162352

Broke our buildbot:
https://lab.llvm.org/buildbot/#/builders/10/builds/15674
To reproduce:

```
cd llvm-project
cmake -S llvm -B thebuild -C offload/cmake/caches/AMDGPULibcBot.cmake -GNinja
cd thebuild
ninja
ninja check-libc-amdgcn-amd-amdhsa
```
2025-10-18 22:38:14 +02:00
LU-JOHN
8e5f6dd37c
[AMDGPU] Remove redundant s_cmp_lg_* sX, 0 (#162352)
Remove a redundant s_cmp_lg_* sX, 0 if the SALU instruction that defines
sX already sets SCC = (sX != 0).

---------

Signed-off-by: John Lu <John.Lu@amd.com>
2025-10-18 09:33:47 -05:00
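A toy model of the idea, as a sketch only: within a straight-line block, a compare of sX against 0 can be dropped when the nearest earlier instruction that writes SCC or sX writes both, and is known to leave SCC = (sX != 0). The Inst struct, the register numbering and the opcode table are assumptions made for illustration (taken from the distribution table in the reland), not LLVM's data structures; S_CSELECT_B64 is left out because it needs operand-specific reasoning.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Illustrative straight-line SALU instruction (not an LLVM type).
struct Inst {
  std::string Opcode;
  int Dst;      // SGPR written, or -1 if none
  int Src0;     // first source SGPR, or -1 if none
  int Imm;      // immediate operand (the "0" in s_cmp_lg sX, 0)
  bool DefsSCC; // whether the instruction writes SCC
};

// Opcodes assumed here to leave SCC == (Dst != 0), with the width of Dst.
// SI_PC_ADD_REL_OFFSET is deliberately absent: it writes SCC, but not
// with that meaning (this mirrors the reland fix).
const std::map<std::string, unsigned> kSCCIsDstNonZero = {
    {"S_AND_B32", 32},   {"S_AND_B64", 64}, {"S_ANDN2_B32", 32},
    {"S_ANDN2_B64", 64}, {"S_OR_B64", 64},  {"S_LSHL_B32", 32}};

// Width of an "s_cmp_lg_* sX, 0", or 0 if I is not such a compare.
unsigned cmpLgZeroWidth(const Inst &I) {
  if (I.Src0 < 0 || I.Imm != 0)
    return 0;
  if (I.Opcode == "S_CMP_LG_U32")
    return 32;
  if (I.Opcode == "S_CMP_LG_U64")
    return 64;
  return 0;
}

bool isRedundantCmp(const std::vector<Inst> &Prev, const Inst &Cmp) {
  unsigned Width = cmpLgZeroWidth(Cmp);
  if (Width == 0)
    return false;
  for (auto It = Prev.rbegin(); It != Prev.rend(); ++It) {
    bool WritesReg = It->Dst == Cmp.Src0;
    if (!It->DefsSCC && !WritesReg)
      continue; // unrelated instruction, keep scanning backwards
    auto Found = kSCCIsDstNonZero.find(It->Opcode);
    return It->DefsSCC && WritesReg && Found != kSCCIsDstNonZero.end() &&
           Found->second == Width;
  }
  return false; // no visible definition in this block
}

std::vector<Inst> removeRedundantCmps(const std::vector<Inst> &Block) {
  std::vector<Inst> Out;
  for (const Inst &I : Block)
    if (!isRedundantCmp(Out, I))
      Out.push_back(I);
  return Out;
}
```

An intervening SCC writer (for example another compare) or a redefinition of sX stops the search, so the compare is kept in those cases.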
Brox Chen
ac193bc20f
[AMDGPU][True16][CodeGen] S_PACK_XX_B32_B16 lowering for true16 mode (#162389)
S_PACK_XX_B32_B16 requires special lowering in true16 mode when it is
lowered to VALU in the fix-sgpr-copies pass.

Added test cases in fix-sgpr-copies-f16-true16.mir
2025-10-17 14:29:37 -04:00
Jay Foad
8c6b499f06
[AMDGPU] Simplify vcc handling in copyPhysReg. NFC. (#163340)
2025-10-14 15:18:22 +01:00
carlobertolli
8892825917
[AMDGPU] Enable saving SHARED_BASE to VCC (#163244)
2025-10-13 15:38:28 -05:00
Matt Arsenault
1a5494ca4a
AMDGPU: Use RegClassByHwMode to manage operand VGPR operand constraints (#158272)
This removes special-case processing in TargetInstrInfo::getRegClass that
fixed up register operands which, depending on the subtarget, support
AGPRs or require even-aligned registers.

This regresses assembler diagnostics, which currently work by hackily
accepting invalid cases and then post-rejecting a validly parsed
instruction.
On the plus side, this now emits a comment when disassembling unaligned
registers for targets with the alignment requirement.
2025-10-08 11:19:54 +09:00
Brox Chen
b8127cc8d0
[AMDGPU][True16][CodeGen] fix v_mov_b16_t16 index in folding pass (#161764)
In true16 mode, v_mov_b16_t16 is added as a new foldable copy instruction,
but its src operand is at a different index.

Use the correct src index for v_mov_b16_t16.
2025-10-03 17:34:42 -04:00
Matt Arsenault
5601c4080a
AMDGPU: Stop trying to constrain register class of post-RA-pseudos (#161792)
This is trying to constrain the register class of a physical register,
which makes no sense.
2025-10-03 21:19:33 +09:00
Matt Arsenault
2a39d8be87
AMDGPU: Remove dead code trying to constrain a physical register (#161790)
This constrainRegClass check would never pass for a physical
register.
2025-10-03 21:19:13 +09:00
Pierre van Houtryve
14fcd81861
[AMDGPU][InsertWaitCnts] Refactor some helper functions, NFC (#161160)
- Remove one-line wrappers around a simple function call when they're
only used once or twice.
- Move very generic helpers into SIInstrInfo
- Delete unused functions

The goal is simply to reduce the noise in SIInsertWaitCnts without
hiding functionality. I focused on moving trivial helpers, or helpers
with very descriptive/verbose names (so it doesn't hide too much logic
away from the pass), and that have some reusability potential.

I'm also trying to make the code style more consistent. It doesn't make
sense to see a function call `TII->isXXX` then suddenly call a random
`isY` method that just wraps around `TII->isY`.

The context of this work is that I'm trying to learn how this pass
works, and while going through the code I noticed some little things
here and there that I thought would be good to fix.
2025-10-01 10:51:00 +02:00
Brox Chen
934f802731
[AMDGPU][True16][CodeGen] true16 isel pattern for fma_mix_f16/bf16 (#159648)
This patch includes:
1. The fma_mix instruction takes an fp16 type as input, but places the
operand in a vgpr32. Update the selector to insert a vgpr32 in true16 mode
if necessary.
2. The fma_mix instruction returns an fp16 type as output, but places the
vdst in a vgpr32. Create an fma_mix_t16 pseudo instruction for the isel
pattern, and lower it to mix_lo/hi in the MC lowering pass.

These stop isel from emitting an illegal `vgpr32 = COPY vgpr16` and improve
code quality.
2025-09-24 11:27:26 -04:00
Philip Reames
8b7a76a2ac
[CodeGen] Rename isReallyTriviallyReMaterializable [nfc]
.. to isReMaterializableImpl. The "Really" naming has always been
awkward, and we're working towards removing the "Trivial" part now,
so go ahead and remove both pieces in a single rename.

Note that this doesn't change any aspect of the current
implementation; we still "mostly" only return instructions which
are trivial (meaning no virtual register uses), but some targets
do lie about that today.
2025-09-23 11:58:37 -07:00
Jay Foad
b7a848e5ce
[AMDGPU] Skip debug uses in SIInstrInfo::foldImmediate (#160102)
2025-09-22 15:22:09 +01:00
Akash Dutta
c256966fe2
[AMDGPU]: Unpack packed instructions overlapped by MFMAs post-RA scheduling (#157968)
This is a cleaned up version of PR #151704. These optimizations are now
performed post-RA scheduling.
2025-09-19 09:41:02 -07:00
Matt Arsenault
daed12d00d
AMDGPU: Remove unnecessary AGPR legalize logic (#159491)
The manual legalizeOperands code only needs to consider cases that
require the full instruction context to know whether the operand is legal.
It does not need to handle basic operand register class constraints.
2025-09-19 09:51:46 +09:00
Matt Arsenault
aa8b624518
AMDGPU: Remove unnecessary operand legalization for WMMAs (#159370)
The operand constraints already express this constraint, and
InstrEmitter will respect them.
2025-09-18 09:20:18 +09:00
Matt Arsenault
d57aa484e1
AMDGPU: Constrain regclass when replacing SGPRs with VGPRs (#159369)
2025-09-18 07:36:28 +09:00
Jay Foad
eeced0d073
[AMDGPU] Use larger immediate values in S_NOP (#158990)
The S_NOP instruction has an immediate operand which is one less than
the number of cycles to delay for. The maximum value that may be encoded
in this field was increased in GFX8 and again in GFX12.
2025-09-16 15:51:06 +01:00
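A sketch of the arithmetic described above: each S_NOP with immediate Imm delays for Imm + 1 cycles, so a longer delay is split into several S_NOPs, and a larger maximum immediate means fewer instructions. The function name is hypothetical, and the per-generation maximum is taken as a parameter (MaxImm) because the concrete limits are not stated here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the S_NOP immediates needed to delay for NumCycles cycles.
// Each S_NOP with immediate Imm delays for Imm + 1 cycles; MaxImm is the
// largest encodable immediate on the target (an assumed input here).
std::vector<unsigned> sNopImmediates(unsigned NumCycles, unsigned MaxImm) {
  std::vector<unsigned> Imms;
  while (NumCycles > 0) {
    unsigned Chunk = std::min(NumCycles, MaxImm + 1); // cycles for this S_NOP
    Imms.push_back(Chunk - 1);                        // encoded as cycles - 1
    NumCycles -= Chunk;
  }
  return Imms;
}
```

For example, a 20-cycle delay needs three S_NOPs if MaxImm is 7 (immediates 7, 7, 3), but only two if MaxImm is 15 (immediates 15, 3).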
Stanislav Mekhanoshin
72aa946762
[AMDGPU] Drop high 32 bits of aperture registers (#158725)
Fixes: SWDEV-551181
2025-09-16 02:11:39 -07:00
Carl Ritson
fdb06d9792
[AMDGPU] Refactor out common exec mask opcode patterns (NFCI) (#154718)
Create a utility mechanism for finding the wave-size-dependent opcodes used
to manipulate exec/lane masks.
2025-09-16 03:22:14 +00:00
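One way such a utility could look, as a hedged sketch: a single per-wave-size table that callers query once, instead of repeating "IsWave32 ? X_B32 : X_B64" at each use site. The struct and function names are hypothetical and are not the ones introduced by the patch; only the opcode and register names are real AMDGPU names.

```cpp
#include <cassert>
#include <string>

// Hypothetical per-wave-size table. Wave32 manipulates EXEC_LO with
// 32-bit scalar ops; wave64 manipulates the full 64-bit EXEC.
struct LaneMaskOpcodes {
  const char *ExecReg;
  const char *Mov;
  const char *And;
  const char *Or;
  const char *Xor;
  const char *AndSaveExec;
};

constexpr LaneMaskOpcodes kWave32Ops = {"EXEC_LO",  "S_MOV_B32", "S_AND_B32",
                                        "S_OR_B32", "S_XOR_B32",
                                        "S_AND_SAVEEXEC_B32"};
constexpr LaneMaskOpcodes kWave64Ops = {"EXEC",     "S_MOV_B64", "S_AND_B64",
                                        "S_OR_B64", "S_XOR_B64",
                                        "S_AND_SAVEEXEC_B64"};

// Callers fetch the table once per function rather than selecting each
// opcode by wave size at every use.
constexpr const LaneMaskOpcodes &getLaneMaskOpcodes(bool IsWave32) {
  return IsWave32 ? kWave32Ops : kWave64Ops;
}
```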
Matt Arsenault
1dc4db8f1e
AMDGPU: Relax verifier for agpr/vgpr loads and stores (#158391)
2025-09-13 16:34:02 +09:00
Matt Arsenault
7289f2cd0c
CodeGen: Remove MachineFunction argument from getRegClass (#158188)
This is a low level utility to parse the MCInstrInfo and should
not depend on the state of the function.
2025-09-12 19:22:02 +09:00
Matt Arsenault
3b48c64d08
AMDGPU: Move spill pseudo special case out of adjustAllocatableRegClass (#158246)
This is special for the same reason av_mov_b64_imm_pseudo is special.
2025-09-12 18:35:57 +09:00
Matt Arsenault
9e1d656c68
AMDGPU: Remove MIMG special case in adjustAllocatableRegClass (#158184)
I have no idea why this was here. MIMG atomics use tied operands
for the input and output, so AV classes should have always worked.
We have poor test coverage for AGPRs with atomics, so add a partial
set. Everything seems to work OK, although it seems image cmpswap
always uses VGPRs unnecessarily.
2025-09-12 09:02:24 +00:00
Matt Arsenault
5a21128f24
AMDGPU: Relax legal register operand constraint (#157989)
Find a common subclass instead of directly checking for a subclass
relationship. This fixes folding logic for unaligned register defs
into aligned use contexts. E.g., a vreg_64 def into an av_64_align2
use should be able to find the common subclass vreg_64_align2. This
avoids regressions in future patches.

Checking the subclass was also redundant on the subregister path;
getMatchingSuperRegClass is sufficient.
2025-09-12 08:57:47 +09:00
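A toy model of the difference between the two checks, assuming register classes can be represented as sets of 64-bit tuples over a tiny register file. The class names mirror the example in the message, but the bitset representation and the helper names are illustrative only, not TargetRegisterInfo.

```cpp
#include <bitset>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Toy register file of 8 VGPRs and 8 AGPRs. Bit I (I < 8) stands for the
// 64-bit tuple v[I:I+1]; bit 8 + I stands for a[I:I+1]. A tuple cannot
// start at index 7, so bits 7 and 15 are never set.
using TupleSet = std::bitset<16>;

struct RegClass {
  std::string Name;
  TupleSet Tuples;
};

// Tuples starting at Base + 0, Base + Step, ... that still fit in 8 registers.
TupleSet tupleStarts(unsigned Base, unsigned Step) {
  TupleSet S;
  for (unsigned I = 0; I + 1 < 8; I += Step)
    S.set(Base + I);
  return S;
}

const RegClass VReg64{"vreg_64", tupleStarts(0, 1)};
const RegClass VReg64Align2{"vreg_64_align2", tupleStarts(0, 2)};
const RegClass AV64{"av_64", tupleStarts(0, 1) | tupleStarts(8, 1)};
const RegClass AV64Align2{"av_64_align2",
                          tupleStarts(0, 2) | tupleStarts(8, 2)};

// Old-style check: every tuple of Sub must be allowed by Super.
bool isSubClass(const RegClass &Sub, const RegClass &Super) {
  return (Sub.Tuples & ~Super.Tuples).none();
}

// New-style check: the largest known class contained in both A and B.
std::optional<RegClass> commonSubClass(const RegClass &A, const RegClass &B,
                                       const std::vector<RegClass> &Known) {
  std::optional<RegClass> Best;
  for (const RegClass &RC : Known) {
    if (RC.Tuples.none() || !isSubClass(RC, A) || !isSubClass(RC, B))
      continue;
    if (!Best || RC.Tuples.count() > Best->Tuples.count())
      Best = RC;
  }
  return Best;
}
```

Here vreg_64 is not a subclass of av_64_align2 (it allows odd-aligned tuples), so the old check rejects the fold, while the common subclass vreg_64_align2 still exists and the def can be constrained to it.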
Matt Arsenault
1c325a07f8
AMDGPU: Stop checking allocatable in adjustAllocatableRegClass (#158105)
This no longer does anything.
2025-09-12 08:56:34 +09:00
Petar Avramovic
41c685975e
AMDGPU/UniformityAnalysis: fix G_ZEXTLOAD and G_SEXTLOAD (#157845)
Use the same rules for G_ZEXTLOAD and G_SEXTLOAD as for G_LOAD.
Flat (addrspace(0)) and private (addrspace(5)) G_ZEXTLOAD and G_SEXTLOAD
should always be divergent.
2025-09-10 17:57:15 +02:00
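A minimal sketch of the rule as stated: plain and extending loads share one check, and any of them from the flat (0) or private (5) address space is reported as always divergent. The enum and function are hypothetical stand-ins, not the UniformityAnalysis API; loads from other address spaces fall through to the normal operand-based analysis, which is not modeled here.

```cpp
#include <cassert>

// Hypothetical stand-ins for the generic opcodes involved.
enum class Opcode { G_LOAD, G_ZEXTLOAD, G_SEXTLOAD, G_ADD };

// Address space numbers as given in the commit message.
constexpr unsigned FlatAS = 0;
constexpr unsigned PrivateAS = 5;

// Treat all three load kinds alike: flat or private loads are always
// divergent, regardless of whether the address itself is uniform.
constexpr bool isAlwaysDivergentLoad(Opcode Opc, unsigned AddrSpace) {
  bool IsLoad = Opc == Opcode::G_LOAD || Opc == Opcode::G_ZEXTLOAD ||
                Opc == Opcode::G_SEXTLOAD;
  return IsLoad && (AddrSpace == FlatAS || AddrSpace == PrivateAS);
}
```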
Stanislav Mekhanoshin
b0ee92be94
[AMDGPU] Restrict scale operands of WMMA to low 256 VGPRs (#157526)
These cannot accept high registers.
2025-09-08 15:44:51 -07:00
Matt Arsenault
727e9f5ea5
CodeGen: Pass SubtargetInfo to TargetGenInstrInfo constructors (#157337)
This will make it possible for tablegen to make subtarget
dependent decisions without adding new arguments to every
target.

---------

Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
2025-09-08 12:12:19 +09:00
Matt Arsenault
884130bf93
AMDGPU: Allow folding multiple uses of some immediates into copies (#154757)
In some cases this will require an avoidable re-defining of
a register, but it works out better most of the time. Also allow
folding 64-bit immediates into subregister extracts, unless it would
break an inline constant.

We could be more aggressive here, but this set of conditions seems
to do a reasonable job without introducing too many regressions.
2025-09-06 08:22:09 +09:00
Matt Arsenault
d096b1d48e
AMDGPU: Remove flat special case in getRegClass (#156991)
2025-09-06 07:42:16 +09:00
Stanislav Mekhanoshin
1f0f3473e6
[AMDGPU] High VGPR lowering on gfx1250 (#156965)
2025-09-04 16:20:47 -07:00
Pierre van Houtryve
e2bd10cf16
[AMDGPU][gfx1250] Add 128B cooperative atomics (#156418)
- Add clang built-ins + sema/codegen
- Add IR Intrinsic + verifier
- Add DAG/GlobalISel codegen for the intrinsics
- Add lowering in SIMemoryLegalizer using a MMO flag.
2025-09-04 09:19:25 +00:00
Diana Picus
018dc1b397
[AMDGPU] Tail call support for whole wave functions (#145860)
Support tail calls to whole wave functions (trivial) and from whole wave
functions (slightly more involved because we need a new pseudo for the
tail call return, that patches up the EXEC mask).

Move the expansion of whole wave function return pseudos (regular and
tail call returns) to prolog epilog insertion, since that's where we
patch up the EXEC mask.
2025-09-04 10:34:43 +02:00
Matt Arsenault
a23a5b0683
AMDGPU: Remove the DS special case in getRegClass (#156696)
These instructions should now have proper representation
with separate instructions for operands which must be paired.
2025-09-04 15:14:17 +09:00
Matt Arsenault
dc170c7e31
AMDGPU: Special case align requirement for AV_MOV_B64_IMM_PSEUDO
This should not require aligned registers. Fixes an expensive_checks
test failure. I don't see a better way until the new system
to specify the alignment per register is done.
2025-09-04 09:55:39 +09:00
Matt Arsenault
dd5eb46690
AMDGPU: Fold 64-bit immediate into copy to AV class (#155615)
This is in preparation for patches which will introduce more
copies to AV registers.
2025-09-03 09:29:59 +09:00
Matt Arsenault
d7484684e5
AMDGPU: Refactor isImmOperandLegal (#155607)
The goal is to expose more variants that can operate without
preconstructed MachineInstrs or MachineOperands.
2025-09-03 09:06:18 +09:00
Matt Arsenault
d6a72cb300
AMDGPU: Fix fixme for out of bounds indexing in usesConstantBus check (#155603)
This loop over all the operands in the MachineInstr will eventually
go past the end of the MCInstrDesc's explicit operands. We don't
need the instr desc to compute the constant bus usage, just the
register and whether it's implicit or not. The check here is slightly
conservative. e.g. a random vcc implicit use appended to an instruction
will falsely report a constant bus use.
2025-09-02 17:25:08 +00:00
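A simplified model of the approach described above, offered as a sketch: the decision uses only the register kind and the implicit flag, so it never indexes into the MCInstrDesc operand list. The enum, struct and functions are hypothetical; which implicit registers count (VCC and M0 here, not EXEC) is an assumption of this model, and real hardware limits count distinct scalar registers, which this sketch does not merge.

```cpp
#include <cassert>
#include <vector>

// Simplified register kinds; real code inspects the physical register.
enum class RegKind { VGPR, SGPR, VCC, M0, EXEC };

struct Operand {
  RegKind Kind;
  bool IsImplicit;
};

// Decides from the register and the implicit flag alone, so operands past
// the explicit operand list are handled without an out-of-bounds lookup.
bool usesConstantBus(const Operand &Op) {
  if (Op.IsImplicit)
    // Conservative: any implicit VCC or M0 read counts, even one merely
    // appended to the instruction. Implicit EXEC reads do not count.
    return Op.Kind == RegKind::VCC || Op.Kind == RegKind::M0;
  // Explicit scalar operands (SGPR, VCC, M0, EXEC) are read over the bus.
  return Op.Kind != RegKind::VGPR;
}

// Counts operands, not distinct registers.
unsigned countConstantBusOperands(const std::vector<Operand> &Ops) {
  unsigned N = 0;
  for (const Operand &Op : Ops)
    N += usesConstantBus(Op) ? 1 : 0;
  return N;
}
```

In this model an instruction with one explicit SGPR plus an appended implicit vcc use reports two constant bus uses, which is the false positive the message mentions.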
Matt Arsenault
e3e1652d18
AMDGPU: Add version of isImmOperandLegal for MCInstrDesc (#155560)
This avoids the need for a pre-constructed instruction, at least
for the first argument.
2025-09-03 01:18:41 +09:00
Chris Jackson
7d0203b39f
[AMDGPU] Prevent generation of unused SGPR IMPLICIT_DEF assignments (#155241)
Dead VGPR->SGPR copies were converted to IMPLICIT_DEF assignments that
were unused. Prevent these from being created and update the numerous
affected tests.
2025-08-27 13:18:18 +01:00
Matt Arsenault
de99aabed6
AMDGPU: Remove unused argument from adjustAllocatableRegClass (#155554)
2025-08-27 06:00:34 +00:00
Matt Arsenault
05f208ac0b
AMDGPU: Stop checking if registers are reserved in adjustAllocatableRegClass (#155125)
This function is used to implement TargetInstrInfo::getRegClass and
conceptually should not depend on the dynamic state of the function.
2025-08-26 20:09:32 +09:00
Matt Arsenault
db024764c1
AMDGPU: Fix not diagnosing unaligned VGPRs for vsrc operands (#155104)
This was not checking the alignment requirement for 64-bit
operands which accept inline immediates. Not all custom operand
types were handled in the switch, so round out with explicit
handling of all enum values, and change the default to use
the default checks for unhandled cases.

Fixes #155095
2025-08-25 17:42:58 +09:00
Matt Arsenault
52ed03db59
AMDGPU: Simplify foldImmediate with register class based checks (#154682)
Generalize the code over the properties of the mov instruction,
rather than maintaining parallel logic to figure out the type
of mov to use. I've maintained the behavior with 16-bit physical
SGPRs, though I think the behavior here is broken and corrupting
any value that happens to be live in the high bits. It just happens
there's no way to separately write to those with a real instruction,
but I don't think we should be making assumptions around
that property.

This is NFC-ish. It now does a better job with imm pseudos which
practically won't reach here. This also will make it easier
to support more folds in a future patch.

I added a couple of new tests with 16-bit extract of 64-bit sources.
2025-08-23 02:13:50 +00:00