183 Commits

Author SHA1 Message Date
Walter Lee
d7666c675c
[AMDGPU][SILoadStoreOptimizer] Fix unused variable warning (#177969)
Fix unused variable warning that fires in SILoadStoreOptimizer.cpp when
assertions are disabled. On review, we can just delete the whole assert,
since it isn't querying the def anymore. Fixes #176816
(13b20e7aeab83e82368be9ffd22ce02cb9b831ae).
2026-01-26 15:59:34 +00:00
Ryan Mitchell
13b20e7aea
[AMDGPU][SILoadStoreOptimizer] Fix lds address operand offset (#176816)
The offset operand in GLOBAL_LOAD_ASYNC_TO_LDS_B128, for instance, is
added to both the LDS and global addresses, but SILoadStoreOptimizer is
currently unaware of that. This PR inserts an add to counteract the
offset meant for the global address. This one add is better than not
doing the optimization at all, which requires 2 adds for each global
address calculation (with no offset).

```
; ENABLE-LABEL: name: promote_async_load_offset
; ENABLE: liveins: $ttmp7, $vgpr0, $sgpr0_sgpr1
; ENABLE-NEXT: {{  $}}
; ENABLE-NEXT: renamable $vgpr1 = V_LSHLREV_B32_e32 8, $vgpr0, implicit $exec
; ENABLE-NEXT: renamable $vgpr2, renamable $vcc_lo = V_ADD_CO_U32_e64 $vgpr0, 512, 0, implicit $exec
; ENABLE-NEXT: renamable $vgpr3, dead $sgpr_null = V_ADDC_U32_e64 0, killed $vgpr0, killed $vcc_lo, 0, implicit $exec
; ENABLE-NEXT: renamable $vgpr1 = disjoint V_OR_B32_e32 0, killed $vgpr1, implicit $exec
; ENABLE-NEXT: renamable $vgpr0 = V_ADD_U32_e32 256, $vgpr1, implicit $exec
; ENABLE-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr0, $vgpr2_vgpr3, -256, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
; ENABLE-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr1, killed $vgpr2_vgpr3, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)

; DISABLE-LABEL: name: promote_async_load_offset
; DISABLE: liveins: $ttmp7, $vgpr0, $sgpr0_sgpr1
; DISABLE-NEXT: {{  $}}
; DISABLE-NEXT: renamable $vgpr1 = V_LSHLREV_B32_e32 8, $vgpr0, implicit $exec
; DISABLE-NEXT: renamable $vgpr2, renamable $vcc_lo = V_ADD_CO_U32_e64 256, $vgpr0, 0, implicit $exec
; DISABLE-NEXT: renamable $vgpr3, $sgpr_null = V_ADDC_U32_e64 0, $vgpr0, killed $vcc_lo, 0, implicit $exec
; DISABLE-NEXT: renamable $vgpr1 = disjoint V_OR_B32_e32 0, killed $vgpr1, implicit $exec
; DISABLE-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 $vgpr1, killed $vgpr2_vgpr3, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
; DISABLE-NEXT: renamable $vgpr2, renamable $vcc_lo = V_ADD_CO_U32_e64 512, $vgpr0, 0, implicit $exec
; DISABLE-NEXT: renamable $vgpr3, $sgpr_null = V_ADDC_U32_e64 0, killed $vgpr0, killed $vcc_lo, 0, implicit $exec
; DISABLE-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr1, killed $vgpr2_vgpr3, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
```

This PR also promotes the global address to an offset when the offset is
calculated with V_ADD_U64 on applicable gfx versions (and inversely
adds the LDS offset). Previously, this optimization opportunity was
missed entirely.
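
The address arithmetic behind the ENABLE output can be sketched roughly as
follows. This is a plain model, not code from the pass: `AsyncLoad`,
`effectiveLds`, `effectiveGlobal` and `rebase` are hypothetical names, and the
model assumes only what is stated above, namely that the immediate offset is
added to both addresses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one async global-to-LDS load. Assumption: the
// immediate offset is added to BOTH the LDS and the global address.
struct AsyncLoad {
  uint32_t LdsAddr;    // value of the LDS address register
  uint64_t GlobalAddr; // value of the global address register pair
  int32_t Offset;      // immediate offset operand
};

// Effective LDS address under this model (wraps modulo 2^32).
inline uint32_t effectiveLds(const AsyncLoad &L) {
  return L.LdsAddr + static_cast<uint32_t>(L.Offset);
}

// Effective global address under this model (wraps modulo 2^64).
inline uint64_t effectiveGlobal(const AsyncLoad &L) {
  return L.GlobalAddr +
         static_cast<uint64_t>(static_cast<int64_t>(L.Offset));
}

// Move the global base up by BaseDelta and fold -BaseDelta into the
// immediate. The LDS register gets the one extra add of +BaseDelta so
// that the shared immediate does not move the LDS address.
inline AsyncLoad rebase(const AsyncLoad &L, int32_t BaseDelta) {
  AsyncLoad R;
  R.LdsAddr = L.LdsAddr + static_cast<uint32_t>(BaseDelta);
  R.GlobalAddr = L.GlobalAddr +
                 static_cast<uint64_t>(static_cast<int64_t>(BaseDelta));
  R.Offset = L.Offset - BaseDelta;
  // Both effective addresses must be unchanged by the rewrite.
  assert(effectiveLds(R) == effectiveLds(L));
  assert(effectiveGlobal(R) == effectiveGlobal(L));
  return R;
}
```

With `BaseDelta = 256`, this mirrors the first load in the ENABLE lines: the
global base becomes the shared `+512` value, the immediate becomes `-256`, and
the LDS register is bumped by `256` to compensate.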
2026-01-26 09:23:17 +01:00
Sam Elliott
7184229fea
[NFC][MI] Tidy Up RegState enum use (2/2) (#177090)
This change makes `RegState` an enum class with bitwise operators.
It also:
- Updates declarations of flag variables/arguments/returns from
`unsigned` to `RegState`.
- Updates empty RegState initializers from 0 to `{}`.

If this is causing problems in downstream code:
- Adopt the `RegState getXXXRegState(bool)` functions instead of using a
ternary operator such as `bool ? RegState::XXX : 0`.
- Adopt the `bool hasRegState(RegState, RegState)` function instead of
using a bitwise check of the flags.
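
A minimal, self-contained sketch of the pattern downstream code would move
to. The enum below is a stand-in in a hypothetical `sketch` namespace with
illustrative flag values, `getKillRegState` is one instance of the
`getXXXRegState(bool)` family, and the "any bit set" semantics of
`hasRegState` are an assumption:

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {
// Stand-in for the real enum; flag names and bit values are illustrative.
enum class RegState : uint32_t {
  Define = 1u << 0,
  Kill = 1u << 1,
  Dead = 1u << 2,
};

constexpr RegState operator|(RegState A, RegState B) {
  return static_cast<RegState>(static_cast<uint32_t>(A) |
                               static_cast<uint32_t>(B));
}

constexpr RegState operator&(RegState A, RegState B) {
  return static_cast<RegState>(static_cast<uint32_t>(A) &
                               static_cast<uint32_t>(B));
}

// Use instead of `IsKill ? RegState::Kill : 0`; the two arms of that
// ternary no longer share a type once RegState is an enum class.
constexpr RegState getKillRegState(bool IsKill) {
  return IsKill ? RegState::Kill : RegState{};
}

// Use instead of `(Flags & RegState::Kill) != 0`. Assumed semantics:
// true if any bit of Test is set in Flags.
constexpr bool hasRegState(RegState Flags, RegState Test) {
  return (Flags & Test) != RegState{};
}

// Example: build flags for a killed def and query them.
inline void example() {
  RegState Flags = RegState::Define | getKillRegState(true);
  assert(hasRegState(Flags, RegState::Kill));
  assert(!hasRegState(Flags, RegState::Dead));
}
} // namespace sketch
```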
2026-01-23 00:19:03 -08:00
LU-JOHN
49381c3000
[NFC][AMDGPU] Declare variables initialized with getDebugLoc as const ref (#174434)
Declare variables initialized with getDebugLoc as a const reference.

Signed-off-by: John Lu <John.Lu@amd.com>
2026-01-05 12:37:47 -06:00
Matt Arsenault
55422e804b
CodeGen: Remove TRI argument from getRegClass (#158225)
TargetInstrInfo now directly holds a reference to TargetRegisterInfo
and does not need TRI passed in anywhere.
2025-11-10 15:43:55 -08:00
Matt Arsenault
e5e74e9877
AMDGPU: Use getMergedLocation in SILoadStoreOptimizer (#156396)
This is merging loads and stores so use the combined DebugLoc.

Not sure if computeBase should be using the merged location from
all the involved instructions. I'm also not sure how to test this
sort of thing.
2025-11-10 15:04:24 -08:00
Matt Arsenault
7289f2cd0c
CodeGen: Remove MachineFunction argument from getRegClass (#158188)
This is a low level utility to parse the MCInstrInfo and should
not depend on the state of the function.
2025-09-12 19:22:02 +09:00
Matt Arsenault
4ec890857d
AMDGPU: Try to constrain av registers to VGPR to enable ds_write2 formation (#156400)
In future changes we will have more AV_ virtual registers, which
currently block the formation of write2. Most of the time these
registers can simply be constrained to VGPR, so do that.

Also relaxes the constraint in flat merging case. We already have the
necessary code to insert copies to the original result registers, so
there's no point in avoiding it.

Addresses the easy half of #155769
2025-09-03 00:48:21 +00:00
Matt Arsenault
88d075197e
AMDGPU: Add debug print to load/store opt for agpr case (#155767) 2025-08-29 11:49:02 +09:00
Harrison Hao
23a5a7bef3
[AMDGPU] Support merging 16-bit and 8-bit TBUFFER load/store instruction (#145078)
SILoadStoreOptimizer can now recognise consecutive 16-bit and 8-bit
`TBUFFER_LOAD`/`TBUFFER_STORE` instructions that each write

* a single component (`X`), or
* two components (`XY`),

and fold them into the wider native variants:

```
X + X          -->  XY
X + X + X + X  -->  XYZW
XY + XY        -->  XYZW
X + X + X      -->  XYZ
XY + X         -->  XYZ
```

The optimisation cuts the number of TBUFFER instructions, shrinking code
size and improving memory throughput.
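
The width combinations in the table can be modelled by component count
alone. The sketch below is only that model: `mergeWidths` and `widthName` are
hypothetical helpers, and any other conditions a real merge would need
(offsets, formats, and so on) are deliberately left out.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Component-count model only: 1 = X, 2 = XY, 3 = XYZ, 4 = XYZW.
// Returns the merged width, or nullopt if it would exceed XYZW.
inline std::optional<unsigned> mergeWidths(unsigned A, unsigned B) {
  assert(A >= 1 && A <= 4 && B >= 1 && B <= 4 && "width out of range");
  unsigned Sum = A + B;
  if (Sum > 4)
    return std::nullopt;
  return Sum;
}

// Spell a component count as the suffix used in the table.
inline std::string widthName(unsigned W) {
  assert(W >= 1 && W <= 4 && "width out of range");
  return std::string("XYZW").substr(0, W);
}
```

Longer rows of the table, such as `X + X + X + X`, correspond to applying
the pairwise merge repeatedly.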
2025-08-20 21:16:25 +08:00
Stanislav Mekhanoshin
668e6492b8
[AMDGPU] Support merging of flat GVS ops (#154200) 2025-08-18 14:31:41 -07:00
Stanislav Mekhanoshin
97a66a897c
[AMDGPU] Prohibit load/store merge if scale_offset is set on gfx1250 (#149895)
Scaling is done on the operation size, so merging instructions would
require generating code to scale the offset and reset the auto-scale
bit. It is unclear whether that would be beneficial, so just disable
such merges for now.
2025-07-21 15:41:24 -07:00
Kazu Hirata
7fa48ce547
[AMDGPU] Remove an unnecessary cast (NFC) (#149254)
getTargetLowering() already returns const SITargetLowering *.
2025-07-17 07:22:43 -07:00
Rahul Joshi
52c2e45c11
[NFC][CodeGen] Adopt MachineFunctionProperties convenience accessors (#141101) 2025-05-23 08:30:29 -07:00
Rahul Joshi
bee9664970
[TableGen] Emit OpName as an enum class instead of a namespace (#125313)
- Change InstrInfoEmitter to emit OpName as an enum class
  instead of an anonymous enum in the OpName namespace.
- This will help clearly distinguish between values that are 
  OpNames vs just operand indices and should help avoid
  bugs due to confusion between the two.
- Rename OpName::OPERAND_LAST to NUM_OPERAND_NAMES.
- Emit declaration of getOperandIdx() along with the OpName
  enum so it doesn't have to be repeated in various headers.
- Also updated AMDGPU, RISCV, and WebAssembly backends
  to conform to the new definition of OpName (mostly
  mechanical changes).
2025-02-12 08:19:30 -08:00
Kazu Hirata
975bba6f4b
[AMDGPU] Avoid repeated hash lookups (NFC) (#126001) 2025-02-06 10:34:49 -08:00
Jay Foad
8d13e7b8c3
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)
2024-10-03 13:07:54 +01:00
Jay Foad
7a30b9c0f0
[AMDGPU] Make more use of getWaveMaskRegClass. NFC. (#108186) 2024-09-11 14:55:53 +01:00
Craig Topper
cd3667d1db
[CodeGen] Update a few places that were passing Register to raw_ostream::operator<< (#106877)
These would implicitly cast the register to `unsigned`. Switch most of
them to use printReg, which gives more readable output. Change some
others to use Register::id() so we can eventually remove the implicit
cast to `unsigned`.
2024-09-02 00:19:19 -07:00
Akshat Oke
da13754103
AMDGPU/NewPM Port SILoadStoreOptimizer to NPM (#106362) 2024-09-02 11:41:56 +05:30
Tim Gymnich
273e0a4c56
[AMDGPU] add missing checks in processBaseWithConstOffset (#102310)
fixes https://github.com/llvm/llvm-project/issues/102231 by inserting
missing checks.
2024-08-12 11:54:02 +04:00
Christudasan Devadasan
37d7b06da0
[AMDGPU][SILoadStoreOptimizer] Include constrained buffer load variants (#101619)
Use the constrained buffer load opcodes while combining under-aligned
loads for XNACK enabled subtargets.
2024-08-06 11:27:04 +05:30
Christudasan Devadasan
a1d7da05d0
[AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (#96162)
Consider the constrained multi-dword loads while merging
individual loads to a single multi-dword load.
2024-07-23 13:50:42 +05:30
Stanislav Mekhanoshin
c771b670ea
[AMDGPU] Promote immediate offset to atomics (#94043) 2024-06-06 12:05:51 -07:00
Stanislav Mekhanoshin
fc21387b65
[AMDGPU] Enable constant offset promotion to immediate FLAT (#93884)
Currently it is only supported for FLAT Global.
2024-05-31 12:23:27 -07:00
Stanislav Mekhanoshin
215f92b979
[AMDGPU] Fix crash in the SILoadStoreOptimizer (#93862)
It does not properly handle the situation where the address calculation
uses V_ADDC_U32 0, 0, carry-in (i.e. with both src0 and src1 immediates).
2024-05-30 14:27:33 -07:00
Jay Foad
11f76b8511
[AMDGPU] Use some merging/unmerging helpers in SILoadStoreOptimizer (#90866)
Factor out copyToDestRegs and copyFromSrcRegs for merging store sources
and unmerging load results. NFC.
2024-05-02 21:01:51 +01:00
Jay Foad
e020e287c7 [AMDGPU] Modernize some syntax in SILoadStoreOptimizer. NFC.
Use structured bindings and similar.
2024-05-02 15:34:02 +01:00
Jay Foad
0606747c96 [AMDGPU] Remove some pointless fallthrough annotations 2024-05-01 16:04:35 +01:00
David Stuttard
06cfbe3cfd
[AMDGPU] Add support for idxen and bothen buffer load/store merging in SILoadStoreOptimizer (#86285)
Added more buffer instruction merging support
2024-03-25 14:44:22 +00:00
David Green
601e102bdb
[CodeGen] Use LocationSize for MMO getSize (#84751)
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.

This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.

Global ISel is still expected to use the underlying LLT as it needs, and
is not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
2024-03-17 18:15:56 +00:00
Mirko Brkušanin
5879162f7f
[AMDGPU] CodeGen for GFX12 VBUFFER instructions (#75492) 2023-12-15 13:45:03 +01:00
Mirko Brkušanin
26b14aedb7
[AMDGPU] CodeGen for GFX12 VIMAGE and VSAMPLE instructions (#75488) 2023-12-15 12:40:23 +01:00
Mirko Brkušanin
a278ac577e
[AMDGPU] CodeGen for SMEM instructions (#75579) 2023-12-15 12:10:33 +01:00
Konrad Kusiak
4fa8a5487e [AMDGPU] Add sanity check that fixes bad shift operation in AMD backend
There is a problem with the
SILoadStoreOptimizer::dmasksCanBeCombined() function that can lead to
UB.

This boolean function decides if two masks can be combined into 1. The
idea here is that the bits which are "on" in one mask, don't overlap
with the "on" bits of the other. Consider an example (10 bits for
simplicity):

Mask 1: 0101101000
Mask 2: 0000000110

Those can be combined into a single mask: 0101101110.

To check if such an operation is possible, the code takes the mask
which is greater and counts how many 0s there are, starting from the
LSB and stopping at the first 1. Then, it shifts 1u by this number and
compares it with the smaller mask. The problem is that when both masks
are 0, the counter will find 32 zeroes in the first mask and will try
to do a shift by 32 positions which leads to UB.

The fix is a simple sanity check of whether the bigger mask is 0.
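
A standalone sketch of the check as described above, written from this
description rather than copied from the source file; `countTrailingZeros` and
`masksCanBeCombined` are hypothetical names. Note that the test described is
stricter than plain non-overlap: every set bit of the smaller mask has to sit
below the lowest set bit of the bigger one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of trailing zero bits. Returns 32 for an input of 0, which is
// exactly the value that made the unguarded shift undefined.
inline unsigned countTrailingZeros(uint32_t V) {
  unsigned N = 0;
  while (N < 32 && ((V >> N) & 1u) == 0)
    ++N;
  return N;
}

inline bool masksCanBeCombined(uint32_t A, uint32_t B) {
  uint32_t MaxMask = std::max(A, B);
  uint32_t MinMask = std::min(A, B);

  // The sanity check: without it, two zero masks lead to 1u << 32.
  if (MaxMask == 0)
    return false;

  unsigned AllowedBits = countTrailingZeros(MaxMask);
  assert(AllowedBits < 32 && "shift amount must be in range");

  // The smaller mask must fit entirely below the lowest set bit of the
  // bigger mask.
  return (uint32_t{1} << AllowedBits) > MinMask;
}
```

With the 10-bit example above, `masksCanBeCombined(0b0101101000,
0b0000000110)` passes because the bigger mask has three trailing zeros and
the smaller mask is below `1u << 3`.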

https://reviews.llvm.org/D155051
2023-08-11 15:26:35 -04:00
Jay Foad
c68c6c56fc [AMDGPU] Minor refactoring in SILoadStoreOptimizer::offsetsCanBeCombined 2023-06-21 12:05:47 +01:00
Jay Foad
0c13e0b748 [AMDGPU] Do not handle _SGPR SMEM instructions in SILoadStoreOptimizer
After D147334 we never select _SGPR forms of SMEM instructions on
subtargets that also support the _SGPR_IMM form, so there is no need to
handle them here.

Differential Revision: https://reviews.llvm.org/D149139
2023-04-25 15:40:13 +01:00
mmarjano
f6e70ed1c7 [AMDGPU] Extend tbuffer_load_format merge
Add support for merging _IDXEN and _BOTHEN variants of
TBUFFER_LOAD_FORMAT instruction.
2023-04-10 12:24:21 +02:00
Kazu Hirata
7ada7bbee1 [Target] Use *{Set,Map}::contains (NFC) 2023-03-14 18:06:55 -07:00
Kazu Hirata
e078201835 [Target] Use llvm::count{l,r}_{zero,one} (NFC) 2023-01-28 09:23:07 -08:00
Kazu Hirata
caa99a01f5 Use llvm::popcount instead of llvm::countPopulation(NFC) 2023-01-22 12:48:51 -08:00
Jay Foad
6443c0ee02 [AMDGPU] Stop using make_pair and make_tuple. NFC.
C++17 allows us to call constructors pair and tuple instead of helper
functions make_pair and make_tuple.
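
For illustration, the two spellings side by side in a small standalone
sketch (the function names are made up for the example). For plain values
like these, both forms deduce the same types:

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <utility>

// Pre-C++17 style: helper functions deduce the element types.
inline auto oldPair() { return std::make_pair(1, 2.5); }
inline auto oldTuple() { return std::make_tuple(1, 'a', 2.5); }

// C++17 style: class template argument deduction on the constructors.
inline auto newPair() { return std::pair(1, 2.5); }
inline auto newTuple() { return std::tuple(1, 'a', 2.5); }

// Same deduced types either way for these arguments.
static_assert(std::is_same_v<decltype(oldPair()), decltype(newPair())>);
static_assert(std::is_same_v<decltype(oldTuple()), decltype(newTuple())>);

// Same values either way as well.
inline void example() {
  assert(oldPair() == newPair());
  assert(oldTuple() == newTuple());
}
```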

Differential Revision: https://reviews.llvm.org/D139828
2022-12-14 13:22:26 +00:00
Fangrui Song
67819a72c6 [CodeGen] llvm::Optional => std::optional 2022-12-13 09:06:36 +00:00
Kazu Hirata
20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:06 -08:00
Ivan Kosarev
1b560e6ab7 [AMDGPU][MC] Support TFE modifiers in MUBUF loads and stores.
Reviewed By: dp, arsenm

Differential Revision: https://reviews.llvm.org/D137783
2022-11-14 15:36:18 +00:00
Pierre van Houtryve
7425077e31 [AMDGPU] Add & use hasNamedOperand, NFC
In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1.
This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually.
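
A toy sketch of the wrapper idea. The operand table and the string-keyed
signatures below are made up for illustration and are not the backend's
actual interface, which this message does not spell out:

```cpp
#include <cassert>
#include <cstring>

// Toy operand table for a single made-up instruction.
static const char *const ToyOperands[] = {"vdata", "vaddr", "offset"};

// Hypothetical stand-in: index of the named operand, or -1 if absent.
inline int getNamedOperandIdx(const char *Name) {
  assert(Name && "operand name must not be null");
  int Idx = 0;
  for (const char *Op : ToyOperands) {
    if (std::strcmp(Op, Name) == 0)
      return Idx;
    ++Idx;
  }
  return -1;
}

// The wrapper states the intent directly instead of comparing with -1
// at every call site.
inline bool hasNamedOperand(const char *Name) {
  return getNamedOperandIdx(Name) != -1;
}
```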

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D137540
2022-11-08 07:57:21 +00:00
Ivan Kosarev
693f816288 [AMDGPU][SILoadStoreOptimizer] Merge SGPR_IMM scalar buffer loads.
Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D133787
2022-09-15 13:48:51 +01:00
Fangrui Song
de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Carl Ritson
4c4db81630 [AMDGPU] Extend SILoadStoreOptimizer to s_load instructions
Apply merging to s_load as is done for s_buffer_load.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D130742
2022-07-30 11:38:39 +09:00
Stanislav Mekhanoshin
33fb23f728 [AMDGPU] Merge flat with global in the SILoadStoreOptimizer
Flat can be merged with flat global since address cast is a no-op.
A combined memory operation needs to be promoted to flat.

Differential Revision: https://reviews.llvm.org/D120431
2022-03-09 10:04:37 -08:00