145 Commits

Author SHA1 Message Date
Kazu Hirata
7ada7bbee1 [Target] Use *{Set,Map}::contains (NFC) 2023-03-14 18:06:55 -07:00
Kazu Hirata
e078201835 [Target] Use llvm::count{l,r}_{zero,one} (NFC) 2023-01-28 09:23:07 -08:00
Kazu Hirata
caa99a01f5 Use llvm::popcount instead of llvm::countPopulation(NFC) 2023-01-22 12:48:51 -08:00
Jay Foad
6443c0ee02 [AMDGPU] Stop using make_pair and make_tuple. NFC.
C++17 allows us to call constructors pair and tuple instead of helper
functions make_pair and make_tuple.

Differential Revision: https://reviews.llvm.org/D139828
2022-12-14 13:22:26 +00:00
Fangrui Song
67819a72c6 [CodeGen] llvm::Optional => std::optional 2022-12-13 09:06:36 +00:00
Kazu Hirata
20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:06 -08:00
Ivan Kosarev
1b560e6ab7 [AMDGPU][MC] Support TFE modifiers in MUBUF loads and stores.
Reviewed By: dp, arsenm

Differential Revision: https://reviews.llvm.org/D137783
2022-11-14 15:36:18 +00:00
Pierre van Houtryve
7425077e31 [AMDGPU] Add & use hasNamedOperand, NFC
In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1.
This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually.

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D137540
2022-11-08 07:57:21 +00:00
Ivan Kosarev
693f816288 [AMDGPU][SILoadStoreOptimizer] Merge SGPR_IMM scalar buffer loads.
Reviewed By: foad, rampitec

Differential Revision: https://reviews.llvm.org/D133787
2022-09-15 13:48:51 +01:00
Fangrui Song
de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Carl Ritson
4c4db81630 [AMDGPU] Extend SILoadStoreOptimizer to s_load instructions
Apply merging to s_load as is done for s_buffer_load.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D130742
2022-07-30 11:38:39 +09:00
Stanislav Mekhanoshin
33fb23f728 [AMDGPU] Merge flat with global in the SILoadStoreOptimizer
Flat can be merged with flat global since address cast is a no-op.
A combined memory operation needs to be promoted to flat.

Differential Revision: https://reviews.llvm.org/D120431
2022-03-09 10:04:37 -08:00
Stanislav Mekhanoshin
517171ce20 [AMDGPU] Extend SILoadStoreOptimizer to handle flat load/stores
TODO: merge flat with global promoting to flat.

Differential Revision: https://reviews.llvm.org/D120351
2022-02-28 11:27:30 -08:00
Stanislav Mekhanoshin
3279e44063 [AMDGPU] Extend SILoadStoreOptimizer to handle global stores
TODO: merge flat load/stores.
TODO: merge flat with global promoting to flat.

Differential Revision: https://reviews.llvm.org/D120346
2022-02-24 11:09:51 -08:00
Stanislav Mekhanoshin
cefa1c5ca9 [AMDGPU] Fix combined MMO in load-store merge
Loads and stores can be out of order in the SILoadStoreOptimizer.
When combining MachineMemOperands of two instructions operands are
sent in the IR order into the combineKnownAdjacentMMOs. At the
moment it picks the first operand and just replaces its offset and
size. This essentially loses alignment information and may generally
result in an incorrect base pointer to be used.

Use a base pointer in memory addresses order instead and only adjust
size.

Differential Revision: https://reviews.llvm.org/D120370
2022-02-24 10:47:57 -08:00
Stanislav Mekhanoshin
9e055c0fff [AMDGPU] Extend SILoadStoreOptimizer to handle global saddr loads
This adds handling of the _SADDR forms to the GLOBAL_LOAD combining.

TODO: merge global stores.
TODO: merge flat load/stores.
TODO: merge flat with global promoting to flat.

Differential Revision: https://reviews.llvm.org/D120285
2022-02-22 09:01:43 -08:00
Stanislav Mekhanoshin
ba17bd2674 [AMDGPU] Extend SILoadStoreOptimizer to handle global loads
There can be situations where global and flat loads and stores are not
combined by the vectorizer, in particular if their address space
differ in the IR but they end up the same class instructions after
selection. For example a divergent load from constant address space
ends up being the same global_load as a load from global address space.

TODO: merge global stores.
TODO: handle SADDR forms.
TODO: merge flat load/stores.
TODO: merge flat with global promoting to flat.

Differential Revision: https://reviews.llvm.org/D120279
2022-02-22 08:42:36 -08:00
Stanislav Mekhanoshin
dc0981562e [AMDGPU] Remove redundand check in the SILoadStoreOptimizer
Differential Revision: https://reviews.llvm.org/D120268
2022-02-21 15:04:44 -08:00
Jay Foad
359a792f9b [AMDGPU] SILoadStoreOptimizer: avoid unbounded register pressure increases
Previously when combining two loads this pass would sink the
first one down to the second one, putting the combined load
where the second one was. It would also sink any intervening
instructions which depended on the first load down to just
after the combined load.

For example, if we started with this sequence of
instructions (code flowing from left to right):

  X A B C D E F Y

After combining loads X and Y into XY we might end up with:

  A B C D E F XY

But if B D and F depended on X, we would get:

  A C E XY B D F

Now if the original code had some short disjoint live ranges
from A to B, C to D and E to F, in the transformed code
these live ranges will be long and overlapping. In this way
a single merge of two loads could cause an unbounded
increase in register pressure.

To fix this, change the way the way that loads are moved in
order to merge them so that:
- The second load is moved up to the first one. (But when
  merging stores, we still move the first store down to the
  second one.)
- Intervening instructions are never moved.
- Instead, if we find an intervening instruction that would
  need to be moved, give up on the merge. But this case
  should now be pretty rare because normal stores have no
  outputs, and normal loads only have address register
  inputs, but these will be identical for any pair of loads
  that we try to merge.

As well as fixing the unbounded register pressure increase
problem, moving loads up and stores down seems like it
should usually be a win for memory latency reasons.

Differential Revision: https://reviews.llvm.org/D119006
2022-02-21 10:51:14 +00:00
Sebastian Neubauer
6527b2a4d5 [AMDGPU][NFC] Fix typos
Fix some typos in the amdgpu backend.

Differential Revision: https://reviews.llvm.org/D119235
2022-02-18 15:05:21 +01:00
Jay Foad
a456ace9c1 [AMDGPU] SILoadStoreOptimizer: rewrite checkAndPrepareMerge. NFCI.
Separate the function clearly into:
- Checks that can be done on CI and Paired before the loop.
- The loop over all instructions between CI and Paired.
- Checks that must be done on InstsToMove after the loop.

Previously these were mostly done inside the loop in a very
confusing way.

Differential Revision: https://reviews.llvm.org/D118994
2022-02-04 17:17:29 +00:00
Jay Foad
001cb43159 [AMDGPU] SILoadStoreOptimizer: fewer calls to offsetsCanBeCombined
Only call offsetsCanBeCombined with Modify = true in cases
where it will really do something. NFC.
2022-02-04 14:52:11 +00:00
Jay Foad
00bbda07ae [AMDGPU] SILoadStoreOptimizer: simplify class/subclass checks
Also add a comment explaining the difference between class
and subclass. NFCI.
2022-02-04 14:08:04 +00:00
Jay Foad
33ef8bdf36 [AMDGPU] SILoadStoreOptimizer: simplify optimizeInstsWithSameBaseAddr
Common up all the calls to CI.setMI. NFCI.
2022-02-04 13:25:05 +00:00
Jay Foad
ca05edd927 [AMDGPU] SILoadStoreOptimizer: simplify OptimizeListAgain test
At this point CI represents the combined access (original CI combined
with Paired) so it doesn't make any sense to add in Paired.width again.
NFCI.
2022-02-04 13:02:19 +00:00
Jay Foad
68e3946270 [AMDGPU] SILoadStoreOptimizer: break lists on instructions with side effects
This just helps to keep the lists shorter and faster to sort. NFCI.

Differential Revision: https://reviews.llvm.org/D118384
2022-01-28 18:03:42 +00:00
Jay Foad
4b133cee80 [AMDGPU] SILoadStoreOptimizer: reject AGPR DS_WRITE sooner
Rejecting AGPR DS_WRITE instructions before adding them to any mergeable
list seems cleaner than adding them to the list and rejecting them
later.

Differential Revision: https://reviews.llvm.org/D118368
2022-01-27 18:20:46 +00:00
Jay Foad
94a4594c54 [AMDGPU] SILoadStoreOptimizer: use separate lists for AGPR instructions
Using separate lists for AGPR and non-AGPR instructions seems like a
cleaner solution than putting them all in the same list and then later
refusing to merge instructions of different AGPR-ness.

Differential Revision: https://reviews.llvm.org/D118367
2022-01-27 18:20:46 +00:00
Jay Foad
8a52fef1e0 [AMDGPU] SILoadStoreOptimizer: tweak API of CombineInfo::setMI. NFC.
Change CombineInfo::setMI to take a reference to the
SILoadStoreOptimizer instance, for easy access to common fields like
TII and STM.

Differential Revision: https://reviews.llvm.org/D118366
2022-01-27 18:20:46 +00:00
Jay Foad
185cb8e82c [AMDGPU] SILoadStoreOptimizer: Allow merging across a swizzled access
Swizzled accesses are not merged, but there is no particular reason not
to merge two instructions if any of the intervening instructions happens
to be a swizzled access.

This moves the check for swizzled accesses out of checkAndPrepareMerge
into collectMergeableInsts where I think it makes more sense.

Differential Revision: https://reviews.llvm.org/D118267
2022-01-27 14:40:58 +00:00
Jay Foad
95857a7058 [AMDGPU] SILoadStoreOptimizer: Remove redundant check for volatile
SILoadStoreOptimizer::collectMergeableInsts already ends the current
block if it sees a volatile (or ordered) memory access, so there is no
need to check for them again when scanning the instructions between two
pairing candidates in a block.

Differential Revision: https://reviews.llvm.org/D118266
2022-01-27 10:14:53 +00:00
Jay Foad
63eea41de6 [AMDGPU] Simplify SILoadStoreOptimizer::getSubRegIdxs. NFC. 2022-01-19 15:20:35 +00:00
Kazu Hirata
5a667c0e74 [llvm] Use nullptr instead of 0 (NFC)
Identified with modernize-use-nullptr.
2021-12-28 08:52:25 -08:00
Christudasan Devadasan
654c89d85a [AMDGPU] Make vector superclasses allocatable
The combined vector register classes with both
VGPRs and AGPRs are currently unallocatable.
This patch turns them into allocatable as a
prerequisite to enable copy between VGPR and
AGPR registers during regalloc.

Also, added the missing AV register classes from
192b to 1024b.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D109300
2021-11-26 00:42:12 -05:00
Neubauer, Sebastian
d1f45ed58f [AMDGPU][NFC] Fix typos
Differential Revision: https://reviews.llvm.org/D113672
2021-11-12 11:37:21 +01:00
Martin Liska
c5029023fb Fix building with GCC 12:
Fixes: https://bugs.llvm.org/show_bug.cgi?id=52380

Differential Revision: https://reviews.llvm.org/D112990
2021-11-02 14:28:00 +01:00
Piotr Sobczak
30d6c39bca [AMDGPU] Add merging into S_BUFFER_LOAD_DWORDX8_IMM
Extend SILoadStoreOptimizer to merge into DWORDX8 variant of S_BUFFER_LOAD.

Merging into DWORDX2 and DWORDX4 variants is handled already.

Differential Revision: https://reviews.llvm.org/D108909
2021-09-02 16:26:25 +02:00
Carl Ritson
99c790dc21 [AMDGPU] Make BVH isel consistent with other MIMG opcodes
Suffix opcodes with _gfx10.
Remove direct references to architecture specific opcodes.
Add a BVH flag and apply this to diassembly.
Fix a number of disassembly errors on gfx90a target caused by
previous incorrect BVH detection code.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108117
2021-08-17 10:42:22 +09:00
Matt Arsenault
cc79aaced0 AMDGPU: Fix SILoadStoreOptimizer for gfx90a
This was hardcoding the register class to use for the newly created
pointer registers, violating the aligned VGPR requirement.
2021-05-11 21:26:43 -04:00
Stanislav Mekhanoshin
c297709ee1 [AMDGPU] Fixed msan failure with uninitialized value 2021-03-15 13:58:19 -07:00
Stanislav Mekhanoshin
3bffb1cd0e [AMDGPU] Use single cache policy operand
Replace individual operands GLC, SLC, and DLC with a single cache_policy
bitmask operand. This will reduce the number of operands in MIR and I hope
the amount of code. These operands are mostly 0 anyway.

Additional advantage that parser will accept these flags in any order unlike
now.

Differential Revision: https://reviews.llvm.org/D96469
2021-03-15 13:00:59 -07:00
Matt Arsenault
78b6d73a93 AMDGPU: Add even aligned VGPR/AGPR register classes
gfx90a operations require even aligned registers, but this was
previously achieved by reserving registers inside the full class.

Ideally this would be captured in the static instruction definitions
for the operands, and we would have different instructions per
subtarget. The hackiest part of this is we need to manually reassign
AGPR register classes after instruction selection (we get away without
this for VGPRs since those types are actually registered for legal
types).
2021-02-24 14:49:37 -05:00
Stanislav Mekhanoshin
75997e8407 [AMDGPU] Fixed msan build
LoadStoreOptimizer was using uninitialized SCC value for
instructions where it is unsupported.
2021-02-17 18:01:23 -08:00
Stanislav Mekhanoshin
a8d9d50762 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
Jay Foad
23db2d363f [AMDGPU] Better selection of base offset when merging DS reads/writes
When merging a pair of DS reads or writes needs to materialize the base
offset in a vgpr, choose a value that is aligned to as high a power of
two as possible. This maximises the chance that different pairs can use
the same base offset, in which case the base offset registers can be
commoned up by MachineCSE.

Differential Revision: https://reviews.llvm.org/D96421
2021-02-11 17:46:09 +00:00
Jay Foad
2114b458b0 [AMDGPU] Fix comments in SILoadStoreOptimizer::offsetsCanBeCombined 2021-02-10 14:49:33 +00:00
dfukalov
560d7e0411 [NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036
2021-01-20 22:22:45 +03:00
dfukalov
6a87e9b08b [NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813
2021-01-07 22:22:05 +03:00
Stanislav Mekhanoshin
91f503c3af [AMDGPU] gfx1030 RT support
Differential Revision: https://reviews.llvm.org/D87782
2020-09-16 11:40:58 -07:00
Jay Foad
3497860203 [AMDGPU] Remove uses of Register::isPhysicalRegister/isVirtualRegister
... in favour of the isPhysical/isVirtual methods.
2020-08-20 17:59:11 +01:00