18 Commits

Author SHA1 Message Date
Petar Avramovic
3ad810ea9a
AMDGPU/GlobalISel: Disable LCSSA pass (#124297)
Disable LCSSA pass in preparation for implementing temporal divergence
lowering in amdgpu divergence lowering. Breaks all cases where sgpr or
i1 values are used outside of the cycle with divergent exit.
Regenerate regression tests for amdgpu divergence lowering with LCSSA
disabled.
Update IntrinsicLaneMaskAnalyzer to stop tracking lcssa phis that are
lane masks.
2025-03-12 11:09:50 +01:00
Daniil Fukalov
68d90cff58
[AMDGPU][GlobalISel] Fix assert on APInt creation. (#124608)
Since 3494ee95902cef62f767489802e469c58a13ea04 APInt stopped to
implicitly truncate values, therefore it asserts on a big signed value
converted to (implicitly) unsigned APInt.

The change explicitly marks offset as a signed value.
2025-01-28 15:53:17 +01:00
Petar Avramovic
0ee037b861
AMDGPU/GlobalISel: AMDGPURegBankLegalize (#112864)
Lower G_ instructions that can't be inst-selected with register bank
assignment from AMDGPURegBankSelect based on uniformity analysis.
- Lower instruction to perform it on assigned register bank
- Put uniform value in vgpr because SALU instruction is not available
- Execute divergent instruction in SALU - "waterfall loop"

Given LLTs on all operands after legalizer, some register bank
assignments require lowering while other do not.
Note: cases where all register bank assignments would require lowering
are lowered in legalizer.

AMDGPURegBankLegalize goals:
- Define Rules: when and how to perform lowering
- Goal of defining Rules it to provide high level table-like brief
  overview of how to lower generic instructions based on available
  target features and uniformity info (uniform vs divergent).
- Fast search of Rules, depends on how complicated Rule.Predicate is
- For some opcodes there would be too many Rules that are essentially
  all the same just for different combinations of types and banks.
  Write custom function that handles all cases.
- Rules are made from enum IDs that correspond to each operand.
  Names of IDs are meant to give brief description what lowering does
  for each operand or the whole instruction.
- AMDGPURegBankLegalizeHelper implements lowering algorithms

Since this is the first patch that actually enables -new-reg-bank-select
here is the summary of regression tests that were added earlier:
- if instruction is uniform always select SALU instruction if available
- eliminate back to back vgpr to sgpr to vgpr copies of uniform values
- fast rules: small differences for standard and vector instruction
- enabling Rule based on target feature - salu_float
- how to specify lowering algorithm - vgpr S64 AND to S32
- on G_TRUNC in reg, it is up to user to deal with truncated bits
  G_TRUNC in reg is treated as no-op.
- dealing with truncated high bits - ABS S16 to S32
- sgpr S1 phi lowering
- new opcodes for vcc-to-scc and scc-to-vcc copies
- lowering for vgprS1-to-vcc copy (formally this is vgpr-to-vcc G_TRUNC)
- S1 zext and sext lowering to select
- uniform and divergent S1 AND(OR and XOR) lowering - inst-selected into
  SALU instruction
- divergent phi with uniform inputs
- divergent instruction with temporal divergent use, source instruction
  is defined as uniform(AMDGPURegBankSelect) - missing temporal
  divergence lowering
- uniform phi, because of undef incoming, is assigned to vgpr. Will be
  fixed in AMDGPURegBankSelect via another fix in machine uniformity
  analysis.
2025-01-24 12:12:45 +01:00
Petar Avramovic
f8a56df36e
AMDGPU/GlobalISel: AMDGPURegBankSelect (#112863)
Assign register banks to virtual registers. Does not use generic
RegBankSelect. After register bank selection all register operand of
G_ instructions have LLT and register banks exclusively. If they had
register class, reassign appropriate register bank.

Assign register banks using machine uniformity analysis:
Sgpr - uniform values and some lane masks
Vgpr - divergent, non S1, values
Vcc  - divergent S1 values(lane masks)

AMDGPURegBankSelect does not consider available instructions and, in
some cases, G_ instructions with some register bank assignment can't be
inst-selected. This is solved in RegBankLegalize.

Exceptions when uniformity analysis does not work:
S32/S64 lane masks:
- need to end up with sgpr register class after instruction selection
- In most cases Uniformity analysis declares them as uniform
  (forced by tablegen) resulting in sgpr S32/S64 reg bank
- When Uniformity analysis declares them as divergent (some phis),
  use intrinsic lane mask analyzer to still assign sgpr register bank
temporal divergence copy:
- COPY to vgpr with implicit use of $exec inside of the cycle
- this copy is declared as uniform by uniformity analysis
- make sure that assigned bank is vgpr
Note: uniformity analysis does not consider that registers with vgpr def
are divergent (you can have uniform value in vgpr).
- TODO: implicit use of $exec could be implemented as indicator
  that instruction is divergent
2025-01-24 11:06:02 +01:00
Kazu Hirata
be187369a0
[AMDGPU] Remove unused includes (NFC) (#116154)
Identified with misc-include-cleaner.
2024-11-13 21:10:03 -08:00
Shilei Tian
bfcf7a0707
[AMDGPU] Remove hasAtomicFaddRtnForTy as it is not used anywhere (#82841) 2024-02-23 21:14:38 -05:00
Nico Weber
184ca39529
[llvm] Move CodeGenTypes library to its own directory (#79444)
Finally addresses https://reviews.llvm.org/D148769#4311232 :)

No behavior change.
2024-01-25 12:01:31 -05:00
Acim Maravic
01c1c7a19e
[AMDGPU][CodeGen] Update support (soffset + offset) s_buffer_load's (#68302)
getBaseWithConstantOffset() is used for scalar and non-scalar buffer
loads. Diffrence between s_load and load instruction is that s_load
instruction extends 32-bit offset to 64-bits, so a 32-bit (address +
offset) should not cause unsigned 32-bit integer wraparound, because it
performs addition in 64-bits.
2023-11-14 19:06:45 +01:00
NAKAMURA Takumi
9cfeba5b12 Restore CodeGen/LowLevelType from Support
This is rework of;
  - D30046 (LLT)

Since I have introduced `llvm-min-tblgen` as D146352, `llvm-tblgen`
may depend on `CodeGen`.

`LowLevlType.h` originally belonged to `CodeGen`. Almost all userse are
still under `CodeGen` or `Target`. I think `CodeGen` is the right place
to put `LowLevelType.h`.

`MachineValueType.h` may be moved as well. (later, D149024)

I have made many modules depend on `CodeGen`. It is consistent but
inefficient. It will be split out later, D148769

Besides, I had to isolate MVT and LLT in modmap, since
`llvm::PredicateInfo` clashes between `TableGen/CodeGenSchedule.h`
and `Transforms/Utils/PredicateInfo.h`.
(I think better to introduce namespace llvm::TableGen)

Depends on D145937, D146352, and D148768.

Differential Revision: https://reviews.llvm.org/D148767
2023-05-03 00:13:19 +09:00
Jay Foad
6443c0ee02 [AMDGPU] Stop using make_pair and make_tuple. NFC.
C++17 allows us to call constructors pair and tuple instead of helper
functions make_pair and make_tuple.

Differential Revision: https://reviews.llvm.org/D139828
2022-12-14 13:22:26 +00:00
Pierre van Houtryve
4d39552abe [AMDGPU][NFC] Remove isLegalVOP3PShuffleMask
Unused function since D134967

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D138493
2022-11-22 14:31:59 +00:00
Jay Foad
ea09a426a9 [AMDGPU] Assume getDefIgnoringCopies will succeed. NFC.
getDefIgnoringCopies and getSrcRegIgnoringCopies should not fail on
valid MIR, so don't bother to check for failure.

Differential Revision: https://reviews.llvm.org/D136238
2022-10-19 11:10:00 +01:00
Ivan Kosarev
5db8d6fd2b [AMDGPU][CodeGen] Support (base | offset) SMEM loads.
Prevents generation of unnecessary s_or_b32 instructions.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D132552
2022-09-05 14:22:06 +01:00
Joe Nash
ae72fee74e [AMDGPU] gfx11 Select on Buffer Atomic FAdd Rtn type
Reviewed By: #amdgpu, foad, rampitec

Differential Revision: https://reviews.llvm.org/D128205
2022-06-23 11:05:32 -04:00
Mirko Brkusanin
4b422708ba [AMDGPU][GlobalISel] Handle G_PTR_ADD when looking for constant offset
Look throught G_PTRTOINT and G_PTR_ADD nodes when looking for constant
offset for buffer stores. This also helps with merging of these instructions
later on.

Differential Revision: https://reviews.llvm.org/D95242
2021-01-28 11:20:09 +01:00
Jay Foad
0ad4d04002 [AMDGPU] Remove an unused return value. NFC.
Differential Revision: https://reviews.llvm.org/D91063
2020-11-10 09:15:14 +00:00
Matt Arsenault
72eef820d5 AMDGPU/GlobalISel: Select G_SHUFFLE_VECTOR
G_SHUFFLE_VECTOR is legal since it theoretically may help match op_sel
for VOP3P instructions. Expand it in some other way in case it doesn't
fold into the use instructions.
2020-02-21 13:35:40 -05:00
Matt Arsenault
9861a8538c AMDGPU/GlobalISel: Add new utils file
There are some things that are shareable between the legalizer,
regbankselect, and the selector that don't have an obvious place to
go.
2020-01-03 15:25:50 -05:00