Since 3494ee95902cef62f767489802e469c58a13ea04 APInt stopped to
implicitly truncate values, therefore it asserts on a big signed value
converted to (implicitly) unsigned APInt.
The change explicitly marks offset as a signed value.
Lower G_ instructions that can't be inst-selected with register bank
assignment from AMDGPURegBankSelect based on uniformity analysis.
- Lower instruction to perform it on assigned register bank
- Put uniform value in vgpr because SALU instruction is not available
- Execute divergent instruction in SALU - "waterfall loop"
Given LLTs on all operands after legalizer, some register bank
assignments require lowering while other do not.
Note: cases where all register bank assignments would require lowering
are lowered in legalizer.
AMDGPURegBankLegalize goals:
- Define Rules: when and how to perform lowering
- Goal of defining Rules it to provide high level table-like brief
overview of how to lower generic instructions based on available
target features and uniformity info (uniform vs divergent).
- Fast search of Rules, depends on how complicated Rule.Predicate is
- For some opcodes there would be too many Rules that are essentially
all the same just for different combinations of types and banks.
Write custom function that handles all cases.
- Rules are made from enum IDs that correspond to each operand.
Names of IDs are meant to give brief description what lowering does
for each operand or the whole instruction.
- AMDGPURegBankLegalizeHelper implements lowering algorithms
Since this is the first patch that actually enables -new-reg-bank-select
here is the summary of regression tests that were added earlier:
- if instruction is uniform always select SALU instruction if available
- eliminate back to back vgpr to sgpr to vgpr copies of uniform values
- fast rules: small differences for standard and vector instruction
- enabling Rule based on target feature - salu_float
- how to specify lowering algorithm - vgpr S64 AND to S32
- on G_TRUNC in reg, it is up to user to deal with truncated bits
G_TRUNC in reg is treated as no-op.
- dealing with truncated high bits - ABS S16 to S32
- sgpr S1 phi lowering
- new opcodes for vcc-to-scc and scc-to-vcc copies
- lowering for vgprS1-to-vcc copy (formally this is vgpr-to-vcc G_TRUNC)
- S1 zext and sext lowering to select
- uniform and divergent S1 AND(OR and XOR) lowering - inst-selected into
SALU instruction
- divergent phi with uniform inputs
- divergent instruction with temporal divergent use, source instruction
is defined as uniform(AMDGPURegBankSelect) - missing temporal
divergence lowering
- uniform phi, because of undef incoming, is assigned to vgpr. Will be
fixed in AMDGPURegBankSelect via another fix in machine uniformity
analysis.
Assign register banks to virtual registers. Does not use generic
RegBankSelect. After register bank selection all register operand of
G_ instructions have LLT and register banks exclusively. If they had
register class, reassign appropriate register bank.
Assign register banks using machine uniformity analysis:
Sgpr - uniform values and some lane masks
Vgpr - divergent, non S1, values
Vcc - divergent S1 values(lane masks)
AMDGPURegBankSelect does not consider available instructions and, in
some cases, G_ instructions with some register bank assignment can't be
inst-selected. This is solved in RegBankLegalize.
Exceptions when uniformity analysis does not work:
S32/S64 lane masks:
- need to end up with sgpr register class after instruction selection
- In most cases Uniformity analysis declares them as uniform
(forced by tablegen) resulting in sgpr S32/S64 reg bank
- When Uniformity analysis declares them as divergent (some phis),
use intrinsic lane mask analyzer to still assign sgpr register bank
temporal divergence copy:
- COPY to vgpr with implicit use of $exec inside of the cycle
- this copy is declared as uniform by uniformity analysis
- make sure that assigned bank is vgpr
Note: uniformity analysis does not consider that registers with vgpr def
are divergent (you can have uniform value in vgpr).
- TODO: implicit use of $exec could be implemented as indicator
that instruction is divergent
getBaseWithConstantOffset() is used for scalar and non-scalar buffer
loads. Diffrence between s_load and load instruction is that s_load
instruction extends 32-bit offset to 64-bits, so a 32-bit (address +
offset) should not cause unsigned 32-bit integer wraparound, because it
performs addition in 64-bits.
This is rework of;
- D30046 (LLT)
Since I have introduced `llvm-min-tblgen` as D146352, `llvm-tblgen`
may depend on `CodeGen`.
`LowLevlType.h` originally belonged to `CodeGen`. Almost all userse are
still under `CodeGen` or `Target`. I think `CodeGen` is the right place
to put `LowLevelType.h`.
`MachineValueType.h` may be moved as well. (later, D149024)
I have made many modules depend on `CodeGen`. It is consistent but
inefficient. It will be split out later, D148769
Besides, I had to isolate MVT and LLT in modmap, since
`llvm::PredicateInfo` clashes between `TableGen/CodeGenSchedule.h`
and `Transforms/Utils/PredicateInfo.h`.
(I think better to introduce namespace llvm::TableGen)
Depends on D145937, D146352, and D148768.
Differential Revision: https://reviews.llvm.org/D148767
C++17 allows us to call constructors pair and tuple instead of helper
functions make_pair and make_tuple.
Differential Revision: https://reviews.llvm.org/D139828
getDefIgnoringCopies and getSrcRegIgnoringCopies should not fail on
valid MIR, so don't bother to check for failure.
Differential Revision: https://reviews.llvm.org/D136238
Look throught G_PTRTOINT and G_PTR_ADD nodes when looking for constant
offset for buffer stores. This also helps with merging of these instructions
later on.
Differential Revision: https://reviews.llvm.org/D95242
G_SHUFFLE_VECTOR is legal since it theoretically may help match op_sel
for VOP3P instructions. Expand it in some other way in case it doesn't
fold into the use instructions.