For unbuffered smem loads, it is illegal for the immediate offset to be
negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is
negative.
New PR of https://github.com/llvm/llvm-project/pull/79553.
Implement GCNSubtarget::checkSubtargetFeatures as a canonical place to
check subtarget features for consistency and diagnose any
inconsistencies. To start with, the implementation just checks that
either wavefrontsize32 or wavefrontsize64 is selected.
checkSubtargetFeatures is called at the start of instruction selection.
This is pretty arbitrary. It is just a convenient point at which we have
access to the subtarget that we're going to use for codegenning a
particular function.
buffer_load instructions that use TFE also need to zero initialize
return values similar to how the image instructions currently work. Add
support for this with standard zero init of all results + zero init of
just TFE flag when enable-prt-strict-null subtarget feature is disabled.
AMDGPUInstructionSelector should no longer attempt to select S1 G_PHIs.
Remove MIR test that attempts to inst-select divergent vcc(S1) G_PHI.
Lane mask merging algorithm for GlobalISel is now responsible for
selecting divergent S1 G_PHIs in AMDGPUGlobalISelDivergenceLowering.
Uniform S1 G_PHIs should be lowered to S32 G_PHIs in reg bank select
pass. In summary S1 G_PHIs should not reach AMDGPUInstructionSelector.
Initialize ModOpcode directly before the loop execution to silence
static analyzer warnings about the usage of an uninitialized variable.
This leads to a redundant assignment of ElV2F16 inside the first loop
execution, but also avoids superfluous emptiness checks of EltsV2F16
after the first execution of the loop.
Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper.
Use machine uniformity analysis to find divergent i1 phis and select
them as lane mask phis in same way SILowerI1Copies select VReg_1 phis.
Note that divergent i1 phis include phis created by LCSSA and all cases
of uses outside of cycle are actually covered by "lowering LCSSA phis".
GlobalISel lane masks are registers with sgpr register class and S1 LLT.
TODO: General goal is that instructions created in this pass are fully
instruction-selected so that selection of lane mask phis is not split
across multiple passes.
patch 3 from: https://github.com/llvm/llvm-project/pull/73337
Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper.
Use machine uniformity analysis to find divergent i1 phis and select
them as lane mask phis in same way SILowerI1Copies select VReg_1 phis.
Note that divergent i1 phis include phis created by LCSSA and all cases
of uses outside of cycle are actually covered by "lowering LCSSA phis".
GlobalISel lane masks are registers with sgpr register class and S1 LLT.
TODO: General goal is that instructions created in this pass are fully
instruction-selected so that selection of lane mask phis is not split
across multiple passes.
patch 3 from: https://github.com/llvm/llvm-project/pull/73337
https://github.com/llvm/llvm-project/pull/70634 has disabled use
of potentially negative scratch offsets, but we can use it on GFX12.
---------
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Implement PhiLoweringHelper for GlobalISel in DivergenceLoweringHelper.
Use machine uniformity analysis to find divergent i1 phis and select
them as lane mask phis in same way SILowerI1Copies select VReg_1 phis.
Note that divergent i1 phis include phis created by LCSSA and all cases
of uses outside of cycle are actually covered by "lowering LCSSA phis".
GlobalISel lane masks are registers with sgpr register class and S1 LLT.
TODO: General goal is that instructions created in this pass are fully
instruction-selected so that selection of lane mask phis is not split
across multiple passes.
patch 3 from: https://github.com/llvm/llvm-project/pull/73337
In order to ensure the correctness of ptr addrspace(7) lowering, we need
a backwards-compatible way to flag buffer intrinsics as volatile that
can't be dropped (unlike metadata).
To acheive this in a backwards-compatible way, we use bit 31 of the
auxilliary immediates of buffer intrinsics as the volatile flag. When
this bit is set, the MachineMemOperand for said intrinsic is marked
volatile. Existing code will ensure that this results in the appropriate
use of flags like glc and dlc.
This commit also harmorizes the handling of the auxilliary immediate for
atomic intrinsics, which new go through extract_cpol like loads and
stores, which masks off the volatile bit.
This is used to select the source modifier (neg) from the immediate
operand. After a follow up commit this will no longer be DOTIU specific.
Co-authored-by: Changpeng Fang <changpeng.fang@amd.com>
For scratch load/store, our hardware only accept non-negative value in
SGPR/VGPR. Besides the case that we can prove from known bits, we can
also prove that the value in `base` will be non-negative: 1.) When the
ADD for the address calculation has NonUnsignedWrap flag. 2.) When the
immediate offset is already negative.
getBaseWithConstantOffset() is used for scalar and non-scalar buffer
loads. Diffrence between s_load and load instruction is that s_load
instruction extends 32-bit offset to 64-bits, so a 32-bit (address +
offset) should not cause unsigned 32-bit integer wraparound, because it
performs addition in 64-bits.
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
For the f64 case, this gives us a cheaper to materialize 32-bit
constant. It's less obviously a win for f32 and f16. It forces us to
use a VOP3 encoding so it's a neutral code size change.
GlobalISel cases don't work because of the constant-is-copy-to-vgpr
problem.
https://reviews.llvm.org/D157111
Introduced the convergent equivalent of the existing G_INTRINSIC opcodes:
- G_INTRINSIC_CONVERGENT
- G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS
Out of the targets that currently have some support for GlobalISel, the patch
assumes that the convergent intrinsics only relevant to SPIRV and AMDGPU.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D154766
In SelectionDAG InstrEmitter automatically puts dead flags on unused
physreg defs everywhere. The generated selectors should also set dead
on physreg defs that were not used in the pattern.
Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic
ID is the first non-def operand to the instruction. These are now represented as
a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID()
is now moved to this subclass GIntrinsic.
Some target-defined instructions behave like GMIR intrinsics, and have an
Intrinsic::ID operand. But they should not be recognized as generic intrinsics,
and should not use GIntrinsic::getIntrinsicID(). Separated these out by
introducing a new AMDGPU::getIntrinsicID().
Reviewed By: arsenm, Pierre-vh
Differential Revision: https://reviews.llvm.org/D155556
This restores commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa.
Originally reverted in d0f7850b01cf17e50a4f4b00e3b84dded94df6b8.
This reverts commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa.
The changes did not cover all occurrences of the deteleted function
MachineInstr::getIntrinsicID().
Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic
ID is the first non-def operand to the instruction. These are now represented as
a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID()
is now moved to this subclass GIntrinsic.
Some target-defined instructions behave like GMIR intrinsics, and have an
Intrinsic::ID operand. But they should not be recognized as generic intrinsics,
and should not use GIntrinsic::getIntrinsicID(). Separated these out by
introducing a new AMDGPU::getIntrinsicID().
Reviewed By: arsenm, Pierre-vh
Differential Revision: https://reviews.llvm.org/D155556
This isn't always folded to fneg for a freestanding fsub depending on
the denormal mode. When matching source modifiers, we're implicitly
canonicalizing the input so we can fold it here.
Doesn't bother handling the VOP3P case since it's only relevant with
DAZ, which nobody really uses with f16.
For f64, tests show an existing bug where DAGCombiner tries to respect
the denormal mode for fsub -0, x, but not after it's lowered to fadd
-0, (fneg x). Either the fold is wrong or we shouldn't restrict the
fsub case based on the denormal mode.
https://reviews.llvm.org/D155652
This time without the extra `->dump()`
A recent addition to the device libs, `__ockl_dm_trim`, caused a series of
failures at O0 due to a i64 ballot intrinsic being inlined into a wave32 function.
The quick fix for this is to support codegen for this rare case.
A proper long-term fix for this type of issue is still being discussed.
Fixes SWDEV-408929, SWDEV-408957, SWDEV-409885, SWDEV-410193
Reviewed By: #amdgpu, arsenm
Differential Revision: https://reviews.llvm.org/D155050
A recent addition to the device libs, `__ockl_dm_trim`, caused a series of
failures at O0 due to a i64 ballot intrinsic being inlined into a wave32 function.
The quick fix for this is to support codegen for this rare case.
A proper long-term fix for this type of issue is still being discussed.
Fixes SWDEV-408929, SWDEV-408957, SWDEV-409885, SWDEV-410193
Reviewed By: #amdgpu, arsenm
Differential Revision: https://reviews.llvm.org/D155050
`KnownBits` is also a type name. Having a field with this name
prevents derived classes from using the `KnownBits` type unless they use `struct KnownBits`.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D155082
Use the new matchtable-based combiner backend for all AMDGPU combiners.
This drop-in from the user's perspective; there are no test changes, the new combiner behaves exactly like the old one.
Depends on D153757
NOTE: This would land iff D153757 (RFC) lands too.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D153758
Makes `InstructionSelector.h`/`InstructionSelectorImpl.h` generic so the match tables can also be used for the combiner.
Some notes:
- Coverage was made an optional parameter of `executeMatchTable`, combines won't use it for now.
- `GIPFP_` -> `GICXXPred_` so it's more generic. Those are just C++ predicates and aren't PatFrag-specific.
- Pass the MatcherState directly to testMIPredicate_MI, the combiner will need it.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D153755