Enable gisel selection for uaddsat and usubsat in true16 flow
This patch includes:
1. Added VGPR_16_Lo128/VGPR_16 to register bank and update register info
for recognizing 16bit regclass id and bit width
2. uaddsat/usubsat test update
This reverts commit 36eaf0daf5d6dd665d7c7a9ec38ea22f27709fed.
This is not a sound approach to dealing with this instruction change.
The new behavior is a different opcode pair, not a modifier on the
existing opcode.
For targets that support IEEE fminimum_num/fmaximum_num, the
corresponding *_min_num_fXY/*_max_num_fXY instructions themselves
already did the canonicalization for the inputs. As a result, we do not
need to explicitly canonicalize the inputs for fminnum/fmaxnum.
These cannot be static compile errors, and should be treated as
poison. Invalid casts may be introduced which are dynamically dead.
For example:
```
void foo(volatile generic int* x) {
__builtin_assume(is_shared(x));
*x = 4;
}
void bar() {
private int y;
foo(&y); // violation, wrong address space
}
```
This could produce a compile time backend error or not depending on
the optimization level. Similarly, the new test demonstrates a failure
on a lowered atomicrmw which required inserting runtime address
space checks. The invalid cases are dynamically dead, we should not
error, and the AtomicExpand pass shouldn't have to consider the details
of the incoming pointer to produce valid IR.
This should go to the release branch. This fixes broken -O0 compiles
with 64-bit atomics which would have started failing in
1d0370872f28ec9965448f33db1b105addaf64ae.
Retain LLT type information by creating new LLTs from the original LLT
instead of only using the original scalar size.
This PR prepares for the [LLT FPInfo
RFC](https://discourse.llvm.org/t/rfc-globalisel-adding-fp-type-information-to-llt/83349/24)
where LLTs will carry additional floating point type information in
addition to the scalar size.
Use a local pointer type to represent the named barrier in builtin and
intrinsic. This makes the definitions more user friendly
bacause they do not need to worry about the hardware ID assignment. Also
this approach is more like the other popular GPU programming language.
Named barriers should be represented as global variables of addrspace(3)
in LLVM-IR. Compiler assigns the special LDS offsets for those variables
during AMDGPULowerModuleLDS pass. Those addresses are converted to hw
barrier ID during instruction selection. The rest of the
instruction-selection changes are primarily due to the
intrinsic-definition changes.
This reverts commit 8a849a2a567d4e519b246a16936b6e7519936d4b.
It seems I missed a spot when trying to ensure the code in the
instruction selection tests were actually legalized MIR.
The int_amdgcn_mov_dpp8 is overloaded, but we can only select i32.
To allow a corresponding builtin to be overloaded the same way as
int_amdgcn_mov_dpp we need it to be able to split unsupported values.
This adds `-disable-gisel-legality-check` to some gfx6 and gfx7 test
lines to prevent behavior mismatches between debug and release builds
The first attempted reapply was #111059
This reverts commit e075dcf7d270fd52dc837163ff24e8c872dfeb49.
This reverts commit 650c41aad2eb43c634a05b2b5799a0c13a73b92f.
The test failures appear to be from conflicts with other PRs that landed around this time.
Certain pointer address spaces were not being correctly handled by the
GlobalISel lowering for buffer_load and buffer_store.
1. ptr addrspace(1) and addrspace(4) did not have rewrite patterns
defined for them, while p0 did, since those pointer types weren't in the
list of types that was iterated to form the patterns.
2. Vectors of pointers need to be bitcast to vectors of the
corresponding scalars, since there doesn't seem to be a good way to
define the rewrite patterns for buffer_load/store of those types
The need to bitcast vectors of pointers was also revealed to affect
ordinary `G_LOAD` and `G_STORE` in some cases, so
`shouldBitcastLoadStore()` has been fixed to handle it properly.
This reverts commit 63b2595846b86b4e4eb9afba5e97dd64e8135c10.
(llvmorg-20-init-6782-g63b2595846b8)
A few bots have been failing on `inst-select-unmerge-values.mir`
Always generate v_cndmask_b32 instead of modifying exec around
v_mov_b32. This is expected to be faster because
modifying exec generally causes pipeline stalls.
Use GCNPat instead of Custom Lowering to select instructions for
intrinsic llvm.fptrunc.round. "SupportedRoundMode : TImmLeaf" is used as
a predicate to select only when the rounding mode is supported.
"as_hw_round_mode : SDNodeXForm" is developed to translate the round
modes to the corresponding ones that hardware recognizes.
This work simplifies and generalizes the instruction definition for
intrinsic llvm.fptrunc.round. We no longer name the instruction with the
rounding mode. Instead, we introduce an immediate operand for the
rounding mode for the pseudo instruction. This immediate will be used to
set up the hardware mode register at the time the real instruction is
generated. We name the pseudo instruction as FPTRUNC_ROUND_F16_F32 (for
f32 -> f16), which is easy to generalize for other types.
"round.towardzero" and "round.tonearest" are added for f32 -> f16
truncating, in addition to the existing "round.upward" and
"round.downward". Other rounding modes are not supported by hardware at
this moment.
Mark these intrinsics as atomic loads within LLVM to prevent hoisting
out of loops in cases where
the load is considered invariant.
Similar to https://github.com/llvm/llvm-project/pull/97707, but for
struct buffer loads.
This patch enables the target-independent lowering of llvm.lround via
GlobalISel. For SelectionDAG, the instrinsic is custom lowered for
AMDGPU. In order to support vector floating point input for llvm.lround,
this patch extends the target independent APIs and provide support for
scalarizing. pr98950 is needed to let verifier allow vector floating
point types
Upstream the intrinsics `llvm.amdgcn.raw.atomic.buffer.load`
and `llvm.amdgcn.raw.atomic.ptr.buffer.load`.
These additional intrinsics mark atomic buffer loads
as atomic to LLVM by removing the `IntrReadMem`
attribute. Otherwise, it could hoist these
intrinsics out of loops in cases where LLVM marks
them as invariant. That can cause issues such as
infinite loops.
Continuation of https://reviews.llvm.org/D138786
with the additional use in the fat buffer lowering,
more test cases and the additional ptr versions
of these intrinsics.
---------
Co-authored-by: rtayl <>
Co-authored-by: Jay Foad <jay.foad@amd.com>
Co-authored-by: Mariusz Sikora <mariusz.sikora@amd.com>
An appropriately configured image resource descriptor can trigger
image_sample instructions to store outputs directly to a linked memory
location instead of returning to VGPRs.
This is opaque to the backend as instruction encoding is unchanged;
however, a mechanism is require to allow frontends to communicate that
these instructions do not require destination VGPRs and store to memory.
Flagging these as stores means they will not be optimized away.
These are incremental changes over #89217 , with core logic being the
same. This patch along with #89217 and #91190 should get us ready to enable 64
bit optimizations in atomic optimizer.
This patch is intended to be the first of a series with end goal to
adapt atomic optimizer pass to support i64 and f64 operations (along
with removing all unnecessary bitcasts). This legalizes 64 bit readlane,
writelane and readfirstlane ops pre-ISel
---------
Co-authored-by: vikramRH <vikhegde@amd.com>
These are redundant with the unsuffixed versions, and have a name
collision with surprising behavior when the base intrinsic is used with
v2bf16.
The global and flat variants should be removed too, but those are complicated
due to using v2i16 in place of the natural v2bf16. Those cases can soon be
completely deleted in favor of atomicrmw.
The GlobalISel codegen change is broken and substitutes handling as bf16
for handling as f16, but it's a bug that this passed the IRTranslator in the first
place.