Lowering in GlobalISel for AMDGPU previously always narrows to i32 on
truncating store regardless of mem size or scalar size, causing issues
with types like i65 which is first extended to i128 then stored as i64 +
i8 to i128 locations. Narrowing only on store to pow of 2 mem location
ensures only narrowing to mem size near end of legalization.
This LLVM defect was identified via the AMD Fuzzing project.
This concerns offset computations for kernargs and
RegBankLegalizeHelper::splitLoad, which should all be within the bounds of a
memory object. See #150392 for the motivation for introducing the
buildObjectPtrOffset function.
For SWDEV-516125.
The hardware min/max follow the IR rules with IEEE mode disabled,
so we can avoid the canonicalizes of the input. We lose the quieting
of a signaling nan if both inputs are nans, but we only require that
with strictfp.
The latest asics support v_cvt_pk_f16_f32 instruction. However current
implementation of vector fptrunc lowering fully scalarizes the vectors,
and the scalar conversions may not always be combined to generate the
packed one.
We made v2f32 -> v2f16 legal in
https://github.com/llvm/llvm-project/pull/139956. This work is an
extension to handle wider vectors. Instead of fully scalarization, we
split the vector to packs (v2f32 -> v2f16) to ensure the packed
conversion can always been generated.
This annotates the `Twine` passed to the constructors of the various
DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes
us to warn when we would try to print the twine after it had already
been destructed.
We also update `DiagnosticInfoUnsupported` to hold a `const Twine &`
like all of the other DiagnosticInfo classes, since this warning allows
us to clean up all of the places where it was being used incorrectly.
Fix for a bug found by the AMD fuzzing project.
The legaliser would originally try to widen a small vector such as `<4 x
i1>` to a single `i16` during the legalisation of bitshifts, as it was
not originally written with consideration for vector operands. This
patch simply adds a guard to prohibit this transformation and allow
other legalisation transformations to step in.
This is the bare minimum to get the intrinsic to compile for AMDGPU,
and it's not optimal. We need to follow along closer with the existing
G_FMINNUM/G_FMAXNUM with custom lowering to handle the IEEE=0 case
better.
Just re-use the existing lowering for the old semantics for
G_FMINNUM/G_FMAXNUM. This does not change G_FMINNUM/G_FMAXNUM's
treatment,
nor try to handle the general expansion without an underlying min/max
variant (or with G_FMINIMUM/G_FMAXIMUM).
Legalize the amdgcn.dead intrinsic to work with types other than i32. It
still generates IMPLICIT_DEFs.
Remove some of the previous code for selecting/reg bank mapping it for
32-bit types, since everything is done in the legalizer now.
On targets that support v_cvt_pk_f16_f32 instruction, if we make v2f64
-> v2f16 Legal, we will generate the following sequence of instructions:
v_cvt_f32_f64_e32 v1, s[6:7]
v_cvt_f32_f64_e32 v2, s[4:5]
v_cvt_pk_f16_f32 v1, v2, v1
It possibly returns imprecise results due to double rounding. This patch
fixes the issue by not setting the conversion Legal. While we may still
expect the above sequence of code when unsafe fpmath is set, I hope
https://github.com/llvm/llvm-project/pull/134738 can address that
performance concern.
Fixes: SWDEV-523856
Enable gisel selection for uaddsat and usubsat in true16 flow
This patch includes:
1. Added VGPR_16_Lo128/VGPR_16 to register bank and update register info
for recognizing 16bit regclass id and bit width
2. uaddsat/usubsat test update
This reverts commit 36eaf0daf5d6dd665d7c7a9ec38ea22f27709fed.
This is not a sound approach to dealing with this instruction change.
The new behavior is a different opcode pair, not a modifier on the
existing opcode.
For targets that support IEEE fminimum_num/fmaximum_num, the
corresponding *_min_num_fXY/*_max_num_fXY instructions themselves
already did the canonicalization for the inputs. As a result, we do not
need to explicitly canonicalize the inputs for fminnum/fmaxnum.
These cannot be static compile errors, and should be treated as
poison. Invalid casts may be introduced which are dynamically dead.
For example:
```
void foo(volatile generic int* x) {
__builtin_assume(is_shared(x));
*x = 4;
}
void bar() {
private int y;
foo(&y); // violation, wrong address space
}
```
This could produce a compile time backend error or not depending on
the optimization level. Similarly, the new test demonstrates a failure
on a lowered atomicrmw which required inserting runtime address
space checks. The invalid cases are dynamically dead, we should not
error, and the AtomicExpand pass shouldn't have to consider the details
of the incoming pointer to produce valid IR.
This should go to the release branch. This fixes broken -O0 compiles
with 64-bit atomics which would have started failing in
1d0370872f28ec9965448f33db1b105addaf64ae.
Retain LLT type information by creating new LLTs from the original LLT
instead of only using the original scalar size.
This PR prepares for the [LLT FPInfo
RFC](https://discourse.llvm.org/t/rfc-globalisel-adding-fp-type-information-to-llt/83349/24)
where LLTs will carry additional floating point type information in
addition to the scalar size.
Use a local pointer type to represent the named barrier in builtin and
intrinsic. This makes the definitions more user friendly
bacause they do not need to worry about the hardware ID assignment. Also
this approach is more like the other popular GPU programming language.
Named barriers should be represented as global variables of addrspace(3)
in LLVM-IR. Compiler assigns the special LDS offsets for those variables
during AMDGPULowerModuleLDS pass. Those addresses are converted to hw
barrier ID during instruction selection. The rest of the
instruction-selection changes are primarily due to the
intrinsic-definition changes.
This reverts commit 8a849a2a567d4e519b246a16936b6e7519936d4b.
It seems I missed a spot when trying to ensure the code in the
instruction selection tests were actually legalized MIR.
The int_amdgcn_mov_dpp8 is overloaded, but we can only select i32.
To allow a corresponding builtin to be overloaded the same way as
int_amdgcn_mov_dpp we need it to be able to split unsupported values.
This adds `-disable-gisel-legality-check` to some gfx6 and gfx7 test
lines to prevent behavior mismatches between debug and release builds
The first attempted reapply was #111059
This reverts commit e075dcf7d270fd52dc837163ff24e8c872dfeb49.