The `MSB` must not be greater than `GRLen`. Without this patch, newly
added test cases will crash with LoongArch32, resulting in a 'cannot
select' error.
No nans/infs in SelectionDAG is complicated. Hopefully I've captured
all of the cases. I've only applied to ConsiderFlags to the SDNodeFlags
since those are the only ones that will be droped by hoisting. The
condition code and TargetOptions would still be in effect.
Recovers some regression from #84232.
v8f16 is a legal type but promoting to v16f16 would result in an illegal
type.
Let's legalize these by a combination of splitting+promoting resulting
in a pair of v4f16.
Also, we were being overly cautious with different v4f16 nodes. Mark
more of them safe to promote to v4f32.
The current implementation of `isInlinableLiteral16` assumes, a 16-bit
inlinable
literal is either an `i16` or a `fp16`. This is not always true because
of
`bf16`. However, we can't tell `fp16` and `bf16` apart by just looking
at the
value. This patch splits `isInlinableLiteral16` into three versions,
`i16`,
`fp16`, `bf16` respectively, and call the corresponding version.
Teach canCreateUndefOrPoison that ISD::SETCC with integer operands can
never create undef/poison. FP SETCC is more complicated and will be
handled in a future patch.
Teach isGuaranteedNotToBeUndefOrPoison that ISD::CONDCODE is not
poison/undef. Its a special constant only used by setcc/select_cc like
nodes. This is needed since the hoisting will only hoist if exactly one
operand might be poison. setcc has 3 operand including the condition
code.
Recovers some regression from #84232.
This patch adds clang and llvm support for following intrinsic and maps
it to DUPQ instruction:
```
// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
// _bf16, _f16, _f32, _f64
svuint8_t svdup_laneq[_u8](svuint8_t zn, uint64_t imm_idx);
```
This removes, at least when a vector library is available, a failure
case for scalable vectors. Doing so means we can confidently cost vector
FREM instructions without making an assumption that later passes will
transform the IR before it gets to the code generator.
NOTE: Whilst only FREM has been implemented the same mechanism
can be used for the other libm related ISD nodes.
This PR introduces a step after instruction selection where instructions
can be traversed from the perspective of their validity from the
specification point of view. The PR adds also a way to correct
load/store when there is a type mismatch contradicting the specification
-- an additional bitcast is inserted to keep types consistent.
Correspondent test cases are added and existing test cases are
corrected.
This PR helps to successfully validate with the `spirv-val` tool
(https://github.com/KhronosGroup/SPIRV-Tools) some output that
previously led to validation errors and crashes of back translation from
SPIRV to LLVM IR from the side of SPIRV Translator project
(https://github.com/KhronosGroup/SPIRV-LLVM-Translator).
The added step of bringing instructions to required by the specification
type correspondence can be (should be and will be) extended beyond
load/store instructions to ensure validity rules of other SPIRV
instructions related to type inference.
* Create a libcall for s64 type for 32 bit targets.
* Fix a bug in REM selection: SUBREG_TO_REG is not intended to produce a
value from super registers.
* Replace selector tests by end-to-end tests. Other passes
check the selected MIR better.
Select blocks poison, but AND/OR do not. We need to insert a freeze
to block poison propagation.
This creates suboptimal codegen which I will try to fix with other
patches. I'm prioritizing the correctness fix since we have 2 bug reports.
Fixes#84200 and #84350
This is undoing a middle-end transform which does the opposite. Since
X86 doesn't have unsigned vector comparison instructions pre-AVX512,
the simplified form gets worse codegen.
Fixes#66479
Proofs: https://alive2.llvm.org/ce/z/UCz3wtCloses#84104Closes#66479
Recommits llvm/llvm-project#80378 which was reverted in
llvm/llvm-project#84330. The problem was that the change in
llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir used
217 as an opcode instead of a regex.
This patch is stacked on
https://github.com/llvm/llvm-project/pull/80372,
https://github.com/llvm/llvm-project/pull/80307, and
https://github.com/llvm/llvm-project/pull/80306.
ShuffleVector on scalable vector types gets IRTranslate'd to
G_SPLAT_VECTOR since a ShuffleVector that has operates on scalable
vectors is a splat vector where the value of the splat vector is the 0th
element of the first operand, because the index mask operand is the
zeroinitializer (undef and poison are treated as zeroinitializer here).
This is analogous to what happens in SelectionDAG for ShuffleVector.
`buildSplatVector` is renamed to`buildBuildVectorSplatVector`. I did not
make this a separate patch because it would cause problems to revert
that change without reverting this change too.
Currently, the SLS hardening pass is run before the machine outliner,
which means that the outliner creates new functions and calls which do
not have the SLS hardening applied.
The fix for this is to move the SLS passes to after the outliner, as has
recently been done for the return address signing pass.
This also avoids a bug where the SLS outliner emits code with
instructions after a return, which the outliner doesn't correctly
handle.
This is the concat_vector equivalent of #81312, in that we recursively
split concat_vectors with more than two operands into smaller
concat_vectors.
This allows us to break up the chain of vslideups, as well as perform
the vslideups at a smaller LMUL, which in turn reduces register pressure
as the previous lowering performed N vslideups at the highest result
LMUL. For now, it stops splitting past MF2.
This is done as a DAG combine so that any undef operands are combined
away: If we do this during lowering then we end up with unnecessary
vslideups of undefs.
This load isn't selected by tablegen due to the anyext, but wasn't generating
a subreg_to_reg. Maybe it shouldn't be formed at all during the combiner but to stop
crashes later in codegen select it manually for now.
A new widening rule was running before the shuffle was canonicalized into a
homogenous form. Moving the rules around to ensure it's done before the
widening fixes the crash, although this particular test still falls back.
This is another part of #70452 which makes getMemOperandsWithOffsetWidth
use a LocationSize for Width, as opposed to the unsigned it currently
uses. The advantages on it's own are not super high if
getMemOperandsWithOffsetWidth usually uses known sizes, but if the
values can come from an MMO it can help be more accurate in case they
are Unknown (and in the future, scalable).
Don't just return the known zero upperbits, compute the absdiff Knownbits and perform the horizontal sum.
Add implementations that handle both the X86ISD::PSADBW nodes and the INTRINSIC_WO_CHAIN intrinsics (pre-legalization).
Instead of initializing the accumulator to 0. Initialize it on first
assignment with a mv from the register that holds VLENB << ShiftAmount.
Fix a missing kill flag on the final Add.
I have no real interest in this case, just an easy optimization I
noticed.
This commit adds the -lower-buffer-fat-pointers pass, which is
applicable to all AMDGCN compilations.
The purpose of this pass is to remove the type `ptr addrspace(7)` from
incoming IR. This must be done at the LLVM IR level because `ptr
addrspace(7)`, as a 160-bit primitive type, cannot be correctly handled
by SelectionDAG.
The detailed operation of the pass is described in comments, but, in
summary, the removal proceeds by:
1. Rewriting loads and stores of ptr addrspace(7) to loads and stores of
i160 (including vectors and aggregates). This is needed because the
in-register representation of these pointers will stop matching their
in-memory representation in step 2, and so ptrtoint/inttoptr operations
are used to preserve the expected memory layout
2. Mutating the IR to replace all occurrences of `ptr addrspace(7)` with
the type `{ptr addrspace(8), ptr addrspace(6) }`, which makes the two
parts of a buffer fat pointer (the 128-bit address space 8 resource and
the 32-bit address space 6 offset) visible in the IR. This also impacts
the argument and return types of functions.
3. *Splitting* the resource and offset parts. All instructions that
produce or consume buffer fat pointers (like GEP or load) are rewritten
to produce or consume the resource and offset parts separately. For
example, GEP updates the offset part of the result and a load uses the
resource and offset parts to populate the relevant
llvm.amdgcn.raw.ptr.buffer.load intrinsic call.
At the end of this process, the original mutated instructions are
replaced by their new split counterparts, ensuring no invalidly-typed IR
escapes this pass. (For operations like call, where the struct form is
needed, insertelement operations are inserted).
Compared to LGC's PatchBufferOp (
32cda89776/lgc/patch/PatchBufferOp.cpp
): this pass
- Also handles vectors of ptr addrspace(7)s
- Also handles function boundaries
- Includes the same uniform buffer optimization for loops and
conditionals
- Does *not* handle memcpy() and friends (this is future work)
- Does *not* break up large loads and stores into smaller parts. This
should be handled by extending the legalization
of *.buffer.{load,store} to handle larger types by producing multiple
instructions (the same way ordinary LOAD and STORE are legalized). That
work is planned for a followup commit.
- Does *not* have special logic for handling divergent buffer
descriptors. The logic in LGC is, as far as I can tell, incorrect in
general, and, per discussions with @nhaehnle, isn't widely used.
Therefore, divergent descriptors are handled with waterfall loops later
in legalization.
As a final matter, this commit updates atomic expansion to treat buffer
operations analogously to global ones.
(One question for reviewers: is the new pass is the right place? Should
it be later in the pipeline?)
Differential Revision: https://reviews.llvm.org/D158463
The previous name 'amdgpu_code_object_version', was misleading since
this is really a property of the HSA OS. The new spelling also matches
the asm directive I added in bc82cfb.
Summary:
This patch implements the LLVM floating point environment control
intrinsics and also exposes it through clang. We encode the floating
point environment as a 64-bit value that simply concatenates the values
of the mode registers and the current trap status. We only fetch the
bits relevant for floating point instructions. That is, rounding mode,
denormalization mode, ieee, dx10 clamp, debug, enabled traps, f16
overflow, and active exceptions.