Addition of a check in the MachineVerifier to detect and report illegal
vector registers to SGPR copies in the AMDGPU backend, ensuring correct
code generation.
We can enforce this check only after SIFixSGPRCopies pass.
This is half-fix in the pipeline with the help of isSSA MachineFuction
property, the check is happening for passes after phi-node-elimination.
-Improve messages.
-Remove redundant checks that are handled in generic code.
-Add check that the subvector is smaller than the vector.
-Add checks that subvector is smaller than the vector.
The implementation was missing the fact that `G_EXTRACT_SUBVECTOR`
destination and source vector can be different types.
Also fix a bug in the MIR builder for `G_EXTRACT_SUBVECTOR` to generate
the correct opcode.
Clarify the G_EXTRACT_SUBVECTOR specification.
These tests just don't check the output written to the current directory. The
current directory may be write protected e.g. in a sandboxed environment.
The Testcases that use -emit-llvm and -verify only care about stdout/stderr
and are in this patch changed to use -emit-llvm-only to avoid writing to an
output file. The verify-inlineasmbr.mir testcase that also only care about
stdout/stderr is in this patch changed to throw away the output file and just
write to /dev/null.
This patch legalizes G_ZEXT, G_SEXT, and G_ANYEXT. If the type is a
legal mask type, then the instruction is legalized as the element-wise
select, where the condition on the select is the mask typed source
operand, and the true and false values are 1 or -1 (for
zero/any-extension and sign extension) and zero. If the type is a legal integer
or vector integer type, then the instruction is marked as legal.
The legalization of the extends may introduce a G_SPLAT_VECTOR, which
needs to be legalized in this patch for the extend test cases to pass.
A G_SPLAT_VECTOR is legal if the vector type is a legal integer or
floating point vector type and the source operand is sXLen type. This is
because the SelectionDAG patterns only support sXLen typed
ISD::SPLAT_VECTORS, and we'd like to reuse those patterns. A
G_SPLAT_VECTOR is cutom legalized if it has a legal s1 element vector
type and s1 scalar operand. It is legalized to G_VMSET_VL or G_VMCLR_VL
if the splat is all ones or all zeros respectivley. In the case of a
non-constant mask splat, we legalize by promoting the scalar value to
s8.
In order to get the s8 element vector back into s1 vector, we use a
G_ICMP. In order for the splat vector and extend tests to pass, we also
need to legalize G_ICMP in this patch.
A G_ICMP is legal if the destination type is a legal bool vector and the LHS and
RHS are legal integer vector types.
Here we introduce three new GMIR instructions to cover a set of trap
intrinsics. The idea behind it is that generic intrinsics shouldn't be
used with G_INTRINSIC opcode.
These new instructions can match perfectly with existing trap ISD nodes.
It allows X86, AArch64, RISCV and Mips to reuse SelectionDAG patterns for
selection and avoid manual selection. However AMDGPU is an exception. It
selects traps during legalization regardless SelectionDAG or GlobalISel.
Since there are not many places where traps are used, this change
attempts to clean up all the usages of G_INTRINSIC with trap intrinsics. So,
there is no stage when both G_TRAP and
G_INTRINSIC_W_SIDE_EFFECTS(@llvm.trap) are allowed.
G_INSERT and G_EXTRACT are not sufficient to use to represent both
INSERT/EXTRACT on a subregister and INSERT/EXTRACT on a vector.
We would like to be able to INSERT/EXTRACT on vectors in cases that
INSERT/EXTRACT on vector subregisters are not sufficient, so we add
these opcodes.
I tried to do a patch where we treated G_EXTRACT as both
G_EXTRACT_SUBVECTOR and G_EXTRACT_SUBREG, but ran into an infinite loop
at this
[point](8b5b294ec2/llvm/lib/Target/RISCV/RISCVISelLowering.cpp (L9932))
in the SDAG equivalent code.
Recommits llvm/llvm-project#80378 which was reverted in
llvm/llvm-project#84330. The problem was that the change in
llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir used
217 as an opcode instead of a regex.
This patch is stacked on
https://github.com/llvm/llvm-project/pull/80372,
https://github.com/llvm/llvm-project/pull/80307, and
https://github.com/llvm/llvm-project/pull/80306.
ShuffleVector on scalable vector types gets IRTranslate'd to
G_SPLAT_VECTOR since a ShuffleVector that has operates on scalable
vectors is a splat vector where the value of the splat vector is the 0th
element of the first operand, because the index mask operand is the
zeroinitializer (undef and poison are treated as zeroinitializer here).
This is analogous to what happens in SelectionDAG for ShuffleVector.
`buildSplatVector` is renamed to`buildBuildVectorSplatVector`. I did not
make this a separate patch because it would cause problems to revert
that change without reverting this change too.
This restores commit c7fdd8c11e54585dc9d15d63de9742067e0506b9.
Previously reverted in f010b1bef4dda2c7082cbb41dbabf1f149cce306.
LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:
1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
control intrinsics.
2. Model token values as untyped virtual registers in MIR.
The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.
The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
This reverts commit c7fdd8c11e54585dc9d15d63de9742067e0506b9.
Reason: Broke the sanitizer buildbots. See the comments at
https://github.com/llvm/llvm-project/pull/71785
for more information.
Original commit 79889734b940356ab3381423c93ae06f22e772c9.
Perviously reverted in commit a2afcd5721869d1d03c8146bae3885b3385ba15e.
LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:
1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
control intrinsics.
2. Model token values as untyped virtual registers in MIR.
The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.
The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:
1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
control intrinsics.
2. Model token values as untyped virtual registers in MIR.
The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.
The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
Currently the asm parser takes `v_writelane_b32 v1, s13, m0` as illegal
instruction for pre-gfx11 because it uses two constant buses while the
hardware
can only allow one. However, based on the comment of
`AMDGPUInstructionSelector::selectWritelane`,
it is allowed to have M0 as lane selector and a SGPR used as SRC0
because the
lane selector doesn't count as a use of constant bus. In fact, codegen
can already
generate this form, but this inconsistency is not exposed because the
validation
of constant bus limitation only happens when paring an assembly but we
don't have
a test case when both SGPR and M0 used as source operands for the
instruction.
Similar to 806761a7629df268c8aed49657aeccffa6bca449.
For IR files without a target triple, -mtriple= specifies the full
target triple while -march= merely sets the architecture part of the
default target triple, leaving a target triple which may not make sense,
e.g. amdgpu-apple-darwin.
Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
$unknown-apple-darwin as ELF instead of rejecting it outrightly.
This patch changes AMDGPU tests to not rely on the default
OS/environment components. Tests that need fixes are not changed:
```
LLVM :: CodeGen/AMDGPU/fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fabs.ll
LLVM :: CodeGen/AMDGPU/floor.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```
We were allowing extra immediate arguments, and only bothering to check
if registers were implicit or not.
Also consolidate extra operand checks in verifier, to make this
testable. We had 3 different places checking if you were trying to build
an instruction with more operands than allowed by the definition. We had
an assertion in addOperand, a direct check in the MIRParser to avoid the
assertion, and the machine verifier checks. Remove the assert and parser
check so the verifier can provide a consistent verification experience,
which will also handle instructions modified in place.
This change fixes#71518, which compared the KnownMinValue of the
scalable virtual register with the FixedSize of the physical register in
the wrong direction. It turns out that we cannot include this check at all since
it will lead to a false failures. Test cases are added to show that
the false failures no longer occur after this fix.
…gSizeInBits
This patch changes getRegSizeInBits to return a TypeSize instead of an
unsigned in the case that a virtual register has a scalable LLT. In the
case that register is physical, a Fixed TypeSize is returned.
The MachineVerifier pass is updated to allow copies between fixed and
scalable operands as long as the Src size will fit into the Dest size.
This is a precommit which will be stacked on by a change to GISel to
generate COPYs with a scalable destination but a fixed size source.
This patch is stacked on https://github.com/llvm/llvm-project/pull/70893
for the ability to use scalable vector types in MIR tests.
This is consistent with the IR verifier and SelectionDAG's getNode.
Update tests accordingly. I tried to keep some coverage of non-pow2 when
possible. X86 didn't like a G_UNMERGE_VALUES from s48 to 3 s16 that got
created when I tried s48.
In the process of splitting out the liveness tracking, I ran into
these cases which should have been caught. There are still missing
errors for some cases in the entry block.
https://reviews.llvm.org/D127104
Refresh of the generic scheduling model to use A510 instead of A55.
Main benefits are to the little core, and introducing SVE scheduling information.
Changes tested on various OoO cores, no performance degradation is seen.
Differential Revision: https://reviews.llvm.org/D156799
D105263 changed this test to not expect a MachineVerifier error on this
instruction:
; FP16 reg is sub_reg of xmm
%0:_(s16) = COPY $xmm0
D107082 changed the behaviour back again so that this instruction did
cause an error, but the test was not updated to expect the error.
Differential Revision: https://reviews.llvm.org/D148534
We currently have a bug where the legalizer, when dealing with phi operands,
may create instructions in the phi's incoming blocks at points which are effectively
dead due to a possible exception throw.
Say we have:
throwbb:
EH_LABEL
x0 = %callarg1
BL @may_throw_call
EH_LABEL
B returnbb
bb:
%v = phi i1 %true, throwbb, %false....
When legalizing we may need to widen the i1 %true value, and to do that we need
to create new extension instructions in the incoming block. Our insertion point
currently is the MBB::getFirstTerminator() which puts the IP before the unconditional
branch terminator in throwbb. These extensions may never be executed if the call
throws, and therefore we need to emit them before the call (but not too early, since
our new instruction may need values defined within throwbb as well).
throwbb:
EH_LABEL
x0 = %callarg1
BL @may_throw_call
EH_LABEL
%true = G_CONSTANT i32 1 ; <<<-- ruh'roh, this never executes if may_throw_call() throws!
B returnbb
bb:
%v = phi i32 %true, throwbb, %false....
To fix this, I've added two new instructions. The main idea is that G_INVOKE_REGION_START
is a terminator, which tries to model the fact that in the IR, the original invoke inst
is actually a terminator as well. By using that as the new insertion point, we
make sure to place new instructions on always executing paths.
Unfortunately we still need to make the legalizer use a new insertion point API
that I've added, since the existing `getFirstTerminator()` method does a reverse
walk up the block, and any non-terminator instructions cause it to bail out. To
avoid impacting compile time for all `getFirstTerminator()` uses, I've added a new
method that does a forward walk instead.
Differential Revision: https://reviews.llvm.org/D137905
Verify three cases of G_UNMERGE_VALUES separately:
1. Splitting a vector into subvectors (the converse of
G_CONCAT_VECTORS).
2. Splitting a vector into its elements (the converse of
G_BUILD_VECTOR).
3. Splitting a scalar into smaller scalars (the converse of
G_MERGE_VALUES).
Previously #1 allowed strange combinations like this:
%1:_(<2 x s16>),%2:_(<2 x s16>) = G_UNMERGE_VALUES %0(<2 x s32>)
This has been tightened up to check that the source and destination
element types match, and some MIR test cases updated accordingly.
Differential Revision: https://reviews.llvm.org/D111132
Instruction G_IS_FPCLASS had an operand that represented floating-point
semantics of its first operand. It allowed types that have the same length,
like `bfloat16` and `half`, to be distinguished. Unfortunately, it is
not sufficient, as other operation still cannot distinguish such types.
Solution of this problem must be more general, so now this operand is removed.
Differential Revision: https://reviews.llvm.org/D138004
MachineVerifier tried to checkLivenessAtDef() ignoring it is actually a subreg.
The issue was with processing two subregs of the same reg are used in the same
instruction (e.g. inline asm): "def early-clobber" and other just "def".
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D126661
Test for a case we observed after the initial implementation of D129997
landed, in which case we observed a crash while building the ppc64le
Linux kernel. In that case, we had one block with two exits, both to the
same successor. Removing one of the exits corrupted the
successor/predecessor lists.
So when we have an INLINEASM_BR, check a few things for each indirect
target:
1. that it exists.
2. that it is listed in our successors.
3. that its predecessor list contains the parent MBB of INLINEASM_BR.
This would have caught the regression discovered after D129997 landed,
after the pass that was problematic (early-tailduplication) rather than
getting a stack trace in a later pass (regalloc) that doesn't understand
the anomaly and crashes.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D130290