The `STGloop` family of pseudo-instructions all expand to a loop which
iterates over a region of memory setting all its MTE tags to a given
value. The loop writes to the flags in order to check termination. But
the unexpanded pseudo-instructions were not marked as modifying the
flags. Therefore it was possible for one to end up in a location where
the flags were live, and then the loop would corrupt them.
We spotted the effect of this in a libc++ test involving a lot of
complicated inlining, and haven't been able to construct a smaller
test case that demonstrates actual incorrect output code. So my test
here is just checking that `implicit-def $nzcv` shows up on the
pseudo-instructions as they're output from isel.
Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D158262
The test cases mainly cover three situations:
- arguments that should be immediates are non-immediates.
- the immediate exceeds the upper limit of the argument type.
- the immediate is below the lower limit of the argument type.
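As a rough illustration of these three situations, the kind of range check such test cases exercise can be modeled as below. The helper name `imm_in_range`, the use of `None` for a non-constant argument, and the 5-bit widths in the usage note are all hypothetical and are not taken from the patch.

```python
def imm_in_range(value, bits, signed):
    """Return True if `value` fits in an immediate field of `bits` bits.

    `value` is None when the argument is not a compile-time constant
    (a modeling assumption made only for this sketch).
    """
    # Situation 1: a non-immediate argument can never be encoded.
    if value is None:
        return False
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    # Situations 2 and 3: above the upper limit or below the lower limit.
    return lo <= value <= hi
```

For a hypothetical unsigned 5-bit field, 31 is accepted while 32 and -1 are rejected; for a signed 5-bit field the accepted range is -16 to 15.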
Depends on D155829
Reviewed By: SixWeining
Differential Revision: https://reviews.llvm.org/D157570
The function emitFunctionEntryLabel does not look at whether or not a function is a leaf when setting the entry flags,
and instead blindly marks all functions as non-leaf routines. Change it to check if a function is a leaf function and
mark it accordingly.
Some opcodes in MIR are defined to be convergent by the target by setting
isConvergent in the corresponding TD file. For example, in AMDGPU, the opcodes
G_SI_CALL and G_INTRINSIC* are marked as convergent. But this is too
conservative, since calls to functions that do not execute convergent operations
should not be marked convergent. This information is available in LLVM IR.
The new flag MIFlag::NoConvergent now allows the IR translator to mark an
instruction as not performing any convergent operations. It is relevant only on
occurrences of opcodes that are marked isConvergent in the target.
Differential Revision: https://reviews.llvm.org/D157475
Ensure we only use the eflags results from shift instructions when doing so won't cause stalls.
A shift by a variable amount causes stalls because it has to preserve eflags when the shift amount is zero, so we're better off using a separate test.
SwiftErrorValueTracking creates vregs at swifterror use sites and then
connects them with the appropriate definitions after instruction selection.
To propagate swifterror values SwiftErrorValueTracking::propagateVRegs
iterates over basic blocks in RPO, but some vregs previously created
at use sites may be located in blocks that became unreachable after
instruction selection. Because of that, there will be no definition for
such vregs, and that may cause issues down the pipeline.
To ensure that all vregs created by SwiftErrorValueTracking are
defined, propagateVRegs was updated to insert IMPLICIT_DEF at the
beginning of unreachable blocks containing swifterror uses.
Related issue: https://github.com/llvm/llvm-project/issues/59751
Reviewed By: compnerd
Differential Revision: https://reviews.llvm.org/D141053
My thought is that we can directly select W instructions using s32.
This will likely require combines and other optimizations eventually,
but this makes a simple starting point.
I'm slowly prototyping a similar approach for SelectionDAG.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D157770
The test cases mainly cover three situations:
- arguments that should be immediates are non-immediates.
- the immediate exceeds the upper limit of the argument type.
- the immediate is below the lower limit of the argument type.
Depends on D155830
Reviewed By: SixWeining
Differential Revision: https://reviews.llvm.org/D157571
Targets may lose some optimization opportunities for certain vector operations
if we reduce BUILD_VECTOR to BITCAST early.
However, if VT is not legal, reducing BUILD_VECTOR to BITCAST before LegalizeTypes
can be beneficial, because the type legalizer often scalarizes illegal vector types.
Reviewed By: sebastian-ne
Differential Revision: https://reviews.llvm.org/D156645
This patch focuses on power of 2 bytes up to 2x XLen with and without alignment. Other cases will be handled in future patches.
Reviewed By: nitinjohnraj
Differential Revision: https://reviews.llvm.org/D157828
This CL includes two changes:
1. moved clang backend-warnings test cases from Driver/ to CodeGen/.
2. removed multiple `cd "$(dirname "%t")"` commands and replaced them with `-o %t`.
Reviewed By: maskray (Fangrui Song)
Differential Revision: https://reviews.llvm.org/D157565
RISC-V doesn't have flag registers, so we need to implement these
with add/sub and compares.
Remove the untested legalization for the signed versions. We can
add it back when we write tests.
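A minimal sketch of how overflow can be derived without a flags register, assuming the operations in question are 32-bit unsigned add and subtract with overflow (the message does not name them). The function names `uaddo32` and `usubo32` are hypothetical, and this models only the arithmetic, not the legalizer code.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit wraparound arithmetic

def uaddo32(a, b):
    """Unsigned add with overflow: an add followed by one unsigned compare."""
    res = (a + b) & MASK32
    # The sum wrapped around exactly when the result is below an operand.
    return res, int(res < a)

def usubo32(a, b):
    """Unsigned subtract with overflow: a sub followed by one unsigned compare."""
    res = (a - b) & MASK32
    # A borrow occurs exactly when the subtrahend exceeds the minuend.
    return res, int(a < b)
```

Each overflow bit costs a single extra compare, which is the general shape one would expect on a target without condition codes.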
Reviewed By: nitinjohnraj
Differential Revision: https://reviews.llvm.org/D157772
On the extract_subvector side, we already have the restriction. With D158201, we'd start getting unprofitable splat combines unless we add the same one on the extract_subvector side.
Differential Revision: https://reviews.llvm.org/D158202
We have an existing DAG combine for when an insert/extract subvector pair is entirely a nop, but we hadn't handled the case where the net result was either an insert or an extract (but not both). The transform is restricted to index = 0 to avoid having to adjust indices after the transform.
Differential Revision: https://reviews.llvm.org/D158201
DemandedBits is forced to all ones if there are multiple users.
The changed X86 test cases look like they were miscompiled before.
The value of eax/rax from the cmov is returned from the function in
addition to being used by the sar. That usage needs all bits even
though the sar doesn't.
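To make the reasoning concrete, here is a small model of demanded bits for a 64-bit value with two users. It is only a sketch: the helper names are hypothetical, the shift amount of 63 in the usage note is an assumption chosen for illustration, and the code computes the exact union of the users' demands, whereas the change described above conservatively forces all ones.

```python
ALL_ONES_64 = (1 << 64) - 1

def demanded_by_sar(shift_amount, width=64):
    """Input bits an arithmetic right shift needs when all result bits are used.

    Result bit i comes from input bit min(i + shift_amount, width - 1),
    so only input bits shift_amount..width-1 matter.
    """
    return ((1 << width) - 1) & ~((1 << shift_amount) - 1)

def demanded_bits(user_masks):
    """A value must provide every bit that any of its users demands."""
    combined = 0
    for mask in user_masks:
        combined |= mask
    return combined
```

With a shift amount of 63, `demanded_by_sar(63)` is just the sign bit, but `demanded_bits([demanded_by_sar(63), ALL_ONES_64])` is all ones because the returned value needs every bit.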
We have an existing DAG combine for when an insert/extract subvector pair is entirely a nop, but we hadn't handled the case where the net result was either an insert or an extract (but not both). The transform is restricted to index = 0 to avoid having to adjust indices after the transform.
Reviewers, a couple of comments on the test changes:
* Mostly RISCV, mostly schedule reordering.
* One real regression in splats-with-mixed-vl.ll due to a different overly aggressive combine, to be fixed in a follow-up patch.
* The test/CodeGen/X86/vector-replicaton-i1-mask.ll diff looked concerning at first, but note that the mask size is at most 4 i1s. I think the type changes on the mask loads are correct, but would welcome a second opinion from someone more familiar with AVX512 codegen.
Differential Revision: https://reviews.llvm.org/D158201
This reverts commit 54d663d5896008c09c938f80357e2a056454bc65, which breaks the test CodeGen/SystemZ/ctpop-01.ll for stage2-ubsan check (see https://lab.llvm.org/buildbot/#/builders/85/builds/18410)
I manually confirmed that the test had been passing immediately prior to that commit
(BUILDBOT_REVISION=4772c66cfb00d60f8f687930e9dd3aa1b6872228 llvm-zorg/zorg/buildbot/builders/sanitizers/buildbot_bootstrap_ubsan.sh)
Change store lowering so that a store is lowered iff its data operand is
legalized. This way, LLVM can lower the operands first and the store instruction later.
Reviewed By: efocht
Differential Revision: https://reviews.llvm.org/D158253
As far as I can tell FeatureLSLFast was originally added to specify that an lsl
of <= 3 was cheap when folded into an addressing operand, so should override
the one-use checks usually intended to make sure we don't perform redundant
work. At a later point it also came to mean that add x0, x1, x2, lsl N
with N <= 4 was cheap, in that it took a single cycle rather than the multiple
cycles that more complex adds usually take.
This patch splits those two concepts out into separate subtarget features. The
biggest change is to AArch64DAGToDAGISel::isWorthFoldingALU, making
ALU operations now produce an ADDWrs if the shift is <= 4.
Otherwise the patch is mostly an NFC as it tries to keep the subtarget features
the same for each cpu. I believe that the Arm OoO CPUs should eventually be
changed to a new subtarget feature that specifies that a shift of 2 or 3 with
any extend should be treated as cheap (just not shifts of 1 or 4).
Differential Revision: https://reviews.llvm.org/D157982
Test cases in D157680 should be target specific but are missing some of the restrictions; add them back so that the buildbots pass.
Reviewed By: skan, Hahnfeld
Differential Revision: https://reviews.llvm.org/D158252
This makes "lo" refer to the least significant bits and "hi" refer
to the most significant bits.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D158228
This modifies the G_UADDE legalization to a version that looks shorter
on Mips and RISC-V when feeding the equivalent IR to SelectionDAG.
This also removes the boolean select from G_USUBE.
Comments taken from LegalizeDAG and tweaked.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D158232
Callee save/restore operations mostly use the
following instruction patterns:
  sw rs2, offset(x2)
  sd rs2, offset(x2)
  fsw rs2, offset(x2)
  fsd rs2, offset(x2)
  lw rd, offset(x2)
  ld rd, offset(x2)
  flw rd, offset(x2)
  fld rd, offset(x2)
and the offset decides whether the instructions can be compressed.
Currently, an offset of 2032 is used by default to save and restore
callee-saved registers if the stack size is bigger than 2^12-1, which
prevents all of the callee save/restore stack instructions from being
compressed.
Allocating a suitable offset for these stack instructions allows
them to be compressed.
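The effect of the offset on compressibility can be sketched as follows. The compressed SP-relative loads and stores in the RISC-V C extension encode a 6-bit unsigned immediate scaled by the access size, so 4-byte accesses reach offsets 0 to 252 and 8-byte accesses reach 0 to 504. The helper name `fits_compressed_sp_offset` is hypothetical, and the check ignores other constraints such as which compressed forms exist on RV32 versus RV64.

```python
def fits_compressed_sp_offset(offset, access_bytes):
    """True if `offset` is encodable in a compressed SP-relative load/store.

    Models only the immediate range: a 6-bit unsigned field scaled by the
    access size (4 for sw/lw/fsw/flw, 8 for sd/ld/fsd/fld).
    """
    if access_bytes not in (4, 8):
        raise ValueError("only 4- and 8-byte accesses are modeled")
    in_range = 0 <= offset < 64 * access_bytes
    aligned = offset % access_bytes == 0
    return in_range and aligned
```

Under this model an offset of 2032 cannot be encoded for either access size, while offsets up to 504 (8-byte) or 252 (4-byte) can.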
Reviewed By: craig.topper, wangpc
Differential Revision: https://reviews.llvm.org/D157373
Scalar move and splat instructions only demand that SEW be greater than or
equal to what they need, but a floating-point vector with SEW=64 is not always
valid even when SEW=64 itself is valid, because of a special configuration: zve64f.
So we need to check that a floating-point vector instruction with SEW=64 is
valid when computing the demands of floating-point scalar move and splat
instructions.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D158086
RISC-V found a case where the CombineTo caused N to be CSEd with
an existing node and then deleted. The top level DAGCombiner loop
was surprised to find a node was deleted, but SDValue() was returned
from the visit function.
We need to return SDValue(N, 0) to tell the top level loop that
a change was made, but the worklist updates were already handled.
Fixes #64772.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D158208
If carryin was 1, and RHS is 0xffffffff we were not giving a carry
out.
In that case Res would be equal to LHS, so Res <u LHS would be false.
But there should be a carry out since carryin+RHS wraps around to 0.
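The failing case can be reproduced with a small 32-bit model. This is a sketch only: the function names are hypothetical, and the corrected carry expression shown here is one valid formulation, not necessarily the exact sequence the patch emits.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit wraparound arithmetic

def uadde_buggy(lhs, rhs, carry_in):
    """Carry-out computed only as Res <u LHS, as described above."""
    res = (lhs + rhs + carry_in) & MASK32
    return res, int(res < lhs)

def uadde_fixed(lhs, rhs, carry_in):
    """Also report a carry when carry_in is set and Res == LHS."""
    res = (lhs + rhs + carry_in) & MASK32
    # Res == LHS with carry_in == 1 means carry_in + RHS wrapped around to 0.
    carry_out = int(res < lhs or (carry_in == 1 and res == lhs))
    return res, carry_out

def uadde_reference(lhs, rhs, carry_in):
    """Ground truth using arbitrary-precision integers."""
    total = lhs + rhs + carry_in
    return total & MASK32, total >> 32
```

For `lhs = 5`, `rhs = 0xffffffff`, `carry_in = 1`, the buggy version returns a carry-out of 0 while the fixed and reference versions return 1.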
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D157943
Legalize division and remainder. We test for (s7, s8, s16, s32, s48, s64) on rv32 and (s8, s15, s16, s32, s64, s72, s128) on rv64, with and without the +m, +zmmul extensions. We do not handle types with size > 2 x XLen -- these ought to be handled in the IR pass.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D157422
Legalize multiplication with the +m, +zmmul extensions and without extensions. With extensions, we test for (s7, s8, s16, s32, s48, s64, s96) on rv32 and (s8, s15, s32, s64, s72, s128, s192) on rv64. Without extensions, test (s7, s8, s16, s32) on rv32 and (s8, s15, s16, s32, s64) on rv64. Does not yet work for the type which is 2 times XLen without extensions.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D157416
When adjusting the Stack Pointer at the end of the function epilogue,
use a callee-saved register, rather than explicitly using R4 which may
not have been saved.
Differential Revision: https://reviews.llvm.org/D157500