52796 Commits

Author SHA1 Message Date
Simon Tatham
b09c575975 [AArch64] Add Defs=[NZCV] to MTE loop pseudos.
The `STGloop` family of pseudo-instructions all expand to a loop which
iterates over a region of memory setting all its MTE tags to a given
value. The loop writes to the flags in order to check termination. But
the unexpanded pseudo-instructions were not marked as modifying the
flags. Therefore it was possible for one to end up in a location where
the flags were live, and then the loop would corrupt them.

We spotted the effect of this in a libc++ test involving a lot of
complicated inlining, and haven't been able to construct a smaller
test case that demonstrates actual incorrect output code. So my test
here is just checking that `implicit-def $nzcv` shows up on the
pseudo-instructions as they're output from isel.

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D158262
2023-08-21 09:17:25 +01:00
chenli
0c76f46ca6 [LoongArch] Add testcases of LSX intrinsics with immediates
The testcases mainly cover three situations:
- the arguments which should be immediates are non immediates.
- the immediate is out of upper limit of the argument type.
- the immediate is out of lower limit of the argument type.

Depends on D155829

Reviewed By: SixWeining

Differential Revision: https://reviews.llvm.org/D157570
2023-08-21 11:04:19 +08:00
Neumann Hon
43207225b6 Revert "[SystemZ][z/OS] Fix the entry point marker for leaf functions"
This reverts commit 8af297bbb8e97de8908b857eae1a44f46a0d5afe.

Testcase LLVM :: MC/GOFF/ppa1.ll needs to be updated to account for this.
2023-08-20 22:04:02 -04:00
Neumann Hon
8af297bbb8 [SystemZ][z/OS] Fix the entry point marker for leaf functions
The function emitFunctionEntryLabel does not look at whether or not a function is a leaf when setting the entry flags,
and instead blindly marks all functions as non-leaf routines. Change it to check if a function is a leaf function and
mark it accordingly.
2023-08-20 21:53:13 -04:00
Freddy Ye
6acff5390d [X86] Support -march=gracemont
gracemont has some different tuning features from alderlake.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D158046
2023-08-21 08:49:01 +08:00
Sameer Sahasrabuddhe
ef38e6d97f [GlobalISel] introduce MIFlag::NoConvergent
Some opcodes in MIR are defined to be convergent by the target by setting
IsConvergent in the corresponding TD file. For example, in AMDGPU, the opcodes
G_SI_CALL and G_INTRINSIC* are marked as convergent. But this is too
conservative, since calls to functions that do not execute convergent operations
should not be marked convergent. This information is available in LLVM IR.

The new flag MIFlag::NoConvergent now allows the IR translator to mark an
instruction as not performing any convergent operations. It is relevant only on
occurrences of opcodes that are marked isConvergent in the target.

Differential Revision: https://reviews.llvm.org/D157475
2023-08-20 21:14:46 +05:30
Nico Weber
3d22dac6c3 Revert "[clang][test] Refine clang machine-function-split tests."
This reverts commit b9d079d6188b50730e0a67267b7fee36008435ce.
Breaks tests on Windows, see https://reviews.llvm.org/D157565#4600939
2023-08-20 10:38:29 -04:00
Simon Pilgrim
2c090e9e67 [X86] Add test case for Issue #64655
2023-08-20 15:34:47 +01:00
Simon Pilgrim
9405b67a9e [X86] Add test coverage for PR33879 (Issue #33226)
Ensure we only use the eflags results from shift instructions when it won't cause stalls

shift by variable causes stalls as it has to preserve eflags when the shift amount was zero, so we're better off using a separate test
2023-08-20 15:32:46 +01:00
Simon Pilgrim
95865e5138 [DAG] SimplifyDemandedBits - if we're only demanding the signbit, a SMIN/SMAX node can be simplified to a OR/AND node respectively.
Alive2: https://alive2.llvm.org/ce/z/MehvFB

REAPPLIED from 54d663d5896008 with fix for using the correct DemandedBits mask.
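
The sign-bit identity behind this fold can be checked exhaustively at a small bit width. The following is a minimal Python sketch, not the DAG combine itself; the helper names (`to_signed`, `signbit`, `check_smin_smax_signbit`) are hypothetical and exist only for this illustration.

```python
def to_signed(x, bits=8):
    """Interpret the low `bits` of x as a two's-complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def signbit(x, bits=8):
    """Return the sign bit (0 or 1) of x viewed as a `bits`-wide value."""
    return ((x & ((1 << bits) - 1)) >> (bits - 1)) & 1


def check_smin_smax_signbit(bits=8):
    """Check sign(smin(a, b)) == sign(a | b) and sign(smax(a, b)) == sign(a & b)."""
    for a in range(1 << bits):
        for b in range(1 << bits):
            sa, sb = to_signed(a, bits), to_signed(b, bits)
            # smin is negative iff at least one input is negative (OR of sign bits).
            if signbit(min(sa, sb), bits) != signbit(a | b, bits):
                return False
            # smax is negative iff both inputs are negative (AND of sign bits).
            if signbit(max(sa, sb), bits) != signbit(a & b, bits):
                return False
    return True
```

The identity only holds for the sign bit, which is why the fold is limited to the case where no other bits are demanded.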
2023-08-20 14:20:49 +01:00
Simon Pilgrim
ca10a6caee [X86] Add test coverage for min/max signbit simplification
If we're only demanding the signbit from a min/max then we can simplify this to a logic op
2023-08-20 14:20:49 +01:00
Filipp Zhinkin
08d0b558f5 [SwiftError] Use IMPLICIT_DEF as a definition for unreachable VReg uses
SwiftErrorValueTracking creates vregs at swifterror use sites and then
connects them with appropriate definitions after instruction selection.
To propagate swifterror values, SwiftErrorValueTracking::propagateVRegs
iterates over basic blocks in RPO, but some vregs previously created
at use sites may be located in blocks that became unreachable after
instruction selection. Because of that, there will be no definition for
such vregs, and that may cause issues down the pipeline.

To ensure that all vregs created by SwiftErrorValueTracking are
defined, propagateVRegs was updated to insert IMPLICIT_DEF at the
beginning of unreachable blocks containing swifterror uses.

Related issue: https://github.com/llvm/llvm-project/issues/59751

Reviewed By: compnerd

Differential Revision: https://reviews.llvm.org/D141053
2023-08-20 13:00:31 +02:00
Simon Pilgrim
1b95661616 [AArch64] Regenerate sve-fixed-length-fp-minmax.ll
Should remove the D158053 diffs
2023-08-20 11:46:44 +01:00
Craig Topper
d6cd49dd9a [RISCV][GISel] Add legalizer tests for G_SEXT/ZEXT from s32 to s64 for rv64.
2023-08-19 21:28:48 -07:00
Craig Topper
b41e75c8a4 [RISCV][GISel] Make s32 a legal type for RV64 for any operation that has a W version.
My thought is that we can directly select W instructions using s32.

This will likely require combines and other optimizations eventually,
but this makes a simple starting point.

I'm slowly prototyping a similar approach for SelectionDAG.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D157770
2023-08-19 11:20:42 -07:00
chenli
82bbf7003c [LoongArch] Add testcases of LASX intrinsics with immediates
The testcases mainly cover three situations:
- the arguments which should be immediates are non immediates.
- the immediate is out of upper limit of the argument type.
- the immediate is out of lower limit of the argument type.

Depends on D155830

Reviewed By: SixWeining

Differential Revision: https://reviews.llvm.org/D157571
2023-08-19 17:14:16 +08:00
chenli
83311b2b5d [LoongArch] Add LASX intrinsic testcases
Depends on D155830

Reviewed By: SixWeining

Differential Revision: https://reviews.llvm.org/D155835
2023-08-19 17:12:31 +08:00
chenli
f3aa441631 [LoongArch] Add LSX intrinsic testcases
Depends on D155829

Reviewed By: SixWeining

Differential Revision: https://reviews.llvm.org/D155834
2023-08-19 17:10:46 +08:00
Jim Lin
18f5ada244 [DAGCombiner] Don't reduce BUILD_VECTOR to BITCAST before LegalizeTypes if VT is legal.
Targets may lose some optimization opportunities for certain vector operations
if we reduce BUILD_VECTOR to BITCAST early.

If VT is not legal, reducing BUILD_VECTOR to BITCAST before LegalizeTypes
can still be beneficial, because the type legalizer often scalarizes illegal vector types.

Reviewed By: sebastian-ne

Differential Revision: https://reviews.llvm.org/D156645
2023-08-19 12:53:50 +08:00
Craig Topper
92464ccbad [RISCV][GISel] Initial legalization support for G_LOAD and G_STORE.
This patch focuses on power of 2 bytes up to 2x XLen with and without alignment. Other cases will be handled in future patches.

Reviewed By: nitinjohnraj

Differential Revision: https://reviews.llvm.org/D157828
2023-08-18 20:17:19 -07:00
Han Shen
b9d079d618 [clang][test] Refine clang machine-function-split tests.
This CL includes two changes:
1. moved clang backend-warnings test cases from Driver/ to CodeGen/.
2. removed multiple `cd "$(dirname "%t")"` and replaced with `-o %t`.

Reviewed By: maskray (Fangrui Song)
Differential Revision: https://reviews.llvm.org/D157565
2023-08-18 18:05:47 -07:00
Craig Topper
3e569883fa [RISCV][GISel] Lower G_UADDE, G_UADDO, G_USUBE, and G_USUBO
RISC-V doesn't have flag registers, so we need to implement these
with add/sub and compares.
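
As an illustration of the general technique (not the legalizer code from this patch), unsigned add and subtract with overflow can be modelled with one add/sub plus one unsigned compare. The function names `uaddo` and `usubo` and the 32-bit default width below are assumptions made for this sketch.

```python
def uaddo(lhs, rhs, bits=32):
    """Unsigned add with carry-out, using only a wrapping add and a compare."""
    mask = (1 << bits) - 1
    res = (lhs + rhs) & mask
    # The wrapped sum is smaller than an operand exactly when the add overflowed.
    carry_out = int(res < lhs)
    return res, carry_out


def usubo(lhs, rhs, bits=32):
    """Unsigned subtract with borrow-out, using only a wrapping sub and a compare."""
    mask = (1 << bits) - 1
    res = (lhs - rhs) & mask
    # A borrow occurs exactly when the subtrahend is larger than the minuend.
    borrow_out = int(lhs < rhs)
    return res, borrow_out
```

For example, `uaddo(0xFFFFFFFF, 1)` wraps to zero and reports a carry, while `usubo(0, 1)` wraps to `0xFFFFFFFF` and reports a borrow.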

Remove the untested legalization for the signed versions. We can
add it back when we write tests.

Reviewed By: nitinjohnraj

Differential Revision: https://reviews.llvm.org/D157772
2023-08-18 17:22:30 -07:00
Philip Reames
92e0c0dc1a [DAG] Restrict insert_subvector undef, splat_vector, dontcare transform
On the extract_subvector side, we already have the restriction. With D158201, we'd start getting unprofitable splat combines unless we add the same one on the insert_subvector side.

Differential Revision: https://reviews.llvm.org/D158202
2023-08-18 12:44:09 -07:00
Philip Reames
67b71ad04a [DAG] Fold insert_subvector undef, (extract_subvector X, 0), 0 with non-matching types
We have an existing DAG combine for when an insert/extract subvector pair is entirely a nop, but we hadn't handled the case where the net result was either an insert or an extract (but not both). The transform is restricted to index = 0 to avoid having to adjust indices after the transform.

Differential Revision: https://reviews.llvm.org/D158201
2023-08-18 12:28:27 -07:00
Craig Topper
bbbb93eb48 Revert "[DAG] Fold insert_subvector undef, (extract_subvector X, 0), 0 with non-matching types"
This reverts commit 770be43f6782dab84d215d01b37396d63a9c2b6e.

Forgot to remove from my tree while experimenting.
2023-08-18 12:00:07 -07:00
Craig Topper
0a5347f40d [DAG] SimplifyDemandedBits - Use DemandedBits instead of OriginalDemandedBits when simplifying UMIN/UMAX to AND/OR.
DemandedBits is forced to all ones if there are multiple users.

The changed X86 test cases look like they were miscompiles before.
The value of eax/rax from the cmov is returned from the function in
addition to being used by the sar. That usage needs all bits even
though the sar doesn't.
2023-08-18 11:59:18 -07:00
Craig Topper
770be43f67 [DAG] Fold insert_subvector undef, (extract_subvector X, 0), 0 with non-matching types
We have an existing DAG combine for when an insert/extract subvector pair is entirely a nop, but we hadn't handled the case where the net result was either an insert or an extract (but not both).  The transform is restricted to index = 0 to avoid having to adjust indices after the transform.

Reviewers, a couple of comments on the test changes:
* Mostly RISCV, mostly schedule reordering.
* One real regression in splats-with-mixed-vl.ll due to a different overly aggressive combine; it will be fixed in a follow-up patch.
* The test/CodeGen/X86/vector-replicaton-i1-mask.ll diff looked concerning at first, but note the mask size is at most 4 i1s. I think the type changes on the mask loads are correct, but would welcome a second opinion from someone more familiar with AVX512 codegen.

Differential Revision: https://reviews.llvm.org/D158201
2023-08-18 11:59:18 -07:00
Thurston Dang
29b2009061 Revert "[DAG] SimplifyDemandedBits - if we're only demanding the signbit, a SMIN/SMAX node can be simplified to a OR/AND node respectively."
This reverts commit 54d663d5896008c09c938f80357e2a056454bc65, which breaks the test CodeGen/SystemZ/ctpop-01.ll for stage2-ubsan check (see https://lab.llvm.org/buildbot/#/builders/85/builds/18410)

I manually confirmed that the test had been passing immediately prior to that commit
(BUILDBOT_REVISION=4772c66cfb00d60f8f687930e9dd3aa1b6872228 llvm-zorg/zorg/buildbot/builders/sanitizers/buildbot_bootstrap_ubsan.sh)
2023-08-18 18:08:10 +00:00
Pravin Jagtap
c931f2e6fd [AMDGPU] Autogenerate & pre-commit tests for D156301 and D157388
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D157712
2023-08-18 09:50:44 -04:00
Simon Pilgrim
4cd1c07491 [DAG] SimplifyDemandedBits - if we're only demanding the msb, a UMIN/UMAX node can be simplified to a AND/OR node respectively.
Alive2: https://alive2.llvm.org/ce/z/qnvmc6
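
The unsigned counterpart can be checked the same way. This is a small Python sketch of the identity only, with hypothetical helper names (`msb`, `check_umin_umax_msb`); it is not the code in SimplifyDemandedBits.

```python
def msb(x, bits=8):
    """Return the most significant bit (0 or 1) of a `bits`-wide unsigned value."""
    return (x >> (bits - 1)) & 1


def check_umin_umax_msb(bits=8):
    """Check msb(umin(a, b)) == msb(a & b) and msb(umax(a, b)) == msb(a | b)."""
    for a in range(1 << bits):
        for b in range(1 << bits):
            # umin has its top bit set only if both inputs do (AND).
            if msb(min(a, b), bits) != msb(a & b, bits):
                return False
            # umax has its top bit set if either input does (OR).
            if msb(max(a, b), bits) != msb(a | b, bits):
                return False
    return True
```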
2023-08-18 12:12:22 +01:00
Simon Pilgrim
54d663d589 [DAG] SimplifyDemandedBits - if we're only demanding the signbit, a SMIN/SMAX node can be simplified to a OR/AND node respectively.
Alive2: https://alive2.llvm.org/ce/z/MehvFB
2023-08-18 11:35:34 +01:00
Carl Ritson
ad9eed1e77 [MachineVerifier] Verify LiveIntervals for PHIs
Implement basic support for verifying LiveIntervals for PHIs.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D156872
2023-08-18 18:14:22 +09:00
Kazushi (Jam) Marukawa
2e2395651e [VE] Change the way of lowering store
Lower a store only if its data operand has been legalized. This way,
LLVM can lower the operands first, then lower the store instruction later.

Reviewed By: efocht

Differential Revision: https://reviews.llvm.org/D158253
2023-08-18 17:13:55 +09:00
David Green
42b3419339 [AArch64] Split LSLFast into Addr and ALU parts
As far as I can tell FeatureLSLFast was originally added to specify that a lsl
of <= 3 was cheap when folded into an addressing operand, so should override
the one-use checks usually intended to make sure we don't perform redundant
work. At a later point it also came to mean that add x0, x1, x2, lsl N
with N <= 4 was cheap, in that it took a single cycle not multiple cycles that
more complex adds usually take.

This patch splits those two concepts out into separate subtarget features. The
biggest change is the change to AArch64DAGToDAGISel::isWorthFoldingALU, making
ALU operations now produce an ADDWrs if the shift is <= 4.

Otherwise the patch is mostly an NFC as it tries to keep the subtarget features
the same for each cpu. I believe that the Arm OoO CPUs should eventually be
changed to a new subtarget feature that specifies that a shift of 2 or 3 with
any extend should be treated as cheap (just not shifts of 1 or 4).

Differential Revision: https://reviews.llvm.org/D157982
2023-08-18 08:59:24 +01:00
XinWang10
b7cf9bbfde Fix regression of D157680
The test cases in D157680 should be target-specific but were missing some restrictions; add them to make the buildbot pass.

Reviewed By: skan, Hahnfeld

Differential Revision: https://reviews.llvm.org/D158252
2023-08-18 00:12:10 -07:00
XinWang10
993bdb047c [X86]Support options -mno-gather -mno-scatter
Gather instructions could lead to security issues; for details, please refer to https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html.
This adds support for the options -mno-gather and -mno-scatter, which avoid generating gather/scatter instructions in the backend except when using intrinsics or inline asm.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D157680
2023-08-17 23:02:25 -07:00
4vtomat
29f11e4fb7 [RISCV] Bump vector crypto to v1.0 RC2
Differential Revision: https://reviews.llvm.org/D158067
2023-08-17 21:19:59 -07:00
Craig Topper
f64eb69d96 [RISCV][GISel] Swap lo/hi register names in legalizer tests.
This makes "lo" refer to the least significant bits and "hi" refer
to the most significant bits.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D158228
2023-08-17 20:37:42 -07:00
Craig Topper
c6dee6982f [GlobalISel][Mips] Sync G_UADDE and G_USUBE legalization with LegalizeDAG.
This modifies the G_UADDE legalization to a version that looks shorter
on Mips and RISC-V when feeding the equivalent IR to SelectionDAG.
This also removes the boolean select from G_USUBE.

Comments taken from LegalizeDAG and tweaked.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D158232
2023-08-17 20:36:55 -07:00
laichunfeng
13454a6e87 [RISCV] Compress stack insts by adjusting offset.
Callee-saved save/restore operations mostly use the
following instruction patterns:
sw rs2, offset(x2)
sd rs2, offset(x2)
fsw rs2, offset(x2)
fsd rs2, offset(x2)

lw rd, offset(x2)
ld rd, offset(x2)
flw rd, offset(x2)
fld rd, offset(x2)
The offset decides whether these instructions can be compressed.
Currently an offset of 2032 is used by default when the stack size is
bigger than 2^12-1 to save and restore callee-saved registers, which
prevents all of the callee save/restore stack instructions from being
compressed.
Allocating a suitable offset for the stack instructions allows
them to be compressed.
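
To make the offset constraint concrete: in the RISC-V C extension, the stack-pointer-relative compressed loads and stores encode a 6-bit unsigned immediate scaled by the access size, so word accesses reach offsets 0 to 252 and doubleword accesses reach 0 to 504. The Python sketch below only models that encoding limit; the function name `sp_offset_is_compressible` is hypothetical and this is not the logic added by the patch.

```python
def sp_offset_is_compressible(offset, access_bytes):
    """Return True if `offset` fits the scaled 6-bit unsigned immediate of an
    SP-relative compressed load/store for a 4-byte or 8-byte access."""
    if access_bytes not in (4, 8):
        raise ValueError("only word (4) and doubleword (8) accesses are modelled")
    return (
        offset >= 0
        and offset % access_bytes == 0      # immediate is scaled by the access size
        and offset // access_bytes < 64     # 6-bit unsigned field
    )
```

With this model, a callee-saved slot at an offset near 2032 fails the check for both access sizes, whereas a slot at offset 8 or 16 passes.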

Reviewed By: craig.topper, wangpc

Differential Revision: https://reviews.llvm.org/D157373
2023-08-18 10:49:53 +08:00
Kito Cheng
0816b3efbf [RISCV] Check floating point vector instruction with SEW=64 is valid when vsetvl insertion
Scalar move and splat instructions only demand that SEW is large enough for
their own needs, but floating-point vector instructions with SEW=64 are not always
valid even when SEW=64 is valid, because of a special configuration: zve64f.

So we need to check that a floating-point vector instruction with SEW=64 is
valid when computing the demands of floating-point scalar move and splat
instructions.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D158086
2023-08-18 10:31:01 +08:00
Kito Cheng
b00b4697ae [RISCV] Precommit test for D158086
Test case demonstrating an invalid vsetvli insertion.

Differential Revision: https://reviews.llvm.org/D158087
2023-08-18 10:14:23 +08:00
Craig Topper
846fbb06b8 [DAGCombiner][RISCV] Return SDValue(N, 0) instead of SDValue() after 2 calls to CombineTo in visitSTORE.
RISC-V found a case where the CombineTo caused N to be CSEd with
an existing node and then deleted. The top level DAGCombiner loop
was surprised to find a node was deleted, but SDValue() was returned
from the visit function.

We need to return SDValue(N, 0) to tell the top level loop that
a change was made, but the worklist updates were already handled.

Fixes #64772.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D158208
2023-08-17 15:13:36 -07:00
Craig Topper
ebb2e5ebb2 [GlobalISel][Mips] Correct corner case in G_UADDE legalization.
If carryin was 1, and RHS is 0xffffffff we were not giving a carry
out.

In that case Res would be equal to LHS, so Res <u LHS would be false.
But there should be a carry out since carryin+RHS wraps around to 0.
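
The corner case can be reproduced with a small model. The Python sketch below is an illustration under assumptions: `uadde_single_compare` models a carry computed only as `Res <u LHS`, as described above; `uadde_reference` computes the true carry with wide arithmetic; `uadde_two_compares` is one common correct formulation and is not necessarily the sequence this patch emits. All three names are hypothetical.

```python
def uadde_single_compare(lhs, rhs, carry_in, bits=32):
    """Carry-out derived from a single compare of the result against LHS."""
    mask = (1 << bits) - 1
    res = (lhs + rhs + carry_in) & mask
    return res, int(res < lhs)


def uadde_reference(lhs, rhs, carry_in, bits=32):
    """Ground truth: do the add at unlimited width and take the overflow bit."""
    total = lhs + rhs + carry_in
    return total & ((1 << bits) - 1), total >> bits


def uadde_two_compares(lhs, rhs, carry_in, bits=32):
    """Check for a carry after each of the two additions and OR the results."""
    mask = (1 << bits) - 1
    tmp = (lhs + rhs) & mask
    carry1 = int(tmp < lhs)          # carry out of lhs + rhs
    res = (tmp + carry_in) & mask
    carry2 = int(res < tmp)          # carry out of adding carry_in
    return res, carry1 | carry2
```

With `lhs = 5`, `rhs = 0xffffffff` and `carry_in = 1`, the result wraps back to 5, so the single compare reports no carry while the reference reports one.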

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D157943
2023-08-17 15:06:16 -07:00
Nitin John Raj
b5c106e873 [RISCV][GlobalISel] Legalize division and remainder
Legalize division and remainder. We test for (s7, s8, s16, s32, s48, s64) on rv32 and (s8, s15, s16, s32, s64, s72, s128) on rv64, with and without the +m, +zmmul extensions. We do not handle types with size > 2 x XLen -- these ought to be handled in the IR pass.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D157422
2023-08-17 14:41:40 -07:00
Hiroshi Yamauchi
3406934e4d [MC][COFF][AArch64] Fix the storage class for private linkage symbols.
Use IMAGE_SYM_CLASS_STATIC like X86.

Differential Revision: https://reviews.llvm.org/D158122
2023-08-17 13:54:12 -07:00
Nitin John Raj
638865c8f9 [RISCV][GlobalISel] Legalize multiplication
Legalize multiplication with the +m, +zmmul extensions and without extensions. With extensions, we test for (s7, s8, s16, s32, s48, s64, s96) on rv32 and (s8, s15, s32, s64, s72, s128, s192) on rv64. Without extensions, test (s7, s8, s16, s32) on rv32 and (s8, s15, s16, s32, s64) on rv64. Does not yet work for the type which is 2 times XLen without extensions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D157416
2023-08-17 12:59:34 -07:00
Nitin John Raj
b03c7efe9a [RISCV][GlobalISel] Test legalization for bitshifting with wider types
We test for (s48, s64, s96) on rv32 and (s72, s128, s192) on rv64.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D157415
2023-08-17 11:53:32 -07:00
Joe Nash
6aab000874 [AMDGPU] Convert fmul-2-combine-multi-use test to auto-gen
NFC. Deletes the unused SI runline.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D158198
2023-08-17 14:23:20 -04:00
Keith Walker
2d9c6e699a [Thumb1] Use callee-saved register to adjust stack pointer
When adjusting the Stack Pointer at the end of the function epilogue,
use a callee-saved register, rather than explicitly using R4 which may
not have been saved.

Differential Revision: https://reviews.llvm.org/D157500
2023-08-17 18:29:50 +01:00