45410 Commits

Author SHA1 Message Date
Craig Topper
e94dc58dff [RISCV] Inline scalar ceil/floor/trunc/rint/round/roundeven.
This avoids the call overhead as well as the save/restore of
fflags and the snan handling in the libm function.

The save/restore of fflags and snan handling are needed to be
correct for -ftrapping-math. I think we can ignore them in the
default environment.

The inline sequence will generate an invalid exception for nan
and an inexact exception if fractional bits are discarded.

I've used a custom inserter to explicitly create the control flow
around the float->int->float conversion.

We can probably avoid the final fsgnj after the conversion for
no signed zeros FMF, but I'll leave that for future work.

Note the comparison constant is slightly different than glibc uses.
They use 1<<53 for double, I'm using 1<<52. I believe either is valid.
Numbers >= 1<<52 can't have any fractional bits. It's ok to do the
float->int->float conversion on numbers between 1<<52 and 1<<53 since
they will all fit in i64. We only have a problem if the double can't fit
in i64.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D136508
2022-10-26 14:36:49 -07:00
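The guarded float->int->float sequence described in e94dc58dff can be modelled roughly as follows. This is only a sketch of the idea for floor on doubles, not the RISC-V lowering itself: `inline_floor` and `TWO_52` are hypothetical names, `math.floor` stands in for the float-to-i64 conversion, and floating-point exception flags are not modelled at all.

```python
import math

# Doubles with magnitude >= 2**52 have no fractional bits (threshold from the commit message).
TWO_52 = float(1 << 52)

def inline_floor(x: float) -> float:
    """Model floor as a magnitude check around a float->int->float round trip."""
    # NaN fails the comparison, and large or infinite values are already
    # integral, so all of those are returned unchanged.
    if not (abs(x) < TWO_52):
        return x
    as_int = math.floor(x)           # stand-in for the conversion to i64; always fits here
    rounded = float(as_int)          # exact, because |as_int| <= 2**52
    return math.copysign(rounded, x) # sign injection keeps -0.0 as -0.0
```

The final `copysign` plays the role of the fsgnj mentioned in the commit message: without it, an input of -0.0 would come back as +0.0.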
Craig Topper
0a03240fb4 [RISCV] Add tests for fixed vector sshl_sat/ushl_sat. NFC 2022-10-26 14:15:47 -07:00
Sanjay Patel
54eeadcf44 [SDAG] avoid vector extract/insert around binop
scalar-to-vector (scalar binop (extractelt V, Idx), C) --> shuffle (vector binop V, C'), {Idx, -1, -1...}

We generally try to avoid ad-hoc vectorization in SDAG,
but the motivating case from issue #39482 escapes our
normal vectorization folds in IR. It seems like it should
always be a win to transform this pattern in cases where
we have the same vector type for input and output and the
target supports the vector operation. That avoids
transfers from vector to scalar and back.

In the x86 shift examples, we create the scalar-to-vector
node during legalization. I'm not sure if there's a more
general way to create the pattern for testing. (If so, I
could add tests for other targets.)

Differential Revision: https://reviews.llvm.org/D136713
2022-10-26 14:04:46 -04:00
Piyou Chen
7d7940fd77 [RISCV] add svinval extension
1. Add the svinval extension support
2. Add the svinval Predicates for its instruction

Note: the svinval instructions defined in https://reviews.llvm.org/D117654

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D136571
2022-10-26 09:45:30 -07:00
Craig Topper
a61b74889f [RISCV] Use vslide1down for i64 insertelt on RV32.
Instead of using vslide1up, use vslide1down and build the other
direction. This avoids the overlap constraint early clobber of
vslide1up.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D136735
2022-10-26 09:43:12 -07:00
Yashwant Singh
14fb4040e2 [AMDGPU][test] precommiting tests for D136663
More tests for si-peephole-sdwa pass
2022-10-26 22:08:28 +05:30
Sanjay Patel
ef9dfcd6cd [x86] add tests for extract + insert of vector shift amount; NFC 2022-10-26 11:20:14 -04:00
Haohai Wen
21f23a37c6 [SelectionDAG] Clamp stack alignment for memset, memmove
memcpy has clamped dst stack alignment to NaturalStackAlignment if
hasStackRealignment is false. We should also clamp stack alignment
for memset and memmove. If we don't clamp, SelectionDAG may first
do tail call optimization which requires no stack realignment. Then
memmove, memset in same function may be lowered to load/store with
larger alignment, leading PEI to emit stack realignment code which
is absolutely not correct.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D136456
2022-10-26 16:45:31 +08:00
Pierre van Houtryve
c1b2920c6e [AMDGPU] Autogenerate llvm.amdgcn.fcmp.ll
Prep commit for adding GISel run lines to that test.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D136591
2022-10-26 07:00:34 +00:00
chenglin.bi
9403a8bc37 [GlobalISel][AArch64] Fix miscompile caused by wrong G_ZEXT selection in GISel
The miscompile case's G_ZEXT has a G_FREEZE source.  Similar to D127154, this patch removed isDef32, relying on the AArch64MIPeephole optimizer to remove redundant SUBREG_TO_REG nodes also in GISel.

Fix #58431

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D136433
2022-10-26 09:54:13 +08:00
Guozhi Wei
d24c93cc41 [X86] Enable reassociation for ADD instructions
ADD is an associative and commutative operation, so we can do reassociation for it.

Differential Revision: https://reviews.llvm.org/D136396
2022-10-26 00:46:13 +00:00
Douglas Yung
fc40c73921 Revert "Update supported features in the generic CPU configuration"
This reverts commit 11afbf396e10e1b1e91a5991e2aec1916e29a910.

There are 10 tests still failing after follow-up fix b5d0bf9b9853. This should get the following bots back to green:
 - https://lab.llvm.org/buildbot/#/builders/183/builds/8194
 - https://lab.llvm.org/buildbot/#/builders/186/builds/9491
 - https://lab.llvm.org/buildbot/#/builders/214/builds/3908
 - https://lab.llvm.org/buildbot/#/builders/93/builds/11740
 - https://lab.llvm.org/buildbot/#/builders/231/builds/4200
 - https://lab.llvm.org/buildbot/#/builders/121/builds/24519
 - https://lab.llvm.org/buildbot/#/builders/230/builds/4466
 - https://lab.llvm.org/buildbot/#/builders/94/builds/11639
 - https://lab.llvm.org/buildbot/#/builders/45/builds/9325
 - https://lab.llvm.org/buildbot/#/builders/124/builds/5219
 - https://lab.llvm.org/buildbot/#/builders/67/builds/8623
 - https://lab.llvm.org/buildbot/#/builders/123/builds/13836
 - https://lab.llvm.org/buildbot/#/builders/109/builds/49355
 - https://lab.llvm.org/buildbot/#/builders/58/builds/27751
 - https://lab.llvm.org/buildbot/#/builders/117/builds/9922
 - https://lab.llvm.org/buildbot/#/builders/16/builds/37012
 - https://lab.llvm.org/buildbot/#/builders/104/builds/9490
 - https://lab.llvm.org/buildbot/#/builders/42/builds/7725
 - https://lab.llvm.org/buildbot/#/builders/196/builds/20077
 - https://lab.llvm.org/buildbot/#/builders/3/builds/15217
 - https://lab.llvm.org/buildbot/#/builders/6/builds/15251
 - https://lab.llvm.org/buildbot/#/builders/9/builds/15247
 - https://lab.llvm.org/buildbot/#/builders/36/builds/26487
 - https://lab.llvm.org/buildbot/#/builders/54/builds/2474
 - https://lab.llvm.org/buildbot/#/builders/74/builds/14536
 - https://lab.llvm.org/buildbot/#/builders/5/builds/28555
2022-10-25 16:34:08 -07:00
Dan Gohman
11afbf396e Update supported features in the generic CPU configuration
Accompanying https://reviews.llvm.org/D125728, this updates LLVM
Codegen's "generic" CPU to enable the same new features.

Differential Revision: https://reviews.llvm.org/D125729
2022-10-25 11:42:32 -07:00
Artem Belevich
0e8a414ab3 [CUDA, NVPTX] Added basic __bf16 support for NVPTX.
Recent Clang changes expose __bf16 types for SSE2-enabled host compilations and
that makes those types visible during GPU-side compilation, where it currently
fails with Sema complaining that __bf16 is not supported.

Considering that __bf16 is a storage-only type, enabling it for NVPTX if it's
enabled on the host should pose no issues, correctness-wise.

Recent NVIDIA GPUs have introduced bf16 support, so we'll likely grow better
support for __bf16 on NVPTX going forward.

Differential Revision: https://reviews.llvm.org/D136311
2022-10-25 11:08:06 -07:00
Joe Nash
01b8140d3a [AMDGPU] Fix delay alu for VOPD with src2acc
V_FMAC_F32 and V_DOT2C_F32_F16 have a dummy src2 operand tied to vdst to
inform passes that the instructions read the dst operand. The VOPD
versions of these instructions lacked the dummy operand, which was a
problem for inserting s_delay_alu.
Introduce the dummy src2 operand on the VOPD versions, and fix the VOPD operand
tracking logic to account for it.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D136629
2022-10-25 13:11:17 -04:00
Mircea Trofin
87ec22de70 [mlgo] More wildcarding in extra features logging for regalloc
May need a different testing approach for opcodes.
2022-10-25 08:20:55 -07:00
Simon Pilgrim
ed1b0da557 [X86] combineConcatVectorOps - fold v4i64/v8x32 concat(broadcast(),broadcast()) -> permilps(concat())
Extend the existing v4f64 fold to handle v4i64/v8f32/v8i32 as well

Fixes #58585
2022-10-25 15:37:42 +01:00
Simon Pilgrim
fcbaf6f4e8 [X86] Add v4i64 test coverage for #58585
Turns out we fail to do this for concat_v4i64(broadcast_v2i64,broadcast_v2i64) as well
2022-10-25 15:03:08 +01:00
Simon Pilgrim
b92725ecbc [X86] Add test coverage for #58585 2022-10-25 14:33:55 +01:00
Simon Pilgrim
c4051b2606 [X86] Fold vbroadcast(bitcast(vbroadcast(src))) -> bitcast(vbroadcast(vbroadcast(src)))
If the inner broadcast scalar type is smaller/same width as the outer broadcast scalar type then we can broadcast using the same inner type directly. Works for vbroadcast_load as well.
2022-10-25 14:03:43 +01:00
chenglin.bi
e95c74b423 [AArch64] Add precommit test for bcmp; NFC 2022-10-25 17:23:03 +08:00
Cullen Rhodes
1e02a29e47 [AArch64][SVE] Use more flag-setting instructions
If OP in PTEST(PG, OP(PG, ...)) has a flag-setting variant change the
opcode so the PTEST becomes redundant. This patch extends this existing
optimization in AArch64::optimizePTestInstr to cover all flag-setting
opcodes.

Reviewed By: peterwaller-arm

Differential Revision: https://reviews.llvm.org/D136083
2022-10-25 09:02:21 +00:00
Cullen Rhodes
5621caeb82 [AArch64][SVE] NFC: extend tests for flag-setting predicate instructions
A follow-on patch will extend the existing

  PTEST(PG, OP(PG, ...)) -> OP_FLAG_SETTING(PG, ...)

optimization in AArch64InstrInfo::optimizePTestInstr to cover more of
the flag-setting instructions

Reviewed By: peterwaller-arm

Differential Revision: https://reviews.llvm.org/D136161
2022-10-25 09:02:20 +00:00
Thomas Symalla
1f23cf4e50 [NFC][AMDGPU] Pre-commit test for D136432
Nested BFI instruction with multiple uses.
2022-10-25 10:52:32 +02:00
Jay Foad
325927ffb9 [X86] Update LiveVariables in more cases in convertToThreeAddress
Following on from D129634, this patch fixes more X86 CodeGen test
failures with D129213 applied, which adds verification of LiveIntervals
after the TwoAddressInstruction pass runs. These failures only showed up
with LLVM_ENABLE_EXPENSIVE_CHECKS=ON which adds the equivalent of an
implicit -verify-machineinstrs on all tests.

Differential Revision: https://reviews.llvm.org/D136596
2022-10-25 09:21:51 +01:00
Sander de Smalen
19b9e6204a [AArch64][SME] Fix chain for arm_locally_streaming functions.
The Chain wasn't set correctly in the DAG for functions marked
with aarch64_pstate_sm_body, which meant that SelectionDAG would
dead-code some of the CopyToReg's. This didn't show up in the
existing tests because all uses were in the same block, but when
adding some control-flow, suddenly things would break.

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D136579
2022-10-25 08:14:51 +00:00
Freddy Ye
fdac4c4e92 [X86] Add CMPCCXADD instructions.
For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html

Reviewed By: pengfei, skan

Differential Revision: https://reviews.llvm.org/D135933
2022-10-25 14:33:39 +08:00
Craig Topper
a54f3347e8 [RISCV] Add shift amount operands of shift, rotate, and Zbs instructions to hasAllNBitUsers. 2022-10-24 22:07:22 -07:00
Craig Topper
223f466f4f [RISCV] Add ORI to hasAllNBitUsers.
If the immediate is negative with sufficient leading ones, then
the upper bits of the other operand aren't demanded.
2022-10-24 21:33:17 -07:00
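The observation in 223f466f4f can be checked with a small model. This is a sketch under stated assumptions: `ori` and `demanded_low_bits` are hypothetical helpers, not LLVM functions, and registers are modelled as 64-bit unsigned Python integers.

```python
MASK64 = (1 << 64) - 1

def ori(x: int, imm: int) -> int:
    """Model a 64-bit OR with a signed 12-bit immediate, sign-extended to 64 bits."""
    assert -2048 <= imm < 2048
    return (x | (imm & MASK64)) & MASK64

def demanded_low_bits(imm: int) -> int:
    """Count the low bits of the other operand that can still affect the result.

    For a negative immediate, every bit inside the run of leading ones is
    forced to 1 in the result, so only the bits below that run matter.
    """
    assert imm < 0
    return (~imm).bit_length()
```

For example, with an immediate of -256 only the low 8 bits of the other operand influence the result, so `ori(x, -256)` and `ori(x & 0xFF, -256)` agree for every 64-bit `x`.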
Xiang Li
996267d20e [DirectX backend] set target triple to "dxil-ms-dx"
Set target triple to "dxil-ms-dx" for DXIL at the end of DXILTranslateMetadata.

Reviewed By: beanz

Differential Revision: https://reviews.llvm.org/D131545
2022-10-24 14:49:31 -07:00
Zhiyao Ma
7e8af2fc0c [ARM] Support -mexecute-only with -mlong-calls.
Instead of using constant pools, use movw movt pair.

Differential Revision: https://reviews.llvm.org/D136203
2022-10-24 11:41:24 -07:00
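As background for 7e8af2fc0c, a movw/movt pair materializes a 32-bit value from two 16-bit immediates instead of loading it from a constant pool. The sketch below only models that arithmetic; `movw_movt` is a hypothetical name and nothing here reflects the actual ARM backend code.

```python
def movw_movt(address: int) -> int:
    """Rebuild a 32-bit value the way a movw/movt pair would."""
    assert 0 <= address < (1 << 32)
    lo16 = address & 0xFFFF
    hi16 = address >> 16
    reg = lo16                           # movw: write the low half, zero the top half
    reg = (reg & 0xFFFF) | (hi16 << 16)  # movt: write the top half, keep the low half
    return reg
```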
Guozhi Wei
f298bfb09b [X86] New test case for reassociation of ADD instructions.
This is a pre-commit test case for D136396.

Differential Revision: https://reviews.llvm.org/D136501
2022-10-24 17:46:46 +00:00
Roman Lebedev
377f27be87 [X86] DAGTypeLegalizer::ModifyToType(): when widening w/ zeros, insert into undef and and-mask the padding away
We can expect that the sequence of inserting-of-extracts-into-undef
will be successfully lowered back into widening of the source vector,
but it seems that at least for X86 mask vectors, we have a really hard time
recovering from inserting-into-zero.

I've looked into alternative fix injection points, and they are much more
involved, by the time of `LowerBUILD_VECTORvXi1()`/`LowerINSERT_VECTOR_ELT()`
the constants might be obscured, so it does not seem like we can easily
deal with this by lowering into bit math later on,
some other pieces are missing.

Instead, it seems like just clearing the padding away via an `AND`-mask
is at least not a worse choice. Why create a problem where there wasn't one.
Though yes, it is possible that there are cases where constants originate
from the source IR, so some other fix may still be needed.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D136046
2022-10-24 20:27:02 +03:00
Craig Topper
1fa8fd4c33 Recommit "[TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant."
This reverts commit 65aaecca8842dec30d03734a7fe8ce33c5afec81.

There was an ordering problem in the calculation of the partial
remainder.

Original commit message:

If the divisor is even, we can first shift the dividend and divisor
right by the number of trailing zeros. Now the divisor is odd and we
can do the original algorithm to calculate a remainder. Then we shift
that remainder left by the number of trailing zeros and add the bits
that were shifted out of the dividend.

Differential Revision: https://reviews.llvm.org/D135541
2022-10-24 10:08:50 -07:00
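The even-divisor step from the original commit message of 1fa8fd4c33 can be sketched as below. This is an illustration for unsigned values only: `urem_even_divisor` is a hypothetical name, and the plain `%` on the odd divisor stands in for the existing odd-divisor algorithm, which is not reproduced here.

```python
def urem_even_divisor(x: int, d: int) -> int:
    """Unsigned remainder x % d computed via the odd part of d."""
    assert x >= 0 and d > 0
    tz = (d & -d).bit_length() - 1     # number of trailing zeros in the divisor
    shifted_out = x & ((1 << tz) - 1)  # dividend bits lost by the right shift
    odd_rem = (x >> tz) % (d >> tz)    # remainder by the (now odd) divisor
    # Shift the partial remainder back and add the bits that were shifted out.
    return (odd_rem << tz) + shifted_out
```

The result stays below `d` because `odd_rem` is at most `(d >> tz) - 1` and the shifted-out bits are at most `(1 << tz) - 1`.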
Simon Pilgrim
d81919dd7e [X86] 2012-01-12-extract-sv.ll - add AVX2 test coverage 2022-10-24 17:55:00 +01:00
Ahmed Bougacha
718bb22c28 [AArch64][PAC] Select XPAC for ptrauth.strip intrinsic.
Differential Revision: https://reviews.llvm.org/D132385
2022-10-24 08:15:56 -07:00
Amy Kwan
715301056e [PowerPC] Fix invalid cast for vector shuffles when lowering to the xxsplti32dx instruction.
When lowering vector shuffles into the xxsplti32dx instruction on Power10, we
canonicalize the right operand to be a BUILD_VECTOR and as a result, get the
commuted vector shuffle node.

However, a vector shuffle will not always be returned as the result for a
commuted vector shuffle. In such a scenario, this patch updates the original
cast of a shuffle into a dyn_cast<> and checks if the shuffle is a valid vector
shuffle node prior to obtaining the commuted shuffle mask.

This patch also adds a new test case that demonstrates this scenario (primarily
seen on 32-bit), and was originally a crash prior to this fix.

Differential Revision: https://reviews.llvm.org/D135024
2022-10-24 09:56:54 -05:00
Craig Topper
65aaecca88 Revert "[TargetLowering][RISCV][X86] Support even divisors in expandDIVREMByConstant."
This reverts commit f6a7b47820904c5e69cc4f133d382c74a87c44e8.

I received a report that this fails on 32-bit X86.
2022-10-24 07:12:54 -07:00
Petar Avramovic
cbc378ecb8 GlobalISel: Artifact combine merge-like and unmerges into merge-like
Recognize when sub-vectors have been split to elements which are used to
build a large vector.
This happens when instructions have different vector sizes available.
For example, a few arithmetic instructions are required to process all
elements of a larger vector that can be stored using one instruction.

Differential Revision: https://reviews.llvm.org/D109242
2022-10-24 13:33:06 +02:00
Petar Avramovic
e6c778f861 GlobalISel: Artifact combine merge-like and unmerge into unmerge
Recognize when source could have been unmerged to pieces with DstTy
without having to split source to smaller elements
and then merge small elements into DstTy pieces.
This happens when a vector was meant to be split to sub-vectors but there
was a leftover. At this point the artifact combiner has already dealt with
the leftover and we can continue to use sub-vectors.

Differential Revision: https://reviews.llvm.org/D109241
2022-10-24 13:33:05 +02:00
Petar Avramovic
f1aa598046 GlobalISel: Artifact combine merge-like and unmerge into copy
Recognize copy that is represented as split of a source register to
elements that were reassembled to another register with the same type.

Differential Revision: https://reviews.llvm.org/D109240
2022-10-24 13:33:05 +02:00
Petar Avramovic
51b98db487 GlobalISel: Precommit for artifact combine patches
Differential Revision: https://reviews.llvm.org/D117655
2022-10-24 13:33:05 +02:00
Simon Pilgrim
fd5f3abb07 [DAG] Fold (abs (sign_extend_inreg x)) -> (zero_extend (abs (truncate x))) (PR43370)
If the upper half of an abs() is all sign bits, then we can perform the abs() using just the lower half and then zero extend.

I've limited the DAG combine to only sign_extend_inreg (and free truncate/zero_extend) to minimise any later promotion issues, but for legalization a similar fold can use ComputeNumSignBits to be more aggressive.

Alive2: https://alive2.llvm.org/ce/z/y32fS4

Fixes #43370

Differential Revision: https://reviews.llvm.org/D136559
2022-10-24 10:27:08 +01:00
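The fold in fd5f3abb07 can be checked exhaustively for one pair of widths. The sketch below assumes a 16-bit value sign-extended in-register from 8 bits; all helper names are hypothetical, and `abs` is modelled as the wrapping two's-complement operation returned as an unsigned bit pattern.

```python
def sext(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a signed two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def wrapping_abs(value: int, bits: int) -> int:
    """abs on a `bits`-wide value, returned as an unsigned bit pattern."""
    return abs(sext(value, bits)) & ((1 << bits) - 1)

def abs_of_sext_inreg(x: int, wide: int = 16, narrow: int = 8) -> int:
    # sign_extend_inreg: sign-extend the low `narrow` bits within the wide type.
    extended = sext(x, narrow) & ((1 << wide) - 1)
    return wrapping_abs(extended, wide)

def zext_of_abs_trunc(x: int, wide: int = 16, narrow: int = 8) -> int:
    truncated = x & ((1 << narrow) - 1)
    # Zero extension is a no-op on a non-negative Python int.
    return wrapping_abs(truncated, narrow)
```

The interesting case is the most negative narrow value: for 0x80 both sides yield 128, because the narrow `abs` wraps to the bit pattern 0x80 and the zero extension then reads it as 128.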
Piyou Chen
f8b8426861 [RISCV] Add Svnapot extension
Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D136570
2022-10-24 01:27:04 -07:00
gonglingqin
3be059377e [LoongArch] Add support for ISD::FRAMEADDR and ISD::RETURNADDR
For now, only support lowering frame/return address for current frame.

Differential Revision: https://reviews.llvm.org/D136215
2022-10-24 15:12:28 +08:00
Pierre van Houtryve
eccdedd6f7 [AMDGPU] Autogenerate icmp codegen test
Switch to autogenerated tests so we can use the same test for GISel and DAGIsel.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D136446
2022-10-24 06:37:50 +00:00
Min-Yih Hsu
718a9793b4 [M68k][NFC] Use OS and ABI agnostic triple in codegen tests
Use 'm68k' (i.e. m68k-unknown-unknown) in all codegen tests rather
than m68k-linux-gnu. NFC.
2022-10-23 15:26:13 -07:00
Craig Topper
a41c1f3168 [RISCV] Make selectShiftMask look for negate opportunities after looking through AND.
Previously we would only look for an AND or a negate. But it's
possible there is a negate after looking through the AND.
2022-10-23 14:23:13 -07:00
Simon Pilgrim
fe7bc7153a [X86] Add abs(sext_inreg(x)) test coverage for Issue #43370 2022-10-23 18:17:19 +01:00
Simon Pilgrim
4e8f847676 [X86][AVX512] Fold extract_element(bitcast(<X x i1>) -> bitcast(extract_subvector())
On AVX512, extract legal bool vectors as bool subvectors before bitcasting to scalars to avoid spilling to stack.

This helps Rust, which internally represents bool vectors as bool arrays.

It also exposes more missed opportunities to use the KADD instruction to add masks together before moving to gpr

Fixes #58546
2022-10-23 14:47:24 +01:00