53324 Commits

Author SHA1 Message Date
Simon Pilgrim
fa9881d6a9 [X86] vector-bitreverse.ll - add AVX512BW+AVX512VL test coverage 2024-05-17 15:04:22 +01:00
Momchil Velikov
a0cc1ab978
[AArch64] Add intrinsics for multi-vector to ZA array vector accumulators (#91606)
[Recommit of e88ba6d975d887ca001cae30bfa0c53d91165148]

According to the specification in
https://github.com/ARM-software/acle/pull/309 this adds the intrinsics

void svadd_za16_vg1x2_f16(uint32_t slice, svfloat16x2_t zn)
    __arm_streaming __arm_inout("za");
void svadd_za16_vg1x4_f16(uint32_t slice, svfloat16x4_t zn)
    __arm_streaming __arm_inout("za");
void svsub_za16_vg1x2_f16(uint32_t slice, svfloat16x2_t zn)
    __arm_streaming __arm_inout("za");
void svsub_za16_vg1x4_f16(uint32_t slice, svfloat16x4_t zn)
    __arm_streaming __arm_inout("za");

as well as the corresponding `bf16` variants.
2024-05-17 15:02:53 +01:00
Sander de Smalen
3a32590f25
[AArch64] Avoid using NEON FCVTXN in Streaming-SVE mode. (#91981) 2024-05-17 14:11:28 +01:00
Jay Foad
ac092925c3
[SelectionDAG] Widen cttz to cttz_zero_undef (#92514)
Instead of widening e.g. i8 cttz(x) to i16 cttz(x | 0x100), use the more
optimizable form cttz_zero_undef(x | 0x100) since the widened operand is
definitely not zero.
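
The equivalence behind this change can be checked exhaustively for the i8 case. The following is a minimal Python sketch, not LLVM code: `cttz` and `widened_cttz_i8` are hypothetical helper names, and the sketch assumes the usual convention that `cttz` of zero returns the bit width.

```python
def cttz(x: int, bits: int) -> int:
    """Count trailing zeros of x viewed as a `bits`-wide value.

    Assumed convention: returns `bits` when the value is zero.
    """
    x &= (1 << bits) - 1
    if x == 0:
        return bits
    # x & -x isolates the lowest set bit; its position is the answer.
    return (x & -x).bit_length() - 1


def widened_cttz_i8(x: int) -> int:
    """Compute an i8 cttz through a 16-bit operation, as in the widening."""
    widened = (x & 0xFF) | 0x100  # set a sentinel bit just above the i8 range
    # The sentinel guarantees a non-zero operand, so a zero-undef form
    # would never hit its undefined case.
    assert widened != 0
    return cttz(widened, 16)
```

Because bit 8 is always set, the widened operand can never be zero, and an all-zero i8 input still yields 8, matching the narrow result.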
2024-05-17 12:39:40 +01:00
Matt Arsenault
ddb87e0f96
SystemZ: Use REG_SEQUENCE for PAIR128 (#90640)
PAIR128 should probably just be removed entirely

Depends #90638
2024-05-17 13:16:34 +02:00
David Green
4349ffb3fa [SelectOpt] Add tests for not select conditions. NFC 2024-05-17 11:34:01 +01:00
Johannes Reifferscheid
698cf0176b
Fix i1 array global crash in NVPTXAsmPrinter. (#92506)
See the test file. At head, this crashes with

```
assertion failed at llvm/lib/Support/APInt.cpp:492 in
uint64_t llvm::APInt::extractBitsAsZExtValue(unsigned int, unsigned int) const:
  bitPosition < BitWidth && (numBits + bitPosition) <= BitWidth &&
  "Illegal bit extraction"
```
2024-05-17 12:06:35 +02:00
Vyacheslav Levytskyy
e3e06135eb
[SPIR-V] Ensure that we don't have dangling BlockAddress constants after internal intrinsic 'spv_switch' is processed (#92390)
After internal intrinsic 'spv_switch' is processed we need to delete
G_BLOCK_ADDR instructions that were generated to keep track of the
corresponding basic blocks. If we just delete G_BLOCK_ADDR instructions
with BlockAddress operands, this leaves their BasicBlock counterparts in
an "address taken" state. This would make AsmPrinter generate a series
of unneeded labels of the `"Address of block that was removed by
CodeGen"` kind. This PR ensures that we don't have dangling
BlockAddress constants by zapping the BlockAddress nodes, and only after
that proceeds with erasing G_BLOCK_ADDR instructions.

See also https://github.com/llvm/llvm-project/pull/87823 for more
details.
2024-05-17 11:43:02 +02:00
Vyacheslav Levytskyy
2ed8ff3bf8
[SPIR-V] Fix types of internal intrinsic functions and add a test case for __builtin_alloca() (#92265)
This PR fixes generation of argument types of internal intrinsic functions
`spv_const_composite` and `spv_track_constant`, so that composite
constants of ConstantVector type preserve their correct type in
transformation passes and can be successfully used further by LLVM
intrinsic functions.

The added test case serves two purposes: it is to check the above
mentioned fix and to demonstrate that a call to __builtin_alloca() maps
to instructions from SPV_INTEL_variable_length_array when this extension
is available.
2024-05-17 11:42:37 +02:00
CarolineConcatto
c4bac7f7dc
[LLVM][AArch64] Use load/store with consecutive registers in SME2 or SVE2.1 for spill/fill (#77665)
When possible, register spills and fills in frame lowering use the
consecutive-register ld/st pair instructions available in SME2 or SVE2.1.
2024-05-17 09:25:21 +01:00
Vyacheslav Levytskyy
37d00635c4
[SPIR-V] Ensure that internal intrinsic functions for PHI's operand are inserted at the correct positions (#92316)
This PR is to ensure that internal intrinsic functions for PHI's operand
are inserted at the correct positions and don't break rules of
instruction domination and PHI nodes grouping at top of basic block.
2024-05-17 09:01:29 +02:00
Phoebe Wang
bc9823cf60 [X86][BF16] Change MVT to EVT in combineFP_EXTEND
Fixes: #92471
2024-05-17 13:41:30 +08:00
Thorsten Schütt
9bffe79049
[GlobalIsel] Speedup select to integer min/max (#92378)
https://github.com/llvm/llvm-project/issues/92309
2024-05-17 07:32:18 +02:00
wanglei
bf1d417233
[LoongArch] Suppress the unnecessary extensions for arguments in makeLibCall
Reviewed By: SixWeining, heiher

Pull Request: https://github.com/llvm/llvm-project/pull/92376
2024-05-17 09:13:51 +08:00
wanglei
5a204a5f0a
[LoongArch] Use sign extend for i32 arguments in makeLibCall on LA64
32-bit arguments and return values on LA64 are always sign-extended to
i64, so we should take this into account around libcalls.
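
To illustrate why the extension kind matters, here is a small Python sketch (not LoongArch backend code; both helper names are hypothetical) that models what a 64-bit register would hold after sign-extending versus zero-extending a 32-bit value:

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def sext_i32_to_i64(value: int) -> int:
    """Model the 64-bit register contents after sign-extending an i32."""
    value &= MASK32
    if value & 0x8000_0000:        # bit 31 set: replicate it into bits 32..63
        value |= MASK64 ^ MASK32
    return value


def zext_i32_to_i64(value: int) -> int:
    """Model the 64-bit register contents after zero-extending an i32."""
    return value & MASK32
```

The two models agree for values with bit 31 clear and differ only in the upper 32 bits otherwise, which is the case a callee expecting sign-extended arguments would observe.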

Reviewed By: heiher, SixWeining

Pull Request: https://github.com/llvm/llvm-project/pull/92375
2024-05-17 09:13:05 +08:00
wanglei
96d2db4ba9
[LoongArch] Pre-commit test for lib call arguments extension
Reviewed By: SixWeining

Pull Request: https://github.com/llvm/llvm-project/pull/92374
2024-05-17 09:12:12 +08:00
James Y Knight
d6f9278ae9
[X86] Use plain load/store instead of cmpxchg16b for atomics with AVX (#74275)
In late 2021, both Intel and AMD finally documented that every
AVX-capable CPU has always been guaranteed to execute aligned 16-byte
loads/stores atomically, and further, guaranteed that all future CPUs
with AVX will do so as well.

Therefore, we may use normal SSE 128-bit load/store instructions to
implement atomics, if AVX is enabled.

Per AMD64 Architecture Programmer's manual, 7.3.2 Access Atomicity:

> Processors that report [AVX] extend the atomicity for cacheable,
> naturally-aligned single loads or stores from a quadword to a double
> quadword.

Per Intel's SDM:

> Processors that enumerate support for Intel(R) AVX guarantee that the
> 16-byte memory operations performed by the following instructions will
> always be carried out atomically:
> - MOVAPD, MOVAPS, and MOVDQA.
> - VMOVAPD, VMOVAPS, and VMOVDQA when encoded with VEX.128.
> - VMOVAPD, VMOVAPS, VMOVDQA32, and VMOVDQA64 when encoded with
>   EVEX.128 and k0 (masking disabled).

This was also confirmed to be true for Zhaoxin CPUs with AVX, in
https://gcc.gnu.org/PR104688
2024-05-16 18:24:23 -04:00
Fangrui Song
997eae3673
[AsmPrinter] Increase upper bound for size in global structs
This is part of the fixes to address #57353

https://reviews.llvm.org/D133845

Pull Request: https://github.com/llvm/llvm-project/pull/92334
2024-05-16 14:41:19 -07:00
Eli Friedman
b28766eb3f
[Arm64EC] Correctly handle sret in entry thunks. (#92326)
I accidentally left out the code to transfer sret attributes to entry
thunks, so values weren't being passed in the right registers, and the
sret pointer wasn't returned in the correct register.

Fixes #90229
2024-05-16 09:15:17 -07:00
Simon Pilgrim
117d755b1b
[DAG] SimplifyDemandedBits - use ComputeKnownBits instead of getValidShiftAmountConstant to check for constant shift amounts. (#92412)
This allows us to handle cases where the constant has already been type legalized behind a bitcast

Despite calling ComputeKnownBits I'm not seeing any notable change in compile time.
2024-05-16 17:04:30 +01:00
Jacek Caban
93c02b7dc3
[CodeGen][ARM64EC] Use MCSymbolRefExpr::VK_None for function aliases. (#92100) 2024-05-16 15:47:39 +02:00
Jacek Caban
4a5dffc674 [CodeGen][ARM64EC][NFC] Add ARM64EC alias symbols test. (#92100) 2024-05-16 15:15:17 +02:00
Hassnaa Hamdi
f7392f40f3
[AArch64] Add intrinsics for bfloat16 min/max/minnm/maxnm (#90105)
According to the specification in
[ARM-software/acle/pull/309](https://github.com/ARM-software/acle/pull/309),
add the following intrinsics:

```
// svmax single,multi
svbfloat16x2_t svmax_single_bf16_x2(svbfloat16x2_t zdn, svbfloat16_t zm)
svbfloat16x4_t svmax_single_bf16_x4(svbfloat16x4_t zdn, svbfloat16_t zm)
svbfloat16x2_t svmax_bf16_x2(svbfloat16x2_t zdn, svbfloat16x2_t zm)
svbfloat16x4_t svmax_bf16_x4(svbfloat16x4_t zdn, svbfloat16x4_t zm)
```

```
// svmin single,multi
svbfloat16x2_t svmin_single_bf16_x2(svbfloat16x2_t zdn, svbfloat16_t zm)
svbfloat16x4_t svmin_single_bf16_x4(svbfloat16x4_t zdn, svbfloat16_t zm)
svbfloat16x2_t svmin_bf16_x2(svbfloat16x2_t zdn, svbfloat16x2_t zm)
svbfloat16x4_t svmin_bf16_x4(svbfloat16x4_t zdn, svbfloat16x4_t zm)
```

```
// svmaxnm single,multi
svbfloat16x2_t svmaxnm_single_bf16_x2(svbfloat16x2_t zdn, svbfloat16_t zm)
svbfloat16x4_t svmaxnm_single_bf16_x4(svbfloat16x4_t zdn, svbfloat16_t zm)
svbfloat16x2_t svmaxnm_bf16_x2(svbfloat16x2_t zdn, svbfloat16x2_t zm)
svbfloat16x4_t svmaxnm_bf16_x4(svbfloat16x4_t zdn, svbfloat16x4_t zm)
```

```
// svminnm single,multi
svbfloat16x2_t svminnm_single_bf16_x2(svbfloat16x2_t zdn, svbfloat16_t zm)
svbfloat16x4_t svminnm_single_bf16_x4(svbfloat16x4_t zdn, svbfloat16_t zm)
svbfloat16x2_t svminnm_bf16_x2(svbfloat16x2_t zdn, svbfloat16x2_t zm)
svbfloat16x4_t svminnm_bf16_x4(svbfloat16x4_t zdn, svbfloat16x4_t zm)
```
- Variations other than bfloat16 are already supported.
2024-05-16 13:56:02 +01:00
Simon Pilgrim
80fac30a09 [X86] rot32.ll - remove old shld check prefixes
This was missed in 8dbd745b09c9f65fefc2ffac14e8f7f288766861
2024-05-16 13:53:25 +01:00
Simon Pilgrim
311339e25c [DAG] SimplifyDemandedBits - ISD::AND - only request DemandedElts when looking for a splat constant
Limit the isConstOrConstSplat call to the vector elements we care about

Noticed while investigating regressions in #92096
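
The idea of restricting a splat query to the demanded elements can be sketched as follows. This is an illustrative Python model only: `const_splat` is a hypothetical helper, not the `isConstOrConstSplat` API, and vectors are modelled as plain lists of integers.

```python
from typing import List, Optional


def const_splat(elements: List[int],
                demanded: Optional[List[bool]] = None) -> Optional[int]:
    """Return the common value if all demanded elements are equal, else None.

    When `demanded` is omitted, every element is treated as demanded.
    """
    if demanded is None:
        demanded = [True] * len(elements)
    chosen = {value for value, wanted in zip(elements, demanded) if wanted}
    # Exactly one distinct value among the demanded lanes means a splat.
    return chosen.pop() if len(chosen) == 1 else None
```

For example, `[0xFF, 0xFF, 0x0F, 0xFF]` is not a splat over all lanes, but it is a splat of `0xFF` when only lanes 0, 1 and 3 are demanded.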
2024-05-16 13:05:35 +01:00
wanglei
70608c24fa
[LoongArch] Refactor LoongArchABI::computeTargetABI
The previous logic did not consider whether the architectural features
meet the requirements of the ABI, resulting in the generation of
incorrect object files in some cases. For example:

```
llc -mtriple=loongarch64 -filetype=obj test/CodeGen/LoongArch/ir-instruction/fadd.ll -o t.o
llvm-readelf -h t.o
```
The object file indicates the ABI as lp64d, however, the generated code
is lp64s.

The new logic introduces the `feature-implied` ABI. When both target-abi
and triple-implied ABI are invalid, the feature-implied ABI is used.
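
The fallback order described above might be modelled roughly as below. This is a simplified Python sketch under stated assumptions, not the actual `computeTargetABI` code: all function names are hypothetical, `features` is assumed to be the full set of enabled floating-point features (`"f"`, `"d"`), the explicit target-abi is assumed to be tried before the triple-implied one, and an ABI is assumed to be valid only if its register width matches and its floating-point suffix is backed by a feature.

```python
from typing import Optional, Set

# Which feature each ABI suffix is assumed to require (None = no requirement).
FEATURE_FOR_SUFFIX = {"s": None, "f": "f", "d": "d"}


def base_name(is_64bit: bool) -> str:
    return "lp64" if is_64bit else "ilp32"


def feature_implied_abi(is_64bit: bool, features: Set[str]) -> str:
    """Pick the ABI implied purely by the enabled features."""
    base = base_name(is_64bit)
    if "d" in features:
        return base + "d"
    if "f" in features:
        return base + "f"
    return base + "s"


def is_valid_abi(abi: Optional[str], is_64bit: bool, features: Set[str]) -> bool:
    """Check that the ABI matches the register width and available features."""
    if abi is None:
        return False
    base = base_name(is_64bit)
    if not abi.startswith(base):
        return False
    suffix = abi[len(base):]
    if suffix not in FEATURE_FOR_SUFFIX:
        return False
    needed = FEATURE_FOR_SUFFIX[suffix]
    return needed is None or needed in features


def compute_target_abi(target_abi: Optional[str], triple_abi: Optional[str],
                       is_64bit: bool, features: Set[str]) -> str:
    """Try target-abi, then the triple-implied ABI, then fall back to features."""
    for candidate in (target_abi, triple_abi):
        if is_valid_abi(candidate, is_64bit, features):
            return candidate
    return feature_implied_abi(is_64bit, features)
```

Under this model, a 64-bit target with no floating-point features and a triple-implied `lp64d` resolves to `lp64s`, so the recorded ABI would agree with the generated code.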

Reviewed By: SixWeining, xen0n

Pull Request: https://github.com/llvm/llvm-project/pull/92223
2024-05-16 17:15:21 +08:00
wanglei
30410018d3
[LoongArch] Enable all -target-abi options
This is a pre-commit for modifying `computeTargetABI` logic.

This patch will provide warning prompts when using those ABIs that have
not yet been standardized.

Reviewed By: xen0n, SixWeining

Pull Request: https://github.com/llvm/llvm-project/pull/92222
2024-05-16 16:54:18 +08:00
Craig Topper
f2d74002fd
[LegalizeVectorOps][X86] Add ISD::ABDS/ABDU to the list of opcodes handled by LegalizeVectorOps. (#92332)
The expand code is present, but we were missing the type query code
so the nodes would be ignored until LegalizeDAG.
2024-05-15 21:46:31 -07:00
Dhruv Chawla
1dd0d3cf40
[AArch64][GISel] Fold COPY(y:gpr, DUP(x:fpr, i)) -> UMOV(y:gpr, x:fpr, i) (#89017)
This patch adds a peephole to AArch64PostSelectOptimize for codegen
that is caused by RegBankSelect limiting G_EXTRACT_VECTOR_ELT
to FPR registers for both its input and output registers. This can
cause a COPY from FPR to GPR to be generated when, for example, the
output register of the G_EXTRACT_VECTOR_ELT is used in a branch
condition.

This was noticed when looking at codegen differences between SDAG and GI
for the s1279 kernel in the TSVC benchmark.
2024-05-16 08:08:06 +05:30
Amara Emerson
1daa7fd3fa [AArch64][SME] Remove Darwin compile error for ABI support routine calls.
These are allowed for Darwin and use the same ABI.
2024-05-15 14:47:17 -07:00
Nicolai Hähnle
ec1f28dc97
AMDGPU/gfx12: avoid crashing on legacy waitcnt intrinsics (#92306)
They *are* still accepted by the HW but have a conservative effect.

Leave them untouched since handling them would complicate the logic a
bit, and developers who code to such a low level really need to revisit
what they're doing anyway.
2024-05-15 22:23:18 +02:00
Alex Bradbury
891d687137
[RISCV] Gate unratified profiles behind -menable-experimental-extensions (#92167)
As discussed in the last sync-up call, because these profiles are not
yet finalised they shouldn't be exposed to users unless they opt-in to
them (much like experimental extensions). We may later want to add a
more specific flag, but reusing `-menable-experimental-extensions`
solves the immediate problem.

This is implemented using the new support for marking profiles as
experimental added in #91993 to move the unratified profiles to
RISCVExperimentalProfile and making the necessary changes to logic in
RISCVISAInfo to handle this.
2024-05-15 21:09:43 +01:00
Patrick O'Neill
4ab2ac22d0
[DAGCombiner] Mark vectors as not AllAddOne/AllSubOne on type mismatch (#92195)
Fixes #92193.
2024-05-15 12:39:28 -07:00
Luke Lau
9ae2177843 [RISCV] Handle undef AVLs in RISCVInsertVSETVLI
Before #91440 a VSETVLIInfo would have had an IMPLICIT_DEF defining
instruction, but now we look up a VNInfo which doesn't exist, which
triggers an assertion failure. Mark these undef AVLs as AVLIsIgnored.
2024-05-16 02:46:31 +08:00
Simon Pilgrim
e2d74a25eb
[X86] EmitCmp - always use cmpw with foldable loads (#92251)
By default, EmitCmp avoids cmpw with i16 immediates due to 66/67h length-changing prefixes causing stalls, instead extending the value to i32 and using a cmpl with an i32 immediate, unless it has the TuningFastImm16 flag or we're building for optsize/minsize.

However, if we're loading the value for comparison, the performance costs of the decode stalls are likely to be exceeded by the impact of the load latency of the folded load, the shorter encoding and not needing an extra register to store the ext-load.

This matches the behaviour of gcc and msvc.

Fixes #90355
2024-05-15 17:46:49 +01:00
Luke Lau
ff313ee70a
[RISCV] Remove hasSideEffects=1 for vsetvli pseudos (#91319)
In a similar vein to #90049, we currently model all of the effects of a
vsetvli pseudo:

* VL and VTYPE are marked as defs
* VL-preserving x0,x0 vsetvlis don't get emitted until
RISCVInsertVSETVLI, and when they are, they have implicit uses on VL
* Regular vector pseudos are fully modelled too: Before
RISCVInsertVSETVLI they can be moved between vsetvli pseudos because we
will eventually insert vsetvlis to correct VL and VTYPE. Afterwards,
they will have implicit uses on VL and VTYPE.

Since we model everything we can remove hasSideEffects=1. This gives us
some improvements like sinking in vsetvli-insert-crossbb.ll.

We need to update RISCVDeadRegisterDefinitions to keep handling vsetvli
pseudos since it only operates on instructions with unmodelled side
effects.
2024-05-15 23:37:31 +08:00
Phoebe Wang
b576a6b045
[X86][AMX] Fix a bug after #83628 (#91207)
We need to check if `GR64Cand` is a valid register before using it.

Test is not needed since it's covered in llvm-test-suite.

Fixes #90954
2024-05-15 23:15:48 +08:00
Jay Foad
466d266945
[AMDGPU] Fix GFX90x check prefixes in tests (#92254) 2024-05-15 15:13:53 +01:00
Simon Pilgrim
f8395f8420 [X86] Cleanup check prefixes identified in #92248
Avoid using leading numbers in check prefixes - replace with actual triple config names.
2024-05-15 14:25:29 +01:00
Simon Pilgrim
3f07430c38 [X86] avoid-sfb-g-no-change.mir - cleanup check prefixes identified in #92248
Don't include "-LABEL" (or any other FileCheck modifier) in the core check prefix name
2024-05-15 14:23:46 +01:00
Simon Pilgrim
e26eacf771 [X86] prefetch.ll - cleanup check prefixes identified in #92248
Avoid using leading numbers in check prefixes - replace with actual triple config names (and makes it easier to add X64 test coverage in a future commit).
2024-05-15 14:11:24 +01:00
Simon Pilgrim
96ac2e3af7 [X86] cmpxchg-clobber-flags.ll - cleanup check prefixes identified in #92248
Avoid using numbers as check prefix - replace with actual triple config names
2024-05-15 14:11:24 +01:00
Simon Pilgrim
97418bb519 [X86] patchable functions - cleanup check prefixes identified in #92248
Avoid using numbers as check prefix - replace with actual triple config names
2024-05-15 14:11:23 +01:00
Simon Pilgrim
8987369465 [X86] sibcall - cleanup check prefixes identified in #92248
Avoid using numbers as check prefix - replace with actual triple config names
2024-05-15 13:49:39 +01:00
Jay Foad
1650f1b3d7
Fix typo "indicies" (#92232) 2024-05-15 13:10:16 +01:00
Paul Walker
7621a0d364
[LLVM][CodeGen][SVE] Improve custom lowering for EXTRACT_SUBVECTOR. (#90963)
We can extract any legal fixed length vector from a scalable vector by
using VECTOR_SPLICE.
2024-05-15 11:27:06 +01:00
Jonas Paulsson
d6ee7e8481
[SystemZ] Handle address clobbering in splitMove(). (#92105)
When expanding an L128 (which is used to reload i128), it is
possible that the quadword destination register clobbers an
address register. This patch adds an assertion against the case
where both of the expanded parts clobber the address and, in the
case where one of the expanded parts does so, puts that part last.
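
The ordering rule can be illustrated with a toy Python model. It is not the SystemZ `splitMove()` implementation: `split_load128`, the register names and the dictionary-based register file and memory are all hypothetical, chosen only to show why the clobbering half has to go last.

```python
from typing import Dict


def split_load128(regs: Dict[str, int], memory: Dict[int, int],
                  dest_hi: str, dest_lo: str, addr_reg: str) -> None:
    """Load two 64-bit halves from [addr_reg] and [addr_reg + 8].

    If one destination half is also the address register, that load is
    ordered last so the address is still intact for the other half.
    """
    parts = [(dest_hi, 0), (dest_lo, 8)]
    clobbering = [part for part in parts if part[0] == addr_reg]
    # Mirrors the assertion in the commit message: both halves clobbering
    # the address cannot be handled by reordering.
    assert len(clobbering) < 2, "both halves clobber the address register"
    ordered = [part for part in parts if part[0] != addr_reg] + clobbering
    for dest, offset in ordered:
        regs[dest] = memory[regs[addr_reg] + offset]
```

With `r2` holding the address and `(r2, r3)` as the destination pair, the low half is loaded into `r3` first and `r2` is overwritten only by the final load.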

Fixes #91437
2024-05-15 08:36:26 +02:00
Luke Lau
77047e3cd2
[RISCV] Make vsetvli in test not loop invariant. NFC (#92094)
The middle end will remove the inner vsetvli otherwise, and it's more
typical to set the AVL to the remaining VL.

This also prevents the test from showing up as a regression in #91319
2024-05-15 12:32:26 +08:00
Luke Lau
1a58e88690
[RISCV] Move RISCVInsertVSETVLI to after phi elimination (#91440)
Split off from #70549, this patch moves RISCVInsertVSETVLI to after phi
elimination where we exit SSA and need to move to LiveVariables.

The motivation for splitting this off is to avoid the large scheduling
diffs from moving completely to after regalloc, and instead focus on
converting the pass to work on LiveIntervals.

The two main changes required are updating VSETVLIInfo to store VNInfos
instead of MachineInstrs, which allows us to still check for PHI defs in
needVSETVLIPHI, and fixing up the live intervals of any AVL operands
after inserting new instructions.

On O3 the pass is inserted after the register coalescer, otherwise we
end up with a bunch of COPYs around eliminated PHIs that trip up
needVSETVLIPHI.

Co-authored-by: Piyou Chen <piyou.chen@sifive.com>
2024-05-15 11:44:32 +08:00
Craig Topper
e417e61532 [RISCV][LegalizeTypes] Add additional test coverage for type promotion of VP_FSHL/FSHR. NFC
There's a special path when the promoted type has an element size
more than twice the size of the original type.
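
As a rough illustration of why a sufficiently wide promoted type allows a different lowering, the Python sketch below computes a funnel shift left with one wide shift over the concatenated operands. It is not the LegalizeTypes code: both function names are hypothetical, and the condition asserted here (wide type at least twice the original width) is an assumption of this model rather than a statement of LLVM's exact check.

```python
def fshl_reference(x: int, y: int, amount: int, bits: int) -> int:
    """Funnel shift left: the top `bits` of (x:y) shifted left by amount % bits."""
    mask = (1 << bits) - 1
    shift = amount % bits
    if shift == 0:
        return x & mask
    return ((x << shift) | ((y & mask) >> (bits - shift))) & mask


def fshl_via_wide_shift(x: int, y: int, amount: int,
                        bits: int, wide_bits: int) -> int:
    """Same result using a single shift in a wider integer type."""
    assert wide_bits >= 2 * bits, "wide type must hold both operands"
    mask = (1 << bits) - 1
    wide_mask = (1 << wide_bits) - 1
    shift = amount % bits
    concat = ((x & mask) << bits) | (y & mask)   # x in the high half, y in the low
    shifted = (concat << shift) & wide_mask      # bits shifted out of the wide type are dropped
    return (shifted >> bits) & mask              # the result sits in the high half
```

Bits that overflow the wide type during the left shift are ones the funnel shift discards anyway, so truncating to the wide width does not change the result.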
2024-05-14 16:25:07 -07:00