52796 Commits

Author SHA1 Message Date
Samuel Tebbs
72a60e770c [AArch64][NFC] Use regexes in register class tests
Some MIR and IR tests include checks for register class IDs, which are
unnecessary since the register class name is also checked for and that
doesn't change when new classes are added. This patch replaces the
hard-coded register class ID checks with regexes so they don't have to
be updated every time a new class is added.
2024-02-29 11:46:07 +00:00
Matt Arsenault
6cfd3439d4
APFloat: Fix signed zero handling in minnum/maxnum (#83376)
Follow the IEEE 754-2019 rules and order -0 as less than +0 and +0 as greater
than -0. As currently defined this isn't required for the intrinsics,
but it is a better quality of implementation (QoI).

This will avoid the workaround in libc added by #83158
2024-02-29 16:51:33 +05:30
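Below is a minimal Python model of the signed-zero ordering described above. It assumes LLVM's minnum/maxnum convention that a NaN operand yields the other operand. The functions `minnum` and `maxnum` are illustrative stand-ins, not APFloat's API. Python floats are doubles.

```python
import math

def minnum(a: float, b: float) -> float:
    """Smaller operand; -0.0 is ordered below +0.0; a NaN yields the other operand."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    # -0.0 == +0.0 compares equal, so decide zeros by their sign bits.
    if a == 0.0 and b == 0.0:
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b

def maxnum(a: float, b: float) -> float:
    """Larger operand; +0.0 is ordered above -0.0; a NaN yields the other operand."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    if a == 0.0 and b == 0.0:
        return a if math.copysign(1.0, a) > 0 else b
    return a if a > b else b
```

With this ordering, the result for mixed zeros no longer depends on operand order. For example, `minnum(0.0, -0.0)` and `minnum(-0.0, 0.0)` both return `-0.0`.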
Simon Pilgrim
7ff3f9760d [X86] getFauxShuffleMask - handle insert_vector_elt(bitcast(extract_vector_elt(x))) shuffle patterns
If the bitcast is between types of equal scalar size (i.e. fp<->int bitcasts), then we can safely peek through them

Fixes #83289
2024-02-29 10:32:49 +00:00
Simon Pilgrim
30b63def50 [X86] Regenerate tests to add missing avx512 constant comments 2024-02-29 10:32:48 +00:00
Chen Zheng
3196005f6b [NFC][PowerPC] use script to regenerate the CHECK lines 2024-02-29 04:49:37 -05:00
Tomas Matheson
03420f570e Revert "[llvm][arm] add T1 and T2 assembly options for vlldm and vlstm (#83116)"
This reverts commit 634b0243b8f7acc85af4f16b70e91d86ded4dc83.

Failing EXPENSIVE_CHECKS builds with "undefined physical register".
2024-02-29 09:48:29 +00:00
Craig Topper
95aab69c10
[RISCV] Remove experimental from Zacas. (#83195)
Document that we don't use the double compare and swap instructions due
to ABI concerns.
2024-02-28 21:46:58 -08:00
Dávid Ferenc Szabó
71c06bbb25
[GlobalISel] Combine (X == 0) & (Y == 0) -> (X | Y) == 0 (#71949)
Also combine (X != 0) | (Y != 0) -> (X | Y) != 0
2024-02-29 10:58:17 +05:30
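Both folds rest on one fact: the bitwise OR of two integers is zero exactly when both are zero. The Python sketch below checks this exhaustively for 8-bit values. The function names are hypothetical, and this is not the GlobalISel combiner code.

```python
def and_of_eq_zero(x: int, y: int) -> bool:
    # Original form: (X == 0) & (Y == 0)
    return (x == 0) and (y == 0)

def or_eq_zero(x: int, y: int) -> bool:
    # Combined form: (X | Y) == 0
    return (x | y) == 0

def or_of_ne_zero(x: int, y: int) -> bool:
    # Original form: (X != 0) | (Y != 0)
    return (x != 0) or (y != 0)

def or_ne_zero(x: int, y: int) -> bool:
    # Combined form: (X | Y) != 0
    return (x | y) != 0

# Exhaustively check every pair of 8-bit values.
WIDTH = 8
for x in range(1 << WIDTH):
    for y in range(1 << WIDTH):
        assert and_of_eq_zero(x, y) == or_eq_zero(x, y)
        assert or_of_ne_zero(x, y) == or_ne_zero(x, y)
```

The combined forms need one OR and one compare instead of two compares and a logic op.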
Yeting Kuo
14d8c4563e
[RISCV] Add more intrinsics into canSplatOperand. (#83106)
This patch adds smin/smax/umin/umax/sadd_sat/ssub_sat/uadd_sat/usub_sat
into canSplatOperand. This helps LLVM fold vv instructions that have one
splat operand into vx instructions.
2024-02-29 12:57:34 +08:00
Shilei Tian
191fd2d9db
[NFC][AMDGPU] Move the rem tests in div_i128.ll into rem_i128.ll (#83307) 2024-02-28 18:47:02 -05:00
David Green
b339c88120 [AArch64] Add some base aes intrinsic tests. NFC
Including commutative tests.
2024-02-28 20:31:26 +00:00
Simon Pilgrim
b4bc19e2e6 [X86] Add tests showing failure to demand only the sign bit of a sitofp/uitofp node
sitofp - if we only demand the signbit, then we can try to use the source integer
uitofp - signbit is guaranteed to be zero

Noticed while reviewing #82290
2024-02-28 18:49:24 +00:00
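Here is a small Python check of the two observations above. It makes two assumptions for illustration: doubles stand in for the converted floating-point value, and the sources are 32-bit patterns. The helper names are hypothetical.

```python
import struct

def sign_bit(f: float) -> int:
    """Return the IEEE 754 sign bit of a double (1 if negative)."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", f))
    return bits >> 63

def sitofp_i32(x: int) -> float:
    # Interpret the 32-bit pattern as a signed integer before converting.
    if x & 0x80000000:
        x -= 1 << 32
    return float(x)

def uitofp_i32(x: int) -> float:
    # Interpret the 32-bit pattern as an unsigned integer.
    return float(x & 0xFFFFFFFF)

for pattern in (0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 0x12345678, 0xDEADBEEF):
    # sitofp: the result's sign bit equals the source integer's sign bit.
    assert sign_bit(sitofp_i32(pattern)) == pattern >> 31
    # uitofp: the result is never negative, so its sign bit is zero.
    assert sign_bit(uitofp_i32(pattern)) == 0
```
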
SivanShani-Arm
634b0243b8
[llvm][arm] add T1 and T2 assembly options for vlldm and vlstm (#83116)
T1 allows for an optional register list; the register list must be {d0-d15}.
T2 defines a mandatory register list; the register list must be {d0-d31}.

The requirements for T1/T2 are as follows:
                T1              T2
Require:        v8-M.Main,      v8.1-M.Main,
                secure state    secure state
16 D Regs       valid           valid
32 D Regs       UNDEFINED       valid
No D Regs       NOP             NOP
2024-02-28 17:02:51 +00:00
Lukacma
26402777eb
[AArch64] Optimized generated assembly for bool to svbool_t conversions (#83001)
In certain cases the legalizer was generating the `AND(WHILELO, SPLAT 1)` instruction pattern when `WHILELO` alone would be sufficient.
2024-02-28 16:45:39 +00:00
Petar Avramovic
3e35ba53e2
AMDGPU/GFX12: Insert waitcnts before stores with scope_sys (#82996)
Insert waitcnts for loads and atomics before stores with system scope.
Scope is a field in the instruction encoding and corresponds to the desired
coherence level in the cache hierarchy.
Intrinsic stores can set the scope in the cache policy operand.
If the volatile keyword is used on generic stores, the memory legalizer sets
the scope to system. Generic stores, by default, get the lowest scope level.
Waitcnts are not required if it is guaranteed that memory is cached.
For example, Vulkan shaders can guarantee this.
TODO: implement a flag for frontends to give us a hint not to insert
waits.
The Vulkan flag is expected to be implemented as a vulkan:private MMRA.
2024-02-28 16:18:04 +01:00
chuongg3
8e51b22ce2
[AArch64][GlobalISel] Legalize G_LOAD for v4s8 Vector (#82989)
Lowers `v4s8 = G_LOAD %ptr ptr` into

`s32 = G_LOAD %ptr ptr`
`v4s8 = G_BITCAST s32`
2024-02-28 13:55:27 +00:00
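The lowering is sound because a 32-bit load followed by a bitcast to four 8-bit lanes yields the same lanes as loading them individually. This Python sketch makes two assumptions: little-endian lane order (the usual AArch64 configuration) and memory modeled as a `bytes` object. The helper names are hypothetical.

```python
import struct

def load_v4s8(mem, ptr):
    """Reference semantics: load four 8-bit lanes directly."""
    return list(mem[ptr:ptr + 4])

def load_s32_then_bitcast(mem, ptr):
    """Lowered form: one 32-bit load, then a bitcast to four 8-bit lanes."""
    (word,) = struct.unpack_from("<I", mem, ptr)  # models s32 = G_LOAD %ptr
    # Models v4s8 = G_BITCAST s32: lane i is byte i of the little-endian word.
    return [(word >> (8 * i)) & 0xFF for i in range(4)]

memory = bytes([0x10, 0x20, 0x30, 0x40, 0x50, 0x60])
for p in range(3):
    assert load_v4s8(memory, p) == load_s32_then_bitcast(memory, p)
```
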
Valery Pykhtin
a845ea3878
[AMDGPU] Fix SDWA 'preserve' transformation for instructions in different basic blocks. (#82406)
This fixes a crash when the operand sources for a V_OR instruction reside in
different basic blocks.
2024-02-28 14:47:33 +01:00
AtariDreams
0a54b36d5e
[X86] Resolve FIXME: Create cld only when needed (#82415)
Only use cld when we also have rep instructions, are calling a function, or contain inline asm.
2024-02-28 12:32:58 +00:00
Simon Pilgrim
6287b7b9e9 [X86] combineEXTRACT_SUBVECTOR - extract 256-bit comparisons if only one subvector is required
If only one subvector extraction will be necessary (i.e. because the other is constant etc.) then extract the source operands and perform as a 128-bit comparison

Ideally DAGCombiner's narrowExtractedVectorBinOp would handle this, but it's tricky to confirm when a target opcode can be safely extracted and performed as a different vector type

Partially improves an outstanding regression in #82290
2024-02-28 12:24:34 +00:00
Simon Pilgrim
37daff028f [X86] setcc-lowering.ll - regenerate with AVX2 test coverage
Added while triaging a regression from #82290
2024-02-28 11:19:48 +00:00
Tuan Chuong Goh
fd336c33b6 [AArch64][GlobalISel] Pre-Commit Test for Legalize G_LOAD v4i8 (#82989) 2024-02-28 10:57:10 +00:00
Sander de Smalen
41427b0e8e
[AArch64] Disable FastISel/GlobalISel for ZT0 state (#82768)
For __arm_new("zt0") we need to have special setup code in the prologue.
For calls that don't preserve zt0, we need to emit code to preserve ZT0
around the call.
This is only emitted by SelectionDAG ISel at the moment.
2024-02-28 10:42:16 +00:00
chuongg3
686ec7c2e9
[AArch64][GlobalISel] Legalize G_STORE for v4s8 vector (#82498)
Lowers `G_STORE v4s8, ptr` into

`s32 = G_BITCAST v4s8`
`G_STORE s32, ptr`
2024-02-28 10:26:41 +00:00
Tuan Chuong Goh
ba692301f1 [AArch64][GlobalISel] Pre-Commit Test for G_STORE v4s8 (#82498) 2024-02-28 09:52:08 +00:00
David Green
6e41d60a71
[SelectionDAG] Change computeAliasing signature from optional<uint64> to LocationSize. (#83017)
This is another smaller step of #70452, changing the signature of
computeAliasing() from optional<uint64_t> to LocationSize, and follow-up
changes in DAGCombiner::mayAlias(). There are some test changes due to
the previous AA->isNoAlias call incorrectly using an unknown size
(~UINT64_T(0)). This should then be improved again in #70452 when the
types are known to be scalable.
2024-02-28 09:43:05 +00:00
Luke Lau
9617da88ab
[RISCV] Use a ta vslideup if inserting over end of InterSubVT (#83230)
The description in #83146 is slightly inaccurate: it relaxes a tail
undisturbed vslideup to tail agnostic if we are inserting over the
entire tail of the vector **and** we didn't shrink the LMUL of the
vector being inserted into.

This handles the case where we did shrink down the LMUL via InterSubVT
by checking if we inserted over the entire tail of InterSubVT, the
actual type that we're performing the vslideup on, not VecVT.
2024-02-28 15:58:55 +08:00
Luke Lau
28c29fbec3 [RISCV] Add exact VLEN RUNs for insert_subvector and concat_vector tests. NFC
Also update the RUNs in the extract_subvector tests to be consistent.
Using the term VLS/VLA here as it's more succinct than KNOWNVLEN/UNKNOWNVLEN.
2024-02-28 14:44:42 +08:00
Luke Lau
91d23370cd
[RISCV] Use a tail agnostic vslideup if possible for scalable insert_subvector (#83146)
If we know that an insert_subvector inserting a fixed subvector will
overwrite the entire tail of the vector, we use a tail agnostic
vslideup. This was added in https://reviews.llvm.org/D147347, but we can
do the same thing for scalable vectors too.

The `Policy` variable is defined in a slightly weird place but this is
to mirror the fixed length subvector code path as closely as possible. I
think we may be able to deduplicate them in future.
2024-02-28 10:26:54 +08:00
Jeffrey Byrnes
cf1c97b2d2
[AMDGPU] Do not attempt to fallback to default mutations (#83208)
IGLP itself will be in SavedMutations via mutations added during
Scheduler creation, so falling back results in reapplying IGLP.

In PostRA scheduling, if we have multiple regions with IGLP
instructions, then we may have an infinite loop.

Disable the feature for now.
2024-02-27 18:04:59 -08:00
Heejin Ahn
8506a63bf7 Revert "[WebAssembly] Disable multivalue emission temporarily (#82714)"
This reverts commit 6e6bf9f81756ba6655b4eea8dc45469a47f89b39.

It turned out the multivalue feature had active outside users and it
could cause some disruptions to them, so I'd like to investigate more
about the workarounds before doing this.
2024-02-28 01:02:39 +00:00
Heejin Ahn
d4cdb516ee
[WebAssembly] Add RefTypeMem2Local pass (#81965)
This adds `WebAssemblyRefTypeMem2Local` pass, which changes the address
spaces of reference type `alloca`s to `addrspace(1)`. This in turn
changes the address spaces of all `load` and `store` instructions that
use the `alloca`s.

`addrspace(1)` is `WASM_ADDRESS_SPACE_VAR`, and loads and stores to this
address space become `local.get`s and `local.set`s, thanks to the Wasm
local IR support added in 82f92e35c6.

In a follow-up PR, I am planning to replace the usage of mem2reg pass
with this to solve the reference type `alloca` problems described in
#81575.
2024-02-27 14:00:43 -08:00
David Green
f42e321b9f
[AArch64] Use FMOVDr for clearing upper bits (#83107)
This adds some tablegen patterns for generating FMOVDr from concat(X,
zeroes), as the FMOV will implicitly zero the upper bits of the
register. An extra AArch64MIPeepholeOpt is needed to make sure we can
remove the FMOV in the same way we would remove the insert code.
2024-02-27 19:45:43 +00:00
Sumanth Gundapaneni
f44c3facca
Revert "[Hexagon] Optimize post-increment load and stores in loops. (#82418)" (#83151)

This reverts commit d62ca8def395ac165f253fdde1d93725394a4d53.
2024-02-27 12:50:22 -06:00
Billy Laws
abc693fb40
[AArch64] Skip over shadow space for ARM64EC entry thunk variadic calls (#80994)
In an entry thunk, the x64 SP is passed in x4, but it cannot be passed
through directly because x64 varargs calls have a 32-byte shadow
store at SP followed by the in-stack parameters. ARM64EC varargs calls,
on the other hand, expect x4 to point to the first in-stack parameter.
2024-02-27 10:32:15 -08:00
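Below is a sketch of the pointer adjustment implied above, with the stack modeled as a Python dict from address to value. The 32-byte shadow store comes from the commit message. The 8-byte slot size and all names are assumptions for illustration, not the ARM64EC thunk code.

```python
SHADOW_STORE_BYTES = 32  # x64 varargs calls: 32-byte shadow store at SP (from the commit message)
SLOT_BYTES = 8           # assumption: each in-stack argument occupies one 8-byte slot

def first_stack_arg_address(x64_sp: int) -> int:
    """Hypothetical helper: the address ARM64EC varargs code expects in x4."""
    return x64_sp + SHADOW_STORE_BYTES

def read_stack_args(stack: dict, x4: int, count: int) -> list:
    """Read `count` in-stack arguments starting at the address held in x4."""
    return [stack[x4 + i * SLOT_BYTES] for i in range(count)]

# Model an x64 caller: SP points at the shadow store, and parameters follow it.
sp = 0x1000
stack = {sp + off: "shadow" for off in range(0, SHADOW_STORE_BYTES, SLOT_BYTES)}
stack[sp + SHADOW_STORE_BYTES] = "stack_arg0"
stack[sp + SHADOW_STORE_BYTES + SLOT_BYTES] = "stack_arg1"

# Passing SP through unchanged would read shadow-store slots instead of arguments.
assert read_stack_args(stack, sp, 2) == ["shadow", "shadow"]
assert read_stack_args(stack, first_stack_arg_address(sp), 2) == ["stack_arg0", "stack_arg1"]
```
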
Michael Maitland
9106b58ce4 [CodeGen][MISched] Add misched post-regalloc bottom-up scheduling
The bottom-up direction may lead to performance improvements on certain
targets, as it does for the pre-regalloc GenericScheduler. This patch gives
people the
opportunity to experiment for their sub-targets. However, this patch
keeps the top-down approach as the default for the PostGenericScheduler
since that is what subtargets expect today.
2024-02-27 09:56:28 -08:00
choikwa
04db60d150
[AMDGPU] Prevent hang in SIFoldOperands by caching uses (#82099)
foldOperands() for REG_SEQUENCE has recursion that can trigger an infinite loop,
as the method can modify the operand order, which breaks the range-based
for loop. This patch fixes the issue by caching the uses for processing beforehand
and then iterating over the cache rather than using the instruction iterator.
2024-02-27 09:13:59 -06:00
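The hang follows a general pattern: a loop iterates over a collection while the loop body mutates it. This Python sketch shows the fix of snapshotting the uses first. A list stands in for the use list and the folding callback is hypothetical; this is not the SIFoldOperands implementation.

```python
def fold_all(uses, fold_use):
    """Process every original use once, even if fold_use reorders or extends `uses`."""
    # Cache the uses up front. Iterating the live list while fold_use mutates it
    # could revisit elements or, with the mutation below, never terminate.
    for use in list(uses):
        fold_use(use, uses)

def fold_use_that_reorders(use, uses):
    # Hypothetical folding step: moves the processed use to the front and adds a
    # new use, the kind of mutation that breaks a live iterator.
    uses.remove(use)
    uses.insert(0, use)
    uses.append(f"{use}'")

processed = []

def recording_fold(use, uses):
    processed.append(use)
    fold_use_that_reorders(use, uses)

uses = ["a", "b", "c"]
fold_all(uses, recording_fold)
assert processed == ["a", "b", "c"]  # each original use is visited exactly once
```
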
Simon Pilgrim
13c359aa9b
[X86] ReplaceNodeResults - truncate sub-128-bit vectors as shuffles directly (#83120)
We were scalarizing these truncations, but in most cases we can widen the source vector to 128-bits and perform the truncation as a shuffle directly (which will usually lower as a PACK or PSHUFB).

For the cases where the widening and shuffle isn't legal we can leave it to generic legalization to scalarize for us.

Fixes #81883
2024-02-27 15:03:42 +00:00
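A truncation can be a shuffle because, on a little-endian target, truncating each lane keeps its low byte. The result is then just a byte selection from the widened register, which PSHUFB-style instructions can perform. The Python sketch below assumes a v4i16 to v4i8 truncation widened to 128 bits, with the padding lanes modeled as zero. The names are hypothetical.

```python
import struct

def truncate_v4i16_to_v4i8(lanes):
    """Reference semantics: keep the low 8 bits of each 16-bit lane."""
    return [x & 0xFF for x in lanes]

def widen_then_byte_shuffle(lanes):
    """Same result expressed as a byte shuffle of a widened 128-bit register."""
    # Widen the 64-bit source to 128 bits. The padding lanes are don't-care
    # values in a real lowering and are modeled here as zero.
    reg = struct.pack("<8H", *lanes, 0, 0, 0, 0)
    mask = [0, 2, 4, 6]  # result byte i comes from reg[mask[i]]
    return [reg[i] for i in mask]

v = [0x1122, 0xAABB, 0x00FF, 0x3400]
assert truncate_v4i16_to_v4i8(v) == widen_then_byte_shuffle(v)
```
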
Paul Walker
900bea9b1c [LLVM][test] Convert remaining instances of ConstantExpr based splats to use splat().
This is mostly NFC but some output does change due to consistently
inserting into poison rather than undef and using i64 as the index
type for inserts.
2024-02-27 13:37:23 +00:00
Paul Walker
dbb65dd330 [LLVM][tests/CodeGen/RISCV] Convert instances of ConstantExpr based splats to use splat().
This is mostly NFC but some output does change due to consistently
inserting into poison rather than undef and using i64 as the index
type for inserts.
2024-02-27 13:37:23 +00:00
Paul Walker
d6ff986dd2 [LLVM][tests/CodeGen/AArch64] Convert instances of ConstantExpr based splats to use splat().
This is mostly NFC but some output does change due to consistently
inserting into poison rather than undef and using i64 as the index
type for inserts.
2024-02-27 13:37:23 +00:00
Matt Arsenault
ca66f7469f AMDGPU: Merge tests for llvm.amdgcn.dispatch.id 2024-02-27 18:42:40 +05:30
Matt Arsenault
2e4643a53e AMDGPU: Regenerate baseline test checks 2024-02-27 18:42:40 +05:30
michaelselehov
56ad6d1939
[MachineLICM] Hoist COPY instruction only when user can be hoisted (#81735)
befa925acac8fd6a9266e introduced preliminary hoisting of COPY
instructions when the user of the COPY is inside the same loop. That
optimization turned out to be too aggressive and hoisted too many COPYs,
greatly increasing register pressure and causing performance regressions
for the AMDGPU target.

This is intended to fix the regression by hoisting a COPY instruction only
if either:
 - the user of the COPY can be hoisted (its other arguments are invariant)
 or
 - hoisting the COPY doesn't cause high register pressure
2024-02-27 12:31:29 +00:00
Dhruv Chawla (work)
2c9b6c1b36
[AArch64][GlobalISel] Improve codegen for G_VECREDUCE_{SMIN,SMAX,UMIN,UMAX} for odd-sized vectors (#82740)
i8 vectors do not have their sizes changed as I noticed regressions in
some tests when that was done.

This patch also adds support for most G_VECREDUCE_* operations to
moreElementsVector in LegalizerHelper.cpp.

The code for getting the "neutral" element is taken almost exactly as it
is in SelectionDAG, with the exception that support for
G_VECREDUCE_{FMAXIMUM,FMINIMUM} was not added.

The code for SelectionDAG is located at
SelectionDAG::getNeutralElement().
2024-02-27 15:57:46 +05:30
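A neutral element lets an odd-sized vector be widened without changing the reduction result, because the extra lanes hold a value that can never win the min or max. This Python model assumes that widening pads with the neutral element and treats lanes as mathematical integers. The helper names are hypothetical.

```python
def neutral_element(op, bits):
    """Identity value for a min/max reduction over `bits`-wide integers."""
    if op == "smin":
        return (1 << (bits - 1)) - 1   # signed maximum: never the smaller value
    if op == "smax":
        return -(1 << (bits - 1))      # signed minimum: never the larger value
    if op == "umin":
        return (1 << bits) - 1         # all ones: the unsigned maximum
    if op == "umax":
        return 0                       # the unsigned minimum
    raise ValueError(f"unsupported reduction: {op}")

REDUCE = {"smin": min, "smax": max, "umin": min, "umax": max}

def reduce_widened(op, lanes, bits, width):
    """Pad `lanes` to `width` elements with the neutral element, then reduce."""
    padded = lanes + [neutral_element(op, bits)] * (width - len(lanes))
    return REDUCE[op](padded)

signed3 = [5, -7, 3]    # a 3-element (odd-sized) vector of i16 values
unsigned3 = [5, 9, 3]
for op, lanes in (("smin", signed3), ("smax", signed3),
                  ("umin", unsigned3), ("umax", unsigned3)):
    assert reduce_widened(op, lanes, 16, 4) == REDUCE[op](lanes)
```
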
Vyacheslav Levytskyy
ada70f50a5
[SPIR-V]: add SPIR-V extension: SPV_INTEL_variable_length_array (#83002)
This PR adds the SPIR-V extension SPV_INTEL_variable_length_array, which
allows allocating local arrays whose number of elements is unknown at
compile time:
* add a new SPIR-V internal intrinsic: int_spv_alloca_array
* legalize G_STACKSAVE and G_STACKRESTORE
* implement allocation of arrays (previously getArraySize() of
AllocaInst was not used)
* add tests
2024-02-27 10:58:45 +01:00
Vyacheslav Levytskyy
9796b0e9f9
Add support for the 'freeze' instruction (#82979)
This PR is to add support for the 'freeze' instruction:
https://llvm.org/docs/LangRef.html#freeze-instruction

There is no way to implement `freeze` correctly without support on
the SPIR-V standard side, but we can at least address a simple (static) case
when the presence of an undef/poison value is obvious. The main benefit of even
incomplete `freeze` support is preventing translation from crashing
due to missing support in the legalization and instruction selection steps.
2024-02-27 10:58:04 +01:00
leecheechen
d7c80bba69
[llvm][LoongArch] Improve loongarch_lasx_xvpermi_q intrinsic (#82984)
For instruction xvpermi.q, only [1:0] and [5:4] bits of operands[3] are
used. The unused bits in operands[3] need to be set to 0 to avoid
causing undefined behavior.
2024-02-27 15:38:11 +08:00
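Keeping only bits [1:0] and [5:4] is the same as masking the 8-bit immediate with 0x33. Below is a minimal Python sketch. The helper name is hypothetical and is not the LoongArch backend's code.

```python
USED_BITS_MASK = 0b0011_0011  # bits [5:4] and [1:0]; all other bits are cleared

def sanitize_xvpermi_q_imm(imm: int) -> int:
    """Keep only the immediate bits that xvpermi.q reads."""
    return imm & USED_BITS_MASK

assert USED_BITS_MASK == 0x33
assert sanitize_xvpermi_q_imm(0xFF) == 0x33
assert sanitize_xvpermi_q_imm(0b1100_1100) == 0
```
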
YunQiang Su
c88beb4112
MIPS: Fix asm constraints "f" and "r" for softfloat (#79116)
This includes 2 fixes:
        1. Disallow 'f' for softfloat.
        2. Allow 'r' for softfloat.

Currently, 'f' is accepted by clang, and then LLVM hits an internal error.

'r' is rejected by LLVM with: couldn't allocate input reg for constraint
'r'.

Fixes: #64241, #63632

---------

Co-authored-by: Fangrui Song <i@maskray.me>
2024-02-26 22:08:36 -08:00
Matt Arsenault
e7900e695e AMDGPU: Regenerate baseline mir tests 2024-02-27 10:44:53 +05:30
Craig Topper
62d0c01c2c
[SelectionDAG] Remove pointer from MMO for VP strided load/store. (#82667)
MachineIR alias analysis assumes that only bytes after the pointer will
be accessed. This is incorrect if the stride is negative.

This is causing miscompiles in our downstream after SLP started making
strided loads.

Fixes #82657
2024-02-26 16:15:34 -08:00
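This Python sketch shows why the pointer-only MMO was unsafe: with a negative stride, a strided access touches bytes below the base pointer. The range computation and names are illustrative assumptions, not MachineIR's alias analysis.

```python
def accessed_byte_range(base, stride, elem_size, count):
    """Half-open [lo, hi) byte range touched by a strided access of `count` (>= 1) elements."""
    addrs = [base + i * stride for i in range(count)]
    return min(addrs), max(addrs) + elem_size

# Positive stride: every access is at or after the base pointer.
lo, hi = accessed_byte_range(base=0x100, stride=8, elem_size=4, count=4)
assert (lo, hi) == (0x100, 0x11C)

# Negative stride: accesses run below the base pointer, so an MMO that assumes
# only bytes after the pointer are accessed describes the wrong range.
lo, hi = accessed_byte_range(base=0x100, stride=-8, elem_size=4, count=4)
assert lo < 0x100
assert (lo, hi) == (0xE8, 0x104)
```
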