52796 Commits

Author SHA1 Message Date
Luke Lau
208edf7672 [RISCV] Fix assertion in lowerEXTRACT_SUBVECTOR
This fixes a crash when lowering an extract_subvector like:

t0:v1i64 = extract_subvector t1:v2i64, 1

Whilst we never need a vslidedown with M1 on scalable vector types, we might
need to do it for v1i64/v1f64, since the smallest container type for it is
nxv1i64/nxv1f64.

The lowering code is still correct for this case, but the assertion was too
strict. The actual invariant we're relying on is that ContainerSubVecVT's LMUL
<= M1, not < M1. Hence why we handled v2i32 fine, because its container type
was nxv1i32 and MF2.
2024-02-13 20:40:31 +08:00
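
The relaxed invariant in 208edf7672 can be illustrated with a small sketch. It assumes that the LMUL of a scalable container type is its known-minimum size divided by a 64-bit block; the helper name `lmul` and the constant name are hypothetical and are not taken from the patch.

```python
from fractions import Fraction

# Assumption for this sketch: one LMUL=1 register corresponds to a 64-bit block
# of known-minimum size.
RVV_BITS_PER_BLOCK = 64

def lmul(min_elts, elt_bits):
    """Return the LMUL of the scalable type nxv<min_elts>i<elt_bits> as a fraction."""
    return Fraction(min_elts * elt_bits, RVV_BITS_PER_BLOCK)

# Container for v2i32 is nxv1i32: a fractional LMUL (MF2), so "< M1" held.
print(lmul(1, 32))
# Container for v1i64 is nxv1i64: exactly M1, so only "<= M1" holds.
print(lmul(1, 64))
```

Under this model the old `< M1` check rejects the nxv1i64 container even though the lowering itself is still valid for it.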
Orlando Cazalet-Hyams
d860ea96b1
[HWASAN] Update dbg.assign intrinsics in HWAsan pass (#79864)
llvm.dbg.assign intrinsics have 2 {value, expression} pairs; fix hwasan to
update the second expression.

Fixes #76545. This is #78606 rebased and with the addition of DPValue handling.
Note the addition of --try-experimental-debuginfo-iterators in the tests and
some shuffling of code in MemoryTaggingSupport.cpp.
2024-02-13 09:11:09 +00:00
Nikita Popov
070848c17c
[AArch64][GISel] Don't pointlessly lower G_TRUNC (#81479)
If we have something like G_TRUNC from v2s32 to v2s16, then lowering
this to a concat of two G_TRUNC s32 to s16 followed by G_TRUNC from
v2s16 to v2s8 does not bring us any closer to legality. In fact, the
first part of that is a G_BUILD_VECTOR whose legalization will produce a
new G_TRUNC from v2s32 to v2s16, and both G_TRUNCs will then get
combined to the original, causing a legalization cycle.

Make the lowering condition more precise, by requiring that the original
vector is >128 bits, which is I believe the only case where this
specific splitting approach is useful.

Note that this doesn't actually produce a legal result (the alwaysLegal
is a lie, as before), but it will cause a proper globalisel abort
instead of an infinite legalization loop.

Fixes https://github.com/llvm/llvm-project/issues/81244.
2024-02-13 09:29:56 +01:00
Pierre van Houtryve
87d7711934
[AMDGPU][SIMemoryLegalizer] Fix order of GL0/1_INV on GFX10/11 (#81450)
Fixes SWDEV-443292
2024-02-13 09:07:51 +01:00
sstipanovic
785eddd7a7
[AMDGPU][GlobalIsel] Introduce isRegisterClassType to check for legal types, instead of checking bit width. (#68189)
In D151116 it was suggested to have a set of classes to cover every
possible case. This does it for bitcast first.

closes #79578
2024-02-13 08:26:10 +01:00
Austin Kerbow
4bcbeaed63
[AMDGPU] Enable kernel arg preloading with gfx90a (#81180)
Add a trap instruction to the beginning of the kernel prologue to handle
cases where preloading is attempted on HW loaded with incompatible
firmware.
2024-02-12 22:33:29 -08:00
Luke Lau
bb77047a3b
[RISCV] Handle fixed length vectors with exact VLEN in lowerEXTRACT_SUBVECTOR (#79949)
This is a revival of #65392. When we lower an extract_subvector, we extract the
subregister that the subvector is contained in first and then do a vslidedown
with LMUL=1. We can currently only do this for scalable vectors though, because
the index is scaled by vscale and thus we will know what subregister the
subvector lies in.

For fixed length vectors, the index isn't scaled by vscale and so the subvector
could lie in any arbitrary subregister, so we have to do a vslidedown with the
full LMUL.

The exception to this is when we know the exact VLEN: in which case, we can
still work out the exact subregister and do the LMUL=1 vslidedown on it.

This patch handles this case by scaling the index by 1/vscale before computing
the subregister, and extending the LMUL=1 path to handle fixed length vectors.
2024-02-13 14:29:08 +08:00
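
The index arithmetic described in bb77047a3b can be sketched roughly as follows. This is a simplified model, not the code from the patch: it assumes the exact VLEN and the element width are known, and the function name `split_extract_index` is hypothetical.

```python
def split_extract_index(idx, vlen_bits, sew_bits):
    """Split a fixed-length element index into (subregister, slide amount).

    Simplified model: with an exactly known VLEN, each LMUL=1 register holds
    vlen_bits // sew_bits elements, so the containing register and the offset
    inside it follow from a division.
    """
    if vlen_bits % sew_bits != 0:
        raise ValueError("VLEN must be a multiple of the element width")
    elts_per_reg = vlen_bits // sew_bits
    # Quotient selects the subregister; remainder is the LMUL=1 slide amount.
    return divmod(idx, elts_per_reg)

# VLEN=128 with 32-bit elements gives 4 elements per register, so element 6
# lives in subregister 1 at offset 2.
print(split_extract_index(6, 128, 32))
```

Without a known VLEN the quotient cannot be computed at compile time, which is why the full-LMUL vslidedown is needed in that case.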
Fangrui Song
78f2eb8d0f [test] Replace aarch64-*-{eabi,gnueabi}{,hf} with aarch64 2024-02-12 18:36:31 -08:00
Fangrui Song
3d18c8cd26 [test] Replace aarch64-*-{eabi,gnueabi}{,hf} with aarch64
Similar to d39b4ce3ce8a3c256e01bdec2b140777a332a633
Using "eabi" or "gnueabi" for aarch64 targets is a common mistake that the
Clang driver warns about. We want to avoid them elsewhere as well. Just
use the common "aarch64" without other triple components.
2024-02-12 18:29:55 -08:00
Artem Belevich
61a0fc7947
[NVPTX] pass correct GPU arch to ptxas test (#81535) 2024-02-12 13:18:08 -08:00
Artem Belevich
8799d7143f
[NVPTX] Fix the error in a pattern match in v4i8 comparisons. (#81308)
The replacement should've had BFE() as the arguments for the comparison,
not the source register.

While at that, tighten the patterns a bit, and expand them to cover
variants with immediate arguments. Also change the default lowering of
bfe() to use unsigned variant, so the value of the upper bits is
predictable.
2024-02-12 12:59:03 -08:00
Craig Topper
1114ac4399
[RISCV] Remove stale comment from test. NFC (#81098)
The bug mentioned in the comment has been committed and did change the
cfi_offset.
2024-02-12 09:19:28 -08:00
Andrei Safronov
b5046a7fa9
[Xtensa] Initial codegen support from IR (#78548)
This PR provides implementation of the basic codegen infra such as
TargetFrameLowering, MCInstLower,
AsmPrinter, RegisterInfo, InstructionInfo, TargetLowering,
SelectionDAGISel.

Migrated from https://reviews.llvm.org/D145658
2024-02-12 17:41:59 +01:00
Nikita Popov
69ddf1eb4d [X86] Add test for #80911 (NFC) 2024-02-12 16:40:43 +01:00
Antonio Frighetto
8373ceef8f [CGP] Extend dupRetToEnableTailCallOpts to known intrinsics
Hint further tail call optimization opportunities when the examined
returned value is the return value of a known intrinsic or library
function, and it appears as first function argument.

Fixes: https://github.com/llvm/llvm-project/issues/75455.
2024-02-12 14:17:02 +01:00
Antonio Frighetto
d1c481d27d [CGP] Precommit tests for PR76613 (NFC) 2024-02-12 14:17:02 +01:00
Joseph Huber
2ac8e6b7f5
[NVPTX] Implement __builtin_readcyclecounter on NVPTX (#81344)
Summary:
This patch simply states that `__builtin_readcyclecounter` is legal on
NVPTX and makes it return the value from the `clock64` sreg. The timer
intrinsics are marked as having side effects, which is desirable for
timing primitives and required to pattern match the intrinsic DAG.
2024-02-12 07:07:48 -06:00
Serge Pavlov
213b0ae497
[GlobalISel][ARM] legalize G_FPENV_RESET for soft-float mode (#81456) 2024-02-12 17:46:59 +07:00
Vyacheslav Levytskyy
d153ef6a34
Add support for SPIR-V extension: SPV_INTEL_function_pointers (#80759)
This PR adds initial support for "SPV_INTEL_function_pointers" SPIR-V
extension:
https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc

The goal of the extension is to support indirect function calls and
translation of function pointers into SPIR-V.
2024-02-12 11:22:48 +01:00
Pierre van Houtryve
f93aa5157a
[AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (#76955)
These generic targets include multiple GPUs and will, in the future,
provide a way to build once and run on multiple GPUs, at the cost of fewer
optimization opportunities.

Note that this is just doing the compiler side of things; device libs and
runtimes/loader/etc. don't know about these targets yet, so none of them
actually work in practice right now. This is just the initial commit to
make LLVM aware of them.

This contains the documentation changes for both this change and #76954
as well.
2024-02-12 10:18:20 +01:00
Vyacheslav Levytskyy
b221b97336
Add support for SPIR-V extension: SPV_INTEL_subgroups (#81023)
The goal of this PR is to implement SPV_INTEL_subgroups extension in
SPIR-V Backend.
2024-02-12 10:05:21 +01:00
Pierre van Houtryve
1e36d92b70
[LowerMemIntrinsics] Avoid udiv/urem when type size is a power of 2 (#81238)
See #64620 - does not fix the issue but improves the generated code a
bit.
2024-02-12 10:01:22 +01:00
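
The idea behind 1e36d92b70 can be shown with a minimal sketch: when the loop operand type size is a power of 2, the loop trip count and the residual byte count can be computed with a shift and a mask instead of a division and a remainder. The function name `split_length` is hypothetical and the code only models the arithmetic, not the IR the pass emits.

```python
def split_length(length, type_size):
    """Return (loop_count, residual_bytes) for copying `length` bytes
    in chunks of `type_size` bytes."""
    if type_size <= 0:
        raise ValueError("type_size must be positive")
    if type_size & (type_size - 1) == 0:
        # Power of 2: divide with a right shift, take the remainder with a mask.
        shift = type_size.bit_length() - 1
        return length >> shift, length & (type_size - 1)
    # General case still needs a division and a remainder.
    return length // type_size, length % type_size

print(split_length(37, 8))  # shift/mask path
print(split_length(37, 6))  # udiv/urem path
```

Both paths return the same pair for any power-of-2 size; only the cost of the operations differs.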
Nikita Popov
92d7992205
[AArch64] Only apply bool vector bitcast opt if result is scalar (#81256)
This optimization tries to optimize bitcasts from `<N x i1>` to iN, but
currently also triggers for `<N x i1>` to `<M x iK>` bitcasts, if custom
lowering has been requested for these for an unrelated reason. Fix this
by explicitly checking that the result type is scalar.

Fixes https://github.com/llvm/llvm-project/issues/81216.
2024-02-12 10:00:34 +01:00
Adrian Kuegel
da9559d69a
Do not use PerformEXTRACTCombine for v8i8 types (#81242)
Same as with v4i8 types, we should not be using PerformEXTRACTCombine
for v8i8 types.
2024-02-12 07:31:31 +01:00
David Green
b1771475da [AArch64][GlobalISel] Additional insert and extract GISel tests. NFC 2024-02-11 22:25:16 +00:00
Jon Roelofs
ffab5a089b
Add a test for the A16/A17 parts of eb1b428750181ea742c547db0bc7136cd5b8f732
There are a couple of open questions on what we should do for A14, so I'll
leave that off for now.

https://github.com/llvm/llvm-project/pull/81325#issuecomment-1937489565
2024-02-11 10:51:51 -08:00
Simon Pilgrim
b45de48be2
[MVE] Expand64BitShift - handle all constant shift amounts less than 32 (#81261)
Expand64BitShift was always dropping to generic shift legalization if the shift amount type was larger than i64, even if the constant shift amount was actually very small. I've adjusted the constant bounds checks to work with APInt types so we can always perform the comparison.

This results in the MVE long shift instructions being used more often, and it looks like this is preventing some additional combines from happening. This could be addressed in the future.

This came about while I was trying to extend the DAGTypeLegalizer::ExpandShift* helpers and need to move to consistently using the legal shift amount types instead of reusing the shift amount type from the original wider shift.
2024-02-11 15:02:27 +00:00
David Green
c3dfbb6f49
[AArch64][GlobalISel] Add commute_constant_to_rhs to post legalizer combiners (#81103)
This helps the fp reductions, moving the constant operands to the RHS
which in turn helps simplify away fadd -0.0 and fmul 1.0.
2024-02-11 11:20:11 +00:00
Koakuma
c2f9885a8a
[SPARC] Support reserving arbitrary general purpose registers (#74927)
This adds support for marking arbitrary general purpose registers -
except for those with special purpose (G0, I6-I7, O6-O7) - as reserved,
as needed by some software like the Linux kernel.
2024-02-11 02:04:18 -05:00
darkbuck
d0f4663f48
[GlobalISel][Mips] Global ISel for brcond
- Enable equivalence between `brcond` and `G_BRCOND`.
- Remove the manual selection of `G_BRCOND` in Mips. Revise test cases.

Reviewers: petar-avramovic, bcardosolopes, arsenm

Reviewed By: arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/81306
2024-02-10 21:44:05 -05:00
Ikhlas Ajbar
76e3759d8d
[Hexagon] Order objects on the stack by their alignments (#81280)
This patch sorts stack objects by their alignment value from the largest
to the smallest. If two objects have the same alignment, then they are
sorted by their size from the largest to the smallest. This minimizes
padding and reduces run time stack size.
2024-02-10 14:42:50 -06:00
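
The effect of the ordering in 76e3759d8d can be illustrated with a small sketch. It models stack objects as hypothetical `(size, alignment)` pairs laid out at increasing offsets; the real frame lowering is more involved, and the helper names here are illustrative only.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment

def frame_size(objects):
    """Total bytes used when (size, alignment) objects are placed in order."""
    offset = 0
    for size, alignment in objects:
        offset = align_up(offset, alignment) + size
    return offset

def sort_by_alignment(objects):
    """Largest alignment first; ties broken by largest size first."""
    return sorted(objects, key=lambda obj: (obj[1], obj[0]), reverse=True)

objects = [(1, 1), (8, 8), (2, 2), (4, 4)]
print(frame_size(objects))                     # original order, with padding
print(frame_size(sort_by_alignment(objects)))  # sorted order, no padding
```

In this example the original order needs padding before the 8-byte and 4-byte objects, while the sorted order packs all four objects back to back.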
Yeting Kuo
59037c0975
[RISCV] Add Zicfiss support to the shadow call stack implementation. (#68075)
This patch enables the hardware shadow stack with `Zicfiss` and
`mno-forced-sw-shadow-stack`. The new feature forced-sw-shadow-stack
disables the hardware shadow stack even when `Zicfiss` is enabled.
2024-02-10 22:18:46 +08:00
Craig Topper
c08b90c50b [RISCV] Lower the TransientStackAlignment to the ABI alignment for rv32e/rv64e.
I don't think the transient alignment needs to be larger than the
ABI alignment.
2024-02-09 21:48:11 -08:00
Mikhail Gudim
7192c22ee4
[GlobalISel][RISCV] Use constant pool for large integer constants. (#81101)
We apply custom lowering to 64 bit constants where we use the same logic
as in non-global isel: if materializing in registers is too expensive,
we emit a load from the constant pool. Later, during instruction selection,
the constant pool address is generated using `selectAddr`.
2024-02-10 00:42:33 -05:00
Philipp Tomsich
fbba818a78
[AArch64] Add the Ampere1B core (#81297)
The Ampere1B is Ampere's third-generation core implementing a
superscalar, out-of-order microarchitecture with nested virtualization,
speculative side-channel mitigation and architectural support for
defense against ROP/JOP style software attacks.

Ampere1B is an ARMv8.7+ implementation, adding support for the FEAT
WFxT, FEAT CSSC, FEAT PAN3 and FEAT AFP extensions. It also includes all
features of the second-generation Ampere1A, such as the Memory Tagging
Extension and SM3/SM4 cryptography instructions.
2024-02-09 15:22:09 -08:00
choikwa
0b77b19292
[AMDGPU] Add test to show s_cselect generation from uniform select (#79384) 2024-02-09 14:10:04 -08:00
Joseph Huber
3c707310a3
[NVPTX] Add clang builtin for __nvvm_reflect intrinsic (#81277)
Summary:
Some recent support made usage of `__nvvm_reflect` more consistent. We
should expose it as a builtin rather than forcing users to externally
define the function.
2024-02-09 14:11:01 -06:00
Joseph Huber
bb180856ec [NVPTX][Fix] Update minimum CPU for NVPTX intrinsics test
Summary:
This test requires at least sm_30 to run, but that is still below the
minimum supported version of sm_52 currently. Just set this to sm_60 so
the tests pass in the future.
2024-02-09 14:05:40 -06:00
Craig Topper
7ad7db0d99 [RISCV] Fix typo in ABI name in test. NFC
ilp64->lp64.
2024-02-09 11:46:23 -08:00
Joseph Huber
07dc85ba0c
[NVVMReflect] Improve folding inside of the NVVMReflect pass (#81253)
Summary:
The previous patch did very simple folding that only worked for directly
used branches. This patch improves this by traversing the use-def chain
to simplify every constant subexpression until it reaches a terminator
we can delete. The support should work for all expected cases now.
2024-02-09 13:39:03 -06:00
Philip Reames
5948d4de1d [RISCV] Add test coverage for buildvectors with long vslidedown sequences
In advance of an upcoming change.
2024-02-09 11:10:35 -08:00
Pranav Kant
2e4d2762b5
[X86][CodeGen] Emit float128 libcalls for math functions (#79611)
Make LLVM emit libcalls to proper float128 variants for float128 types.
2024-02-09 10:55:56 -08:00
Simon Pilgrim
9ba265636f [X86] ReplaceNodeResults - shrink i64 CTPOP to (shifted) CTPOP i32 if 32 or less active bits to avoid SSE2 codegen
32-bit targets perform i64 CTPOP as a v2i64 CTPOP - if we can perform this as a i32 CTPOP by shifting the source bits, then do so to avoid the gpr<->xmm

This also triggers on non-SSE2 capable targets, as can be seen with the minor codegen diffs in ctpop_shifted_mask16
2024-02-09 12:24:09 +00:00
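
The narrowing described in 9ba265636f can be sketched as below. This is only a model of the arithmetic: `possible_bits` is a hypothetical stand-in for the compiler's known-bits information, and the function name is not taken from the patch.

```python
def ctpop64_via_32(value, possible_bits):
    """Count set bits of a 64-bit value, narrowing to 32 bits when possible.

    possible_bits is a mask of every bit position that may be set in value
    (an assumption standing in for known-bits analysis).
    """
    if value & ~possible_bits:
        raise ValueError("value has bits outside possible_bits")
    if possible_bits == 0:
        return 0
    # Number of trailing zero bits in the mask.
    shift = (possible_bits & -possible_bits).bit_length() - 1
    active_width = possible_bits.bit_length() - shift
    if active_width > 32:
        # Active window is too wide: fall back to the full 64-bit count.
        return bin(value).count("1")
    # Shift the active window down to bit 0 and count within 32 bits.
    narrowed = (value >> shift) & 0xFFFFFFFF
    return bin(narrowed).count("1")

# Bits 16..47 may be set, so the count fits in a shifted 32-bit popcount.
print(ctpop64_via_32(0x0000800100030000, 0x0000FFFFFFFF0000))
```

The shifted 32-bit count equals the 64-bit count whenever every possibly-set bit falls inside a 32-bit window.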
Simon Pilgrim
047f8321f1 [X86] ctpop-mask.ll - add 32-bit with SSE2 test coverage
32-bit targets will try to use SSE2 <2 x i64> CTPOP expansion for i64 CTPOP
2024-02-09 12:24:09 +00:00
Jan Patrick Lehr
f661057865
Revert "[AMDGPU] Compiler should synthesize private buffer resource descriptor from flat_scratch_init" (#81234)
Reverts llvm/llvm-project#79586

This broke the AMDGPU OpenMP Offload buildbot.
The typical error message was that the GPU attempted to read beyond the
largest legal address.

Error message:
AMDGPU fatal error 1: Received error in queue 0x7f8363f22000:
HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to
access memory beyond the largest legal address.
2024-02-09 09:57:38 +01:00
Diana Picus
bc6955f18c
[AMDGPU] Don't fix the scavenge slot at offset 0 (#79136)
At the moment, the emergency spill slot is a fixed object for entry
functions and chain functions, and a regular stack object otherwise.
This patch adopts the latter behaviour for entry/chain functions too. It
seems this was always the intention [1] and it will also save us a bit
of stack space in cases where the first stack object has a large
alignment.

[1]
34c8b835b1
2024-02-09 09:20:25 +01:00
DianQK
ccb46e8365
Reapply "[RegisterCoalescer] Clear instructions not recorded in ErasedInstrs but erased (#79820)"
This reverts commit 8316bf34ac21117f35bc8e6fafa2b3e7da75e1d5.
2024-02-09 15:58:48 +08:00
DianQK
8316bf34ac
Revert "[RegisterCoalescer] Clear instructions not recorded in ErasedInstrs but erased (#79820)"
This reverts commit 95b14da678f4670283240ef4cf60f3a39bed97b4.
2024-02-09 15:54:54 +08:00
Quentin Dian
95b14da678
[RegisterCoalescer] Clear instructions not recorded in ErasedInstrs but erased (#79820)
Fixes #79718. Fixes #71178.

The same instructions may exist in an iteration. We cannot immediately
delete instructions in `ErasedInstrs`.
2024-02-09 15:29:05 +08:00
Craig Topper
db88f30158 [RISCV] Add test for saving s10 with cm.push. NFC
If cm.push saves s10, it must also save s11 due to an encoding
limitation. We handle this in the code, but had no test for it.
2024-02-08 21:03:41 -08:00
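
The encoding limitation mentioned in db88f30158 can be modelled with a short sketch. It assumes the register list always starts with ra and covers a contiguous run s0..sN, with no form that ends at s10; the function name `cm_push_reglist` is hypothetical.

```python
def cm_push_reglist(highest_sreg):
    """Return the registers a cm.push would save so that s0..s<highest_sreg>
    are covered. Pass -1 when no s-register needs saving."""
    if not -1 <= highest_sreg <= 11:
        raise ValueError("highest_sreg must be between -1 and 11")
    count = highest_sreg + 1
    if count == 11:
        # Assumed encoding gap: there is no list ending at s10, so the list
        # is rounded up to include s11 as well.
        count = 12
    return ["ra"] + [f"s{i}" for i in range(count)]

print(cm_push_reglist(9))   # ends at s9
print(cm_push_reglist(10))  # rounded up, ends at s11
```

A request that stops at s10 therefore costs one extra saved register compared with what the function strictly needs.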