52796 Commits

Author SHA1 Message Date
Simon Pilgrim
cda2b01df7 [X86] combineUIntToFP - fold vXiY -> vXf16 using SINT_TO_FP(ZEXT())
AVX512 targets can just as easily use UINT_TO_FP/SINT_TO_FP, but pre-AVX512 targets only have SINT_TO_FP instructions.
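
The fold works because zero-extension clears the sign bit, so a signed conversion of the widened value gives the same result as an unsigned conversion of the original. A minimal Python sketch of that arithmetic, assuming 16-bit source elements widened to 32 bits; the helper names are illustrative and not LLVM APIs, and rounding to f16 is not modeled:

```python
def zext(value, from_bits):
    """Zero-extend: keep the low from_bits bits as a non-negative integer."""
    return value & ((1 << from_bits) - 1)


def as_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def uint_to_fp(value, bits):
    """Unsigned integer to floating point."""
    return float(value & ((1 << bits) - 1))


def sint_to_fp(value, bits):
    """Signed integer to floating point."""
    return float(as_signed(value, bits))


def uint_to_fp_via_sint(value, src_bits=16, wide_bits=32):
    """UINT_TO_FP expressed as SINT_TO_FP(ZEXT(value))."""
    return sint_to_fp(zext(value, src_bits), wide_bits)
```

Without the widening step the two conversions disagree for any input whose top bit is set, which is why the ZEXT is needed.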
2023-11-15 11:51:38 +00:00
chuongg3
692fbd6c00
[AArch64][GlobalISel] Support udot lowering for vecreduce add (#70784)
vecreduce_add(mul(ext, ext)) -> vecreduce_add(udot) 
vecreduce_add(ext) -> vecreduce_add(ext)

Applies to vectors with 8-bit scalar elements and an element count that is a multiple of 8.
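
As a rough illustration of why the first rewrite is sound, the sketch below models an unsigned dot product that accumulates four 8-bit products into each 32-bit lane and compares it with reducing the extended products directly. The function names are hypothetical and the lane model is an assumption about udot semantics, not code from the patch:

```python
MASK32 = 0xFFFFFFFF


def udot(acc, a, b):
    """Each 32-bit lane accumulates the products of four 8-bit element pairs."""
    assert len(a) == len(b) == 4 * len(acc)
    out = []
    for lane, start in enumerate(range(0, len(a), 4)):
        products = (x * y for x, y in zip(a[start:start + 4], b[start:start + 4]))
        out.append((acc[lane] + sum(products)) & MASK32)
    return out


def vecreduce_add(vec):
    """Sum all lanes, wrapping at 32 bits."""
    return sum(vec) & MASK32


def reduce_mul_ext(a, b):
    """vecreduce_add(mul(ext(a), ext(b))) computed directly."""
    return vecreduce_add([x * y for x, y in zip(a, b)])


def reduce_via_udot(a, b):
    """The same reduction with udot doing the multiply and partial sums."""
    return vecreduce_add(udot([0] * (len(a) // 4), a, b))
```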
2023-11-15 11:41:46 +00:00
Jay Foad
0b2c3c66e2 [AMDGPU] Add test case for issue #71685
The bug was fixed by #71710.
2023-11-15 11:23:03 +00:00
Jay Foad
1e8c17e9c7
[AMDGPU] Allow folding to FMAMK with SGPR and immediate operand on GFX10+ (#72258)
Allow foldImmediate to create instructions like:

  v_fmamk_f32 v0, s0, 0x42000000, v0

This instruction has two "scalar values": s0 and 0x42000000. On GFX10+
this is allowed. This fold was originally implemented before the
compiler supported GFX10, when all ASICs were limited to one scalar
value.
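
A simplified sketch of the legality check described above, assuming a limit of one scalar value before GFX10 and two on GFX10+. Classifying operands by their textual prefix is purely illustrative and is not how foldImmediate works:

```python
def scalar_value_count(source_operands):
    """Count distinct scalar values: SGPRs ("s...") and literals ("0x...")."""
    scalars = {
        op for op in source_operands
        if op.startswith("s") or op.startswith("0x")
    }
    return len(scalars)


def fold_allowed(source_operands, is_gfx10_plus):
    """Assumed limit: two scalar values on GFX10+, one on older targets."""
    limit = 2 if is_gfx10_plus else 1
    return scalar_value_count(source_operands) <= limit
```

For the instruction shown above, the source operands are `s0`, `0x42000000` and `v0`, so the fold is accepted only under the GFX10+ limit.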
2023-11-15 10:58:00 +00:00
Momchil Velikov
dedf2c6bb5
[AArch64] Refactor allocation of locals and stack realignment (#72028)
Factor out some stack allocation in a separate function. This patch
splits out the generic portion of a larger refactoring done as a part of
stack clash protection support.

The patch is almost, but not quite NFC. The only difference should
be that where we have adjacent allocation of stack space
for local SVE objects and non-local SVE objects the order
of `sub sp, ...` and `addvl sp, ...` instructions is reversed, because now
it's done with a single call to `emitFrameOffset` and it happens
to add/subtract the fixed part before the scalable part, e.g.

    addvl sp, sp, #-2
    sub sp, sp, #16, lsl #12
    sub sp, sp, #16

becomes

    sub sp, sp, #16, lsl #12
    sub sp, sp, #16
    addvl sp, sp, #-2
2023-11-15 09:27:01 +00:00
Qiu Chaofan
426ad99bb2
[PowerPC] Forbid f128 SELECT_CC optimized into fsel (#71497) 2023-11-15 12:20:06 +08:00
Craig Topper
c44ac52e7d [RISCV][GISel] Consider ABI copies when picking register bank for G_LOAD/STORE.
This is partially based on AArch64, but reduced to handle just the case
we currently have a test for.
2023-11-14 18:57:08 -08:00
Michael Maitland
a4f77f1ca3
[RISCV][GISEL] Use MO_PLT when Callee is a Global or Symbol (#71982)
SelectionDAG does the same thing in 74c83649547c2
2023-11-14 18:55:39 -05:00
Karthika Devi C
6726c99f88
[AArch64] Fix tryMergeAdjacentSTG function in PrologEpilog pass (#68873)
The tryMergeAdjacentSTG function tries to merge multiple
stg/st2g/stg_loop instructions. It doesn't verify the liveness of the NZCV
flags before moving STGloop around, which also alters the NZCV flags. This was
not an issue before patch 5e612bc, as these stack tag stores did not
alter the NZCV flags. But after that change, this merge function leads to
miscompilation because of the control flow change in the instructions. Added a
check to see if the first instruction after the insert point reads or
writes the NZCV flags, and to check their liveout state. This check happens
after the merge list is filled, just before the merge, and bails out if necessary.
2023-11-14 14:43:33 -08:00
Christudasan Devadasan
8f7e9f3793 [AMDGPU] Precommit lit test for #72140. 2023-11-15 02:18:16 +05:30
Changpeng Fang
011c9eeb9a
GlobalISel: Guard return in llvm::getIConstantSplatVal (#71989)
getIConstantVRegValWithLookThrough could return NULL.
2023-11-14 12:23:54 -08:00
AdityaK
b7669ed95f
Fix error message when regalloc eviction advisor analysis could not be created (#72165) 2023-11-14 11:17:17 -08:00
Michael Maitland
a7bbcc4690
[RISCV][GISEL] Add support for lowerFormalArguments that contain scalable vector types (#70882)
Scalable vector types from LLVM IR can be lowered to scalable vector
types in MIR according to the RISCVAssignFn.
2023-11-14 13:15:41 -05:00
Acim Maravic
01c1c7a19e
[AMDGPU][CodeGen] Update support (soffset + offset) s_buffer_load's (#68302)
getBaseWithConstantOffset() is used for scalar and non-scalar buffer
loads. The difference between the s_load and load instructions is that s_load
extends the 32-bit offset to 64 bits and performs the addition in 64 bits, so
a 32-bit (address + offset) must not cause unsigned 32-bit integer
wraparound.
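
A small arithmetic sketch of the wraparound concern, assuming the 32-bit values are zero-extended before the 64-bit addition. The function names are hypothetical:

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def add_wrapping_32(base, offset):
    """Addition as a 32-bit operation would see it (wraps at 2**32)."""
    return (base + offset) & MASK32


def add_extended_64(base, offset):
    """Addition after extending both 32-bit values to 64 bits."""
    return ((base & MASK32) + (offset & MASK32)) & MASK64


def split_is_safe(base, offset):
    """The base/offset split is only safe when both views agree."""
    return add_wrapping_32(base, offset) == add_extended_64(base, offset)
```

With `base = 0xFFFFFFF0` and `offset = 0x20` the 32-bit sum wraps to `0x10` while the 64-bit sum is `0x100000010`, so the two forms address different locations.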
2023-11-14 19:06:45 +01:00
Acim-Maravic
f3138524db
[AMDGPU] Generic lowering for rint and nearbyint (#69596)
There are three different rounding intrinsics that are lowered to the
same instruction.

Co-authored-by: Acim Maravic <acim.maravic@amd.com>
2023-11-14 18:49:21 +01:00
Qiongsi Wu
c8b11091e8
[SelectionDAG] Handling Oversized Alloca Types under 32 bit Mode to Avoid Code Generator Crash (#71472)
Situations may arise leading to negative `NumElements` argument of an
`alloca` instruction. In this case the `NumElements` is treated as a
large unsigned value. Such large arrays may cause the size constant to
overflow during code generation under 32 bit mode, leading to a crash.
This PR limits the constant's bit width to the width of the pointer on
the target. With this fix,
```
alloca i32, i32 -1
```
and
```
alloca [4294967295 x i32], i32 1
```
generate the exact same PowerPC assembly code under 32 bit mode.
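
The equivalence of the two forms can be checked with a little modular arithmetic. The sketch below models "limiting the constant's bit width" as masking to the pointer width, which is an assumption about the effect of the fix rather than a transcription of it:

```python
def alloca_size_bytes(elem_size, num_elements, ptr_bits=32):
    """Total allocation size, truncated to the pointer width.

    A negative num_elements is reinterpreted as a large unsigned value,
    mirroring the behaviour described in the commit message.
    """
    mask = (1 << ptr_bits) - 1
    count = num_elements & mask
    return (elem_size * count) & mask


# alloca i32, i32 -1
size_a = alloca_size_bytes(elem_size=4, num_elements=-1)

# alloca [4294967295 x i32], i32 1
size_b = alloca_size_bytes(elem_size=4294967295 * 4, num_elements=1)
```

Both sizes reduce to the same 32-bit constant, which is consistent with the two forms producing identical code.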
2023-11-14 10:52:51 -05:00
Momchil Velikov
33374c445d
[CFIFixup] Allow function prologues to span more than one basic block (#68984)
The CFIFixup pass assumes a function prologue is contained in a single
basic block. This assumption is broken with upcoming support for stack
probing (`-fstack-clash-protection`) in AArch64 - the emitted probing
sequence in a prologue may contain loops, i.e. more than one basic
block. The generated CFG is not arbitrary though:
* CFI instructions are outside of any loops
* for any two CFI instructions of the function prologue, one dominates
  and is post-dominated by the other

Thus, for the prologue CFI instructions, if one is executed then all are
executed, there is a total order of executions, and the last instruction
in that order can be considered the end of the prologue for the purpose
of inserting the initial `.cfi_remember_state` directive.

That last instruction is found by finding the first block in the
post-order traversal which contains prologue CFI instructions.
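
The search described above can be sketched on a toy CFG. This is an illustrative model only: the CFG is a plain dictionary, the block names are made up, and the real pass operates on MachineBasicBlocks:

```python
def post_order(cfg, entry):
    """Return the blocks reachable from entry in DFS post-order."""
    seen, order = set(), []

    def visit(block):
        seen.add(block)
        for succ in cfg.get(block, []):
            if succ not in seen:
                visit(succ)
        order.append(block)

    visit(entry)
    return order


def prologue_end_block(cfg, entry, blocks_with_prologue_cfi):
    """First block in post-order that contains prologue CFI instructions."""
    for block in post_order(cfg, entry):
        if block in blocks_with_prologue_cfi:
            return block
    return None


# Hypothetical prologue with a probing loop between two CFI-carrying blocks.
cfg = {
    "entry": ["probe_loop"],
    "probe_loop": ["probe_loop", "after_probe"],
    "after_probe": ["body"],
    "body": ["ret"],
    "ret": [],
}
end = prologue_end_block(cfg, "entry", {"entry", "after_probe"})
```

Here `end` is `after_probe`, the block that executes last among those carrying prologue CFI instructions.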
2023-11-14 15:02:03 +00:00
David Sherwood
bdc0afc871
[CodeGen][AArch64] Set min jump table entries to 13 for AArch64 targets (#71166)
There are some workloads that are negatively impacted by using jump
tables when the number of entries is small. The SPEC2017 perlbench
benchmark is one example of this, where increasing the threshold to
around 13 gives a ~1.5% improvement on neoverse-v1. I chose the minimum
threshold based on empirical evidence rather than science, and just
manually increased the threshold until I got the best performance
without impacting other workloads. For neoverse-v1 I saw around ~0.2%
improvement in the SPEC2017 integer geomean, and no overall change for
neoverse-n1. If we find issues with this threshold later on we can
always revisit this.

The most significant SPEC2017 score changes on neoverse-v1 were:

500.perlbench_r: +1.6%
520.omnetpp_r: +0.6%

and the rest saw changes < 0.5%.

I updated CodeGen/AArch64/min-jump-table.ll to reflect the new
threshold. For most of the affected tests I manually set the min number
of entries back to 4 on the RUN line because the tests seem to rely upon
this behaviour.
2023-11-14 13:00:28 +00:00
Simon Pilgrim
074e4ae0e7
[DAG] foldABSToABD - support abs(*ext(x) - *ext(y)) -> zext(abd*(x, y)) from different extension source types (#71670)
We currently limit the fold to cases where we're extending from the same source type, but we can safely perform this using the wider of mismatching source types (we're really just interested in having extension bits on both sources), ensuring we don't create additional extensions/truncations.
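
A brute-force check of the unsigned case, assuming an i8 and an i16 source with the absolute difference computed in the wider i16 type. The `abdu` helper is a hypothetical model of the node's semantics, and the signed case is not modeled here:

```python
def abdu(x, y, bits):
    """Unsigned absolute difference computed in a `bits`-wide type."""
    mask = (1 << bits) - 1
    x &= mask
    y &= mask
    return (max(x, y) - min(x, y)) & mask


def abs_of_extended_difference(x, y):
    """abs(zext(x) - zext(y)) evaluated in a type wide enough not to wrap."""
    return abs(x - y)


def fold_is_exact(x_bits=8, y_bits=16):
    """Compare both forms for every i8 value against a sample of i16 values."""
    wide = max(x_bits, y_bits)
    y_samples = list(range(0, 1 << y_bits, 257)) + [(1 << y_bits) - 1]
    return all(
        abs_of_extended_difference(x, y) == abdu(x, y, wide)
        for x in range(1 << x_bits)
        for y in y_samples
    )
```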
2023-11-14 12:56:42 +00:00
Simon Pilgrim
668454183a [X86] Regenerate expand-vp-int-intrinsics.ll
Add missing X86 checks
2023-11-14 12:48:52 +00:00
Diana
eb3c02fdc2
[AMDGPU] Use immediates for stack accesses in chain funcs (#71913)
Switch to using immediate offsets instead of the SP register to access
objects on the current stack frame in chain functions. This means we no
longer need to reserve a SP register just for accessing stack objects and
it also allows us to set the SP (when one is actually needed) to the
stack size from the very beginning.

This only works if we use a FixedObject for the ScavengeFI, which is
what we do for entry functions anyway (and we generally want to keep
chain functions close to amdgpu_cs behaviour where we don't have a good
reason to diverge).
2023-11-14 13:17:46 +01:00
Matthew Devereau
cc1244980b
[AArch64][SME2] Add ldr_zt, str_zt builtins and intrinsics (#71795)
Adds the builtins:
void svldr_zt(uint64_t zt, const void *rn)
void svstr_zt(uint64_t zt, void *rn)

And the intrinsics:
call void @llvm.aarch64.sme.ldr.zt(i32, ptr)
tail call void @llvm.aarch64.sme.str.zt(i32, ptr)

Patch by: Kerry McLaughlin <kerry.mclaughlin@arm.com>
2023-11-14 11:27:41 +00:00
Momchil Velikov
65eaec82c0
[CFIFixup] Precommit test ahead of multi-block prologues support (#72033)
Precommit test for https://github.com/llvm/llvm-project/pull/68984
2023-11-14 10:45:28 +00:00
Nikita Popov
56c1d30183
[IR] Remove support for lshr/ashr constant expressions (#71955)
Remove support for the lshr and ashr constant expressions. All places
creating them have been removed beforehand, so this just removes the
APIs and uses of these constant expressions in tests.

This is part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
2023-11-14 09:25:14 +01:00
Kai Luo
acdf7c8f27 [PowerPC] Precommit test to show impact of early-ifcvt on target without isel. NFC. 2023-11-14 06:10:05 +00:00
Craig Topper
028ed6125f
[RISCV][GISel] Support G_UMIN/UMAX/SMIN/SMAX legal with Zbb. (#72182) 2023-11-13 20:57:38 -08:00
Craig Topper
0a459dd4e9 [RISCV] Add tests for selecting G_BRCOND+G_ICMP. NFC
These should have been part of e0e0891d741588684b0803d7724e5080f9c75537
2023-11-13 15:29:34 -08:00
Craig Topper
29d75cb9dc [RISCV][GISel] Update legalize-smin.mir and legalize-smax.mir to test G_SMIN/G_SMAX.
Looks like an incomplete fixup was done after copying the umin/umax tests.
2023-11-13 13:49:14 -08:00
Craig Topper
915e092400 [RISCV] Select zext as sext when sign bit is 0 for -riscv-experimental-rv64-legal-i32
In our default SelectionDAG where i32 isn't legal, the zext will become
an i64 AND and often get optimized out on its own. With i32 legal, we
need to turn it into sext.w and rely on RISCVOptWInstrs to remove it.
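
The substitution is only valid when the sign bit is known to be zero, which a short sketch makes concrete. The helpers below model 32-to-64-bit extension on plain integers and are not LLVM code:

```python
def zext_32_to_64(value):
    """Zero-extend the low 32 bits to 64 bits."""
    return value & 0xFFFFFFFF


def sext_32_to_64(value):
    """Sign-extend the low 32 bits to 64 bits (result as an unsigned pattern)."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        return value | 0xFFFFFFFF00000000
    return value


def zext_equals_sext(value):
    """True when both extensions produce the same 64-bit pattern."""
    return zext_32_to_64(value) == sext_32_to_64(value)
```

The two extensions agree for `0x7FFFFFFF` and diverge for `0x80000000`, where the sign bit is set.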
2023-11-13 12:21:36 -08:00
Craig Topper
05300222ba
[RISCV][GISel] Add really basic support for FP regbank selection for G_LOAD/G_STORE. (#70896)
Coerce the register bank based on the users of the G_LOAD or the
defining instruction for the G_STORE.

s64 on rv32 is handled by forcing the FPRB register bank.
2023-11-13 12:12:16 -08:00
Craig Topper
70ce047f7e
[RISCV] Legalize G_CTLZ/G_CTLZ_ZERO_UNDEF/G_CTTZ/G_CTTZ_ZERO_UNDEF. (#72014)
The base ISA does not support these operations. A future patch will
enable them for Zbb.
2023-11-13 11:22:48 -08:00
Tom Stellard
877226f01f
[X86] Simplify regex in pr42616.ll test (#71980) 2023-11-13 11:05:42 -08:00
Craig Topper
d8576e4542 [RISCV][GISel] Update RV64 legalize-ctpop.mir to account for constant shift amounts being i64 now.
This changed while the ctpop patch was in review and I forgot to update it.
2023-11-13 10:38:36 -08:00
Craig Topper
90dd4c470f
[RISCV][GISel] Legalize G_CTPOP. (#72005)
The base ISA does not have an instruction for this so we need to lower.
Zbb support will come in a future patch.
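
For readers unfamiliar with what such a lowering looks like, here is the classic bit-parallel population count for a 32-bit value. It is a common expansion shown for illustration, and it is an assumption that the legalizer emits something of this shape:

```python
def ctpop32(x):
    """Count set bits in a 32-bit value using shifts, masks and a multiply."""
    x &= 0xFFFFFFFF
    x = x - ((x >> 1) & 0x55555555)                  # 2-bit partial counts
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)   # 4-bit partial counts
    x = (x + (x >> 4)) & 0x0F0F0F0F                  # 8-bit partial counts
    # Multiply sums the four byte counts into the top byte.
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24
```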
2023-11-13 10:26:32 -08:00
Felipe de Azevedo Piovezan
83729e6716
[SelectionDAG] Disable FastISel for swiftasync functions (#70741)
Most (x86) swiftasync functions tend to use both SelectionDAGISel and
FastISel lowering:
* FastISel argument lowering can only handle C calling convention.
* FastISel fails mid-BB in a number of ways, including in simple `ret
void` instructions under certain circumstances.

This dance of SelectionDAG (argument) -> FastISel (some instructions) ->
SelectionDAG (remaining instructions) is lossy; in particular, Argument
lowering information is cleared after that first SelectionDAG run.

Since swiftasync functions rely heavily on proper Argument lowering for
debug information, this patch disables the use of FastISel in such
functions.
2023-11-13 08:27:29 -08:00
Yingwei Zheng
d64d5ea102
[RISCV][CodeGenPrepare] Remove duplicated transform for zext. NFC. (#72053)
After #71534 and #72052, the transform `zext -> zext nneg` in
`RISCVCodeGenPrepare` is redundant.
2023-11-13 22:45:33 +08:00
David Green
2238363a5f
[AArch64] Prevent v1f16 vselect/setcc type expansion. (#72048)
PR #71614 identified an issue in the lowering of v1f16 vector compares,
where the `v1i1 setcc` is expanded to `v1i16 setcc`, and the `v1i16
setcc` tries to be expanded to a `v2i16 setcc` which fails. For floating
point types we can let them scalarize instead though, generating a
`setcc f16` that can be lowered using normal fp16 lowering.

07a8ff4892b2a54f0bd5843f863bcffa7a258f1f added a special case combine
for v1 vselect to expand the predicate type to the same size as the fcmp
operands. This turns that off for float types, allowing them to
scalarize naturally, which hopefully fixes the issue by preventing the
v1i16 setcc, meaning it won't try to widen to larger vectors.

The codegen might not be optimal, but as far as I can tell everything
generates successfully, provided that no `v1i16 setcc v1f16`
instructions get generated.
2023-11-13 14:42:52 +00:00
Jay Foad
a4196666ac
[AMDGPU] Revert "Preliminary patch for divergence driven instruction selection. Operands Folding 1." (#71710)
This reverts commit 201f892b3b597f24287ab6a712a286e25a45a7d9.
2023-11-13 13:53:10 +00:00
Simon Pilgrim
1a9fbf6166 [X86] combineLoad - reuse an existing VBROADCAST_LOAD constant for a smaller vector load of the same constant
Extends the existing code that performed something similar for SUBV_BROADCAST_LOAD, but this is just for cases where AVX2 targets load full-width 128-bit constant vectors but broadcast the equivalent 256-bit constant vector.

Fixes AVX2 case for Issue #70947
2023-11-13 11:59:04 +00:00
Jay Foad
47f29043f0 [AMDGPU] Fix a GlobalISel RUN line
This was added in D149795 without actually enabling GlobalISel.
2023-11-13 11:30:15 +00:00
Nemanja Ivanovic
563720c3be
[RISCV] Fix lowering of negative zero with Zdinx 32-bit (#71869)
The compiler currently abends with an impossible reg-to-reg copy when
producing a negative zero FP immediate on RV32 with -Zdinx. This is
because we emit a negation that uses FP registers. Emit the right node
to produce correct code.
2023-11-13 07:38:14 +01:00
Craig Topper
44e8bea400
[GISel][AArch64] Notify the Observer when CTTZ lowering changes the opcode to CTPOP. (#72008) 2023-11-12 19:36:24 -08:00
Carl Ritson
edc38a6cbd
[AMDGPU] Add option to pre-allocate SGPR spill VGPRs (#70626)
SGPR spill VGPRs are WWM registers so allow them to be allocated by
SIPreAllocateWWMRegs pass.
This intentionally prevents spilling of these VGPRs when enabled.
2023-11-13 12:21:18 +09:00
Carl Ritson
52b247b1d3
[PHIElimination] Handle subranges in LiveInterval updates (#69429)
Add subrange tracking and handling for LiveIntervals during PHI
elimination.
This requires extending MachineBasicBlock::SplitCriticalEdge to also
update subrange intervals.
2023-11-13 12:16:26 +09:00
Han Shen
ca10e3b2e5
[LLVM][NVPTX] Add BF16 vector instruction and fix lowering rules (#69415)
Add support for bf16x2 instructions such as setp, fneg, fabs, etc.
Fix the instructions that did not differentiate between sm_80 and
sm_90 support, such as fpround.
Add more bf16 test cases to ensure the correct behavior.

---------

Co-authored-by: shenhan03 <shenhan03@kuaishou.com>
2023-11-12 21:48:31 +08:00
David Green
a31538d29c [AArch64] Add a test showing inefficient register allocation around loop IVs. NFC 2023-11-12 13:11:03 +00:00
Craig Topper
ee95819503 [RISCV][GISel] Legalize G_FSHL/G_FSHR. 2023-11-11 20:23:29 -08:00
Craig Topper
b0e97c7757 [RISCV][GISel] Legalize G_ROTL/G_ROTR. 2023-11-11 20:12:07 -08:00
Craig Topper
7965a21f7a [RISCV] Add more packh patterns. 2023-11-11 19:31:23 -08:00
Craig Topper
bfb7843580 [RISCV] Add packw/packh patterns for -riscv-experimental-rv64-legal-i32 2023-11-11 17:52:22 -08:00