52796 Commits

Author SHA1 Message Date
Yusra Syeda
9a38a72f1d
[SystemZ][z/OS] This change adds support for the PPA2 section in zOS (#68926)
This PR adds support for the PPA2 fields.

---------

Co-authored-by: Yusra Syeda <yusra.syeda@ibm.com>
2023-11-27 16:30:12 -05:00
Craig Topper
179a2e0443
[RISCV][GISel] Legalize and select G_BRINDIRECT. (#73059) 2023-11-27 13:09:47 -08:00
Craig Topper
9e86919626
[RISCV][GISel] Fix 2 indirect call bugs. (#73170)
We can't set MO_PLT on an indirect call.
We need to constrain the register class for the operand to the call
instruction.
2023-11-27 12:59:01 -08:00
David Li
c2ba2b2190
Fix ISel crash when lowering BUILD_VECTOR (#73186)
512bit vpbroadcastw is available only with AVX512BW. Avoid lowering
BUILD_VEC into a vbroadcast node when the condition is not met. This
fixes a crash (see the newly added test).
2023-11-27 11:09:46 -08:00
Bjorn Pettersson
30afb21547 Revert "[MCP] Enhance MCP copy Instruction removal for special case (#70778)"
This reverts commit cae46f6210293ba4d3568eb21b935d438934290d.

Reverted due to miscompiles.
See https://github.com/llvm/llvm-project/issues/73512
2023-11-27 19:39:40 +01:00
Craig Topper
5f31dbd18d
[RISCV] Add register bank and instruction selection support for FP G_SELECT. (#72726)
Try to pick the FP register bank based on surrounding use/defs. Code is
basically copied from AArch64.

Need legalizer changes to make this more useful. Right now we're stuck
with only being able to FP select types less than or equal to XLen.
2023-11-27 10:38:25 -08:00
David Green
3c23ed156f [AArch64] Add a test to show scheduling aliasing between SVE loads and stores. NFC 2023-11-27 16:22:46 +00:00
Simon Pilgrim
286905351f [X86] vector-interleaved tests - add AVX512F/AVX512DQ/AVX512BW/AVX512DQBW-ONLY common prefixes to merge more SLOW/FAST checks
Not used by many vector-interleaved tests, but it's a LOT easier to maintain if we use the same prefixes for all of them.
2023-11-27 15:06:24 +00:00
Igor Kirillov
839abdb0d2
[MachineLICM] Fix incorrect CSE on hoisted const load (#73007)
When hoisting an invariant load, we should not combine it with an
existing load through common subexpression elimination (CSE). This is
because there might be memory-changing instructions between the existing
load and the end of the block entering the loop.

Fixes https://github.com/llvm/llvm-project/issues/72855
2023-11-27 14:37:18 +00:00
Simon Pilgrim
edf645616f [X86] Regenerate vector-interleaved-store-i64-stride-4.ll 2023-11-27 13:48:36 +00:00
Shengchen Kan
cb112eb16c
[X86][CodeGen] Teach frame lowering to spill/reload registers w/ PUSHP/POPP, PUSH2[P]/POP2[P] (#73292)
#73092 supported the encoding/decoding for PUSHP/POPP
#73233 supported the encoding/decoding for PUSH2[P]/POP2[P]

In this patch, we teach frame lowering to spill/reload registers w/
these instructions.

1. Use PPX for balanced spill/reload
2. Use PUSH2/POP2 for continuous spills/reloads
3. PUSH2/POP2 must be 16B-aligned on the stack, so pad when necessary
2023-11-27 21:37:07 +08:00
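The pairing and padding rules listed in the commit above can be sketched roughly as follows. This is only an illustration, not the frame-lowering code: the helper names `padding_before_push2` and `pair_registers` are hypothetical, and the sketch assumes 8-byte slots and that alignment is measured as a byte count below a 16B-aligned base.

```python
SLOT_BYTES = 8    # assumed size of one spilled 64-bit register
PUSH2_ALIGN = 16  # alignment stated in the commit message for PUSH2/POP2


def padding_before_push2(bytes_pushed):
    """Bytes of padding needed so the next PUSH2 starts 16B-aligned.

    bytes_pushed is the number of bytes already pushed below a
    16B-aligned base (an assumption of this sketch).
    """
    return (-bytes_pushed) % PUSH2_ALIGN


def pair_registers(regs):
    """Group consecutive registers into pairs for PUSH2.

    Returns (pairs, leftover), where leftover is the unpaired last
    register, or None when the count is even.
    """
    pairs = [(regs[i], regs[i + 1]) for i in range(0, len(regs) - 1, 2)]
    leftover = regs[-1] if len(regs) % 2 else None
    return pairs, leftover
```

For example, after one 8-byte push the sketch reports that 8 bytes of padding are needed before a PUSH2, while after two pushes none is needed.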
Momchil Velikov
ac06d4e4cb Re-commit "[MachineSink][AArch64] Enable sink-and-fold by default (#72132)"
This re-commits 13fe0386454d after fixing a couple of issues in the LLDB
testsuite in ef9bcace834e and 6b87d84ff45d
2023-11-27 11:28:22 +00:00
Simon Pilgrim
11276563c8 [X86] X86DAGToDAGISel - attempt to merge XMM/YMM loads with YMM/ZMM loads of the same ptr (#73126)
If we are loading the same ptr at different vector widths, then reuse the largest load and just extract the low subvector.

Unlike the equivalent VBROADCAST_LOAD/SUBV_BROADCAST_LOAD folds which can occur in DAG, we have to wait until DAGISel otherwise we can hit infinite loops if constant folding recreates the original constant value.

This is mainly useful for better constant sharing.
2023-11-27 10:26:26 +00:00
David Green
295edaab13 [AArch64][GlobalISel] Better vecreduce.fadd lowering. (PR #73294)
This changes the fadd legalization to handle fp16 types, and treats more types
as legal so that the backend can produce the correct patterns. There is
currently a missing identity fold for `fadd x -0.0 -> x`.
2023-11-27 08:20:54 +00:00
Shengchen Kan
27c0bc9cae
[X86][MC] Allow to specify any of the 8/16/32/64 register names interchangeably for R16-R31 (#73421) 2023-11-27 15:25:19 +08:00
Zi Xuan Wu (Zeson)
e89324219a
[RISCV] Don't combine store of vmv.x.s/vfmv.f.s to vp_store with VL of 1 when it's indexed store (#73219)
Because we can't support vp_store with indexed address mode by lowering to vse intrinsic later.
2023-11-27 13:39:35 +08:00
Douglas Yung
1aa1d176ba Add "REQUIRES: asserts" to test as it requires the compiler to hit an assertion failure to pass and was failing in release builds. 2023-11-25 21:45:58 -08:00
Chen Zheng
abc405858d
[XCOFF] make related SD symbols as isFunction (#69553)
This will help tools like llvm-symbolizer recognize more functions.
2023-11-26 11:59:09 +08:00
Craig Topper
75a9ed4246 [RISCV][GISel] Add simplest case of folding add with immediate into load/store address.
This covers the simm12 offset case.
2023-11-25 10:48:35 -08:00
Craig Topper
564ff80e22 [RISCV][GISel] Test G_FRAME_INDEX folding into store address. NFC 2023-11-25 10:48:31 -08:00
David Green
9cee94b81b
[GlobalISel] Add identity fold for fadd -0.0 (#73296)
-0.0 acts as the identity element for fadd. This doesn't try to add 0.0
too, which would require nsz fast math flags.
2023-11-25 08:35:26 +00:00
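The reason -0.0 rather than 0.0 is the identity for fadd can be checked with ordinary IEEE-754 doubles. The snippet below is a small illustration in Python, not GlobalISel code; the helper names are made up for this example, and it assumes the default round-to-nearest mode.

```python
import math


def same_float(a, b):
    """Equal value and equal sign, so that +0.0 and -0.0 are told apart."""
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)


def neg_zero_is_identity(x):
    """True if x + (-0.0) gives back exactly x, including the sign of zero."""
    return same_float(x + -0.0, x)


def pos_zero_is_identity(x):
    """True if x + 0.0 gives back exactly x, including the sign of zero."""
    return same_float(x + 0.0, x)
```

Adding +0.0 to -0.0 yields +0.0, so `pos_zero_is_identity(-0.0)` is false; that sign change is what the nsz flag would have to permit.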
Craig Topper
26cf3aab83 [RISCV][GISel] Add more G_SEXTLOAD instruction selection tests. NFC 2023-11-24 23:58:11 -08:00
Craig Topper
f995afe7f2 [RISCV][GISel] Add G_FRAME_INDEX support to selectAddrRegImm.
We can fold the G_FRAME_INDEX into a load/store address.
2023-11-24 23:57:54 -08:00
Florian Hahn
20f634f275
[Thumb] Add test case where the machine-outliner clobbers LR.
Add a test case where `bl OUTLINED_FUNCTION_0` clobbers LR, which in
turn is used by the later call to memcpy to return to the caller.
2023-11-24 20:27:43 +00:00
Stefan Pintilie
d896b1f5a6
[PowerPC] Do not string pool globals that are part of llvm used. (#66848)
The string pooling pass was incorrectly pooling global variables that
were part of llvm.used or llvm.compiler.used. This patch fixes the pass
to prevent that by checking each candidate to make sure that it is not
in either of those lists.
2023-11-24 12:21:28 -05:00
Antonio Frighetto
0ff5281c94 [GlobalISel] Treat shift amounts as unsigned in matchShiftImmedChain
A miscompilation issue in the GISel pre-legalization
phase has been addressed with improved routines.

Fixes: https://github.com/llvm/llvm-project/issues/71440.
2023-11-24 18:14:52 +01:00
Craig Topper
5d501b1091
[GISel][RISCV] Fix several boundary cases in narrow G_SEXT_INREG. (#72719)
This fixes cases when SizeInBits is a multiple of the narrow size.

If SizeInBits is equal to NarrowTy size, the first block would create an
illegal G_SEXT_INREG where the extension size is equal to the type.
I tried to turn it into G_TRUNC+G_SEXT, but that just turned back into
G_SEXT_INREG causing an infinite loop. So punt to the splitting case.

In the for loop we should copy when the part ends on SizeInBits. In that
case there is no G_SEXT_INREG needed for partial. But we should note
that register in PartialExtensionReg for the first full part to use.

If the part starts on SizeInBits then we should do an AShr of
PartialExtensionReg.

We should only get to the G_SEXT_INREG case if the SizeInBits is in the
middle of the part.
2023-11-24 08:39:38 -08:00
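The three per-part cases described in the commit above (copy, arithmetic shift of the partial extension, and a narrow G_SEXT_INREG only when SizeInBits falls in the middle of a part) can be modelled on plain integers. This is a hedged sketch, not the legalizer code: the function names and the integer model of registers are assumptions made for illustration.

```python
def sext_inreg(value, size_in_bits, width):
    """Reference: sign-extend the low size_in_bits of a width-bit value."""
    low_mask = (1 << size_in_bits) - 1
    low = value & low_mask
    if low >> (size_in_bits - 1):
        low |= ((1 << width) - 1) & ~low_mask
    return low


def ashr(value, amount, width):
    """Arithmetic shift right of a width-bit value held as a non-negative int."""
    shifted = value >> amount
    if value >> (width - 1):
        shifted |= ((1 << amount) - 1) << (width - amount)
    return shifted


def narrow_sext_inreg(value, size_in_bits, width, narrow):
    """Compute sext_inreg part by part, using parts of `narrow` bits."""
    assert width % narrow == 0
    part_mask = (1 << narrow) - 1
    partial_ext = None  # plays the role of PartialExtensionReg
    result = 0
    for i in range(width // narrow):
        lo, hi = i * narrow, (i + 1) * narrow
        part = (value >> lo) & part_mask
        if hi <= size_in_bits:
            # Part lies entirely below SizeInBits: plain copy.
            out = part
            if hi == size_in_bits:
                partial_ext = out  # remembered for the first full part
        elif lo >= size_in_bits:
            # Part lies entirely above SizeInBits: replicate the sign bit.
            out = ashr(partial_ext, narrow - 1, narrow)
        else:
            # SizeInBits is in the middle of this part.
            out = sext_inreg(part, size_in_bits - lo, narrow)
            partial_ext = out
        result |= out << lo
    return result
```

With 16-bit parts, `narrow_sext_inreg(0x000080FF, 16, 32, 16)` takes the copy path for the low part and the shift path for the high part, and agrees with the reference `sext_inreg`.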
Florian Hahn
820b3583c9
[AArch64] Add artificial clobbers to swift async context test.
Manually add clobbers for various register combinations to tests. This
highlights incorrectly performing shrink-wrapping, with
StoreSwiftAsyncContext expansion clobbering a live register.
2023-11-24 14:14:49 +00:00
pasmpe01
de6c9c84e2 [TLI][AArch64] Add TLI Mappings of @llvm.exp10 for ArmPL and SLEEF.
Update regex to _explicitly_ show which exp versions are added.
The previous regex used `exp[^e]` to avoid matching calls like:
`@llvm.experimental.stepvector`.

Note: ArmPL Mappings for scalable types are not yet utilized
(eg, `llvm.exp10.nxv2f64`, `llvm.exp10.nxv4f32`), as `replace-with-veclib`
pass needs improvements.
2023-11-24 12:24:33 +00:00
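The behaviour of the previous `exp[^e]` pattern mentioned in the commit above can be reproduced with Python's `re` module. This is only an illustration of the regex itself; the test files' actual check lines are not shown here, and the intrinsic names used below are sample inputs.

```python
import re

# The pattern quoted in the commit message: "exp" followed by any
# character other than "e".
PREVIOUS_PATTERN = re.compile(r"exp[^e]")


def matches_previous_pattern(name):
    """True if the old, less explicit regex would match this name."""
    return PREVIOUS_PATTERN.search(name) is not None
```

The pattern accepts `@llvm.exp10.v2f64` and `@llvm.exp2.v4f32` alike, which is why it does not show which exp variants are covered, while it rejects `@llvm.experimental.stepvector`.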
Jay Foad
28233b11ac
[AMDGPU] New AMDGPUInsertSingleUseVDST pass (#72388)
Add support for emitting GFX11.5 s_singleuse_vdst instructions. This is
a power saving feature whereby the compiler can annotate VALU
instructions whose results are known to have only a single use, so the
hardware can in some cases avoid writing the result back to VGPR RAM.

To begin with the pass is disabled by default because of one missing
feature: we need an exclusion list of opcodes that never qualify as
single-use producers and/or consumers. A future patch will implement
this and enable the pass by default.

---------

Co-authored-by: Scott Egerton <scott.egerton@amd.com>
2023-11-24 10:23:06 +00:00
David Green
b3dd14ce07 [AArch64] Add extra vecreduce.fmul tests. NFC 2023-11-24 10:00:00 +00:00
Phoebe Wang
ea81e31aa1
[X86][AVX10] Allow AVX10 use VBMI2 instructions (#73276) 2023-11-24 12:54:30 +08:00
Craig Topper
0a9c6bea6b [RISCV][GISel] Support G_CTTZ/CTLZ with Zbb. 2023-11-23 14:15:11 -08:00
Craig Topper
5bb03d25f7 [RISCV][GISel] Support G_CTPOP with Zbb. 2023-11-23 13:06:23 -08:00
Björn Pettersson
3114bd32e7
[StackColoring] Do not drop AA metadata when not doing remappings (#71958)
In the StackColoring pass we first scan for possible stack slot merges.
A SlotRemap map is set up with the remappings that should be performed.
Then the main work is done by calling remapInstructions and providing
that map.

Most of the work in remapInstructions would just be a waste of time in
situations when the SlotRemap map is empty, but it turns out that the
part that adjusts Alias Analysis information could end up dropping AA
metadata even when there are no stack slot merges being done. This
happens since all instructions' machine memory operands are considered,
and if we can't determine the underlying object that is accessed (using
getUnderlyingObjectsForCodeGen) then we conservatively drop AA metadata.

This patch simply avoids calling remapInstructions if we don't intend to
do any remappings (i.e. if SlotRemap is empty). That avoids touching AA
metadata when all we do is to remove lifetime markers. That seems like a
safe thing to do, as it is the same thing as happens when we bail out
early due to other reasons (e.g. when only having one lifetime marker).

For targets that do not care about Alias Analysis information after the
StackColoring pass this shouldn't have any impact, except that it might
improve compile time slightly as we now skip spending time in
remapInstructions when not doing any stack merges.
2023-11-23 18:10:40 +01:00
Simon Pilgrim
381efa4960 Revert rG67275263b3b781a "[X86] X86DAGToDAGISel - attempt to merge XMM/YMM loads with YMM/ZMM loads of the same ptr (#73126)"
Missed an issue that we were calling continue from within the for loop - fixed version incoming shortly.
2023-11-23 16:50:58 +00:00
Jay Foad
cf1e0c0b07
[AMDGPU] Define new targets gfx1200 and gfx1201 (#73133)
Define target names and ELF numbers for new GFX12 targets gfx1200 and
gfx1201. For now they behave identically to GFX11.
2023-11-23 16:44:05 +00:00
Simon Pilgrim
67275263b3
[X86] X86DAGToDAGISel - attempt to merge XMM/YMM loads with YMM/ZMM loads of the same ptr (#73126)
If we are loading the same ptr at different vector widths, then reuse the larger load and just extract the low subvector.

Unlike the equivalent VBROADCAST_LOAD/SUBV_BROADCAST_LOAD folds which can occur in DAG, we have to wait until DAGISel otherwise we can hit infinite loops if constant folding recreates the original constant value.

This is mainly useful for better constant sharing.
2023-11-23 14:10:23 +00:00
Acim Maravic
376b22a371
[LLVM] Make s_getpc_b64 rematerializable (#71823) 2023-11-23 13:07:12 +01:00
hev
0d9f557b6c
[LoongArch] Disable mulodi4 and muloti4 libcalls (#73199)
This library function only exists in compiler-rt not libgcc. So this
would fail to link unless we were linking with compiler-rt.

Fixes https://github.com/ClangBuiltLinux/linux/issues/1958
2023-11-23 19:34:50 +08:00
Thorsten Schütt
b71b32ba87
[Gisel][AArch64] legalize G_IS_FPCLASS (#72796) 2023-11-23 10:31:05 +01:00
Craig Topper
1343d96ec1 [RISCV][GISel] Support G_BSWAP with Zbb. 2023-11-23 00:39:42 -08:00
Zhaoxuan Jiang
147c5d6686
[AArch64] Allow LDR merge with same destination register by renaming (#71908)
The patch is based on a reverted patch:
https://reviews.llvm.org/D103597. It was trying to rename registers
before alias check, which is not safe and causes miscompiles. This patch
does 2 things:

1. Do the renaming with necessary checks passed, including alias check.
2. Rename the register for the instructions between the pairs and
combine the second load into the first. By doing so we can just check
the renamability between the pairs and avoid scanning an unknown number of
instructions before/after the pairs.

Necessary refactoring has been made in order to reuse as much code
as possible with STR renaming.
2023-11-23 08:21:27 +00:00
Pierre van Houtryve
d76d8e541d
[AMDGPU][NFC] Update GISel memory-legalizer-atomic-fence test (#72829)
Test needs to be moved to MIR checks and use
stop-after=si-memory-legalizer to avoid being optimized out in a future
patch.
2023-11-23 09:09:05 +01:00
hev
7414c0db96
[LoongArch] Precommit a test for smul with overflow (NFC) (#73212) 2023-11-23 15:15:26 +08:00
Wang Pengcheng
5973272af7
[RISCV] Add MinimumJumpTableEntries to TuneInfo (#72963)
This is like what AArch64 has done in #71166 except that we don't
handle `HasMinSize` case now.
2023-11-23 14:05:23 +08:00
Min-Yih Hsu
7c3c8a1277
[RISCV][GISel] Add support for G_IS_FPCLASS in F and D extensions (#72000)
Add legalizer, regbankselect, and isel support for the floating-point
version of G_IS_FPCLASS.
2023-11-22 16:43:20 -08:00
Craig Topper
a845061935
[AArch64] Use the same fast math preservation for MachineCombiner reassociation as X86/PowerPC/RISCV. (#72820)
Don't blindly copy the original flags from the pre-reassociated
instructions.
This copied the integer poison flags which are not safe to preserve
after reassociation.

For the FP flags, I think we should only keep the intersection of
the flags. Override setSpecialOperandAttr to do this.

Fixes #72777.
2023-11-22 14:17:45 -08:00
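The flag handling proposed in the commit above (keep only the intersection of the FP flags, drop integer poison flags) can be sketched with a small flag enum. The enum and function names here are hypothetical stand-ins, not LLVM's MachineInstr flag names or the real setSpecialOperandAttr signature.

```python
from enum import Flag, auto


class InstrFlag(Flag):
    """Hypothetical stand-ins for per-instruction flags."""
    NONE = 0
    FM_NNAN = auto()     # fast-math: no NaNs
    FM_NSZ = auto()      # fast-math: no signed zeros
    FM_REASSOC = auto()  # fast-math: reassociation allowed
    NO_UWRAP = auto()    # integer poison flag
    NO_SWRAP = auto()    # integer poison flag


FP_FLAGS = InstrFlag.FM_NNAN | InstrFlag.FM_NSZ | InstrFlag.FM_REASSOC


def flags_after_reassociation(first, second):
    """Keep only FP flags present on both original instructions."""
    return first & second & FP_FLAGS
```

In this model an integer flag such as `NO_SWRAP` never survives reassociation, even when both original instructions carry it.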
LWenH
32903b0b6d [MCP] fix PowerPC redundant copy instructions removal fail test cases, NFC 2023-11-23 01:54:53 +08:00
Florian Hahn
a842430c20
[AArch64] Add check that prologue insertion doesn't clobber live regs. (#71826)
This patch extends AArch64FrameLowering::emitPrologue to check if the
inserted prologue clobbers live registers.

It updates `llvm/test/CodeGen/AArch64/framelayout-scavengingslot.mir`
with an extra load to make x9 live before the store, preserving the
original test.

It uses the original
`llvm/test/CodeGen/AArch64/framelayout-scavengingslot.mir` as
`llvm/test/CodeGen/AArch64/emit-prologue-clobber-verification.mir`,
because there x9 is marked as live on entry, but used as scratch reg as
it is not callee saved.

The new assertion catches a mis-compile in
`store-swift-async-context-clobber-live-reg.ll` on
https://github.com/apple/llvm-project/tree/next
2023-11-22 16:49:33 +00:00