52796 Commits

Author SHA1 Message Date
Igor Kirillov
deb429e5b0 Revert "[CodeGen] Improve ExpandMemCmp for more efficient non-register aligned sizes handling (#69942)"
This reverts commit 9bcb30d31813bbdea6b65789f64aed3f0e58d507.
2023-10-27 14:12:45 +00:00
Christudasan Devadasan
a0eb6b88f9
[AMDGPU] Try to fix the block prologs broken by RA inserted instructions (#69924)
The insertion point that RA determines when inserting spills and
live-range splits at the beginning of a block is sometimes wrong: the
newly inserted vector instructions are placed before the exec-mask
restore instruction. This happens mainly because of the dependency on
isBasicBlockPrologue, which doesn't account for instructions inserted
early during RA (spills and splits), and this breaks the block prolog.

A better approach for deciding the insertion point should be worked out.
For now, this patch improves the helper function to consider all
possible early insertions, starting with the spill instructions. The
copies associated with live-range splits should also be included in the
block prolog.
2023-10-27 19:10:18 +05:30
Simon Pilgrim
37d9dc4793 [X86] Add test case for Issue #66150 2023-10-27 14:37:19 +01:00
Christudasan Devadasan
f9cd789658
[AMDGPU] Add pseudo instructions for SGPR spill to VGPR (#69923)
For a future patch, it is important that the lowered SGPR spills are
still recognized as spill instructions during regalloc. Directly
lowering them into V_WRITELANE/V_READLANE won't allow us to attach the
SPILL flag to those instructions.

This patch introduces the pseudo instructions with the SGPRSpill
flag set in their Desc. They will get lowered to equivalent
instructions later during post RA pseudo expansion.
2023-10-27 17:24:10 +05:30
Igor Kirillov
9bcb30d318
[CodeGen] Improve ExpandMemCmp for more efficient non-register aligned sizes handling (#69942)
* Enhanced the logic of the ExpandMemCmp pass to merge contiguous
  subsequences in LoadSequence, based on the sizes allowed in
  `AllowedTailExpansions`.
* This enhancement seeks to minimize the number of basic blocks and
  produce optimized code when memcmp is used with sizes that are not
  register aligned.
* Enable this feature for AArch64 for memcmp sizes whose remainder
  modulo 8 is 3, 5, or 6.
2023-10-27 12:41:08 +01:00
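The load-sequence merging described in the ExpandMemCmp commit above (9bcb30d318, later reverted by deb429e5b0) can be pictured with a small sketch. It assumes a greedy power-of-two split capped at 8-byte loads, followed by merging trailing loads while their combined size is in an allowed set such as {3, 5, 6}. The helpers greedyLoadSequence and mergeTail are hypothetical, and the merging rule is a simplification rather than the pass's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sketch; names and the exact merging rule are assumptions,
// not LLVM's ExpandMemCmp implementation.

// Greedily split a memcmp size into power-of-two loads, largest first.
std::vector<unsigned> greedyLoadSequence(unsigned Size, unsigned MaxLoadSize) {
  std::vector<unsigned> Seq;
  for (unsigned Load = MaxLoadSize; Load > 0; Load /= 2)
    while (Size >= Load) {
      Seq.push_back(Load);
      Size -= Load;
    }
  return Seq;
}

// Merge the last two loads into one while their combined size is an allowed
// tail expansion (e.g. {3, 5, 6} for AArch64 in the commit).
std::vector<unsigned> mergeTail(std::vector<unsigned> Seq,
                                const std::vector<unsigned> &AllowedTails) {
  while (Seq.size() >= 2) {
    unsigned Merged = Seq[Seq.size() - 2] + Seq.back();
    if (std::find(AllowedTails.begin(), AllowedTails.end(), Merged) ==
        AllowedTails.end())
      break;
    Seq.pop_back();
    Seq.back() = Merged;
  }
  return Seq;
}
```

Under these assumptions a 15-byte memcmp goes from four loads (8, 4, 2, 1) to three (8, 4, 3), and a 13-byte one from (8, 4, 1) to (8, 5), which is the kind of reduction in compare blocks the commit aims for.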
Luke Lau
c8e1fbc3cc
[RISCV] Keep same SEW/LMUL ratio if possible in forward transfer (#69788)
For instructions like vmv.s.x and friends where we don't care about LMUL
or the SEW/LMUL ratio, we can change the LMUL in its state so that it
has the same SEW/LMUL ratio as the previous state. This allows us to
avoid more VL toggles later down the line (i.e. use `vsetvli zero, zero`,
which requires the SEW/LMUL ratio to stay the same).

This is an alternative approach to the idea in #69259, but note that
the two don't catch exactly the same test cases.
2023-10-27 12:16:28 +01:00
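The forward-transfer idea in the SEW/LMUL commit above can be modelled with a little arithmetic. Since VLMAX = VLEN * LMUL / SEW, keeping the SEW/LMUL ratio fixed keeps VLMAX fixed, which is what later allows `vsetvli zero, zero`. This is a minimal sketch, not RISCVInsertVSETVLI's code: VType and keepRatio are hypothetical, LMUL is stored in eighths so fractional LMULs are integers, and the ELEN restriction on fractional LMUL is ignored.

```cpp
#include <cassert>
#include <optional>

// Hypothetical vtype model: LMUL is stored as LMUL * 8, so
// 1/8 -> 1, 1/2 -> 4, 1 -> 8, 8 -> 64.
struct VType {
  unsigned SEW;         // element width in bits: 8, 16, 32 or 64
  unsigned LMULEighths; // LMUL * 8
};

// SEW/LMUL ratio; equal ratios imply equal VLMAX for a given VLEN.
unsigned ratio(const VType &T) { return T.SEW * 8 / T.LMULEighths; }

// For an instruction that doesn't care about LMUL (e.g. vmv.s.x), choose the
// LMUL that keeps the previous state's SEW/LMUL ratio, if that LMUL is legal.
std::optional<VType> keepRatio(const VType &Prev, unsigned NewSEW) {
  unsigned R = ratio(Prev);
  if (NewSEW * 8 % R != 0)
    return std::nullopt; // would need an LMUL below 1/8
  unsigned Eighths = NewSEW * 8 / R;
  // Legal LMULs are 1/8 .. 8, i.e. powers of two from 1 to 64 eighths.
  bool PowerOfTwo = Eighths != 0 && (Eighths & (Eighths - 1)) == 0;
  if (!PowerOfTwo || Eighths > 64)
    return std::nullopt;
  return VType{NewSEW, Eighths};
}
```

For example, after `e32, m2` (ratio 16) an SEW=8 vmv.s.x would get LMUL=1/2, keeping the ratio at 16.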
Matt Arsenault
b8b491c9d7 AMDGPU: Add infinite looping testcase after subrange spilling change
This looped infinitely after d8127b2ba8a87a610851b9a462f2fc2526c36e37.
2023-10-27 17:42:14 +09:00
Phoebe Wang
58d4fe287e
[X86][EVEX512] Do not allow 512-bit memcpy without EVEX512 (#70420)
Solves crash mentioned in #65920.
2023-10-27 15:26:05 +08:00
Craig Topper
116eb323b1
[RISCV] Correct copyPhysReg for GPRPF64. (#70419)
GPRPF64 represents a pair of registers. We were only copying the even
part; we need to copy the odd part too.
2023-10-26 23:54:46 -07:00
Craig Topper
8ff1422353 [RISCV] Fix incorrect use of Zfa fli instruction for negative minimum value. (#70411)
isSmallestNormalized() only considers the magnitude, not the sign.
2023-10-26 22:11:58 -07:00
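The sign bug fixed in 8ff1422353 can be reproduced with plain `float` values. The two helpers below are hypothetical stand-ins for the check built on APFloat::isSmallestNormalized(); FLT_MIN is the smallest positive normalized single-precision value, which is the only one of the pair that fli can produce.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Buggy shape of the check: a magnitude-only test also accepts -FLT_MIN.
bool buggyIsFliMinNormal(float X) { return std::fabs(X) == FLT_MIN; }

// Fixed shape: comparing the value itself also checks the sign, so only
// +FLT_MIN is accepted.
bool isFliMinNormal(float X) { return X == FLT_MIN; }
```

With the magnitude-only version, -FLT_MIN would be materialized with fli as +FLT_MIN, silently flipping the sign.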
Craig Topper
be0cbe9173 [RISCV] Add test cases showing fli being used for negative min normalized value.
We can only use fli for the positive minimum normalized value.
2023-10-26 22:11:58 -07:00
Amara Emerson
c9e8b73694
[AArch64][GlobalISel] Add support for extending indexed loads. (#70373) 2023-10-26 13:38:09 -07:00
David Green
3fe8fd712b [AArch64] Fix st2 check for nearby store with debug info.
It needs to be skipping over debug instructions, whilst not counting them in
the MaxLookupDist.
2023-10-26 21:37:04 +01:00
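The st2 fix above boils down to a lookahead loop that steps over debug instructions without charging them against MaxLookupDist, so building with -g cannot change the answer. A minimal sketch, assuming a hypothetical Instr model and a forward scan (the real check operates on MachineInstrs and its exact direction is not stated in the commit):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction model, not the AArch64 backend's data structures.
struct Instr {
  bool IsDebug;       // e.g. a DBG_VALUE
  bool IsNearbyStore; // the store the st2 check is looking for
};

// Look at up to MaxLookupDist non-debug instructions starting at Start.
// Debug instructions are skipped and do not count towards the distance.
bool hasNearbyStore(const std::vector<Instr> &Block, std::size_t Start,
                    unsigned MaxLookupDist) {
  unsigned Dist = 0;
  for (std::size_t I = Start; I < Block.size() && Dist < MaxLookupDist; ++I) {
    if (Block[I].IsDebug)
      continue;
    if (Block[I].IsNearbyStore)
      return true;
    ++Dist;
  }
  return false;
}
```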
Alex Richardson
e39f6c1844 [opt] Infer DataLayout from triple if not specified
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.

One thing that is not currently possible is differentiating an explicit
`target datalayout = ""` in the IR file from a missing datalayout, since
the current APIs don't allow detecting this case. If it is considered
useful to support this case (instead of passing "-data-layout=" on the
command line), I can change the IR parsers to track whether they have
seen such a directive and change the callback type.

Differential Revision: https://reviews.llvm.org/D141060
2023-10-26 12:07:37 -07:00
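The precedence described in the opt commit above (command-line override, then the IR file, then a default derived from the triple and CPU flags) can be sketched as follows. chooseDataLayout and its parameters are hypothetical, not opt's real API. The sketch uses std::optional, which draws exactly the missing-versus-empty distinction that the commit says the current parser APIs cannot make.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical sketch of the DataLayout selection order.
std::string chooseDataLayout(const std::optional<std::string> &CommandLine,
                             const std::optional<std::string> &FromIRFile,
                             const std::string &DefaultForTriple) {
  if (CommandLine) // explicit override, e.g. -data-layout=...
    return *CommandLine;
  if (FromIRFile)  // `target datalayout = "..."` in the module
    return *FromIRFile;
  return DefaultForTriple; // new behaviour: infer it from the triple/flags
}
```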
Craig Topper
d307dc5b51 [RISCV][GISel] Allow G_AND/G_OR/G_XOR to have s32 types on RV64.
This is done even though we don't have W instructions for them, and it
treats them the same as the other binary operators.
2023-10-26 11:00:43 -07:00
Amara Emerson
93659947d2
[AArch64][GlobalISel] Add support for pre-indexed loads/stores. (#70185)
The pre-index matcher just needs some small heuristics to make sure it
doesn't cause regressions. Apart from that it's a simple change, since
the only difference is an immediate operand of '1' vs '0' in the
instruction.
2023-10-26 10:29:12 -07:00
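The only semantic difference between the pre- and post-indexed forms mentioned in the GlobalISel commit above is which address the load reads; both write back Base + Offset. A simplified model follows, in the spirit of `ldr x0, [x1, #8]!` (pre-index) versus `ldr x0, [x1], #8` (post-index). IsPre stands in for the '1' vs '0' immediate, and memory is modelled as an array of words with offsets in words, both simplifying assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct IndexedLoadResult {
  uint64_t Value;   // loaded value
  uint64_t NewBase; // written-back base register
};

// Hypothetical model of an indexed load; offsets are in words, not bytes.
IndexedLoadResult indexedLoad(const std::vector<uint64_t> &Mem, uint64_t Base,
                              int64_t Offset, bool IsPre) {
  uint64_t Updated = Base + Offset;
  uint64_t Addr = IsPre ? Updated : Base; // pre-index reads the updated address
  return {Mem[Addr], Updated};            // both forms write back Base + Offset
}
```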
Matt Harding
bf92eba697
Fix comment in wasm unreachable test (#70340)
Some textual editing errors got through this pull request that was
merged a few weeks ago: https://github.com/llvm/llvm-project/pull/65876

This patch removes the unintentionally duplicated line and the trailing
whitespace at the end of the lines.
2023-10-26 10:23:32 -07:00
Simon Pilgrim
13a349425b [AArch64] Regenerate addr-of-ret-addr.ll 2023-10-26 15:35:17 +01:00
Simon Pilgrim
aaabf50d52 [AArch64] Regenerate tests to show missing constant comments 2023-10-26 15:35:17 +01:00
Alexander Richardson
f118d474eb
[AMDGPU] Use alloca address space in rewrite-out-arguments.ll (#70269)
This is needed for the transform to fire with a correct data layout.
Pre-committing this change to keep the diff of D141060 smaller.
2023-10-26 15:08:58 +01:00
Luke Lau
2e85123bfe
[VP] Check if VP ops with functional intrinsics are speculatable (#69504)
Noticed whilst working on #69494. VP intrinsics whose functional
equivalent is an intrinsic were having their lanes marked as
non-speculatable, even if the underlying intrinsic was speculatable.

This meant that

```llvm
  %1 = call <4 x i32> @llvm.vp.umax(<4 x i32> %x, <4 x i32> %y, <4 x i1> %mask, i32 %evl)
```

would be expanded out to

```llvm
  %.splatinsert = insertelement <4 x i32> poison, i32 %evl, i64 0
  %.splat = shufflevector <4 x i32> %.splatinsert, <4 x i32> poison, <4 x i32> zeroinitializer
  %1 = icmp ult <4 x i32> <i32 0, i32 1, i32 2, i32 3>, %.splat
  %2 = and <4 x i1> %1, %mask
  %3 = call <4 x i32> @llvm.umax.v4i32(<4 x i32> %x, <4 x i32> %y)
```

instead of

```llvm
  %1 = call <4 x i32> @llvm.umax.v4i32(<4 x i32> %x, <4 x i32> %y)
```

The cause of this was isSafeToSpeculativelyExecuteWithOpcode checking
the function attributes for the VP instruction itself, not the
functional intrinsic. Since isSafeToSpeculativelyExecuteWithOpcode
expects an already materialized instruction, we can't use it directly
for the intrinsic case. So this fixes it by manually checking the
function attributes on the intrinsic.
2023-10-26 13:46:32 +01:00
Christudasan Devadasan
bb2b7530ad [AMDGPU] precommit lit test for PR 69924. 2023-10-26 17:43:14 +05:30
Jay Foad
e9c4dc18bc Revert "[AMDGPU] Use S_CSELECT for uniform i1 ext (#69703)"
This reverts commit a1260b5209968c08886e3c6183aa793de8931578.

It was causing some Vulkan CTS failures.
2023-10-26 12:56:32 +01:00
Christudasan Devadasan
16fbc45f48 Revert "[AMDGPU] Cleanup hasUnwantedEffectsWhenEXECEmpty function (#70206)"
This reverts commit 7ce613fc77af092dd6e9db71ce3747b75bc5616e.
2023-10-26 17:04:28 +05:30
Piotr Sobczak
80abbeca8e [Inline Spiller] Consider bundles when marking defs as dead
Fix a bug where the code expects just a single MI, but a series of
bundled MIs needs to be handled instead.

The semi-formed bundles are created by SplitKit for the case where
not all lanes are live (buildSingleSubRegCopy). Then the remat
kicks in, and since the values that are copied in the bundle do not
need to be preserved due to the remat (dead defs), all instructions
in the bundle should be marked as dead.

However, only the first one gets marked as dead, which causes the
verifier to complain later with error: "Live range continues after
dead def flag".

Differential Revision: https://reviews.llvm.org/D156999
2023-10-26 11:52:55 +02:00
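The Inline Spiller fix above amounts to walking the whole bundle instead of touching only its first instruction. A minimal sketch with a hypothetical MI struct; the BundledWithPred flag mirrors what MachineInstr::isBundledWithPred() reports, but this is not the spiller's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: a bundle is a head instruction followed by
// instructions that are bundled with their predecessor.
struct MI {
  bool BundledWithPred;
  bool DeadDef = false;
};

// Mark the def at Idx dead, then continue through every instruction that is
// bundled with it. The bug was equivalent to stopping after Instrs[Idx].
void markBundleDefsDead(std::vector<MI> &Instrs, std::size_t Idx) {
  Instrs[Idx].DeadDef = true;
  for (std::size_t I = Idx + 1;
       I < Instrs.size() && Instrs[I].BundledWithPred; ++I)
    Instrs[I].DeadDef = true;
}
```

Stopping after the head would leave the later copies in the bundle with live defs, which is what triggers the "Live range continues after dead def flag" verifier error.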
Piotr Sobczak
24865a6423 [Inline Spiller] Pre-commit test 2023-10-26 11:52:54 +02:00
Simon Pilgrim
547dc46122 [DAG] SimplifyDemandedBits - ensure we drop NSW/NUW flags when we simplify a SHL node's input
We already do this for variable shifts, but we missed it for constant shifts.

Fixes #69965
2023-10-26 10:34:58 +01:00
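One way a stale flag can turn into a miscompilation, as in the SimplifyDemandedBits fix above, is shown by this 8-bit sketch (an illustrative scenario, not necessarily the exact pattern from #69965). Take `shl nuw (and A, 0x7F), 1` where only the low 7 bits of the result are demanded. Only the low 6 bits of the shift input are then demanded, so the `and` looks redundant and can be replaced by A. Keeping `nuw` on the new shift makes it poison whenever bit 7 of A is set. shlOverflowsUnsigned is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// True if shifting the 8-bit value X left by Amt shifts out set bits, i.e.
// a `shl nuw` with these operands would produce poison.
bool shlOverflowsUnsigned(uint8_t X, unsigned Amt) {
  return (static_cast<unsigned>(X) << Amt) > 0xFF;
}
```

With A = 0x80, the original operand `A & 0x7F` satisfies nuw but the simplified operand A does not, so the flag has to be dropped when the input is rewritten.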
Simon Pilgrim
12dfcc0238 [DAG] Update test case for Issue #69965
The previous reduced test case just showed a minor codegen regression; this test now shows the actual miscompilation.
2023-10-26 10:34:58 +01:00
Piotr Sobczak
ba3d6e0499
[AMDGPU] Rematerialize scalar loads (#68778)
Extend the list of instructions that can be rematerialized in
SIInstrInfo::isReallyTriviallyReMaterializable() to support scalar
loads.

Try shrinking instructions to remat only the part needed for the
current context. Add the SIInstrInfo::reMaterialize target hook, and
handle shrinking of S_LOAD_DWORDX16_IMM to S_LOAD_DWORDX8_IMM as a proof
of concept.
2023-10-26 11:34:33 +02:00
Oliver Stannard
339faffd05 Revert "[AArch64] Move SLS later in pass pipeline"
The (MF.size() == 0) assertion is triggering when building at -O0.
Reverting this while I work out what is going wrong.

This reverts commit 7e8eccd990d37d2771ca5ad7a84f54c3cfc4a5e1.
2023-10-26 09:50:13 +01:00
Pierre van Houtryve
a1260b5209
[AMDGPU] Use S_CSELECT for uniform i1 ext (#69703)
Solves #59869
2023-10-26 09:57:14 +02:00
Matthew Devereau
18775a4941
[AArch64][SVE2] Use rshrnb for masked stores (#70026)
This patch is a follow-up to https://reviews.llvm.org/D155299. It
combines add+lsr into rshrnb when 'B' in:

  C = A + B
  D = C >> Shift

is equal to (1 << (Shift-1)), and the bits in the top half of each vector
element are zeroed or ignored, such as in a truncating masked store.
2023-10-26 08:42:25 +01:00
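Why the rshrnb combine above needs the top half of each element to be ignored can be checked exhaustively for 16-bit lanes narrowed to 8 bits. The element-wide add in `C = A + B` may wrap, while a rounding shift right narrow adds the rounding constant without wrapping. With Shift in 1..8, the wrap only changes bits that the narrowing throws away. The helper names are hypothetical; the model follows the usual rounding-shift definition rather than any LLVM code.

```cpp
#include <cassert>
#include <cstdint>

// Original pattern per 16-bit lane: a wrapping add of 1 << (Shift - 1),
// a logical shift right, then truncation to the bottom 8 bits.
uint8_t addThenLsr(uint16_t A, unsigned Shift) {
  uint16_t C = static_cast<uint16_t>(A + (1u << (Shift - 1)));
  return static_cast<uint8_t>(C >> Shift);
}

// Rounding shift right narrow: the rounding add is done in wider precision.
uint8_t roundingShiftRightNarrow(uint16_t A, unsigned Shift) {
  uint32_t Wide = static_cast<uint32_t>(A) + (1u << (Shift - 1));
  return static_cast<uint8_t>(Wide >> Shift);
}

// Compare both forms for every 16-bit input and every shift from 1 to 8.
bool equivalentForAllInputs() {
  for (unsigned Shift = 1; Shift <= 8; ++Shift)
    for (uint32_t A = 0; A <= 0xFFFF; ++A)
      if (addThenLsr(static_cast<uint16_t>(A), Shift) !=
          roundingShiftRightNarrow(static_cast<uint16_t>(A), Shift))
        return false;
  return true;
}
```

When the 16-bit add wraps, the two results differ by 2^(16 - Shift), which is a multiple of 256 for Shift <= 8 and therefore vanishes after truncation to 8 bits.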
Stanislav Mekhanoshin
4602802240
[AMDGPU] Shrink to SOPK with 32-bit signed literals (#70263)
A literal like 0xffff8000 is valid to be used as KIMM in a SOPK
instruction, but at the moment our checks expect it to be fully sign
extended to a 64-bit signed integer. This is not required since all
cases which are being shrunk only accept 32-bit operands.

We still need to sign extend the operand to 64 bits so that it passes
the verifier and is printed properly.
2023-10-26 00:26:54 -07:00
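The literal from the SOPK commit above shows the issue numerically: 0xffff8000 read as a 64-bit value is 4294934528, far outside the signed 16-bit KIMM range, but sign extended from bit 31 it is -32768, which fits. The helpers below are a hypothetical sketch, not SIShrinkInstructions' code. LLVM provides isInt<16>() and SignExtend64<32>() in llvm/Support/MathExtras.h for these operations; the sketch spells them out to stay self-contained.

```cpp
#include <cassert>
#include <cstdint>

// Does V fit in a signed 16-bit immediate?
bool isInt16(int64_t V) { return V >= INT16_MIN && V <= INT16_MAX; }

// Treat the low 32 bits of Imm as a signed 32-bit operand.
int64_t signExtendFrom32(int64_t Imm) {
  return static_cast<int64_t>(static_cast<int32_t>(static_cast<uint32_t>(Imm)));
}

// Shrinking is valid because the affected instructions only take 32-bit
// operands, so the check is done on the sign-extended 32-bit value.
bool canUseAsSOPKImm(int64_t Imm) { return isInt16(signExtendFrom32(Imm)); }
```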
Luke Lau
c285b7f513
[RISCV] Add tests for vmadd for VP intrinsics. NFC (#70042)
We have VP tests for vmacc but not vmadd. This copies the vmacc tests
but swaps the false operand of vp.merge to be the multiplicand instead
of the addend.

This shows how we could fold the vmerge into the vmadd's mask if we
commuted %a and %b.
2023-10-26 07:59:25 +01:00
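The fold hinted at in the vmadd test commit above relies on vmadd overwriting its multiplicand. Per the RVV spec, `vmacc.vv vd, vs1, vs2` computes vd[i] = vs1[i] * vs2[i] + vd[i], while `vmadd.vv vd, vs1, vs2` computes vd[i] = vs1[i] * vd[i] + vs2[i]. So a vp.merge whose false operand is the multiplicand matches a masked vmadd with that multiplicand in vd, assuming a mask-undisturbed policy. The scalar lane model below is a sketch with hypothetical names, not backend code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec = std::vector<int32_t>;
using Mask = std::vector<bool>;

// Masked vmadd.vv vd, vs1, vs2: active lanes get vs1 * vd + vs2,
// inactive lanes keep their old vd value (mask undisturbed).
Vec maskedVmadd(Vec Vd, const Vec &Vs1, const Vec &Vs2, const Mask &M) {
  for (std::size_t I = 0; I < Vd.size(); ++I)
    if (M[I])
      Vd[I] = Vs1[I] * Vd[I] + Vs2[I];
  return Vd;
}

// vp.merge(M, A * B + C, B): the product-sum where M is set, else B.
Vec mergeOfMulAdd(const Vec &A, const Vec &B, const Vec &C, const Mask &M) {
  Vec R(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    R[I] = M[I] ? A[I] * B[I] + C[I] : B[I];
  return R;
}
```

Passing B as vd makes maskedVmadd(B, A, C, M) equal to mergeOfMulAdd(A, B, C, M); if the merge's false operand were A instead, the operands of the multiply would first have to be commuted, which is the point the commit makes.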
KAWASHIMA Takahiro
926173c614
[AArch64] Prevent argument promotion of vector with size > 128 bits (#70034)
This patch prevents argument promotion from promoting pointers to
fixed-length vector types larger than 128 bits like `<8 x float>` into
the values of the pointees.

Such vector types are used for SVE VLS but there is no ABI for SVE VLS
arguments and the backend cannot lower such value arguments.

Fixes #69147
2023-10-26 14:51:20 +09:00
Craig Topper
109aa586f0
[RISCV] Add an experimental pseudoinstruction to represent a rematerializable constant materialization sequence. (#69983)
Rematerialization during register allocation is currently limited to a
single instruction with no inputs.

This patch introduces a pseudoinstruction that represents the
materialization of a constant. I've started with a sequence of 2
instructions for now, which covers at least the common LUI+ADDI(W) case.
This instruction will be expanded into real instructions immediately
after register allocation using a new pass. This gives the post-RA
scheduler a chance to separate the 2 instructions to improve ILP.

I believe this matches the approach used by AArch64.

Unfortunately, this loses some CSE opportunities when an LUI value is used
by multiple constants with different LSBs.

This feature is off by default and a new backend command line option is
added to enable it for testing.

This avoids the spill and reloads reported in #69586.
2023-10-25 17:20:32 -07:00
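The two-instruction constant sequence covered by the pseudoinstruction above is the common LUI+ADDI split. ADDI sign-extends its 12-bit immediate, so the LUI part is rounded up by 0x800 whenever bit 11 of the constant is set. This sketch covers only the RV32 LUI+ADDI case (the ADDIW variant on RV64 is not modelled), and splitConstant and materialize are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Immediates for:  lui rd, Hi20 ; addi rd, rd, Lo12
struct LuiAddi {
  uint32_t Hi20; // LUI immediate (upper 20 bits)
  int32_t Lo12;  // ADDI immediate, in [-2048, 2047]
};

LuiAddi splitConstant(uint32_t Value) {
  // Round up by 0x800 to compensate for ADDI sign-extending a negative Lo12.
  uint32_t Hi20 = ((Value + 0x800u) >> 12) & 0xFFFFFu;
  // Sign-extend the low 12 bits.
  int32_t Lo12 = static_cast<int32_t>(Value << 20) >> 20;
  return {Hi20, Lo12};
}

// Value produced by the two instructions on RV32 (wrapping 32-bit add).
uint32_t materialize(const LuiAddi &S) {
  return (S.Hi20 << 12) + static_cast<uint32_t>(S.Lo12);
}
```

Constants such as 0x12345001 and 0x12345002 share the same Hi20, so separate LUI+ADDI sequences could share one LUI through CSE; keeping each pair together in a single pseudoinstruction is what gives up that sharing, as the commit notes.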
Craig Topper
716c0220f2
[RISCV][GISel] Add instruction selection for G_FADD/G_FSUB/G_FMUL/G_FDIV with F/D extensions. (#69808) 2023-10-25 13:37:34 -07:00
Evgenii Kudriashov
0a8f54c3fe
[X86][GlobalISel] Add legalization of 64-bit G_ICMP for i686 (#69478) 2023-10-25 22:30:42 +02:00
Craig Topper
c2b64dfaa4
[RISCV][GISel] Add regbank selection for G_FADD/G_FSUB/G_FMUL/G_FDIV with F/D extensions. (#69805)
This includes the plumbing for ValueMapping and PartialMapping.
2023-10-25 12:48:17 -07:00
Craig Topper
da1736eba6
[RISCV][GISel] Add legalizer support for G_FADD/G_FSUB/G_FMUL/G_FDIV with F/D extensions. (#69804)
This is a simple patch to get initial FP support started.
2023-10-25 12:40:43 -07:00
Craig Topper
d32e801d74
[RISCV][GISel] Add FP calling convention support (#69138)
This includes support for using GPRs, FPRs, and stack.
2023-10-25 12:30:03 -07:00
Matthias Braun
94aaaf4fb4 Update m68k tests to new block placement
e3cf80c5c1fe55efd8216575ccadea0ab087e79c affected block placement of
some tests in the experimental m68k target. This updates them.
2023-10-25 11:33:56 -07:00
Craig Topper
7fde4ffbd3
[Mips][GISel] Fix a couple issues with passing f64 in 32-bit GPRs. (#69131)
MipsIncomingValueHandler::assignCustomValue should return 1 instead of
2. The return value is the number of additional ArgLocs being consumed.
It's assumed that at least 1 is consumed.

Correct the LocVT used for the spill when there are no registers left.
It should be f64 instead of i32. This allows a workaround to be removed
in the SelectionDAG path.
2023-10-25 11:28:22 -07:00
Mircea Trofin
c362cc2705 [mlgo][regalloc] Fix reference file post e3cf80c 2023-10-25 11:23:15 -07:00
Craig Topper
674b53d1a4 [RISCV][GISel] Add widenScalarToNextPow2 to G_SEXTLOAD/G_ZEXTLOAD legalization.
This fixes i8->i48 on RV64.
2023-10-25 11:20:18 -07:00
Craig Topper
8efd6799f0 [RISCV][GISel] Add clampScalar G_ZEXTLOAD/G_SEXTLOAD legalization rules.
This fixes i8->i16 on RV32/RV64 and i8/i16/i32->i64 on RV64.
2023-10-25 10:23:55 -07:00
Simon Pilgrim
c60bd0e578 [X86] Regenerate select-mmx.ll
Change i686 check-prefix to the more standard X86 instead of I32
2023-10-25 18:10:51 +01:00
Christudasan Devadasan
7ce613fc77
[AMDGPU] Cleanup hasUnwantedEffectsWhenEXECEmpty function (#70206)
The readlane & writelane instructions don't really depend on the
EXEC mask, so they should return false from here.
2023-10-25 22:10:16 +05:30
Min-Yih Hsu
e696379c0d
[RISCV][GISel] Falling back to SDISel for scalable vector type values (#70133)
This patch also tests the fallback of unsupported formal arguments.
2023-10-25 09:02:34 -07:00
Simon Pilgrim
c9c9bf0f20 [DAG] WidenVectorOperand - add basic handling for *_EXTEND_VECTOR_INREG nodes
Fixes Issue #70208
2023-10-25 16:52:15 +01:00