52796 Commits

Author SHA1 Message Date
pvanhout
c3cfbbc416 [GlobalISel] Add dead flags to implicit defs in ISel
Checks for implicit defs that are unused within a pattern and mark them as dead.

This is done directly at the TableGen level for efficiency.
The instructions are directly created with the "dead" operand and no further analysis is needed later.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D157273
2023-08-09 14:20:51 +02:00
Simon Pilgrim
8ae0e1f58d [X86] Create X86ISD::SHUF128 512-bit masks with getV4X86ShuffleImm8ForMask
This allows us to use the same canonicalizations as PSHUFD/SHUFPS etc. to avoid unnecessary demanded elts (better splat detection, blend pass through etc.) instead of defaulting to zero mask values.
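As a rough illustration of the encoding described above (a simplified model, not the LLVM source), each of the four mask elements takes two bits of the immediate. The sketch assumes that a mask referencing a single distinct element is treated as a splat and that undef elements (written as -1 here) pass through their own lane index instead of defaulting to zero. The name `shuffle_imm8` is made up for this example.

```python
def shuffle_imm8(mask):
    """Pack a 4-element shuffle mask (values 0..3, -1 = undef) into an 8-bit immediate."""
    assert len(mask) == 4 and all(-1 <= m <= 3 for m in mask)
    defined = [m for m in mask if m >= 0]
    # Splat detection: if only one distinct element is referenced, use it in every lane.
    if defined and all(m == defined[0] for m in defined):
        mask = [defined[0]] * 4
    imm = 0
    for lane, m in enumerate(mask):
        # Undef lanes pass through their own index rather than defaulting to 0.
        imm |= (lane if m < 0 else m) << (2 * lane)
    return imm
```

With this model, `[2, -1, -1, -1]` encodes as a full splat of element 2 rather than `2, 0, 0, 0`.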
2023-08-09 11:13:03 +01:00
Alex Bradbury
89b8ebf3d6 [LegalizeTypes][RISCV] Correct FP_TO_{S,U}INT expansion when bf16 isn't a legal type
As noted in D156990, the logic in ExpandIntRes_FP_TO_SINT assumes that
if the type action for the float type is TypeSoftPromoteHalf, it must
have been an f16 (half). However, the meaning of that type action has
been overloaded and it is used for both f16 and bf16. This patch adds an
appropriate check to ensure ISD::FP16_TO_FP or ISD::BF16_TO_FP is
emitted as required.

Differential Revision: https://reviews.llvm.org/D157287
2023-08-09 11:01:28 +01:00
Igor Kirillov
60e2a849b0 [CodeGen] Disable FP LD1RX instructions generation for Neoverse-V1
These instructions show worse performance on Neoverse-V1 compared
to pair of LDR(LDP)/MOV instructions.
This patch adds `no-sve-fp-ld1r` sub-target feature, which is enabled
only on Neoverse-V1.

Fixes https://github.com/llvm/llvm-project/issues/64498

Differential Revision: https://reviews.llvm.org/D157279
2023-08-09 09:33:45 +00:00
Simon Wallis
33b9634394 [ARM] v6-M XO: save CPSR around LoadStackGuard
For Thumb-1 Execute-Only, expandLoadStackGuardBase generates a tMOVimm32 pseudo when calculating the stack offset.
It does this in a context where the CPSR may be live. tMOVimm32 may corrupt CPSR.
To fix this, generate save/restore CPSR around the tMOVimm32 using MRS/MSR to/from a scratch register.

expandLoadStackGuardBase runs after register allocation, so the scratch register needs to be a physical register.
Use R12 as a scratch register, as is usual when expanding a pseudo.
MSR/MRS are some of the few v6-M instructions which operate on a high register.

New stack-guard test case added which was generating incorrect code without the save/restore CPSR.

Reviewed By: stuij

Differential Revision: https://reviews.llvm.org/D156968
2023-08-09 08:40:35 +01:00
Konstantina Mitropoulou
2c5d1b5ab7 [DAGCombiner] Reassociate the operands from (OR (OR(CMP1, CMP2)), CMP3) to (OR (OR(CMP1, CMP3)), CMP2)
This happens when CMP1 and CMP3 have the same predicate (or CMP2 and CMP3 have
the same predicate).

This helps optimizations such as the following one:
CMP(A,C)||CMP(B,C) => CMP(MIN/MAX(A,B), C)
CMP(A,C)&&CMP(B,C) => CMP(MIN/MAX(A,B), C)
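A quick sanity check of the two folds for one concrete predicate, signed less-than, where OR pairs with MIN and AND pairs with MAX. This is only a brute-force sketch over small integers; the helper name `check_lt_folds` is invented for the example and other predicates pair with MIN/MAX differently.

```python
from itertools import product

def check_lt_folds(values):
    """Return True if both folds hold for every (a, b, c) drawn from values."""
    for a, b, c in product(values, repeat=3):
        # (a < c) || (b < c)  ==  min(a, b) < c
        if ((a < c) or (b < c)) != (min(a, b) < c):
            return False
        # (a < c) && (b < c)  ==  max(a, b) < c
        if ((a < c) and (b < c)) != (max(a, b) < c):
            return False
    return True
```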

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D156215
2023-08-08 20:08:01 -07:00
Konstantina Mitropoulou
51202b8d2e [NFC][DAGCombiner] Tests for future commit.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D155915
2023-08-08 20:05:23 -07:00
Weining Lu
f62c9252fc [LoongArch] Support -march=native and -mtune=
As described in [1][2], `-mtune=` is used to select the type of target
microarchitecture, defaults to the value of `-march`. The set of
possible values should be a superset of `-march` values. Currently
possible values of `-march=` and `-mtune=` are `native`, `loongarch64`
and `la464`.

D136146 has supported `-march={loongarch64,la464}` and this patch adds
support for `-march=native` and `-mtune=`.

A new ProcessorModel called `loongarch64` is defined in LoongArch.td
to support `-mtune=loongarch64`.

`llvm::sys::getHostCPUName()` returns `generic` on unknown or future
LoongArch CPUs, e.g. the not yet added `la664`, leading to
`llvm::LoongArch::isValidArchName()` failing to parse the arch name.
In this case, use `loongarch64` as the default arch name for 64-bit
CPUs.

Two preprocessor macros are defined based on user-provided `-march=`
and `-mtune=` options and the defaults.
- __loongarch_arch
- __loongarch_tune
Note that, to work with `-fno-integrated-cc1` we leverage cc1 options
`-target-cpu` and `-tune-cpu` to pass driver options `-march=` and
`-mtune=` respectively because cc1 needs this information to define
macros in `LoongArchTargetInfo::getTargetDefines`.
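The defaulting rules described in this message can be sketched roughly as follows. This is an illustrative model only, not the driver code: the function names are hypothetical, treating `loongarch64` as the default when `-march=` is absent is an assumption, and so is resolving `-mtune=native` the same way as `-march=native`.

```python
VALID_ARCH_NAMES = {"loongarch64", "la464"}

def resolve_cpu(name, host_cpu):
    """Resolve 'native' to the host CPU, falling back to loongarch64 if unrecognized."""
    if name != "native":
        return name
    return host_cpu if host_cpu in VALID_ARCH_NAMES else "loongarch64"

def resolve_arch_and_tune(march=None, mtune=None, host_cpu="generic"):
    """Return the names that would feed __loongarch_arch and __loongarch_tune."""
    # Assumption: loongarch64 is used when no -march= is given.
    arch = resolve_cpu(march or "loongarch64", host_cpu)
    # -mtune= defaults to the value of -march= when not given.
    tune = resolve_cpu(mtune, host_cpu) if mtune else arch
    return {"__loongarch_arch": arch, "__loongarch_tune": tune}
```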

[1]: https://github.com/loongson/LoongArch-Documentation/blob/2023.04.20/docs/LoongArch-toolchain-conventions-EN.adoc
[2]: https://github.com/loongson/la-softdev-convention/blob/v0.1/la-softdev-convention.adoc

Reviewed By: xen0n, wangleiat, steven_wu, MaskRay

Differential Revision: https://reviews.llvm.org/D155824
2023-08-09 10:29:50 +08:00
David Green
c782e3497d [AArch64] Add VSHL knownBits handling.
These can be handled in the same way as other shifts.
2023-08-08 21:59:53 +01:00
David Green
2bb727297d [AArch64] Regenerate s/urem-seteq-* tests. NFC
2023-08-08 21:34:34 +01:00
Matt Arsenault
87b6f85c2b AMDGPU: Add syncscopes to some atomic tests
These were not testing what was intended, which should be the cases we
can directly select to the instructions.
2023-08-08 14:38:06 -04:00
Matt Arsenault
3371849194 AMDGPU: Round out system atomics tests
There were system scope tests only for integer min/max. Expand this to
cover all of the integer operations.
2023-08-08 14:38:05 -04:00
Matt Arsenault
7db933a716 AMDGPU: Fix broken test checks
There were incomplete generated checks plus some dead manual checks.
2023-08-08 14:38:05 -04:00
Simon Pilgrim
7593f9b59a [X86] combineConcatVectorOps - add handling for X86ISD::SHUF128 nodes.
Prevents regression on some future work to improve codegen for concat_vectors(extract_subvector(),extract_subvector()) patterns.

X86ISD::SHUF128 optimization is still pretty poor (especially the zmm variant), not optimizing the shuffle demanded elts like we do for SHUFPS.
2023-08-08 18:13:43 +01:00
Igor Kirillov
84d444f909 [CodeGen] Fix incorrect pattern FMLA_* pseudo instructions
* Remove the incorrect patterns from AArch64fmla_p/AArch64fmls_p
* Add correct patterns to AArch64fmla_m1/AArch64fmls_m1
* Refactor fma_patfrags for the sake of PatFrags

Fixes https://github.com/llvm/llvm-project/issues/64419

Differential Revision: https://reviews.llvm.org/D157095
2023-08-08 16:34:31 +00:00
pvanhout
96e1032a5e [AMDGPU] Add extended-image-insts to RemoveIncompatibleFunctions
Otherwise device libs still has issues at O0 (in OpenCL-CTS)

Depends on D156972 as well. They're unrelated fixes but both are needed to fix the issue.

Fixes SWDEV-402331

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D156973
2023-08-08 15:15:57 +02:00
pvanhout
98ccc70b93 [DAG] Fix crash in replaceStoreOfInsertLoad
Idx's type can be different from Ptr's, causing a "Binary operator types must match" assertion failure when emitting the MUL.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D156972
2023-08-08 15:15:34 +02:00
Alex Bradbury
f7dbc8501f [LegalizeTypes][RISCV] Support libcalls for fpto{s,u}i of bfloat by extending to f32 first
As there is no direct bf16 libcall for these conversions, extend to f32
first.
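To illustrate why extending first is straightforward: a bf16 value is the upper 16 bits of an IEEE-754 binary32, so the extension is a 16-bit left shift of the bit pattern. The sketch below models the value-level behaviour only (it is not the legalizer code), the function names are invented, and NaN/infinity inputs are not handled.

```python
import struct

def bf16_bits_to_f32(bits):
    """Extend a bf16 bit pattern to a float by placing it in the top half of a binary32."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]

def fptosi_bf16(bits):
    """Convert bf16 to a signed integer by extending to f32 first, then truncating toward zero."""
    return int(bf16_bits_to_f32(bits))
```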

This patch includes a tiny refactoring to pull out equivalent logic in
ExpandIntRes_XROUND_XRINT so it can be reused in
ExpandIntRes_FP_TO_{S,U}INT.

This patch also demonstrates incorrect codegen for RV32 without zfbfmin
for the newly enabled tests. As it doesn't introduce that incorrect
codegen (caused by the assumption that 'TypeSoftPromoteHalf' is only
used for f16 types), a fix will be added in a follow-up (D157287).

Differential Revision: https://reviews.llvm.org/D156990
2023-08-08 13:56:32 +01:00
Jolanta Jensen
932972305b [NFC][AArch64] Added checks for global entries in ReplaceWithVeclib testing
This patch added checks for global entries in ReplaceWithVeclib testing
using ArmPL and SLEEF vector libraries.

Differential Revision: https://reviews.llvm.org/D157258
2023-08-08 12:28:58 +00:00
Matt Devereau
e8efe7f9d1 [AArch64][SME2][SVE2p1] Choose strided or contiguous loads
Lower to the strided/contiguous addressing mode of
ld1/ldnt1 instructions depending on register allocation.

Differential Revision: https://reviews.llvm.org/D156311
2023-08-08 11:50:33 +00:00
Igor Kirillov
7542477d5d [CodeGen] Precommit tests for D157095
2023-08-08 11:38:15 +00:00
Igor Kirillov
b560d5c7e3 [CodeGen] Pre-commit tests showing incorrect pattern FMLA_* pseudo instructions
Differential Revision: https://reviews.llvm.org/D157094
2023-08-08 10:52:55 +00:00
David Green
de775f264d [DAG] Add constant SPLAT handling in getNodes SIGN_EXTEND_INREG
This helps simplify constant splats a little. Without this the code in
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L14072 always returns the
existing node.

Differential Revision: https://reviews.llvm.org/D157259
2023-08-08 10:27:55 +01:00
Simon Pilgrim
943fda567a [X86] matchTruncateWithPACK - canonically prefer v4i64 -> v4i32 shuffle vs truncation
Pulled out of LowerTruncateVecPackWithSignBits - prefer shuffles unless we can cheaply split the vector. ComputeNumSignBits struggles with vXi64 through bitcasts, so we're usually better off with shuffles.
2023-08-08 10:05:24 +01:00
Luke Lau
5d510ea724 [RISCV] Lower vro{l,r} for fixed vectors
We need to add new VL nodes to mirror ISD::ROTL and ISD::ROTR.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D157295
2023-08-08 09:47:00 +01:00
Luke Lau
768740ef77 [RISCV] Lower unary zvbb ops for fixed vectors
This reuses the same strategy for fixed vectors as other ops, i.e. custom lower
to a scalable *_vl SD node.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D157294
2023-08-08 09:46:57 +01:00
Luke Lau
44383ac7fd [RISCV] Add fixed vector tests for ct[l,t]z_zero_undef
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D157293
2023-08-08 09:46:55 +01:00
David Green
0d0599249a [AArch64] Regenerate fpround mir tests. NFC
2023-08-08 09:24:05 +01:00
Ben Shi
57c6fe273f [CSKY] Optimize multiplication with immediates
Optimize "Rx * imm" for specific immediates to
([IXH32|IXW32|IXD32] (LSLI Rx, shift), Rx).
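A small arithmetic sketch of the pattern above. It assumes, without confirmation from this commit message, that IXH32/IXW32/IXD32 compute `a + (b << 1)`, `a + (b << 2)` and `a + (b << 3)` respectively on 32-bit registers; under that assumption the pattern covers immediates of the form `2**shift + 2**k` for k in 1..3. All helper names are illustrative.

```python
MASK32 = 0xFFFFFFFF

def lsli(rx, shift):
    """Logical shift left on a 32-bit register."""
    return (rx << shift) & MASK32

def ix(a, b, scale_shift):
    """Assumed semantics of IXH32/IXW32/IXD32: a + (b << 1/2/3), modulo 2**32."""
    return (a + (b << scale_shift)) & MASK32

def mul_by_imm(rx, shift, scale_shift):
    """Compute rx * (2**shift + 2**scale_shift) as (IX* (LSLI rx, shift), rx)."""
    return ix(lsli(rx, shift), rx, scale_shift)
```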

Reviewed By: zixuan-wu

Differential Revision: https://reviews.llvm.org/D154768
2023-08-08 14:13:49 +08:00
Ben Shi
731bab50be [CSKY][test][NFC] Add tests of multiplication with immediates
These tests will be optimized with IXH32/IXW32/IXD32
in the future.

Reviewed By: zixuan-wu

Differential Revision: https://reviews.llvm.org/D154332
2023-08-08 14:13:49 +08:00
Ben Shi
30b52a3574 [CSKY] Optimize conditional branch and value select with BTSTI
Reviewed By: zixuan-wu

Differential Revision: https://reviews.llvm.org/D154768
2023-08-08 14:13:48 +08:00
Philip Reames
f0a9aacdb9 [RISCV] Use vmv.s.x for a constant build_vector when the entire size is less than 32 bits
We have a variant of this for splats already, but hadn't handled the case where a single copy of the wider element can be inserted producing the entire required bit pattern. This shows up mostly in very small vector shuffle tests.
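The "single copy of the wider element" idea can be pictured as packing the small constant elements into one integer, which is then inserted as a single scalar. This is a hedged sketch of the bit packing only, assuming element 0 occupies the low bits; `pack_elements` is a made-up helper, not an LLVM function.

```python
def pack_elements(elements, elt_bits):
    """Pack small constant elements into one wider integer, element 0 in the low bits."""
    value = 0
    for i, e in enumerate(elements):
        # Mask each element to its width so negative constants pack as two's complement.
        value |= (e & ((1 << elt_bits) - 1)) << (i * elt_bits)
    return value
```

For example, a two-element i8 vector `<1, 2>` packs to the 16-bit value `0x0201`.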

Differential Revision: https://reviews.llvm.org/D157299
2023-08-07 17:15:05 -07:00
Nitin John Raj
c9fe119869 [RISCV][GlobalISel] Legalize G_ICMP and G_SELECT
Test legalization for (i7, i8, i16, i32, i48, i64) on rv32 and for (i8, i15, i16, i32, i64, i72, i128) on rv64. Legalization fails for i96 on rv32 and i192 on rv64. Note that [i192 fails for AArch64](https://github.com/llvm/llvm-project/issues/64394).

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D157023
2023-08-07 16:44:29 -07:00
Nitin John Raj
cd61e8de06 [RISCV][GlobalISel] Legalize add/sub for wide and non-pow2 types
Legalize G_ADD, G_SUB, G_(S/U)ADD(O/E). We test for (s7, s48, s64, s96)
on rv32 and (s15, s72, s128, s192) on rv64.
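As a rough picture of what legalizing a wide add involves, the sketch below adds two wide integers using XLEN-sized limbs with a carry chain, which is the role the carry-producing and carry-consuming add variants play. It is a simplified value-level model with an invented function name, not the legalizer's actual sequence of widening and narrowing steps, and it assumes non-negative inputs that fit in `total_bits`.

```python
def add_wide(a, b, total_bits, xlen):
    """Add two total_bits-wide integers using xlen-bit limbs and a carry chain."""
    mask = (1 << xlen) - 1
    result, carry = 0, 0
    for shift in range(0, total_bits, xlen):
        # The final limb may be narrower than xlen for non-power-of-two widths.
        limb_bits = min(xlen, total_bits - shift)
        s = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        carry = s >> xlen
        result |= (s & ((1 << limb_bits) - 1)) << shift
    return result
```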

Differential Revision: https://reviews.llvm.org/D157019
2023-08-07 16:43:53 -07:00
Nitin John Raj
3bcfd6e962 [RISCV][GlobalISel] Legalize logical instructions for nonpow 2 types
Legalize G_AND, G_OR, G_XOR for (s7, s48) on rv32 and (s15, s72) on rv64

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D157017
2023-08-07 16:23:47 -07:00
Matt Arsenault
4b1702e87a AMDGPU: Fix counting source modifiers as literal constants
This fixes overestimating code size. This was broken by
79f52af4cd9a76485dd50bcdbb5d393eb7a70103.

https://reviews.llvm.org/D157103
2023-08-07 18:40:16 -04:00
Nitin John Raj
649e1d1b9d [RISCV][GlobalISel] Legalize bitshift instructions for narrow types
Legalize G_SHL, G_ASHR and G_LSHR for types up to and including XLen: (i7, i8,
i16 and i32) for rv32 and (i8, i15, i16, i32 and i64) for rv64. This requires
adding some rules to handle G_ANYEXT, G_ZEXT and G_SEXT.
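The need for the three extension kinds can be seen in a value-level sketch of widening a narrow shift to XLEN: a left shift does not care about the high bits of the widened source, a logical right shift needs zeros there, and an arithmetic right shift needs copies of the sign bit. This is an illustrative model for rv32 with invented helper names, assuming the shift amount is smaller than the narrow width.

```python
XLEN = 32
MASK = (1 << XLEN) - 1

def zext(x, bits):
    return x & ((1 << bits) - 1)

def sext(x, bits):
    x &= (1 << bits) - 1
    return (x - (1 << bits)) & MASK if x >> (bits - 1) else x

def trunc(x, bits):
    return x & ((1 << bits) - 1)

def shl_narrow(x, amt, bits):
    # High bits of the widened source do not matter for SHL; zext is one valid any-extend.
    return trunc((zext(x, bits) << amt) & MASK, bits)

def lshr_narrow(x, amt, bits):
    # Logical right shift must see zeros above the narrow value.
    return trunc(zext(x, bits) >> amt, bits)

def ashr_narrow(x, amt, bits):
    # Arithmetic right shift must see copies of the sign bit above the narrow value.
    wide = sext(x, bits)
    signed = wide - (1 << XLEN) if wide >> (XLEN - 1) else wide
    return trunc(signed >> amt, bits)
```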

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D155772
2023-08-07 15:11:34 -07:00
Craig Topper
07c8bcc21d [AArch64] Narrow G_SEXT_INREG to s64 before lowering.
This avoids narrowing after it has been expanded to shifts. The
G_SEXT_INREG narrowing can use the second operand of the instruction to
optimize the narrowing.
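For reference, the shift expansion mentioned above is a left shift followed by an arithmetic right shift by the same amount. The sketch below models that expansion on a 64-bit register; the function name is illustrative and this is not the legalizer code.

```python
def sext_inreg_via_shifts(x, from_bits, reg_bits=64):
    """Sign-extend the low from_bits of x within a reg_bits-wide register using shl + ashr."""
    mask = (1 << reg_bits) - 1
    amt = reg_bits - from_bits
    shifted = (x << amt) & mask
    # Reinterpret as signed so Python's >> behaves as an arithmetic shift.
    if shifted >> (reg_bits - 1):
        shifted -= 1 << reg_bits
    return (shifted >> amt) & mask
```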

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D157172
2023-08-07 14:34:21 -07:00
Craig Topper
7cc615413f [RISCV] Add back handling of X > -1 to ISD::SETCC lowering.
There are cases where the -1 doesn't become visible until lowering
so the folding doesn't have a chance to run.

I think in these cases there is a missed DAGCombine for truncate (undef),
which I may fix separately, but RISC-V backend should protect itself.

Fixes #64503.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D157314
2023-08-07 13:00:57 -07:00
Nitin John Raj
b8fef7a6d4 [RISCV][GlobalISel] Legalize constants, undefined values, extension instructions, and (un)merge instructions for narrow types
Test legalization for (s7, s8, s16, s32, s48, s64, s96) for rv32, (s8, s15, s16, s32, s64, s72, s128, s192) for rv64.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D156383
2023-08-07 11:17:08 -07:00
Nitin John Raj
1b74459df8 [RISCV][GlobalISel] Fix tests for addition, subtraction and logical instructions
Fix a bug introduced in a previous commit.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D156380
2023-08-07 10:25:03 -07:00
John Brawn
f83ab2b3be [ARM] Improve generation of thumb stack accesses
Currently when a stack access is out of range of an sp-relative ldr or
str then we jump straight to generating the offset with a literal pool
load or mov32 pseudo-instruction. This patch improves that in two
ways:
 * If the offset is within range of sp-relative add plus an ldr then
   use that.
 * When we use the mov32 pseudo-instruction, if putting part of the
   offset into the ldr will simplify the expansion of the mov32 then
   do so.

Differential Revision: https://reviews.llvm.org/D156875
2023-08-07 17:53:32 +01:00
Philip Reames
47fe3b3b9a [RISCV] Use v(f)slide1down for build_vector with dominant values
If we have a dominant value, we can still use a v(f)slide1down to handle the last value in the vector if that value is neither undef nor the dominant value.

Note that we can extend this idea to any tail of elements, but that ends up being a near-complete merge of the v(f)slide1down insert path, and requires a bit more untangling of the profitability heuristics first.
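The simplest case of the dominant-value lowering can be modelled on plain lists: splat the dominant value, then use one slide1down to shift everything down a lane and place the odd value in the last lane. This is a sketch of the element movement only, with invented helper names; it ignores vector length, masking and the handling of other non-dominant elements.

```python
def splat(value, n):
    return [value] * n

def slide1down(vec, scalar):
    """Model of vslide1down: element i takes element i+1, and the last element takes scalar."""
    return vec[1:] + [scalar]

def build_with_dominant(dominant, last, n):
    """Build [dominant, ..., dominant, last] from a splat followed by one slide1down."""
    return slide1down(splat(dominant, n), last)
```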

Differential Revision: https://reviews.llvm.org/D157120
2023-08-07 07:54:29 -07:00
Philip Reames
999ac10d76 [RISCVGatherScatterLowering] Support broadcast base pointer
A broadcast base pointer is the same as a scalar base pointer for GEP semantics (when there's at least one other vector operand). This is the form that SLP likes to emit, so we should handle it.

Differential Revision: https://reviews.llvm.org/D157132
2023-08-07 07:42:04 -07:00
Jay Foad
56d92c1758 [MachineScheduler] Track physical register dependencies per-regunit
Change the scheduler's physical register dependency tracking from
registers-and-their-aliases to regunits. This has a couple of advantages
when subregisters are used:

- The dependency tracking is more accurate and creates fewer useless
  edges in the dependency graph. An AMDGPU example, edited for clarity:

    SU(0): $vgpr1 = V_MOV_B32 $sgpr0
    SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1
    SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 $vgpr0_vgpr1, 0, 0

  There is a data dependency on $vgpr1 from SU(0) to SU(1) and from
  SU(1) to SU(2). But the old dependency tracking code also added a
  useless edge from SU(0) to SU(2) because it thought that SU(0)'s def
  of $vgpr1 aliased with SU(2)'s use of $vgpr0_vgpr1.

- On targets like AMDGPU that make heavy use of subregisters, each
  register can have a huge number of aliases - it can be quadratic in
  the size of the largest defined register tuple. There is a much lower
  bound on the number of regunits per register, so iterating over
  regunits is faster than iterating over aliases.
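A toy model of per-regunit tracking on the AMDGPU example above. It is a simplified top-down last-writer scheme that only produces data-dependency edges, not the scheduler's real DAG builder; the regunit names and the `data_edges` helper are made up for illustration.

```python
# Hypothetical regunit table: a register tuple is the union of its parts' regunits.
REGUNITS = {
    "sgpr0": {"s0"},
    "vgpr0": {"v0"},
    "vgpr1": {"v1"},
    "vgpr0_vgpr1": {"v0", "v1"},
}

def data_edges(instrs):
    """Return (def_su, use_su) data-dependency edges, tracking the last writer per regunit."""
    last_def = {}  # regunit -> index of the SU that last wrote it
    edges = set()
    for su, (defs, uses) in enumerate(instrs):
        for reg in uses:
            for unit in REGUNITS[reg]:
                if unit in last_def:
                    edges.add((last_def[unit], su))
        for reg in defs:
            for unit in REGUNITS[reg]:
                last_def[unit] = su
    return edges

EXAMPLE = [
    (["vgpr1"], ["sgpr0"]),              # SU(0): $vgpr1 = V_MOV_B32 $sgpr0
    (["vgpr1"], ["vgpr1"]),              # SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1
    (["vgpr0_vgpr1"], ["vgpr0_vgpr1"]),  # SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 ...
]
```

Because SU(1) overwrites the only regunit of `$vgpr1`, the model yields edges SU(0) to SU(1) and SU(1) to SU(2), with no edge from SU(0) to SU(2).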

The LLVM compile-time tracker shows a tiny overall improvement of 0.03%
on X86. I expect a larger compile-time improvement on targets like
AMDGPU.

Recommit after fixing AggressiveAntiDepBreaker in D156880.

Differential Revision: https://reviews.llvm.org/D156552
2023-08-07 15:41:40 +01:00
Jay Foad
68a0a37371 [AggressiveAntiDepBreaker] Tweak the fix for renaming a subregister of a live register
This patch tweaks the fix in D20627 "Do not rename registers that do not
start an independent live range" to only consider Data dependencies, not
Output or Anti dependencies. An Output or Anti dependency to a superreg
does not imply that that superreg is live at the current instruction.

This enables breaking anti-dependencies in a few more cases as shown by
the lit test updates.

Differential Revision: https://reviews.llvm.org/D156879
2023-08-07 15:41:40 +01:00
Alex Bradbury
380fd8201d [RISCV][test] Add non-zfbfmin RUN lines to bfloat-convert.ll
As requested in review for https://reviews.llvm.org/D156990

This additionally consistently uses the ilp32d/lp64d ABIs when the D
extension is enabled.
2023-08-07 14:39:12 +01:00
Simon Pilgrim
0d1f8532bc [X86] truncateVectorWithPACK - ensure we don't truncate to <1 x iXX> vector types
Fuzz testing noticed that the sub-128-bit vector splitting added in ef4330f4f3cc didn't correctly halt at <2 x iXX> truncations.
2023-08-07 14:11:42 +01:00
Simon Pilgrim
711dff4577 [X86] Add matchTruncateWithPACK helper for matching signbits/knownbits for PACKSS/PACKUS
Begin to consolidate the similar matching code we have - all have semi-similar constraints that still need merging together to ensure we get consistent codegen depending on when the truncate is lowered.
2023-08-07 14:11:42 +01:00
Jim Lin
f2bdc29f3e [RISCV] Add a blank line after end of RUN lines. NFC.
Most test cases have a blank line after the end of the RUN lines for readability.
2023-08-07 18:38:09 +08:00