7335 Commits

Author SHA1 Message Date
Momchil Velikov
4b6968952e
[AArch64] Implement spill/fill of predicate pair register classes (#76068)
We are getting ICE with, e.g.
```
#include <arm_sve.h>

 void g();
 svboolx2_t f0(int64_t i, int64_t n) {
     svboolx2_t r = svwhilelt_b16_x2(i, n);
     g();
     return r;
 }
```
2023-12-22 15:54:12 +00:00
David Green
48b9106656 [AArch64] Add an strict fp reduction test. NFC 2023-12-22 13:25:00 +00:00
Matt Arsenault
0e46b49de4 Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG"
This reverts commit c398fa009a47eb24f88383d5e911e59e70f8db86.

PPC backend was fixed in 2f82662ce901c6666fceb9c6c5e0de216a1c9667
2023-12-22 16:46:22 +07:00
Vitaly Buka
0ccc1e7acd Revert "[AArch64] Fold more load.x into load.i with large offset"
Issue #76202

This reverts commit f5687636415969e6d945659a0b78734abdfb0f06.
2023-12-21 21:12:40 -08:00
Tomas Matheson
7bd17212ef Re-land "[AArch64] Codegen support for FEAT_PAuthLR" (#75947)
This reverts commit 9f0f5587426a4ff24b240018cf8bf3acc3c566ae.

Fix expensive checks failure by properly marking register def for ADR.
2023-12-21 18:32:55 +00:00
Tomas Matheson
9f0f558742 Revert "[AArch64] Codegen support for FEAT_PAuthLR"
This reverts commit 5992ce90b8c0fac06436c3c86621fbf6d5398ee5.

Builtbot failures with expensive checks enabled.
2023-12-21 16:25:55 +00:00
Tomas Matheson
5992ce90b8 [AArch64] Codegen support for FEAT_PAuthLR
- Adds a new +pc option to -mbranch-protection that will enable
  the use of PC as a diversifier in PAC branch protection code.

- When +pauth-lr is enabled (-march=armv9.5a+pauth-lr) in combination
  with -mbranch-protection=pac-ret+pc, the new 9.5-a instructions
  (pacibsppc, retaasppc, etc) are used.

Documentation for the relevant instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/

Co-authored-by: Lucas Prates <lucas.prates@arm.com>
2023-12-21 14:18:33 +00:00
Paschalis Mpeis
2e3d77d6ed
[TLI] Pass replace-with-veclib works with Scalable Vectors. (#73642)
[TLI] Pass replace-with-veclib works with Scalable Vectors.

The pass is heavily refactored.
It uses the Masked variant of a TLI method when the Intrinsic operates on Scalable Vectors.

 Improve tests for ArmPL and SLEEF Intrinsics:
- Auto-generate test `armpl-intrinsics.ll`, and use active lane mask to have shorter `shufflevector` check lines.
- Update scripts now add `@llvm.compiler.used` instead of using the regex: `@[[LLVM_COMPILER_USED:[a-zA-Z0-9_$"\\.-]+]]`
-  Add simplifycfg pass and noalias to ensure tail folding. `noalias` attribute was added only to the `%in.ptr` parameter of the ArmPL Intrinsics.
2023-12-21 12:37:57 +00:00
zhongyunde 00443407
f568763641 [AArch64] Fold more load.x into load.i with large offset
The list of load.x is refer to canFoldIntoAddrMode on D152828.
Also support LDRSroX missed in canFoldIntoAddrMode
2023-12-21 18:54:15 +08:00
zhongyunde 00443407
32878c2065 [AArch64] merge index address with large offset into base address
A case for this transformation, https://gcc.godbolt.org/z/nhYcWq1WE
Fold
  mov     w8, #56952
  movk    w8, #15, lsl #16
  ldrb    w0, [x0, x8]
into
  add     x0, x0, 1036288
  ldrb    w0, [x0, 3704]

Only LDRBBroX is supported for the first time.
Fix https://github.com/llvm/llvm-project/issues/71917
2023-12-21 18:54:14 +08:00
zhongyunde 00443407
4bad0cb359 [AArch64] Precommit tests for PR75343, NFC 2023-12-21 18:54:14 +08:00
David Green
c0931d4950 [AArch64][GlobalISel] Lower scalarizing G_UNMERGE_VALUES to G_EXTRACT_VECTOR_ELT
This adds post-legalizing lowering of G_UNMERGE_VALUES which take a vector and
produce scalar values for each lane. They are converted to a G_EXTRACT_VECTOR_ELT
for each lane, allowing all the existing tablegen patterns to apply to them.

A couple of tablegen patterns need to be altered to make sure the type of the
constant operand is known, so that the patterns are recognized under global
isel.

Closes #75662
2023-12-21 09:22:23 +00:00
Hassnaa Hamdi
f3dcc0cba9
[LLVM][AArch64][tblgen]: Match clamp pattern (#75529)
Add isel pattern to replase min(max(v1,v2),v3) by clamp
Add tests for uclamp, sclamp, bfclamp, fclamp.
2023-12-20 14:36:58 +00:00
Rin
0894c2ee5f
[DAGCombiner] Avoid the pre-truncate of BUILD_VECTOR sources. (#75792)
Avoid the pre-truncate of BUILD_VECTOR sources when there is more than
one use. This can avoid using unnecessary movs later down the
instruction selection pipeline.
2023-12-19 15:25:38 +00:00
Antonio Frighetto
9aeb3336fd [AArch64] Ensure SplatBitSize conforms with the original lane width
A miscompilation issue has been addressed with improved checking.

Fixes: https://github.com/llvm/llvm-project/issues/75822.
2023-12-19 16:03:56 +01:00
Kerry McLaughlin
e9af57dfea
[Clang][SME2] Add builtins for moving multi-vectors to/from ZA (#71191)
Adds the following SME2 builtins:
 - svread_hor/ver,
 - svwrite_hor/ver,
 - svread_za64,
 - svwrite_za64

See https://github.com/ARM-software/acle/pull/217
2023-12-19 13:51:10 +00:00
Nathan Sidwell
d0285a31c8
aarch64: fix testcase (#75723)
Add missing < %s to RUN line.
2023-12-18 11:02:44 -05:00
Momchil Velikov
fd527def7e
[Clang][SVE2.1] Add floating-point variants of svrevd_XX (#75117) 2023-12-18 15:52:28 +00:00
Stefan Pintilie
c398fa009a Revert "Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG""
This reverts commit f4b5be1ecdc85ca4257b739afb8d57e23c7a8030.

The above change was breaking the clang-ppc64le-linux-test-suite bot.
2023-12-16 07:30:53 -06:00
chuongg3
70579c95bd
[AArch64][GlobalISel] Look into array's element (#74109)
In AArch64RegisterBankInfo, IsFPOrFPType() does not work correctly
with ArrayTypes and StructTypes as it does not not look at their
elements.

This caused some registers to be selected as gpr instead of fpr.
2023-12-15 10:46:57 +00:00
Matt Arsenault
f4b5be1ecd Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG"
This reverts commit 69c4930aad9659ec6ab846c8e7124d6afe044b1e.

See if this sticks after a few more coalescer assertions are fixed.
2023-12-15 10:51:47 +07:00
Jon Roelofs
b071b70317
[GlobalISel] Always direct-call IFuncs and Aliases (#74902)
This is safe because for both cases, the use must be in the same TU as the definition, and they cannot be forward declared.
2023-12-14 14:58:20 -07:00
Jon Roelofs
640c1d3dd1
[llvm] Support IFuncs on Darwin platforms (#73686)
... by lowering them as lazy resolve-on-first-use symbol resolvers. Note that this is subtly different timing than on ELF platforms, where ifunc resolution happens at load time.

Since ld64 and ld-prime don't support all the cases we need for these, we lower them manually in the AsmPrinter.
2023-12-14 14:40:52 -07:00
DianQK
7649d22306
[AArch64] ORRWrs is copy instruction when there's no implicit def of the X register (#75184)
Follows
https://github.com/llvm/llvm-project/pull/74682#issuecomment-1850268782.
Fixes #74680.
2023-12-14 19:19:55 +08:00
Simon Pilgrim
a0c7a29655 [GlobalISel] IRTranslator::translateGetElementPtr - don't assume a gep constant offset is representable as i64
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=65052
2023-12-14 11:02:38 +00:00
CarolineConcatto
f2464ca317
[SVE2.1][Clang][LLVM]Int/FP reduce builtin in Clang and LLVM intrinsic (#69926)
This patch implements the builtins in Clang
and the LLVM-IR intrinsic for the following:

// Variants are also available for:
// _s8, _s16, _u16, _s32, _u32, _s64, _u64,
// _f16, _f32, _f64uint8x16_t svaddqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
uint8x16_t svandqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t
sveorqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t svorqv[_u8](svbool_t
pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64;
uint8x16_t svmaxqv[_u8](svbool_t pg, svuint8_t zn); uint8x16_t
svminqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for _f32, _f64
float16x8_t svmaxnmqv[_f16](svbool_t pg, svfloat16_t zn); float16x8_t
svminnmqv[_f16](svbool_t pg, svfloat16_t zn);

According to the PR#257[1]

The reduction instruction uses scalable vectors as input and fixed
vectors as output, therefore we changed SVEEmitter to emit fixed vector
types in case the neon header(arm_neon.h) is not present.

[1]https://github.com/ARM-software/acle/pull/257

Co-author: Dinar Temirbulatov <dinar.temirbulatov@arm.com>
2023-12-13 15:45:59 +00:00
Momchil Velikov
d7ee99a4fc
[MachineSink] Clear kill flags of sunk addressing mode registers (#75072)
When doing sink-and-fold, the MachineSink clears the "killed" flags of
the operands of the sunk (and deleted) instruction. However, this is not
always sufficient. In some cases we can create the new load/store
instruction with operands other than the ones present in the deleted
instruction. One such example is folding a zero word extend into a
memory load on AArch64. The zero-extend is represented by a pair of
instructions - `MOV` (i.e. `ORRwrs`) followed by a `SUBREG_TO_REG`. The
`SUBREG_TO_REG` is deleted (it is the sunk instruction), but the new
load instruction mentions operands "killed" in the `MOV`, which is no
longer correct.

To fix this, clear the "killed" flags of the registers participating in
the addressing mode.
2023-12-13 09:15:28 +00:00
paperchalice
a930fec033
[CodeGen] Port InterleavedLoadCombine to new pass manager (#75164) 2023-12-13 12:46:22 +08:00
Tuan Chuong Goh
32532c2bbe [AArch64][GlobalISel] Test Pre-Commit for Look into array's element 2023-12-12 15:38:21 +00:00
Shreyansh Chouhan
5d12274646
[AArch64]: Added code for generating XAR instruction (#75085)
Fixes #61584
2023-12-12 05:48:45 +00:00
paperchalice
ce08c7ee1e
[CodeGen] Port SelectOptimize to new pass manager (#74920)
- Use `BlockFrequencyInfoWrapperPass` in legacy pass so member
`std::unique_ptr<BranchProbabilityInfo> BPI` could be removed.
- Member `DominatorTree *DT = nullptr` is unused, remove it.
2023-12-12 12:09:30 +08:00
Fangrui Song
072cea668e [test] Change llc -march to -mtriple
Similar to d20190e68413634b87f0f9426312a0e9d8456d18
2023-12-11 15:42:12 -08:00
James Y Knight
876816ff18
[AArch64] Set MaxAtomicSizeInBitsSupported. (#74385)
This will result in larger atomic operations getting expanded to
`__atomic_*` libcalls via AtomicExpandPass, which matches what Clang
already does in the frontend.

Additionally, adjust some comments, and remove partial code dealing with
larger-than-128bit atomics, as it's now unreachable.

AArch64 always supports 128-bit atomics, so there's no conditionals
needed here. (Though: we really ought to require that a 128-bit load is
available, not just a cmpxchg, which would mean conditioning on LSE2.
But that's future work.)

The arm64-irtranslator.ll test was adjusted as it was using an i258 type
as a hack to avoid IR atomic lowering to test GlobalISel behavior. Pass
-mattr=+lse and use i32, instead, to accomplish that goal in a way that
continues to work.
2023-12-11 17:55:07 -05:00
Jonathan Thackray
f576cbe44e
[AArch64] Correctly mark Neoverse N2 as an Armv9.0a core (#75055)
Neoverse N2 was incorrectly marked as an Armv8.5a core. This has been
changed to an Armv9.0a core. However, crypto options are not enabled
by default for Armv9 cores, so -mcpu=neoverse-n2+crypto is required
to enable crypto for this core.

Neoverse N2 Technical Reference Manual:
   https://developer.arm.com/documentation/102099/0003/
2023-12-11 18:52:25 +00:00
Jay Foad
35ebd92d3d
[GlobalISel] Add G_PREFETCH (#74863) 2023-12-11 11:06:50 +00:00
Serge Pavlov
18959c46e3 [NFC] Modify test to use autogenerated assertions 2023-12-11 14:38:01 +07:00
Oskar Wirga
9930f3e298
[AArch64] Fix case of 0 dynamic alloc when stack probing (#74877)
I accidentally closed
https://github.com/llvm/llvm-project/pull/74806

If the dynamic allocation size is 0, then we will still probe the
current sp value despite not decrementing sp! This results in
overwriting stack data, in my case the stack canary.

The fix here is just to load the value of [sp] into xzr which is
essentially a no-op but still performs a read/probe of the new page.
2023-12-10 08:01:29 -05:00
David Green
e3720bbc08 [AArch64] Extend and cleanup vector icmp test cases. NFC 2023-12-07 18:39:33 +00:00
Simon Pilgrim
f1200ca7ac
[DAG] visitEXTRACT_VECTOR_ELT - constant fold legal fp imm values (#74304)
If we're extracting a constant floating point value, and the constant is a legal fp imm value, then replace the extraction with a fp constant.
2023-12-07 14:56:12 +00:00
Sjoerd Meijer
3acbd38492
[AArch64] Optimise MOVI + CMGT to CMGE (#74499)
This fixes a regression that occured for a pattern of MOVI + CMGT
instructions, which can be optimised to CMGE. I.e., when the signed
greater than compare has -1 as an operand, we can rewrite that as a
compare greater equal than 0, which is what CMGE does.

Fixes #61836
2023-12-07 08:32:02 +00:00
Thurston Dang
69c4930aad Revert "Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG""
This reverts commit 1f283a60a4bb896fa2d37ce00a3018924be82b9f.
Reason: breaks MSan buildbot
(https://lab.llvm.org/buildbot/#/builders/74/builds/24077)
2023-12-06 19:27:21 +00:00
Matthew Devereau
8186e1500b
[SME2] Add LUTI2 and LUTI4 single Builtins and Intrinsics (#73304)
See https://github.com/ARM-software/acle/pull/217

Patch by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>
2023-12-06 16:35:56 +00:00
Matt Arsenault
1f283a60a4 Reapply "RegisterCoalescer: Add implicit-def of super register when coalescing SUBREG_TO_REG"
This reverts commit 9e50c6e6b5741895f58f3e530004052844b6af9f. A few assertion and verifier
errors have been fixed in the coalescer and allocator, so hopefully this sticks this time.
2023-12-06 23:07:22 +07:00
Matt Arsenault
546a9ce80c
CodeGen: Fix bypassing legality checks for IMPLICIT_DEF rematerialization (#73934)
It's permitted to have extra implicit-def operands of the same main
register
after the main register def. If there are implicit operands, use the
standard
legality checks which verify the operand contents.

Depends #73933
2023-12-06 21:43:19 +07:00
Matthew Devereau
30faf19a88
[SME2] Add LUTI2 and LUTI4 double Builtins and Intrinsics (#73305)
See https://github.com/ARM-software/acle/pull/217

Patch by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>
2023-12-06 14:35:11 +00:00
Matthew Devereau
6704d6aadd
[SME2] Add LUTI2 and LUTI4 quad Builtins and Intrinsics (#73317)
See https://github.com/ARM-software/acle/pull/217

Patch by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>
2023-12-06 10:08:04 +00:00
Pranav Taneja
41507fe595
[GISel] Combine (Scalarize) vector load followed by an element extract. 2023-12-06 11:23:23 +05:30
Nikita Popov
eecb99c5f6 [Tests] Add disjoint flag to some tests (NFC)
These tests rely on SCEV looking recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
2023-12-05 14:09:36 +01:00
Florian Hahn
58dcac3948
[AArch64] Check X16&X17 in prologue if the fn has an SwiftAsyncContext. (#73945)
StoreSwiftAsyncContext clobbers X16 & X17. Make sure they are available
in canUseAsPrologue, to avoid shrink wrapping moving the pseudo to a
place where X16 or X17 are live.
2023-12-05 11:41:40 +00:00
Matt Arsenault
74c00d4329
LiveRangeEdit: Clear all dead flags when rematerializing (#73933)
It's allowed to rematerialize instructions with implicit-defs of the
same register as the single explicit def. If this happened, it was only
clearing the dead flags on the one main result.
2023-12-05 10:40:13 +07:00