8305 Commits

Author SHA1 Message Date
Daniil Kovalev
da083e358e
[PAC][CodeGen][ELF][AArch64] Support signed GOT (#113811)
This re-applies #96164 after revert in #102434.

Support the following relocations and assembly operators:

- `R_AARCH64_AUTH_ADR_GOT_PAGE` (`:got_auth:` for `adrp`)
- `R_AARCH64_AUTH_LD64_GOT_LO12_NC` (`:got_auth_lo12:` for `ldr`)
- `R_AARCH64_AUTH_GOT_ADD_LO12_NC` (`:got_auth_lo12:` for `add`)

`LOADgotAUTH` pseudo-instruction is introduced which is later expanded to
actual instruction sequence like the following.

```
adrp x16, :got_auth:sym
add x16, x16, :got_auth_lo12:sym
ldr x0, [x16]
autia x0, x16
```

If a resign is requested, like below, `LOADgotPAC` pseudo is used, and GOT
load is lowered similarly to `LOADgotAUTH`.

```
@var = global i32 0
define ptr @resign_globalvar() {
  ret ptr ptrauth (ptr @var, i32 3, i64 43)
}
```

If FPAC bit is not set and auth instruction is emitted, a check+trap sequence
similar to one used for `AUT` pseudo is emitted to ensure auth success.

Both SelectionDAG and GlobalISel are suppported.
For FastISel, we fall back to SelectionDAG.

Tests starting with 'ptrauth-' have corresponding variants w/o this prefix.

See also specification
https://github.com/ARM-software/abi-aa/blob/main/pauthabielf64/pauthabielf64.rst#appendix-signed-got
2024-11-01 12:21:10 +03:00
Thorsten Schütt
8e3772744d
[GlobalISel][AArch64] Legalize G_INSERT_VECTOR_ELT for SVE (#114470)
There are patterns for:
* {nxv2s32, s32, s64},
* {nxv4s16, s16, s64},
* {nxv2s16, s16, s64}
2024-11-01 06:10:26 +01:00
Thorsten Schütt
aa70d846b0
[GlobalISel][AArch64] Legalize G_SPLAT_VECTOR (#114006)
{nxv8s16, s16} fails to select.
{nxv16s8, s8} no patterns available.
2024-10-31 22:20:08 +01:00
Sander de Smalen
41448c1d07 [AArch64] NFC: Add RUN line for +sve2 for sve-intrinsics-perm-select.ll
The codegen for SVE and SVE2 may be different (e.g. for splice and ext).
A follow-up patch will improve codegen for EXT.
2024-10-31 14:46:00 +00:00
Benjamin Maxwell
89a8c71db6
[SDAG] Support expanding FSINCOS to vector library calls (#114039)
This shares most of its code with the scalar sincos expansion. It allows
expanding vector FSINCOS nodes to a library call from the specified
`-vector-library`. The upside of this is it will mean the vectorizer
only needs to handle the sincos intrinsic, which has no memory effects,
and this can handle lowering the intrinsic to a call that takes output
pointers.
2024-10-31 12:41:43 +00:00
SpencerAbson
0800351da4
[AArch64][SVE] Use INS when moving elements from bottom 128b of SVE type (#114034)
Moving elements from a scalable vector to a fixed-lengh vector should
use[ INS (vector, element)
](https://developer.arm.com/documentation/100069/0606/SIMD-Vector-Instructions/INS--vector--element-)
when we know that the extracted element is in the bottom 128-bits of the
scalable vector. This avoids inserting unecessary UMOV/FMOV
instructions.
2024-10-31 10:36:00 +00:00
dnsampaio
28d0718033
[DAGCombiner] Add combine avg from shifts (#113909)
This teaches dagcombiner to fold:
`(asr (add nsw x, y), 1) -> (avgfloors x, y)`
`(lsr (add nuw x, y), 1) -> (avgflooru x, y)`

as well the combine them to a ceil variant:
`(avgfloors (add nsw x, y), 1) -> (avgceils x, y)` 
`(avgflooru (add nuw x, y), 1) -> (avgceilu x, y)`

iff valid for the target.

Removes some of the ARM MVE patterns that are now dead code.
It adds the avg opcodes to `IsQRMVEInstruction` as to preserve the
immediate splatting as before.
2024-10-31 10:57:27 +01:00
Thorsten Schütt
6effab990c
Revert "[GlobalISel][AArch64] Legalize G_INSERT_VECTOR_ELT for SVE" (#114353)
Reverts llvm/llvm-project#114310
2024-10-31 05:41:16 +01:00
Thorsten Schütt
6bf214b7c6
[GlobalISel][AArch64] Legalize G_INSERT_VECTOR_ELT for SVE (#114310)
There are patterns for:
* {nxv2s32, s32, s64},
* {nxv4s16, s16, s64},
* {nxv2s16, s16, s64}
2024-10-31 04:56:41 +01:00
Thorsten Schütt
b3bb6f18bb
[GlobalISel] Import samesign flag (#114267)
Credits: https://github.com/llvm/llvm-project/pull/111419

Fixes icmp-flags.mir

First attempt: https://github.com/llvm/llvm-project/pull/113090

Revert: https://github.com/llvm/llvm-project/pull/114256
2024-10-30 19:56:25 +01:00
Thorsten Schütt
4b028773b2
Revert "[GlobalISel] Import samesign flag" (#114256)
Reverts llvm/llvm-project#113090
2024-10-30 17:03:17 +01:00
Thorsten Schütt
72b115301d
[GlobalISel] Import samesign flag (#113090)
Credits: https://github.com/llvm/llvm-project/pull/111419
2024-10-30 16:34:01 +01:00
Sander de Smalen
602f43686c
[AArch64] Add patterns for constructive splice. (#113912)
SVE2 adds the constructive splice instruction, which takes a tuple.
Even though the register allocator must ensure that the tuple uses
consecutive registers for the tuple, it's likely to be more efficient
than using the destructive splice instruction when the first operand
is reused.
2024-10-30 13:17:31 +00:00
Akshat Oke
44d0e9522a
[CodeGen][NewPM] Port TailDuplicate pass to NPM (#113293) 2024-10-30 11:48:40 +05:30
David Green
83ae171722
[AArch64] Add ComputeNumSignBits for VASHR. (#113957)
As with a normal ISD::SRA node, they take the number of sign bits of the
incoming value and increase it by the shifted amount.
2024-10-29 21:02:32 +00:00
Benjamin Maxwell
c3260c65e8
[IR] Add llvm.sincos intrinsic (#109825)
This adds the `llvm.sincos` intrinsic, legalization, and lowering.

The `llvm.sincos` intrinsic takes a floating-point value and returns
both the sine and cosine (as a struct).

```
declare { float, float }          @llvm.sincos.f32(float  %Val)
declare { double, double }        @llvm.sincos.f64(double %Val)
declare { x86_fp80, x86_fp80 }    @llvm.sincos.f80(x86_fp80  %Val)
declare { fp128, fp128 }          @llvm.sincos.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 }  @llvm.sincos.ppcf128(ppc_fp128  %Val)
declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float>  %Val)
```

The lowering is built on top of the existing FSINCOS ISD node, with
additional type legalization to allow for f16, f128, and vector values.
2024-10-29 10:52:20 +00:00
David Green
5a5b78a84e [AArch64][GlobalISel] Lower aarch64.neon.smull/umull intrinsics.
As with other nodes, we can convert these into G_UMULL and G_SMULL aarch64
instructions.
2024-10-28 18:51:10 +00:00
Simon Pilgrim
670512b5c3 [AArch64] Regenerate srem-lkk.ll to add missing asm comments
Reduces diff in #112588
2024-10-28 16:00:45 +00:00
Jack Styles
86f76c3b17
[AArch64][Libunwind] Add Support for FEAT_PAuthLR DWARF Instruction (#112171)
As part of FEAT_PAuthLR, a new DWARF Frame Instruction was introduced,
`DW_CFA_AARCH64_negate_ra_state_with_pc`. This instructs Libunwind that
the PC has been used with the signing instruction. This change includes
three commits
- Libunwind support for the newly introduced DWARF Instruction
- CodeGen Support for the DWARF Instructions
- Reversing the changes made in #96377. Due to
`DW_CFA_AARCH64_negate_ra_state_with_pc`'s requirements to be placed
immediately after the signing instruction, this would mean the CFI
Instruction location was not consistent with the generated location when
not using FEAT_PAuthLR. The commit reverses the changes and makes the
location consistent across the different branch protection options.
While this does have a code size effect, this is a negligible one.

For the ABI information, see here:
853286c7ab/aadwarf64/aadwarf64.rst (id23)
2024-10-28 08:22:38 +00:00
Serge Pavlov
819abe412d
[Test] Fix usage of constrained intrinsics (#113523)
Some tests contain errors in constrained intrinsic usage, such as missed
or extra type parameters, wrong type parameters order and some other.

---------

Co-authored-by: Andy Kaylor <andy_kaylor@yahoo.com>
2024-10-28 14:07:32 +07:00
Thorsten Schütt
7b3da7b3b2
[GlobalISel][AArch64] Legalize G_ADD, G_SUB, G_AND, G_OR, and G_XOR for SVE (#110561)
Credits: https://github.com/llvm/llvm-project/pull/72976

LLVM ERROR: cannot select: %3:zpr(<vscale x 2 x s64>) = G_MUL %0:fpr,
%1:fpr (in function: xmulnxv2i64)

;; mul
define void @xmulnxv2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b,
ptr %p) {
entry:
  %c = mul <vscale x 2 x i64> %a, %b
  store <vscale x 2 x i64> %c, ptr %p, align 16
   ret void
}

define void @mulnxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b,
ptr %p) {
entry:
  %c = mul <vscale x 4 x i32> %a, %b
  store <vscale x 4 x i32> %c, ptr %p, align 16
   ret void
}

define void @mulnxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b,
ptr %p) {
entry:
  %c = mul <vscale x 8 x i16> %a, %b
  store <vscale x 8 x i16> %c, ptr %p, align 16
  ret void
}

define void @mulnxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b,
ptr %p) {
entry:
  %c = mul <vscale x 16 x i8> %a, %b
  store <vscale x 16 x i8> %c, ptr %p, align 16
  ret void
}
2024-10-27 23:14:07 +01:00
Gaëtan Bossu
a0c318938a
[CodeGen][NFC] Properly split MachineLICM and EarlyMachineLICM (#113573)
Both are based on MachineLICMBase, and the functionality there is
"switched" based on a PreRegAlloc flag. This commit is simply about
trusting the original value of that flag, defined by the `MachineLICM`
and `EarlyMachineLICM` classes.

The `PreRegAlloc` flag used to be overwritten it based on MRI.isSSA(),
which is un-reliable due to how it is inferred by the MIRParser. I see
that we can now define isSSA in MIR (thanks @gargaroff ), meaning the
fix isn’t really needed anymore, but redefining that flag still feels
wrong.

Note that I'm looking into upstreaming more changes to MachineLICM, see
[the discourse
thread](https://discourse.llvm.org/t/extending-post-regalloc-machinelicm/82725).
2024-10-25 11:19:22 -07:00
Tex Riddell
c03d09ce3e
[aarch64] atan2 intrinsic lowering (p5) (#112611)
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

- `VecFuncs.def`: define intrinsic to sleef/armpl mapping
- `LegalizerHelper.cpp`: add missing fewerElementsVector handling for
the new atan2 intrinsic
- `AArch64ISelLowering.cpp`: Add arch64 specializations for lowering
like neon instructions
- `AArch64LegalizerInfo.cpp`: Legalize atan2.

Part 5 for Implement the atan2 HLSL Function #70096.
2024-10-24 17:53:12 -07:00
Sander de Smalen
db0e376044
[AArch64] Fix failure with inline asm and svcount (#112537)
This fixes an issue where the compiler runs into an assertion failure
for the following example:

  register svcount_t pred asm("pn8") = svptrue_c8();
  asm("ld1w { z0.s, z4.s, z8.s, z12.s }, %[pred]/z, [x0]\n"
    :
    : [pred] "Uph" (pred)
    : "memory", "cc");

Here the register constraint that ends up in the LLVM IR is "{pn8}", but
the code in `TargetRegisterInfo::getRegForInlineAsmConstraint` that
parses that string, follows a path where it queries a suitable register
class for this register (<=> PPRorPNR regclass), for which it then
chooses `nxv16i1` as a suitable type. These choices individually are
correct, but the combined result isn't, because the type should be
`aarch64svcount`.
This then results in issues later on in SelectionDAGBuilder.cpp in
CopyToReg because the type of the actual value and the computed type
from the constraint don't match.

This PR pre-empts this issue by parsing the predicate explicitly and
returning the correct register class.
2024-10-24 17:41:07 +01:00
Benjamin Maxwell
cd0373e029
[AArch64] Allow single-element vector FP converts with +sme2p2 (#112905)
Follow up to #112213 now that the +sme2p2 feature flag has landed. The
single-element vector variants of FCVTZS, FCVTZU, UCVTF, and SCVTF are
allowed in streaming SVE mode with +sme2p2.

Reference:
-
https://developer.arm.com/documentation/ddi0602/2024-09/SIMD-FP-Instructions/FCVTZS--vector--integer---Floating-point-convert-to-signed-integer--rounding-toward-zero--vector--
-
https://developer.arm.com/documentation/ddi0602/2024-09/SIMD-FP-Instructions/UCVTF--vector--integer---Unsigned-integer-convert-to-floating-point--vector--
-
https://developer.arm.com/documentation/ddi0602/2024-09/SIMD-FP-Instructions/SCVTF--vector--integer---Signed-integer-convert-to-floating-point--vector--
2024-10-24 11:21:17 +01:00
SpencerAbson
629d9809ab
[LLVM][AArch64] Add assembly/disassembly for FTMOPA and BFTMOPA (#113230)
This patch adds assembly/disassembly for the following SME2p2
instructions (part of the 2024 AArch64 ISA update)

      -  BFTMOPA (widening) - FEAT_SME2p2
      -  BFTMOPA (non-widening)  - FEAT_SME2p2 & FEAT_SME_B16B16
      -  FTMOPA (4-way) - FEAT_SME2p2 & FEAT_SME_F8F32
      -  FTMOPA (2-way, 8-to-16) - FEAT_SME2p2 & FEAT_SME_F8F16
      -  FTMOPA (2-way, 16-to-32) - FEAT_SME2p2
      -  FTMOPA (non-widening, f16) - FEAT_SME2p2 & FEAT_SME_F16F16
      -  FTMOPA (non-widening, f32) - FEAT_SME2p2

- Add new ZPR_K register class and ZK register operand
- Introduce assembler extension tests for the new sme2p2 feature

In accordance with:
https://developer.arm.com/documentation/ddi0602/latest/
Co-authored-by: Marian Lukac marian.lukac@arm.com
2024-10-23 15:40:57 +01:00
Ricardo Jesus
8a9921f569
[AArch64] Use INDEX for constant Neon step vectors (#113424)
When compiling for an SVE target we can use INDEX to generate constant
fixed-length step vectors, e.g.:
```
uint32x4_t foo() {
  return (uint32x4_t){0, 1, 2, 3};
}
```
Currently:
```
foo():
        adrp    x8, .LCPI1_0
        ldr     q0, [x8, :lo12:.LCPI1_0]
        ret
```
With INDEX:
```
foo():
        index   z0.s, #0, #1
        ret
```

The logic for this was already in `LowerBUILD_VECTOR`, though it was
hidden under a check for `!Subtarget->isNeonAvailable()`. This patch
refactors this to enable the corresponding code path unconditionally for
constant step vectors (as long as we can use SVE for them).
2024-10-23 15:20:33 +01:00
Vladimir Radosavljevic
401d123a1f
[MCP] Optimize copies when src is used during backward propagation (#111130)
Before this patch, redundant COPY couldn't be removed for the following
case:
```
  $R0 = OP ...
  ... // Read of %R0
  $R1 = COPY killed $R0
```
This patch adds support for tracking the users of the source register
during backward propagation, so that we can remove the redundant COPY in
the above case and optimize it to:
```
  $R1 = OP ...
  ... // Replace all uses of %R0 with $R1
```
2024-10-23 13:37:02 +02:00
Paul Walker
5bb34803a4 [NFC] Migrate tests to use autoupdate for CHECK lines. 2024-10-22 12:55:15 +00:00
Paul Walker
f1ade1f874
[LLVM][CodeGen][AArch64] while_le(#,max_int) -> all_active (#111183)
When the second operand of an incrementing while instruction is the
maximum value, comparisons that include equality can never fail.
2024-10-22 12:28:39 +01:00
James Chesterman
11c818816d
[AArch64] Improve index selection for histograms (#111150)
Removes unnecessary extends on the indices passed into histogram instructions. It also removes the instruction when the mask is zero.
2024-10-22 11:14:00 +01:00
Florian Mayer
23b18fa01e
[MTE] Do not allow local aliases to MTE globals (#106280)
With this change and appropriate linker changes
(https://r.android.com/3236256)
AOSP boots with memtag-global throughout the platform.

Without this change, we would sometimes generate PC-relative references
to tagged globals, which then do not have the proper tag.
2024-10-21 17:00:41 -07:00
David Green
009fb567ce [AArch64] Add patterns for combining qxtn+rshr to qrshrn
Similar to bd861d0e690cfd05184d86, this adds some patterns for converting
signed and unsigned variants of rshr+qxtn to qrshrn.
2024-10-21 21:06:48 +01:00
Ellis Hoag
e6ada7162e
[regalloc][basic] Change spill weight for optsize funcs (#112960)
Change the spill weight calculations for `optsize` functions to remove
the block frequency multiplier. For those functions, we do not want to
consider the runtime cost of spilling, only the codesize cost.

I built a large app with the basic and greedy (default) register
allocator enabled.

| Regalloc Type | Uncompressed Size Delta | Compressed Size Delta |
| - | - | - |
| Basic | -303.8 KiB (-0.23%) | -232.0 KiB (-0.39%) |
| Greedy | 159.1 KiB (0.12%) | 130.1 KiB (0.22%) |

Since I only saw a size win with the basic register allocator, I decided
to only change the behavior for that type.
2024-10-21 11:10:50 -07:00
Spencer Abson
42ba452aa9 [NFC] Fix -WError for unused Encode/Decode ZK methods
Remove the unused functions and register classes from the change below
4679583181
2024-10-21 16:11:58 +00:00
SpencerAbson
4679583181
[LLVM][AArch64] Add register classes for Armv9.6 assembly (#111717)
Add new register classes/operands and their encoder/decoder behaviour
required for the new Armv9.6 instructions (see
https://developer.arm.com/documentation/109697/2024_09/Feature-descriptions/The-Armv9-6-architecture-extension).

This work is the basis ofthe 2024 Armv9.6 architecture update effort for
SME.

Co-authored-by: Caroline Concatto caroline.concatto@arm.com
Co-authored-by: Marian Lukac marian.lukac@arm.com
Co-authored-by: Momchil Velikov momchil.velikov@arm.com
2024-10-21 15:49:24 +01:00
David Green
bd861d0e69 [AArch64] Add some basic patterns for qshrn.
With the truncssat nodes these are relatively simple tablegen patterns to add.
The existing intrinsics are converted to shift+truncsat to they can lower using
the new patterns.

Fixes #112925.
2024-10-21 15:04:20 +01:00
Nikita Popov
e2074c60bb [AArch64] Use implicitTrunc in isBitfieldDstMask() (NFC)
This code intentionally discards the high bits, so set
implicitTrunc=true. This is currently NFC but will enable an
APInt assertion in the future.
2024-10-21 15:53:21 +02:00
Thorsten Schütt
10f6d01e3d
[GlobalISel][AArch64] Legalize G_EXTRACT_SUBVECTOR (#112946)
for future combines
2024-10-19 22:42:49 +02:00
Thorsten Schütt
d8b17f2fb6
[GlobalISel] Combine G_UNMERGE_VALUES with anyext and build vector (#112370)
G_UNMERGE_VALUES (G_ANYEXT (G_BUILD_VECTOR))

ag G_UNMERGE_VALUES llvm/test/CodeGen/AArch64/GlobalISel | grep ANYEXT 
[ANYEXT] is build vector or shuffle vector

Prior art:
https://reviews.llvm.org/D87117
https://reviews.llvm.org/D87166
https://reviews.llvm.org/D87174
https://reviews.llvm.org/D87427

; CHECK-NEXT: [[BUILD_VECTOR2:%[0-9]+]]:_(<8 x s8>) = G_BUILD_VECTOR 
[[C2]](s8), [[C2]](s8), [[C2]](s8), [[C2]](s8), [[DEF1]](s8),
[[DEF1]](s8), [[DEF1]](s8), [[DEF1]](s8)

; CHECK-NEXT: [[ANYEXT1:%[0-9]+]]:_(<8 x s16>) = G_ANYEXT
[[BUILD_VECTOR2]](<8 x s8>)

; CHECK-NEXT: [[UV10:%[0-9]+]]:_(<4 x s16>), [[UV11:%[0-9]+]]:_(<4 x
s16>) = G_UNMERGE_VALUES
              [[ANYEXT1]](<8 x s16>)

Test:
llvm/test/CodeGen/AArch64/GlobalISel/combine-unmerge.mir
2024-10-19 09:41:43 +02:00
David Green
7e87c2ae5d [AArch64] Add some qshrn test cases. NFC 2024-10-18 19:05:57 +01:00
Benjamin Maxwell
5f7502bf1f
[AArch64][SVE] Support lowering fixed-length BUILD_VECTORS to ZIPs (#111698)
This allows lowering fixed-length (non-constant) BUILD_VECTORS (<=
128-bit) to a chain of ZIP1 instructions when Neon is not available,
rather than using the default lowering, which is to spill to the stack
and reload.

For example,

```
t5: v4f32 = BUILD_VECTOR(t0, t1, t2, t3)
```

Becomes:

```
zip1 z0.s, z0.s, z1.s // z0 = t0,t1,...
zip1 z2.s, z2.s, z3.s // z2 = t2,t3,...
zip1 z0.d, z0.d, z2.d // z0 = t0,t1,t2,t3,...
```

When values are already in FRPs, this generally seems to lead to a more
compact output with less movement to/from the stack.
2024-10-18 10:19:22 +01:00
David Green
2f792f6e71
[AArch64][GlobalISel] Add some post-legalization cast combines. (#112509)
This helps clear up some of the legalization artefacts. Not all of the
cast_combines are added (notably select combines) as they currently have
questionable benefit in the test updates.
2024-10-18 09:57:25 +01:00
Daniil Kovalev
6bb63002fc
[PAC] Fix address discrimination for type info vtable pointers (#102199)
In #99726, `-fptrauth-type-info-vtable-pointer-discrimination` was
introduced, which is intended to enable type and address discrimination
for type_info vtable pointers. However, some codegen logic for actually
enabling address discrimination was missing. This patch addresses the
issue.

Fixes #101716
2024-10-18 08:58:26 +03:00
Alex Rønne Petersen
ad4a582fd9
[llvm] Consistently respect naked fn attribute in TargetFrameLowering::hasFP() (#106014)
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.

Note: I don't have commit access.
2024-10-18 09:35:42 +04:00
David Green
569ad7cf34 [AArch64][GlobalISel] Move UseOutlineAtomics to a bool check. NFC
Similar to #111287, this moves the UseOutlineAtomics legalization rules to a
boolean predicate as opposed to needing the be nested functions.

There appeared to be a pair of redundant customIfs for s128 sizes (assuming
only scalars are supported).
2024-10-16 19:26:57 +01:00
David Green
a3010c7791
[GlobalISel] Add boolean predicated legalization action methods. (#111287)
Under AArch64 it is common and will become more common to have operation
legalization rules dependant on a feature of the architecture. For
example HasFP16 or the newer CSSC integer min/max instructions, among
many others. With the current legalization rules this either means
adding a custom predicate based on the feature as in
`legalIf([=](const LegalityQuery &Query) { return HasFP16 && ...; }` or
splitting the legalization rules into pieces that place rules optionally
into them base on the features available.

This patch proposes an alternative where the existing routines like
legalFor(..) are provided a boolean predicate, which if false skips
adding the rule. It makes the rules cleaner and will hopefully allow
them to scale better as we add more features.

The SVE predicates for loads/stores I have changed to just be always
available. Scalable vectors without SVE have never been supported, but
it could also add a condition.
2024-10-16 14:43:28 +01:00
Sander de Smalen
11ed7f2d3c [AArch64] NFC: Regenerate aarch64-sve-asm.ll
It should use update_mir_test_checks.py instead of
update_llc_test_checks.py.
2024-10-16 12:38:55 +00:00
Lewis Crawford
f5f00764ab
[DAGCombiner] Fix check for extending loads (#112182)
Fix a check for extending loads in DAGCombiner,
where if the result type has more bits than the
loaded type it should count as an extending load.

All backends apart from AArch64 ignore this
ExtTy argument to shouldReduceLoadWidth, so this
change currently only impacts AArch64.
2024-10-16 13:23:46 +01:00
Benjamin Maxwell
4c28d21f6a
[AArch64] Avoid single-element vector fp converts in streaming[-compatible] functions (#112213)
The single-element vector variants of FCVTZS, FCVTZU, UCVTF, and SCVTF
are only supported in streaming[-compatible] functions with `+sme2p2`.

Reference:
-
https://developer.arm.com/documentation/ddi0602/2024-09/SIMD-FP-Instructions/FCVTZS--vector--integer---Floating-point-convert-to-signed-integer--rounding-toward-zero--vector--
-
https://developer.arm.com/documentation/ddi0602/2024-09/SIMD-FP-Instructions/UCVTF--vector--integer---Unsigned-integer-convert-to-floating-point--vector--
-
https://developer.arm.com/documentation/ddi0602/2024-09/SIMD-FP-Instructions/SCVTF--vector--integer---Signed-integer-convert-to-floating-point--vector--

Codegen will be improved in follow up patches.
2024-10-16 10:00:49 +01:00