355 Commits

Author SHA1 Message Date
Sander de Smalen
2ad1d77b17
[AArch64] Match constants in SelectSMETileSlice (#151494)
If the slice is a constant, selection should try to use the `WZR + <imm>`
addressing mode when the constant fits the range.
2025-08-11 10:19:26 +01:00
Paul Walker
20293ebd31
[LLVM][CodeGen][SME] Only emit strided loads in streaming mode. (#150445)
The selection code for aarch64_sve_ld[nt]1_pn_x{2,4} intrinsics gates
the use of strided load instructions behind the SME2 target feature.
However, the instructions are only available in streaming mode.
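As an illustration only — assuming the SME2/SVE2p1 ACLE names `svptrue_c8` and `svld1_s8_x2` from `arm_sme.h`, which may differ — the same C source is lowered to either consecutive or strided register forms; per this change the strided forms are only picked in streaming mode:
```c
// Hypothetical sketch: the intrinsic names below are assumed from the
// SME2/SVE2p1 ACLE and are not taken from this commit.
#include <arm_sme.h>

svint8x2_t load_pair(const int8_t *p) __arm_streaming {
  svcount_t pn = svptrue_c8();   // predicate-as-counter, all lanes true
  return svld1_s8_x2(pn, p);     // strided LD1B selectable only in streaming mode
}
```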
2025-07-30 11:41:46 +01:00
Peter Collingbourne
2197671109
AArch64: Relax x16/x17 constraint on AUT in certain cases.
On most operating systems, the x16 and x17 registers are not special,
so there is no benefit, and only a code size cost, to constraining AUT to
only using them. Therefore, adjust the backend to only use the AUT pseudo
(renamed AUTx16x17 for clarity) on Darwin platforms. All other platforms
use an unconstrained variant of the pseudo, AUTxMxN, for selection.

Reviewers: ahmedbougacha, kovdan01, atrosinenko

Reviewed By: atrosinenko

Pull Request: https://github.com/llvm/llvm-project/pull/132857
2025-07-09 13:46:44 -07:00
John Brawn
b53c1e4ee8
[AArch64] Add ISel for postindex ld1/st1 in big-endian (#144387)
When big-endian, we need to use ld1/st1 for vector loads and stores so
that we get the elements in the correct order, but this prevents
postindex addressing from being used. Fix this by adding the appropriate
ISel patterns, plus the relevant changes in ISelLowering and
ISelDAGToDAG to cause postindex addressing to be used.
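For context, a hedged sketch (plain NEON C, not code from the patch) of the kind of loop where post-index addressing pays off; on big-endian targets the vector accesses below must use ld1/st1, and with these patterns the pointer increment can be folded into the memory instruction:
```c
#include <arm_neon.h>

// Copy n groups of four ints; each vld1q/vst1q advances its pointer by
// 16 bytes, an increment that post-index addressing can fold into the
// load/store itself.
void copy4(int32_t *dst, const int32_t *src, int n) {
  for (int i = 0; i < n; ++i) {
    int32x4_t v = vld1q_s32(src);
    vst1q_s32(dst, v);
    src += 4;
    dst += 4;
  }
}
```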
2025-06-18 16:16:52 +01:00
Rajveer Singh Bharadwaj
95bbaca6c1
[AArch64] Extend usage of XAR instruction for fixed-length operations (#139460) 2025-06-12 10:54:01 +05:30
Jonathan Thackray
62cae4ffcb
[AArch64] Fix a multitude of AArch64 typos (NFC) (#143370)
Fix a multitude of typos in the AArch64 codebase using the
https://github.com/crate-ci/typos Rust package.
2025-06-09 22:13:22 +01:00
David Green
32837f376f
[AArch64] Handle XAR with v1i64 operand types (#141754)
When converting ROTR(XOR(a, b)) to XAR(a, b), or ROTR(a, a) to XAR(a, zero),
we were not handling v1i64 types, meaning illegal copies were generated. This
addresses that by generating insert_subreg and extract_subreg for v1i64 to
keep the values with the correct types.

Fixes #141746
2025-05-29 10:22:24 +01:00
Benjamin Maxwell
7a8090c037
[AArch64] Remove unused ISD nodes (NFC) (#140706)
Part of #140472.
2025-05-21 10:55:07 +01:00
Rajveer Singh Bharadwaj
36bb17aa65
[AArch64] Utilize XAR for certain vector rotates (#137629)
Resolves #137162

For cases when there isn't any `XOR` in the transformation, replace with
a zero register.
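By way of illustration (not code from the patch), a fixed-length vector rotate with no XOR feeding it; when XAR is available, rotates like this can now be selected as XAR with a zero register standing in for the missing XOR operand:
```c
#include <arm_neon.h>

// Rotate each 64-bit lane left by 17. There is no XOR in the expression,
// so the XAR form pairs the input with a zeroed second operand.
uint64x2_t rot17(uint64x2_t x) {
  return vorrq_u64(vshlq_n_u64(x, 17), vshrq_n_u64(x, 47));
}
```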
2025-05-09 10:02:31 +01:00
Kazu Hirata
c30776ab9a
[AArch64] Use ArrayRef::slice (NFC) (#133862) 2025-04-01 07:28:18 -07:00
Ricardo Jesus
74f5a028cb
Reapply "[AArch64][SVE] Improve fixed-length addressing modes. (#129732)" (#130625)
The original patch from #129732 exposed a bug in `getMemVTFromNode`, which was returning incorrect types for fixed-length vectors.
2025-03-19 08:25:37 +00:00
Ricardo Jesus
21610e3ecc
Revert "[AArch64][SVE] Improve fixed-length addressing modes." (#130263)
This reverts commit f01e760c08365426de95f02dc2c2dc670eb47352.
2025-03-07 09:35:55 +00:00
Ricardo Jesus
f01e760c08
[AArch64][SVE] Improve fixed-length addressing modes. (#129732)
When compiling VLS SVE, the compiler often replaces VL-based offsets
with immediate-based ones. This leads to a mismatch in the allowed
addressing modes due to SVE loads/stores generally expecting immediate
offsets relative to VL. For example, given:
```c
svfloat64_t foo(const double *x) {
  svbool_t pg = svptrue_b64();
  return svld1_f64(pg, x+svcntd());
}
```

When compiled with `-msve-vector-bits=128`, we currently generate:
```gas
foo:
        ptrue   p0.d
        mov     x8, #2
        ld1d    { z0.d }, p0/z, [x0, x8, lsl #3]
        ret
```

Instead, we could be generating:
```gas
foo:
        ldr     z0, [x0, #1, mul vl]
        ret
```

Likewise for other types, stores, and other VLS lengths.

This patch achieves the above by extending `SelectAddrModeIndexedSVE`
to let constants through when `vscale` is known.
2025-03-06 09:27:07 +00:00
David Tellenbach
0fe0968c93
[AArch64][FEAT_CMPBR] Codegen for Armv9.6-a compare-and-branch (#116465)
This patch adds codegen for all Armv9.6-A compare-and-branch
instructions that operate on full w or x registers. The instruction
variants operating on half-words (cbh) and bytes (cbb) are added in a
subsequent patch.

Since CB doesn't use standard 4-bit Arm condition codes but a reduced
set of conditions, encoded in 3 bits, some conditions are expressed by
modifying operands, namely incrementing or decrementing immediate
operands and swapping register operands. To invert a CB instruction it is
therefore not enough to just modify the condition code, which doesn't
play particularly well with how the backend is currently organized. We
therefore introduce a number of pseudos which operate on the standard
4-bit condition codes and lower them late during codegen.
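As a hedged example of the source patterns involved (assumptions: plain C on a target with FEAT_CMPBR; names below are illustrative), a compare-and-branch on a full register, where inverting the branch may mean adjusting the immediate rather than just flipping a condition:
```c
extern void slow_path(void);  // placeholder callee for the example

// A compare-and-branch candidate: compare a register against a small
// immediate and branch. With FEAT_CMPBR this can lower to a single
// CB-style instruction instead of CMP followed by B.cond.
void check(long x) {
  if (x >= 5)      // the inverted test may be expressed as "x > 4",
    slow_path();   // i.e. by adjusting the immediate, not the condition
}
```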
2025-02-19 13:58:20 -08:00
Csanád Hajdú
4a00c84fbb
[AArch64] Allow register offset addressing mode for prefetch (#124534)
Previously instruction selection failed to generate PRFM instructions
with register offsets because `AArch64ISD::PREFETCH` is not a
`MemSDNode`.
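For illustration (using nothing beyond the standard `__builtin_prefetch`), a prefetch whose address is a base plus a scaled index; with this change the register-offset PRFM form can be selected instead of materialising the address separately:
```c
// Prefetch a[i] ahead of use; the address is base + (i << 3), a natural
// fit for PRFM's register-offset addressing mode.
void warm(const double *a, long i) {
  __builtin_prefetch(&a[i]);
}
```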
2025-01-28 09:16:40 +00:00
Momchil Velikov
b2073fb9b9
[AArch64] Prefer SVE2.2 zeroing forms of certain instructions with an all-true predicate (#120595)
When the predicate of a destructive operation is known to be all-true,
for example

    fabs z0.s, p0/m, z1.s

then the entire output register is written and we can use a zeroing
(instead of a merging) form of the instruction, for example

    fabs z0.s, p0/z, z1.s

thus eliminating the dependency on the input-output destination register
without the need to insert a `movprfx` (see the C sketch after the list of
related commits below).

This patch complements (and in the case of
2b3266c170,
fixes a regression) the following:

7f4414b2a1
[AArch64] Generate zeroing forms of certain SVE2.2 instructions (4/11)
(https://github.com/llvm/llvm-project/pull/116830)

2474cf7ad1
[AArch64] Generate zeroing forms of certain SVE2.2 instructions (3/11)
(https://github.com/llvm/llvm-project/pull/116829)

6f285d3115
[AArch64] Generate zeroing forms of certain SVE2.2 instructions (2/11)
(https://github.com/llvm/llvm-project/pull/116828)

2b3266c170
[AArch64] Generate zeroing forms of certain SVE2.2 instructions (1/11)
(https://github.com/llvm/llvm-project/pull/116259)
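A hedged C-level illustration (assuming the standard ACLE `svabs[_f32]_x` and `svptrue_b32` names, not code from this patch): with an all-true predicate and don't-care semantics, the compiler is free to pick the zeroing form and drop the `movprfx`:
```c
#include <arm_sve.h>

// The "_x" (don't-care) form with an all-true predicate lets the backend
// choose the zeroing FABS, avoiding a dependency on the destination
// register and any movprfx.
svfloat32_t abs_all(svfloat32_t x) {
  return svabs_f32_x(svptrue_b32(), x);
}
```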
2024-12-24 10:18:48 +00:00
Craig Topper
104ad9258a
[SelectionDAG] Rename SDNode::uses() to users(). (#120499)
This function is most often used in range based loops or algorithms
where the iterator is implicitly dereferenced. The dereference returns
an SDNode * of the user rather than SDUse * so users() is a better name.

I've long been annoyed that we can't write a range-based loop over
SDUse when we need getOperandNo. I plan to rename use_iterator to
user_iterator and add a use_iterator that returns SDUse& on dereference.
This will make it more like IR.
2024-12-18 20:09:33 -08:00
SpencerAbson
b0f06769e6
[AArch64] Implement intrinsics for SME FP8 F1CVT/F2CVT and BF1CVT/BF2CVT (#118027)
This patch implements the following intrinsics:

8-bit floating-point convert to half-precision or BFloat16 (in-order).
``` c
  // Variant is also available for: _bf16[_mf8]_x2
  svfloat16x2_t svcvt1_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
  svfloat16x2_t svcvt2_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
```

In accordance with https://github.com/ARM-software/acle/pull/323.

Co-authored-by: Marin Lukac marian.lukac@arm.com
Co-authored-by: Caroline Concatto caroline.concatto@arm.com
2024-12-08 19:34:01 +00:00
SpencerAbson
e4ee970c4b
[AArch64] Implement intrinsics for F1CVTL/F2CVTL and BF1CVTL/BF2CVTL (#116959)
This patch implements the following intrinsics:

8-bit floating-point convert to deinterleaved half-precision or
BFloat16.
``` c
  // Variant is also available for: _bf16[_mf8]_x2
  svfloat16x2_t svcvtl1_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
  svfloat16x2_t svcvtl2_f16[_mf8]_x2_fpm(svmfloat8_t zn, fpm_t fpm) __arm_streaming;
```

Defined in https://github.com/ARM-software/acle/pull/323

Co-authored-by: Caroline Concatto caroline.concatto@arm.com
Co-authored-by: Marian Lukac marian.lukac@arm.com
2024-11-28 12:37:02 +00:00
Simon Pilgrim
d800ea7cb1
Adjust MSVC disabled optimization pragmas to be _MSC_VER only (#116704)
Alter the #ifdef values from #110986 and #115292 to use _MSC_VER instead of _WIN32 to stop the pragmas being used on gcc/mingw builds

Noticed by @mstorsjo
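A minimal sketch of the guard change described above (illustrative only; the actual pragmas in the file may differ): keying on `_MSC_VER` rather than `_WIN32` keeps MSVC-specific pragmas away from gcc/mingw builds, which also define `_WIN32` but do not understand these pragmas:
```c
// Guard MSVC-only pragmas with _MSC_VER instead of _WIN32 so that
// gcc/mingw builds (which also define _WIN32) never see them.
#if defined(_MSC_VER) && !defined(__clang__)
#pragma optimize("", off)   /* placeholder for the MSVC-specific pragma */
#endif

/* ... affected code ... */

#if defined(_MSC_VER) && !defined(__clang__)
#pragma optimize("", on)
#endif
```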
2024-11-21 13:33:13 +00:00
Craig Topper
ce0cc8e9eb [AArch64][VE][X86] Use getSignedTargetConstant. NFC 2024-11-18 13:12:23 -08:00
Kazu Hirata
4048c64306
[llvm] Remove redundant control flow statements (NFC) (#115831)
Identified with readability-redundant-control-flow.
2024-11-12 10:09:42 -08:00
Simon Pilgrim
9123dc6abf
[AArch64] AArch64ISelDAGToDAG.cpp - disable inlining on MSVC release builds (#115292)
Similar to #110986 - disabling inlining on MSVC release builds avoids an excessive build time issue affecting all recent versions of CL.EXE

Fixes #114425
2024-11-07 12:35:58 +00:00
Sander de Smalen
3098200fcc
[ISel] Propagate disjoint flag in ShrinkDemandedOp (#114560)
When trying to evaluate an expression in a narrower type, the
DAGCombine should propagate the disjoint flag, as it's equally
valid on the narrower expression.

This helps improve better use of addressing modes for some
Arm SME instructions, for example.
2024-11-03 19:42:04 +00:00
Nikita Popov
e2074c60bb [AArch64] Use implicitTrunc in isBitfieldDstMask() (NFC)
This code intentionally discards the high bits, so set
implicitTrunc=true. This is currently NFC but will enable an
APInt assertion in the future.
2024-10-21 15:53:21 +02:00
CarolineConcatto
c0e97c4dfc
[Clang][LLVM][AArch64] Add intrinsic for LUTI4 SME2 instruction (#97755) (#109953)
This patch was reverted because of a failing C test.
The failure has now been resolved, so the patch can be merged into main again.

This patch adds these intrinsics:

// Variants are also available for: _s8
svuint8x4_t svluti4_zt_u8_x4(uint64_t zt0, svuint8x2_t zn)
__arm_streaming __arm_in("zt0");

according to PR#324[1]
[1]ARM-software/acle#324
2024-09-30 12:59:06 +01:00
Lukacma
02c138f8d1
[AArch64] Implement intrinsics for SME2 FSCALE (#100128)
This patch implements these intrinsics:

FSCALE SINGLE AND MULTI
``` 
  // Variants are also available for:
  // [_single_f32_x2], [_single_f64_x2],
  // [_single_f16_x4], [_single_f32_x4], [_single_f64_x4]
  svfloat16x2_t svscale[_single_f16_x2](svfloat16x2_t zd, svfloat16_t zm) __arm_streaming;

  // Variants are also available for:
  //  [_f32_x2], [_f64_x2],
  //  [_f16_x4], [_f32_x4], [_f64_x4]
  svfloat16x2_t svscale[_f16_x2](svfloat16x2_t zd, svfloat16x2_t zm) __arm_streaming

```
(cf. https://github.com/ARM-software/acle/pull/323)

Co-authored-by: Caroline Concatto <caroline.concatto@arm.com>
2024-09-25 14:34:00 +01:00
Caroline Concatto
02f46d7fb8 Revert "[Clang][LLVM][AArch64] Add intrinsic for LUTI4 SME2 instruction (#97755)"
Reverting to fix a failing test in clang.

This reverts commit 445d8b2d10b2bb9a5f50e3fe0671045acd309a04.
2024-09-25 09:25:28 +00:00
CarolineConcatto
445d8b2d10
[Clang][LLVM][AArch64] Add intrinsic for LUTI4 SME2 instruction (#97755)
This patch adds these intrinsics:

// Variants are also available for: _s8
svuint8x4_t svluti4_zt_u8_x4(uint64_t zt0, svuint8x2_t zn)
__arm_streaming __arm_in("zt0");

according to PR#324[1]
[1]ARM-software/acle#324
2024-09-25 09:53:23 +01:00
Momchil Velikov
f7fa75b208
[AArch64] Implement intrinsics for SME2 FAMIN/FAMAX (#99063)
This patch implements these intrinsics:

``` c
  // Variants are also available for:
  //  [_f32_x2], [_f64_x2],
  //  [_f16_x4], [_f32_x4], [_f64_x4]
  svfloat16x2_t svamax[_f16_x2](svfloat16x2_t zd, svfloat16x2_t zm) __arm_streaming;
  svfloat16x2_t svamin[_f16_x2](svfloat16x2_t zd, svfloat16x2_t zm) __arm_streaming;
```

(cf. https://github.com/ARM-software/acle/pull/324)

Co-authored-by: Caroline Concatto <caroline.concatto@arm.com>
2024-09-04 15:29:32 +01:00
Craig Topper
69115cce29 [AArch64] Use SelectionDAG::getSignedConstant/getAllOnesConstant. 2024-08-17 18:02:49 -07:00
Kazu Hirata
fcf6dc3365
[AArch64] Construct SmallVector<SDValue> with ArrayRef (NFC) (#102713) 2024-08-09 21:40:59 -07:00
Paul Walker
1fbd7be58f
[LLVM][ISel][SVE] Remove redundant merging fp patterns. (#101351)
Since "vselect cond, (binop, x, y), x" became the canonical form the
equivalent PatFrags for "binop x, (vselect cond, y, 0)" are no longer
required.
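For context, a hedged example of where this node shape arises (assuming the standard ACLE `svadd[_f32]_m` intrinsic): a merging binop is modelled in the DAG as `vselect cond, (binop x, y), x`, which is now the canonical form the patterns match:
```c
#include <arm_sve.h>

// A merging add: inactive lanes keep the value of x. In the DAG this is
// the canonical "vselect pg, (fadd x, y), x" shape referred to above.
svfloat32_t add_merge(svbool_t pg, svfloat32_t x, svfloat32_t y) {
  return svadd_f32_m(pg, x, y);
}
```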
2024-08-01 13:59:38 +01:00
Ahmed Bougacha
d7e8a7487c
[AArch64][PAC] Lower auth/resign into checked sequence. (#79024)
This introduces 3 hardening modes in the authentication step of
auth/resign lowering:
- unchecked, which uses the AUT instructions as-is
- poison, which detects authentication failure (using an XPAC+CMP
  sequence), explicitly yielding the XPAC result rather than the
  AUT result, to avoid leaking
- trap, which additionally traps on authentication failure,
  using BRK #0xC470 + key (IA C470, IB C471, DA C472, DB C473.)

Not all modes are necessarily useful in all contexts, and there
are more performant alternative lowerings in specific contexts
(e.g., when I/D TBI enablement is a target ABI guarantee.)
These will be implemented separately.

This is controlled by the `ptrauth-auth-traps` function attribute,
and can be overridden using `-aarch64-ptrauth-auth-checks=`.

This also adds the FPAC extension, which we haven't needed
before, to improve isel when we can rely on HW checking.
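A hedged source-level example (assuming Clang's `<ptrauth.h>` header, the `ptrauth_key_asia` constant, and the `__builtin_ptrauth_auth` builtin — none of which are taken from this commit): the authentication below is what gets lowered through the AUT pseudo, with the unchecked/poison/trap behaviour chosen per the attribute or override above:
```c
#include <ptrauth.h>

// Authenticate a signed code pointer with the IA key and a zero
// discriminator; how authentication failure is handled (unchecked,
// poison, or trap) is decided by the lowering described above.
void *authenticate(void *signed_ptr) {
  return __builtin_ptrauth_auth(signed_ptr, ptrauth_key_asia, 0);
}
```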
2024-07-22 21:28:01 -07:00
John Brawn
3a14ffbae3
[AArch64] Implement GCS ACLE intrinsics (#96903)
This adds intrinsics defined in ARM-software/acle#260

Doing this requires some changes to the GCS instruction definitions, as
these intrinsics make use of how some instructions don't modify the
input register when GCS is disabled, and they need to be correctly
marked with mayLoad/mayStore/hasSideEffects for instruction selection to
work.
2024-07-11 14:09:36 +01:00
CarolineConcatto
6859e5a169
[CLANG][LLVM][AArch64]Add SME2.1 intrinsics for MOVAZ array to vector (#88901)
According to the specification in
    ARM-software/acle#309 this adds the intrinsics

    Move and zero multiple ZA single-vector groups to vector registers

    // Variants are also available for _za8_u8, _za16_s16, _za16_u16,
    // _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
    // _za64_s64, _za64_u64 and _za64_f64
    svint8x2_t svreadz_za8_s8_vg1x2(uint32_t slice)
    __arm_streaming __arm_inout("za");

    // Variants are also available for _za8_u8, _za16_s16, _za16_u16,
    // _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
    // _za64_s64, _za64_u64 and _za64_f64
    svint8x4_t svreadz_za8_s8_vg1x4(uint32_t slice)
    __arm_streaming __arm_inout("za");
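A usage sketch based on the declarations above (the streaming/ZA attributes follow the ACLE; the body is illustrative only, and `arm_sme.h` is the assumed header):
```c
#include <arm_sme.h>

// Move-and-zero two ZA single-vector groups at the given slice into a
// vector tuple; ZA state is both read and updated, hence __arm_inout("za").
svint8x2_t take_rows(uint32_t slice) __arm_streaming __arm_inout("za") {
  return svreadz_za8_s8_vg1x2(slice);
}
```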
2024-07-01 08:23:16 +01:00
CarolineConcatto
c9fc960650
[CLANG][LLVM][AArch64]SME2.1 intrinsics for MOVAZ tile to 2/4 vectors (#88710)
According to the specification in
ARM-software/acle#309 this adds the intrinsics

// Variants are also available for _za8_u8, _za16_s16, _za16_u16,
// _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
// _za64_s64, _za64_u64 and _za64_f64
svint8x2_t svreadz_hor_za8_s8_vg2(uint64_t tile, uint32_t slice)
__arm_streaming __arm_inout("za");

// Variants are also available for _za8_u8, _za16_s16, _za16_u16,
// _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
// _za64_s64, _za64_u64 and _za64_f64
svint8x4_t svreadz_hor_za8_s8_vg4(uint64_t tile, uint32_t slice)
__arm_streaming __arm_inout("za");

// Variants are also available for _za8_u8, _za16_s16, _za16_u16,
// _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
// _za64_s64, _za64_u64 and _za64_f64
svint8x2_t svreadz_ver_za8_s8_vg2(uint64_t tile, uint32_t slice)
__arm_streaming __arm_inout("za");

// Variants are also available for _za8_u8, _za16_s16, _za16_u16,
// _za16_f16, _za16_bf16, _za32_s32, _za32_u32, _za32_f32,
// _za64_s64, _za64_u64 and _za64_f64
svint8x4_t svreadz_ver_za8_s8_vg4(uint64_t tile, uint32_t slice)
__arm_streaming __arm_inout("za");
2024-06-26 12:54:37 +01:00
Sander de Smalen
c436649313
[AArch64] Remove all instances of the 'hasSVEorSME' interfaces. (#96543)
I've not added any new tests for these, because the original conditions
were wrong (they did not consider streaming mode) and we have tests for
the positive cases.
2024-06-25 13:27:06 +01:00
paperchalice
7652a59407
Reland "[NewPM][CodeGen] Port selection dag isel to new pass manager" (#94149)
- Fix build with `EXPENSIVE_CHECKS`
- Remove unused `PassName::ID` to resolve warning
- Mark `~SelectionDAGISel` virtual so AArch64 backend can work properly
2024-06-04 08:10:58 +08:00
paperchalice
8917afaf0e
Revert "[NewPM][CodeGen] Port selection dag isel to new pass manager" (#94146)
This reverts commit de37c06f01772e02465ccc9f538894c76d89a7a1.

It still breaks EXPENSIVE_CHECKS build. Sorry.
2024-06-02 14:31:52 +08:00
paperchalice
d2cdc8ab45
[NewPM][CodeGen] Port selection dag isel to new pass manager (#83567)
Port selection dag isel to new pass manager.
Only `AMDGPU` and `X86` support the new pass version. In the new pass manager, `-verify-machineinstrs` belongs to the verify instrumentation and is enabled by default.
2024-06-02 09:12:33 +08:00
Lukacma
e93799f260
[SME] Add intrinsics for FCVT(wid.) and FCVTL (#93202)
According to the specification in
https://github.com/ARM-software/acle/pull/309 this adds the intrinsics
```
svfloat32x2_t svcvt_f32[_f16_x2](svfloat16_t zn) __arm_streaming;
svfloat32x2_t svcvtl_f32[_f16_x2](svfloat16_t zn) __arm_streaming;

```
These are available only if __ARM_FEATURE_SME_F16F16 is enabled.

---------

Co-authored-by: Caroline Concatto <caroline.concatto@arm.com>
2024-05-29 11:34:24 +01:00
Lukacma
78cc9cbba2
[AArch64][SME] Add intrinsics for multi-vector BFCLAMP (#93532)
According to the specification in
https://github.com/ARM-software/acle/pull/309 this adds the intrinsics

```
  svbfloat16x2_t svclamp[_single_bf16_x2](svbfloat16x2_t zd, svbfloat16_t zn,
                                        svbfloat16_t zm)  __arm_streaming;

  svbfloat16x4_t svclamp[_single_bf16_x4](svbfloat16x4_t zd, svbfloat16_t zn,
                                        svbfloat16_t zm)  __arm_streaming;
```
These are available only  if __ARM_FEATURE_SME_B16B16 is enabled.
2024-05-29 10:44:58 +01:00
Lukacma
d67200e084
Revert "[AArch64][SME] Add intrinsics for multi-vector BFCLAMP" (#93531)
Reverts llvm/llvm-project#88251
2024-05-28 12:07:49 +01:00
Lukacma
12271710da
[AArch64][SME] Add intrinsics for multi-vector BFCLAMP (#88251)
According to the specification in
https://github.com/ARM-software/acle/pull/309 this adds the intrinsics

```
  svbfloat16x2_t svclamp[_single_bf16_x2](svbfloat16x2_t zd, svbfloat16_t zn,
                                        svbfloat16_t zm)  __arm_streaming;

  svbfloat16x4_t svclamp[_single_bf16_x4](svbfloat16x4_t zd, svbfloat16_t zn,
                                        svbfloat16_t zm)  __arm_streaming;
```
These are available only  if __ARM_FEATURE_SME_B16B16 is enabled.
2024-05-28 11:34:03 +01:00
Lukacma
90a469057e
Revert "[SME] Add intrinsics for FCVT(wid.) and FCVTL" (#93196)
Reverts llvm/llvm-project#90215
2024-05-23 15:13:14 +01:00
Lukacma
05c154f2bc
[SME] Add intrinsics for FCVT(wid.) and FCVTL (#90215)
According to the specification in
https://github.com/ARM-software/acle/pull/309 this adds the intrinsics

```
svfloat32x2_t svcvt_f32[_f16_x2](svfloat16_t zn) __arm_streaming;
svfloat32x2_t svcvtl_f32[_f16_x2](svfloat16_t zn) __arm_streaming;

```
These are available only  if  __ARM_FEATURE_SME_F16F16 is enabled.

---------

Co-authored-by: Caroline Concatto <caroline.concatto@arm.com>
2024-05-23 14:32:34 +01:00
Hassnaa Hamdi
f7392f40f3
[AArch64] Add intrinsics for bfloat16 min/max/minnm/maxnm (#90105)
According to specifications in
[ARM-software/acle/pull/309](https://github.com/ARM-software/acle/pull/309)
Add following intrinsics:

```
// svmax single,multi
svbfloat16x2_t svmax_single_bf16_x2(svbfloat16x2_t zdn, svbfloat16_t zm)
svbfloat16x4_t svmax_single_bf16_x4(svbfloat16x4_t zdn, svbfloat16_t zm)
svbfloat16x2_t svmax_bf16_x2(svbfloat16x2_t zdn, svbfloat16x2_t zm)
svbfloat16x4_t svmax_bf16_x4(svbfloat16x4_t zdn, svbfloat16x4_t zm)
```

```
// svmin single,multi
svbfloat16x2_t svmin_single_bf16_x2(svbfloat16x2_t zdn, svbfloat16_t zm)
svbfloat16x4_t svmin_single_bf16_x4(svbfloat16x4_t zdn, svbfloat16_t zm)
svbfloat16x2_t svmin_bf16_x2(svbfloat16x2_t zdn, svbfloat16x2_t zm)
svbfloat16x4_t svmin_bf16_x4(svbfloat16x4_t zdn, svbfloat16x4_t zm)
```

```
// svmaxnm single,multi
svbfloat16x2_t svmaxnm_single_bf16_x2(svbfloat16x2_t zdn, svbfloat16_t zm)
svbfloat16x4_t svmaxnm_single_bf16_x4(svbfloat16x4_t zdn, svbfloat16_t zm)
svbfloat16x2_t svmaxnm_bf16_x2(svbfloat16x2_t zdn, svbfloat16x2_t zm)
svbfloat16x4_t svmaxnm_bf16_x4(svbfloat16x4_t zdn, svbfloat16x4_t zm)
```

```
// svminnm single,multi
svbfloat16x2_t svminnm_single_bf16_x2(svbfloat16x2_t zdn, svbfloat16_t zm)
svbfloat16x4_t svminnm_single_bf16_x4(svbfloat16x4_t zdn, svbfloat16_t zm)
svbfloat16x2_t svminnm_bf16_x2(svbfloat16x2_t zdn, svbfloat16x2_t zm)
svbfloat16x4_t svminnm_bf16_x4(svbfloat16x4_t zdn, svbfloat16x4_t zm)
```
- Variations other than bfloat16 are already supported.
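A usage sketch of one of the new variants listed above (illustrative only; the `__arm_streaming` attribute and `arm_sme.h` header are assumptions, and the relevant bf16 features must be enabled):
```c
#include <arm_sme.h>

// Lane-wise maximum of a two-vector bf16 group against a single vector.
svbfloat16x2_t max2(svbfloat16x2_t zdn, svbfloat16_t zm) __arm_streaming {
  return svmax_single_bf16_x2(zdn, zm);
}
```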
2024-05-16 13:56:02 +01:00
Paul Walker
d48ebac038
[LLVM][SVE][CodeGen] Fix incorrect isel for signed saturating instructions. (#88136)
The immediate forms of SQADD and SQSUB interpret their immediate
operand as unsigned and thus effectively only support positive
immediate operands.
    
The original code is only wrong for the i8 cases because they
previously accepted all values; however, the new patterns enable more
uses, so I've added tests for the larger element types as well.
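To illustrate the distinction (assuming the standard ACLE `svqadd[_n_s8]` and `svqsub[_n_s8]` intrinsics, not code from this patch): the immediate forms of SQADD/SQSUB encode only unsigned immediates, so positive constants can use them directly:
```c
#include <arm_sve.h>

// Saturating add/sub with small positive constants; both may lower to
// the immediate forms of SQADD/SQSUB, whose immediates are unsigned.
svint8_t sat_adjust(svint8_t x) {
  svint8_t up   = svqadd_n_s8(x, 5);
  svint8_t down = svqsub_n_s8(up, 3);
  return down;
}
```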
2024-04-15 12:24:42 +01:00
Eli Friedman
c83f23d6ab
[AArch64] Fix heuristics for folding "lsl" into load/store ops. (#86894)
The existing heuristics were assuming that every core behaves like an
Apple A7, where any extend/shift costs an extra micro-op... but in
reality, nothing else behaves like that.

On some older Cortex designs, shifts by 1 or 4 cost extra, but all other
shifts/extensions are free. On all other cores, as far as I can tell,
all shifts/extensions for integer loads are free (i.e. the same cost as
an unshifted load).

To reflect this, this patch:

- Enables aggressive folding of shifts into loads by default.

- Removes the old AddrLSLFast feature, since it applies to everything
except A7 (and even if you are explicitly targeting A7, we want to
assume extensions are free because the code will almost always run on a
newer core).

- Adds a new feature AddrLSLSlow14 that applies specifically to the
Cortex cores where shifts by 1 or 4 cost extra.

I didn't add support for AddrLSLSlow14 on the GlobalISel side because it
would require a bunch of refactoring to work correctly. Someone can pick
this up as a followup.
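A hedged illustration of the folding in question (plain C; the exact code generated depends on the core and flags): an indexed load whose index must be scaled by the element size:
```c
#include <stdint.h>

// The address is p + (i << 3). With aggressive folding this can become a
// single "ldr x0, [x0, x1, lsl #3]" rather than a separate shift/add,
// which this patch now treats as free on all cores except the listed
// Cortex cases for shifts of 1 or 4.
int64_t get(const int64_t *p, int64_t i) {
  return p[i];
}
```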
2024-04-04 11:25:44 -07:00