9481 Commits

Author SHA1 Message Date
Ricardo Jesus
565f707beb
[AArch64] Allow splitting bitmasks for EOR/ORR. (#150394)
This patch extends #149095 for EOR and ORR.

It uses a simple partition scheme to try to find two suitable disjoint
bitmasks that can be used with EOR/ORR to reconstruct the original mask.
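A minimal C++ sketch of the idea. It assumes a deliberately simple partition: peel off the lowest run of ones and test whether the remainder is encodable. This is not necessarily the scheme the patch uses, and `isLogicalImm64`/`splitIntoTwoLogicalImms` are hypothetical names. Because the two masks are disjoint, `x | Imm == (x | A) | B` and `x ^ Imm == (x ^ A) ^ B`, so either instruction can be emitted twice instead of materializing the constant.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// True if V is an AArch64 64-bit logical immediate: a 2/4/8/16/32/64-bit
// element, replicated across 64 bits, whose set bits form a single
// (possibly rotated) run of ones. All-zeros and all-ones are not encodable.
bool isLogicalImm64(uint64_t V) {
  if (V == 0 || V == ~uint64_t(0))
    return false;
  for (unsigned Size = 2; Size <= 64; Size *= 2) {
    uint64_t Mask = Size == 64 ? ~uint64_t(0) : (uint64_t(1) << Size) - 1;
    uint64_t Elt = V & Mask;
    // The element must repeat across the whole 64-bit value.
    bool Replicated = true;
    for (unsigned I = Size; I < 64; I += Size)
      if (((V >> I) & Mask) != Elt) {
        Replicated = false;
        break;
      }
    if (!Replicated)
      continue;
    // Rotate right by one within Size bits; a bit set in Elt but clear in
    // the rotation marks the top end of a run of ones.
    uint64_t RotR = ((Elt >> 1) | (Elt << (Size - 1))) & Mask;
    if (std::bitset<64>(Elt & ~RotR).count() == 1)
      return true;
  }
  return false;
}

// Hypothetical partition: the lowest run of ones becomes A, the rest B.
std::optional<std::pair<uint64_t, uint64_t>>
splitIntoTwoLogicalImms(uint64_t Imm) {
  if (Imm == 0 || isLogicalImm64(Imm))
    return std::nullopt;                 // nothing to split
  uint64_t Lowest = Imm & (0 - Imm);     // lowest set bit
  uint64_t Run = Imm & ~(Imm + Lowest);  // lowest contiguous run of ones
  uint64_t Rest = Imm & ~Run;            // disjoint remainder
  if (isLogicalImm64(Run) && isLogicalImm64(Rest))
    return std::make_pair(Run, Rest);
  return std::nullopt;
}
```

For example, `0x0000FF000000000F` is not encodable on its own, but it splits into `0xF` and `0x0000FF0000000000`, each of which is.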

Fixes: #148987.
2025-08-07 10:42:30 +01:00
Ties Stuij
b9e133d5b6
[AArch64][SVE] Use FeatureUseFixedOverScalableIfEqualCost for A320 (#152156)
For the new A320 in-order core, we enable the
FeatureUseFixedOverScalableIfEqualCost feature, as was already done for A510
and A520 (#132246). This brings the same code-generation benefit of
preferring fixed-width over scalable vectors when their costs are equal.

So when we have:
```
void foo(float* a, float* b, float* dst, unsigned n) {
    for (unsigned i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

When compiling without the feature enabled, we get:
```
...
    ld1b    { z0.b }, p0/z, [x0, x10]
    ld1b    { z2.b }, p0/z, [x1, x10]
    add     x12, x0, x10
    ldr     z1, [x12, #1, mul vl]
    add     x12, x1, x10
    ldr     z3, [x12, #1, mul vl]
    fadd    z0.s, z2.s, z0.s
    add     x12, x2, x10
    fadd    z1.s, z3.s, z1.s
    dech    x11
    st1b    { z0.b }, p0, [x2, x10]
    incb    x10, all, mul #2
    str     z1, [x12, #1, mul vl]
...
```

When compiling with the feature enabled, we get:
```
...
    ldp     q0, q1, [x12, #-16]
    ldp     q2, q3, [x11, #-16]
    subs    x13, x13, #8
    fadd    v0.4s, v2.4s, v0.4s
    fadd    v1.4s, v3.4s, v1.4s
    add     x11, x11, #32
    add     x12, x12, #32
    stp     q0, q1, [x10, #-16]
    add     x10, x10, #32
...
```
2025-08-07 09:48:09 +01:00
Nikita Popov
406d9b1dd6
[CodeGen] Move IsFixed into ArgFlags (NFCI) (#152319)
The information whether a specific argument is vararg or fixed is
currently stored separately from all the other argument information in
ArgFlags. This means that it is not accessible from CCAssign, and
backends have developed various workarounds to access it anyway.

Move this information to ArgFlags to make it directly available in all
relevant places.

I've opted to invert this and store it as IsVarArg, as I think that both
makes the meaning more obvious and provides for a better default (which
is IsVarArg=false).
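As a hedged illustration of why the inverted flag gives a better default, here is a simplified, hypothetical flags type (not LLVM's actual argument-flags class) and a CCAssign-style hook that reads it directly. The "variadic arguments go on the stack" rule is only an example of the kind of decision a calling convention can make once the flag travels with the other argument flags.

```cpp
#include <cassert>

// Hypothetical, simplified per-argument flags. Storing "is variadic" rather
// than "is fixed" means a default-constructed object describes a fixed
// argument, which is the common case.
struct ArgFlagsSketch {
  bool IsSExt = false;
  bool IsZExt = false;
  bool IsVarArg = false; // default: fixed (non-variadic) argument

  bool isVarArg() const { return IsVarArg; }
  void setVarArg() { IsVarArg = true; }
};

enum class Loc { Register, Stack };

// Hypothetical CCAssign-style hook: it only sees the flags, yet can still
// treat variadic arguments differently without any side channel.
Loc assignLocation(const ArgFlagsSketch &Flags) {
  return Flags.isVarArg() ? Loc::Stack : Loc::Register;
}
```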
2025-08-07 09:12:40 +02:00
David Green
0491d8bda7
[AArch64] Treat single-vector ext as legal shuffle masks. (#151909)
We can generate ext from shuffles like <2, 3, 0, 1> from a single vector
source. Add handling to isShuffleMaskLegal to allow DAG combines to
optimize to it.
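`EXT Vd, Vn, Vn, #imm` with the same source register twice rotates the vector, so a single-source mask qualifies when every defined lane `i` reads element `(Imm + i) % N`. A minimal sketch of that check, assuming negative mask entries mean undef; `matchSingleSourceExt` is a hypothetical helper that returns the rotation in elements (the instruction's immediate is in bytes, so it would be scaled by the element size).

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Returns the rotation amount (in elements) if Mask reads a single source
// vector as Mask[i] == (Imm + i) % N for every defined lane.
std::optional<unsigned> matchSingleSourceExt(const std::vector<int> &Mask) {
  const int N = static_cast<int>(Mask.size());
  std::optional<int> Imm;
  for (int I = 0; I < N; ++I) {
    if (Mask[I] < 0)
      continue;                // undef lane matches anything
    if (Mask[I] >= N)
      return std::nullopt;     // refers to a second source vector
    int Candidate = ((Mask[I] - I) % N + N) % N;
    if (Imm && *Imm != Candidate)
      return std::nullopt;     // lanes disagree on the rotation
    Imm = Candidate;
  }
  if (!Imm)
    return std::nullopt;       // all lanes undef
  return static_cast<unsigned>(*Imm);
}
```

With this, `<2, 3, 0, 1>` yields a rotation of 2, while a reversal such as `<3, 2, 1, 0>` is rejected.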
2025-08-06 20:54:11 +01:00
Jonathan Thackray
c3d24217bf
[AArch64][llvm] Fix disassembly of ldt{add,set,clr} instructions using xzr/wzr (#152292)
The current disassembly of `ldt{add,set,clr}` instructions when using
`xzr/wzr` is incorrect. The Armv9.6-A Memory Systems specification says:

```
  For each of LDT{ADD|SET|CLR}{L}, there is the corresponding STT{ADD|SET|CLR}{L}
  alias, for the case where the register selected by the Rt field is XZR or WZR
```
and:
```
  LDT{ADD|SET|CLR}{A}{L} is equivalent to LD{ADD|SET|CLR}{A}{L} except that: <..conditions..>
```

The Arm ARM specifies the preferred form of disassembly for these
aliases:
```
   STADD <Xs>, [<Xn|SP>]
   is equivalent to
   LDADD <Xs>, XZR, [<Xn|SP>]
   and is always the preferred disassembly.
```
(ref: DDI 0487L.b C6-2317)

This means that `sttadd` is the preferred disassembly for `ldtadd w0,
wzr, [x2]` when Rt is `xzr` or `wzr`.
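The preferred-alias rule can be sketched as a tiny printer. `printLdtAtomic` is hypothetical: it handles only the base LDT{ADD,SET,CLR} forms without A/L suffixes and treats register number 31 in the Rt field as XZR/WZR.

```cpp
#include <cassert>
#include <string>

// Hypothetical printer showing the alias rule: when Rt is the zero register
// (encoding 31), print the STT* alias and drop the Rt operand.
std::string printLdtAtomic(const std::string &Op, unsigned Rs, unsigned Rt,
                           const std::string &Base, bool Is64) {
  const char Prefix = Is64 ? 'x' : 'w';
  auto reg = [&](unsigned R) {
    return R == 31 ? std::string(Is64 ? "xzr" : "wzr")
                   : Prefix + std::to_string(R);
  };
  if (Rt == 31)
    return "stt" + Op + " " + reg(Rs) + ", [" + Base + "]";
  return "ldt" + Op + " " + reg(Rs) + ", " + reg(Rt) + ", [" + Base + "]";
}
```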

This change also aligns LLVM disassembly with GNU binutils, as shown by
the following examples:

LLVM before this change:
```
% cat test.s
stadd w0, [sp]
sttadd w0, [sp]
ldadd w0, wzr, [sp]
ldtadd w0, wzr, [sp]

% llvm-mc-20 -triple aarch64 -mattr=+lse,+lsui test.s
        stadd   w0, [sp]
        ldtadd  w0, wzr, [sp]
        stadd   w0, [sp]
        ldtadd  w0, wzr, [sp]
```
LLVM after this change:
```
% llvm-mc -triple aarch64 -mattr=+lse,+lsui test.s
        stadd   w0, [sp]
        sttadd  w0, [sp]
        stadd   w0, [sp]
        sttadd  w0, [sp]
```
GCC-15 test:
```
% gas test.s -march=armv8-a+lsui+lse -o test.o
% objdump -dr test.o
   0:   b82003ff        stadd   w0, [sp]
   4:   192007ff        sttadd  w0, [sp]
   8:   b82003ff        stadd   w0, [sp]
   c:   192007ff        sttadd  w0, [sp]
```
Many thanks to Ezra Sitorus and Alice Carlotti for reporting and
confirming this issue.
2025-08-06 15:44:15 +01:00
Kazu Hirata
62fc0028bf
[Target] Remove unnecessary casts (NFC) (#152262)
value() already returns uint64_t.
2025-08-06 07:11:07 -07:00
Kazu Hirata
e9c510b151
[AArch64] Remove an unnecessary cast (NFC) (#152260)
Pred is already of type CmpInst::Predicate.
2025-08-06 07:10:49 -07:00
Ricardo Jesus
d8f896172d
[AArch64] Improve lowering of scalar abs(sub(a, b)). (#151180)
This patch avoids a comparison against zero when lowering abs(sub(a, b))
patterns, instead reusing the condition codes generated by a subs of the
operands directly.

For example, currently:
```
  sxtb w8, w0
  sub w8, w8, w1, sxtb
  cmp w8, #0
  cneg w0, w8, mi
```
becomes:
```
  sxtb w8, w0
  subs w8, w8, w1, sxtb
  cneg w0, w8, mi
```
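The `cmp w8, #0` is redundant because it computes `w8 - 0`, whose N flag is simply bit 31 of `w8`, and `subs` already sets N from that same bit. A small C++ model of the new sequence, checked exhaustively against `std::abs` for sign-extended i8 inputs (the function is a hypothetical illustration, not LLVM code):

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Models:
//   sxtb w8, w0
//   subs w8, w8, w1, sxtb   ; w8 = a - b, N = bit 31 of the result
//   cneg w0, w8, mi         ; negate when N is set
int32_t absDiffViaSubsCneg(int8_t A, int8_t B) {
  // 32-bit wrapping subtraction, as SUBS performs it.
  uint32_t Res = static_cast<uint32_t>(int32_t(A)) -
                 static_cast<uint32_t>(int32_t(B));
  bool N = (Res >> 31) & 1;           // "mi" condition
  uint32_t Out = N ? 0u - Res : Res;  // CNEG
  return static_cast<int32_t>(Out);
}
```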

Together with #151177, this should handle the remaining patterns in
#118413.
2025-08-06 13:00:28 +01:00
Simon Pilgrim
9b7b382871 Fix MSVC truncation to char warning. NFC. 2025-08-06 11:39:48 +01:00
Ricardo Jesus
df34eaca59
[AArch64] Drop poison flags when lowering absolute difference patterns. (#152130)
As a follow-up to #151177, when lowering SELECT_CC nodes of absolute
difference patterns, drop poison-generating flags from the negated
operand to avoid inadvertently propagating poison.

As discussed in the PR above, I didn't find practical issues with the
current code, but it seems safer to do this preemptively.
2025-08-06 09:27:35 +01:00
Benjamin Maxwell
ff5fa711b3
[AArch64][SVE] Tweak how SVE CFI expressions are emitted (#151677)
The main change in this patch is we go from emitting the expression:

  @ cfa - NumBytes - NumScalableBytes * VG

To:

  @ cfa - VG * NumScalableBytes - NumBytes

That is, VG is the first expression. This is for a future patch that
adds an alternative way to resolve VG (which uses the CFA, so it is
convenient for the CFA to be at the top of the stack).

Since doing this is fairly churn-heavy, I took the opportunity to also
save up to 4 bytes per SVE CFI expression. This is done by folding
LEB128 constants to literals when they are in the range 0 to 31, and by
using the offset in `DW_OP_breg*` expressions.
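A sketch of the constant-folding half of the size saving, assuming only the standard DWARF encodings (`DW_OP_constu` is 0x10 followed by a ULEB128 operand; `DW_OP_lit0`..`DW_OP_lit31` are the single bytes 0x30..0x4f). The helper names are hypothetical, and the `DW_OP_breg*` offset folding is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// DWARF expression opcodes (values from the DWARF specification).
constexpr uint8_t DW_OP_constu = 0x10;
constexpr uint8_t DW_OP_lit0 = 0x30; // DW_OP_lit0..DW_OP_lit31 = 0x30..0x4f

// Unsigned LEB128: 7 bits per byte, high bit set while more bytes follow.
void appendULEB128(std::vector<uint8_t> &Out, uint64_t V) {
  do {
    uint8_t Byte = V & 0x7f;
    V >>= 7;
    if (V != 0)
      Byte |= 0x80;
    Out.push_back(Byte);
  } while (V != 0);
}

// Push an unsigned constant, using the one-byte DW_OP_litN form for 0..31.
void appendUnsignedConst(std::vector<uint8_t> &Out, uint64_t V) {
  if (V <= 31) {
    Out.push_back(static_cast<uint8_t>(DW_OP_lit0 + V));
    return;
  }
  Out.push_back(DW_OP_constu);
  appendULEB128(Out, V);
}
```

With this, a constant such as 8 costs one byte (`DW_OP_lit8`) instead of two (`DW_OP_constu 8`).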
2025-08-06 09:21:57 +01:00
Matt Arsenault
342bf58f93
RuntimeLibcalls: Add entries for __security_check_cookie (#151843)
Avoids hardcoding string name based on target, and gets
the entry in the centralized list of emitted calls.
2025-08-06 10:26:36 +09:00
Daniel Paoliello
07da480614
[win][arm64ec] More fixes for building and testing Arm64EC Windows (#151409)
* `tools/llvm-objcopy/MachO/update-section-object.test` was failing on
Windows since the input file (`macho_sections.s`) might be checked out
with the wrong line endings, resulting in a difference in the sizes of
the sections being checked.
* Removed the check for Windows in `AArch64Arm64ECCallLowering`: when
`llc` is run without an explicit target, the module's target triple is
unknown so this assert fires.
* Expect `llvm/test/CodeGen/Generic/allow-check.ll` to fail for Arm64EC:
Global ISel is not supported.
2025-08-05 14:54:08 -07:00
Oliver Stannard
f6c2a357e7
[AArch64] Add Apple assembly syntax for recent instructions (#152111)
Some vector instructions override AsmString in the tablegen description,
but did not include the Apple syntax variant, so were printed without
operands.

Fixes #151330
2025-08-05 16:04:25 +01:00
Paul Walker
94d374ab6c
[LLVM][CGP] Allow finer control for sinking compares. (#151366)
Compare sinking is selectable based on the result of
hasMultipleConditionRegisters. This function is too coarse-grained
because it does not take into account the differences between scalar and
vector compares. This PR extends the interface to take an EVT to allow
finer control.
    
The new interface is used by AArch64 to disable sinking of scalable
vector compares, but with isProfitableToSinkOperands updated to maintain
the cases that are specifically tested.
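A hedged sketch of the interface change: the hook becomes a per-type query, so a target can answer differently for scalar, fixed-length and scalable vector compares. The types and the AArch64-like policy below are simplified, hypothetical stand-ins, not LLVM's `TargetLowering`/`EVT`; the sinking decision assumes CodeGenPrepare sinks a compare only when the target reports a single condition register.

```cpp
#include <cassert>

// Hypothetical stand-in for EVT: just enough to distinguish scalar,
// fixed-length vector and scalable vector types.
struct ValueTypeSketch {
  bool IsVector = false;
  bool IsScalable = false;
};

struct TargetLoweringSketch {
  virtual ~TargetLoweringSketch() = default;
  // Per-type query instead of a single target-wide answer.
  virtual bool hasMultipleConditionRegisters(ValueTypeSketch) const {
    return false;
  }
};

// Illustrative AArch64-like policy: only scalable vector compares report
// multiple condition registers, so only they are not sunk.
struct AArch64LikeLowering : TargetLoweringSketch {
  bool hasMultipleConditionRegisters(ValueTypeSketch VT) const override {
    return VT.IsVector && VT.IsScalable;
  }
};

// Sink a compare next to its users only if the target has a single
// condition register for this type.
bool shouldSinkCompare(const TargetLoweringSketch &TLI, ValueTypeSketch VT) {
  return !TLI.hasMultipleConditionRegisters(VT);
}
```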
2025-08-05 11:43:41 +01:00
Daniel Paoliello
717e753d1e
[win][arm64ec] Handle empty function names (#151609)
While testing Arm64EC, I observed that LLVM crashes when an empty
function name is used. My original fix in #151409 was to raise an error,
but this change now handles the empty name via
`Mangler::getNameWithPrefix` (which assigns a name to the function).

To get this working, I had to create the `Mangler` in
`TargetLoweringObjectFile` early so it would be available to Arm64EC's
lowering. There's no reason why `Mangler` is only created when
`Initialize` is called (or re-created if it exists), and so I moved
creation to the constructor and switched the raw pointer for a
`unique_ptr` to avoid the explicit `delete` in the destructor.
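The ownership change follows a common C++ pattern. A minimal sketch with hypothetical `ManglerSketch`/`ObjectFileLoweringSketch` classes; the `__unnamed_N` naming for empty names is an assumption for illustration, not a statement about LLVM's `Mangler` output.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical mangler that gives empty names a fresh synthetic name.
class ManglerSketch {
  unsigned NextID = 0;

public:
  std::string getName(const std::string &Name) {
    return Name.empty() ? "__unnamed_" + std::to_string(++NextID) : Name;
  }
};

class ObjectFileLoweringSketch {
  // Created eagerly by the member initializer, so it is usable before
  // initialize() runs; unique_ptr frees it without an explicit delete.
  std::unique_ptr<ManglerSketch> Mang = std::make_unique<ManglerSketch>();

public:
  void initialize() { /* no longer (re)creates the mangler */ }
  ManglerSketch &getMangler() { return *Mang; }
};
```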
2025-08-04 16:20:55 -07:00
David Green
b30d5315b7
[AArch64] Add better fcmp costs for expanded predicates (#147940)
Certain fcmp predicates need to be expanded into multiple operations and
or'd together. This adds some more accurate cost modelling for them
based on the predicate. Unsupported operations are given the cost of a
libcall and the latency is set to 2 as that seemed to be fairly common
between different CPUs.
2025-08-04 13:42:57 +01:00
David Green
e136fb04f2
[AArch64] Add sve bf16 fpext and fpround costs. (#150485)
This prevents them from generating Invalid costs, as generating the
instructions seems to work fine with and without +bf16. The costs are
mostly taken from the number of instructions (minus ptrue and constants).
2025-08-04 09:47:41 +01:00
Fangrui Song
5d5ce06cae MCSymbolELF: Migrate away from classof
The object-file-format-specific derived classes are used in contexts
where the type is statically known. We don't use isa/dyn_cast and we
want to eliminate MCSymbol::Kind in the base class.
2025-08-03 16:21:19 -07:00
Fangrui Song
e640ca8b9a MCSymbolELF: Migrate away from classof
The object-file-format-specific derived classes are used in contexts
where the type is statically known. We don't use isa/dyn_cast and we
want to eliminate MCSymbol::Kind in the base class.
2025-08-03 15:45:36 -07:00
Fangrui Song
5570ce5cef MCSymbolELF: Migrate away from classof
The object-file-format-specific derived classes are used in contexts
where the type is statically known. We don't use isa/dyn_cast and we
want to eliminate MCSymbol::Kind in the base class.
2025-08-03 15:17:13 -07:00
Fangrui Song
d3589edafc MCAsmBackend::applyFixup: Change Data to indicate the relocated location
`Data` now references the first byte of the fixup offset within the current fragment.

MCAssembler::layout asserts that the fixup offset is within either the
fixed-size content or the optional variable-size tail, as this is the
most the generic code can validate without knowing the target-specific
fixup size.

Many backends' applyFixup implementations assert:
```
assert(Offset + Size <= F.getSize() && "Invalid fixup offset!");
```

This refactoring allows a subsequent change to move the fixed-size
content outside of MCSection::ContentStorage, fixing the
-fsanitize=pointer-overflow issue of #150846
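A hypothetical sketch of the new convention: `Data` already points at the relocated bytes, so the backend writes from index 0, and in this sketch the bounds check sits with the code that knows the fragment size. The names and the little-endian OR are illustrative assumptions, not the actual `MCAsmBackend::applyFixup` signature.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Data points at the first byte being relocated; the low NumBytes bytes of
// Value are OR-ed in little-endian order.
void applyFixupSketch(uint8_t *Data, uint64_t Value, unsigned NumBytes) {
  assert(NumBytes <= 8 && "fixup wider than 64 bits");
  for (unsigned I = 0; I < NumBytes; ++I)
    Data[I] |= static_cast<uint8_t>(Value >> (8 * I));
}

// Caller-side check, analogous to `Offset + Size <= F.getSize()`.
void applyAt(uint8_t *Fragment, size_t FragmentSize, size_t Offset,
             uint64_t Value, unsigned NumBytes) {
  assert(Offset + NumBytes <= FragmentSize && "Invalid fixup offset!");
  applyFixupSketch(Fragment + Offset, Value, NumBytes);
}
```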

Pull Request: https://github.com/llvm/llvm-project/pull/151724
2025-08-02 09:27:06 -07:00
AZero13
23022a4683
[SelectionDAG] Move sign pattern check from AArch64 and ARM to general SelectionDAG (#151736)
This works on all cases much like the XOR case above it in SelectionDAG.
2025-08-01 14:46:51 -07:00
Daniel Paoliello
c696ecddee
[win][arm64ec] Handle available_externally functions (#151610)
While testing Arm64EC, I observed that LLVM crashes when an
`available_externally` function is used as it tries to place it in a
COMDAT, which is not permitted by the verifier.

This is the fix from #151409, plus a dedicated test.
2025-08-01 14:04:39 -07:00
Sergei Barannikov
e0fa6569c8
[AArch64] Add getCondCode() helper (NFC) (#150521)
Follow-up to #150313.
2025-08-01 16:27:35 +03:00
Sander de Smalen
76ce464073
[AArch64] Don't inline streaming fn into non-streaming caller (#150595)
Without this change, the following test would fail to compile
with `-march=armv8-a+sme`:

```
  void func1(const svuint32_t *in, svuint32_t *out) {
    [&]() __arm_streaming { *out = *in; }();
  }
```

But in general, it's probably better never to inline
streaming functions into non-streaming functions, because
they will have been marked as 'streaming' for a reason
by the user.
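In the example, the lambda is `__arm_streaming` but `func1` is not, so inlining would place streaming-mode code in a non-streaming body. A minimal sketch of the rule, with a hypothetical `StreamingMode` enum; it ignores other SME attributes (for example streaming-compatible callers), which a real check must also handle.

```cpp
#include <cassert>

// Hypothetical, simplified streaming-mode attribute of a function.
enum class StreamingMode { NonStreaming, Streaming, StreamingCompatible };

// Never inline a streaming callee into a non-streaming caller; every other
// combination is left to the usual heuristics in this simplified sketch.
bool areInlineCompatible(StreamingMode Caller, StreamingMode Callee) {
  if (Callee == StreamingMode::Streaming &&
      Caller == StreamingMode::NonStreaming)
    return false;
  return true;
}
```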
2025-08-01 09:05:19 +01:00
Fangrui Song
491c7bdd58 MCAsmBackend::applyFixup: Replace Data.getSize() with F.getSize()
to facilitate replacing `MutableArrayRef<char> Data` (fragment content)
with the relocated location. This is necessary to fix the
pointer-overflow sanitizer issue and reland #150846
2025-08-01 00:31:51 -07:00
John Brawn
9a9b8b7d1c
[AArch64] Allow unrolling of scalar epilogue loops (#151164)
#147420 changed the unrolling preferences to permit unrolling of
non-auto-vectorized loops by checking for the isvectorized attribute.
However, when a loop is vectorized, this attribute is put on both the
vector loop and the scalar epilogue, so that change prevented the scalar
epilogue from being unrolled.

Restore the previous behaviour of unrolling the scalar epilogue by
checking both for the isvectorized attribute and vector instructions in
the loop.
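The restored policy reduces to a two-input predicate. A sketch with a hypothetical `LoopSummary`; how the two facts are gathered from the loop is not shown.

```cpp
#include <cassert>

// Hypothetical summary of what the unroller knows about a loop.
struct LoopSummary {
  bool HasIsVectorizedAttr = false;  // llvm.loop.isvectorized present
  bool ContainsVectorInstrs = false; // any instruction with a vector type
};

// The vectorizer tags both the vector body and the scalar epilogue with
// isvectorized, so only a tagged loop that also has vector code is skipped.
bool allowUnrolling(const LoopSummary &L) {
  return !(L.HasIsVectorizedAttr && L.ContainsVectorInstrs);
}
```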
2025-07-31 11:03:41 +01:00
AZero13
10b323b993
[AArch64][GISel] Signed comparison using CMN is safe when the subtraction is nsw (#150480)
This is #141993 but for GlobalISel. For the LHS, we now also do the inverse
compare, which SelectionDAG does not currently do; I will add that in a future patch.
2025-07-31 07:48:13 +01:00
David Green
3313cf4a83
[AArch64][GlobalISel] Add push_mul_through_s/zext (#141551)
This extends the existing push_add_through_zext to handle mul, similar
to performVectorExtCombine in SDAG. This allows muls to be pushed up the
tree of extends, operating on smaller vector types whilst keeping the
result the same (providing there are > 2x bits in the output).

matchExtAddvToUdotAddv needs to be adjusted to make sure it keeps
generating dot instructions from add(ext(mul(ext, ext))).
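Why the rewrite preserves the result, shown for i8 operands extended to i32: the multiply can be done at i16 (twice the source width) and the product extended afterwards, because an i8 x i8 product always fits in 16 bits. The functions are hypothetical scalar models of the vector transform.

```cpp
#include <cassert>
#include <cstdint>

// Original form: extend both i8 operands to i32, then multiply in i32.
uint32_t mulAfterZext32(uint8_t A, uint8_t B) {
  return uint32_t(A) * uint32_t(B);
}

// Pushed-through form: multiply in i16 (truncating, as a 16-bit lane
// would), then zero-extend the product to i32.
uint32_t mulBeforeZext32(uint8_t A, uint8_t B) {
  uint16_t Narrow = static_cast<uint16_t>(uint32_t(A) * uint32_t(B));
  return uint32_t(Narrow);
}

// Signed analogue using sign extension.
int32_t mulBeforeSext32(int8_t A, int8_t B) {
  int16_t Narrow = static_cast<int16_t>(int32_t(A) * int32_t(B));
  return int32_t(Narrow);
}
```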
2025-07-31 07:38:11 +01:00
Prabhu Rajasekaran
17ccb849f3
[llvm] Extract and propagate callee_type metadata
Update MachineFunction::CallSiteInfo to extract numeric CalleeTypeIds
from callee_type metadata attached to indirect call instructions.

Reviewers: nikic, ilovepi

Reviewed By: ilovepi

Pull Request: https://github.com/llvm/llvm-project/pull/87575
2025-07-30 14:56:39 -07:00
Paul Walker
3a4d506cb0 [LLVM][CodeGen][AArch64] Prevent invalid extract_elt within combineStoreValueFPToInt.
This reverts a small part of https://github.com/llvm/llvm-project/pull/147707
that triggers an isel failure because we cannot extract an >i32 element
into an i64 result.
2025-07-30 17:54:15 +00:00
Guy David
58d70dc62b
[AArch64] Keep floating-point conversion in SIMD (#147707)
Stores can be issued faster if the result is kept in the SIMD/FP
registers.
The `HasOneUse` check guards against creating two floating-point
conversions if, for example, some arithmetic is also done on the converted
value. Another approach would be to inspect the user instructions during
lowering, but I don't see that type of check in the lowering very often.
2025-07-30 14:53:56 +03:00
Ricardo Jesus
4fdf07fd46
[AArch64] Use CNEG for absolute difference patterns. (#151177)
The current code generated for absolute difference patterns
(a > b ? a - b : b - a) typically consists of sequences of:
```
  sub w8, w1, w0
  subs w9, w0, w1
  csel w0, w9, w8, hi
```

The first sub is redundant if the csel is replaced by a cneg:
```
  subs w8, w0, w1
  cneg w0, w8, ls
```

This is achieved by canonicalising
```
  select_cc lhs, rhs, sub(lhs, rhs), sub(rhs, lhs), cc ->
  select_cc lhs, rhs, sub(lhs, rhs), neg(sub(lhs, rhs)), cc
  
  select_cc lhs, rhs, sub(rhs, lhs), sub(lhs, rhs), cc ->
  select_cc lhs, rhs, neg(sub(lhs, rhs)), sub(lhs, rhs), cc
```
as the second forms can already be matched.
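A scalar model checking that the canonicalised sequence matches the original select for unsigned operands. It assumes the usual AArch64 condition semantics (`ls` is C == 0 or Z == 1, which after `subs a, b` means a <= b unsigned); the functions are illustrative, not LLVM code.

```cpp
#include <cassert>
#include <cstdint>

// Original form: a > b ? a - b : b - a (sub; subs; csel ..., hi).
uint32_t absDiffSelect(uint32_t A, uint32_t B) {
  return A > B ? A - B : B - A;
}

// Canonicalised form:
//   subs w8, w0, w1   ; w8 = a - b
//   cneg w0, w8, ls   ; negate when a <= b (unsigned)
uint32_t absDiffSubsCneg(uint32_t A, uint32_t B) {
  uint32_t Diff = A - B;        // wraps modulo 2^32, like SUBS
  bool LS = A <= B;             // C == 0 || Z == 1 after SUBS
  return LS ? 0u - Diff : Diff; // CNEG: -(a - b) == b - a modulo 2^32
}
```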

This helps with some of the patterns in #118413.
2025-07-30 12:29:13 +01:00
Paul Walker
20293ebd31
[LLVM][CodeGen][SME] Only emit strided loads in streaming mode. (#150445)
The selection code for aarch64_sve_ld[nt]1_pn_x{2,4} intrinsics gates
the use of strided load instructions behind the SME2 target feature.
However, the instructions are only available in streaming mode.
2025-07-30 11:41:46 +01:00
David Green
4687a7647f [AArch64][GlobalISel] Lower udot/sdot intrinsics to G_UDOT/G_SDOT
This allows them to be selected using the same pathways as normal lowering.
USDOT is not handled yet as we do not yet have a node for it.
2025-07-30 11:02:30 +01:00
Ricardo Jesus
7a0024d694
[AArch64] Refactor AND/ANDS bitmask splitting (NFC). (#150619)
This patch generalises the logic for splitting bitmasks for AND/ANDS
immediate instructions, to prepare it to handle more opcodes, as in
#150394.
2025-07-30 09:28:52 +01:00
David Green
1f66724725 [AArch64] Create a performRNDRCombine to pull code out of PerformDAGCombine. NFC 2025-07-30 06:01:40 +01:00
paperchalice
adcad6adc9
[AArch64] Remove UnsafeFPMath (#150876)
We should always use fast-math flags; remove these global flags
incrementally.
See also
https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast/80797
2025-07-29 12:41:09 +08:00
Fangrui Song
f517ac2083 MCSectionCOFF: Avoid cast
The object-file-format-specific derived classes are used in contexts like
MCStreamer and MCObjectTargetWriter where the type is statically known.
We don't use isa/dyn_cast and we want to eliminate
MCSection::SectionVariant in the base class.
2025-07-26 10:04:04 -07:00
Fangrui Song
2571924ad6 MCSectionELF: Remove classof
The object-file-format-specific derived classes are used in contexts like
MCStreamer and MCObjectTargetWriter where the type is statically known.
We don't use isa/dyn_cast and we want to eliminate
MCSection::SectionVariant in the base class.
2025-07-25 09:50:19 -07:00
Jonathan Cohen
81bbe98abf
Revert "[AArch64][Machine-Combiner] Split gather patterns into neon regs to multiple vectors (#142941)" (#150505)
Reverting due to reported miscompiles, will reland once it is fixed.
2025-07-25 14:09:23 +01:00
Anatoly Trosinenko
6e04e1e164
[AArch64][PAC] Introduce AArch64::PAC pseudo instruction (#146488)
Introduce a pseudo instruction carrying address and immediate modifiers
as separate operands to be selected instead of a pair of `MOVKXi` and
`PAC[ID][AB]`. The new pseudo instruction is expanded in AsmPrinter, so
that `MOVKXi` is emitted immediately before `PAC[ID][AB]`. This way, an
attacker cannot control the immediate modifier used to sign
the value, even if the address modifier can be substituted.

To simplify the instruction selection, select the `AArch64::PAC` pseudo
using a TableGen pattern and post-process its `$AddrDisc` operand with a
custom inserter hook; this eliminates duplication of the logic for
DAGISel and GlobalISel. Furthermore, this improves cross-BB analysis in
the case of DAGISel.
2025-07-25 13:10:37 +03:00
David Green
8b54dbeefe
[AArch64] Ensure the type of LDNP/STNP is always v2i64 (#150378)
I think it is OK to always use v2i64 as the type of LDNP/STNP
nodes. Bitcasting the type should be fine for little endian. This helps
with #150125.
2025-07-25 06:29:02 +01:00
Craig Topper
b82cf20bf4
[AArch64] Use getMergeValues to simplify code. NFC (#150337) 2025-07-24 12:14:35 -07:00
AZero13
34986003d1
[AArch64] Predicate should be NE for CBNZW (#150287)
Unable to reproduce yet, but this definitely seems wrong. Better safe
than sorry.

No effect on codegen as far as I know (because I have not been able to
repro).
2025-07-24 15:45:21 +01:00
Paul Walker
94aa08a3b0
[LLVM][CodeGen][SVE] Don't combine shifts at the expense of addressing modes. (#149873)
Fixes https://github.com/llvm/llvm-project/issues/149654
2025-07-24 10:54:48 +01:00
Sergei Barannikov
98562ffaaa
[AArch64] Fix the type of NZCV operands/results (#150313)
Consistently use `FlagsVT` for operands/results of nodes that
consume/produce NZCV flags.
Previously, some of the operands/results had incorrect `MVT::Glue` type
while others had `MVT_CC` type, which is supposed to be used for
condition codes (`AArch64CC::CondCode` enum).

Found by #150125.
2025-07-24 11:48:28 +03:00
Amina Chabane
b4edd827e4
[AArch64] Remove redundant FMOV for zero-extended i32/i16 loads to f64 (#146920)
Previously, separate load, zext and FMOV instructions were emitted. This
patch adds a new TableGen pattern to avoid the unnecessary FMOV. A test
is included in test/CodeGen/AArch64/load_u64_from_u32.ll
2025-07-24 07:47:32 +01:00
Kazu Hirata
3e53d4d386
[llvm] Remove unused includes (NFC) (#150265)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-07-23 15:18:46 -07:00