9481 Commits

David Green
d9d71bdc14
[AArch64] Move BSL generation to lowering. (#151855)
It is generally better to allow the target-independent combines to run
before creating AArch64-specific nodes (provided they don't mess them up). This
moves the generation of BSL nodes to lowering, not a combine, so that
intermediate nodes are more likely to be optimized. There is a small
change in the constant handling to detect legalized buildvector
arguments correctly.

Fixes #149380 but not directly. #151856 contained a direct fix for
expanding the pseudos.
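
For reference, a minimal scalar sketch (illustrative only, not the lowering code itself) of the bitwise select a BSL node implements:
```cpp
#include <cstdint>

// BSL semantics per 64-bit chunk: take bits of A where Mask is set,
// bits of B elsewhere.
uint64_t bitSelect(uint64_t Mask, uint64_t A, uint64_t B) {
  return (A & Mask) | (B & ~Mask);
}
```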
2025-08-21 09:54:42 +01:00
David Green
bd94aabfb6
[AArch64][GlobalISel] Remove Selection code for s/uitofp. NFC (#154488)
These are already handled by tablegen patterns.
2025-08-20 20:52:42 +01:00
Benjamin Maxwell
478b4b012f
[AArch64][SME] Rework VG CFI information for streaming-mode changes (#152283)
This patch reworks how VG is handled around streaming mode changes.

Previously, for functions with streaming mode changes, we would:

- Save the incoming VG in the prologue
- Emit `.cfi_offset vg, <offset>` and `.cfi_restore vg` around streaming
  mode changes

Additionally, for locally streaming functions, we would:

- Also save the streaming VG in the prologue
- Emit `.cfi_offset vg, <incoming VG offset>` in the prologue
- Emit `.cfi_offset vg, <streaming VG offset>` and `.cfi_restore vg`
  around streaming mode changes

In both cases, this ends up doing more than necessary and would be hard
for an unwinder to parse, as using `.cfi_offset` in this way does not
follow the semantics of the underlying DWARF CFI opcodes.

So the new scheme in this patch is to:

In functions with streaming mode changes (including locally streaming):

- Save the incoming VG in the prologue
- Emit `.cfi_offset vg, <offset>` in the prologue (not at streaming mode
  changes)
- Emit `.cfi_restore vg` after the saved VG has been deallocated 
- This will be in the function epilogue, where VG is always the same as
  the entry VG
- Explicitly reference the incoming VG in expressions for SVE callee-saves
  in functions with streaming mode changes
- Ensure the CFA is not described in terms of VG in functions with
  streaming mode changes

A more in-depth discussion of this scheme is available in:
https://gist.github.com/MacDue/b7a5c45d131d2440858165bfc903e97b

But the TLDR is that, following this scheme, SME unwinding can be
implemented with minimal changes to existing unwinders. All an unwinder
needs to do is initialize VG to `CNTD` at the start of unwinding; then
everything else is handled by standard opcodes (which don't need changes
to handle VG).
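
As an illustration, a sketch of that single initialization step (hypothetical helper; assumes an AArch64 target where `cntd` is available):
```cpp
#include <cstdint>

// Seed the unwinder's VG value from CNTD, the number of 64-bit granules
// in the current vector length, as the scheme above assumes.
uint64_t initialUnwindVG() {
  uint64_t VG;
  asm("cntd %0" : "=r"(VG));
  return VG;
}
```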
2025-08-20 14:06:12 +01:00
Paul Walker
d6a688fb3d
[LLVM][CodeGen][SME] hasB16b16() is not sufficient to prove BFADD availability. (#154143)
The FEAT_SVE_B16B16 arithmetic instructions are only available to
streaming mode functions when SME2 is available.
2025-08-20 11:12:43 +01:00
SivanShani-Arm
460e9a8837
[LLVM][AArch64] Build attributes: Support switching to a defined subsection by name only (#154159)
The AArch64 build attribute specification now allows switching to an
already-defined subsection using its name alone, without repeating the
optionality and type parameters.

This patch updates the parser to support that behavior.

Spec reference: https://github.com/ARM-software/abi-aa/pull/230/files
2025-08-20 10:50:48 +01:00
jyli0116
9df7ca1f0f
[GlobalISel] Legalize Saturated Truncate instructions and intrinsics (#154340)
Adds legalization support for the `G_TRUNC_SSAT_S`, `G_TRUNC_SSAT_U`, and
`G_TRUNC_USAT_U` instructions for GlobalISel.
2025-08-20 10:37:22 +01:00
Kerry McLaughlin
c34cba0413
[AArch64][SME] Lower aarch64.sme.cnts* to vscale when in streaming mode (#154305)
In streaming mode, the @llvm.aarch64.sme.cnts* and @llvm.aarch64.sve.cnt*
intrinsics are equivalent. For SVE, cnt* is lowered in instCombineIntrinsic
to @llvm.vscale(). This patch lowers the SME intrinsics similarly when in
streaming mode.
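
The arithmetic behind the lowering, as a sketch (illustrative helper names): vscale is the vector length in 128-bit granules, so the streaming element counts are fixed multiples of it.
```cpp
#include <cstdint>

// cntsd counts 64-bit doublewords and cntsw counts 32-bit words in the
// streaming vector length; both are fixed multiples of vscale.
uint64_t cntsd(uint64_t VScale) { return VScale * 2; }
uint64_t cntsw(uint64_t VScale) { return VScale * 4; }
```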
2025-08-20 09:48:36 +01:00
David Tellenbach
0542355147
[AArch64] Fix zero-register copying with zero-cycle moves (#154362)
Fix incorrect super-register lookup when copying from $wzr on subtargets
that lack zero-cycle zeroing but support 64-bit zero-cycle moves.

When copying from $wzr, we used the wrong register class to lookup the
super-register, causing

    $w0 = COPY $wzr

to get expanded as

    $x0 = ORRXrr $xzr, undef $noreg, implicit $wzr,

rather than the correct

    $x0 = ORRXrr $xzr, undef $xzr, implicit $wzr.
2025-08-19 21:07:16 +02:00
Tomer Shafir
ffddf33beb
[AArch64] Remove wrong processor feature (#151289)
`fmov dX, dY` is not a preferred instruction.

Previously introduced by:
https://github.com/llvm/llvm-project/pull/144152
2025-08-19 21:20:09 +03:00
Jie Fu
81f1b46cc6 [AArch64] Silence an unused-variable warning (NFC)
/llvm-project/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp:1042:11:
 error: unused variable 'TRI' [-Werror,-Wunused-variable]
    auto *TRI = MBB.getParent()->getSubtarget().getRegisterInfo();
          ^
1 error generated.
2025-08-19 20:32:33 +08:00
Benjamin Maxwell
7170a81241
[AArch64][SME] Rename EdgeBundles to Bundles (NFC) (#154295)
It seems some buildbots do not like the shadowing. See:
https://lab.llvm.org/buildbot/#/builders/137/builds/23838
2025-08-19 10:58:19 +01:00
Benjamin Maxwell
eb764040bc
[AArch64][SME] Implement the SME ABI (ZA state management) in Machine IR (#149062)
## Short Summary

This patch adds a new pass `aarch64-machine-sme-abi` to handle the ABI
for ZA state (e.g., lazy saves and agnostic ZA functions). This is
currently not enabled by default (but aims to be by LLVM 22). The goal
is for this new pass to more optimally place ZA saves/restores and to
work with exception handling.

## Long Description

This patch reimplements management of ZA state for functions with
private and shared ZA state. Agnostic ZA functions will be handled in a
later patch. For now, this is under the flag `-aarch64-new-sme-abi`;
however, we intend for this to replace the current SelectionDAG
implementation once complete.

The approach taken here is to mark instructions as needing ZA to be in a
specific state ("ACTIVE" or "LOCAL_SAVED"). Machine instructions implicitly
defining or using ZA registers (such as $zt0 or $zab0) require the
"ACTIVE" state. Function calls may need the "LOCAL_SAVED" or "ACTIVE"
state depending on the callee (having shared or private ZA).

We already add ZA register uses/definitions to machine instructions, so
no extra work is needed to mark these.

Calls need to be marked by gluing AArch64ISD::INOUT_ZA_USE or
AArch64ISD::REQUIRES_ZA_SAVE to the CALLSEQ_START.

These markers are then used by the MachineSMEABIPass to find
instructions where there is a transition between required ZA states.
These are the points we need to insert code to set up or restore a ZA
save (or initialize ZA).

To handle control flow between blocks (which may have different ZA state
requirements), we bundle the incoming and outgoing edges of blocks.
Bundles are formed by assigning each block an incoming and outgoing
bundle (initially, all blocks have their own two bundles). Bundles are
then combined by joining the outgoing bundle of a block with the
incoming bundle of all successors.

These bundles are then assigned a ZA state based on the blocks that
participate in the bundle. Blocks whose incoming edges are in a bundle
"vote" for a ZA state that matches the state required at the first
instruction in the block, and likewise, blocks whose outgoing edges are
in a bundle vote for the ZA state that matches the last instruction in
the block. The ZA state with the most votes is used, which aims to
minimize the number of state transitions.
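
To make the voting step concrete, a standalone sketch (illustrative types and names, not the pass's actual code):
```cpp
#include <array>
#include <vector>

enum class ZAState { Active = 0, LocalSaved = 1 };

struct BlockInfo {
  int InBundle, OutBundle;     // bundle ids after joining edges
  ZAState FirstNeed, LastNeed; // state required at block entry/exit
};

std::vector<ZAState> assignBundleStates(const std::vector<BlockInfo> &Blocks,
                                        int NumBundles) {
  // Each block votes once through its incoming bundle (for the state its
  // first instruction needs) and once through its outgoing bundle (for
  // the state its last instruction needs).
  std::vector<std::array<int, 2>> Votes(NumBundles, {0, 0});
  for (const BlockInfo &BI : Blocks) {
    ++Votes[BI.InBundle][static_cast<int>(BI.FirstNeed)];
    ++Votes[BI.OutBundle][static_cast<int>(BI.LastNeed)];
  }
  // Each bundle takes the majority state; ties arbitrarily favour ACTIVE.
  std::vector<ZAState> States(NumBundles);
  for (int B = 0; B < NumBundles; ++B)
    States[B] = Votes[B][0] >= Votes[B][1] ? ZAState::Active
                                           : ZAState::LocalSaved;
  return States;
}
```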
2025-08-19 10:00:28 +01:00
David Sherwood
13d8ba7dea
[LV][TTI] Calculate cost of extracting last index in a scalable vector (#144086)
There are a couple of places in the loop vectoriser where we
want to calculate the cost of extracting the last lane in a
vector. However, we wrongly assume that asking for the cost
of extracting lane (VF.getKnownMinValue() - 1) is an accurate
representation of the cost of extracting the last lane. For
SVE at least, this is non-trivial as it requires the use of
whilelo and lastb instructions.

To solve this problem I have added a new
getReverseVectorInstrCost interface where the index is used
in reverse from the end of the vector. For a vector with a
given ElementCount EC, the extracted/inserted lane would be
EC - 1 - Index. For scalable vectors this index is unknown at
compile time. I've added an AArch64 hook that better represents
the cost, and also a RISCV hook that maintains compatibility
with the behaviour prior to this PR.

I've also taken the liberty of adding support in vplan for
calculating the cost of VPInstruction::ExtractLastElement.
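
A sketch of that index mapping (standalone, with illustrative names rather than the real LLVM types):
```cpp
#include <cstdint>
#include <optional>

// A reverse index I addresses lane EC - 1 - I, which is only a fixed lane
// when the element count is not scalable.
struct SketchElementCount {
  uint64_t MinValue;
  bool Scalable;
};

std::optional<uint64_t> reverseLane(SketchElementCount EC, uint64_t I) {
  if (EC.Scalable)
    return std::nullopt; // lane unknown at compile time
  return EC.MinValue - 1 - I;
}
```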
2025-08-19 09:31:37 +01:00
Kazu Hirata
5fdc7478a6
[AArch64] Replace SmallSet with SmallPtrSet (NFC) (#154264)
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>.  Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:

  template <typename PointeeType, unsigned N>
  class SmallSet<PointeeType *, N> : public SmallPtrSet<PointeeType *, N> {};

We only have 30 instances that rely on this "redirection". Since the
redirection doesn't improve readability, this patch replaces SmallSet
with SmallPtrSet for pointer element types.

I'm planning to remove the redirection eventually.
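
The mechanical change, in miniature:
```cpp
#include "llvm/ADT/SmallPtrSet.h"

// Spell the concrete set type for pointer keys directly instead of
// relying on the SmallSet partial specialization.
llvm::SmallPtrSet<int *, 8> Visited; // was: llvm::SmallSet<int *, 8>
```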
2025-08-18 22:39:53 -07:00
AZero13
08a140add8
[AArch64] Fix build-bot assertion error in AArch64 (#154124)
Fixes build bot assertion.

I forgot to include logic that will be added in a future PR that handles
-1 correctly. For now, let's just return nullptr like we used to.
2025-08-18 15:12:07 +00:00
AZero13
0e52092ff7
[AArch64] Adjust comparison constant if adjusting it means less instructions (#151024)
Prefer constants that require fewer instructions to materialize, in both
GlobalISel and SelectionDAG.
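
An illustration, assuming AArch64's cmp immediate encoding (a 12-bit immediate, optionally shifted left by 12):
```cpp
#include <cstdint>

// 4097 is not a valid cmp immediate, but the equivalent comparison
// against 4096 (1 << 12) is, saving a constant materialization.
bool before(int64_t X) { return X < 4097; }  // e.g. mov w8, #4097; cmp x0, x8
bool after(int64_t X) { return X <= 4096; }  // e.g. cmp x0, #1, lsl #12
```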
2025-08-18 14:56:45 +01:00
Benjamin Maxwell
81c06d198e
Reland "[AArch64][SME] Port all SME routines to RuntimeLibcalls" (#153417)
This updates everywhere we emit/check an SME routine to use
RuntimeLibcalls to get the function name and calling convention.
2025-08-18 14:53:40 +01:00
Jonathan Thackray
f38c83c582
[AArch64][llvm] Disassemble instructions in SYS alias encoding space more correctly (#153905)
For instructions in the `SYS` alias encoding space which take no
register operands, and where the unused five register bits are not all set
(0b11111, i.e. 31), disassemble to a plain `SYS` instruction and not the
alias, since the alias is not considered valid.

This is because it is specified in the Arm ARM in text similar to this
(e.g. page C5-1037 of DDI0487L.b for `TLBI ALLE1`, or page C5-1585 for
`GCSPOPX`):
```
  Rt should be encoded as 0b11111. If the Rt field is not set to 0b11111,
  it is CONSTRAINED UNPREDICTABLE whether:
    * The instruction is UNDEFINED.
    * The instruction behaves as if the Rt field is set to 0b11111.
```

Since we want to follow "should" directives, and not encourage undefined
behaviour, only assemble or disassemble instructions considered valid.
An extra test case has been added for this, and all existing test cases
continue to pass.
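
A sketch of the validity rule being applied (hypothetical helper):
```cpp
#include <cstdint>

// For a SYS alias that takes no register operand, the Rt field in bits
// [4:0] must be all ones for the encoding to disassemble as the alias.
bool rtFieldIsValid(uint32_t Encoding) {
  return (Encoding & 0x1Fu) == 0x1Fu; // Rt == 0b11111
}
```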
2025-08-18 14:41:41 +01:00
Jonathan Cohen
c6fe567064
[AArch64][MachineCombiner] Combine sequences of gather patterns (#152979)
Reland of #142941

Squashed with fixes for #150004, #149585

This matches gather-like patterns where
values are loaded per lane into NEON registers, and
replaces them with loads into two separate registers, which
are then combined with a zip instruction. This decreases
the critical path length and improves Memory Level
Parallelism.
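
A rough sketch of the shape of the rewrite, using NEON intrinsics (illustrative, not the pass's actual output):
```cpp
#include <arm_neon.h>

// Instead of inserting eight loaded values lane by lane into one register
// (a serial dependency chain), load them into two registers and
// interleave with zip.
int16x8_t gatherEight(const int16_t *Ptrs[8]) {
  int16x4_t Even = {*Ptrs[0], *Ptrs[2], *Ptrs[4], *Ptrs[6]};
  int16x4_t Odd = {*Ptrs[1], *Ptrs[3], *Ptrs[5], *Ptrs[7]};
  int16x4x2_t Zipped = vzip_s16(Even, Odd); // restores original lane order
  return vcombine_s16(Zipped.val[0], Zipped.val[1]);
}
```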

rdar://151851094
2025-08-18 15:10:59 +03:00
David Green
8f98529209 [AArch64] Remove SIMDLongThreeVectorTiedBHSabal tablegen class.
Similar to #152987, this removes SIMDLongThreeVectorTiedBHSabal as it is
equivalent to SIMDLongThreeVectorTiedBHS with a better TriOpFrag pattern.
2025-08-18 09:11:13 +01:00
Ahmad Yasin
1b0bce972b
Reorder checks to speed up getAppleRuntimeUnrollPreferences() (#154010)
- Delay the load/store value calculation until a best unroll count is
  found
- Remove an extra getLoopLatch() invocation
2025-08-18 11:06:37 +03:00
Kazu Hirata
cbf5af9668
[llvm] Remove unused includes (NFC) (#154051)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-08-17 23:46:35 -07:00
David Green
732eb5427c
[AArch64] Replace SIMDLongThreeVectorBHSabd with SIMDLongThreeVectorBHS. (#152987)
We just need to use a BinOpFrag to share the patterns. This also moves
UABDL to where it belongs with similar instructions, and removes some
patterns that are now handled by abd nodes. This is mostly NFC except
for GISel, which will catch back up when it handles abd nodes in the
same way.
2025-08-15 20:35:27 +01:00
Nikita Popov
01bc742185
[CodeGen] Give ArgListEntry a proper constructor (NFC) (#153817)
This ensures that the required fields are set, and also makes the
construction more convenient.
2025-08-15 18:06:07 +02:00
David Green
144f3c4cbf
[AArch64] Adjust the scheduling info of SVE FCMP on Cortex-A510. (#153810)
According to the SWOG, these have a lower throughput than other
instructions. Mark them as taking multiple cycles to model that.
2025-08-15 15:45:33 +01:00
Gaëtan Bossu
9828745661
[AArch64][ISel] Select constructive EXT_ZZI pseudo instruction (#152554)
The patch adds patterns to select the EXT_ZZI_CONSTRUCTIVE pseudo
instead of the EXT_ZZI destructive instruction for vector_splice. This
only works when the two inputs to vector_splice are identical.

Given that registers aren't tied anymore, this gives the register
allocator more freedom and a lot of MOVs get replaced with MOVPRFX.

In some cases however, we could have just chosen the same input and
output register, but regalloc preferred not to. This means we end up
with some test cases now having more instructions: there is now a
MOVPRFX while no MOV was previously needed.
2025-08-15 14:30:24 +01:00
Gaëtan Bossu
fdd2d4df12
[AArch64] Define constructive EXT_ZZI pseudo instruction (#152552)
It will get expanded into MOVPRFX_ZZ and EXT_ZZI by the
AArch64ExpandPseudo pass. This instruction takes a single Z register as
input, as opposed to the existing destructive EXT_ZZI instruction.

Note this patch only defines the pseudo; it isn't used in any ISel
pattern yet. It will later be used for vector.extract.
2025-08-15 09:05:10 +01:00
David Green
5836bae463
[AArch64] Change the cost of fma and fmuladd to match fmul. (#152963)
As fmul and fmadd are so similar, their performance characteristics tend
to be the same on most platforms, at least in terms of reciprocal
throughputs. Processors capable of performing a given number of fmul per
cycle can usually perform the same number of fma, with the extra add
being relatively simple on top. This patch makes the scores of the two
operations the same, which brings the throughput cost of a fma/fmuladd
to 2, and the latency to 3, which are the defaults for fmul.

Note that we might also want to change the throughput cost of an fmul to
1, as most processors have ample bandwidth for them, but the two should
still stay in line with one another.
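
For reference, the operation being re-costed:
```cpp
// fma computes A * B + C with a single rounding step; on most cores its
// reciprocal throughput matches fmul's, which is what the new costs
// (throughput 2, latency 3, fmul's defaults) model.
double fused(double A, double B, double C) { return __builtin_fma(A, B, C); }
```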
2025-08-14 21:53:45 +01:00
Elvis Wang
01fac67e2a
[TTI] Add cost kind to getAddressComputationCost(). NFC. (#153342)
This patch adds a cost kind to `getAddressComputationCost()` for #149955.

Note that this patch also removes all the default values in `getAddressComputationCost()`.
2025-08-14 16:01:44 +08:00
David Green
4c28bbf5b8 [AArch64] Fix ‘>= 0’ is always true warning. NFC 2025-08-14 08:17:10 +01:00
Nikita Popov
48beed5b71
Revert "[AArch64][SME] Port all SME routines to RuntimeLibcalls" (#153392)
This introduced a 5% compile-time regression on AArch64, see
https://llvm-compile-time-tracker.com/compare.php?from=b9138bde3562de5c28a239dbd303caf2406678c6&to=271688b87abe7cf45aceaff8266270a25eb7b436&stat=instructions:u.

Reverts llvm/llvm-project#152505.
2025-08-13 11:54:39 +00:00
Ahmad Yasin
1f2fb8e979
[AArch64] Tune unrolling prefs for more patterns on Apple CPUs (#149358)
Enhance the heuristics in `getAppleRuntimeUnrollPreferences` to let a
few more loops be unrolled.

Specifically, this patch adjusts two checks:
I. Tune the loop size budget from 8 to 10
II. Include immediate in-loop users of loaded values in the load/store
dependencies predicate

---------

Co-authored-by: Florian Hahn <flo@fhahn.com>

PR: https://github.com/llvm/llvm-project/pull/149358
2025-08-13 11:16:54 +01:00
Benjamin Maxwell
271688b87a
[AArch64][SME] Port all SME routines to RuntimeLibcalls (#152505)
This updates everywhere we emit/check an SME routine to use
RuntimeLibcalls to get the function name and calling convention.

Note: RuntimeLibcallEmitter had some issues with emitting non-unique
variable names for sets of libcalls, so I tweaked the output to avoid
the need for variables.
2025-08-13 08:48:59 +01:00
Trevor Gross
919021b0df
[Arm64EC] Add support for half (#152843)
`f16` is passed and returned in vector registers on both x86 and AArch64,
using the same calling convention as `f32`, so it is a straightforward type to
support. The calling convention support already exists, added as part of
a6065f0fa55a ("Arm64EC entry/exit thunks, consolidated. (#79067)").
Thus, add mangling and remove the error in order to make `half` work.

MSVC does not yet support `_Float16`, so for now this will remain an
LLVM-only extension.
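
A minimal example of the newly accepted type (LLVM-only for now, as noted):
```cpp
// On Arm64EC, `half` (_Float16) is now mangled and passed like f32, in
// vector registers; this previously hit the unsupported-type error.
_Float16 addHalf(_Float16 A, _Float16 B) { return A + B; }
```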

Fixes the `f16` portion of
https://github.com/llvm/llvm-project/issues/94434
2025-08-12 14:15:52 -07:00
Min-Yih Hsu
ca05058b49
[IA][RISCV] Recognize deinterleaved loads that could lower to strided segmented loads (#151612)
Turn the following deinterleaved load patterns
```
%l = masked.load(%ptr, /*mask=*/110110110110, /*passthru=*/poison)
%f0 = shufflevector %l, [0, 3, 6, 9]
%f1 = shufflevector %l, [1, 4, 7, 10]
%f2 = shufflevector %l, [2, 5, 8, 11]
```
into
```
%s = riscv.vlsseg2(/*passthru=*/poison, %ptr, /*mask=*/1111)
%f0 = extractvalue %s, 0
%f1 = extractvalue %s, 1
%f2 = poison
```
The mask `110110110110` is regarded as a 'gap mask' since it effectively skips the entire third field/component.

Similarly, it turns the following snippet
```
%l = masked.load(%ptr, /*mask=*/110000110000, /*passthru=*/poison)
%f0 = shufflevector %l, [0, 3, 6, 9]
%f1 = shufflevector %l, [1, 4, 7, 10]
```
into
```
%s = riscv.vlsseg2(/*passthru=*/poison, %ptr, /*mask=*/1010)
%f0 = extractvalue %s, 0
%f1 = extractvalue %s, 1
```

Right now this patch only tries to detect a gap mask from a constant mask supplied to a masked.load/vp.load.
2025-08-12 14:08:18 -07:00
Ricardo Jesus
ef5e65d27b
[AArch64] Fix stp kill when merging forward. (#152994)
As an alternative to #149177, iterate through all instructions in
`AArch64LoadStoreOptimizer`.
2025-08-12 14:19:43 +01:00
Sam Tebbs
0bfa1718af
[LV] Create in-loop sub reductions (#147026)
This PR allows the loop vectorizer to handle in-loop sub reductions by
forming a normal in-loop add reduction with a negated input.

Stacked PRs:
1. -> https://github.com/llvm/llvm-project/pull/147026
2. https://github.com/llvm/llvm-project/pull/147255
3. https://github.com/llvm/llvm-project/pull/147302
4. https://github.com/llvm/llvm-project/pull/147513
2025-08-12 10:22:41 +01:00
Benjamin Maxwell
d0c9599c41
[AArch64][SME] Use entry pstate.sm for conditional streaming-mode changes (#152169)
We only do conditional streaming mode changes in two cases:

- Around calls in streaming-compatible functions that don't have a
streaming body
- At the entry/exit of streaming-compatible functions with a streaming
body

In both cases, the condition depends on the entry pstate.sm value. Given
this, we don't need to emit calls to __arm_sme_state at every mode
change.

This patch handles this by placing an "AArch64ISD::ENTRY_PSTATE_SM" node
in the entry block and copying the result to a register. The register is
then used whenever we need to emit a conditional streaming mode change.
The "ENTRY_PSTATE_SM" node expands to a call to "__arm_sme_state" only
if (after SelectionDAG) the function is determined to have
streaming-mode changes.

This has two main advantages:

1. It allows back-to-back conditional smstart/stop pairs to be folded
2. It has the correct behaviour for EH landing pads
   - These are entered with pstate.sm = 0, and should switch mode based on
     the entry pstate.sm
   - Note: This is not fully implemented yet
2025-08-12 09:15:30 +01:00
Peter Collingbourne
a9227316bf MC: Introduce R_AARCH64_PATCHINST relocation type.
The R_AARCH64_PATCHINST relocation type supports deactivation
symbols. For more information, see the RFC:
https://discourse.llvm.org/t/rfc-deactivation-symbols/85556

Part of the AArch64 psABI extension:
https://github.com/ARM-software/abi-aa/issues/340
2025-08-11 11:31:36 -07:00
Luke Lau
acb86fb9e0
[TTI] Consistently pass the pointer type to getAddressComputationCost. NFCI (#152657)
In some places we were passing the type of the value being accessed; in
other cases, the type of the pointer for the access.

The most "involved" user is
LoopVectorizationCostModel::getMemInstScalarizationCost, which is the
only call site that passes in the SCEV, and it passes along the pointer
type.

This changes call sites to consistently pass the pointer type, and
renames the arguments to clarify this.

No target actually checks the contents of the type passed, only to see
if it's a vector or not, so this shouldn't have an effect.
2025-08-11 18:00:12 +08:00
Sander de Smalen
2ad1d77b17
[AArch64] Match constants in SelectSMETileSlice (#151494)
If the slice is a constant then we should try to use the `WZR + <imm>`
addressing mode if the constant fits the range.
2025-08-11 10:19:26 +01:00
Nikita Popov
e92b7e9641
[CodeGen] Provide original IR type to CC lowering (NFC) (#152709)
It is common to have ABI requirements for illegal types: For example,
two i64 argument parts that originally came from an fp128 argument may
have a different call ABI than ones that came from an i128 argument.

The current calling convention lowering does not provide access to this
information, so backends come up with various hacks to support it (like
additional pre-analysis cached in CCState, or bypassing the default
logic entirely).

This PR adds the original IR type to InputArg/OutputArg and passes it
down to CCAssignFn. It is not actually used anywhere yet, this just does
the mechanical changes to thread through the new argument.
2025-08-11 08:57:53 +02:00
AZero13
e6b4daf48c
[AArch64] Support MI and PL (#150314)
Now, why would we want to do this?

There are a small number of places where this helps:
1. It helps PeepholeOpt by requiring less flag checking.
2. It allows expressions such as `x - 0x80000000 < 0` to be folded to a
`cmp` of `x` against a register holding this value (see the sketch below).
3. We can refine the other passes over time for this.
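
A sketch of case 2 from the list above (the fold target is illustrative):
```cpp
#include <cstdint>

// With MI/PL available, the sign flag of the subtraction can be checked
// directly, rather than comparing the result against zero separately.
bool belowIntMin(int64_t X) { return X - 0x80000000LL < 0; }
```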
2025-08-11 07:41:38 +01:00
David Green
26b302fd8b [AArch64] Rename Cost -> PromotedCost to avoid shadowing error 2025-08-08 14:37:24 +01:00
David Green
7f1638efc1
[AArch64] Generalize costing for FP16 instructions (#150033)
This extracts the code for modelling an fp16 operation as
`fptrunc(fpop(fpext,fpext))` into a new function named
getFP16BF16PromoteCost so that it can be reused by the
arithmetic instructions. The function takes a lambda to
calculate the cost of the operation with the promoted type.
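
The promotion being costed, in scalar form (a sketch):
```cpp
// Without native FP16 arithmetic, an fp16 op is costed as an fptrunc of
// an f32 op performed on fpext'd inputs.
_Float16 fp16Mul(_Float16 A, _Float16 B) {
  return (_Float16)((float)A * (float)B); // fptrunc(fmul(fpext, fpext))
}
```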
2025-08-08 13:40:07 +01:00
Graham Hunter
de72cca671
[CostModel] Provide a default model for histogram intrinsics (#149348)
Since we scalarize these intrinsics when the target does not support
them, we should model that for costing purposes.
2025-08-08 11:00:00 +01:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Cullen Rhodes
e9d71efb83
[AArch64] Mark [usp]mull, [us]addl, [us]abdl as commutative (#152158)
Fixes #61461.
2025-08-08 09:35:28 +01:00
David Green
229ab5aa2b
[AArch64] Drop flags from BSP pseudos (#151856)
This prevents cases where some of the operands match from hitting
verifier errors with kill flags. These nodes should have been removed
earlier in most cases.

Fixes the direct issue from #149380. #151855 cleans up the codegen.
2025-08-08 07:47:56 +01:00
David Green
6f272d1ecf [AArch64] Move tryCombineToBSL. NFC
This is for #151855, to make the changes more obvious.
2025-08-07 16:45:58 +01:00