37822 Commits

Author SHA1 Message Date
Usha Gupta
d5a1f49827
[GISel] [NFC] Capitalize loop indices in GISelValueTracking.cpp for style consistency (#143113)
Following up on a comment on
https://github.com/llvm/llvm-project/pull/142355.
Updated other instances in the file as well.

@jayfoad
2025-06-06 23:14:50 +09:00
Florian Hahn
dde30a4731
[CGP] Bail out if (Base|Scaled)Reg does not dominate insert point. (#142949)
(Base|Scaled)Reg may not dominate the chosen insert point, if there are
multiple uses of the address. Bail out if that's the case, otherwise we
will generate invalid IR.

In some cases, we could probably adjust the insert point or hoist the
(Base|Scaled)Reg.

Fixes https://github.com/llvm/llvm-project/issues/142830.

PR: https://github.com/llvm/llvm-project/pull/142949
2025-06-06 12:38:30 +01:00
Benjamin Maxwell
c95bc41562
[AArch64][SDAG] Fix selection of extend of v1if16 SETCC (#140274)
There is a DAG combine, that folds:

```
t1: v1i1 = setcc x:v1f16, y:v1f16, setogt:ch
t2: v1i64 = zero_extend t1
```

->

```
t1: v1i16 = setcc x:v1f16, y:v1f16, setogt:ch
t2: v1i64 = any_extend t1
```

This creates an issue on AArch64 when attempting to widen the result to
`v4i16`. The operand types (`v1f16`) are set to be scalarized, so the
"by hand" widening with `DAG.WidenVector` is used for them, however,
this only widens to the next power-of-2, so returns `v2f16`, which does
not match the result VF. The fix is to manually construct the widened
inputs using `INSERT_SUBVECTOR`.

Fixes #136540
2025-06-06 11:20:52 +01:00
Jay Foad
5f33b9d286
[MIRParser] Report register class errors in a deterministic order (#142928)
2025-06-06 10:03:34 +01:00
Guy David
4d4b7cc69e
[AArch64] Skip storing of stack arguments when lowering tail calls (#126735)
This issue starts in the selection DAG and causes the backend to emit
the following for a trivial tail call:
```
ldr w8, [sp]
str w8, [sp]
b func
```

I'm not too sure that checking for immutability of a specific stack
object is a good enough guarantee, because as soon as a tail call is
done lowering, `setHasTailCall()` is called and in that case perhaps a
pass is allowed to change the value of the object in-memory?

This can be extended to the ARM backend as well.
Removed the `tailcall` keyword from a few other test assets, I'm
assuming their original intent was left intact.
2025-06-06 11:26:24 +03:00
Matt Arsenault
b2266d6d79
RuntimeLibcalls: Rename fminimum_num/fmaximum_num enums (#143078)
Add the underscore to match the libm spelling
2025-06-06 16:23:26 +09:00
Kazu Hirata
34c011d544
[llvm] Use *Map::try_emplace (NFC) (#143002)
- try_emplace(Key) is shorter than insert(std::make_pair(Key, 0)).
- try_emplace performs value initialization without value parameters.
- We overwrite values on successful insertion anyway.
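
A minimal sketch of the pattern described above, written against the
standard C++17 `std::map::try_emplace` on the assumption that the LLVM
`*Map` containers behave the same way for this purpose. `assignIds` is a
hypothetical helper, not code from the patch.

```cpp
#include <map>
#include <string>
#include <vector>

// Assign a dense ID to each distinct name, in order of first appearance.
// Hypothetical helper; uses std::map as a stand-in for an LLVM map type.
std::map<std::string, unsigned>
assignIds(const std::vector<std::string> &Names) {
  std::map<std::string, unsigned> Ids;
  for (const std::string &Name : Names) {
    // Before: Ids.insert(std::make_pair(Name, 0));
    // try_emplace(Key) value-initializes the mapped value (0 for unsigned).
    auto [It, Inserted] = Ids.try_emplace(Name);
    if (Inserted) // Overwrite the placeholder only on successful insertion.
      It->second = static_cast<unsigned>(Ids.size() - 1);
  }
  return Ids;
}
```

Because the placeholder is overwritten whenever the insertion succeeds,
passing an explicit `0` adds nothing.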
2025-06-05 16:14:31 -07:00
Stanley Gambarin
33974b41c7
[GlobalISel] support lowering of G_SHUFFLEVECTOR with pointer args (#141959)
2025-06-05 09:13:51 -07:00
Ryotaro Kasuga
ef60ee6005
[MachinePipeliner] Introduce a new class for loop-carried deps (#137663)
In MachinePipeliner, loop-carried memory dependencies are represented by
DAG, which makes things complicated and causes some necessary
dependencies to be missing. This patch introduces a new class to manage
loop-carried memory dependencies to simplify the logic. The ultimate
goal is to add currently missing dependencies, but this is a first step
of that, and this patch doesn't intend to change current behavior. This
patch also adds new tests that show the missed dependencies, which
should be fixed in the future.

Split off from #135148
2025-06-05 21:30:27 +09:00
Jeremy Morse
df4199c3a4
[DebugInfo] Use correct unit when creating variable across CU boundary (#133282)
When creating a static member DIE, we place it in a potentially
pre-existing context DIE, and that DIE might be located in a different
CU if we're in an LTO context. When we then add the source-file-ID to
the static member DIE, use the correct Unit to do so -- the one that
owns the context DIE. Otherwise we might assign a file-ID from one CU to
another, and there isn't a guarantee that they'll be the same file, or
even exist.

Fixes #109227

(I'd normally remove my home directory from these tests, but in this
circumstance the same-file-but-with-a-different-name nature of the
DIFile is part of the test).
2025-06-05 10:32:17 +01:00
Kazu Hirata
8b167db63a
[CodeGen] Fix a warning
This patch fixes:

  llvm/lib/CodeGen/MacroFusion.cpp:65:12: error: unused variable
  'FirstCluster' [-Werror,-Wunused-variable]

  llvm/lib/CodeGen/MacroFusion.cpp:66:12: error: unused variable
  'SecondCluster' [-Werror,-Wunused-variable]
2025-06-05 01:18:34 -07:00
Ruiling, Song
0487db1f13
MachineScheduler: Improve instruction clustering (#137784)
The existing way of managing clustered nodes was done through adding
weak edges between the neighbouring cluster nodes, which is a sort of
ordered queue. And this will be later recorded as `NextClusterPred` or
`NextClusterSucc` in `ScheduleDAGMI`.

But actually the instruction may be picked not in the exact order of the
queue. For example, we have a queue of cluster nodes A B C. But during
scheduling, node B might be picked first, then it will be very likely
that we only cluster B and C for Top-Down scheduling (leaving A alone).

Another issue is:
```
   if (!ReorderWhileClustering && SUa->NodeNum > SUb->NodeNum)
      std::swap(SUa, SUb);
   if (!DAG->addEdge(SUb, SDep(SUa, SDep::Cluster)))
```
may break the cluster queue.

For example, we want to cluster nodes (order as in `MemOpRecords`): 1 3
2. 1(SUa) will be pred of 3(SUb) normally. But when it comes to (3, 2),
As 3(SUa) > 2(SUb), we would reorder the two nodes, which makes 2 be
pred of 3. This makes both 1 and 2 become preds of 3, but there is no
edge between 1 and 2. Thus we get a broken cluster chain.
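
The broken chain can be reproduced with a toy model of the pairing
logic. This is only a sketch: nodes are plain integers, an edge is a
(pred, succ) pair, and `buildClusterEdges` is a hypothetical name rather
than anything in `ScheduleDAGMI`.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Toy model of the old pairing logic. Each consecutive pair in Order gets
// one cluster edge, stored as (pred, succ). Not the real ScheduleDAG API.
std::vector<std::pair<int, int>>
buildClusterEdges(const std::vector<int> &Order) {
  std::vector<std::pair<int, int>> Edges;
  for (std::size_t I = 0; I + 1 < Order.size(); ++I) {
    int SUa = Order[I], SUb = Order[I + 1];
    if (SUa > SUb) // Mirrors the !ReorderWhileClustering swap.
      std::swap(SUa, SUb);
    Edges.emplace_back(SUa, SUb); // SUa becomes a pred of SUb.
  }
  return Edges;
}
```

For the order 1 3 2 this yields the edges (1, 3) and (2, 3): node 3 has
two preds and nothing links 1 to 2.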

To fix both issues, we introduce an unordered set in the change. This
could help improve clustering in some hard cases.

One key reason the change causes so many test check changes is: As the
cluster candidates are not ordered now, the candidates might be picked
in different order from before.

The most affected targets are: AMDGPU, AArch64, RISCV.

For RISCV, it seems to me most changes are just minor instruction
reordering; I don't see an obvious regression.

For AArch64, some combining of ldr into ldp was affected, with two cases
regressed and two improved. The deeper reason is that the machine
scheduler cannot cluster them well either before or after the change,
and the later load combine algorithm is also not smart enough.

For AMDGPU, some cases have more v_dual instructions used while some are
regressed. It seems less critical. Seems like test `v_vselect_v32bf16`
gets more buffer_load being claused.
2025-06-05 15:28:04 +08:00
Acthink Yang
7263cd48e6
[LegalizeTypes][MSP430] Soften FAKE_USE operand (#142714)
Adds support for softening FAKE_USE operands.
Adds MSP430 tests that exercise the new softening code.

Fixes #137572
2025-06-05 10:53:57 +09:00
Nikita Popov
d74831efeb
Revert "[SDAG] Fix fmaximum legalization errors (#142170)"
This reverts commit 58cc1675ec7b4aa5bc2dab56180cb7af1b23ade5.

I also made the incorrect assumption that we know both values are
+/-0.0 here as well. Revert for now.
2025-06-04 14:35:30 +02:00
Nikita Popov
42605b8aa3
Revert "[SelectionDAG] Avoid one comparison when legalizing fmaximum (#142732)"
This reverts commit 54da543a14da6dd0e594875241494949cb659b08.

I made a logic error here with the assumption that both values
are known to be +/-0.0.
2025-06-04 14:22:19 +02:00
Usha Gupta
cf348e886d
[GlobalISel] Add G_CONCAT_VECTOR handling in computeNumSignBits (#142355)
Code ported from SelectionDAG::ComputeNumSignBits
2025-06-04 11:11:18 +01:00
Nikita Popov
54da543a14
[SelectionDAG] Avoid one comparison when legalizing fmaximum (#142732)
When ordering signed zero, only check the sign of one of the values. We
already know at this point that both values must be +/-0.0, so it is
sufficient to check one of them to correctly order them.

For example, for fmaximum, if we know LHS is `+0.0` then we can always
select LHS, value of RHS does not matter. If LHS is `-0.0` we can always
select RHS, value of RHS doesn't matter.
2025-06-04 10:41:30 +02:00
Nikita Popov
b3ce9883f3
[SelectionDAG] Use reportFatalUsageError() for invalid operand bundles (#142613)
Replace the asserts with reportFatalUsageError(), as these can be
reached with invalid user-provided IR.

Fixes https://github.com/llvm/llvm-project/issues/142531.
2025-06-04 09:33:05 +02:00
YunQiang Su
bd831372b2
expandFMINIMUMNUM_FMAXIMUMNUM: Quiet is not needed for NaN vs NaN (#139237)
New LangRef doesn't require quieting for NaN vs NaN, aka the result may
be sNaN for sNaN vs NaN.
See: https://github.com/llvm/llvm-project/pull/139228
2025-06-04 08:20:48 +08:00
Harrison Hao
0107c9333c
[DAG] canCreateUndefOrPoison – mark fneg/fadd/fsub/fmul/fdiv/frem as not poison generating (#142345)
After revisiting the LLVM Language Reference Manual, it is confirmed
that plain floating-point operations (`fneg`, `fadd`, `fsub`, `fmul`,
`fdiv`, and `frem`) propagate poison but do not inherently create new
poison values. Thus, `SelectionDAG::canCreateUndefOrPoison` should
return `false` for these operations by default.

Poison generation in FP instructions occurs only when specific fast-math
flags (`nnan`, `ninf`, or the collective `fast`) are present, as these
flags explicitly convert NaN or Inf results into poison.

References:

- [`fneg` instruction
documentation](https://llvm.org/docs/LangRef.html#fneg-instruction)
- [`fadd` instruction
documentation](https://llvm.org/docs/LangRef.html#fadd-instruction)
- [`fsub` instruction
documentation](https://llvm.org/docs/LangRef.html#fsub-instruction)
- [`fmul` instruction
documentation](https://llvm.org/docs/LangRef.html#fmul-instruction)
- [`fdiv` instruction
documentation](https://llvm.org/docs/LangRef.html#fdiv-instruction)
- [`frem` instruction
documentation](https://llvm.org/docs/LangRef.html#frem-instruction)
- [Fast-Math Flags
documentation](https://llvm.org/docs/LangRef.html#fast-math-flags)
2025-06-03 19:21:40 +08:00
Luke Lau
9a2d4d176a
[SelectionDAG][AArch64] Legalize power of 2 vector.[de]interleaveN (#141513)
After https://github.com/llvm/llvm-project/pull/139893, we now have
[de]interleave intrinsics for factors 2-8 inclusive, with the plan to
eventually get the loop vectorizer to emit a single intrinsic for these
factors instead of recursively deinterleaving (to support scalable
non-power-of-2 factors and to remove the complexity in the interleaved
access pass).

AArch64 currently supports scalable interleaved groups of factors 2 and
4 from the loop vectorizer. For factor 4 this is currently emitted as a
series of recursive [de]interleaves, and normally converted to a target
intrinsic in the interleaved access pass.

However if for some reason the interleaved access pass doesn't catch it,
the [de]interleave4 intrinsic will need to be lowered by the backend.

This patch legalizes the node and any other power-of-2 factor to smaller
factors, so if a target can lower [de]interleave2 it should be able to
handle this without crashing.

Factor 3 will probably be more complicated to lower so I've left it out
for now. We can disable it in the AArch64 cost model when implementing
the loop vectorizer changes.
2025-06-03 12:05:44 +01:00
Matt Arsenault
742e84dc5d
SelectionDAG: Use unique_ptr for SwiftErrorValueTracking (#142532)
2025-06-03 19:15:03 +09:00
Simon Tatham
56acb06bc6
[ARM,AArch64] Don't put BTI at asm goto branch targets (#141562)
In 'asm goto' statements ('callbr' in LLVM IR), you can specify one or
more labels / basic blocks in the containing function which the assembly
code might jump to. If you're also compiling with branch target
enforcement via BTI, then previously listing a basic block as a possible
jump destination of an asm goto would cause a BTI instruction to be
placed at the start of the block, in case the assembly code used an
_indirect_ branch instruction (i.e. to a destination address read from a
register) to jump to that location. Now it doesn't do that any more:
branches to destination labels from the assembly code are assumed to be
direct branches (to a relative offset encoded in the instruction), which
don't require a BTI at their destination.

This change was proposed in https://discourse.llvm.org/t/85845 and there
seemed to be no disagreement. The rationale is:

1. it brings clang's handling of asm goto in Arm and AArch64 in line
with gcc's, which didn't generate BTIs at the target labels in the first
place.

2. it improves performance in the Linux kernel, which uses a lot of 'asm
goto' in which the assembly language just contains a NOP, and the
label's address is saved elsewhere to let the kernel self-modify at run
time to swap between the original NOP and a direct branch to the label.
This allows hot code paths to be instrumented for debugging, at only the
cost of a NOP when the instrumentation is turned off, instead of the
larger cost of an indirect branch. In this situation a BTI is
unnecessary (if the branch happens it's direct), and since the code
paths are hot, also a noticeable performance hit.

Implementation:

`SelectionDAGBuilder::visitCallBr` is the place where 'asm goto' target
labels are handled. It calls `setIsInlineAsmBrIndirectTarget()` on each
target `MachineBasicBlock`. Previously it also called
`setMachineBlockAddressTaken()`, which made `hasAddressTaken()` return
true, which caused a BTI to be added in the Arm backends.

Now `visitCallBr` doesn't call `setMachineBlockAddressTaken()` any more
on asm goto targets, but `hasAddressTaken()` also checks the flag set by
`setIsInlineAsmBrIndirectTarget()`. So call sites that were using
`hasAddressTaken()` don't need to be modified. But the Arm backends
don't call `hasAddressTaken()` any more: instead they test two more
specific query functions that cover all the reasons `hasAddressTaken()`
might have returned true _except_ being an asm goto target.
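
The split between the general query and the more specific ones can be
pictured with a toy flag struct. This is a sketch only: `ToyBlock`, its
field names and `addressTakenExceptAsmGoto()` are hypothetical stand-ins,
not the real `MachineBasicBlock` interface.

```cpp
// Toy stand-in for the per-block flags discussed above; not the real class.
struct ToyBlock {
  bool MachineAddressTaken = false; // Address taken in machine code.
  bool IRAddressTaken = false;      // Address taken through the IR block.
  bool AsmGotoTarget = false;       // Listed as an asm goto destination.

  // Still true for asm goto targets, so existing callers are unaffected.
  bool hasAddressTaken() const {
    return MachineAddressTaken || IRAddressTaken || AsmGotoTarget;
  }

  // The narrower question the Arm backends now ask: every reason above
  // except being an asm goto target.
  bool addressTakenExceptAsmGoto() const {
    return MachineAddressTaken || IRAddressTaken;
  }
};
```

A block that is only an asm goto target answers true to the first query
and false to the second.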

Testing:

The new test `AArch64/callbr-asm-label-bti.ll` is testing the actual
change, where it expects not to see a `bti` instruction after
`[[LABEL]]`. The rest of the test changes are all churn, due to the
flags on basic blocks changing. Actual output code hasn't changed in any
of the existing tests, only comments and diagnostics.

Further work:

`RISCVIndirectBranchTracking.cpp` and `X86IndirectBranchTracking.cpp`
also call `hasAddressTaken()` in a way that might benefit from using the
same more specific check I've put in `ARMBranchTargets.cpp` and
`AArch64BranchTargets.cpp`. But I'm not sure of that, so in this commit
I've only changed the Arm backends, and left those alone.
2025-06-03 08:44:13 +01:00
mikael-nilsson-arm
09967917e7
[CodeGenPrepare] Fix signed overflow (#141487)
The signed addition could overflow which is undefined behavior, now the
code checks for it.
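
One common way to check before adding is sketched below. It is not
necessarily the check this patch uses; `checkedAdd` is a hypothetical
helper shown only to illustrate the idea.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Returns A + B, or std::nullopt if the sum does not fit in int64_t.
// The range check runs before the addition, so the addition itself can
// never overflow.
std::optional<int64_t> checkedAdd(int64_t A, int64_t B) {
  constexpr int64_t Max = std::numeric_limits<int64_t>::max();
  constexpr int64_t Min = std::numeric_limits<int64_t>::min();
  if (B > 0 && A > Max - B) // Sum would exceed Max.
    return std::nullopt;
  if (B < 0 && A < Min - B) // Sum would fall below Min.
    return std::nullopt;
  return A + B;
}
```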
2025-06-03 09:27:25 +02:00
Pengcheng Wang
f393986b53
[MISched] Add templates for creating custom schedulers (#141935)
We rename `createGenericSchedLive` and `createGenericSchedPostRA`
to `createSchedLive` and `createSchedPostRA`, and add a template
parameter `Strategy` which is the generic implementation by default.

This can simplify some code for targets that have custom scheduler
strategy.
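
The shape of such a factory can be sketched as follows. All types here
are hypothetical stand-ins and the real functions take scheduler context
arguments that are omitted; only the defaulted `Strategy` template
parameter mirrors the description above.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-ins for the scheduling strategy hierarchy.
struct ToyStrategy {
  virtual ~ToyStrategy() = default;
  virtual std::string name() const = 0;
};
struct ToyGenericStrategy : ToyStrategy {
  std::string name() const override { return "generic"; }
};
struct ToyTargetStrategy : ToyStrategy {
  std::string name() const override { return "target"; }
};

// One factory for every strategy; the generic one is the default.
template <typename Strategy = ToyGenericStrategy>
std::unique_ptr<ToyStrategy> createToySched() {
  return std::make_unique<Strategy>();
}
```

A target with its own strategy would call
`createToySched<ToyTargetStrategy>()` instead of writing a separate
factory function.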
2025-06-03 11:37:40 +08:00
Kazu Hirata
54d836a080
[llvm] Use *Set::insert_range (NFC) (#138237)
2025-06-02 19:48:13 -07:00
Philip Reames
e723e15db1
[MCP] Handle iterative simplification during forward copy prop (#140267)
This is the follow up I mentioned doing in the review of 52b345d. That
change introduced an API for performing instruction simplifications
following copy propagation (e.g. things like recognizing ORI a0, a1,
zero is just a move). As noted in that review, we should be able to
perform iterative simplification as we move forward through the block,
but weren't because of the code structure.

The majority of this code is just deleting the special casing for
constant source and destination tracking, and merging the copy handling
with the main path. By assumption, the properties of copies (in terms of
register reads and writes), must be a subset of general instructions.

Once we do that, the iterative bit basically falls out from having the
tracking performed for copies which are recognized *after* we forward
prior uses.
2025-06-02 11:21:41 -07:00
Nikita Popov
58cc1675ec
[SDAG] Fix fmaximum legalization errors (#142170)
FMAXIMUM is currently legalized via IS_FPCLASS for the signed zero
handling. This is problematic, because it assumes the equivalent integer
type is legal. Many targets have legal fp128, but illegal i128, so this
results in legalization failures.

Fix this by replacing IS_FPCLASS with checking the bitcast to integer
instead. In that case it is sufficient to use any legal integer type, as
we're just interested in the sign bit. This can be obtained via a stack
temporary cast. There is existing FloatSignAsInt functionality used for
legalization of FABS and similar we can use for this purpose.
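
As a scalar analogy for reading the sign through an integer view, here
is a sketch that assumes IEEE-754 binary64 `double`. The function names
are hypothetical and this is not the SelectionDAG legalization code.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(uint64_t), "assumes 64-bit double");

// Read the sign bit by reinterpreting the bits as an integer rather than
// classifying the floating-point value.
bool signBitViaInt(double X) {
  uint64_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));
  return (Bits >> 63) != 0;
}

// fmaximum semantics: NaN propagates, and +0.0 orders above -0.0.
double fmaximumSketch(double L, double R) {
  if (std::isnan(L))
    return L;
  if (std::isnan(R))
    return R;
  if (L == R) // Equal values, which includes +0.0 compared with -0.0.
    return signBitViaInt(L) ? R : L;
  return L > R ? L : R;
}
```

Only the top bit is inspected, which is why any integer type wide enough
to expose the sign bit is sufficient.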

Fixes https://github.com/llvm/llvm-project/issues/139380.
Fixes https://github.com/llvm/llvm-project/issues/139381.
Fixes https://github.com/llvm/llvm-project/issues/140445.
2025-06-02 10:14:33 +02:00
Jon Roelofs
798058fca5
[Remarks] Remove an upcast footgun. NFC (#142191)
CodeRegion's were previously passed as Value*, but then immediately
upcast to BasicBlock. Let's keep the type information around until the
use cases for non-BasicBlock code regions actually materialize.
2025-05-31 11:07:54 -07:00
Craig Topper
b4b3be7faa
[DAGCombiner] Teach SearchForAndLoads to handle an AND with 2 constant operands. (#142062)
If opaque constants are involved we can have an AND with 2 constant
operands that hasn't been simplified. If this is the case, we need
to modify at least one of the constants if it is out of range.
    
Fixes #142004
2025-05-30 16:00:43 -07:00
Craig Topper
c5a17e6bea
[DAGCombiner] Use APInt::isSubsetOf. NFC (#142029)
2025-05-30 09:01:36 -07:00
Aaron Puchert
73d6a48029
[WinEH] Track changes in WinEHPrepare pass (#134121)
Before this change, the pass would always claim to have changed IR if
there is a scope-based personality function. We add some plumbing to
track if there was an actual change.

This should be NFC, except that we might now preserve more analysis
passes.
2025-05-30 15:29:32 +02:00
Usha Gupta
7c996012ce
[GlobalISel] Add G_CONCAT_VECTOR computeKnownBits (#141933)
Code ported from SelectionDAG::computeKnownBits.
2025-05-30 10:44:59 +01:00
Nikita Popov
ea096c98ae
[SDAG] Remove noundef workaround for range metadata/attributes (#141745)
In https://reviews.llvm.org/D157685 I changed SDAG to only transfer
range metadata to SDAG if it also has !noundef. At the time, this was
necessary because SDAG incorrectly propagated poison when folding
logical and/or to bitwise and/or.

The root cause of that issue has since been addressed by
https://github.com/llvm/llvm-project/pull/84924, so drop the workaround
now.
2025-05-30 10:56:49 +02:00
Matt Arsenault
36b710a7e5
CodeGen: Convert some assorted errors to use reportFatalUsageError (#142031)
The test coverage is lacking for many of these errors.
2025-05-30 08:06:53 +02:00
Sebastian Kreutzer
6cb087a725
[XRay] Fix tail call sleds for AArch64 (#141403)
This addresses issue #141051.
XRay uses a special event kind for tail calls on some architectures.
This feature is implemented on AArch64, but wasn't fully activated.
Tests in `llvm/test/CodeGen/AArch64/xray-tail-call-sled.ll` were
incomplete and did not check for the emitted sled type.
This patch correctly enables emission of tail call sleds on AArch64 and
fixes the tests to check the sled kind.
2025-05-29 21:54:15 -07:00
Philip Reames
1651aa2943
[SDAG] Split the partial reduce legalize table by opcode [nfc] (#141970)
On its own, this change should be non-functional. This is a preparatory
change for https://github.com/llvm/llvm-project/pull/141267 which adds a
new form of PARTIAL_REDUCE_*MLA. As noted in the discussion on that
review, AArch64 needs a different set of legal and custom types for the
PARTIAL_REDUCE_SUMLA variant than the currently existing
PARTIAL_REDUCE_UMLA/SMLA.
2025-05-29 14:05:31 -07:00
Nicholas Guy
a5d97ebe8b
[AArch64][SelectionDAG] Add type legalization for partial reduce wide adds (#141075)
Based on work initially done by @JamesChesterman.
2025-05-29 14:42:23 +01:00
Marius Kamp
10647685ca
[SDAG] Make Select-with-Identity-Fold More Flexible; NFC (#136554)
This change adds new parameters to the method
`shouldFoldSelectWithIdentityConstant()`. The method now takes the
opcode of the select node and the non-identity operand of the select
node. To gain access to the appropriate arguments, the call of
`shouldFoldSelectWithIdentityConstant()` is moved after all other checks
have been performed. Moreover, this change adjusts the precondition of
the fold so that it would work for `SELECT` nodes in addition to
`VSELECT` nodes.
    
No functional change is intended because all implementations of
`shouldFoldSelectWithIdentityConstant()` are adjusted such that they
restrict the fold to a `VSELECT` node; the same restriction as before.
    
The rationale of this change is to make more fine grained decisions
possible when to revert the InstCombine canonicalization of
`(select c (binop x y) y)` to `(binop (select c x idc) y)` in the
backends.
2025-05-29 09:46:39 +02:00
Justin Bogner
b7bb256703
Warn on misuse of DiagnosticInfo classes that hold Twines (#137397)
This annotates the `Twine` passed to the constructors of the various
DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes
us to warn when we would try to print the twine after it had already
been destructed.

We also update `DiagnosticInfoUnsupported` to hold a `const Twine &`
like all of the other DiagnosticInfo classes, since this warning allows
us to clean up all of the places where it was being used incorrectly.
2025-05-28 12:26:39 -07:00
Luke Lau
6d88343662
[IA] Add support for [de]interleave{4,6,8} (#141512)
This teaches the interleaved access pass to lower the intrinsics for
factors 4,6 and 8 added in #139893 to target intrinsics.

Because factors 4 and 8 could either have been recursively
[de]interleaved or have just been a single intrinsic, we need to check
that it's the former before reshuffling around the values via
interleaveLeafValues.
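
Why the recursive form needs reshuffling can be seen on plain vectors.
The sketch below uses `std::vector<int>` and hypothetical helper names;
it models only the element ordering, not the pass itself.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<int>;

// Split V into its even-indexed and odd-indexed elements (factor 2).
std::pair<Vec, Vec> deinterleave2(const Vec &V) {
  Vec Even, Odd;
  for (std::size_t I = 0; I < V.size(); ++I)
    (I % 2 == 0 ? Even : Odd).push_back(V[I]);
  return {Even, Odd};
}

// Factor-4 deinterleave built from three factor-2 steps. The leaves of
// the tree come out in field order 0, 2, 1, 3, so they are put back into
// 0, 1, 2, 3 order when returned.
std::vector<Vec> deinterleave4(const Vec &V) {
  auto [Even, Odd] = deinterleave2(V);
  auto [F0, F2] = deinterleave2(Even);
  auto [F1, F3] = deinterleave2(Odd);
  return {F0, F1, F2, F3};
}
```

A single factor-4 intrinsic already returns its results in field order,
so no such reordering applies to it.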

After this patch, we can teach the loop vectorizer to emit a single
interleave intrinsic for factors 2 through to 8, and then we can remove
the recursive interleaving matching in interleaved access pass.
2025-05-28 11:44:41 +01:00
Fabian Ritter
8adcc8a669
[SelectionDAG] Introduce ISD::PTRADD (#140017)
This opcode represents the addition of a pointer value (first operand)
and an integer offset (second operand). PTRADD nodes are only generated
if the TargetMachine opts in by overriding
TargetMachine::shouldPreservePtrArith().

The PTRADD node and respective visitPTRADD() function were adapted by
@rgwott from the CHERI/Morello LLVM tree.
Original authors: @davidchisnall, @jrtc27, @arichardson.

The changes in this PR were extracted from PR #105669.

---------

Co-authored-by: David Chisnall <github@theravensnest.org>
Co-authored-by: Jessica Clarke <jrtc27@jrtc27.com>
Co-authored-by: Alexander Richardson <alexrichardson@google.com>
Co-authored-by: Rodolfo Wottrich <rodolfo.wottrich@arm.com>
2025-05-28 09:09:17 +02:00
Ruiling, Song
3e47d8deba
MachineScheduler: Reset next cluster candidate for each node (#139513)
When a node is picked, we should reset its next cluster candidate to
null before releasing its successors/predecessors.
2025-05-28 14:53:46 +08:00
Peter Collingbourne
645f0e6723
IR: Make Module::getOrInsertGlobal() return a GlobalVariable.
After pointer element types were removed this function can only return
a GlobalVariable, so reflect that in the type and comments and clean
up callers.

Reviewers: nikic

Reviewed By: nikic

Pull Request: https://github.com/llvm/llvm-project/pull/141323
2025-05-27 12:23:12 -07:00
Kerry McLaughlin
b61144bf77
[AArch64] Allow lowering of more types to GET_ACTIVE_LANE_MASK (#140062)
Adds support for operand promotion and splitting/widening the result
of the ISD::GET_ACTIVE_LANE_MASK node.
For AArch64, shouldExpandGetActiveLaneMask now returns false for more
types which we know can be legalised.
2025-05-27 11:21:57 +01:00
Jon Roelofs
714096c132
[LLVM] Skip dumping inline SDag children (#141359)
If they're simple enough to render inline, we don't need to dump them
again in the recursive walk.
2025-05-26 19:40:01 -07:00
Kazu Hirata
89308de4b0
[llvm] Value-initialize values with *Map::try_emplace (NFC) (#141522)
try_emplace value-initializes values, so we do not need to pass
nullptr to try_emplace when the value types are raw pointers or
std::unique_ptr<T>.
2025-05-26 15:13:02 -07:00
Luke Lau
3033f202f6
[IR] Add llvm.vector.[de]interleave{4,6,8} (#139893)
This adds [de]interleave intrinsics for factors of 4,6,8, so that every
interleaved memory operation supported by the in-tree targets can be
represented by a single intrinsic.

For context, [de]interleaves of fixed-length vectors are represented by
a series of shufflevectors. The intrinsics are needed for scalable
vectors, and we don't currently scalably vectorize all possible factors
of interleave groups supported by RISC-V/AArch64.

The underlying reason for this is that higher factors are currently
represented by interleaving multiple interleaves themselves, which made
sense at the time in the discussion in
https://github.com/llvm/llvm-project/pull/89018.

But after trying to integrate these for higher factors on RISC-V I think
we should revisit this design choice:

- Matching these in InterleavedAccessPass is non-trivial: We currently
only support factors that are a power of 2, and detecting this requires
a good chunk of code
- The shufflevector masks used for [de]interleaves of fixed-length
vectors are much easier to pattern match as they are strided patterns,
but for the intrinsics it's much more complicated to match as the
structure is a tree.
- Unlike shufflevectors, there's no optimisation that happens on
[de]interleave2 intrinsics
- For non-power-of-2 factors e.g. 6, there are multiple possible ways a
[de]interleave could be represented, see the discussion in #139373
- We already have intrinsics for 2,3,5 and 7, so by avoiding 4,6 and 8
we're not really saving much

By representing these higher factors as interleaved-interleaves, we can
in theory support arbitrarily high interleave factors. However I'm not
sure this is actually needed in practice: SVE only has instructions
for factors 2,3,4, whilst RVV only supports up to factor 8.

This patch would make it much easier to support scalable interleaved
accesses in the loop vectorizer for RISC-V for factors 3,5,6 and 7, as
the loop vectorizer and InterleavedAccessPass wouldn't need to
construct and match trees of interleaves.

For interleave factors above 8, for which there are no hardware memory
operations to match in the InterleavedAccessPass, we can still keep the
wide load + recursive interleaving in the loop vectorizer.
2025-05-26 18:45:12 +01:00
Fangrui Song
a0901a2f87
Replace #include MCAsmLexer.h with AsmLexer.h
MCAsmLexer.h has been made a forwarder header since #134207
2025-05-25 11:57:29 -07:00
Jon Roelofs
346a72f2ca
[LLVM] Add color to SDNode ID's when dumping (#141295)
This is especially helpful for the recursive 'Cannot select:' dumps,
where colors help distinguish nodes at a quick glance.
2025-05-24 09:40:29 -07:00