llvm-project

Author	SHA1	Message	Date
Usha Gupta	d5a1f49827	[GISel] [NFC] Capitalize loop indices in GISelValueTracking.cpp for style consistency (#143113 ) Following up on a comment on https://github.com/llvm/llvm-project/pull/142355. Updated other instances in the file as well. @jayfoad	2025-06-06 23:14:50 +09:00
Florian Hahn	dde30a4731	[CGP] Bail out if (Base|Scaled)Reg does not dominate insert point. (#142949 ) (Base|Scaled)Reg may not dominate the chosen insert point, if there are multiple uses of the address. Bail out if that's the case, otherwise we will generate invalid IR. In some cases, we could probably adjust the insert point or hoist the (Base|Scaled)Reg. Fixes https://github.com/llvm/llvm-project/issues/142830. PR: https://github.com/llvm/llvm-project/pull/142949	2025-06-06 12:38:30 +01:00
Benjamin Maxwell	c95bc41562	[AArch64][SDAG] Fix selection of extend of v1if16 SETCC (#140274 ) There is a DAG combine, that folds: ``` t1: v1i1 = setcc x:v1f16, y:v1f16, setogt:ch t2: v1i64 = zero_extend t1 ``` -> ``` t1: v1i16 = setcc x:v1f16, y:v1f16, setogt:ch t2: v1i64 = any_extend t1 ``` This creates an issue on AArch64 when attempting to widen the result to `v4i16`. The operand types (`v1f16`) are set to be scalarized, so the "by hand" widening with `DAG.WidenVector` is used for them, however, this only widens to the next power-of-2, so returns `v2f16`, which does not match the result VF. The fix is to manually construct the widened inputs using `INSERT_SUBVECTOR`. Fixes #136540	2025-06-06 11:20:52 +01:00
Jay Foad	5f33b9d286	[MIRParser] Report register class errors in a deterministic order (#142928 )	2025-06-06 10:03:34 +01:00
Guy David	4d4b7cc69e	[AArch64] Skip storing of stack arguments when lowering tail calls (#126735 ) This issue starts in the selection DAG and causes the backend to emit the following for a trivial tail call: ``` ldr w8, [sp] str w8, [sp] b func ``` I'm not too sure that checking for immutability of a specific stack object is a good enough guarantee, because as soon as a tail call is done lowering, `setHasTailCall()` is called and in that case perhaps a pass is allowed to change the value of the object in-memory? This can be extended to the ARM backend as well. Removed the `tailcall` keyword from a few other test assets, I'm assuming their original intent was left intact.	2025-06-06 11:26:24 +03:00
Matt Arsenault	b2266d6d79	RuntimeLibcalls: Rename fminimum_num/fmaximum_num enums (#143078 ) Add the underscore to match the libm spelling	2025-06-06 16:23:26 +09:00
Kazu Hirata	34c011d544	[llvm] Use *Map::try_emplace (NFC) (#143002 ) - try_emplace(Key) is shorter than insert(std::make_pair(Key, 0)). - try_emplace performs value initialization without value parameters. - We overwrite values on successful insertion anyway.	2025-06-05 16:14:31 -07:00
Stanley Gambarin	33974b41c7	[GlobalISel] support lowering of G_SHUFFLEVECTOR with pointer args (#141959 )	2025-06-05 09:13:51 -07:00
Ryotaro Kasuga	ef60ee6005	[MachinePipeliner] Introduce a new class for loop-carried deps (#137663 ) In MachinePipeliner, loop-carried memory dependencies are represented by DAG, which makes things complicated and causes some necessary dependencies to be missing. This patch introduces a new class to manage loop-carried memory dependencies to simplify the logic. The ultimate goal is to add currently missing dependencies, but this is a first step of that, and this patch doesn't intend to change current behavior. This patch also adds new tests that show the missed dependencies, which should be fixed in the future. Split off from #135148	2025-06-05 21:30:27 +09:00
Jeremy Morse	df4199c3a4	[DebugInfo] Use correct unit when creating variable across CU boundary (#133282 ) When creating a static member DIE, we place it in a potentially pre-existing context DIE, and that DIE might be located in a different CU if we're in an LTO context. When we then add the source-file-ID to the static member DIE, use the correct Unit to do so -- the one that owns the context DIE. Otherwise we might assign a file-ID from one CU to another, and there isn't a guarantee that they'll be the same file, or even exist. Fixes #109227 (I'd normally remove my home directory from these tests, but in this circumstance the same-file-but-with-a-different-name nature of the DIFile is part of the test).	2025-06-05 10:32:17 +01:00
Kazu Hirata	8b167db63a	[CodeGen] Fix a warning This patch fixes: llvm/lib/CodeGen/MacroFusion.cpp:65:12: error: unused variable 'FirstCluster' [-Werror,-Wunused-variable] llvm/lib/CodeGen/MacroFusion.cpp:66:12: error: unused variable 'SecondCluster' [-Werror,-Wunused-variable]	2025-06-05 01:18:34 -07:00
Ruiling, Song	0487db1f13	MachineScheduler: Improve instruction clustering (#137784 ) The existing way of managing clustered nodes was done through adding weak edges between the neighbouring cluster nodes, which is a sort of ordered queue. And this will be later recorded as `NextClusterPred` or `NextClusterSucc` in `ScheduleDAGMI`. But actually the instruction may be picked not in the exact order of the queue. For example, we have a queue of cluster nodes A B C. But during scheduling, node B might be picked first, then it will be very likely that we only cluster B and C for Top-Down scheduling (leaving A alone). Another issue is: ``` if (!ReorderWhileClustering && SUa->NodeNum > SUb->NodeNum) std::swap(SUa, SUb); if (!DAG->addEdge(SUb, SDep(SUa, SDep::Cluster))) ``` may break the cluster queue. For example, we want to cluster nodes (order as in `MemOpRecords`): 1 3 2. 1(SUa) will be pred of 3(SUb) normally. But when it comes to (3, 2), as 3(SUa) > 2(SUb), we would reorder the two nodes, which makes 2 be pred of 3. This makes both 1 and 2 become preds of 3, but there is no edge between 1 and 2. Thus we get a broken cluster chain. To fix both issues, we introduce an unordered set in the change. This could help improve clustering in some hard cases. One key reason the change causes so many test check changes is: As the cluster candidates are not ordered now, the candidates might be picked in different order from before. The most affected targets are: AMDGPU, AArch64, RISCV. For RISCV, it seems to me most are just minor instruction reorder, don't see obvious regression. For AArch64, there were some combining of ldr into ldp being affected, with two cases being regressed and two being improved. The deeper reason is that the machine scheduler cannot cluster them well both before and after the change, and the load combine algorithm later is also not smart enough. For AMDGPU, some cases have more v_dual instructions used while some are regressed. It seems less critical. Seems like test `v_vselect_v32bf16` gets more buffer_load being claused.	2025-06-05 15:28:04 +08:00
Acthink Yang	7263cd48e6	[LegalizeTypes][MSP430] Soften FAKE_USE operand (#142714 ) Adds support for softening FAKE_USE operands. Adds MSP430 tests that exercise the new softening code. Fixes #137572	2025-06-05 10:53:57 +09:00
Nikita Popov	d74831efeb	Revert "[SDAG] Fix fmaximum legalization errors (#142170 )" This reverts commit 58cc1675ec7b4aa5bc2dab56180cb7af1b23ade5. I also made the incorrect assumption that we know both values are +/-0.0 here as well. Revert for now.	2025-06-04 14:35:30 +02:00
Nikita Popov	42605b8aa3	Revert "[SelectionDAG] Avoid one comparison when legalizing fmaximum (#142732 )" This reverts commit 54da543a14da6dd0e594875241494949cb659b08. I made a logic error here with the assumption that both values are known to be +/-0.0.	2025-06-04 14:22:19 +02:00
Usha Gupta	cf348e886d	[GlobalISel] Add G_CONCAT_VECTOR handling in computeNumSignBits (#142355 ) Code ported from SelectionDAG::ComputeNumSignBits	2025-06-04 11:11:18 +01:00
Nikita Popov	54da543a14	[SelectionDAG] Avoid one comparison when legalizing fmaximum (#142732 ) When ordering signed zero, only check the sign of one of the values. We already know at this point that both values must be +/-0.0, so it is sufficient to check one of them to correctly order them. For example, for fmaximum, if we know LHS is `+0.0` then we can always select LHS, value of RHS does not matter. If LHS is `-0.0` we can always select RHS, value of RHS doesn't matter.	2025-06-04 10:41:30 +02:00
Nikita Popov	b3ce9883f3	[SelectionDAG] Use reportFatalUsageError() for invalid operand bundles (#142613 ) Replace the asserts with reportFatalUsageError(), as these can be reached with invalid user-provided IR. Fixes https://github.com/llvm/llvm-project/issues/142531.	2025-06-04 09:33:05 +02:00
YunQiang Su	bd831372b2	expandFMINIMUMNUM_FMAXIMUMNUM: Quiet is not needed for NaN vs NaN (#139237 ) New LangRef doesn't require quieting for NaN vs NaN, aka the result may be sNaN for sNaN vs NaN. See: https://github.com/llvm/llvm-project/pull/139228	2025-06-04 08:20:48 +08:00
Harrison Hao	0107c9333c	[DAG] canCreateUndefOrPoison – mark fneg/fadd/fsub/fmul/fdiv/frem as not poison generating (#142345 ) After revisiting the LLVM Language Reference Manual, it is confirmed that plain floating-point operations (`fneg`, `fadd`, `fsub`, `fmul`, `fdiv`, and `frem`) propagate poison but do not inherently create new poison values. Thus, `SelectionDAG::canCreateUndefOrPoison` should return `false` for these operations by default. Poison generation in FP instructions occurs only when specific fast-math flags (`nnan`, `ninf`, or the collective fast) are present, as these flags explicitly convert NaN or Inf results into poison. References: - [`fneg` instruction documentation](https://llvm.org/docs/LangRef.html#fneg-instruction) - [`fadd` instruction documentation](https://llvm.org/docs/LangRef.html#fadd-instruction) - [`fsub` instruction documentation](https://llvm.org/docs/LangRef.html#fsub-instruction) - [`fmul` instruction documentation](https://llvm.org/docs/LangRef.html#fmul-instruction) - [`fdiv` instruction documentation](https://llvm.org/docs/LangRef.html#fdiv-instruction) - [`frem` instruction documentation](https://llvm.org/docs/LangRef.html#frem-instruction) - [Fast-Math Flags documentation](https://llvm.org/docs/LangRef.html#fast-math-flags)	2025-06-03 19:21:40 +08:00
Luke Lau	9a2d4d176a	[SelectionDAG][AArch64] Legalize power of 2 vector.[de]interleaveN (#141513 ) After https://github.com/llvm/llvm-project/pull/139893, we now have [de]interleave intrinsics for factors 2-8 inclusive, with the plan to eventually get the loop vectorizer to emit a single intrinsic for these factors instead of recursively deinterleaving (to support scalable non-power-of-2 factors and to remove the complexity in the interleaved access pass). AArch64 currently supports scalable interleaved groups of factors 2 and 4 from the loop vectorizer. For factor 4 this is currently emitted as a series of recursive [de]interleaves, and normally converted to a target intrinsic in the interleaved access pass. However if for some reason the interleaved access pass doesn't catch it, the [de]interleave4 intrinsic will need to be lowered by the backend. This patch legalizes the node and any other power-of-2 factor to smaller factors, so if a target can lower [de]interleave2 it should be able to handle this without crashing. Factor 3 will probably be more complicated to lower so I've left it out for now. We can disable it in the AArch64 cost model when implementing the loop vectorizer changes.	2025-06-03 12:05:44 +01:00
Matt Arsenault	742e84dc5d	SelectionDAG: Use unique_ptr for SwiftErrorValueTracking (#142532 )	2025-06-03 19:15:03 +09:00
Simon Tatham	56acb06bc6	[ARM,AArch64] Don't put BTI at asm goto branch targets (#141562 ) In 'asm goto' statements ('callbr' in LLVM IR), you can specify one or more labels / basic blocks in the containing function which the assembly code might jump to. If you're also compiling with branch target enforcement via BTI, then previously listing a basic block as a possible jump destination of an asm goto would cause a BTI instruction to be placed at the start of the block, in case the assembly code used an _indirect_ branch instruction (i.e. to a destination address read from a register) to jump to that location. Now it doesn't do that any more: branches to destination labels from the assembly code are assumed to be direct branches (to a relative offset encoded in the instruction), which don't require a BTI at their destination. This change was proposed in https://discourse.llvm.org/t/85845 and there seemed to be no disagreement. The rationale is: 1. it brings clang's handling of asm goto in Arm and AArch64 in line with gcc's, which didn't generate BTIs at the target labels in the first place. 2. it improves performance in the Linux kernel, which uses a lot of 'asm goto' in which the assembly language just contains a NOP, and the label's address is saved elsewhere to let the kernel self-modify at run time to swap between the original NOP and a direct branch to the label. This allows hot code paths to be instrumented for debugging, at only the cost of a NOP when the instrumentation is turned off, instead of the larger cost of an indirect branch. In this situation a BTI is unnecessary (if the branch happens it's direct), and since the code paths are hot, also a noticeable performance hit. Implementation: `SelectionDAGBuilder::visitCallBr` is the place where 'asm goto' target labels are handled. It calls `setIsInlineAsmBrIndirectTarget()` on each target `MachineBasicBlock`. Previously it also called `setMachineBlockAddressTaken()`, which made `hasAddressTaken()` return true, which caused a BTI to be added in the Arm backends. Now `visitCallBr` doesn't call `setMachineBlockAddressTaken()` any more on asm goto targets, but `hasAddressTaken()` also checks the flag set by `setIsInlineAsmBrIndirectTarget()`. So call sites that were using `hasAddressTaken()` don't need to be modified. But the Arm backends don't call `hasAddressTaken()` any more: instead they test two more specific query functions that cover all the reasons `hasAddressTaken()` might have returned true _except_ being an asm goto target. Testing: The new test `AArch64/callbr-asm-label-bti.ll` is testing the actual change, where it expects not to see a `bti` instruction after `[[LABEL]]`. The rest of the test changes are all churn, due to the flags on basic blocks changing. Actual output code hasn't changed in any of the existing tests, only comments and diagnostics. Further work: `RISCVIndirectBranchTracking.cpp` and `X86IndirectBranchTracking.cpp` also call `hasAddressTaken()` in a way that might benefit from using the same more specific check I've put in `ARMBranchTargets.cpp` and `AArch64BranchTargets.cpp`. But I'm not sure of that, so in this commit I've only changed the Arm backends, and left those alone.	2025-06-03 08:44:13 +01:00
mikael-nilsson-arm	09967917e7	[CodeGenPrepare] Fix signed overflow (#141487 ) The signed addition could overflow which is undefined behavior, now the code checks for it.	2025-06-03 09:27:25 +02:00
Pengcheng Wang	f393986b53	[MISched] Add templates for creating custom schedulers (#141935 ) We rename `createGenericSchedLive` and `createGenericSchedPostRA` to `createSchedLive` and `createSchedPostRA`, and add a template parameter `Strategy` which is the generic implementation by default. This can simplify some code for targets that have custom scheduler strategy.	2025-06-03 11:37:40 +08:00
Kazu Hirata	54d836a080	[llvm] Use *Set::insert_range (NFC) (#138237 )	2025-06-02 19:48:13 -07:00
Philip Reames	e723e15db1	[MCP] Handle iterative simplification during forward copy prop (#140267 ) This is the follow up I mentioned doing in the review of 52b345d. That change introduced an API for performing instruction simplifications following copy propagation (e.g. things like recognizing ORI a0, a1, zero is just a move). As noted in that review, we should be able to perform iterative simplification as we move forward through the block, but weren't because of the code structure. The majority of this code is just deleting the special casing for constant source and destination tracking, and merging the copy handling with the main path. By assumption, the properties of copies (in terms of register reads and writes), must be a subset of general instructions. Once we do that, the iterative bit basically falls out from having the tracking performed for copies which are recognized after we forward prior uses.	2025-06-02 11:21:41 -07:00
Nikita Popov	58cc1675ec	[SDAG] Fix fmaximum legalization errors (#142170 ) FMAXIMUM is currently legalized via IS_FPCLASS for the signed zero handling. This is problematic, because it assumes the equivalent integer type is legal. Many targets have legal fp128, but illegal i128, so this results in legalization failures. Fix this by replacing IS_FPCLASS with checking the bitcast to integer instead. In that case it is sufficient to use any legal integer type, as we're just interested in the sign bit. This can be obtained via a stack temporary cast. There is existing FloatSignAsInt functionality used for legalization of FABS and similar we can use for this purpose. Fixes https://github.com/llvm/llvm-project/issues/139380. Fixes https://github.com/llvm/llvm-project/issues/139381. Fixes https://github.com/llvm/llvm-project/issues/140445.	2025-06-02 10:14:33 +02:00
Jon Roelofs	798058fca5	[Remarks] Remove an upcast footgun. NFC (#142191 ) CodeRegion's were previously passed as Value*, but then immediately upcast to BasicBlock. Let's keep the type information around until the use cases for non-BasicBlock code regions actually materialize.	2025-05-31 11:07:54 -07:00
Craig Topper	b4b3be7faa	[DAGCombiner] Teach SearchForAndLoads to handle an AND with 2 constant operands. (#142062 ) If opaque constants are involved we can have an AND with 2 constant operands that hasn't been simplified. If this is the case, we need to modify at least one of the constants if it is out of range. Fixes #142004	2025-05-30 16:00:43 -07:00
Craig Topper	c5a17e6bea	[DAGCombiner] Use APInt::isSubsetOf. NFC (#142029 )	2025-05-30 09:01:36 -07:00
Aaron Puchert	73d6a48029	[WinEH] Track changes in WinEHPrepare pass (#134121 ) Before this change, the pass would always claim to have changed IR if there is a scope-based personality function. We add some plumbing to track if there was an actual change. This should be NFC, except that we might now preserve more analysis passes.	2025-05-30 15:29:32 +02:00
Usha Gupta	7c996012ce	[GlobalISel] Add G_CONCAT_VECTOR computeKnownBits (#141933 ) Code ported from SelectionDAG::computeKnownBits.	2025-05-30 10:44:59 +01:00
Nikita Popov	ea096c98ae	[SDAG] Remove noundef workaround for range metadata/attributes (#141745 ) In https://reviews.llvm.org/D157685 I changed SDAG to only transfer range metadata to SDAG if it also has !noundef. At the time, this was necessary because SDAG incorrectly propagated poison when folding logical and/or to bitwise and/or. The root cause of that issue has since been addressed by https://github.com/llvm/llvm-project/pull/84924, so drop the workaround now.	2025-05-30 10:56:49 +02:00
Matt Arsenault	36b710a7e5	CodeGen: Convert some assorted errors to use reportFatalUsageError (#142031 ) The test coverage is lacking for many of these errors.	2025-05-30 08:06:53 +02:00
Sebastian Kreutzer	6cb087a725	[XRay] Fix tail call sleds for AArch64 (#141403 ) This addresses issue #141051. XRay uses a special event kind for tail calls on some architectures. This feature is implemented on AArch64, but wasn't fully activated. Tests in `llvm/test/CodeGen/AArch64/xray-tail-call-sled.ll` were incomplete and did not check for the emitted sled type. This patch correctly enables emission of tail call sleds on AArch64 and fixes the tests to check the sled kind.	2025-05-29 21:54:15 -07:00
Philip Reames	1651aa2943	[SDAG] Split the partial reduce legalize table by opcode [nfc] (#141970 ) On its own, this change should be non-functional. This is a preparatory change for https://github.com/llvm/llvm-project/pull/141267 which adds a new form of PARTIAL_REDUCE_*MLA. As noted in the discussion on that review, AArch64 needs a different set of legal and custom types for the PARTIAL_REDUCE_SUMLA variant than the currently existing PARTIAL_REDUCE_UMLA/SMLA.	2025-05-29 14:05:31 -07:00
Nicholas Guy	a5d97ebe8b	[AArch64][SelectionDAG] Add type legalization for partial reduce wide adds (#141075 ) Based on work initially done by @JamesChesterman.	2025-05-29 14:42:23 +01:00
Marius Kamp	10647685ca	[SDAG] Make Select-with-Identity-Fold More Flexible; NFC (#136554 ) This change adds new parameters to the method `shouldFoldSelectWithIdentityConstant()`. The method now takes the opcode of the select node and the non-identity operand of the select node. To gain access to the appropriate arguments, the call of `shouldFoldSelectWithIdentityConstant()` is moved after all other checks have been performed. Moreover, this change adjusts the precondition of the fold so that it would work for `SELECT` nodes in addition to `VSELECT` nodes. No functional change is intended because all implementations of `shouldFoldSelectWithIdentityConstant()` are adjusted such that they restrict the fold to a `VSELECT` node; the same restriction as before. The rationale of this change is to make more fine grained decisions possible when to revert the InstCombine canonicalization of `(select c (binop x y) y)` to `(binop (select c x idc) y)` in the backends.	2025-05-29 09:46:39 +02:00
Justin Bogner	b7bb256703	Warn on misuse of DiagnosticInfo classes that hold Twines (#137397 ) This annotates the `Twine` passed to the constructors of the various DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes us to warn when we would try to print the twine after it had already been destructed. We also update `DiagnosticInfoUnsupported` to hold a `const Twine &` like all of the other DiagnosticInfo classes, since this warning allows us to clean up all of the places where it was being used incorrectly.	2025-05-28 12:26:39 -07:00
Luke Lau	6d88343662	[IA] Add support for [de]interleave{4,6,8} (#141512 ) This teaches the interleaved access pass to lower the intrinsics for factors 4,6 and 8 added in #139893 to target intrinsics. Because factors 4 and 8 could either have been recursively [de]interleaved or have just been a single intrinsic, we need to check that it's the former before reshuffling around the values via interleaveLeafValues. After this patch, we can teach the loop vectorizer to emit a single interleave intrinsic for factors 2 through to 8, and then we can remove the recursive interleaving matching in interleaved access pass.	2025-05-28 11:44:41 +01:00
Fabian Ritter	8adcc8a669	[SelectionDAG] Introduce ISD::PTRADD (#140017 ) This opcode represents the addition of a pointer value (first operand) and an integer offset (second operand). PTRADD nodes are only generated if the TargetMachine opts in by overriding TargetMachine::shouldPreservePtrArith(). The PTRADD node and respective visitPTRADD() function were adapted by @rgwott from the CHERI/Morello LLVM tree. Original authors: @davidchisnall, @jrtc27, @arichardson. The changes in this PR were extracted from PR #105669. --------- Co-authored-by: David Chisnall <github@theravensnest.org> Co-authored-by: Jessica Clarke <jrtc27@jrtc27.com> Co-authored-by: Alexander Richardson <alexrichardson@google.com> Co-authored-by: Rodolfo Wottrich <rodolfo.wottrich@arm.com>	2025-05-28 09:09:17 +02:00
Ruiling, Song	3e47d8deba	MachineScheduler: Reset next cluster candidate for each node (#139513 ) When a node is picked, we should reset its next cluster candidate to null before releasing its successors/predecessors.	2025-05-28 14:53:46 +08:00
Peter Collingbourne	645f0e6723	IR: Make Module::getOrInsertGlobal() return a GlobalVariable. After pointer element types were removed this function can only return a GlobalVariable, so reflect that in the type and comments and clean up callers. Reviewers: nikic Reviewed By: nikic Pull Request: https://github.com/llvm/llvm-project/pull/141323	2025-05-27 12:23:12 -07:00
Kerry McLaughlin	b61144bf77	[AArch64] Allow lowering of more types to GET_ACTIVE_LANE_MASK (#140062 ) Adds support for operand promotion and splitting/widening the result of the ISD::GET_ACTIVE_LANE_MASK node. For AArch64, shouldExpandGetActiveLaneMask now returns false for more types which we know can be legalised.	2025-05-27 11:21:57 +01:00
Jon Roelofs	714096c132	[LLVM] Skip dumping inline SDag children (#141359 ) If they're simple enough to render inline, we don't need to dump them again in the recursive walk.	2025-05-26 19:40:01 -07:00
Kazu Hirata	89308de4b0	[llvm] Value-initialize values with *Map::try_emplace (NFC) (#141522 ) try_emplace value-initializes values, so we do not need to pass nullptr to try_emplace when the value types are raw pointers or std::unique_ptr<T>.	2025-05-26 15:13:02 -07:00
Luke Lau	3033f202f6	[IR] Add llvm.vector.[de]interleave{4,6,8} (#139893 ) This adds [de]interleave intrinsics for factors of 4,6,8, so that every interleaved memory operation supported by the in-tree targets can be represented by a single intrinsic. For context, [de]interleaves of fixed-length vectors are represented by a series of shufflevectors. The intrinsics are needed for scalable vectors, and we don't currently scalably vectorize all possible factors of interleave groups supported by RISC-V/AArch64. The underlying reason for this is that higher factors are currently represented by interleaving multiple interleaves themselves, which made sense at the time in the discussion in https://github.com/llvm/llvm-project/pull/89018. But after trying to integrate these for higher factors on RISC-V I think we should revisit this design choice: - Matching these in InterleavedAccessPass is non-trivial: We currently only support factors that are a power of 2, and detecting this requires a good chunk of code - The shufflevector masks used for [de]interleaves of fixed-length vectors are much easier to pattern match as they are strided patterns, but for the intrinsics it's much more complicated to match as the structure is a tree. - Unlike shufflevectors, there's no optimisation that happens on [de]interleave2 intrinsics - For non-power-of-2 factors e.g. 6, there are multiple possible ways a [de]interleave could be represented, see the discussion in #139373 - We already have intrinsics for 2,3,5 and 7, so by avoiding 4,6 and 8 we're not really saving much By representing these higher factors as interleaved-interleaves, we can in theory support arbitrarily high interleave factors. However I'm not sure this is actually needed in practice: SVE only has instructions for factors 2,3,4, whilst RVV only supports up to factor 8. This patch would make it much easier to support scalable interleaved accesses in the loop vectorizer for RISC-V for factors 3,5,6 and 7, as the loop vectorizer and InterleavedAccessPass wouldn't need to construct and match trees of interleaves. For interleave factors above 8, for which there are no hardware memory operations to match in the InterleavedAccessPass, we can still keep the wide load + recursive interleaving in the loop vectorizer.	2025-05-26 18:45:12 +01:00
Fangrui Song	a0901a2f87	Replace #include MCAsmLexer.h with AsmLexer.h MCAsmLexer.h has been made a forwarder header since #134207	2025-05-25 11:57:29 -07:00
Jon Roelofs	346a72f2ca	[LLVM] Add color to SDNode ID's when dumping (#141295 ) This is especially helpful for the recursive 'Cannot select:' dumps, where colors help distinguish nodes at a quick glance.	2025-05-24 09:40:29 -07:00
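
For readers unfamiliar with the `try_emplace` cleanups listed above (34c011d544 and 89308de4b0), here is a minimal sketch of the before/after pattern. It is an illustration only, not code from either commit: it uses `std::unordered_map` (C++17) as a stand-in for LLVM's map types, and the helper names `getOrAssignIdWithInsert` and `getOrAssignIdWithTryEmplace` are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Assumption: std::unordered_map stands in for LLVM's *Map containers.
using IdMap = std::unordered_map<std::string, unsigned>;

// Old style: build a pair with a placeholder value (0), then overwrite it
// when the insertion succeeds. Hypothetical helper, not from LLVM.
unsigned getOrAssignIdWithInsert(IdMap &Ids, const std::string &Name) {
  auto [It, Inserted] = Ids.insert(std::make_pair(Name, 0u));
  if (Inserted)
    It->second = static_cast<unsigned>(Ids.size()) - 1; // overwrite placeholder
  return It->second;
}

// New style: try_emplace(Key) value-initializes the mapped value, so no
// placeholder argument is needed. Hypothetical helper, not from LLVM.
unsigned getOrAssignIdWithTryEmplace(IdMap &Ids, const std::string &Name) {
  auto [It, Inserted] = Ids.try_emplace(Name);
  // A freshly inserted unsigned is value-initialized to 0.
  assert(!Inserted || It->second == 0);
  if (Inserted)
    It->second = static_cast<unsigned>(Ids.size()) - 1;
  return It->second;
}
```

Both helpers assign IDs in first-seen order and return the existing ID for a repeated key; the only difference is how the placeholder value is produced.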

37822 Commits