llvm-project

Author	SHA1	Message	Date
Alexey Bataev	78777a204a	[LV]Split store-load forward distance analysis from other checks, NFC (#121156 ) The patch splits the store-load forwarding distance analysis from other dependency analysis in LAA. Currently it supports only power-of-2 distances; the split is required to support non-power-of-2 distances in the future. Part of #100755	2025-03-31 07:28:44 -04:00
Florian Hahn	809f857d2c	[VPlan] Support early-exit loops in optimizeForVFAndUF. (#131539 ) Update optimizeForVFAndUF to support early-exit loops by handling BranchOnCond(Or(..., CanonicalIV == TripCount)) via SCEV PR: https://github.com/llvm/llvm-project/pull/131539	2025-03-31 07:55:48 +01:00
Kazu Hirata	2fc08d4c31	[Vectorize] Use DenseMap::insert_range (NFC) (#133656 )	2025-03-30 22:57:45 -07:00
Han-Kuan Chen	65734de9b9	[SLP] NFC. Remove the redundant MainOp and AltOp find process. (#133642 )	2025-03-31 10:26:45 +08:00
Florian Hahn	5f56eaff8b	[VPlan] Remove duplicated VPDerivedIVRecipe handling (NFC). Also handled by an earlier, more general if above.	2025-03-30 22:27:44 +01:00
Florian Hahn	fd8fb71486	[VPlan] Handle scalar casts and blend in isUniformAfterVectorization. Currently should be NFC, but will be used by https://github.com/llvm/llvm-project/pull/117506.	2025-03-30 22:21:12 +01:00
Florian Hahn	424c8f9217	[VPlan] Remove dead UF argument from VPTransformState ctor (NFC).	2025-03-30 17:31:00 +01:00
Kazu Hirata	d8b078d550	[Transforms] Use llvm::append_range (NFC) (#133607 )	2025-03-29 18:57:50 -07:00
Florian Hahn	6b98134466	[VPlan] Re-enable narrowing interleave groups with interleaving. Remove the UF = 1 restriction introduced by 577631f0a5 building on top of 783a846507683, which allows updating all relevant users of the VF, VPScalarIVSteps in particular. This restores the full functionality of https://github.com/llvm/llvm-project/pull/106441.	2025-03-29 20:14:10 +00:00
vporpo	3db5be79d2	[SandboxVec] Add print-region pass (#131019 ) This patch implements a simple printing pass for regions. This is meant to be used in tests and for debugging.	2025-03-29 10:42:51 -07:00
Florian Hahn	77913b5d1d	[VPlan] Add instantiation of VPUnrollPartAccessor<3> to fix link error. Fix link errors with GCC by providing an explicit instantiation.	2025-03-28 22:05:54 +00:00
Florian Hahn	783a846507	[VPlan] Add VF as operand to VPScalarIVStepsRecipe. Similarly to other recipes, update VPScalarIVStepsRecipe to also take the runtime VF as argument. This removes some unnecessary runtime VF computations for scalable vectors. It will also allow dropping the UF == 1 restriction for narrowing interleave groups required in 577631f0a528.	2025-03-28 21:48:59 +00:00
Alexey Bataev	1bfc61064a	[SLP]Fix spill cost analysis for split vectorized nodes If the entry is SplitVectorize, it can be skipped in favor of its operands; the operands allow spill costs to be detected correctly. Fixes #133288	2025-03-28 12:45:53 -07:00
Ramkumar Ramachandra	4c4e4e4299	[LV] Strengthen calls to collectInstsToScalarize (NFC) (#130642 ) Avoid the pattern of always calling collectInstsToScalarize after collectUniformsAndScalars, and call it in collectUniformsAndScalars instead. Also strengthen checks for early exits in the function.	2025-03-28 17:27:57 +00:00
Hari Limaye	bf5627c85e	[LV] Optimize VPWidenIntOrFpInductionRecipe for known TC (#118828 ) Optimize the IR generated for a VPWidenIntOrFpInductionRecipe to use the narrowest type necessary, when the trip-count of a loop is known to be constant and the only use of the recipe is the condition used by the vector loop's backedge branch.	2025-03-28 14:47:40 +00:00
Florian Hahn	7b75db5755	[VPlan] Add new VPIRPhi overlay for VPIRInsts wrapping phi nodes (NFC). (#129387 ) Add a new VPIRPhi subclass of VPIRInstruction, that purely serves as an overlay, to provide more convenient checking (via directly doing isa/dyn_cast/cast) and specialized execute/print implementations. Both VPIRInstruction and VPIRPhi share the same VPDefID, and are differentiated by the backing IR instruction. This pattern could also be used to provide more specialized interfaces for some VPInstruction opcodes, without introducing new, completely separate recipes. An example would be modeling VPWidenPHIRecipe & VPScalarPHIRecipe using VPInstruction opcodes and providing an interface to retrieve incoming blocks and values through a VPInstruction subclass similar to VPIRPhi. PR: https://github.com/llvm/llvm-project/pull/129387	2025-03-28 08:43:46 +00:00
Kazu Hirata	673f4705a8	[llvm] Use Set::insert_range (NFC) (#133353 ) We can use Set::insert_range to collapse: for (auto Elem : Range) Set.insert(Elem.first); down to: Set.insert_range(llvm::make_first_range(Range)); In some cases, we can further fold that into the set declaration.	2025-03-27 20:44:20 -07:00
Florian Hahn	5eccd71ce4	[VPlan] Add assertion ensuring Plan's UF matches BestUF (NFC).	2025-03-27 19:29:55 +00:00
Florian Hahn	8ddbc01295	[VPlan] Manage FindLastIV start value in ComputeFindLastIVResult (NFC) (#132690 ) Keep the start value as operand of ComputeFindLastIVResult. A follow-up patch will use this to make sure the start value is frozen if needed. Depends on https://github.com/llvm/llvm-project/pull/132689 PR: https://github.com/llvm/llvm-project/pull/132690	2025-03-27 18:34:13 +00:00
Kazu Hirata	cde58bfc16	[Transforms] Use range constructors of *Set (NFC) (#133203 )	2025-03-27 07:51:58 -07:00
Simon Pilgrim	1715386e80	Fix MSVC signed/unsigned comparison warning. NFC.	2025-03-27 08:56:21 +00:00
Florian Hahn	2c7d40b2f0	[VPlan] Generalize SCALAR-STEPS removal to any unroll factor. Follow-up to dfca6c0d3bf9d1a056 to extend isUnrolled to handle any unrolled VPlan, which means there's a single UF, but it will be > 1 if unrolling took place.	2025-03-26 21:03:50 +00:00
David Green	de1c2f24bc	[LoopVectorizer][AArch64] Move getMinTripCountTailFoldingThreshold later. (#132170 ) This moves the checks of MinTripCountTailFoldingThreshold later, during the calculation of whether to tail fold. This allows it to check beforehand whether tail predication is required, either for scalable or fixed-width vectors. This option is only specified for AArch64, where it returns the minimum of 5. This patch aims to allow the vectorization of TC=4 loops, preventing them from performing slower when SVE is present.	2025-03-26 19:35:08 +00:00
David Sherwood	1c9fe8c8af	[LV] Optimise users of induction variables in early exit blocks (#130766 ) This is the second of two PRs that attempt to improve the IR generated in the exit blocks of vectorised loops with uncountable early exits. It follows on from PR #128880. In this PR I am improving the generated code for users of induction variables in early exit blocks. This required using a newly added VPInstruction called FirstActiveLane, which calculates the index of the first active predicate in the mask operand. I have added a new function optimizeEarlyExitInductionUser that is called from optimizeInductionExitUsers when handling users in early exit blocks.	2025-03-26 12:09:59 +00:00
Walter Lee	fed4727187	Mark maybe_unused variable (#133069 ) ... to avoid -Wunused-variable warnings/errors when assertions are off.	2025-03-26 11:51:09 +00:00
Florian Hahn	420c056f85	[VPlan] Add ComputeFindLastIVResult opcode (NFC). (#132689 ) This moves the logic for computing the FindLastIV reduction result to its own opcode. A follow-up patch will update the new opcode to also take the start value, to fix https://github.com/llvm/llvm-project/issues/126836. PR: https://github.com/llvm/llvm-project/pull/132689	2025-03-26 10:49:09 +00:00
Martin Storsjö	a2e5932e8b	Revert "[SLP] Make getSameOpcode support interchangeable instructions. (#132887 )" This reverts commit 6e66cfeeaec6f09a4454400e45d690457ecdd3de. This change causes crashes on compiling some inputs, see https://github.com/llvm/llvm-project/pull/127450#issuecomment-2752833710 and https://github.com/llvm/llvm-project/pull/127450#issuecomment-2753375326 for details.	2025-03-26 10:24:25 +02:00
Kazu Hirata	e87921304b	[Vectorize] Avoid repeated hash lookups (NFC) (#132661 ) Co-authored-by: Florian Hahn <flo@fhahn.com>	2025-03-25 15:18:15 -07:00
Florian Hahn	577631f0a5	Reapply "[VPlan] Add transformation to narrow interleave groups. (#106441 )" This reverts commit ff3e2ba9eb94217f3ad3525dc18b0c7b684e0abf. The recommitted version limits the transform to cases where no interleaving is taking place, to avoid a mis-compile when interleaving. Original commit message: This patch adds a new narrowInterleaveGroups transform, which tries to convert a plan with interleave groups with VF elements to a plan that instead replaces the interleave groups with wide loads and stores processing VF elements. This effectively is a very simple form of loop-aware SLP, where we use interleave groups to identify candidates. This initial version is quite restricted and hopefully serves as a starting point for how to best model those kinds of transforms. Depends on https://github.com/llvm/llvm-project/pull/106431. Fixes https://github.com/llvm/llvm-project/issues/82936. PR: https://github.com/llvm/llvm-project/pull/106441	2025-03-25 20:57:10 +00:00
Ramkumar Ramachandra	8fb802e995	[LV] Improve code in collectInstsToScalarize (NFC) (#130643 )	2025-03-25 16:52:13 +00:00
Florian Hahn	b24694371c	[VPlan] Add instantiation of VPUnrollPartAccessor<2> to fix link errors. Speculative fix for missing definitions in some configs, including https://lab.llvm.org/buildbot/#/builders/159/builds/18760	2025-03-25 13:12:34 +00:00
Florian Hahn	dfca6c0d3b	[VPlan] Remove no-op SCALAR-STEPS after unrolling. (#123655 ) After unrolling, there may be additional simplifications that can be applied. One example is removing SCALAR-STEPS for the first part where only the first lane is demanded. This removes redundant adds of 0 from a large number of tests (~200), many which I am still working on updating. In preparation for removing redundant WideIV steps added in https://github.com/llvm/llvm-project/pull/119284. PR: https://github.com/llvm/llvm-project/pull/123655	2025-03-25 12:57:24 +00:00
Han-Kuan Chen	6e66cfeeae	[SLP] Make getSameOpcode support interchangeable instructions. (#132887 ) We use the term "interchangeable instructions" to refer to different operators that have the same meaning (e.g., `add x, 0` is equivalent to `mul x, 1`). Non-constant values are not supported, as they may incur high costs with little benefit. --------- Co-authored-by: Alexey Bataev <a.bataev@gmx.com>	2025-03-25 19:46:15 +08:00
Alexey Bataev	8122bb9dbe	[SLP]Fix a check for non-schedulable instructions Need to fix a check for non-schedulable instructions in the getLastInstructionInBundle function, because this check may not work correctly during codegen. Instead, need to check that these instructions were actually never scheduled, since the scheduling analysis is always performed before codegen and is stable. Fixes #132841	2025-03-25 04:35:33 -07:00
Han-Kuan Chen	2682a9433b	[SLP][REVEC] Add ExtractSubvector cost for ExternalUses. (#132761 ) For llvm/test/Transforms/SLPVectorizer/revec-shufflevector.ll, ScalarCost and ExtraCost is 1, so the original scalar will be kept.	2025-03-25 18:58:54 +08:00
Ramkumar Ramachandra	e8d882a95b	[LV] Audit and fix nits in cl::opts (NFC) (#130601 ) Non-static cl::opts should be under the llvm namespace.	2025-03-25 10:19:45 +00:00
Martin Storsjö	b33bec9b21	Revert "[SLP] Make getSameOpcode support interchangeable instructions. (#127450 )" This reverts commit 71a0cfd93263552ddc0bfd2ea7b0abe9a578f87e. This commit triggers failed asserts when compiling ffmpeg. The issue is reproducible with a small standalone reproducer like this: void make_filters_from_proto(int filter[][2], int bands) { int c, q, n; for (;; q++) { n = 0; for (; n < 7; n++) { int theta = (q (n - 6) + (n >> 1) - 3) % bands; if (theta) c = theta; filter[q][n][0] = c; } } } $ clang -target x86_64-linux-gnu -c repro.c -O3 clang: ../lib/Transforms/Vectorize/SLPVectorizer.cpp:989: llvm::SmallVector<llvm::Value> {anonymous}::BinOpSameOpcodeHelper::InterchangeableInfo::getOperand(llvm::Instruction) const: Assertion `FromCIValue.isZero() && "Cannot convert the instruction."' failed. The same issue also reproduces for a large number of other target triples, aarch64-linux-gnu and others.	2025-03-25 10:22:44 +02:00
Martin Storsjö	dd059338a2	Revert "[Vectorize] Fix a warning" This reverts commit 4c68061254c896214b7ad5ab807ac4ba11517812. Reverting as part of a revert of a preceding commit.	2025-03-25 10:21:05 +02:00
Kazu Hirata	4c68061254	[Vectorize] Fix a warning This patch fixes: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:855:52: error: unused variable 'SupportedOp' [-Werror,-Wunused-const-variable]	2025-03-24 17:38:47 -07:00
Han-Kuan Chen	71a0cfd932	[SLP] Make getSameOpcode support interchangeable instructions. (#127450 ) We use the term "interchangeable instructions" to refer to different operators that have the same meaning (e.g., `add x, 0` is equivalent to `mul x, 1`). Non-constant values are not supported, as they may incur high costs with little benefit. --------- Co-authored-by: Alexey Bataev <a.bataev@gmx.com>	2025-03-25 08:24:46 +08:00
Florian Hahn	5b38fb59df	[VPlan] Remove remaining references to VPScalarPHIRecipe (NFC). VPScalarPHIRecipe has been replaced by VPInstructions with PHI opcodes. Strip remaining dead references to VPScalarPHIRecipe.	2025-03-24 19:37:00 +00:00
Alexey Bataev	ad9909dd73	[SLP]Fix perfect diamond match with extractelements in scalars Need to drop all previous estimations/vectorizations when a perfect diamond match is found. This improves cost estimation and improves code emission. Also, need to adjust the getScalarizationOverhead cost for a non-poison input vector. Currently, that cost cannot be estimated correctly, so instead use a conservative element-by-element insertelement cost for each unique scalar. Reviewers: RKSimon, hiraditya Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/132466	2025-03-24 09:29:18 -04:00
Luke Lau	6a8606e99e	[VPlan] Only store RecurKind + FastMathFlags in VPReductionRecipe. NFCI (#131300 ) VPReductionRecipes take a RecurrenceDescriptor, but only use the RecurKind and FastMathFlags in it when executing. This patch makes the recipe more lightweight by stripping it to only take the latter two. The motivation for this is to simplify an upcoming patch to support in-loop AnyOf reductions. For an in-loop AnyOf reduction we want to create an Or reduction, and by using RecurKind we can create an arbitrary reduction without needing a full RecurrenceDescriptor.	2025-03-24 19:18:54 +08:00
Florian Hahn	06fd10f1da	[VPlan] Don't create ExtractElement recipes for scalar plans. (#131604 ) ExtractElements are no-ops for scalar VPlans. Don't introduce them in handleUncountableEarlyExit if the plan has only a scalar VF. This fixes a crash trying to compute the cost of ExtractElement after 26ecf978951b79. PR: https://github.com/llvm/llvm-project/pull/131604	2025-03-23 22:00:02 +00:00
Martin Storsjö	ff3e2ba9eb	Revert "[VPlan] Add transformation to narrow interleave groups. (#106441 )" This reverts commit dfa665f19c52d98b8d833a8e9073427ba5641b19. This commit caused miscompilations in ffmpeg, see https://github.com/llvm/llvm-project/pull/106441 for details.	2025-03-23 23:27:39 +02:00
Florian Hahn	206b42c98e	[LV] Use VPBuilder to create ComputeReductionResult. (NFC) Update code to use VPBuilder, to simplify follow-up changes.	2025-03-23 16:10:07 +00:00
Florian Hahn	c482b8faea	[VPlan] Only execute VPExpandSCEVRecipes once and remove them (NFC). Instead of executing the whole entry VPIRBB twice, first only execute the VPExpandSCEVRecipes and replace their uses with the expanded VPValue, which will be a live-in. This allows removing special logic in VPExpandSCEVRecipe to support executing twice and allows moving the ExpandedSCEVs map out of VPTransformState. It will also allow adding other recipes to the entry VPBB in the future.	2025-03-23 09:06:01 +00:00
Kazu Hirata	fae34938f6	[llvm] Use *Set::insert_range (NFC) (#132591 ) DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently gained C++23-style insert_range. This patch uses insert_range with iterator ranges. For each case, I've verified that foos is defined as make_range(foo_begin(), foo_end()) or in a similar manner.	2025-03-22 22:14:45 -07:00
Florian Hahn	dfa665f19c	[VPlan] Add transformation to narrow interleave groups. (#106441 ) This patch adds a new narrowInterleaveGroups transform, which tries to convert a plan with interleave groups with VF elements to a plan that instead replaces the interleave groups with wide loads and stores processing VF elements. This effectively is a very simple form of loop-aware SLP, where we use interleave groups to identify candidates. This initial version is quite restricted and hopefully serves as a starting point for how to best model those kinds of transforms. Depends on https://github.com/llvm/llvm-project/pull/106431. Fixes https://github.com/llvm/llvm-project/issues/82936. PR: https://github.com/llvm/llvm-project/pull/106441	2025-03-22 21:40:17 +00:00
Florian Hahn	0d3ba087f7	[LV] Move IV bypass value creation out of ILV (NFC) createInductionAdditionalBypassValues is only used for epilogue vectorization now. Move it out of ILV, which means we do not have to thread through ExpandedSCEVs and also don't have to track the bypass values in ILV. Instead, directly create them if needed after executing the epilogue plan. This moves more of the epilogue-specific logic out of the generic executePlan.	2025-03-22 20:36:45 +00:00
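
Note on 673f4705a8: the loop-to-insert_range rewrite described in that commit message can be illustrated with standard containers only. This is a rough sketch, not LLVM code. insertRange and firstElements are hypothetical helpers invented here as stand-ins for Set::insert_range and llvm::make_first_range, and firstElements copies the keys eagerly, whereas the LLVM helper is, as far as I understand, a lazy view.

```cpp
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a set's insert_range member: inserts every
// element of Range into Set. Not LLVM's implementation.
template <typename SetT, typename RangeT>
void insertRange(SetT &Set, const RangeT &Range) {
  Set.insert(std::begin(Range), std::end(Range));
}

// Hypothetical eager analogue of llvm::make_first_range: copies the .first
// member of each pair into a vector (assumption: the real helper is lazy).
template <typename RangeT>
auto firstElements(const RangeT &Range) {
  using KeyT = std::decay_t<decltype(std::begin(Range)->first)>;
  std::vector<KeyT> Keys;
  for (const auto &Elem : Range)
    Keys.push_back(Elem.first);
  return Keys;
}

using PairVec = std::vector<std::pair<int, std::string>>;

// "Before" shape: an explicit loop inserting each key.
std::set<int> collectKeysLoop(const PairVec &Range) {
  std::set<int> Set;
  for (const auto &Elem : Range)
    Set.insert(Elem.first);
  return Set;
}

// "After" shape: a single range insertion over the keys.
std::set<int> collectKeysRange(const PairVec &Range) {
  std::set<int> Set;
  insertRange(Set, firstElements(Range));
  return Set;
}

// Sanity check that both shapes build the same set.
void checkEquivalent(const PairVec &Range) {
  assert(collectKeysLoop(Range) == collectKeysRange(Range));
}
```

Both functions produce the same set for any input, which is why such a change can be marked NFC; the range form simply states the intent in one call.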

5784 Commits