llvm-project

Author	SHA1	Message	Date
Florian Hahn	b24694371c	[VPlan] Add instantiation of VPUnrollPartAccessor<2> to fix link errors. Speculative fix for missing definitions in some configs, including https://lab.llvm.org/buildbot/#/builders/159/builds/18760	2025-03-25 13:12:34 +00:00
Florian Hahn	dfca6c0d3b	[VPlan] Remove no-op SCALAR-STEPS after unrolling. (#123655) After unrolling, there may be additional simplifications that can be applied. One example is removing SCALAR-STEPS for the first part where only the first lane is demanded. This removes redundant adds of 0 from a large number of tests (~200), many of which I am still working on updating. In preparation for removing redundant WideIV steps added in https://github.com/llvm/llvm-project/pull/119284. PR: https://github.com/llvm/llvm-project/pull/123655	2025-03-25 12:57:24 +00:00
Han-Kuan Chen	6e66cfeeae	[SLP] Make getSameOpcode support interchangeable instructions. (#132887) We use the term "interchangeable instructions" to refer to different operators that have the same meaning (e.g., `add x, 0` is equivalent to `mul x, 1`). Non-constant values are not supported, as they may incur high costs with little benefit. --------- Co-authored-by: Alexey Bataev <a.bataev@gmx.com>	2025-03-25 19:46:15 +08:00
Alexey Bataev	8122bb9dbe	[SLP]Fix a check for non-schedulable instructions Need to fix a check for non-schedulable instructions in the getLastInstructionInBundle function, because this check may not work correctly during codegen. Instead, need to check that these instructions were actually never scheduled, since the scheduling analysis is always performed before codegen and is stable. Fixes #132841	2025-03-25 04:35:33 -07:00
Han-Kuan Chen	2682a9433b	[SLP][REVEC] Add ExtractSubvector cost for ExternalUses. (#132761) For llvm/test/Transforms/SLPVectorizer/revec-shufflevector.ll, ScalarCost and ExtraCost are both 1, so the original scalar will be kept.	2025-03-25 18:58:54 +08:00
Ramkumar Ramachandra	e8d882a95b	[LV] Audit and fix nits in cl::opts (NFC) (#130601) Non-static cl::opts should be under the llvm namespace.	2025-03-25 10:19:45 +00:00
Martin Storsjö	b33bec9b21	Revert "[SLP] Make getSameOpcode support interchangeable instructions. (#127450)" This reverts commit 71a0cfd93263552ddc0bfd2ea7b0abe9a578f87e. This commit triggers failed asserts when compiling ffmpeg. The issue is reproducible with a small standalone reproducer like this: void make_filters_from_proto(int filter[][2], int bands) { int c, q, n; for (;; q++) { n = 0; for (; n < 7; n++) { int theta = (q * (n - 6) + (n >> 1) - 3) % bands; if (theta) c = theta; filter[q][n][0] = c; } } } $ clang -target x86_64-linux-gnu -c repro.c -O3 clang: ../lib/Transforms/Vectorize/SLPVectorizer.cpp:989: llvm::SmallVector<llvm::Value> {anonymous}::BinOpSameOpcodeHelper::InterchangeableInfo::getOperand(llvm::Instruction) const: Assertion `FromCIValue.isZero() && "Cannot convert the instruction."' failed. The same issue also reproduces for a large number of other target triples, such as aarch64-linux-gnu.	2025-03-25 10:22:44 +02:00
Martin Storsjö	dd059338a2	Revert "[Vectorize] Fix a warning" This reverts commit 4c68061254c896214b7ad5ab807ac4ba11517812. Reverting as part of a revert of a preceding commit.	2025-03-25 10:21:05 +02:00
Kazu Hirata	4c68061254	[Vectorize] Fix a warning This patch fixes: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:855:52: error: unused variable 'SupportedOp' [-Werror,-Wunused-const-variable]	2025-03-24 17:38:47 -07:00
Han-Kuan Chen	71a0cfd932	[SLP] Make getSameOpcode support interchangeable instructions. (#127450) We use the term "interchangeable instructions" to refer to different operators that have the same meaning (e.g., `add x, 0` is equivalent to `mul x, 1`). Non-constant values are not supported, as they may incur high costs with little benefit. --------- Co-authored-by: Alexey Bataev <a.bataev@gmx.com>	2025-03-25 08:24:46 +08:00
Florian Hahn	5b38fb59df	[VPlan] Remove remaining references to VPScalarPHIRecipe (NFC). VPScalarPHIRecipe has been replaced by VPInstructions with PHI opcodes. Strip remaining dead references to VPScalarPHIRecipe.	2025-03-24 19:37:00 +00:00
Alexey Bataev	ad9909dd73	[SLP]Fix perfect diamond match with extractelements in scalars Need to drop all previous estimations/vectorizations when a perfect diamond match is found. This improves cost estimation and code emission. Also, need to adjust getScalarizationOverhead cost for non-poison input vector. Currently, it does not estimate this case correctly, so instead use a conservative element-by-element insertelement cost for each unique scalar. Reviewers: RKSimon, hiraditya Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/132466	2025-03-24 09:29:18 -04:00
Luke Lau	6a8606e99e	[VPlan] Only store RecurKind + FastMathFlags in VPReductionRecipe. NFCI (#131300) VPReductionRecipes take a RecurrenceDescriptor, but only use the RecurKind and FastMathFlags in it when executing. This patch makes the recipe more lightweight by stripping it to only take the latter two. The motivation for this is to simplify an upcoming patch to support in-loop AnyOf reductions. For an in-loop AnyOf reduction we want to create an Or reduction, and by using RecurKind we can create an arbitrary reduction without needing a full RecurrenceDescriptor.	2025-03-24 19:18:54 +08:00
Florian Hahn	06fd10f1da	[VPlan] Don't create ExtractElement recipes for scalar plans. (#131604) ExtractElements are no-ops for scalar VPlans. Don't introduce them in handleUncountableEarlyExit if the plan has only a scalar VF. This fixes a crash trying to compute the cost of ExtractElement after 26ecf978951b79. PR: https://github.com/llvm/llvm-project/pull/131604	2025-03-23 22:00:02 +00:00
Martin Storsjö	ff3e2ba9eb	Revert "[VPlan] Add transformation to narrow interleave groups. (#106441)" This reverts commit dfa665f19c52d98b8d833a8e9073427ba5641b19. This commit caused miscompilations in ffmpeg, see https://github.com/llvm/llvm-project/pull/106441 for details.	2025-03-23 23:27:39 +02:00
Florian Hahn	206b42c98e	[LV] Use VPBuilder to create ComputeReductionResult. (NFC) Update code to use VPBuilder, to simplify follow-up changes.	2025-03-23 16:10:07 +00:00
Florian Hahn	c482b8faea	[VPlan] Only execute VPExpandSCEVRecipes once and remove them (NFC). Instead of executing the whole entry VPIRBB twice, first only execute the VPExpandSCEVRecipes and replace their uses with the expanded VPValue, which will be a live-in. This allows removing special logic in VPExpandSCEVRecipe to support executing twice and allows moving the ExpandedSCEVs map out of VPTransformState. It will also allow adding other recipes to the entry VPBB in the future.	2025-03-23 09:06:01 +00:00
Kazu Hirata	fae34938f6	[llvm] Use *Set::insert_range (NFC) (#132591) DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently gained C++23-style insert_range. This patch uses insert_range with iterator ranges. For each case, I've verified that foos is defined as make_range(foo_begin(), foo_end()) or in a similar manner.	2025-03-22 22:14:45 -07:00
Florian Hahn	dfa665f19c	[VPlan] Add transformation to narrow interleave groups. (#106441) This patch adds a new narrowInterleaveGroups transform, which tries to convert a plan with interleave groups with VF elements to a plan that instead replaces the interleave groups with wide loads and stores processing VF elements. This effectively is a very simple form of loop-aware SLP, where we use interleave groups to identify candidates. This initial version is quite restricted and hopefully serves as a starting point for how to best model those kinds of transforms. Depends on https://github.com/llvm/llvm-project/pull/106431. Fixes https://github.com/llvm/llvm-project/issues/82936. PR: https://github.com/llvm/llvm-project/pull/106441	2025-03-22 21:40:17 +00:00
Florian Hahn	0d3ba087f7	[LV] Move IV bypass value creation out of ILV (NFC) createInductionAdditionalBypassValues is only used for epilogue vectorization now. Move it out of ILV, which means we do not have to thread through ExpandedSCEVs and also don't have to track the bypass values in ILV. Instead, directly create them if needed after executing the epilogue plan. This moves more of the epilogue-specific logic out of the generic executePlan.	2025-03-22 20:36:45 +00:00
Florian Hahn	34631744af	[VPlan] Get DataLayout from SE in VPExpandSCEVRecipe::execute (NFC) This doesn't rely on State.CFG.	2025-03-22 15:49:57 +00:00
Alexey Bataev	3b0ec61156	[SLP][NFC] Redesign schedule bundle, separate from schedule data, NFC That's the initial patch, intended to support revectorization of the previously vectorized scalars. If the scalar is marked for the vectorization, it becomes a part of the schedule bundle, used to check dependencies and then schedule tree entry scalars into a single batch of instructions. Unfortunately, currently this info is part of the ScheduleData struct and it does not allow making scalars part of multiple bundles. The patch separates schedule bundles from the ScheduleData and introduces an explicit class ScheduleBundle for bundles, allowing it to be extended later to support revectorization of the previously vectorized scalars. Reviewers: hiraditya, RKSimon Reviewed By: RKSimon, hiraditya Pull Request: https://github.com/llvm/llvm-project/pull/131625	2025-03-21 13:36:57 -04:00
David Sherwood	4e69258bf3	[LoopVectorize] Add cost of generating tail-folding mask to the loop (#130565) At the moment if we decide to enable tail-folding we do not include the cost of generating the mask per VF. This can mean we make some poor choices of VF, which is definitely true for SVE-enabled AArch64 targets where mask generation for fixed-width vectors is more expensive than for scalable vectors. I've added a VPInstruction::computeCost function to return the costs of the ActiveLaneMask and ExplicitVectorLength operations. Unfortunately, in order to prevent asserts firing I've also had to duplicate the same code in the legacy cost model to make sure the chosen VFs match up. I've wrapped this up in an #ifndef NDEBUG for now. The alternative would be to disable the assert completely when tail-folding, which I imagine is just as bad. New tests added: Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll Transforms/LoopVectorize/RISCV/tail-folding-cost.ll	2025-03-21 09:24:56 +00:00
Kazu Hirata	3520dc5e7a	[Vectorize] Fix a build This patch fixes: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:2263:19: error: expected ';' after return statement	2025-03-20 12:52:27 -07:00
Florian Hahn	c73ad7ba20	[VPlan] Add transformation to narrow interleave groups. This patch adds a new narrowInterleaveGroups transform, which tries to convert a plan with interleave groups with VF elements to a plan that instead replaces the interleave groups with wide loads and stores processing VF elements. This effectively is a very simple form of loop-aware SLP, where we use interleave groups to identify candidates. This initial version is quite restricted and hopefully serves as a starting point for how to best model those kinds of transforms. For now it only transforms load interleave groups feeding store groups. Depends on #106431. This lands the main parts of the approved https://github.com/llvm/llvm-project/pull/106441 as suggested to break things up a bit more.	2025-03-20 19:41:37 +00:00
Kazu Hirata	2df51fd9c4	[Transforms] Avoid repeated hash lookups (NFC) (#132146)	2025-03-20 09:11:56 -07:00
Han-Kuan Chen	73558dc329	[SLP][REVEC] Fix getStoreMinimumVF only accept scalar types. (#132181) Fix "Element type of a VectorType must " "be an integer, floating point, or " "pointer type.".	2025-03-20 21:04:30 +08:00
Han-Kuan Chen	a5d4b50f93	[SLP] NFC. Change the inner loop and outer loop of appendOperandsOfVL. (#132152)	2025-03-20 20:32:20 +08:00
Luke Lau	01f04252b6	[LV] Get FMFs from VectorBuilder in createSimpleReduction. NFC (#132017) The other createSimpleReduction takes the FMFs from the IRBuilder, so this aligns the VectorBuilder variant to do the same and reduce the possibility of there being a mismatch in flags.	2025-03-20 16:38:56 +08:00
Nikita Popov	0738f70615	[Intrinsics] Add Intrinsic::getFnAttributes() (NFC) (#132029) Most places that call Intrinsic::getAttributes() are only interested in the function attributes, so add a separate function for that. The motivation for this is that I'd like to add the ability to specify range attributes on intrinsics, which requires knowing the function type. This avoids needing to know the type for most attribute queries.	2025-03-20 09:20:39 +01:00
Han-Kuan Chen	c3e16337a4	[SLP][REVEC] Ignore UserTreeIndex if it is empty. (#131993) Previously, the all_of check did not consider the case where the TreeEntry is empty (i.e., when it is the first entry).	2025-03-20 11:31:49 +08:00
Kazu Hirata	0dcc201ac4	[Transforms] Use *Set::insert_range (NFC) (#132056) DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently gained C++23-style insert_range. This patch replaces: Dest.insert(Src.begin(), Src.end()); with: Dest.insert_range(Src); This patch does not touch custom begin like succ_begin for now.	2025-03-19 15:35:01 -07:00
Florian Hahn	2e13ec561c	[VPlan] Bail out on non-intrinsic calls in VPlanNativePath. Update initial VPlan-construction in VPlanNativePath in line with the inner loop path, in that it bails out when encountering constructs it cannot handle, like non-intrinsic calls. Fixes https://github.com/llvm/llvm-project/issues/131071.	2025-03-19 21:35:15 +00:00
Florian Hahn	11b8699572	[LV] Don't skip instrs with side-effects in reg pressure computation. (#126415) calculateRegisterUsage adds end points for each user of an instruction to Ends and ignores instructions not added to it, i.e. instructions with no users. This means things like stores aren't included, which in turn means values that are only used in stores are also not included for consideration. This means we underestimate the register usage in cases where the only users are things like stores. Update the code to not skip instructions without users (i.e. not in Ends) if they have side-effects. PR: https://github.com/llvm/llvm-project/pull/126415	2025-03-19 15:13:43 +00:00
Luke Lau	f536f71580	[LV] Split RecurrenceDescriptor into RecurKind + FastMathFlags in LoopUtils. NFC (#132014) Split off from #131300, this splits up RecurrenceDescriptor arguments so that arbitrary recurrence kinds may be used down the line.	2025-03-19 22:56:57 +08:00
Kazu Hirata	9705010b5e	[Vectorize] Avoid repeated hash lookups (NFC) (#131962)	2025-03-19 07:14:54 -07:00
Longsheng Mou	f3f7f08eca	[SLP] Fix Wsign-compare warning (NFC) (#131948) llvm-project/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4805:57: warning: comparison of integer expressions of different signedness: ‘int’ and ‘std::size_t’ {aka ‘long unsigned int’} [-Wsign-compare] [](const auto &P) { return P.value() % 2 != P.index() % 2; })) ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~	2025-03-19 17:01:42 +08:00
Florian Hahn	870f753f1f	[VPlan] Also materialize broadcasts for backedge-taken-counts (NFC). Also include VPlan's BTC in the set of VPValues to materialize broadcasts for, if it is used.	2025-03-18 22:35:18 +00:00
Florian Hahn	8a91f6bcda	[VPlan] Use CurrentParentLoop instead of looking up via CFG (NFC). There is no need to look up the current parent loop via LoopInfo and the vector preheader; we can simply use CurrentParentLoop.	2025-03-18 22:11:47 +00:00
Florian Hahn	d51bc83511	[VPlan] Only skip live-ins with constants in materializeBroadcast (NFC) Currently this should be NFC, but will be needed in future patches.	2025-03-18 20:23:16 +00:00
Alexey Bataev	45090b3059	[SLP]Check the whole def-use chain in the tree to find proper dominance, if the last instruction is the same If the insertion point (last instruction) of the user nodes is the same, need to check the whole def-use chain in the tree to find proper dominance to prevent a compiler crash. Fixes #131818	2025-03-18 10:01:13 -07:00
Luke Lau	a4dc02c0e7	[VPlan] Rename VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe. NFC (#131086) After #128718 lands there will be two ways of performing a reversed widened memory access, either by performing a consecutive unit-stride access and a reverse, or a strided access with a negative stride. Even though both produce a reversed vector, only the former needs VPReverseVectorPointerRecipe which computes a pointer to the last element of each part. A strided reverse still needs a pointer to the first element of each part so it will use VPVectorPointerRecipe. This renames VPReverseVectorPointerRecipe to VPVectorEndPointerRecipe to clarify that a reversed access may not necessarily need a pointer to the last element.	2025-03-19 00:09:15 +08:00
Elvis Wang	ed19620b8c	[VPlan] Make VPReductionRecipe a VPRecipeWithIRFlags. NFC (#130881) This patch changes the parent of the VPReductionRecipe from VPSingleDefRecipe to VPRecipeWithIRFlags and also prints/gets/drops/controls flags via VPRecipeWithIRFlags. This will remove the dependency on the underlying instruction. This patch also adds a new function `setFastMathFlags()` to VPRecipeWithIRFlags because the entire reduction chain may contain multiple instructions, and the underlying instruction may not contain the corresponding flags for this reduction. Split from #113903.	2025-03-18 10:08:23 +08:00
Florian Hahn	166937b49d	[LV] Cleanup after expanding SCEV predicate to constant. In some cases, SCEV isn't able to prove that no wrap checks are needed, while constant folding in SCEVExpander can. In those cases, we may leave around IR for computing the trip count, which is unused at this point but may be re-used later, triggering an assertion when trying to clean up SCEVExp after vectorization. Directly run the cleaner after expanding to a constant predicate to prevent any generated code from being re-used. Fixes https://github.com/llvm/llvm-project/issues/131281.	2025-03-17 21:26:51 +00:00
Jeffrey Byrnes	4336e5edbc	[SLP] Sort PHIs by ExtractElements when relevant (#131229) Considering the PHIs in order of element extracted can lead to better shuffles.	2025-03-17 14:19:46 -07:00
Alexey Bataev	ead9d6a56d	[SLP]Check VectorizableTree is not empty before accessing elements Need to check VectorizableTree is not empty before accessing elements. Fixes #131635	2025-03-17 11:04:38 -07:00
Luke Lau	67f1c033b8	[VPlan] Remove createReduction. NFCI (#131336) This is split off from #131300. A VPReductionRecipe will never have an AnyOf or FindLastIV recurrence, so when it calls createReduction it always calls createSimpleReduction. If we replace the call then it leaves createReduction with one user in VPInstruction::ComputeReductionResult, which we can inline and then remove.	2025-03-18 00:18:15 +08:00
Luke Lau	eef5ea0c42	[VPlan] Account for dead FOR splice simplification in cost model (#131486) Fixes #131359 After #129645, a first-order recurrence will no longer have its splice costed if the VPInstruction::FirstOrderRecurrenceSplice has no users and is dead. The legacy cost model didn't account for this, so this accounts for it in planContainsAdditionalSimplifications to avoid the "VPlan cost model and legacy cost model disagreed" assertion.	2025-03-18 00:00:54 +08:00
Florian Hahn	17b4be8f63	[VPlan] Move setting name and adding VFs after recipe creation. (NFC) Recipe creation is the only place where the VF range is restricted. Move setting the VFs just after initial recipe creation.	2025-03-17 12:04:09 +00:00
Florian Hahn	4e9894498e	[VPlan] Truncate VFxUF if needed in VPWidenPointerInduction::execute. Create truncate if needed after 56b05a0d6. Note that this preserves the original behavior pre 56b05a0d6. If truncate would strip any set bits, then the explicit computation in the narrower type would wrap.	2025-03-16 11:37:58 +00:00
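
For the tail-folding mask whose cost #130565 starts to include, a scalar model of the mask may make the per-iteration cost more concrete. This is a minimal sketch under stated assumptions: activeLaneMask follows the documented lane semantics of the llvm.get.active.lane.mask intrinsic (lane i is active iff base + i < n, with no wrap-around), and tailFoldedSum is a hypothetical loop, not code from LoopVectorize.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Lane-by-lane model of llvm.get.active.lane.mask: lane i is active iff
// base + i < n. The addition is widened to 64 bits so it cannot wrap for
// 32-bit inputs, mirroring the intrinsic's no-overflow semantics.
std::vector<bool> activeLaneMask(uint32_t base, uint32_t n, unsigned vf) {
  std::vector<bool> mask(vf);
  for (unsigned i = 0; i < vf; ++i)
    mask[i] = static_cast<uint64_t>(base) + i < n;
  return mask;
}

// Hypothetical tail-folded reduction: every vector iteration covers VF
// lanes and the mask disables lanes past the trip count, so no scalar
// epilogue is needed. Note the mask is rebuilt on every vector iteration;
// that recurring work is what a cost model has to charge per VF.
int64_t tailFoldedSum(const std::vector<int32_t> &data, unsigned vf) {
  assert(vf > 0 && "VF must be non-zero");
  const auto n = static_cast<uint32_t>(data.size());
  int64_t sum = 0;
  for (uint32_t iv = 0; iv < n; iv += vf) {
    std::vector<bool> mask = activeLaneMask(iv, n, vf);
    for (unsigned lane = 0; lane < vf; ++lane)
      if (mask[lane])
        sum += data[iv + lane];
  }
  return sum;
}
```

With 7 elements and VF = 4, the second iteration runs with the mask {true, true, true, false}, which is how tail folding handles the remainder without a separate scalar loop.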

5754 Commits
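
The "interchangeable instructions" idea behind #127450 / #132887 (`add x, 0` is equivalent to `mul x, 1`) can be illustrated with a toy model. This is a hypothetical sketch, not the BinOpSameOpcodeHelper in SLPVectorizer.cpp: Opcode, BinOpWithConst, identityConstant, evaluate, convertTo and commonOpcode are invented names, and only identity constants are treated as convertible.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical opcodes for this toy model only (not llvm::Instruction
// opcodes).
enum class Opcode { Add, Sub, Mul, Shl, Or, Xor, And };

// A binary operator whose second operand is a constant, e.g. `add x, 0`.
struct BinOpWithConst {
  Opcode Op;
  uint64_t C;
};

// The constant that makes `Op x, C` return x unchanged.
uint64_t identityConstant(Opcode Op) {
  switch (Op) {
  case Opcode::Add:
  case Opcode::Sub:
  case Opcode::Shl:
  case Opcode::Or:
  case Opcode::Xor:
    return 0;
  case Opcode::Mul:
    return 1;
  case Opcode::And:
    return ~uint64_t{0}; // all bits set
  }
  assert(false && "unhandled opcode");
  return 0;
}

// Evaluates `Op x, C` on a concrete unsigned value (wrap-around is well
// defined). Shift amounts >= 64 yield 0 here; LLVM IR would give poison.
uint64_t evaluate(const BinOpWithConst &I, uint64_t X) {
  switch (I.Op) {
  case Opcode::Add: return X + I.C;
  case Opcode::Sub: return X - I.C;
  case Opcode::Mul: return X * I.C;
  case Opcode::Shl: return I.C < 64 ? X << I.C : 0;
  case Opcode::Or:  return X | I.C;
  case Opcode::Xor: return X ^ I.C;
  case Opcode::And: return X & I.C;
  }
  assert(false && "unhandled opcode");
  return X;
}

// Rewrites I with opcode To if the two are interchangeable. In this sketch
// only identities convert (`add x, 0` becomes `mul x, 1`); `add x, 5` has no
// `mul` equivalent, so the conversion is refused by returning nullopt.
std::optional<BinOpWithConst> convertTo(const BinOpWithConst &I, Opcode To) {
  if (I.Op == To)
    return I;
  if (I.C != identityConstant(I.Op))
    return std::nullopt;
  return BinOpWithConst{To, identityConstant(To)};
}

// Finds one opcode that every operator in a bundle can be expressed with,
// trying the bundle's own opcodes as candidates.
std::optional<Opcode> commonOpcode(const std::vector<BinOpWithConst> &Bundle) {
  for (const BinOpWithConst &Candidate : Bundle) {
    bool AllConvert = true;
    for (const BinOpWithConst &I : Bundle) {
      if (!convertTo(I, Candidate.Op)) {
        AllConvert = false;
        break;
      }
    }
    if (AllConvert)
      return Candidate.Op;
  }
  return std::nullopt;
}
```

For the bundle {`add x, 0`, `mul y, 3`}, commonOpcode picks `mul`, so both lanes could be emitted as one vector multiply by <1, 3>. In this sketch an unconvertible operator simply yields std::nullopt; the revert listed above shows the real implementation hitting the `FromCIValue.isZero()` assertion on some input instead, and this model makes no claim about how #132887 addressed that.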