llvm-project

Author	SHA1	Message	Date
Alexander Kornienko	7165353506	Revert "[LoopVectorize] Support vectorization of overflow intrinsics" (#179819 ) Reverts llvm/llvm-project#174835, which causes clang crashes. See https://github.com/llvm/llvm-project/pull/174835#issuecomment-3844233831 and https://github.com/llvm/llvm-project/issues/179671 for details.	2026-02-05 15:41:49 +01:00
Florian Hahn	8240cf337a	[VPlan] Always set flags for overflowing ops etc via VPIRFlags. (#179138 ) Enforce that all VPInstructions set the correct OpType of the VPIRFlags. Flag mis-matches (e.g. VPInstruction Add without `OverflowingBinOp` being set) can cause crashes (e.g. in CSE) or potentially mis-compiles. Add a few helpers in VPBuilder to create common instructions with correct flags. PR: https://github.com/llvm/llvm-project/pull/179138	2026-02-03 12:33:23 +00:00
Ramkumar Ramachandra	a19cbc4b77	[VPlan] Rename VectorEndPointer's IndexedTy to SourceElementTy (NFC) (#178856 ) For consistency with IR terminology.	2026-02-01 11:30:26 +00:00
Florian Hahn	abfd56293c	[VPlan] Mark VPActiveLaneMaskPHIRecipe as readnone. (#177886 ) VPWidenActiveLaneMaskPHIRecipe does not have side-effects and also does not access memory. Mark accordingly. This allows hoisting of some invariant loads out of loops and also removing unused phi recipes in the future. In llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll, the hoisting makes vectorization profitable. PR: https://github.com/llvm/llvm-project/pull/177886	2026-01-30 16:12:30 +00:00
Sander de Smalen	b4c7518a0f	[LV] Add support for extended fadd reductions (#178447 ) This makes use of the llvm.vector.partial.reduce.fadd intrinsics added in #163975 to handle the following with FDOT: ``` float32_t fdot(float16_t *src, int N) { float32_t sum = 0.0f; for (int i=0; i<N; ++i) sum += src[i]; return sum; } ```	2026-01-30 08:27:57 +00:00
Florian Hahn	e36cd26618	[VPlan] Remove non-reductions after simplifications. (#176795 ) In some cases, we identify patterns as reductions, even though they can be simplified to a non-reduction. Mark VPReductionPHIRecipe as not reading from memory & not having side-effects, to clean them up. We also need to remove ComputeReductionResult VPInstructions with live-in arguments. This means there is actually no reduction, and we need to fold it to the live in. Otherwise we would incorrectly reduce the live-in. PR: https://github.com/llvm/llvm-project/pull/176795	2026-01-28 15:51:08 +00:00
Damian Heaton	762ba885f9	[LV] Add support for llvm.vector.partial.reduce.fadd (#163975 ) Allows the Loop Vectorizer to generate `llvm.vector.partial.reduce.fadd` intrinsics when sequences which match its requirements are found.	2026-01-28 15:05:34 +00:00
Florian Hahn	b794baf8e7	[TTI] Add VectorInstrContext for context-aware insert/extract costs. (#175982 ) This commit introduces the VectorInstrContext (VIC) infrastructure to improve cost estimates for insert/extracts based on the context instruction in which the insert/extract is used. This is similar to CastContextHint, and allows providing context on how the insert/extract is going to be used before creating IR. This is useful in the LoopVectorizer, where costs need to be estimated before creating IR. The new hint currently only replaces an existing check in AArch64, but new uses will be introduced in follow-ups, including https://github.com/llvm/llvm-project/pull/177201. PR: https://github.com/llvm/llvm-project/pull/175982	2026-01-27 16:30:29 +00:00
Florian Hahn	a871b707b7	Reapply "[VPlan] Move VDef subclass ID to VPRecipeBase (NFC). (#174282 )" Move SubclassID to VPRecipeBase, and store VPRecipeBase directly in VPRecipeValue, instead of VPDef. This allows for some additional simplifications and VPDef now just holds various helpers to deal with removing and adding VPValues. This reverts commit 16395da0ff577750571b99fe28281ce6fb6a3ae8. PR: https://github.com/llvm/llvm-project/pull/174282	2026-01-24 13:22:48 +00:00
Florian Hahn	16395da0ff	Revert "[VPlan] Fold VPDef into VPRecipeBase (NFC). (#174282 )" This reverts commit f3ae334f4b7a8cf4fe0eb6ee7b2f2ef0879f522d. Committed with out-of-date message, revert to reland with updated message.	2026-01-24 13:16:45 +00:00
Florian Hahn	f3ae334f4b	[VPlan] Fold VPDef into VPRecipeBase (NFC). (#174282 ) A separate VPDef is not needed any longer, fold it into VPRecipeBase to simplify code and class hierarchy. Depends on https://github.com/llvm/llvm-project/pull/172758. PR: https://github.com/llvm/llvm-project/pull/174282	2026-01-24 13:16:12 +00:00
Florian Hahn	14a209f852	[VPlan] Replace ComputeFindIVRes with ComputeRdxRes + cmp + sel (NFC) (#176672 ) Replace ComputeFindIVResult with ComputeReductionResult + explicit compare + select, to model finding the first/last induction more explicitly and simply, which boils down to a min/max reduction + compare and select of the sentinel value. PR: https://github.com/llvm/llvm-project/pull/176672	2026-01-22 19:28:47 +00:00
Luke Lau	2036bc524b	[IR] Update IRBuilder::createVectorSplice to allow variable offsets (#177178 ) Following on from #174693, this updates IRBuilder to allow variable offsets, and splits the createVectorSplice function into two functions for left and right splices. We could preserve the existing createVectorSplice API but given there's only one LLVM-internal user of it in the loop vectorizer, and the notion of a negative offset doesn't exist in the intrinsics anymore, I've removed it. Happy to add it back if reviewers prefer though. I've also added unit tests since createVectorSpliceLeft has no coverage otherwise.	2026-01-22 04:21:57 +00:00
Florian Hahn	dd363d0629	[VPlan] Replace UnrollPart for VPScalarIVSteps with start index op (NFC) (#170906 ) Replace the unroll part operand for VPScalarIVStepsRecipe with the start index. This simplifies https://github.com/llvm/llvm-project/pull/170053 and is also a first step to break down the recipe into its components. PR: https://github.com/llvm/llvm-project/pull/170906	2026-01-21 22:13:13 +00:00
Florian Hahn	69fbab2b72	[VPlan] Fall back to legacy cost if operands may be force-scalarized. If any of the operands of a VPReplicateRecipe have been force-scalarized, then the legacy cost model skips the scalarization overhead, but we cannot match this in the VPlan cost model. Bail out for now in those very rare cases. Fixes https://github.com/llvm/llvm-project/issues/176720.	2026-01-19 21:25:54 +00:00
Ramkumar Ramachandra	302565b39e	[VPlan] Move VPDerivedIVRecipe::execute to VPlanRecipes (NFC) (#176577 )	2026-01-19 13:06:37 +00:00
Florian Hahn	ae1bd068db	[VPlan] Replace PhiR operand of ComputeAnyOfResult with VPIRFlags. (#175657 ) Replace the Phi recipe operand of ComputeAnyOfResult with VPIRFlags, building on top of https://github.com/llvm/llvm-project/pull/174026. PR: https://github.com/llvm/llvm-project/pull/175657	2026-01-18 20:29:38 +00:00
Florian Hahn	3457e7efc3	[VPlan] Match inverted logical AND/OR for select costs. VPlan transforms may invert logical AND/OR selects, which can impact costs on targets where the select is not cheap but the boolean AND/OR is. Also match the inverted logical AND/OR to improve accuracy of the cost estimation. This fixes the underlying issue for the cost divergence between legacy and VPlan-based cost model that caused the revert of 01d34eb38fa058 in ed004cf42bf57c.	2026-01-18 16:15:42 +00:00
Florian Hahn	459990dcf7	[VPlan] Replace PhiR operand of ComputeFindIVResult with VPIRFlags. #174026 (#175461 ) Replace the Phi recipe operand of ComputeFindIVResult with VPIRFlags, building on top of https://github.com/llvm/llvm-project/pull/174026. PR: https://github.com/llvm/llvm-project/pull/175461	2026-01-17 16:23:33 +00:00
Florian Hahn	d528686f43	[VPlan] Add VPConstantInt for VPIRValues wrapping ConstantInts (NFC) (#175458 ) Follow-up to https://github.com/llvm/llvm-project/pull/174282: Introduce a new VPConstantInt overlay for VPIRValue, to make it easier to check and access constant int IR values. PR: https://github.com/llvm/llvm-project/pull/175458	2026-01-16 11:27:07 +00:00
Vishruth Thimmaiah	04baf1105f	[LoopVectorize] Support vectorization of overflow intrinsics (#174835 ) Enables support for marking overflow intrinsics `uadd`, `sadd`, `usub`, `ssub`, `umul` and `smul` as trivially vectorizable. Fixes #174617 --------- Signed-off-by: vishruth-thimmaiah <vishruththimmaiah@gmail.com>	2026-01-16 10:09:42 +00:00
Elvis Wang	aa11629192	[LV] Prevent `extract-lane` generate unused IRs with single vector operand. (#172798 ) When `extract-lane` only contains a single vector operand, we can simplify it to `extractelement`. This patch makes `extract-lane` generate a simple `extractelement` when it only contains a single vector operand, to prevent generating unused IR. This patch is mostly NFC; the unused IR should be removed by following IR passes.	2026-01-16 13:59:51 +08:00
Florian Hahn	f61aab79ce	[VPlan] Handle min/max recur kinds in ::printFlags. Following up to d5c11b9a24c84f, also handle min/max recurrence kinds in ::printFlags, so the proper kind is printed instead of icmp. NFC modulo debug printing changes	2026-01-14 18:16:11 +00:00
Graham Hunter	2abd6d6d7a	[LV] Vectorize conditional scalar assignments (#158088 ) Based on Michael Maitland's previous work: https://github.com/llvm/llvm-project/pull/121222 This PR uses the existing recurrences code instead of introducing a new pass just for CSA autovec. I've also made recipes that are more generic.	2026-01-14 14:59:18 +00:00
Florian Hahn	d5c11b9a24	[VPlan] Replace PhiR operand of ComputeRdxResult with VPIRFlags. (#174026 ) Remove the artificial PhiR operand of ComputeReductionResult, which was only used to look up recurrence kind, in-loop and ordered properties. Instead, encode them as VPIRFlags as suggested by @ayalz in https://github.com/llvm/llvm-project/pull/170223. This addresses a TODO to make codegen for ComputeReductionResult independent of looking up information from other recipes. This is NFC w.r.t. codegen, the printing has been improved to include the reduction type, and whether it is in-loop/ordered. PR: https://github.com/llvm/llvm-project/pull/174026	2026-01-14 07:45:44 +00:00
Florian Hahn	f9a8096067	[VPlan] Merge Select with previous cases in ::computeCost (NFC). Merge cases calling the same helper, as suggested post-commit in https://github.com/llvm/llvm-project/pull/174234	2026-01-12 22:19:28 +00:00
Elvis Wang	cd2caf6580	[LV] Simplify extract-lane with scalar operand to the scalar value itself. (#174534 ) This patch simplifies extract-lane(%lane_num, %X) to %X when %X is a scalar value. Extracting from a scalar is redundant since there is only one value to extract.	2026-01-12 10:03:44 +08:00
Aiden Grossman	7450a75b93	[VPlan] Allow truncation for lanes in VPScalarIVStepsRecipe (#175268 ) VPScalarIVStepsRecipe relies on APInt truncation in order to vectorize blocks with a width greater than the maximum value the types of some of their (changing) operands are able to hold (e.g., an i1 input with a vector width of 4). Simply reenable implicit truncation in ConstantInt::get() to cover this case. Remove the helper function given it is only called in one place to prevent accidentally using it elsewhere where we probably do not want implicit truncation turned on. This fixes another case that we saw after acb78bde6fb613a9af2a604bc69fa744a8cee850 did not fix that issue, which had the same stack trace. We still want to keep lane constants as unsigned. Somewhat similar to 6d1e7d4982fabc9e245897056a5425496df6a7a3. This test case comes from a tensorflow/XLA compilation from a test case in https://github.com/google-research/spherical-cnn.	2026-01-10 10:34:46 -05:00
Aiden Grossman	acb78bde6f	[VPlan] Use unsigned integers for lane start indices (#175231 ) a83c89495ba6fe0134dcaa02372c320cc7ff0dbf caused assertion failures here as if we have a single bit induction variable and two lanes (0 and 1), then the second lane index (1) will be out of bounds of what a signed 1-bit integer can hold. Lane indices are always >= 0 according to VPlanHelpers.h:125, and the lane representation in this code is also unsigned. The test case comes from tensorflow/XLA.	2026-01-09 14:28:28 -08:00
Florian Hahn	31b93d6e38	[VPlan] Add specialized VPValue subclasses for different types (NFC) (#172758 ) This patch adds VPValue sub-classes for the different cases we currently have: * VPIRValue: A live-in VPValue that wraps an underlying IR value * VPSymbolicValue: A symbolic VPValue not tied to an underlying value, e.g. the vector trip count or VF VPValues * VPRecipeValue: A VPValue defined by a VPDef/VPRecipeBase. This has multiple benefits: * clearer constructors for each kind of VPValue * limited scope: for example allows moving VPDef member to VPRecipeValue, reducing size of other VPValues. * stricter type checking for member variables (e.g. using VPLiveIn in the Value -> live-in map in VPlan, or using VPSymbolicValue for symbolic member VPValues) There probably are additional opportunities for cleanups as follow-ups. PR: https://github.com/llvm/llvm-project/pull/172758	2026-01-07 20:29:05 +00:00
Ramkumar Ramachandra	bdc7681d63	[VPlan] Restore all-operands-inv WidenGEP logic (#174416 ) Restore the all-operands-invariant handling in WidenGEP::execute prior to 37f7b31 (Reland [VPlan] Handle WidenGEP in narrowToSingleScalars), as crashes have been reported. Fixes #173761.	2026-01-06 13:05:37 +00:00
Nikita Popov	707d18c8e7	[VPlan] Use getSigned() for index in VectorEndPointer recipe (#174426 ) The stride can be negative here, so we should use getSigned(). This avoids an assertion failure with https://github.com/llvm/llvm-project/pull/171456. It also avoids a miscompile if the index is >64-bit, but I don't think that can happen in practice.	2026-01-06 09:10:59 +01:00
Florian Hahn	16830b2164	[VPlan] Remove VPWidenSelectRecipe, use VPWidenRecipe instead (NFCI). (#174234 ) All extra state has been removed from VPWidenSelectRecipe at this point. There's no benefit of having a separate recipe and Select can easily be handled by the existing VPWidenRecipe. PR: https://github.com/llvm/llvm-project/pull/174234	2026-01-05 22:33:37 +00:00
Florian Hahn	188507e542	[VPlan] Inline createFindLastIVReduction into its only caller. (NFC) createFindLastIVReduction is only used for generating code for ComputeFindIVResult. Inline the code there, in preparation for https://github.com/llvm/llvm-project/pull/172569.	2026-01-04 13:31:47 +00:00
Florian Hahn	990883a690	[VPlan] Handle Alloca in VPReplicateRecipe::computeCost. (NFCI) Handle Alloca in the VPlan-based cost model. This also updates the cost in the legacy cost model to clarify that we always compute the scalar cost.	2026-01-03 17:40:51 +00:00
Florian Hahn	b4d833135a	[VPlan] Handle non-free bitcasts in getCostForRecipeWithOpcode. Update bitcast cost handling to match the legacy cost model.	2026-01-02 18:04:13 +00:00
Florian Hahn	5ee82dffc6	[VPlan] Handle addrspacecast/ptrtoaddr in VPlan-based cost model. Also handle missing PtrToAddrs and AddrSpaceCast in getCostForRecipeWithOpcode. This makes sure all cast opcodes are handled, fixing a crash on loops replicating addrspacecast and ptrtoaddrs.	2026-01-01 10:35:35 +00:00
Florian Hahn	6f6fca136c	[VPlan] Re-use common cast cost logic for VPReplicateRecipe (NFCI). Move the logic to compute cast costs to getCostForRecipeWithOpcode and use for VPReplicateRecipe. This should match the costs computed by the legacy cost model for scalar casts.	2025-12-30 22:34:53 +00:00
Florian Hahn	524b1788c4	[VPlan] Add BranchOnTwoConds, use for early exit plans. (#172750 ) This PR introduces a new BranchOnTwoConds VPInstruction, that takes 2 boolean operands and must be placed in a block with 3 successors. If condition I is true, branches to successor I, otherwise falls through to check the next condition. If both conditions are false, branch to the third successor. This new branch recipe is used for early-exit loops, to simplify the representation in VPlan initially, by avoiding the need for splitting the middle block early on, in a way that preserves the single-exit block property of regions. All exits still go through the latch block, but they can go to more than 2 successors. This idea was part of one of the original proposals for how to model early exits in VPlan, but at that point in time, there was no good way to handle this during code-gen, and we went with the early split-middle block approach initially. Now that we dissolve regions before ::execute, the new recipe can be lowered nicely after regions have been removed, to a set of VPBBs and BranchOnCond recipes. The initial lowering preserves the original structure with the split middle blocks. Follow-ups will improve the lowering to avoid this splitting, providing performance gains. PR: https://github.com/llvm/llvm-project/pull/172750	2025-12-29 19:39:38 +00:00
陈子昂	c9eb572b14	[LoopVectorize] Support vectorization of frexp intrinsic (#172957 ) This patch enables the vectorization of the llvm.frexp intrinsic. Following the suggestion in #112408, frexp is moved from isTriviallyScalarizable to isTriviallyVectorizable. Fixes #112408	2025-12-26 21:57:57 +00:00
Florian Hahn	1f331e453f	[VPlan] Only use isAddressSCEVForCost in getAddressAccessSCEV (NFC) Follow-up to https://github.com/llvm/llvm-project/pull/171204 to only rely on isAddressSCEVForCost in getAddressAccessSCEV, completely aligning with the legacy cost model.	2025-12-23 22:29:02 +00:00
Florian Hahn	44a8d9c135	Reapply "[VPlan] Use predicate from VPValue VPWidenSelectR::computeCost." (#173170 ) This reverts commit f42af14073228 and re-applies https://github.com/llvm/llvm-project/pull/172915. It has an additional check if the condition is a live-in, which makes sure we preserve the original behavior in that case. This should fix the crash that caused the revert. Original commit message: Look up the predicate from the VPValue condition instead of the underlying IR. This improves cost modeling in some cases, e.g. when we can fold operations like negations in compares. On AArch64, this leads to additional vectorization in a few cases in practice. Example lowering for the modified test case: https://llvm.godbolt.org/z/6nc6jo5eG	2025-12-22 22:38:31 +00:00
Florian Hahn	c43ccefc9f	[VPlan] Use PSE to construct SCEVs in getSCEVExprForVPValue (NFCI). getSCEVExprForVPValue is used to create SCEVs for expressions from the original loop, which may be predicated. Use PSE to construct predicated SCEVs if possible. This matches the legacy LV code behavior. Currently should be NFC, but will enable migrating more SCEV/cost-based computations to VPlan. The patch requires exposing a new getPredicatedSCEV helper to PredicatedScalarEvolution which just takes a SCEV, to avoid needing to go through IR values, which isn't an option for getSCEVExprForVPValue.	2025-12-21 22:39:49 +00:00
Florian Hahn	f42af14073	Revert "[VPlan] Use predicate from VPValue VPWidenSelectR::computeCost." (#173170 ) Reverts llvm/llvm-project#172915 Looks like this may be causing https://lab.llvm.org/buildbot/#/builders/128/builds/9590 to fail. Revert while I confirm.	2025-12-20 22:54:21 +00:00
Florian Hahn	e77246dbf4	[VPlan] Use predicate from VPValue VPWidenSelectR::computeCost. (#172915 ) Look up the predicate from the VPValue condition instead of the underlying IR. This improves cost modeling in some cases, e.g. when we can fold operations like negations in compares. On AArch64, this leads to additional vectorization in a few cases in practice. Example lowering for the modified test case: https://llvm.godbolt.org/z/6nc6jo5eG PR: https://github.com/llvm/llvm-project/pull/172915	2025-12-20 22:09:58 +00:00
Florian Hahn	1f78f6a2d6	[LV] Check Addr in getAddressAccessSCEV in terms of SCEV expressions. (#171204 ) getAddressAccessSCEV previously had some restrictive checks that limited pointer SCEV expressions passed to TTI to GEPs with operands that must either be invariant or marked as inductions. As a consequence, the check rejected things like `GEP %base, (%iv + 1)`, while the SCEV for the GEP should be as easily analyzable as for `GEP %base, %v`, with the only difference being the AddRec start adjusted by 1. This patch changes the code to use a SCEV-based check, limiting the address SCEV to be loop invariant, an affine AddRec (i.e. induction), or an add expression of such operands or a sign-extended AddRec. This catches all existing cases getAddressAccessSCEV caught, plus additional ones like the cases mentioned above. This means we pass address SCEVs in more cases, giving the backends a better chance to make informed decisions. It also unifies the decision when to use an address SCEV between the legacy and VPlan-based cost model. Illustrative examples showing the impact are the gather-cost.ll tests. Previously they were considered not profitable to vectorize because we failed to determine that %gep.src_data = getelementptr inbounds [1536 x float], ptr @src_data, i64 0, i64 %mul has a relatively small constant stride. There may be some rough edges in the cost models, where not passing pointer SCEVs hid some incorrect modeling, but those issues should be fixed in the target cost models if they surface. PR: https://github.com/llvm/llvm-project/pull/171204	2025-12-19 22:05:27 +00:00
Mel Chen	f196b1d66f	[VPlan] Extract reverse operation for reverse accesses (#146525 ) This patch introduces VPInstruction::Reverse and extracts the reverse operations of loaded/stored values from reverse memory accesses. This extraction facilitates future support for permutation elimination within VPlan.	2025-12-18 14:57:48 +00:00
Florian Hahn	eb0c7e752f	[VPlan] Replace BranchOnCount with Compare + BranchOnCond (NFC). (#172181 ) Expand BranchOnCount to BranchOnCond + ICmp in convertToConcreteRecipes to simplify codegen. PR: https://github.com/llvm/llvm-project/pull/172181	2025-12-16 19:19:31 +00:00
Luke Lau	67d0e21a62	Reapply "[VPlan] Remove legacy costing inside VPBlendRecipe::computeCost (#171846 )" (#172261 ) This reapplies #171846 with a test case and fix for a legacy cost-model mismatch assertion. In the previous version of the patch, we only considered the plan to contain simplifications when it had a VPBlendRecipe and VF.isScalar() was true. However for some VPlans we may have a blend with only the first lane used: BLEND ir<%phi> = ir<%foo.res> ir<%bar.res>/ir<%c> CLONE ir<%gep> = getelementptr ir<%p>, ir<%phi> vp<%5> = vector-pointer ir<%gep> And in the legacy cost model we cost a blend as a phi if it's uniform: // If we know that this instruction will remain uniform, check the cost of // the scalar version. if (isUniformAfterVectorization(I, VF)) VF = ElementCount::getFixed(1); So this replaces the VF.isScalar() check with vputils::onlyFirstLaneUsed, which matches how the VPlan cost model mirrored the legacy model beforehand. A VPInstruction::Select will also emit a scalar select for a vector VF if only the first lane is used, so this also updates VPBlendRecipe::computeCost to reflect that too.	2025-12-16 06:30:54 +00:00
Ramkumar Ramachandra	0636225b93	[VPlan] Directly unroll VectorPointerRecipe (#168886 ) In an effort to get rid of VPUnrollPartAccessor and directly unroll recipes, start by directly unrolling VectorPointerRecipe, allowing for VPlan-based simplifications and simplification of the corresponding execute.	2025-12-15 10:54:06 +00:00

590 Commits