llvm-project

Author	SHA1	Message	Date
Florian Hahn	2bdc1a1337	[LV] Use frozen start value for FindLastIV if needed. (#132691 ) FindLastIV introduces multiple uses of the start value, where in the original source there was only a single use, when the epilogue is vectorized. Each use of undef may produce a different result, so introducing multiple uses can produce incorrect results when the input is undef/poison. If the start value may be undef or poison, freeze it and use the frozen value, which will be the same at all uses. See the following scenarios in Alive2: * Both main and epilogue vector loops execute, go to exit block: https://alive2.llvm.org/ce/z/_TSvRr * Both main and epilogue vector loops execute, go to scalar loop: https://alive2.llvm.org/ce/z/CsPj5v * Only epilogue vector loop executes, go to exit block: https://alive2.llvm.org/ce/z/5XqkNV * Only epilogue vector loop executes, go to scalar loop: https://alive2.llvm.org/ce/z/JUpqRN The latter 2 show requiring freezing the resume phi. That means we cannot freeze in the preheader. We could move the freeze to the main iteration count check, but that would be a bit fragile to find and other transforms can sink the freeze if needed. Depends on https://github.com/llvm/llvm-project/pull/132689 and https://github.com/llvm/llvm-project/pull/132690. Fixes https://github.com/llvm/llvm-project/issues/126836 PR: https://github.com/llvm/llvm-project/pull/132691	2025-04-04 11:48:01 +01:00
Florian Hahn	cdff7f0b6e	[LV] Retrieve middle VPBB via scalar ph to fix epilogue resumephis (NFC) If ScalarPH has predecessors, we may need to update its reduction resume values. If there is a middle block, it must be the first predecessor. Note that the first predecessor may not be the middle block, if the middle block doesn't branch to the scalar preheader. In that case, fixReductionScalarResumeWhenVectorizingEpilog will be a no-op. In preparation for https://github.com/llvm/llvm-project/pull/106748.	2025-04-03 21:46:48 +01:00
Alexey Bataev	daab7d0807	[SLP]Initial support for (masked)loads + compress and (masked)interleaved Added initial support for (masked)loads + compress and (masked)interleaved loads. Reviewers: RKSimon, hiraditya Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/132099	2025-04-03 13:17:40 -07:00
Alexey Bataev	7c4013d591	Revert "[SLP]Initial support for (masked)loads + compress and (masked)interleaved" This reverts commit 0bec0f5c059af5f920fe22ecda469b666b5971b0 to fix a crash reported in https://lab.llvm.org/buildbot/#/builders/143/builds/6668.	2025-04-03 12:58:49 -07:00
Alexey Bataev	0bec0f5c05	[SLP]Initial support for (masked)loads + compress and (masked)interleaved Added initial support for (masked)loads + compress and (masked)interleaved loads. Reviewers: RKSimon, hiraditya Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/132099	2025-04-03 13:21:22 -04:00
Ramkumar Ramachandra	6bbdc70066	[LV] Use getCallWideningDecision in more places (NFC) (#134236 )	2025-04-03 14:53:19 +01:00
Florian Hahn	380defd4b3	[VPlan] Update VPInterleaveRecipe to take debug loc directly as arg (NFC)	2025-04-02 22:46:38 +01:00
Florian Hahn	4b67c53e20	[VPlan] Use recipe debug loc instead of instr DLs in more cases (NFC) Update both VPInterleaveRecipe and VPReplicateRecipe codegen to use debug location directly from the recipe, not the underlying instruction. This removes another dependency on underlying instructions.	2025-04-02 21:51:17 +01:00
vporpo	a1b0b4997e	[SandboxVec][NFC] Replace std::regex with llvm::Regex (#134110 )	2025-04-02 13:46:56 -07:00
Krzysztof Drewniak	554859c736	[TTI] Make isLegalMasked{Load,Store} take an address space (#134006 ) In order to facilitate targets that only support masked loads/stores on certain address spaces (AMDGPU will support them in an upcoming patch, but only for address space 7), add an AddressSpace parameter to isLegalMaskedLoad and isLegalMaskedStore	2025-04-02 15:38:10 -05:00
Alexey Bataev	843ef77dc2	[SLP]Update mapping between values and their matching entries upon selection Need to update the mapping between gathered values and their matching entries, if the list of the entries is updated and only some of them are selected for final shuffling. Fixes #134085	2025-04-02 11:59:32 -07:00
Alexey Bataev	48a4b14cb6	[SLP]Fix whole vector registers calculations for compares Need to check that the calculated number of the elements is not larger than the original number of scalars to prevent a compiler crash. Fixes #134013	2025-04-02 07:26:40 -07:00
Han-Kuan Chen	5bbcc765cc	[SLP][REVEC] getNumElements should not be used as VF when REVEC is enabled. (#134031 )	2025-04-02 19:04:07 +08:00
Luke Lau	8107b430ed	[VPlan] Simplify select c, x, x -> x (#133731 ) As noted in 1a9358c090d0507be21c5e9b2d97a23ef1de8ab0, some simplifications can produce a redundant select where the true and false operands are the same, which this patch removes. The is_fpclass test was changed so the condition wasn't made dead.	2025-04-02 10:26:48 +01:00
Alexey Bataev	0e3049c562	[SLP]Support revectorization of the previously vectorized scalars If a scalar instruction is marked for vectorization in the tree, it generally cannot be vectorized as part of another node in the same tree. This may prevent some potentially profitable vectorization opportunities, since some nodes end up being buildvector/gather nodes, which add to the total cost. The patch allows revectorization of the previously vectorized scalars. Reviewers: hiraditya, RKSimon Reviewed By: RKSimon, hiraditya Pull Request: https://github.com/llvm/llvm-project/pull/133091	2025-04-01 14:30:06 -04:00
Ningning Shi(史宁宁)	6b647de031	[NFC] Remove the unused hasMinSize() (#133838 ) The 'hasOptSize()' is 'hasFnAttribute(Attribute::OptimizeForSize) || hasMinSize()', so we don't need another 'hasMinSize()'.	2025-04-01 15:23:34 +08:00
Alexey Bataev	cf6a452cc7	[SLP]Fix same/alternate analysis in split node analysis for compares In some cases getSameOpcode may consider 2 compares as having the same opcode, even though they were previously considered alternate. This can happen because getSameOpcode loses info about previous instructions and their states. The isAlternateInstruction function needs to be used instead for correct analysis. Reviewers: RKSimon, hiraditya Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/133769	2025-03-31 19:33:40 -04:00
Luke Lau	6afe5e5d1a	[LV][EVL] Peek through combination tail-folded + predicated masks (#133430 ) If a recipe was predicated and tail folded at the same time, it will have a mask like: EMIT vp<%header-mask> = icmp ule canonical-iv, backedge-tc; EMIT vp<%mask> = logical-and vp<%header-mask>, vp<%pred-mask>. When converting to an EVL recipe, if the mask isn't exactly just the header-mask we copy the whole logical-and. We can remove this redundant logical-and (because it's now covered by EVL) and just use vp<%pred-mask> instead. This lets us remove the widened canonical IV in more places.	2025-03-31 21:28:39 +01:00
Luke Lau	b739a3cb65	[VPlan] Add m_Deferred. NFC (#133736 ) This copies over the implementation of m_Deferred which allows matching values that were bound in the pattern, and uses it for the (X && Y) || (X && !Y) -> X simplification.	2025-03-31 21:01:28 +01:00
Alexey Bataev	bfd8cc0a3e	[SLP]Fix a check for the whole register use Need to check the value type, not the return type, of the instructions, when doing the analysis for the whole register use to prevent a compiler crash. Fixes #133751	2025-03-31 10:52:12 -07:00
Alexey Bataev	78777a204a	[LV]Split store-load forward distance analysis from other checks, NFC (#121156 ) The patch splits the store-load forwarding distance analysis from other dependency analysis in LAA. Currently it supports only power-of-2 distances; the split is required to support non-power-of-2 distances in the future. Part of #100755	2025-03-31 07:28:44 -04:00
Florian Hahn	809f857d2c	[VPlan] Support early-exit loops in optimizeForVFAndUF. (#131539 ) Update optimizeForVFAndUF to support early-exit loops by handling BranchOnCond(Or(..., CanonicalIV == TripCount)) via SCEV PR: https://github.com/llvm/llvm-project/pull/131539	2025-03-31 07:55:48 +01:00
Kazu Hirata	2fc08d4c31	[Vectorize] Use DenseMap::insert_range (NFC) (#133656 )	2025-03-30 22:57:45 -07:00
Han-Kuan Chen	65734de9b9	[SLP] NFC. Remove the redundant MainOp and AltOp find process. (#133642 )	2025-03-31 10:26:45 +08:00
Florian Hahn	5f56eaff8b	[VPlan] Remove duplicated VPDerivedIVRecipe handling (NFC). Also handled by an earlier, more general if above.	2025-03-30 22:27:44 +01:00
Florian Hahn	fd8fb71486	[VPlan] Handle scalar casts and blend in isUniformAfterVectorization. Currently should be NFC, but will be used by https://github.com/llvm/llvm-project/pull/117506.	2025-03-30 22:21:12 +01:00
Florian Hahn	424c8f9217	[VPlan] Remove dead UF argument from VPTransformState ctor (NFC).	2025-03-30 17:31:00 +01:00
Kazu Hirata	d8b078d550	[Transforms] Use llvm::append_range (NFC) (#133607 )	2025-03-29 18:57:50 -07:00
Florian Hahn	6b98134466	[VPlan] Re-enable narrowing interleave groups with interleaving. Remove the UF = 1 restriction introduced by 577631f0a5 building on top of 783a846507683, which allows updating all relevant users of the VF, VPScalarIVSteps in particular. This restores the full functionality of https://github.com/llvm/llvm-project/pull/106441.	2025-03-29 20:14:10 +00:00
vporpo	3db5be79d2	[SandboxVec] Add print-region pass (#131019 ) This patch implements a simple printing pass for regions. This is meant to be used in tests and for debugging.	2025-03-29 10:42:51 -07:00
Florian Hahn	77913b5d1d	[VPlan] Add instantiation of VPUnrollPartAccessor<3> to fix link error. Fix link errors with GCC by providing an explicit instantiation.	2025-03-28 22:05:54 +00:00
Florian Hahn	783a846507	[VPlan] Add VF as operand to VPScalarIVStepsRecipe. Similarly to other recipes, update VPScalarIVStepsRecipe to also take the runtime VF as argument. This removes some unnecessary runtime VF computations for scalable vectors. It will also allow dropping the UF == 1 restriction for narrowing interleave groups required in 577631f0a528.	2025-03-28 21:48:59 +00:00
Alexey Bataev	1bfc61064a	[SLP]Fix spill cost analysis for split vectorized nodes If the entry is SplitVectorize, it can be skipped in favor of its operands, since the operands allow spill costs to be detected correctly. Fixes #133288	2025-03-28 12:45:53 -07:00
Ramkumar Ramachandra	4c4e4e4299	[LV] Strengthen calls to collectInstsToScalarize (NFC) (#130642 ) Avoid the pattern of always calling collectInstsToScalarize after collectUniformsAndScalars, and call it in collectUniformsAndScalars instead. Also strengthen checks for early exits in the function.	2025-03-28 17:27:57 +00:00
Hari Limaye	bf5627c85e	[LV] Optimize VPWidenIntOrFpInductionRecipe for known TC (#118828 ) Optimize the IR generated for a VPWidenIntOrFpInductionRecipe to use the narrowest type necessary, when the trip-count of a loop is known to be constant and the only use of the recipe is the condition used by the vector loop's backedge branch.	2025-03-28 14:47:40 +00:00
Florian Hahn	7b75db5755	[VPlan] Add new VPIRPhi overlay for VPIRInsts wrapping phi nodes (NFC). (#129387 ) Add a new VPIRPhi subclass of VPIRInstruction, that purely serves as an overlay, to provide more convenient checking (via directly doing isa/dyn_cast/cast) and specialized execute/print implementations. Both VPIRInstruction and VPIRPhi share the same VPDefID, and are differentiated by the backing IR instruction. This pattern could also be used to provide more specialized interfaces for some VPInstruction opcodes, without introducing new, completely separate recipes. An example would be modeling VPWidenPHIRecipe & VPScalarPHIRecipe using VPInstruction opcodes and providing an interface to retrieve incoming blocks and values through a VPInstruction subclass similar to VPIRPhi. PR: https://github.com/llvm/llvm-project/pull/129387	2025-03-28 08:43:46 +00:00
Kazu Hirata	673f4705a8	[llvm] Use Set::insert_range (NFC) (#133353 ) We can use Set::insert_range to collapse: for (auto Elem : Range) Set.insert(Elem.first); down to: Set.insert_range(llvm::make_first_range(Range)); In some cases, we can further fold that into the set declaration.	2025-03-27 20:44:20 -07:00
Florian Hahn	5eccd71ce4	[VPlan] Add assertion ensuring Plan's UF matches BestUF (NFC).	2025-03-27 19:29:55 +00:00
Florian Hahn	8ddbc01295	[VPlan] Manage FindLastIV start value in ComputeFindLastIVResult (NFC) (#132690 ) Keep the start value as operand of ComputeFindLastIVResult. A follow-up patch will use this to make sure the start value is frozen if needed. Depends on https://github.com/llvm/llvm-project/pull/132689 PR: https://github.com/llvm/llvm-project/pull/132690	2025-03-27 18:34:13 +00:00
Kazu Hirata	cde58bfc16	[Transforms] Use range constructors of *Set (NFC) (#133203 )	2025-03-27 07:51:58 -07:00
Simon Pilgrim	1715386e80	Fix MSVC signed/unsigned comparison warning. NFC.	2025-03-27 08:56:21 +00:00
Florian Hahn	2c7d40b2f0	[VPlan] Generalize SCALAR-STEPS removal to any unroll factor. Follow-up to dfca6c0d3bf9d1a056 to extend isUnrolled to handle any unrolled VPlan, which means there's a single UF, but it will be > 1 if unrolling took place.	2025-03-26 21:03:50 +00:00
David Green	de1c2f24bc	[LoopVectorizer][AArch64] Move getMinTripCountTailFoldingThreshold later. (#132170 ) This moves the checks of MinTripCountTailFoldingThreshold later, during the calculation of whether to tail fold. This allows it to check beforehand whether tail predication is required, either for scalable or fixed-width vectors. This option is only specified for AArch64, where it returns the minimum of 5. This patch aims to allow the vectorization of TC=4 loops, preventing them from performing slower when SVE is present.	2025-03-26 19:35:08 +00:00
David Sherwood	1c9fe8c8af	[LV] Optimise users of induction variables in early exit blocks (#130766 ) This is the second of two PRs that attempt to improve the IR generated in the exit blocks of vectorised loops with uncountable early exits. It follows on from PR #128880. In this PR I am improving the generated code for users of induction variables in early exit blocks. This required using a newly added VPInstruction called FirstActiveLane, which calculates the index of the first active predicate in the mask operand. I have added a new function optimizeEarlyExitInductionUser that is called from optimizeInductionExitUsers when handling users in early exit blocks.	2025-03-26 12:09:59 +00:00
Walter Lee	fed4727187	Mark maybe_unused variable (#133069 ) ... to avoid -Wunused-variable warnings/errors when assertions are off.	2025-03-26 11:51:09 +00:00
Florian Hahn	420c056f85	[VPlan] Add ComputeFindLastIVResult opcode (NFC). (#132689 ) This moves the logic for computing the FindLastIV reduction result to its own opcode. A follow-up patch will update the new opcode to also take the start value, to fix https://github.com/llvm/llvm-project/issues/126836. PR: https://github.com/llvm/llvm-project/pull/132689	2025-03-26 10:49:09 +00:00
Martin Storsjö	a2e5932e8b	Revert "[SLP] Make getSameOpcode support interchangeable instructions. (#132887 )" This reverts commit 6e66cfeeaec6f09a4454400e45d690457ecdd3de. This change causes crashes on compiling some inputs, see https://github.com/llvm/llvm-project/pull/127450#issuecomment-2752833710 and https://github.com/llvm/llvm-project/pull/127450#issuecomment-2753375326 for details.	2025-03-26 10:24:25 +02:00
Kazu Hirata	e87921304b	[Vectorize] Avoid repeated hash lookups (NFC) (#132661 ) Co-authored-by: Florian Hahn <flo@fhahn.com>	2025-03-25 15:18:15 -07:00
Florian Hahn	577631f0a5	Reapply "[VPlan] Add transformation to narrow interleave groups. (#106441 )" This reverts commit ff3e2ba9eb94217f3ad3525dc18b0c7b684e0abf. The recommitted version limits the transform to cases where no interleaving is taking place, to avoid a mis-compile when interleaving. Original commit message: This patch adds a new narrowInterleaveGroups transform, which tries to convert a plan with interleave groups with VF elements to a plan that instead replaces the interleave groups with wide loads and stores processing VF elements. This effectively is a very simple form of loop-aware SLP, where we use interleave groups to identify candidates. This initial version is quite restricted and hopefully serves as a starting point for how to best model those kinds of transforms. Depends on https://github.com/llvm/llvm-project/pull/106431. Fixes https://github.com/llvm/llvm-project/issues/82936. PR: https://github.com/llvm/llvm-project/pull/106441	2025-03-25 20:57:10 +00:00
Ramkumar Ramachandra	8fb802e995	[LV] Improve code in collectInstsToScalarize (NFC) (#130643 )	2025-03-25 16:52:13 +00:00
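
Two of the VPlan commits listed above (8107b430ed and b739a3cb65) rely on small boolean identities: select c, x, x -> x and (X && Y) || (X && !Y) -> X. The sketch below is not code from those patches. It is a minimal, hypothetical check over plain C++ bools, and the function names (selectModel, selectSameOperandsFolds, orOfAndsFolds, verifySimplifications) are made up for illustration. Plain bools also do not model poison, which the real VPlan logical-and has to account for, so this only confirms the identities for well-defined inputs.

```cpp
#include <cassert>

// Plain-bool stand-in for `select c, t, f` (assumption: no poison/undef).
bool selectModel(bool C, bool TrueVal, bool FalseVal) {
  return C ? TrueVal : FalseVal;
}

// Checks select c, x, x == x for every boolean c and x.
bool selectSameOperandsFolds() {
  for (int C = 0; C < 2; ++C)
    for (int X = 0; X < 2; ++X)
      if (selectModel(C != 0, X != 0, X != 0) != (X != 0))
        return false;
  return true;
}

// Checks (X && Y) || (X && !Y) == X for every boolean X and Y.
bool orOfAndsFolds() {
  for (int XI = 0; XI < 2; ++XI)
    for (int YI = 0; YI < 2; ++YI) {
      bool X = XI != 0;
      bool Y = YI != 0;
      if (((X && Y) || (X && !Y)) != X)
        return false;
    }
  return true;
}

// Runs both exhaustive checks; aborts via assert if either identity fails.
void verifySimplifications() {
  assert(selectSameOperandsFolds());
  assert(orOfAndsFolds());
}
```

Calling verifySimplifications() from a test or a main function exercises all input combinations; there are only four per identity, so exhaustive checking is cheap here.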

5804 Commits