llvm-project

Author	SHA1	Message	Date
Alexey Bataev	9400270449	[SLP]Fix comparator for vector operands of extractelements in PHICompare Need to make the comparator follow strict weak ordering to fix compiler crashes. Fixes #138178	2025-05-01 14:28:20 -07:00
Florian Hahn	8d02529f77	[VPlan] Consistently use ArrayRef<VPValue *> for operands in ctors (NFC) (#137798) Now that there is an ArrayRef constructor taking iterator_ranges, use it consistently to slightly simplify code. Depends on https://github.com/llvm/llvm-project/pull/137796. PR: https://github.com/llvm/llvm-project/pull/137798	2025-05-01 21:19:10 +01:00
Samuel Tebbs	fa769655e7	[LV] NFC: Make VPPartialReductionRecipe a VPReductionRecipe	2025-04-30 19:44:40 +01:00
Luke Lau	2cd829fc2c	[VectorUtils][VPlan] Consolidate VPWidenIntrinsicRecipe::onlyFirstLaneUsed and isVectorIntrinsicWithScalarOpAtArg (#137497) We can reuse isVectorIntrinsicWithScalarOpAtArg in VectorUtils to determine if only the first lane will be used for a VPWidenIntrinsicRecipe, provided that we also move the VP EVL operand check into it. This was needed by a local patch I was working on that created a VPWidenIntrinsicRecipe with a VP intrinsic, and prevents the need to update the scalar arguments in two places.	2025-05-01 01:25:41 +08:00
Jonas Paulsson	f5c8c1eedb	[SLPVectorizer] Move X86 specific handling into X86TTIImpl. (#137830) `ad9909d "[SLP]Fix perfect diamond match with extractelements in scalars"` changed SLPVectorizer getScalarizationOverhead() to call TTI.getVectorInstrCost() instead of TTI.getScalarizationOverhead() in some cases. This was due to X86 specific handlings in these (overridden) methods, and unfortunately the general preference of TTI.getScalarizationOverhead() was dropped. If VL is available it should always be preferred to use getScalarizationOverhead(), and this is indeed the case for SystemZ which has a special insertion instruction that can insert two GPR64s. Then `33af951 "[SLP]Synchronize cost of gather/buildvector nodes with codegen"` reworked SLPVectorizer getGatherCost() which together with ad9909d caused the SystemZ test vec-elt-insertion.ll to fail. This patch restores the SystemZ test and reverts the change in SLPVectorizer getScalarizationOverhead() so that TTI.getScalarizationOverhead() is always called again. The ForPoisonSrc argument is now passed on to the TTI method so that X86 can handle this as required. Fixes: #135346	2025-04-30 17:11:27 +02:00
Florian Hahn	d431921677	Revert "[VPlan] Add canonical IV during construction (NFC)." This reverts commit e17122fffa8d233fcf9f717354ecda46173f1b8d. Revert as this seems to break some unit tests on some bots.	2025-04-29 22:55:11 +01:00
Florian Hahn	e17122fffa	[VPlan] Add canonical IV during construction (NFC). This addresses an existing TODO and simply moves the current code to add canonical IV recipes to the initial skeleton construction, at the same place where the corresponding region will be introduced.	2025-04-29 22:38:59 +01:00
Florian Hahn	7e71466900	[VPlan] Preserve dbg location on canonical IVs in native path. Pass the debug location of the primary IV to addCanonicalIVRecipes in the native path, matching the behavior of inner loop vectorization.	2025-04-29 21:40:42 +01:00
Luke Lau	c5d780bb72	[VPlan] Remove no longer needed VP intrinsic handling in VPWidenIntrinsicRecipe::computeCost. NFCI (#137573) Whenever calls were transformed to VP intrinsics with EVL tail folding in #110412, this workaround was added in computeCost to avoid an assertion when checking ICA.getArgs(). However it turned out that the actual arguments were never used and this assertion was removed in #115983 afterwards, so it's now fine to leave the arguments empty and use the type based cost instead. The type based cost and value based cost are the same for these VP intrinsics. This was tested by adding back in the transformation code in #110412 and checking that no assertions were still hit.	2025-04-29 21:55:20 +08:00
Gaëtan Bossu	c5c4f0d11c	[SLP] Simplify tryToFindDuplicates() (NFC) (#135766) This NFC aims to simplify the control-flow and interfaces used in tryToFindDuplicates(). The point is to make it easier to understand where decisions for scalar de-duplication are made. In particular: - Limit indentation - Rename some variables to better match their use case - Always give consistent outputs for VL and ReuseShuffleIndices. This makes it possible to use the same code for building gather TreeEntry everywhere. This also allows removing the TryToFindDuplicates lambda.	2025-04-29 14:47:22 +01:00
Luke Lau	b0f2bfc7e4	[VPlan] Use correct non-FMF constructor in VPInstructionWithType createNaryOp (#137632) Currently if we try to create a VPInstructionWithType without a FMF via VPBuilder::createNaryOp we will use the constructor that asserts `assert(isFPMathOp() && "this op can't take fast-math flags");`. This fixes it by checking if FMFs have a value, similar to the other createNaryOp overloads. This is needed by #129508	2025-04-29 20:35:19 +08:00
Florian Hahn	d68b446933	[IR] Add matchers for remaining FP min/max intrinsics (NFC). (#137612) Add dedicated matchers for minimum,maximum,minimumnum and maximumnum intrinsics, similar to the existing matchers for maxnum and minnum. As suggested in https://github.com/llvm/llvm-project/pull/137335. PR: https://github.com/llvm/llvm-project/pull/137612	2025-04-29 12:20:00 +01:00
Ramkumar Ramachandra	49842426f3	[LV] Fix MinBWs in WidenIntrinsic case (#137005) There is a bug in the computation and handling of MinBWs in the case of VPWidenIntrinsicRecipe: a crash is observed in VPlanTransforms::truncateToMinimalBitwidths due to a mismatch between the number of recipes processed and the number of entries in MinBWs. Fix handling of calls in llvm::computeMinimumValueSizes, and handle VPWidenIntrinsicRecipe in truncateToMinimalBitwidths, fixing the bug. Fixes #87407.	2025-04-29 09:47:38 +01:00
Luke Lau	7ca6490636	[VPlan] Factor out isUnrolled() helper in VPWidenIntOrFpInductionRecipe. NFC (#137635) Split off from #129508, this generalizes getSplatVFValue and getLastUnrolledPartOperand so they don't need to be changed if another operand is added.	2025-04-29 10:18:47 +08:00
Florian Hahn	d2ce88a939	[VPlan] Create initial skeleton before creating regions. (NFC) Move out the logic to prepare for vectorization to a separate transform, before creating loop regions. This was discussed as follow-up in https://github.com/llvm/llvm-project/pull/136455. This just moves the existing code around slightly and will simplify follow-up patches to include the exiting edges during initial VPlan construction.	2025-04-28 21:51:32 +01:00
Florian Hahn	043b04acff	Reapply "[VPlan] Fold NOT into predicate of wide compares." (#130347) This reverts commit 8dd160f4767f971572eac065c8650d9202ff5bf9. The recommit contains an adjustment to planContainsAdditionalSimplifications, which considers changes to the original predicate for compares. Original commit message: Add simplification to fold negation into a compare, if the negation is the only user of the compare. This removes a number of redundant negations. Alive2 Proofs for FPCMP test changes: https://alive2.llvm.org/ce/z/WGDz9U PR: https://github.com/llvm/llvm-project/pull/129430	2025-04-28 20:01:37 +01:00
Alexey Bataev	73d90ec825	[SLP][NFC]Consider non-profitable trees with only phis, gathers, splits and small nodes with reuses Improves compile time for non-profitable cases. Fixes #135965	2025-04-28 03:56:08 -07:00
Florian Hahn	ec1016f7ef	[IVDescriptors] Support reductions with minimumnum/maximumnum. (#137335) Add a new reduction recurrence kind for reductions with minimumnum/maximumnum. Such reductions can be vectorized without nsz/nnans, same as reductions with maximum/minimum intrinsics. Note that a new reduction kind is needed to make sure partial reductions are also combined with minimumnum/maximumnum. Note that the final reduction to a scalar value is performed with vector.reduce.fmin/fmax. This should be fine, as the results of the partial reductions with maximumnum/minimumnum silence any sNaNs. In-loop and reductions in SLP are not supported yet, as there's no reduction version of maximumnum/minimumnum yet and fmax may be incorrect. PR: https://github.com/llvm/llvm-project/pull/137335	2025-04-28 11:16:36 +01:00
Florian Hahn	92bfbbc4e5	[VPlan] Invert condition if needed when creating inner regions. (#132292) As pointed out by @iamlouk in https://github.com/llvm/llvm-project/pull/129402, the current code doesn't handle latches with different successor orders correctly. Introduce a `NOT`, if needed. Depends on https://github.com/llvm/llvm-project/pull/129402 PR: https://github.com/llvm/llvm-project/pull/132292	2025-04-28 09:40:43 +01:00
Luke Lau	92c3af7c3e	[VPlan] Use correct constructor when cloning VPWidenIntrinsicRecipe without underlying CallInst (#137493) I noticed this when working on a patch downstream, and I don't think this is an issue upstream yet. But if a VPWidenIntrinsicRecipe is created without an underlying CallInst, e.g. in createEVLRecipe, it will crash if you try to clone it because it assumes the CallInst always exists. This fixes it by using the CallInst-less constructor in this case.	2025-04-28 10:08:45 +08:00
Kazu Hirata	5cfd81b0cc	[llvm] Use range constructors of *Set (NFC) (#137552)	2025-04-27 15:59:57 -07:00
Kazu Hirata	1f56716a7e	[llvm] Use hash_combine_range with ranges (NFC) (#137530)	2025-04-27 12:31:28 -07:00
Florian Hahn	2e934170b0	[LV] Remove LoopVectorizationLegality from InnerLoopVectorizer (NFC). a51e28278 removed the last real use of Legal in InnerLoopVectorizer. Now that it isn't used any longer, remove it to avoid new users being introduced.	2025-04-27 20:30:48 +01:00
Florian Hahn	826f237cb4	[VPlan] Don't added separate vector latch block (NFC). Simplify initial VPlan construction by not creating a separate vector.latch block, which isn't needed and will get folded away later. This has been suggested as independent clean-up multiple times.	2025-04-26 22:03:18 +01:00
Florian Hahn	c4d84e1b00	[VPlan] Use replaceSuccessor/replacePredecessor in insertBlock (NFC). Use replaceSuccessor/replacePredecessor in insertBlockAfter/insertBlockBefore. This preserves the predecessor order, which in turn is needed to not invalidate existing phi recipes. At the moment this is NFC, but enables additional uses in the future.	2025-04-25 20:46:10 +01:00
Florian Hahn	df21288247	[VPlan] Replace ExtractFromEnd with Extract(Last|Penultimate)Element (NFC). (#137030) ExtractFromEnd only has 2 uses, extracting the last and penultimate elements. Replace it with 2 separate opcodes, removing the need to materialize and handle a constant argument. PR: https://github.com/llvm/llvm-project/pull/137030	2025-04-25 16:27:29 +01:00
Matt Arsenault	4ea2278e39	SLPVectorizer: Use use_empty instead of hasNUses(0) (#137336)	2025-04-25 17:27:01 +02:00
Florian Hahn	7cce38beea	[VPlan] Remove dead SE argument from handleUncountableEarlyExit (NFC). ScalarEvolution is not used by the function, remove the dead arg.	2025-04-24 19:59:05 +01:00
Alexey Bataev	a7a74b349d	[SLP]Improve reordering of the alternate nodes Better to preserve the original order of the alternate nodes to avoid inter-lane shuffling, select/insert subvector patterns provide better perf. Reviewers: RKSimon, hiraditya Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/136329	2025-04-24 14:33:10 -04:00
Alexey Bataev	f427890a1d	[SLP]Fix PHI comparator to make it follow weak strict ordering restriction Fixes #137164	2025-04-24 11:08:17 -07:00
Simon Pilgrim	f572a5951a	[VectorCombine] Ensure canScalarizeAccess handles cases where the index type can't represent all inbounds values Fixes #132563	2025-04-24 14:17:55 +01:00
Florian Hahn	06d4876982	[VPlan] Replace checking IR loop with checking VPlan predecessors (NFC). Update check to use VPEarlyExitBlock's predecessors, which removes a dependence on underlying IR and is more in line with the comment below.	2025-04-24 12:29:34 +01:00
Florian Hahn	5d136f90a9	[VPlan] Manage instruction metadata in VPlan. (#135272) Add a new helper to manage IR metadata that can be propagated to generated instructions for recipes. This helps to remove a number of remaining uses of getUnderlyingInstr during VPlan execution. PR: https://github.com/llvm/llvm-project/pull/135272	2025-04-24 11:57:19 +01:00
Luke Lau	3883b27ba8	[VPlan] Fix typo in assertion. NFC (#137009)	2025-04-24 16:36:32 +08:00
Florian Hahn	e268f71c59	[VPlan] Remove unneeded early continue. (NFC) As suggested in https://github.com/llvm/llvm-project/pull/136455, now unreachable exit blocks won't have any phi nodes.	2025-04-24 08:59:30 +01:00
Florian Hahn	15bb1db4a9	[VPlan] Remove ILV::sinkScalarOperands. (#136023) Remove legacy ILV sinkScalarOperands, which is superseded by the sinkScalarOperands VPlan transforms. There are a few cases that aren't handled by VPlan's sinkScalarOperands, because those recipes don't support replication. Those are pointer inductions and blends. We could probably improve this further, by allowing replication for more recipes, but I don't think the extra complexity is warranted. Depends on https://github.com/llvm/llvm-project/pull/136021. PR: https://github.com/llvm/llvm-project/pull/136023	2025-04-24 08:37:49 +01:00
Florian Hahn	71f2c1e204	[VPlan] Use early exit in ::extractLastLaneOfFirstOperand (NFC). Reduce indent level, as suggested in https://github.com/llvm/llvm-project/pull/136455.	2025-04-23 21:55:35 +01:00
Florian Hahn	ff36508d21	[VPlan] Remove redundant setting of parent in createLoopRegion (NFC). The regions' parents will be set when the parents are set after creating the parent region.	2025-04-23 21:45:15 +01:00
Florian Hahn	3fbbe9b8d0	[VPlan] Add exit phi operands during initial construction (NFC). (#136455) Add incoming exit phi operands during the initial VPlan construction. This ensures all users are added to the initial VPlan and is also needed in preparation for retaining exiting edges during initial construction. PR: https://github.com/llvm/llvm-project/pull/136455	2025-04-23 20:40:42 +01:00
Ramkumar Ramachandra	bdf21ca8ac	[LV] Fix missing entry in willGenerateVectors (#136712) willGenerateVectors switches on opcodes of a recipe, but Histogram is missing in the switch statement, which could cause a crash in some cases. The crash was initially observed when developing another patch.	2025-04-23 19:06:38 +01:00
Nicholas Guy	1ce709cb84	[LV] Fix crash when building partial reductions using types that aren't known scale factors (#136680)	2025-04-23 13:19:18 +01:00
David Green	98b6f8dc69	[CostModel] Remove optional from InstructionCost::getValue() (#135596) InstructionCost is already an optional value, containing an Invalid state that can be checked with isValid(). There is little point in returning another optional from getValue(). Most uses do not make use of it being a std::optional, dereferencing the value directly (either isValid has been checked previously or the Cost is assumed to be valid). The one case that does in AMDGPU used value_or which has been replaced by an isValid() check.	2025-04-23 07:46:27 +01:00
Florian Hahn	bf33f03f5a	[VPlan] Move Ingredient accesses closer to uses (NFC). Move accessing Ingredient closer to its only use for VPWidenMemoryRecipes.	2025-04-22 20:26:15 +01:00
Alexey Bataev	f52b01b6cf	[SLP][NFC]Rename functions/variables, limit visibility to meet the coding standards, NFC	2025-04-22 09:56:31 -07:00
Alexey Bataev	9c388f1f05	[SLP]Prefer segmented/deinterleaved loads to strided and fix codegen Need to estimate, which one is preferable, deinterleaved/segmented loads or strided. Segmented loads can be combined, improving the overall performance. Reviewers: RKSimon, hiraditya Reviewed By: hiraditya, RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/135058	2025-04-22 12:11:01 -04:00
Alexey Bataev	0252d338fa	[SLP]Model single unique value insert + shuffle as splat + select, where profitable When we have the remaining unique scalar, that should be inserted into non-poison vector and into non-zero position: ``` %vec1 = insertelement %vec, %v, pos1 %res = shuffle %vec1, poison, <0, 1, 2,..., pos1, pos1 + 1, ..., pos1, ...> ``` better to estimate if it is profitable to model it as is or model it as: ``` %bv = insertelement poison, %v, 0 %splat = shuffle %bv, poison, <poison, ..., 0, ..., 0, ...> %res = shuffle %vec, %splat, <0, 1, 2,..., pos1 + VF, pos1 + 1, ...> ``` Reviewers: preames, hiraditya, RKSimon Reviewed By: preames Pull Request: https://github.com/llvm/llvm-project/pull/136590	2025-04-22 11:30:29 -04:00
Luke Lau	980531cac0	[VPlan] Fix MayReadFromMemory/MayWriteToMemory on VPWidenIntrinsicRecipe (#136684) These seem to be the wrong way round, e.g. see the definition at Instruction::mayReadFromMemory(). If an instruction only writes to memory then it's known to not read memory, and so on. Only noticed this when using VPWidenIntrinsicRecipe in a local patch and wondered why it kept on getting DCEd despite the intrinsic writing to memory.	2025-04-22 22:50:12 +08:00
David Green	d20604e5b6	[CostModel] Plumb CostKind into getExtractWithExtendCost (#135523) This will likely not affect much with the current uses of the function, but if we have getExtractWithExtendCost we can plumb CostKind through it in the same way as other costmodel functions.	2025-04-22 15:09:43 +01:00
David Sherwood	ef72b93626	[LV] Use requested calling convention for vector math routines (#136122) Some vector math routines, e.g. ArmPL, specify a particular calling convention on the routines which can help improve performance by specifying what registers have to be preserved across the call.	2025-04-22 09:33:52 +01:00
Florian Hahn	8c83355d5b	[VPlan] Handle VPIRPhi in VPRecipeBase::isPhi (NFC). Also handle VPIRPhi in VPRecipeBase::isPhi, to simplify existing code dealing with VPIRPhis. Suggested as part of https://github.com/llvm/llvm-project/pull/136455.	2025-04-21 21:04:20 +01:00
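
Commits 9400270449 and f427890a1d both fix SLP PHI comparators that broke the strict weak ordering contract that sorting relies on. The sketch below is a minimal, hypothetical C++ illustration of that contract, not the actual PHICompare code: `PhiKey`, `brokenLess`, `strictLess`, `isStrictWeakOrdering`, and `checkedSort` are invented names. It contrasts a comparator built on `<=` (which is not irreflexive) with a lexicographic `std::tie` comparison, and adds a brute-force checker for the ordering axioms.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a PHI node: only the fields a comparator inspects.
struct PhiKey {
  unsigned TypeID;
  unsigned Opcode;
};

// BROKEN: "<=" makes brokenLess(X, X) true, violating irreflexivity.
// Passing such a comparator to std::sort is undefined behavior.
inline bool brokenLess(const PhiKey &A, const PhiKey &B) {
  return A.TypeID <= B.TypeID;
}

// FIXED: lexicographic comparison of all keys is a strict weak ordering.
inline bool strictLess(const PhiKey &A, const PhiKey &B) {
  return std::tie(A.TypeID, A.Opcode) < std::tie(B.TypeID, B.Opcode);
}

// Brute-force check of the strict weak ordering axioms over a sample.
template <typename Cmp>
bool isStrictWeakOrdering(const std::vector<PhiKey> &Xs, Cmp Less) {
  auto Equiv = [&](const PhiKey &X, const PhiKey &Y) {
    return !Less(X, Y) && !Less(Y, X);
  };
  for (const PhiKey &A : Xs) {
    if (Less(A, A))
      return false; // irreflexivity
    for (const PhiKey &B : Xs) {
      if (Less(A, B) && Less(B, A))
        return false; // asymmetry
      for (const PhiKey &C : Xs) {
        if (Less(A, B) && Less(B, C) && !Less(A, C))
          return false; // transitivity
        if (Equiv(A, B) && Equiv(B, C) && !Equiv(A, C))
          return false; // transitivity of equivalence
      }
    }
  }
  return true;
}

// Debug-time guard: validate the comparator before sorting with it.
template <typename Cmp>
void checkedSort(std::vector<PhiKey> &Xs, Cmp Less) {
  assert(isStrictWeakOrdering(Xs, Less) &&
         "comparator is not a strict weak ordering");
  std::sort(Xs.begin(), Xs.end(), Less);
}
```

The checker is cubic in the input size, so it only makes sense as a debug-time guard on small inputs.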
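
Commit 043b04acff folds a NOT into the predicate of a compare when the NOT is the compare's only user. The underlying identity is that negating a compare equals the same compare with the inverse predicate; for floating-point compares the inverse also flips ordered to unordered, so a naive `olt` to `oge` rewrite is wrong once a NaN is involved. This hypothetical sketch models a few LLVM-style FP predicates with a made-up `FPred` enum and `evalFCmp`/`inverse`/`foldNotIntoCmp` helpers (not LLVM's `CmpInst` API).

```cpp
#include <cassert>
#include <limits>

// Hypothetical subset of FP compare predicates (o = ordered, u = unordered).
enum class FPred { OEQ, UNE, OLT, UGE, OGE, ULT };

// Evaluate a predicate. C++ relational operators already return false when
// either operand is NaN, which matches the "ordered" predicates; the
// "unordered" ones are their negations.
inline bool evalFCmp(FPred P, double A, double B) {
  switch (P) {
  case FPred::OEQ: return A == B;
  case FPred::UNE: return !(A == B);
  case FPred::OLT: return A < B;
  case FPred::UGE: return !(A < B);
  case FPred::OGE: return A >= B;
  case FPred::ULT: return !(A >= B);
  }
  assert(false && "unknown predicate");
  return false;
}

// Inverse predicate: !(A P B) == (A inverse(P) B) for all inputs, NaN included.
inline FPred inverse(FPred P) {
  switch (P) {
  case FPred::OEQ: return FPred::UNE;
  case FPred::UNE: return FPred::OEQ;
  case FPred::OLT: return FPred::UGE;
  case FPred::UGE: return FPred::OLT;
  case FPred::OGE: return FPred::ULT;
  case FPred::ULT: return FPred::OGE;
  }
  assert(false && "unknown predicate");
  return P;
}

// A compare "instruction": predicate plus two operands.
struct FCmp {
  FPred P;
  double A, B;
};

// not(fcmp P A, B)  ==>  fcmp inverse(P) A, B
// (The commit message describes doing this only when the NOT is the sole user.)
inline FCmp foldNotIntoCmp(const FCmp &C) { return {inverse(C.P), C.A, C.B}; }
```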
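
Commit 98b6f8dc69 drops the `std::optional` wrapper from `InstructionCost::getValue()`, since the cost type already carries its own Invalid state. The hypothetical `Cost` class below is a heavily simplified stand-in (not LLVM's `InstructionCost`) showing that design: validity is queried with `isValid()`, `getValue()` returns a plain integer and asserts validity, and a former `value_or(...)` call becomes an explicit `isValid()` check. Making an invalid operand poison the sum is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified cost type that carries its own Invalid state.
class Cost {
  int64_t Value = 0;
  bool Valid = true;

public:
  Cost(int64_t V = 0) : Value(V) {}
  static Cost getInvalid() {
    Cost C;
    C.Valid = false;
    return C;
  }
  bool isValid() const { return Valid; }
  // No std::optional: callers check isValid() first (or assume validity).
  int64_t getValue() const {
    assert(Valid && "getValue() on an invalid cost");
    return Value;
  }
  // Assumption of this sketch: an invalid operand makes the sum invalid.
  Cost operator+(const Cost &RHS) const {
    Cost R(Value + RHS.Value);
    R.Valid = Valid && RHS.Valid;
    return R;
  }
};

// Replacement for optional::value_or: an explicit isValid() check.
inline int64_t valueOr(const Cost &C, int64_t Default) {
  return C.isValid() ? C.getValue() : Default;
}
```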


5950 Commits