5950 Commits

Author SHA1 Message Date
Alexey Bataev
9400270449 [SLP]Fix comparator for vector operands of extractelements in PHICompare
Need to make comparator to follow strict-weak ordering to fix compiler
crashes.

Fixes #138178
2025-05-01 14:28:20 -07:00
Florian Hahn
8d02529f77
[VPlan] Consistently use ArrayRef<VPValue *> for operands in ctors (NFC) (#137798)
Now that there is an ArrayRef constructor taking iterator_ranges, use it
consistently to slightly simplify code.

Depends on https://github.com/llvm/llvm-project/pull/137796.

PR: https://github.com/llvm/llvm-project/pull/137798
2025-05-01 21:19:10 +01:00
Samuel Tebbs
fa769655e7 [LV] NFC: Make VPPartialReductionRecipe a VPReductionRecipe 2025-04-30 19:44:40 +01:00
Luke Lau
2cd829fc2c
[VectorUtils][VPlan] Consolidate VPWidenIntrinsicRecipe::onlyFirstLaneUsed and isVectorIntrinsicWithScalarOpAtArg (#137497)
We can reuse isVectorIntrinsicWithScalarOpAtArg in VectorUtils to
determine if only the first lane will be used for a
VPWidenIntrinsicRecipe, provided that we also move the VP EVL operand
check into it.

This was needed by a local patch I was working on that created a
VPWidenIntrinsicRecipe with a VP intrinsic, and prevents the need to
update the scalar arguments in two places.
2025-05-01 01:25:41 +08:00
Jonas Paulsson
f5c8c1eedb
[SLPVectorizer] Move X86 specific handling into X86TTIImpl. (#137830)
`ad9909d "[SLP]Fix perfect diamond match with extractelements in scalars" `
changed SLPVectorizer getScalarizationOverhead() to call
TTI.getVectorInstrCost() instead of TTI.getScalarizationOverhead() in some
cases. This was due to X86 specific handlings in these (overridden) methods,
and unfortunately the general preference of TTI.getScalarizationOverhead()
was dropped. If VL is available it should always be preferred to use
getScalarizationOverhead(), and this is indeed the case for SystemZ which
has a special insertion instruction that can insert two GPR64s.

Then ` 33af951 "[SLP]Synchronize cost of gather/buildvector nodes with
codegen"` reworked SLPVectorizer getGatherCost() which together with
ad9909d caused the SystemZ test vec-elt-insertion.ll to fail.

This patch restores the SystemZ test and reverts the change in SLPVectorizer
getScalarizationOverhead() so that TTI.getScalarizationOverhead() is always
called again. The ForPoisonSrc argument is now passed on to the TTI method
so that X86 can handle this as required.

Fixes: #135346
2025-04-30 17:11:27 +02:00
Florian Hahn
d431921677
Revert "[VPlan] Add canonical IV during construction (NFC)."
This reverts commit e17122fffa8d233fcf9f717354ecda46173f1b8d.

Revert as this seems to break some unit tests on some bots.
2025-04-29 22:55:11 +01:00
Florian Hahn
e17122fffa
[VPlan] Add canonical IV during construction (NFC).
This addresses an existing TODO and simply moves the current code to add
canonical IV recipes to the initial skeleton construction, at the same
place where the corresponding region will be introduced.
2025-04-29 22:38:59 +01:00
Florian Hahn
7e71466900
[VPlan] Preserve dbg location on canonical IVs in native path.
Pass the debug location of the primary IV to addCanonicalIVRecipes in
the native path, matching the behavior of inner loop vectorization.
2025-04-29 21:40:42 +01:00
Luke Lau
c5d780bb72
[VPlan] Remove no longer needed VP intrinsic handling in VPWidenIntrinsicRecipe::computeCost. NFCI (#137573)
Whenever calls were transformed to VP intrinsics with EVL tail folding
in #110412, this workaround was added in computeCost to avoid an
assertion when checking ICA.getArgs().

However it turned out that the actual arguments were never used and this
assertion was removed in #115983 afterwards, so it's now fine to leave
the arguments empty and use the type based cost instead.

The type based cost and value based cost are the same for these VP
intrinsics.

This was tested by adding back in the transformation code in #110412 and
checking that no assertions were still hit.
2025-04-29 21:55:20 +08:00
Gaëtan Bossu
c5c4f0d11c
[SLP] Simplify tryToFindDuplicates() (NFC) (#135766)
This NFC aims to simplify the control-flow and interfaces used in tryToFindDuplicates(). The point is to make it easier to understand where decisions for scalar de-duplication are made.

In particular:
 - Limit indentation
 - Rename some variables to better match their use case
- Always give consistent outputs for VL and ReuseShuffleIndices. This makes it possible to use the same code for building gather TreeEntry everywhere. This also allows to remove the TryToFindDuplicates lambda.
2025-04-29 14:47:22 +01:00
Luke Lau
b0f2bfc7e4
[VPlan] Use correct non-FMF constructor in VPInstructionWithType createNaryOp (#137632)
Currently if we try to create a VPInstructionWithType without a FMF via
VPBuilder::createNaryOp we will use the constructor that asserts
`assert(isFPMathOp() && "this op can't take fast-math flags");`.

This fixes it by checking if FMFs have a value, similar to the other
createNaryOp overloads.

This is needed by #129508
2025-04-29 20:35:19 +08:00
Florian Hahn
d68b446933
[IR] Add matchers for remaining FP min/max intrinsics (NFC). (#137612)
Add dedicated matchers for minimum,maximum,minimumnum and maximumnum
intrinsics, similar for the existing matchers for maxnum and minnum.

As suggested in https://github.com/llvm/llvm-project/pull/137335.

PR: https://github.com/llvm/llvm-project/pull/137612
2025-04-29 12:20:00 +01:00
Ramkumar Ramachandra
49842426f3
[LV] Fix MinBWs in WidenIntrinsic case (#137005)
There is a bug in the computation and handling of MinBWs in the case of
VPWidenIntrinsicRecipe: a crash is observed in
VPlanTransforms::truncateToMinimalBitwidths due to a mismatch between
the number of recipes processed and the number of entries in MinBWs. Fix
handling of calls in llvm::computeMinimumValueSizes, and handle
VPWidenIntrinsicRecipe in truncateToMinimalBitwidths, fixing the bug.

Fixes #87407.
2025-04-29 09:47:38 +01:00
Luke Lau
7ca6490636
[VPlan] Factor out isUnrolled() helper in VPWidenIntOrFpInductionRecipe. NFC (#137635)
Split off from #129508, this generalizes getSplatVFValue and
getLastUnrolledPartOperand so they don't need changed if another operand
is added.
2025-04-29 10:18:47 +08:00
Florian Hahn
d2ce88a939
[VPlan] Create initial skeleton before creating regions. (NFC)
Move out the logic to prepare for vectorization to a separate transform,
before creating loop regions. This was discussed as follow-up
in https://github.com/llvm/llvm-project/pull/136455.

This just moves the existing code around slightly  and will simplify
follow-up patches to include the exiting edges during initial VPlan
construction.
2025-04-28 21:51:32 +01:00
Florian Hahn
043b04acff
Reapply "[VPlan] Fold NOT into predicate of wide compares." (#130347)
This reverts commit 8dd160f4767f971572eac065c8650d9202ff5bf9.

The recommit contains an adjustment to planContainsAdditionalSimplifications,
which considers changes to the original predicate for compares.

Original commit message:

Add simplification to fold negation into a compare, if the negation is
the only user of the compare. This removes a number of redundant
negations.

Alive2 Proofs for FPCMP test changes:  https://alive2.llvm.org/ce/z/WGDz9U

PR: https://github.com/llvm/llvm-project/pull/129430
2025-04-28 20:01:37 +01:00
Alexey Bataev
73d90ec825 [SLP][NFC]Consider non-profitable trees with only phis, gathers, splits and small nodes with reuses
Improves compile time for non-profitable cases.
Fixes #135965
2025-04-28 03:56:08 -07:00
Florian Hahn
ec1016f7ef
[IVDescriptors] Support reductions with minimumnum/maximumnum. (#137335)
Add a new reduction recurrence kind for reductions with
minimumnum/maximumnum. Such reductions can be vectorized without
nsz/nnans, same as reductions with maximum/minimum intrinsics.

Note that a new reduction kind is needed to make sure partial reductions
are also combined with minimumnum/maximumnum.

Note that the final reduction to a scalar value is performed with
vector.reduce.fmin/fmax. This should be fine, as the results of the
partial reductions with maximumnum/minimumnum silences any sNaNs.

In-loop and reductions in SLP are not supported yet, as there's no
reduction version of maximumnum/minimumnum yet and fmax may be
incorrect.

PR: https://github.com/llvm/llvm-project/pull/137335
2025-04-28 11:16:36 +01:00
Florian Hahn
92bfbbc4e5
[VPlan] Invert condition if needed when creating inner regions. (#132292)
As pointed out by @iamlouk in
https://github.com/llvm/llvm-project/pull/129402, the current code
doesn't handle latches with different successor orders correctly.
Introduce a `NOT`, if needed.

Depends on  https://github.com/llvm/llvm-project/pull/129402

PR: https://github.com/llvm/llvm-project/pull/132292
2025-04-28 09:40:43 +01:00
Luke Lau
92c3af7c3e
[VPlan] Use correct constructor when cloning VPWidenIntrinsicRecipe without underlying CallInst (#137493)
I noticed this when working on a patch downstream, and I don't think
this is an issue upstream yet.

But if a VPWidenIntrinsicRecipe is created without an underlying
CallInst, e.g. in createEVLRecipe, it will crash if you try to clone it
because it assumes the CallInst always exists.

This fixes it by using the CallInst-less constructor in this case.
2025-04-28 10:08:45 +08:00
Kazu Hirata
5cfd81b0cc
[llvm] Use range constructors of *Set (NFC) (#137552) 2025-04-27 15:59:57 -07:00
Kazu Hirata
1f56716a7e
[llvm] Use hash_combine_range with ranges (NFC) (#137530) 2025-04-27 12:31:28 -07:00
Florian Hahn
2e934170b0
[LV] Remove LoopVectorizationLegality from InnerLoopVectorizer (NFC).
a51e28278 removed the last real use of Legal in InnerLoopVectorizer. Now
that it isn't used any longer, remove it to avoid new users being
introduced.
2025-04-27 20:30:48 +01:00
Florian Hahn
826f237cb4
[VPlan] Don't added separate vector latch block (NFC).
Simplify initial VPlan construction by not creating a separate
vector.latch block, which isn't needed and will get folded away later.
This has been suggested as independent clean-up multiple times.
2025-04-26 22:03:18 +01:00
Florian Hahn
c4d84e1b00
[VPlan] Use replaceSuccessor/replacePredecessor in insertBlock (NFC).
Use replaceSuccessor/replacePredecessor in
insertBlockAfter/insertBlockBefore. This preserves the predecessor
order, which in turns is needed to not invalidate existing phi recipes.

At the moment this is NFC, but enables additional uses in the future.
2025-04-25 20:46:10 +01:00
Florian Hahn
df21288247
[VPlan] Replace ExtractFromEnd with Extract(Last|Penultimate)Element (NFC). (#137030)
ExtractFromEnd only has 2 uses, extracting the last and penultimate
elements. Replace it with 2 separate opcodes, removing the need to
materialize and handle a constant argument.

PR: https://github.com/llvm/llvm-project/pull/137030
2025-04-25 16:27:29 +01:00
Matt Arsenault
4ea2278e39
SLPVectorizer: Use use_empty instead of hasNUses(0) (#137336) 2025-04-25 17:27:01 +02:00
Florian Hahn
7cce38beea
[VPlan] Remove dead SE argument from handleUncountableEarlyExit (NFC).
ScalarEvolution is not used by the function, remove the dead arg.
2025-04-24 19:59:05 +01:00
Alexey Bataev
a7a74b349d
[SLP]Improve reordering of the alternate nodes
Better to preserve the original order of the alternate nodes to avoid
inter-lane shuffling, select/insert subvector patterns provide better
perf.

Reviewers: RKSimon, hiraditya

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/136329
2025-04-24 14:33:10 -04:00
Alexey Bataev
f427890a1d [SLP]Fix PHI comparator to make it follow weak strict ordering restriction
Fixes #137164
2025-04-24 11:08:17 -07:00
Simon Pilgrim
f572a5951a [VectorCombine] Ensure canScalarizeAccess handles cases where the index type can't represent all inbounds values
Fixes #132563
2025-04-24 14:17:55 +01:00
Florian Hahn
06d4876982
[VPlan] Replace checking IR loop with checking VPlan predecessors (NFC).
Update check to use VPEarlyExitBlock's predecessors, which removes a
dependence on underlying IR and is more in line with the comment below.
2025-04-24 12:29:34 +01:00
Florian Hahn
5d136f90a9
[VPlan] Manage instruction metadata in VPlan. (#135272)
Add a new helper to manage IR metadata that can be progated to generated
instructions for recipes.

This helps to remove a number of remaining uses of getUnderlyingInstr
during VPlan execution.

PR: https://github.com/llvm/llvm-project/pull/135272
2025-04-24 11:57:19 +01:00
Luke Lau
3883b27ba8
[VPlan] Fix typo in assertion. NFC (#137009) 2025-04-24 16:36:32 +08:00
Florian Hahn
e268f71c59
[VPlan] Remove unneeded early continue. (NFC)
As suggested in
https://github.com/llvm/llvm-project/pull/136455, now unreachable exit
blocks won't have any phi nodes.
2025-04-24 08:59:30 +01:00
Florian Hahn
15bb1db4a9
[VPlan] Remove ILV::sinkScalarOperands. (#136023)
Remove legacy ILV sinkScalarOperands, which is superseded by the
sinkScalarOperands VPlan transforms.

There are a few cases that aren't handled by VPlan's sinkScalarOperands,
because the recipes doesn't support replicating. Those are pointer
inductions and blends.

We could probably improve this further, by allowing replication for more
recipes, but I don't think the extra complexity is warranted.

Depends on https://github.com/llvm/llvm-project/pull/136021.

PR: https://github.com/llvm/llvm-project/pull/136023
2025-04-24 08:37:49 +01:00
Florian Hahn
71f2c1e204
[VPlan] Use early exit in ::extractLastLaneOfFirstOperand (NFC).
Reduce indent level, as suggested in
https://github.com/llvm/llvm-project/pull/136455.
2025-04-23 21:55:35 +01:00
Florian Hahn
ff36508d21
[VPlan] Remove redundant setting of parent in createLoopRegion (NFC).
The regions parents will be set when the parents are set after creating
the parent region.
2025-04-23 21:45:15 +01:00
Florian Hahn
3fbbe9b8d0
[VPlan] Add exit phi operands during initial construction (NFC). (#136455)
Add incoming exit phi operands during the initial VPlan construction.
This ensures all users are added to the initial VPlan and is also needed
in preparation to retaining exiting edges during initial construction.

PR: https://github.com/llvm/llvm-project/pull/136455
2025-04-23 20:40:42 +01:00
Ramkumar Ramachandra
bdf21ca8ac
[LV] Fix missing entry in willGenerateVectors (#136712)
willGenerateVectors switches on opcodes of a recipe, but Histogram is
missing in the switch statement, which could cause a crash in some
cases. The crash was initially observed when developing another patch.
2025-04-23 19:06:38 +01:00
Nicholas Guy
1ce709cb84
[LV] Fix crash when building partial reductions using types that aren't known scale factors (#136680) 2025-04-23 13:19:18 +01:00
David Green
98b6f8dc69
[CostModel] Remove optional from InstructionCost::getValue() (#135596)
InstructionCost is already an optional value, containing an Invalid
state that can be checked with isValid(). There is little point in
returning another optional from getValue(). Most uses do not make use of
it being a std::optional, dereferencing the value directly (either
isValid has been checked previously or the Cost is assumed to be valid).
The one case that does in AMDGPU used value_or which has been replaced
by a isValid() check.
2025-04-23 07:46:27 +01:00
Florian Hahn
bf33f03f5a
[VPlan] Move Ingredient accesses closer to uses (NFC).
Move accessing Ingredient closer to its only use for
VPWidenMemoryRecipes.
2025-04-22 20:26:15 +01:00
Alexey Bataev
f52b01b6cf [SLP][NFC]Rename functions/variables, limit visibility to meet the coding standards, NFC 2025-04-22 09:56:31 -07:00
Alexey Bataev
9c388f1f05
[SLP]Prefer segmented/deinterleaved loads to strided and fix codegen
Need to estimate, which one is preferable, deinterleaved/segmented
loads or strided. Segmented loads can be combined, improving
the overall performance.

Reviewers: RKSimon, hiraditya

Reviewed By: hiraditya, RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/135058
2025-04-22 12:11:01 -04:00
Alexey Bataev
0252d338fa
[SLP]Model single unique value insert + shuffle as splat + select, where profitable
When we have the remaining unique scalar, that should be inserted into
non-poison vector and into non-zero position:
```
%vec1 = insertelement %vec, %v, pos1
%res = shuffle %vec1, poison, <0, 1, 2,..., pos1, pos1 + 1, ..., pos1,
...>
```
better to estimate if it is profitable to model it as is or model it as:
```
%bv = insertelement poison, %v, 0
%splat = shuffle %bv, poison, <poison, ..., 0, ..., 0, ...>
%res = shuffle %vec, %splat, <0, 1, 2,..., pos1 + VF, pos1 + 1, ...>
```

Reviewers: preames, hiraditya, RKSimon

Reviewed By: preames

Pull Request: https://github.com/llvm/llvm-project/pull/136590
2025-04-22 11:30:29 -04:00
Luke Lau
980531cac0
[VPlan] Fix MayReadFromMemory/MayWriteToMemory on VPWidenIntrinsicRecipe (#136684)
These seem to be the wrong way round, e.g. see the definition at
Instruction::mayReadFromMemory().
If an instruction only writes to memory then it's known to not read
memory, and so on.

Only noticed this when using VPWidenIntrinsicRecipe in a local patch and
wondered why it kept on getting DCEd despite the intrinsic writing to
memory.
2025-04-22 22:50:12 +08:00
David Green
d20604e5b6
[CostModel] Plumb CostKind into getExtractWithExtendCost (#135523)
This will likely not affect much with the current uses of the function,
but if we have getExtractWithExtendCost we can plumb CostKind through it
in the same way as other costmodel functions.
2025-04-22 15:09:43 +01:00
David Sherwood
ef72b93626
[LV] Use requested calling convention for vector math routines (#136122)
Some vector math routines, e.g. ArmPL, specify a particular
calling convention on the routines which can help improve
performance by specifying what registers have to be preserved
across the call.
2025-04-22 09:33:52 +01:00
Florian Hahn
8c83355d5b
[VPlan] Handle VPIRPhi in VPRecipeBase::isPhi (NFC).
Also handle VPIRPhi in VPRecipeBase::isPhi, to simplify existing code
dealing with VPIRPhis.

Suggested as part of https://github.com/llvm/llvm-project/pull/136455.
2025-04-21 21:04:20 +01:00