5663 Commits

Author SHA1 Message Date
Florian Hahn
72376e19df
[VPlan] Remove unused VPWidenIntrinsicRecipe constructor (NFC) 2025-03-06 20:24:30 +00:00
Ramkumar Ramachandra
ddffb74afd
[LV] Strip unreachable SCEV-check blocks (#130079)
emitSCEVChecks checks if SCEVCheckCond matches zero, and returns
nullptr. However, it sets SCEVCheckCond as used before it does this,
which prevents it from being removed during cleanup, resulting in
unreachable blocks being emitted. Fix this.
2025-03-06 19:30:25 +00:00
Ramkumar Ramachandra
00f3089c2e
[LV] Use PatternMatch in emitTransformedIndex (NFC) (#130081) 2025-03-06 19:23:31 +00:00
Alexey Bataev
4959025bbc [SLP]Fix non-determinism in reused elements analysis
Need to use consistent storage for the unique elements when iterating
over them, to avoid non-determinism in the reused elements analysis.
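
As a rough illustration of the idea (not the code from this patch), the sketch below collects unique elements in first-seen order, so iteration order does not depend on hashing or pointer values. The helper name `uniqueInOrder` is hypothetical, and whether the patch uses this exact approach is an assumption.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the unique values in first-seen order.
// The hash set is used only for membership tests; the vector fixes the
// iteration order, so later loops over the result are deterministic.
template <typename T>
std::vector<T> uniqueInOrder(const std::vector<T> &values) {
  std::unordered_set<T> seen;
  std::vector<T> ordered;
  for (const T &v : values)
    if (seen.insert(v).second) // true only the first time v is inserted
      ordered.push_back(v);
  return ordered;
}
```

Iterating the returned vector, rather than the hash set itself, is what keeps the order stable from run to run.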

Fixes #130082
2025-03-06 10:12:49 -08:00
Luke Lau
6d89c042e3
[VPlan] Remove dead AnyOf reduction case in VPReductionRecipe. NFCI (#130048)
From what I understand, we only create VPReductionRecipes for in-loop
reductions, and we don't currently support in-loop AnyOf reductions.

We only create VPReductionRecipes in the !PhiR->isInLoop() section of
adjustRecipesForReductions, and this comment from the initial patch
seems to confirm this
https://reviews.llvm.org/D108136#anchor-inline-1038338, so I think we
can remove this check in the condition logic.

I checked compiling SPEC 2017 with -prefer-inloop-reductions and the
added assertion doesn't trigger.
2025-03-07 01:05:53 +08:00
Alexey Bataev
31845cf06c Revert "[SLP]Fix non-determinism in reused elements analysis"
This reverts commit 3158525afdc3677457712963ef45c83f4f8f900f to fix
a bug revealed in https://lab.llvm.org/buildbot/#/builders/123/builds/14930
2025-03-06 08:59:08 -08:00
Alexey Bataev
3158525afd [SLP]Fix non-determinism in reused elements analysis
Need to use consistent storage for the unique elements when iterating
over them, to avoid non-determinism in the reused elements analysis.

Fixes #130082
2025-03-06 08:51:31 -08:00
Alexey Bataev
1182be503d [SLP]Fix a crash for buildvector nodes with parent phi nodes with same incoming blocks
If trying to find a matching buildvector node for another node, and both
nodes are used by vectorized phi nodes and come from the same parent
block, these nodes should be considered matched to avoid a crash.
2025-03-06 07:42:43 -08:00
hanbeom
5d1029b4a8
[VectorCombine] Handle shuffle of selects (#128032)
(shuffle(select(c1,t1,f1)), (select(c2,t2,f2)), m)
-> (select (shuffle c1,c2,m), (shuffle t1,t2,m), (shuffle f1,f2,m))

The behaviour of SelectInst on vectors is the same as for
`V'select[i] = Condition[i] ? V'True[i] : V'False[i]`.

If a ShuffleVector is performed on two selects, it will be like:
`V'[mask] = (V'select[i] = Condition[i] ? V'True[i] : V'False[i])`

That's why a ShuffleVector with two SelectInst is equivalent to
first ShuffleVector Condition/True/False and then SelectInst that
result.

This patch implements the transformation described above.
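
The equivalence can be checked on plain arrays. This is a minimal sketch, not the VectorCombine code: `selectLanes`, `shuffleLanes` and `shuffleOfSelectsMatches` are hypothetical names, the vectors are fixed at four lanes, and poison mask lanes are not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Lane-wise select: out[i] = cond[i] ? t[i] : f[i].
template <typename T, std::size_t N>
std::array<T, N> selectLanes(const std::array<bool, N> &cond,
                             const std::array<T, N> &t,
                             const std::array<T, N> &f) {
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = cond[i] ? t[i] : f[i];
  return out;
}

// Two-input shuffle: mask index k picks lhs[k] if k < N, else rhs[k - N].
template <typename T, std::size_t N>
std::array<T, N> shuffleLanes(const std::array<T, N> &lhs,
                              const std::array<T, N> &rhs,
                              const std::array<std::size_t, N> &mask) {
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    assert(mask[i] < 2 * N && "mask index out of range");
    out[i] = mask[i] < N ? lhs[mask[i]] : rhs[mask[i] - N];
  }
  return out;
}

// Returns true when shuffle(select, select, m) equals
// select(shuffle(conds), shuffle(trues), shuffle(falses)) on sample data.
bool shuffleOfSelectsMatches() {
  constexpr std::size_t N = 4;
  std::array<bool, N> c1{true, false, true, false};
  std::array<bool, N> c2{false, false, true, true};
  std::array<int, N> t1{1, 2, 3, 4}, f1{10, 20, 30, 40};
  std::array<int, N> t2{5, 6, 7, 8}, f2{50, 60, 70, 80};
  std::array<std::size_t, N> m{0, 5, 2, 7};

  auto before =
      shuffleLanes(selectLanes(c1, t1, f1), selectLanes(c2, t2, f2), m);
  auto after = selectLanes(shuffleLanes(c1, c2, m), shuffleLanes(t1, t2, m),
                           shuffleLanes(f1, f2, m));
  return before == after;
}
```

Each output lane reads one source lane through the mask, so applying the mask to the condition, true and false operands first selects from exactly the same source lane.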

Proof: https://alive2.llvm.org/ce/z/97wfHp
Fixes #120775
2025-03-06 12:43:47 +00:00
Luke Lau
5e54c92314
[VPlan] Fix crash when unrolling in-loop reduction chains (#129840)
If an in-loop reduction is chained e.g.

    WIDEN-REDUCTION-PHI ir<%rdx> = phi ir<0>, ir<%add2>
    REDUCE ir<%add1> = ir<%rdx> + reduce.add (ir<%x>)
    REDUCE ir<%add2> = ir<%add1> + reduce.add (ir<%y>)

When we try to unroll the second add reduction, we crash because we
currently expect the chain to be a VPReductionPHIRecipe, when in fact
it's the previous reduction. This relaxes the cast to a dyn_cast, so we
end up unrolling to:

    WIDEN-REDUCTION-PHI ir<%rdx> = phi ir<0>, ir<%add2>
    WIDEN-REDUCTION-PHI ir<%rdx>.1 = phi ir<0>, ir<%add2>.1, ir<1>
    WIDEN-REDUCTION-PHI ir<%rdx>.2 = phi ir<0>, ir<%add2>.2, ir<2>
    WIDEN-REDUCTION-PHI ir<%rdx>.3 = phi ir<0>, ir<%add2>.3, ir<3>
    REDUCE ir<%add1> = ir<%rdx> + reduce.add (ir<%x>)
    REDUCE ir<%add1>.1 = ir<%rdx>.1 + reduce.add (ir<%x>.1)
    REDUCE ir<%add1>.2 = ir<%rdx>.2 + reduce.add (ir<%x>.2)
    REDUCE ir<%add1>.3 = ir<%rdx>.3 + reduce.add (ir<%x>.3)
    REDUCE ir<%add2> = ir<%add1> + reduce.add (ir<%y>)
    REDUCE ir<%add2>.1 = ir<%add1>.1 + reduce.add (ir<%y>.1)
    REDUCE ir<%add2>.2 = ir<%add1>.2 + reduce.add (ir<%y>.2)
    REDUCE ir<%add2>.3 = ir<%add1>.3 + reduce.add (ir<%y>.3)

This fixes a crash when building 525.x264_r from SPEC CPU 2017 on
AArch64 with -mllvm -prefer-inloop-reductions
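
For readers unfamiliar with the distinction: LLVM's `cast<>` asserts that the value has the requested type, while `dyn_cast<>` returns null when it does not. The sketch below mimics that with standard `dynamic_cast` on hypothetical stand-in types (`Recipe`, `ReductionPhi`, `Reduction`); it is not the VPlan class hierarchy or the logic of this patch.

```cpp
#include <cassert>

// Hypothetical stand-ins for recipe classes; not the real VPlan types.
struct Recipe {
  virtual ~Recipe() = default;
};
struct ReductionPhi : Recipe {};
struct Reduction : Recipe {
  const Recipe *chain = nullptr; // previous value in the reduction chain
};

// Returns the chain operand as a phi, or nullptr when the chain is
// something else (for example, the previous reduction). A checked cast
// lets the caller handle both cases instead of crashing.
const ReductionPhi *chainAsPhi(const Reduction &r) {
  return dynamic_cast<const ReductionPhi *>(r.chain);
}
```

With `add1` chained on the phi and `add2` chained on `add1`, `chainAsPhi(add1)` yields the phi and `chainAsPhi(add2)` yields a null pointer.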
2025-03-05 19:13:23 +08:00
Luke Lau
e1cea0d928
[LV][TTI] Remove unused ReductionFlags. NFC (#129858)
No in-tree targets currently use it in the
preferInLoopReduction/preferPredicatedReductionSelect TTI hooks. It
looks like it used to be used in LoopUtils, at least in
8ca60db40bd944dc5f67e0f200a403b4e03818ea, but I presume it was replaced
by RecurrenceDescriptor.
2025-03-05 18:31:12 +08:00
Alexey Bataev
855178af99
[SLP]Fix/improve getSpillCost analysis
The previous implementation could take extra time because it walked over
the same instructions several times. It also did not include proper
analysis for cross-basic-block uses of the vectorized values. This
version fixes both.

It walks over the tree and checks the deps between entries and their
operands. If there are non-vectorized calls in between, it adds
a single(!) spill cost, because the vector value should be
spilled/reloaded only once.

Also, this version caches the analysis for each detected entry and does
not repeat it; it reuses data found during the analysis of previous
nodes.

Also, it has an internal limit. If the number of instructions between
nodes and their operands is too big (greater than
ScheduleRegionSizeBudget / VectorizableTree.size()), a spill is
considered to be required. This improves compile time.
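
The limit check amounts to a single comparison. A minimal sketch, assuming the budget and tree size are available as plain integers; the function and parameter names are hypothetical and this is not the SLP vectorizer code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns true when the scan between a node and its
// operand should be abandoned and a spill conservatively assumed.
bool assumeSpill(std::size_t instructionsBetween, std::size_t regionBudget,
                 std::size_t treeSize) {
  assert(treeSize > 0 && "tree must contain at least one entry");
  // Each tree entry gets an equal share of the overall budget.
  return instructionsBetween > regionBudget / treeSize;
}
```

Dividing the budget by the tree size keeps the total work bounded regardless of how many entries the tree has.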

Reviewers: preames, RKSimon, mikhailramalho

Reviewed By: preames

Pull Request: https://github.com/llvm/llvm-project/pull/129258
2025-03-04 15:47:23 -05:00
Florian Hahn
b2d70e8796
[VPlan] Use Builder to create cast recipes in VPlanTransforms (NFC).
Use VPBuilder in a few more places. This avoids manual insertions and
will make changing the cast recipe easier in the future.
2025-03-04 13:39:12 +00:00
Luke Lau
47fb9c4bb9
[VPlan] Add Name argument to VPWidenPHIRecipe. NFC (#129527)
This allows a different IR name for the generated phi to be used. This
is split off from #118638 and helps remove some of the diffs in it.
2025-03-04 16:47:21 +08:00
Ramkumar Ramachandra
80bdfcd411
[LoopUtils] Don't wrap in getLoopEstimatedTripCount (#129080)
getLoopEstimatedTripCount returns the trip count based on profiling
data, and its documentation says that it could return 0 when the trip
count is zero, but this is not the case: a valid trip count can never be
zero, and it returns 0 when the unsigned ExitCount is incremented by 1
and wraps. Some callers are careful about checking for this zero value
in a std::optional, but it makes for an API with footguns, as a
std::optional return value indicates that a non-nullopt value would be a
valid trip count. Fix this by explicitly returning std::nullopt when the
return value would wrap, and strip additional checks in callers. This
also fixes a minor bug in LoopVectorize.
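
The wrap-around case can be sketched in isolation. This assumes a 32-bit unsigned exit count and uses a hypothetical helper name; the real function derives its estimate from profile data, which is not modelled here.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper: turns an estimated exit count into a trip count.
// Returns std::nullopt when exitCount + 1 would wrap to zero, so every
// value the caller receives is a valid, non-zero trip count.
std::optional<std::uint32_t> tripCountFromExitCount(std::uint32_t exitCount) {
  if (exitCount == std::numeric_limits<std::uint32_t>::max())
    return std::nullopt;
  return exitCount + 1;
}
```

Callers then only need to test whether a value is present; no separate check for zero is required.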
2025-03-04 08:43:08 +00:00
Florian Hahn
15770a1e9d
[VPlan] Remove dead recipes in entry when merging regions. (NFC)
Also remove recipes in the entry of the region that will be removed.
This makes sure we don't leave any dead users around. NFC at the moment,
but avoids causing issues in the future.
2025-03-04 08:26:27 +00:00
Florian Hahn
dfc5f37e3a
[VPlan] Move onlyFirstLaneUsed to VPWidenInductionRecipe (NFC).
Move onlyFirstLaneUsed from VPWidenIntOrFpInductionRecipe and
VPWidenPointerInduction to VPWidenInductionRecipe. Also mark step value
as having only its first lane used.
2025-03-03 22:43:47 +00:00
Florian Hahn
87f837cb26
[VPlan] Remove unneeded classof with VPHeaderRecipe args (NFC).
The extra classof implementation is not needed any longer.
2025-03-03 20:52:28 +00:00
Mel Chen
9b4ad2fe50
[LV][EVL] Support fixed-order recurrence idiom with EVL tail folding. (#124093)
This patch converts the llvm.vector.splice intrinsic to
llvm.experimental.vp.splice, ensuring that fixed-order recurrences
execute correctly when tail folding by EVL is enabled.
Due to the non-VFxUF penultimate EVL issue, the EVL from the previous
iteration will be preserved and used in llvm.experimental.vp.splice.
2025-03-03 21:27:13 +08:00
chrisPyr
71f4c7dabe
[NFC]Make file-local cl::opt global variables static (#126486)
#125983
2025-03-03 13:46:33 +07:00
Florian Hahn
ba7e27381f
[VPlan] Use VP_CLASSOF_IMPL in VPWidenRecipe. (NFC) 2025-03-02 11:40:23 +00:00
Florian Hahn
f937b17e85
[LV] Don't query SCEV for non-invariant values in cost model.
This fixes a divergence between VPlan and legacy cost model, matching
behavior further up in getInstructionCost as well.

Fixes https://github.com/llvm/llvm-project/issues/129236.
2025-03-02 10:55:52 +00:00
Florian Hahn
75270e3750
[VPlan] Don't print VPlan DT after VPlan construction. (NFC)
Remove unnecessary code to just print VPlan dominator tree.
2025-03-01 21:15:56 +00:00
Simon Pilgrim
5ddf40fa78
[VectorCombine] scalarizeLoadExtract - don't create scalar loads if any extract is waiting to be erased (#129375)
If any extract is waiting to be erased, then bail out as this will distort the cost calculation and possibly lead to infinite loops.

Fixes #129373
2025-03-01 16:54:22 +00:00
Florian Hahn
9f37cdca52
[VPlan] Update VPTransformState accessors to take const VPValue (NFC).
This will enable using const VPValue * pointers in more places.
2025-03-01 13:15:37 +00:00
Simon Pilgrim
8f4d2e02be [VectorCombine] scalarizeLoadExtract - add debug message for match + cost-comparison
Helps with debugging to show that the fold found the match, and shows the old + new costs to indicate whether the fold was/wasn't profitable.
2025-03-01 09:57:08 +00:00
Jie Fu
7cf2f602df [Vectorize] Fix unused variable warnings (NFC)
/llvm-project/llvm/lib/Transforms/Vectorize/SandboxVectorizer/Passes/TransactionAcceptOrRevert.cpp:24:8: error: unused variable 'CostBefore' [-Werror,-Wunused-variable]
  auto CostBefore = SB.getBeforeCost();
       ^
/llvm-project/llvm/lib/Transforms/Vectorize/SandboxVectorizer/Passes/TransactionAcceptOrRevert.cpp:25:8: error: unused variable 'CostAfter' [-Werror,-Wunused-variable]
  auto CostAfter = SB.getAfterCost();
       ^
2 errors generated.
2025-03-01 09:39:58 +08:00
vporpo
45d018097c
[SandboxVec][NFC] Add LLVM_DEBUG dumps (#129335)
This patch updates/adds LLVM_DEBUG dumps.
It moves the DEBUG_TYPE into SandboxVectorizer/Debug.h such that it can
be shared across all components of the vectorizer.
2025-02-28 16:10:34 -08:00
vporpo
6ff0f69fec
[SandboxVec][BottomUpVec] Fix vectorization of vector constants (#129290)
This patch fixes the value we generate when we vectorize constants.
2025-02-28 14:37:48 -08:00
Alexey Bataev
a36a67c79a [SLP]Fix the analysis of the user buildvector nodes for minbitwidth
If the user node is a buildvector/gather node and it has no internal
instructions state, need to check properly for this state and check the
type of the node itself, not its operands.

Fixes #129242
2025-02-28 13:17:14 -08:00
Florian Hahn
f9b2497055
[VPlan] Use const for VPBasicBlock* in key in VPBB2IRBB (NFC).
This allows queries in places where only a const pointer to VPBasicBlocks
is available.
2025-02-28 20:45:11 +00:00
Alexey Bataev
e1e20c07e4 [SLP]Fix bitwidth analysis for signed nodes, incoming into UITOFP nodes
If the signed node is the operand of UITOFP, the bitwidth analysis
should use the minimum of the incoming bitwidth and the bitwidth of the
UITOFP node.

Fixes #129244
2025-02-28 11:50:50 -08:00
vporpo
c7529248cd
[SandboxVec][BottomUpVec] Add -sbvec-stop-bndl flag for debugging (#129132)
This patch adds a helper flag for bisection debugging. This flag
force-stops vectorization after this many bundles have been considered
for vectorization.
Using -sbvec-stop-bndl=0 will not vectorize the code at all.
2025-02-28 11:19:41 -08:00
Florian Hahn
c0bf4b2c57
[VPlan] Remove unneeded VPValue::getLiveInIRValue() const (NFC).
The accessor is not needed/used.
2025-02-28 17:01:19 +00:00
vporpo
32bcc9f0d3
[SandboxVec] Add option -sbvec-allow-file for bisection debugging (#129127)
This new option lets you specify an allow-list of source files and
disables vectorization if the IR is not in the list. This can be used
for debugging miscompiles.
2025-02-27 14:15:04 -08:00
vporpo
adf0abf354
[SandboxVec][BottomUpVec] Add -sbvec-stop-at flag for debugging (#129097)
When debugging miscompiles we need a way to force-stop the vectorizer
early. This helps figure out which invocation is generating incorrect
code.
2025-02-27 13:33:54 -08:00
Florian Hahn
6ce41db6b0
[VPlan] Preserve DebugLoc for VPBranchOnMaskRecipe.
Update code to set and generate debug location for branch recipe
2025-02-27 20:19:42 +00:00
Florian Hahn
253d691596
[VPlan] Update VPBranchOnMaskRecipe to always set the mask (NFC).
The mask is always available at construction time. Make it non-optional
to simplify code.
2025-02-27 18:53:24 +00:00
vporpo
e2b0d5df84
[SandboxVec][Scheduler] Enforce scheduling SchedBundle instrs back-to-back (#128092)
This patch fixes the behavior of the scheduler by making sure the instrs
that are part of a SchedBundle are scheduled back-to-back.
2025-02-27 10:23:50 -08:00
Florian Hahn
1e1b9bccc0
[VPlan] Simplify BLEND %a, %b, NOT(%m) -> BLEND %b, %a, %m. (#128375)
Avoid negations for normalized blends by reordering operands.
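
The operand swap can be checked lane by lane. In this sketch, `blend(a, b, m)` is assumed to take `b[i]` where `m[i]` is set and `a[i]` otherwise; that reading of the BLEND operands, and all names below, are assumptions made for illustration rather than the VPlan implementation.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t BlendLanes = 4;
using BlendVec = std::array<int, BlendLanes>;
using BlendMask = std::array<bool, BlendLanes>;

// Assumed semantics: lane i takes b[i] where m[i] is set, else a[i].
BlendVec blend(const BlendVec &a, const BlendVec &b, const BlendMask &m) {
  BlendVec out{};
  for (std::size_t i = 0; i < BlendLanes; ++i)
    out[i] = m[i] ? b[i] : a[i];
  return out;
}

// Lane-wise logical negation of a mask.
BlendMask notMask(const BlendMask &m) {
  BlendMask out{};
  for (std::size_t i = 0; i < BlendLanes; ++i)
    out[i] = !m[i];
  return out;
}

// True when BLEND a, b, NOT(m) equals BLEND b, a, m for the given inputs.
bool blendSwapHolds(const BlendVec &a, const BlendVec &b,
                    const BlendMask &m) {
  return blend(a, b, notMask(m)) == blend(b, a, m);
}
```

Swapping the two values makes the negation redundant, which is why the rewritten form needs one fewer operation.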

PR: https://github.com/llvm/llvm-project/pull/128375
2025-02-27 17:43:24 +00:00
Alexey Bataev
69effe054c [SLP]Check for potential safety of the truncation for vectorized scalars with multi uses
If the vectorized scalars have multiple uses, need to check if it is safe
to truncate the vectorized value before actually trying to do it.
Otherwise, the compiler may lose some important bits, which may lead to
a miscompilation.

Fixes #129057
2025-02-27 08:41:46 -08:00
David Sherwood
65c45bfa7d
[LoopVectorize][NFC] Fix formatting issue with a comment (#129033) 2025-02-27 12:51:04 +00:00
John Brawn
8150ab93f7
[LoopVectorize] Use CodeSize as the cost kind for minsize (#124119)
Functions marked with minsize should aim for minimum code size, so the
vectorizer should use CodeSize for the cost kind and also the cost we
compare should be the cost for the entire loop: it shouldn't be divided
by the number of vector elements and block costs shouldn't be divided by
the block probability.
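
The difference in the compared cost can be sketched as follows. This is a simplified model with hypothetical names; it leaves out block probabilities and everything else the real cost model considers.

```cpp
#include <cassert>

// Hypothetical helper: the value used to rank a candidate vectorization
// factor. loopCost is the summed cost of one iteration of the vector loop.
double comparableCost(double loopCost, unsigned vf, bool minSize) {
  assert(vf > 0 && "vectorization factor must be positive");
  // For minsize, the size of the whole loop matters, so no per-lane
  // division; otherwise the cost is amortized over the vf lanes.
  return minSize ? loopCost : loopCost / vf;
}
```

Under this model a loop costing 12 at a factor of 4 ranks as 3 per lane for throughput, but as 12 when the goal is minimum code size.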

Possibly we should be doing this for optsize as well, but there are
a lot of tests that assume the current behaviour and the definition of
optsize is less clear than minsize (for minsize the goal is to "keep the
code size of this function as small as possible" whereas for optsize
it's "keep the code size of this function low").
2025-02-27 11:07:02 +00:00
Benjamin Maxwell
3307b0374a
[LV] Teach the loop vectorizer llvm.sincos is trivially vectorizable (#128035)
Depends on #123210
2025-02-27 09:37:06 +00:00
Alexey Bataev
39bab1de33 [SLP]Check if the operand for removal is the reduction operand, awaiting for the reduction
If the operand of the instruction-to-be-removed is a reduction value,
which is not reduced yet, and, thus, it has no users, it may be removed
during operands analysis.

Fixes #128736
2025-02-26 14:17:11 -08:00
Alexey Bataev
418a987285 [SLP]Do not use node, if it is a subvector or buildvector node
If the buildvector has some matches with another node, which is
a subvector of another buildvector node, need to check for this and
cancel matching to avoid incorrect ordering of the nodes.

Fixes #128770
2025-02-26 13:25:37 -08:00
Florian Hahn
4277c21059
[VPlan] Introduce explicit broadcasts for live-ins. (#124644)
Add a new VPInstruction::Broadcast opcode and use it to materialize
explicit broadcasts of live-ins. The initial patch only materializes the
broadcasts if the vector preheader dominates all uses that need it.
Later patches will pick the best valid insert point, thus retiring
implicit hoisting of broadcasts from VPTransformState::get().

PR: https://github.com/llvm/llvm-project/pull/124644
2025-02-26 13:57:51 +00:00
Han-Kuan Chen
a12ca57c1c
[SLP][REVEC] Add getScalarizationOverhead helper function to reduce error when REVEC is enabled. (#128530) 2025-02-25 23:16:05 +08:00
Florian Hahn
522b05afb6
[VPlan] Construct immutable VPIRBBs for exit blocks at construction(NFC) (#128374)
Construct immutable VPIRBasicBlocks for all exit blocks up front and
keep a list of them. Same as the scalar header, they are leaf nodes of
the VPlan and won't change. Some exit blocks may be unreachable, e.g. if
the scalar epilogue always executes or depending on optimizations.

This simplifies both the way we retrieve the exit blocks as well as
hooking up the exit blocks.

PR: https://github.com/llvm/llvm-project/pull/128374
2025-02-25 14:23:27 +00:00
Elvis Wang
8009c1fd81
[LV][VPlan] Prevent calculating cost for skipped instructions in precomputeCosts(). (#127966)
Skip calculating instruction costs for exit conditions in
precomputeCosts() when it should be skipped.

Reported from:
https://github.com/llvm/llvm-project/issues/115744#issuecomment-2670479463
Godbolt for reduced test cases: https://godbolt.org/z/fr4YMeqcv
2025-02-25 11:09:09 +08:00