emitSCEVChecks checks if SCEVCheckCond matches zero, and returns
nullptr. However, it sets SCEVCheckCond as used before it does this,
which prevents it from being removed during cleanup, resulting in
unreachable blocks being emitted. Fix this.
Need to use consistent storages for unique elements, when going to
iterate over them to avoid non-determinism in reused elements analysis.
Fixes#130082
From what I understand, we only create VPReductionRecipes for in-loop
reductions, and we don't currently support in-loop AnyOf reductions.
We only create VPReductionRecipes in the !PhiR->isInLoop() section of
adjustRecipesForReductions, and this comment from the initial patch
seems to confirm this
https://reviews.llvm.org/D108136#anchor-inline-1038338, so I think we
can remove this check in the condition logic.
I checked compiling SPEC 2017 with -prefer-inloop-predicates and the
added assertion doesn't trigger.
Need to use consistent storages for unique elements, when going to
iterate over them to avoid non-determinism in reused elements analysis.
Fixes#130082
If trying to find matching buildvector node for another nodes, and both
nodes are used by vectorized phi nodes and are coming from the same
parent block, this nodes should be considered matched to avoid a crash.
(shuffle(select(c1,t1,f1)), (select(c2,t2,f2)), m)
-> (select (shuffle c1,c2,m), (shuffle t1,t2,m), (shuffle f1,f2,m))
The behaviour of SelectInst on vectors is the same as for
`V'select[i] = Condition[i] ? V'True[i] : V'False[i]`.
If a ShuffleVector is performed on two selects, it will be like:
`V'[mask] = (V'select[i] = Condition[i] ? V'True[i] : V'False[i])`
That's why a ShuffleVector with two SelectInst is equivalent to
first ShuffleVector Condition/True/False and then SelectInst that
result.
This patch implements the transforming described above.
Proof: https://alive2.llvm.org/ce/z/97wfHpFixes#120775
If an in-loop reduction is chained e.g.
WIDEN-REDUCTION-PHI ir<%rdx> = phi ir<0>, ir<%add2>
REDUCE ir<%add1> = ir<%rdx> + reduce.add (ir<%x>)
REDUCE ir<%add2> = ir<%add1> + reduce.add (ir<%y>)
When we try to unroll the second add reduction, we crash because we
currently expect the chain to be a VPReductionPHIRecipe, when in fact
it's the previous reduction. This relaxes the cast to a dyn_cast, so we
end up unrolling to:
WIDEN-REDUCTION-PHI ir<%rdx> = phi ir<0>, ir<%add2>
WIDEN-REDUCTION-PHI ir<%rdx>.1 = phi ir<0>, ir<%add2>.1, ir<1>
WIDEN-REDUCTION-PHI ir<%rdx>.2 = phi ir<0>, ir<%add2>.2, ir<2>
WIDEN-REDUCTION-PHI ir<%rdx>.3 = phi ir<0>, ir<%add2>.3, ir<3>
REDUCE ir<%add1> = ir<%rdx> + reduce.add (ir<%x>)
REDUCE ir<%add1>.1 = ir<%rdx>.1 + reduce.add (ir<%x>.1)
REDUCE ir<%add1>.2 = ir<%rdx>.2 + reduce.add (ir<%x>.2)
REDUCE ir<%add1>.3 = ir<%rdx>.3 + reduce.add (ir<%x>.3)
REDUCE ir<%add2> = ir<%add1> + reduce.add (ir<%y>)
REDUCE ir<%add2>.1 = ir<%add1>.1 + reduce.add (ir<%y>.1)
REDUCE ir<%add2>.2 = ir<%add1>.2 + reduce.add (ir<%y>.2)
REDUCE ir<%add2>.3 = ir<%add1>.3 + reduce.add (ir<%y>.3)
This fixes a crash when building 525.x264_r from SPEC CPU 2017 on
AArch64 with -mllvm -prefer-inloop-reductions
No in-tree targets currently use it in the
preferInLoopReduction/preferPredicatedReductionSelect TTI hooks. It
looks like it used to be used in LoopUtils, at least in
8ca60db40bd944dc5f67e0f200a403b4e03818ea, but I presume it was replaced
by RecurrenceDescriptor.
Previous implementation may took some extra time, when walked over the
same instructions several times. And also it did not include proper
analysis for cross-basic-block use of the vectorized values. This
version fixes it.
It walks over the tree and checks the deps between entries and their
operands. If there are non-vectorized calls in between, it adds
a single(!) spill cost, because the vector value should be
spilled/reloaded only once.
Also, this version caches analysis for each entries, which are detected,
and do not repeats it, uses data, found during previous analysis for
previous nodes.
Also, it has the internal limit. If the number of instructions
between nodes and their operands is too big (> than ScheduleRegionSizeBudget / VectorizableTree.size()), it is considered that the spill is required. It allows to improve compile time.
Reviewers: preames, RKSimon, mikhailramalho
Reviewed By: preames
Pull Request: https://github.com/llvm/llvm-project/pull/129258
getLoopEstimatedTripCount returns the trip count based on profiling
data, and its documentation says that it could return 0 when the trip
count is zero, but this is not the case: a valid trip count can never be
zero, and it returns 0 when the unsigned ExitCount is incremented by 1
and wraps. Some callers are careful about checking for this zero value
in an std::optional, but it makes for an API with footguns, as a
std::optional return value indicates that a non-nullopt value would be a
valid trip count. Fix this by explicitly returning std::nullopt when the
return value would wrap, and strip additional checks in callers. This
also fixes a minor bug in LoopVectorize.
Also remove recipes in the entry of the region that will be removed.
This makes sure we don't leave any dead users around. NFC at the moment,
but avoids causing issues in the future.
Move onlyFirstLaneUsed from VPWidenIntOrFpInductionRecipe and
VPWidenPointerInduction to VPWidenInductionRecipe. Also mark step value
as having only its first lane used.
This patch converts the llvm.vector.splice intrinsic to
llvm.experimental.vp.splice, ensuring that fixed-order recurrences
execute correctly when tail folding by EVL is enable.
Due to the non-VFxUF penultimate EVL issue, the EVL from the previous
iteration will be preserved and used in llvm.experimental.vp.splice.
This patch updates/adds LLVM_DEBUG dumps.
It moves the DEBUG_TYPE into SandboxVectorizer/Debug.h such that it can
be shared across all components of the vectorizer.
If the user node is a buildvector/gather node and it has no internal
instructions state, need to check properly for this state and check the
type of the node itself, not its operands.
Fixes#129242
If the signed node is the operand of UITOFP, the bitwidth analysis
should consider minimum value between incoming bitwidth and the bitwidth
of the UITOFP node.
Fixes#129244
This patch adds a helper flag for bisection debugging. This flag
force-stops vectorization after this many bundles have been considered
for vectorization.
Using -sbvec-stop-bndl=0 will not vectorize the code at all.
This new option lets you specify an allow-list of source files and
disables vectorization if the IR is not in the list. This can be used
for debugging miscompiles.
If the vectorized scalars has multiple uses, need to check if it is safe
to truncate the vectorized value, before actually trying doing it.
Otherwise, the compiler may loose some important bits, which may lead to
a miscompilation.
Fixes#129057
Functions marked with minsize should aim for minimum code size, so the
vectorizer should use CodeSize for the cost kind and also the cost we
compare should be the cost for the entire loop: it shouldn't be divided
by the number of vector elements and block costs shouldn't be divided by
the block probability.
Possibly we should also be doing this for optsize as well, but there are
a lot of tests that assume the current behaviour and the definition of
optsize is less clear than minsize (for minsize the goal is to "keep the
code size of this function as small as possible" whereas for optsize
it's "keep the code size of this function low").
If the operand of the instruction-to-be-removed is a reduction value,
which is not reduced yet, and, thus, it has no users, it may be removed
during operands analysis.
Fixes#128736
If the buildvector has some matches with another node, which is
a subvector of another buildvector node, need to check for this and
cancel matching to avoid incorrect ordering of the nodes.
Fixes#128770
Add a new VPInstruction::Broadcast opcode and use it to materialize
explicit broadcasts of live-ins. The initial patch only materlizes the
broadcasts if the vector preheader dominates all uses that need it.
Later patches will pick the best valid insert point, thus retiring
implicit hoisting of broadcasts from VPTransformsState::get().
PR: https://github.com/llvm/llvm-project/pull/124644
Constract immutable VPIRBasicBlocks for all exit blocks up front and
keep a list of them. Same as the scalar header, they are leaf nodes of
the VPlan and won't change. Some exit blocks may be unreachable, e.g. if
the scalar epilogue always executes or depending on optimizations.
This simplifies both the way we retrieve the exit blocks as well as
hooking up the exit blocks.
PR: https://github.com/llvm/llvm-project/pull/128374