All dependencies on code from LoopVectorize.cpp have been
removed/refactored. Move the ::execute implementations to other recipe
definitions in VPlanRecipes.cpp.
This commit refactors VPReductionRecipe to hold the member RdxDesc by
reference instead of by pointer. Because RdxDesc in VPReductionRecipe
should never be nullptr, a reference provides clearer semantics.
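As a rough illustration of the semantic difference, the sketch below
contrasts a pointer member with a reference member. Descriptor,
RecipeWithPointer and RecipeWithReference are hypothetical stand-ins,
not the actual VPlan classes.

```cpp
#include <string>

// Hypothetical stand-in for a reduction descriptor.
struct Descriptor {
  std::string Kind;
};

// Pointer member: every user has to consider the nullptr case.
struct RecipeWithPointer {
  const Descriptor *Desc;
  explicit RecipeWithPointer(const Descriptor *D) : Desc(D) {}
  std::string kind() const { return Desc ? Desc->Kind : "<none>"; }
};

// Reference member: a descriptor must be supplied at construction,
// so no null check is needed when it is used.
struct RecipeWithReference {
  const Descriptor &Desc;
  explicit RecipeWithReference(const Descriptor &D) : Desc(D) {}
  const std::string &kind() const { return Desc.Kind; }
};
```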
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D158058
Address post-commit simplification suggestion for 8a56179bcd8c:
Store operator only for floating point inductions (i.e. the binary op is
a FPMathOperator).
Address post-commit simplification suggestion for 8a56179bcd8c: Replace
IsTruncated by conditionally setting TruncResultTy only if truncation
is required.
VPlan has become an integral part of the inner loop vectorizer pipeline
that has been actively developed over the previous years. Let's move
VectorizationPlan.rst from the proposal stage to bring the docs in line
and to avoid confusion when reading the docs.
Reviewed By: rengolin
Differential Revision: https://reviews.llvm.org/D157593
Explicitly pass InductionKind and InductionBinOp to
emitTransformedIndex. Only those values are needed from the induction
descriptor. This makes explicit what the function needs and allows
future use cases where a full induction descriptor object is not
available.
Update VPInstruction to use VPRecipeWithIRFlags to manage FMFs for
VPInstruction.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D157144
Model wrap flags directly using VPRecipeWithIRFlags and clean up the
duplicated *NUW opcodes.
D157144 will build on this and also model FMFs for VPInstruction.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D157194
Replace ConditionalAssume set by treating conditional assumes like other
predicated instructions (i.e. create a VPReplicateRecipe with a mask)
and later remove any assume recipes with masks during VPlan cleanup.
This reduces coupling between VPlan construction and Legal by removing
a set shared between the two and results in a cleaner code structure
overall.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D157034
Split up tryToBuildVPlanWithVPRecipes into initial plan creation and
optimizations, by introducing a VPlanTransforms::optimize helper.
Depends on D154640.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D154644
The last dependency on code defined in LoopVectorize.cpp was removed a
while ago. Move VPTransformState::get() to VPlan.cpp, where the other
members are also defined.
Update adjustRecipesForReductions to directly use the VPlan def-use
chains for in-loop reductions to collect the reduction operations that
need adjusting.
This allows the removal of
* ReductionChainMap
* the recording of recipes for instructions in the reduction chain
* late uses of getVPValue
* the need for removeVPValueFor.
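The def-use walk described above can be sketched roughly as follows.
This is a simplified model under stated assumptions: Node and
collectReductionChain are hypothetical names, and the real code
operates on VPlan recipes rather than on this toy graph.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a recipe with def-use edges.
struct Node {
  std::string Name;
  bool InLoop = true;        // whether the node lives inside the loop
  std::vector<Node *> Users; // nodes that use this node's result
};

// Collect the in-loop operations reachable from the reduction phi through
// user edges, in discovery order. The phi itself is the first entry.
std::vector<Node *> collectReductionChain(Node &Phi) {
  std::vector<Node *> Worklist{&Phi};
  std::unordered_set<Node *> Seen{&Phi};
  for (std::size_t I = 0; I != Worklist.size(); ++I) {
    for (Node *U : Worklist[I]->Users) {
      if (!U->InLoop)
        continue; // users outside the loop are not part of the chain
      if (Seen.insert(U).second)
        Worklist.push_back(U);
    }
  }
  return Worklist;
}
```

Because the chain is discovered on demand from the users of the phi, no
side table mapping instructions to recipes has to be kept up to date.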
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D155845
This reverts commit 245ec675a4e41f7ec24dfc998720bffdc46a6c53.
Recommits eea9258648ce with a fix to only erase the instruction from the
first part if it is defined outside the loop. This fixes a reported
use-after-free error.
Shrink operands before creating the new instruction to make sure the
same evaluation order is used on all platforms. This fixes buildbot
failures due to different argument evaluation order on different
systems.
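The underlying C++ issue can be reproduced in isolation. The sketch
below uses hypothetical helpers (nextId, combine); it is not the code
touched by this commit, only an illustration of why evaluating the
operands into locals first makes the order deterministic.

```cpp
// Hypothetical helper with a visible side effect on each call.
int Counter = 0;
int nextId() { return ++Counter; }

int combine(int First, int Second) { return First * 10 + Second; }

// The order in which the two nextId() calls run is unspecified, so
// starting from Counter == 0 this may return 12 or 21 depending on the
// compiler.
int combineUnsequenced() { return combine(nextId(), nextId()); }

// Evaluating the operands into named locals first fixes the order.
int combineSequenced() {
  int First = nextId();
  int Second = nextId();
  return combine(First, Second);
}
```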
This reverts commit eea9258648ce73507f6f85c395de978af659d498.
That commit triggered crashes in the following testcase:
$ cat reduced.c
typedef struct {
int a[8]
} b;
typedef struct {
b *c;
short d
} e;
void f() {
int g;
char *h;
e *i = f;
short j = i->d;
int a = i->c->a[0];
for (;;)
for (; g < a; g++) {
*h = j * i->d >> 8;
h++;
}
}
$ clang -target aarch64-linux-gnu -w -c -O2 reduced.c
Reorder VPlan transforms slightly so they are all grouped together,
after disabling Value -> VPValue lookup. In terms of codegen impact,
this should be NFC modulo a small number of instruction reorderings.
Preparation to split up tryToBuildVPlanWithVPRecipes in a follow-up.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D154640
After D150027, all relevant recipes should model their IR flags
directly. Instead of removing the flags after codegen as part of
fixReductions, drop poison generating flags directly from the recipes.
Depends on D150027.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D150028
If a candidate VF for epilogue vectorization is greater than the number of
remaining iterations, the epilogue loop would be dead. Skip such factors.
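A minimal sketch of the filtering idea, assuming a known constant trip
count and ignoring the case where a scalar epilogue is required.
viableEpilogueVFs and its parameters are hypothetical names, not the
vectorizer's API.

```cpp
#include <vector>

// Keep only the candidate epilogue VFs that can execute at least one
// vector iteration on the iterations left over by the main vector loop.
// MainStep is assumed to be the main loop's VF times its interleave
// count and must be non-zero.
std::vector<unsigned>
viableEpilogueVFs(unsigned TripCount, unsigned MainStep,
                  const std::vector<unsigned> &Candidates) {
  unsigned Remaining = TripCount % MainStep;
  std::vector<unsigned> Viable;
  for (unsigned VF : Candidates)
    if (VF <= Remaining) // VF > Remaining would give a dead epilogue loop
      Viable.push_back(VF);
  return Viable;
}
```

For example, with a trip count of 20 and a main loop step of 16, four
iterations remain, so a candidate VF of 8 is skipped while 4 and 2 are
kept.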
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D154264
If a candidate VF for epilogue vectorization is greater than the number of
remaining iterations, the epilogue loop would be dead. Skip such factors.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D154264
When a scalar epilogue is required, at least one iteration of the scalar loop
has to execute. Adjust ConstTripCount accordingly to avoid picking a max VF
that results in a dead vector loop.
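The adjustment can be sketched as below. This is an illustration under
assumptions: clampMaxVF and floorPowerOfTwo are hypothetical helpers,
and the real code handles more cases than a single constant trip count.

```cpp
// Largest power of two that is <= N, or 0 when N is 0.
unsigned floorPowerOfTwo(unsigned N) {
  unsigned P = 0;
  for (unsigned Bit = 1; Bit != 0 && Bit <= N; Bit <<= 1)
    P = Bit;
  return P;
}

// Clamp MaxVF to the number of iterations the vector loop can cover.
// When a scalar epilogue is required, one iteration is reserved for it.
unsigned clampMaxVF(unsigned ConstTripCount, bool RequiresScalarEpilogue,
                    unsigned MaxVF) {
  unsigned VectorIters = ConstTripCount;
  if (RequiresScalarEpilogue && VectorIters > 0)
    --VectorIters;
  unsigned Clamped = floorPowerOfTwo(VectorIters);
  return Clamped < MaxVF ? Clamped : MaxVF;
}
```

With a trip count of 8 and a required scalar epilogue, only 7
iterations are available to the vector loop, so the sketch picks 4
rather than 8; picking 8 would leave no iteration for the scalar loop
and the vector loop would never run.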
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D154261
requiresScalarEpilogue only checks if the selected VF is vectorizing
(and not scalar). Update it to just take a boolean, to make it clearer
what information is used and to allow callers without a VF (used in a
follow-up patch).
This patch extends LoopVectorize to handle the vectorization of interleaved
memory accesses with scalable vectors when a mask is required and/or
predicated tail folding is enabled.
Differential Revision: https://reviews.llvm.org/D152258
In some cases, the stride may not be a constant. Just skip those cases
for now. This should only happen when LV only interleaves; if the loop
is vectorized, the stride needs to be versioned to a constant.
After constructing the initial VPlan, replace VPValues for versioned
strides with their constant counterparts.
Differential Revision: https://reviews.llvm.org/D147783
This patch uses the (de)interleaving intrinsics introduced in
D141924 to handle vectorization of interleaving groups with a
factor of 2 for scalable vectors.
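For intuition, the element shuffling performed by a factor-2
(de)interleave can be modeled on fixed-size containers. This is only a
scalar model with hypothetical function names; it does not use or
represent the LLVM intrinsics' signatures.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Split {A0, B0, A1, B1, ...} into {A0, A1, ...} and {B0, B1, ...}.
// A trailing unpaired element, if any, is ignored.
std::pair<std::vector<int>, std::vector<int>>
deinterleave2(const std::vector<int> &Wide) {
  std::vector<int> Even, Odd;
  for (std::size_t I = 0; I + 1 < Wide.size(); I += 2) {
    Even.push_back(Wide[I]);
    Odd.push_back(Wide[I + 1]);
  }
  return {Even, Odd};
}

// Inverse operation: build {A0, B0, A1, B1, ...} from two vectors.
// Only the common prefix is used if the lengths differ.
std::vector<int> interleave2(const std::vector<int> &A,
                             const std::vector<int> &B) {
  std::vector<int> Wide;
  for (std::size_t I = 0; I < A.size() && I < B.size(); ++I) {
    Wide.push_back(A[I]);
    Wide.push_back(B[I]);
  }
  return Wide;
}
```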
Reviewed By: fhahn, reames
Differential Revision: https://reviews.llvm.org/D145163
If the value was already known to not be uniform for the previous
(smaller) VF, it cannot be uniform for the larger VF.
This slightly reduces compile-time, as uniformity checks become a bit
more expensive when using SCEV rewriting (D148841).
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D151658
After 572cfa3fde5433, isUniform now checks VF-based uniformity instead of
just invariance as before.
As follow-up cleanup suggested in D148841, separate the invariance check
out and update callers that currently check only for invariance.
This also moves the implementation of isUniform from LoopAccessAnalysis
to LoopVectorizationLegality, as LoopAccessAnalysis doesn't use the more
general isUniform.
This patch uses SCEV to check if a value is uniform across a given VF.
The basic idea is to construct SCEVs where the AddRecs of the loop are
adjusted to reflect the version in the vectorized loop (Step multiplied
by VF). We construct a SCEV for the value of vector lane 0 (offset 0)
and compare it to the expressions for lanes 1 to the last vector
lane (VF - 1). If they are equal, consider the expression uniform.
While re-writing expressions, we also need to catch expressions for
which we cannot determine uniformity (e.g. SCEVUnknown).
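The lane comparison can be illustrated numerically. Note that this
sketch is not the technique used by the patch: the patch compares SCEV
expressions symbolically, whereas the hypothetical isUniformForVF
helper below only samples concrete values over a bounded number of
vector iterations. It is meant for intuition only.

```cpp
#include <cstdint>
#include <functional>

// Check whether Expr(IV) yields the same value in every lane of each
// sampled vector iteration. Lane L of vector iteration I sees
// IV = Start + Step * (VF * I + L).
bool isUniformForVF(const std::function<std::uint64_t(std::uint64_t)> &Expr,
                    std::uint64_t Start, std::uint64_t Step, unsigned VF,
                    unsigned VectorIterations) {
  for (unsigned I = 0; I != VectorIterations; ++I) {
    // Lane 0: the induction variable advances by Step * VF per iteration.
    std::uint64_t Lane0IV = Start + Step * VF * I;
    std::uint64_t Lane0Value = Expr(Lane0IV);
    for (unsigned Lane = 1; Lane < VF; ++Lane)
      if (Expr(Lane0IV + Step * Lane) != Lane0Value)
        return false;
  }
  return true;
}
```

For an expression such as iv / 4 with a start of 0 and a step of 1,
all lanes agree for VF = 2 and VF = 4, but not for VF = 8, where lanes
0-3 and lanes 4-7 see different quotients.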
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D148841
Update collectLoopUniforms to identify uniform pointers using
Legal::isUniform. This is more powerful and brings pointer
classification here in sync with setCostBasedWideningDecision
which uses isUniformMemOp. The existing mismatch in reasoning
can cause crashes due to D134460, which is fixed by this patch.
Fixes https://github.com/llvm/llvm-project/issues/60831.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D150991