This patch is separated from D154953 to see what tests are affected by this
change alone according comment.
Depend on the related updating of LangRef on D155193.
Reviewed By: paulwalker-arm, nikic, david-arm
Differential Revision: https://reviews.llvm.org/D155350
Arm Performance Libraries contain math library which provides
vectorized versions of common math functions.
This patch allows to use it with clang and llvm via -fveclib=ArmPL or
-vector-library=ArmPL, so loops with such calls can be vectorized.
The executable needs to be linked with the amath library.
Arm Performance Libraries are available at:
https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries
Reviewed by: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D154508
Update computeMinimumValueSizes to check if an instruction's operands
can safely be truncated.
If more than MinBW bits are demanded by for the operand or if the
operand is a constant and cannot be safely truncated, it is not safe to
evaluate the instruction in the narrower MinBW. Skip those cases.
Fixes https://github.com/llvm/llvm-project/issues/47927
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D154717
If a candidate VF for epilogue vectorization is greater than the number of
remaining iterations, the epilogue loop would be dead. Skip such factors.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D154264
If a candidate VF for epilogue vectorization is less than the number of
remaining iterations, the epilogue loop would be dead. Skip such factors.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D154264
This patch prevents invalid load groups from being formed, where a load
needs to be moved across a conflicting store.
Once we hit a store that conflicts with a load with an existing
interleave group, we need to stop adding earlier loads to the group, as
this would force hoisting the previous stores in the group across the
conflicting load.
To detect such cases, add a new CompletedLoadGroups set, which is used
to keep track of load groups to which no earlier loads can be added.
Fixes https://github.com/llvm/llvm-project/issues/63602
Reviewed By: anna
Differential Revision: https://reviews.llvm.org/D154309
When a scalar epilogue is required, at least one iteration of the scalar loop
has to execute. Adjust ConstTripCount accordingly to avoid picking a max VF
that results in a dead vector loop.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D154261
We started seeing new failure after D142886. Looks like it enabled new cases and we hit an assert:
assert(Current->getNumDefinedValues() == 1 &&
"only recipes with a single defined value expected");
When we do instruction sinking for the first order recurrence we hit an assert if instruction doesn't have single def. In case instruction doesn't produce any new def there is no new users and nothing to sink.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D151204
Update trip count of test in
pr56319-vector-exit-cond-optimization-epilogue-vectorization.ll to
make sure epilogue vectorization will still trigger after D154261,
checking for the original issue.
Move the original test to limit-vf-by-tripcount.ll for testing new
functionality of D154261.
This reverts commit 20f0c68fd83a0147a8ec1722bd2e848180610288.
https://reviews.llvm.org/D153966#4464594 reports an optimization
regression in Rust.
Additionally this change has caused an unexpected 0.3% compile-time
regression.
This patch extends LoopVectorize to handle the vectorization of interleaved
memory accesses with scalable vectors when mask is required or/and predicated
tail folding is enabled.
Differential Revision: https://reviews.llvm.org/D152258
Also replace aarch64_be-*-eabi with aarch64_be
Using "eabi" for aarch64 targets is a common mistake and warned by Clang Driver.
We want to avoid it elsewhere as well. Just use the common "aarch64" without
other triple components.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D153943
{mini|maxi}mum intrinsics are different from {min|max}num intrinsics in
the propagation of NaN and signed zero. Also, the minnum/maxnum
intrinsics require the presence of nsz flags to be valid reductions in
vectorizer. In this regard, we introduce a new recurrence kind and also
add support for identifying reduction patterns using these intrinsics.
The reduction intrinsics and lowering was introduced here: 26bfbec5d2.
There are tests added which show how this interacts across chains of
min/max patterns.
Differential Revision: https://reviews.llvm.org/D151482
In same cases, the stride may not be a constant. Just skip those cases
for now. This should only happen for cases where LV interleaves only, if
it is vectorized the stride needs to be versioned to a constant.
After constructing the initial VPlan, replace VPValues for versioned
strides with their constant counterparts.
Differential Revision: https://reviews.llvm.org/D147783
This patch uses the (de)interleaving intrinsics introduced in
D141924 to handle vectorization of interleaving groups with a
factor of 2 for scalable vectors.
Reviewed By: fhahn, reames
Differential Revision: https://reviews.llvm.org/D145163
Add extra tests with cases where SCEV predicates can be proven to always
be false. The test in pointer-induction.ll has been adjusted to avoid
the induction always to wrap.
For Neon, the default nonconst stride cost is conservative,
and it is a local variable, which is not convenience to
to tune the loop vectorize.
So I try to use a option, which is similar to SVEGatherOverhead brought in D115143.
Fix https://github.com/llvm/llvm-project/issues/63082.
Reviewed By: dmgreen, fhahn
Differential Revision: https://reviews.llvm.org/D152253
This reverts commit 02369b75fdd7b5fc5d9b47f1b60587c225918511.
At the moment, live-outs used *only* for the resume values in the scalar
loop are not modeled in VPlan yet. This means first-order recurrence
recipes could be removed, when a scalar epilogue is required and the
only use of a FOR is outside the loop.
Keep treating recurrence recipes as having side-effects for now, to
avoid them being removed.
Fixes#62954.
If the step is not loop-invariant, we cannot create a modified AddRec,
as the start needs to be loop-invariant. Mark those cases as
CannotAnalyze and bail out, to fix a crash.
This patch uses SCEV to check if a value is uniform across a given VF.
The basic idea is to construct SCEVs where the AddRecs of the loop are
adjusted to reflect the version in the vectorized loop (Step multiplied
by VF). We construct a SCEV for the value of the vector lane 0
(offset 0) compare it to the expressions for lanes 1 to the last vector
lane (VF - 1). If they are equal, consider the expression uniform.
While re-writing expressions, we also need to catch expressions we
cannot determine uniformity (e.g. SCEVUnknown).
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D148841
Update collectLoopUniforms to identify uniform pointers using
Legal::isUniform. This is more powerful and brings pointer
classification here in sync with setCostBasedWideningDecision
which uses isUniformMemOp. The existing mis-match in reasoning
can causes crashes due to D134460, which is fixed by this patch.
Fixes https://github.com/llvm/llvm-project/issues/60831.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D150991