collectInstsToScalarize may decide to scalarize a call. If so, we have
to update the widening decision for the call, otherwise the call won't
be scalarized as expected during VPlan construction.
This issue was uncovered by f82543d509.
This moves the logic to create simplified operands using SCEV to MUL
recipe creation. This is needed to match the behavior of the legacy cost
model. TODOs are to extend to other opcodes and move to a transform.
Note that this also restricts the number of SCEV simplifications we
apply to more precisely match the cases handled by the legacy cost
model.
Fixes https://github.com/llvm/llvm-project/issues/107015.
Follow-up to 9ccf825, adjust computeCost to also pass IntrinsicInst to
TTI if available, as there are multiple places in TTI which use the
IntrinsicInst.
Fixes https://github.com/llvm/llvm-project/issues/107016.
The LoopIdiomVectorize pass already creates calls to the intrinsic
experimental_cttz_elts, but PR #88385 will start calling this more
too so I've created a helper for it.
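
For readers unfamiliar with the intrinsic, here is a minimal scalar sketch of
what "count trailing zero elements" computes. The function name
countTrailingZeroElts is hypothetical and is not the helper added by this
commit; it only models the semantics on a plain std::vector, assuming the
"zero is not poison" behaviour where an all-zero input yields the element
count.

```cpp
#include <cstddef>
#include <vector>

// Scalar reference model (hypothetical, for illustration only): returns the
// number of zero elements before the first non-zero element, counting from
// index 0. Returns Mask.size() when every element is zero.
std::size_t countTrailingZeroElts(const std::vector<bool> &Mask) {
  std::size_t Idx = 0;
  while (Idx < Mask.size() && !Mask[Idx])
    ++Idx;
  return Idx;
}
```

For a mask of {0, 0, 1, 0} this model returns 2, i.e. the index of the first
active lane.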
Branches exiting the loop will remain regardless, so don't consider them
in collectValuesToIgnore.
This fixes another divergence between legacy and VPlan-based cost model.
Fixes https://github.com/llvm/llvm-project/issues/106780.
Implement cost computation for VPWidenCallRecipe. In some cases, targets
use argument info to compute intrinsic costs. If all operands of the
call are VPValues with an underlying IR value, use the IR values as
arguments.
PR: https://github.com/llvm/llvm-project/pull/106731
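
The "all operands or nothing" rule above can be sketched as follows. IRValue,
Operand and collectArguments are hypothetical stand-ins invented for this
illustration, not the actual VPlan classes or the code in the patch.

```cpp
#include <vector>

// Hypothetical stand-in for an IR value.
struct IRValue {
  int Id;
};

// Hypothetical stand-in for a VPValue; Underlying may be null when the
// operand has no corresponding IR value.
struct Operand {
  const IRValue *Underlying;
};

// Returns the underlying IR values of all operands, or an empty list if any
// operand lacks one, so the cost query never sees a partial argument list.
std::vector<const IRValue *>
collectArguments(const std::vector<Operand> &Ops) {
  std::vector<const IRValue *> Args;
  for (const Operand &Op : Ops) {
    if (!Op.Underlying)
      return {};
    Args.push_back(Op.Underlying);
  }
  return Args;
}
```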
NEON has non-IEEE compliant denormal flushing and the compiler should
check if it is safe to vectorize instructions for NEON in non-fast math
mode.
Fixes https://github.com/llvm/llvm-project/issues/106909
When the shuffle masks are `PoisonMaskElem`, there is no need to check
the cost of `SK_ExtractSubvector`. It is free. Otherwise, it will cause
the compiler to crash.
Assertion `(Idx + EltsPerVector) <= alignTo(NumElts, EltsPerVector) &&
"SK_ExtractSubvector index out of range"' failed.
Patch adds basic support for non-power-of-2 number of elements in
operands. The patch still requires that this number addresses whole
registers.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/106449
After landing support for actual vectorization of the "clustered" loads,
we need to better estimate the cost of masked gathers versus clustered
loads. This includes estimation of the address calculation and better
estimation of the gathered loads. Also, this estimation now relies on the
SLPCostThreshold option, allowing users to modify the behavior of the
compiler.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/105858
If the operand node has the same scalars as one of the vectorized nodes,
the compiler could miss this and incorrectly request minbitwidth data
for the wrong node. It may lead to a compiler crash, because the
vectorized node might have different minbw result.
Fixes https://github.com/llvm/llvm-project/issues/106667
An optimizable cast can also be removed by VPlan simplifications. Remove
the restriction from planContainsAdditionalSimplifications, as this
causes it to miss relevant simplifications, triggering false positives
for the cost decision verification.
Also adds debug output for printing additional cost-precomputations.
Fixes https://github.com/llvm/llvm-project/issues/106641.
If the value is used in Scalar several times, the first attempt to find
its position in the node (if ReuseShuffleIndices and ReorderIndices are
not empty) may fail. In this case, we need to find another copy of the
same value and try again.
Fixes https://github.com/llvm/llvm-project/issues/106626
This ensures we skip any instructions identified to be ignored by the
legacy cost model as well. Fixes a divergence between legacy and
VPlan-based cost model.
Fixes https://github.com/llvm/llvm-project/issues/106417.
Need to consider the maximum type size in the graph before attempting to
vectorize a non-power-of-2 number of elements, which may be less than
MinVF.
This isn't quite just code motion as the four different versions we had
of this routine differed in whether they ignored the "size" marker used
to represent undef. I doubt this matters in practice, but it is a
functional change.
---------
Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
Improve operand analysis using SCEV for cost purposes. This fixes a
divergence between legacy and VPlan-based cost-modeling after
533e6bbd0d34.
Fixes https://github.com/llvm/llvm-project/issues/106248.
Need to use the original cmp type i1, not the operand types, when
estimating the cost of the buildvector node, to prevent a compiler crash
during TTI cost estimation.
Currently, SLP uses shuffle for the external user of `InsertElementInst`
and iterates through the `InsertElementInst` chain to fill the mask with
constant indices. However, it may override the original Vec lane. Using
the original Vec lane is sufficient.
Live-ins that are used as exit values don't need to be extracted, they
can be passed through directly. This fixes a crash when trying to
extract from a live-in.
Fixes https://github.com/llvm/llvm-project/issues/106257.
This is a step towards further breaking up the rather large
tryToBuildVPlanWithVPRecipes. It moves the logic to create interleave
groups to VPlanTransforms.cpp, where similar replacements for other
recipes are defined as well (e.g. EVL-based ones).
Build on the -slp-vectorize-non-power-of-2 experimental option, and
support vectorizing reductions with 2^N-1 sized vectors.
Specifically, two related changes:
1) When searching for a profitable VL, start with the 2^N-1 reduction
width. If the cost model does not select that VL, return to power-of-two
boundaries when halving the search VL. The latter is mostly for
simplicity.
2) Reduce the minimum reduction width from 4 to 3 when supporting
non-power-of-two vectors. This is required to support <3 x Ty> cases.
One thing which isn't directly related to this change, but which I want
to note for clarity, is that the non-power-of-two vectorization appears
to be sensitive to the operand order of the reduction. I haven't yet
fully figured out why, but I suspect this is non-power-of-two specific.
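
The width search described in the two changes above can be sketched roughly
as follows. This is a simplified model under stated assumptions, not the code
in the patch: candidateWidths, isPow2 and floorPow2 are hypothetical names,
and the exact rule for choosing the starting width (use the value count
itself when it is of the form 2^N-1, otherwise round down to a power of two)
is an assumption made for illustration.

```cpp
#include <vector>

// True if X is a power of two (hypothetical helper).
bool isPow2(unsigned X) { return X != 0 && (X & (X - 1)) == 0; }

// Largest power of two that is <= X, for X > 0 (hypothetical helper).
unsigned floorPow2(unsigned X) {
  unsigned P = 1;
  while (P <= X / 2)
    P *= 2;
  return P;
}

// Lists the reduction widths that would be tried, in order, if the cost
// model rejected every one of them.
std::vector<unsigned> candidateWidths(unsigned NumValues, bool AllowNonPow2) {
  // The minimum width drops from 4 to 3 when non-power-of-two is allowed.
  const unsigned MinWidth = AllowNonPow2 ? 3 : 4;
  std::vector<unsigned> Widths;
  if (NumValues < MinWidth)
    return Widths;
  // Assumed starting rule: begin at a 2^N-1 width when one is available.
  unsigned Width = (AllowNonPow2 && isPow2(NumValues + 1))
                       ? NumValues
                       : floorPow2(NumValues);
  while (Width >= MinWidth) {
    Widths.push_back(Width);
    // After a 2^N-1 width, snap back to a power-of-two boundary; after a
    // power of two, halve as usual.
    Width = isPow2(Width) ? Width / 2 : floorPow2(Width);
  }
  return Widths;
}
```

Under these assumptions, seven reduced values with the option enabled give
the sequence 7, 4, while three values give only 3, which is the <3 x Ty>
case mentioned above.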
This particular variable name is shadowed by another lower in the
function, so reducing its scope to its single use removes the
shadowing and makes the code much less error-prone.
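
As a generic illustration of this kind of cleanup (the function and variable
names below are hypothetical and unrelated to the code touched by the
commit), narrowing the first variable to a block around its single use means
the later declaration of the same name no longer shadows anything:

```cpp
#include <vector>

// Sums a doubled base cost plus (operand cost + 1) for each operand.
int totalCost(const std::vector<int> &OperandCosts, int BaseCost) {
  int Total = 0;
  {
    // Narrow scope: this Cost exists only for its single use.
    int Cost = BaseCost * 2;
    Total += Cost;
  }
  for (int OpCost : OperandCosts) {
    // A separate Cost; there is no outer variable left to shadow.
    int Cost = OpCost + 1;
    Total += Cost;
  }
  return Total;
}
```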