Outline some often used common code to dedicated variables in order
to make code compact. Rename variables to more accurately reflect
their purpose. Apply const qualifier where appropriate.
Fix and add bit more explanation comment for the existing code.
Update the code to create Trunc/Ext recipes directly in
adjustRecipesForReductions instead of fixing it up later in
fixReductions.
This explicitly models the required conversions and also makes sure they
are generated at the right place (instead of after the exit condition),
hence the changes in a few tests.
This reverts commit b5743d4798b250506965e07ebab806a3c2d767cc.
This causes some minor compile-time impact. Revert for now, better
to do the change more gradually.
This patch is based off of
https://github.com/llvm/llvm-project/pull/67543.
We are currently using the exact trip count to make decisions regarding
the maximum VF. We can instead use the upper bound TC, which will be the
same as the constant trip count when that is known.
LoopVectorize currently queries VFDatabase repeatedly for each CI,
and each query to VFDatabase rescans all vector variants.
This patch instead makes a decision for each call once per VF based
on the cost of scalarization vs. function call to a vector variant
of the function vs. a vector intrinsic, then caches the decision
along with relevant info for use in planning and plan execution.
There's no need to repeatedly query and reset the state for
LoopExitInstDef. This removes one of the last uses of
VPTransformState::reset, by use a vector to store and update the
results. No other code should try to retrieve the result from State
outside the fixReductionCall.
Allow length changing shuffle masks in the "bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC'" fold.
It also exposes some poor shuffle mask detection for extract/insert subvector cases inside improveShuffleKindFromMask
First stage towards addressing Issue #67803
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.
Differential Revision: https://reviews.llvm.org/D158449
Since the getMaximisedVFForTarget function is called twice, once for fixed-width and once for scalable, it adds no value to always return a fixed-width VF. Instead, when we are tail-folding, we can use either fixed-width or scalable vectors.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.
Differential Revision: https://reviews.llvm.org/D158449
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.
Differential Revision: https://reviews.llvm.org/D158449
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.
Differential Revision: https://reviews.llvm.org/D158449
Need to consider the length of the original vector for extractelements,
not the length, matched number of the scalars. It fixes 2 issues: 1)
improves cost estimation; 2) Fixes crashes after D158449.
This caused asserts:
Assertion failed: NumElts > 1 && "Expected at least 2-element fixed length vector(s).",
file C:\b\s\w\ir\cache\builder\src\third_party\llvm\llvm\lib\Transforms\Vectorize\SLPVectorizer.cpp, line 7096
see comment on 59a67ea35d
> Need to consider the length of the original vector for extractelements,
> not the length, matched number of the scalars. It fixes 2 issues: 1)
> improves cost estimation; 2) Fixes crashes after D158449.
This reverts commit 59a67ea35d608480257fc64ec3e5106ef50de740.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.
Differential Revision: https://reviews.llvm.org/D158449
Need to consider the length of the original vector for extractelements,
not the length, matched number of the scalars. It fixes 2 issues: 1)
improves cost estimation; 2) Fixes crashes after D158449.
We should check whether the element type is non-byte-sized, not
the vector type. For types like <32 x i1> the whole type is
byte-sized, but the individual elements (that we scalarize to)
are not.
Fixes https://github.com/llvm/llvm-project/issues/67060.
scheduling, is previously vectorized.
If the main node was vectorized already, but does not require
scheduling, we still can try to vectorize it in this new node instead of
gathering.
This patch updates the mask creation code to always create compares of
the form (ICMP_ULE, wide canonical IV, backedge-taken-count) up front
when tail folding and introduce active-lane-mask as later
transformation.
This effectively makes (ICMP_ULE, wide canonical IV, backedge-taken-count)
the canonical form for tail-folding early on. Introducing more specific
active-lane-mask recipes is treated as a VPlan-to-VPlan optimization.
This has the advantage of keeping the logic (and complexity) of
introducing active-lane-mask recipes in a single place, instead of
spreading the logic out across multiple functions. It also simplifies
initial VPlan construction and enables treating introducing EVL as
similar optimization.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D158779
Update the logic to update the successors and predecessors of region
blocks directly. This adds special handling for header and latch blocks
in place, and removes the separate loop to fix up the region blocks.
Helps to simplify D158333.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D159136
Partial progress towards removing in-tree uses of `getPointerTo()`,
by employing the following options:
* Drop the call entirely if the sole purpose of it is to support a no-op
bitcast (remove the no-op bitcast as well).
* Replace with `PointerType::get()`/`PointerType::getUnqual()`
This is a NFC cleanup effort.
Reviewed By: barannikov88
Differential Revision: https://reviews.llvm.org/D155232
Header phi recipes have the start value (incoming from outside the loop)
as first operand. This wasn't the case for VPWidenPHIRecipes. Instead
the start value was picked during execute() by doing extra work.
To be in line with other recipes, ensure the operand order is as
expected during construction.
Reordering of possible strided nodes in bottom-to-top order requires
top-to-bottom reordering of the operands of such nodes, which is not
supported. Need to disable reordering of strided operands to avoid
compiler crashes.
This reverts commit 9a99944df068b29b905cd8ba9a2132cc6382b6fb.
Due to test suite failures on all our SVE buildbots e.g.:
https://lab.llvm.org/buildbot/#/builders/184/builds/7375
clang: ../llvm/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:3565:
InstructionCost llvm::AArch64TTIImpl::getShuffleCost(TTI::ShuffleKind,
VectorType *, ArrayRef<int>, TTI::TargetCostKind, int, VectorType *,
ArrayRef<const Value *>): Assertion `Mask.size() == TpNumElts && "Expected Mask and Tp size to match!"' failed.
artificial for better cost estimation.
Need to use original source vector type, not the one artificially
constructed, based on the number of vectorized scalars. It affect the
cost significantly.