the middle of reduction ops.
A freeze instruction must be emitted not only when the root is a bool
logical op, but also when several scalars are reduced and it cannot be
determined precisely whether the root is a bool logical op.
If the very first reduction operation is not a bool logical op, but some
others are, the bool logic op still must be emitted for all the extra
reduction operations to avoid incorrect poison propagation.
type.
Need to check that the number of vector registers returned by TTI is
non-zero and not greater than the total number of mask elements before
trying to perform any operations. TTI may still return an invalid number
of registers.
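
A minimal sketch of the check described above, written as standalone
C++ rather than against the real TTI interface. The function name
clampNumParts and the fallback to a single part are assumptions made
for illustration only, not the code from the patch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: NumParts stands for the register count reported by
// the cost model, NumMaskElts for the total number of mask elements.
// An unusable count (zero, or more parts than mask elements) falls back
// to 1, i.e. the mask is handled as a single register (assumption).
unsigned clampNumParts(unsigned NumParts, std::size_t NumMaskElts) {
  assert(NumMaskElts > 0 && "expected a non-empty mask");
  if (NumParts == 0 || NumParts > NumMaskElts)
    return 1;
  return NumParts;
}
```

With this guard in place, any later per-register slicing of the mask
never divides by zero and never produces an empty slice.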
Currently, when building a gather/buildvector node, we try to build node
shuffles without taking separate vector registers into account. We can
improve the final codegen and the whole vectorization process by
including this info in the analysis and the vector code emission, which
allows emitting better vectorized code.
Differential Revision: https://reviews.llvm.org/D149742
C++20 comes with std::erase to erase a value from std::vector. This
patch renames llvm::erase_value to llvm::erase for consistency with
C++20.
We could make llvm::erase more similar to std::erase by having it
return the number of elements removed, but I'm not doing that for now
because nobody seems to care about that in our code base.
Since there are only 50 occurrences of erase_value in our code base,
this patch replaces all of them with llvm::erase and deprecates
llvm::erase_value.
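
For illustration, a range-based erase helper of this kind can be
sketched with the erase-remove idiom. The namespace sketch and the
name erase_count are hypothetical; this is not the actual
llvm::erase implementation, and the counting variant only shows what
returning the number of removed elements (as std::erase does) could
look like.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

namespace sketch {
// Removes every element equal to V from C (erase-remove idiom).
template <typename Container, typename ValueT>
void erase(Container &C, const ValueT &V) {
  C.erase(std::remove(C.begin(), C.end(), V), C.end());
}

// Hypothetical variant that also reports how many elements were removed,
// similar in spirit to the return value of C++20 std::erase.
template <typename Container, typename ValueT>
typename Container::size_type erase_count(Container &C, const ValueT &V) {
  auto It = std::remove(C.begin(), C.end(), V);
  auto Removed =
      static_cast<typename Container::size_type>(std::distance(It, C.end()));
  C.erase(It, C.end());
  return Removed;
}
} // namespace sketch
```

Calls are written qualified (sketch::erase(V, 2)) so that they do not
compete with std::erase through argument-dependent lookup in C++20.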
This is a re-commit of #69392 and also fixes issue #69670, which was
uncovered by the prior commit.
For delayed gather emission it may be incorrect to use the stub
instruction as the insertion point if it is a PHI operand. In that case
the insertion point is adjusted to be at the end of the block, ensuring
that the prior dependency vector code is emitted earlier.
If the mask is resizing and the mask size is greater than the length of
the vector being reused from extractelement instructions, the mask for
undefs cannot be treated as identity and must be treated as
a broadcast.
Outline frequently used common code into dedicated variables to make
the code more compact. Rename variables to more accurately reflect
their purpose. Apply the const qualifier where appropriate.
Fix the explanatory comments for the existing code and add a bit more
explanation.
Need to add a NumSrcElts param to the is..Mask functions in the
ShuffleVectorInstruction class for better mask analysis. Mask.size() does
not always match the sizes of the permuted vector(s). This allows better
cost estimation in SLP and fixes uses of the functions in other cases.
Differential Revision: https://reviews.llvm.org/D158449
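
A small standalone sketch of why the source element count matters for
mask classification. The function isIdentityMaskSketch and the
UndefElem sentinel are illustrative assumptions, not the signatures
introduced by the patch.

```cpp
#include <cassert>
#include <vector>

// Sentinel for an undefined mask element (assumption for this sketch).
constexpr int UndefElem = -1;

// Returns true only if Mask selects elements 0..NumSrcElts-1 of a single
// source vector in order. A mask whose length differs from NumSrcElts
// resizes the vector and is therefore not treated as identity here.
bool isIdentityMaskSketch(const std::vector<int> &Mask, int NumSrcElts) {
  assert(NumSrcElts > 0 && "expected a non-empty source vector");
  if (static_cast<int>(Mask.size()) != NumSrcElts)
    return false;
  for (int I = 0; I < NumSrcElts; ++I)
    if (Mask[I] != UndefElem && Mask[I] != I)
      return false;
  return true;
}
```

For example, the mask {0, 1} looks like an identity when judged by
Mask.size() alone, but applied to a 4-element source it extracts a
subvector, so the check above rejects it.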
Need to consider the length of the original vector for extractelements,
not a length matching the number of scalars. This fixes 2 issues: 1)
improves cost estimation; 2) fixes crashes after D158449.
This caused asserts:
Assertion failed: NumElts > 1 && "Expected at least 2-element fixed length vector(s).",
file C:\b\s\w\ir\cache\builder\src\third_party\llvm\llvm\lib\Transforms\Vectorize\SLPVectorizer.cpp, line 7096
see comment on 59a67ea35d
> Need to consider the length of the original vector for extractelements,
> not the length, matched number of the scalars. It fixes 2 issues: 1)
> improves cost estimation; 2) Fixes crashes after D158449.
This reverts commit 59a67ea35d608480257fc64ec3e5106ef50de740.
Need to add a NumSrcElts param to the is..Mask functions in the
ShuffleVectorInstruction class for better mask analysis. Mask.size() does
not always match the sizes of the permuted vector(s). This allows better
cost estimation in SLP and fixes uses of the functions in other cases.
Differential Revision: https://reviews.llvm.org/D158449
Need to consider the length of the original vector for extractelements,
not a length matching the number of scalars. This fixes 2 issues: 1)
improves cost estimation; 2) fixes crashes after D158449.
scheduling, is previously vectorized.
If the main node was vectorized already, but does not require
scheduling, we still can try to vectorize it in this new node instead of
gathering.
Reordering of possible strided nodes in bottom-to-top order requires
top-to-bottom reordering of the operands of such nodes, which is not
supported. Need to disable reordering of strided operands to avoid
compiler crashes.
This reverts commit 9a99944df068b29b905cd8ba9a2132cc6382b6fb.
Due to test suite failures on all our SVE buildbots e.g.:
https://lab.llvm.org/buildbot/#/builders/184/builds/7375
clang: ../llvm/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:3565:
InstructionCost llvm::AArch64TTIImpl::getShuffleCost(TTI::ShuffleKind,
VectorType *, ArrayRef<int>, TTI::TargetCostKind, int, VectorType *,
ArrayRef<const Value *>): Assertion `Mask.size() == TpNumElts && "Expected Mask and Tp size to match!"' failed.
artificial for better cost estimation.
Need to use the original source vector type, not the one artificially
constructed based on the number of vectorized scalars. It affects the
cost significantly.
Make add() function smart enough to understand that the shuffle of
a single entry is requested, if it sees that the second node is the same
as the first.
scalar.
No need to scan the whole graph when trying to find a matching node for
a scalar vectorized in several nodes; it is better to store the
corresponding nodes alongside and scan just this small list.
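
The idea can be sketched with a per-scalar side table. All names below
(NodeSketch, GraphSketch, addNode, findMatching) are hypothetical, and
scalars are modeled as strings purely for illustration; the real
vectorizer data structures differ.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// One vectorized node: an id plus the scalars it covers.
struct NodeSketch {
  int Id;
  std::vector<std::string> Scalars;
};

class GraphSketch {
  std::vector<NodeSketch> Nodes;
  // Side table: for each scalar, the ids of all nodes that contain it.
  std::unordered_map<std::string, std::vector<int>> ScalarToNodes;

public:
  int addNode(std::vector<std::string> Scalars) {
    int Id = static_cast<int>(Nodes.size());
    for (const std::string &S : Scalars)
      ScalarToNodes[S].push_back(Id);
    Nodes.push_back({Id, std::move(Scalars)});
    return Id;
  }

  // Looks only at the nodes recorded for S instead of the whole graph.
  // Returns the id of the node whose scalars equal VL, or -1 if none.
  int findMatching(const std::string &S,
                   const std::vector<std::string> &VL) const {
    auto It = ScalarToNodes.find(S);
    if (It == ScalarToNodes.end())
      return -1;
    for (int Id : It->second)
      if (Nodes[Id].Scalars == VL)
        return Id;
    return -1;
  }
};
```

The lookup cost then depends on how many nodes share the scalar, not
on the total number of nodes in the graph.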
scheduling, is previously vectorized.
If the main node was vectorized already, but does not require
scheduling, we still can try to vectorize it in this new node instead of
gathering.
instruction.
Need to check whether the operand scalars are vectorized in a different
vector node if the main instruction is already vectorized in another
vector node.
vectorized node uses.
If the instruction is vectorized in many different vector nodes, it may
break the dependency analysis for gathered nodes with matched scalars.
Need to properly check the dependency between such gather nodes to avoid
a cyclic dependency.
On the very first iteration for reductions, when trying to build a
reduction for boolean logic operations, there is no need to compare
LHS/RHS with the Reduction (VectorizedTree); they must be compared with
the actual parameters of the reduction operations.