instruction.
Need to check whether the operand scalars are vectorized in a different
vector node if the main instruction is already vectorized in another
vector node.
vectorized node uses.
If the instruction is vectorized in many different vector nodes, this may
break the dependency analysis for gather nodes with matching scalars.
Need to properly check the dependencies between such gather nodes to
avoid cyclic dependencies.
On the very first iteration for reductions, when trying to build a
reduction for boolean logic operations, there is no need to compare
LHS/RHS with the Reduction(VectorizedTree); they need to be compared with
the actual parameters of the reduction operations.
Continuing the patch series to get rid of debug intrinsics [0], instruction
insertion needs to be done with iterators rather than instruction pointers,
so that we can communicate information in the iterator class. This patch
adds an iterator-taking insertBefore method and converts various call sites
to take iterators. These are all sites where such debug-info needs to be
preserved so that a stage2 clang can be built identically; it's likely that
many more will need to be changed in the future.
At this stage, this is just changing the spelling of a few operations,
which will eventually become significant once the debug-info bearing
iterator is used.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Differential Revision: https://reviews.llvm.org/D152537
If the scalar does not need to be scheduled and it was already
vectorized in one of the vector nodes, we can still try to vectorize it
in another node. Its cost just does not need to be accounted for in the
total scalar cost, as it is handled in the main vectorized node.
Differential Revision: https://reviews.llvm.org/D159205
We can still try to vectorize the bundle of instructions, even if the
repeated number of instructions is non-power-of-2. In this case, need to
adjust the cost (calculate the cost only for unique scalar instructions)
and the cost of the extracts. Also, when scheduling the bundle, need to
schedule only unique scalars to avoid a compiler crash caused by
multiple dependencies. This can be safely applied only if all of the
scalars' users are also vectorized and do not require memory accesses
(this is a temporary requirement and can be relaxed later).
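
A minimal standalone sketch of the "unique scalars" idea, not the
SLPVectorizer code: scalars are modelled as plain int ids, and the names
uniqueScalars and isPowerOf2 are hypothetical. It assumes the bundle is
deduplicated in first-occurrence order and that a per-lane index records
which unique scalar each lane reuses, so that cost and scheduling can
look at the unique list only.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: returns the unique scalars in first-occurrence
// order (first) and, for every original lane, the index of its scalar in
// that unique list (second).
inline std::pair<std::vector<int>, std::vector<int>>
uniqueScalars(const std::vector<int> &Scalars) {
  std::vector<int> Unique;
  std::vector<int> ReuseMask;
  std::unordered_map<int, int> IndexOf;
  for (int S : Scalars) {
    // emplace() leaves the map unchanged if S was already seen.
    auto Res = IndexOf.emplace(S, static_cast<int>(Unique.size()));
    if (Res.second)
      Unique.push_back(S);
    ReuseMask.push_back(Res.first->second);
  }
  return {Unique, ReuseMask};
}

// True if N is a non-zero power of two.
inline bool isPowerOf2(std::size_t N) { return N != 0 && (N & (N - 1)) == 0; }
```

For a bundle {10, 20, 10, 30} this yields three unique scalars, which is
a non-power-of-2 count, and the reuse indices {0, 1, 0, 2}.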
---------
Co-authored-by: Alexey Bataev <a.bataev@outlook.com>
This patch removes the member TTI from VPReductionRecipe, as the
generation of reduction operations no longer requires TTI.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D158148
If reordering the masked gathers may produce a strided access pattern
and does not affect the common reordering, it is better to try to
reorder the masked gathers for better performance.
Differential Revision: https://reviews.llvm.org/D157009
If a reduced value is constant-foldable and was folded to a constant
during previous transformations, need to exclude it from the list of
the reduced values/instructions as non-matchable.
The routine uses data stored in the TreeEntry node to build a mask,
so it is natural to make it a method of that type. That will simplify
its interface and reduce data transfer.
The method is added as buildAltOpShuffleMask.
Differential Revision: https://reviews.llvm.org/D157545
We can use buildShuffleEntryMask() to build the shuffle mask correctly
not only for the alternate nodes with reuses, but also for the nodes
without reused scalars. This allows a better estimate of the node's cost
and better code generation.
Differential Revision: https://reviews.llvm.org/D157413
The issue is actually related to ScatterVectorize nodes. If such a node
gets reordered during bottom-to-top reordering, it may have associated
non-empty ReorderIndices. Such nodes need to be handled the same way as
regular Vectorize nodes, not NeedToGather nodes, which means reordering
the ReorderIndices array rather than the scalars.
If the actual instruction bitwidth does not match its original size,
need to reestimate the casting opcode; the compiler cannot rely on the
one provided in the instruction.
insertelement instructions.
If the original vector has undef (not poison) values that are not
overwritten by later insertelement instructions, need to transform the
shuffle using the undef vector, not a poison vector, and actual indices,
not PoisonMaskElem. Otherwise, the transformation may produce more
poison in the output than there was in the input.
Need to check whether the scalars can be vectorized before trying to
schedule them. This may save compile time and improve vectorization on
large functions/basic blocks.
Differential Revision: https://reviews.llvm.org/D154891
**TL;DR:** This PR modifies a comparator. The comparator is used in a subsequent call to llvm::stable_sort. Sorting comparators should follow strict weak ordering - in particular, (x < x) should return false. This PR adds a fix to avoid an infinite loop when the inputs to the comparator are equal.
**Details**:
Sometimes, when two equivalent tensors are passed into the comparator, we encounter infinite looping (at aae2eaae2c/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (L4049)).
Although it seems like this comparator will never be called with two equivalent pointers, some sanitizers, e.g. https://chromium.googlesource.com/chromiumos/third_party/gcc/+/refs/heads/stabilize-zako-5712.88.B/libstdc++-v3/include/bits/stl_algo.h#360, add checks for (x < x). When such a sanitizer is used with the current implementation, it triggers a comparator check for (x < x), which runs into the infinite loop.
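
A minimal standalone sketch of the strict weak ordering requirement, not the actual SLPVectorizer comparator: `Item`, `lessByKey` and `sortedIds` are hypothetical names used only for illustration. It assumes a comparator over pointers that returns early when both arguments are the same object, so a (x < x) check returns false immediately.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the sorted elements.
struct Item {
  int Key;
  int Id;
};

// Strict weak ordering over pointers: irreflexive by construction.
inline bool lessByKey(const Item *A, const Item *B) {
  if (A == B)
    return false; // (x < x) must be false; bail out before any other work.
  return A->Key < B->Key;
}

// Sorts by Key with std::stable_sort and returns the Ids in sorted order.
// Elements with equal keys keep their original relative order.
inline std::vector<int> sortedIds(std::vector<Item> Items) {
  std::vector<const Item *> Ptrs;
  for (const Item &I : Items)
    Ptrs.push_back(&I);
  std::stable_sort(Ptrs.begin(), Ptrs.end(), lessByKey);
  std::vector<int> Ids;
  for (const Item *P : Ptrs)
    Ids.push_back(P->Id);
  return Ids;
}
```

With this shape, a library or sanitizer that probes the comparator with identical arguments gets `false` back without running the rest of the comparison.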
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D155874
Need to check for FixedVectorType, not just any vector type, since the
compiler later performs an unconditional cast to FixedVectorType and
gets the number of elements in this type.
in getLastInstructionInBundle(), NFC.
Instead of building EntryToLastInstruction before the vectorization,
build it automatically during the calls to getLastInstructionInBundle()
function.
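
A minimal standalone sketch of the lazy-caching pattern described above,
not the SLPVectorizer code: Entry, LastInstCache, getLast and
computations are hypothetical names, and instruction positions are
modelled as plain ints. It assumes the "last instruction" of an entry is
the member with the largest position and that the result is stored in
the map on first request.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a tree entry: positions of its bundle
// members within a basic block.
struct Entry {
  std::vector<int> Positions;
};

class LastInstCache {
  std::unordered_map<const Entry *, int> EntryToLast;
  unsigned Computations = 0;

public:
  // Returns the largest position in E (or -1 if E is empty), computing
  // it only on the first call for a given entry.
  int getLast(const Entry &E) {
    auto It = EntryToLast.find(&E);
    if (It != EntryToLast.end())
      return It->second; // Cache hit: no recomputation.
    ++Computations;
    int Last = -1;
    for (int P : E.Positions)
      Last = std::max(Last, P);
    EntryToLast.emplace(&E, Last);
    return Last;
  }

  // Number of times a result was actually computed.
  unsigned computations() const { return Computations; }
};
```

Calling getLast twice on the same entry computes the result once; the
second call is served from the map, so no separate pre-pass is needed to
populate it.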
Need to account for the reshuffling required for the reused elements in
buildvector nodes that are copies (perfect matches) of other nodes but
include reused elements.
Differential Revision: https://reviews.llvm.org/D149966
match the size of base node (PR63668).
Need to adjust the assert check to take into account the case where the
original scalars are reused and were extended to match the vector factor
of the reused SLP node.