On the very first iteration for the reductions, when trying to build
reduction for boolean logic operations, no need to compare LHS/RHS with
the Reduction(VectorizedTree), need to compare with actual parameters of
the reduction operations.
If the scalar does not need to be scheduled and it was vectorized
already in one of the vector nodes, we still can try to vectorize it in
another node. Just does not need account its cost in the scalar total
cost, as it will be handled in the main vectorized node.
Differential Revision: https://reviews.llvm.org/D159205
This reverts commit db5d845c73ee2d64f1a5bab3fc72edece9e3a7ba.
As per PR discussion "Looks like we've missed lowering of bitcasts
between v2f16 and v2i16 and it breaks XLA."
We still can try to vectorize the bundle of the instructions, even if
the
repeated number of instruction is non-power-of-2. In this case need to
adjust the cost (calculate the cost only for unique scalar instructions)
and cost of the extracts. Also, when scheduling the bundle need to
schedule only unique scalars to avoid compiler crash because of the
multiple dependencies. Can be safely applied only if all scalars's users
are also vectorized and do not require memory accesses (this one is
a temporarily requirement, can be relaxed later).
---------
Co-authored-by: Alexey Bataev <a.bataev@outlook.com>
The issue #55208 describes a current deficiency of the SLPVectorizer,
namely that it doesn't vectorize code written with lrint, while similar
code written with rint is vectorized. Add a test corresponding to this
issue for the RISC-V target.
Recently, 7f26c27 turned on SLP by default for RISC-V, and although
there are quite a few tests for SLP under the X86/ target, it is unclear
whether the same constructs would be vectorized on RISC-V. This patch
takes a step in the direction of remedying this, by noticing that ctpop
is often vectorized on RISC-V, and adding four tests for different
integer widths.
On sm_90 some instructions now support i16x2 which allows hardware to
execute more efficiently add, min and max instructions.
In order to support that we need to make i16x2 a native type in the
backend. This does the necessary changes to make i16x2 a native type and
adds support for the instructions natively supporting i16x2.
This caused a negative test in nvptx slp to start passing. Changed the
test to a positive one as the IR is correctly vectorized.
In the past we sort of pretended float might be implementable
as a non-IEEE type but that never realistically would work. Exotic
FP types would need to be added to the IR. Turning these
into FP operations enables FP tracking optimizations.
https://reviews.llvm.org/D151937
This is a resurrection of D18874. This was previously wrong with
fneg conflated with fsub, but we now have a proper fneg instruction.
Additionally, I think it is now clearer that IR float=IEEE float,
and a different bit layout would require adding a different IR type.
https://reviews.llvm.org/D151934
Some callers pass in an empty mask to represent "unknown". We should use the generic costs for these cases. We can add VL=1 costing seperately if desired.
Reapplying after revert. A new test had been added, and I'd missed updating it when rebasing before. This is a great happy accident as I hadn't figured out how to get SLP to exercise this case, I'd merely noticed it via inspection.
This assertion is introduced by D157425.
We should calculate the cost iff `Mask` is not empty.
Fixes 64901
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D158590
If the masked gathers can be reordered, it may produce strided access
pattern and the reordering does not affect common reodering, better to
try to reorder masked gathers for better performance.
Differential Revision: https://reviews.llvm.org/D157009
If the reduced values is constant-foldable and was folded to a constant
during previous transformations, need to excluded it from the list of
the reduced values-instructions as non-matchable.
This allows use with non-0 address space stacks. llvm_ptr_ty should
never be used. This could use some more percolation up through mlir,
but this is enough to fix existing tests.
https://reviews.llvm.org/D156666
The issue is actually related to ScatterVectorize nodes. If such node
gets reordered during bottom-to-top reordering, it may have associated
non-empty ReorderIndices. In this case, such nodes need to be handled
the same way as regular Vectorize nodes, not NeedToGather nodes. In this
case we need to reorder ReorderIndices array rather than scalars.
If the actual instruction bitwidth does not match its original size,
need to reestimate the casting opcode, the compiler cannot rely on the
one, provided in the instruction.
The cost of vector instructions has always been high under AArch64, in order to
add a high cost for inserts/extracts, shuffles and scalarization. This is a
conservative approach to limit the scope of unusual SLP vectorization where the
codegen ends up being quite poor, but has always been higher than the correct
costs would be for any specific core.
This relaxes that, reducing the vector insert/extract cost from 3 to 2. It is a
generalization of D142359 to all AArch64 cpus. The ScalarizationOverhead is
also overridden for integer vector at the same time, to remove the effect of
lane 0 being considered free for integer vectors (something that should only be
true for float when scalarizing).
The lower insert/extract cost will reduce the cost of insert, extracts,
shuffling and scalarization. The adjustments of ScalaizationOverhead will
increase the cost on integer, especially for small vectors. The end result will
be lower cost for float and long-integer types, some higher cost for some
smaller vectors. This, along with the raw insert/extract cost being lower, will
generally mean more vectorization from the Loop and SLP vectorizer.
We may end up regretting this, as that vectorization is not always profitable.
In all the benchmarking I have done this is generally an improvement in the
overall performance, and I've attempted to address the places where it wasn't
with other costmodel adjustments.
Differential Revision: https://reviews.llvm.org/D155459
insertelement instructions.
If the original vector has undef, not poison values, which are not
rewritten by later insertelement instructions, need to transform shuffle
with the undef vector, not a poison vector, and actual indices, not
PoisonMaskElem, otherwise the transformation may produce more poisons
output than the input.
As noted on #63980 rotate by immediate amounts is much cheaper than variable amounts.
This still needs to be expanded to vector rotate cases, and we need to add reasonable funnel-shift costs as well (very tricky as there's a huge range in CPU behaviour for these).
Need to check the scalars if they can be vectorized before trying to
schedule them. It may save compile time and improve vectorization on
large functions/basic blocks.
Differential Revision: https://reviews.llvm.org/D154891
Need to check for FixedVectorType, not a vector type, since later
compiler performs unconditional cast to FixedVectorType and gets the
number of elements in this type.
Need to account reshuffling, required for the reused elements in the
buildvector nodes, which are copies (perfect match) of other nodes, but
include reused elements.
Differential Revision: https://reviews.llvm.org/D149966
match the size of base node (PR63668).
Need to adjust the check for assert and take into account case where the
original scalars are reused and were extended to match the vector factor
of the reused SLP node.
match the size of base node (PR63668).
Need to adjust the check for assert and take into account case where the
original scalars are reused and were extended to match the vector factor
of the reused SLP node.
This patch adds support for vectorized reduction of maximum/minimum
intrinsics which are under the appropriate reduction kind.
Differential Revision: https://reviews.llvm.org/D154463
minimum/maximum tests from D154463. This contains tests where we vectorize
minimum/maximum as well as the tests where we currently do not identify
reduction patterns.
Differential Revision: https://reviews.llvm.org/D155096