Update the matching between the original reduced values and their
vectorized counterparts after ordered reduction vectorization to avoid
a compiler crash.
The patch models ordered reductions as a series of extractelement
instructions for the cases that cannot be modeled as unordered reductions.
Fixes #50590
Reviewers: RKSimon, hiraditya
Pull Request: https://github.com/llvm/llvm-project/pull/182644
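To illustrate the modeling described above, the sketch below shows in plain C++ (not the SLP vectorizer's code) why an ordered reduction is expressed lane by lane: each element is extracted in order and folded into the accumulator, so the association order of the scalar code is preserved. The function names and the four-lane width are assumptions made for this example.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a 4-lane vector register.
using Vec4 = std::array<float, 4>;

// Ordered reduction: extract lanes 0, 1, 2, 3 in turn and fold each one into
// the accumulator, i.e. (((Init + V[0]) + V[1]) + V[2]) + V[3].
float orderedReduce(const Vec4 &V, float Init) {
  float Acc = Init;
  for (std::size_t Lane = 0; Lane < V.size(); ++Lane)
    Acc = Acc + V[Lane]; // one "extractelement" plus one scalar add per lane
  return Acc;
}

// Unordered (tree) reduction: reassociates the adds as
// (V[0] + V[1]) + (V[2] + V[3]). This is only legal when reassociation is
// allowed, for example for integer adds or fast-math floating point.
float unorderedReduce(const Vec4 &V, float Init) {
  return Init + ((V[0] + V[1]) + (V[2] + V[3]));
}
```

For inputs where rounding does not matter, both forms agree; for strict floating-point semantics only the ordered form is guaranteed to match the scalar code.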
If the instruction state is alternate and/or contains instructions that
do not match directly, check whether it is better to represent such
operations as non-alternate with copyables.
To do this, compare the operands of the instructions in both
representations and choose the one that gives better vectorization.
Reviewers: RKSimon, hiraditya
Pull Request: https://github.com/llvm/llvm-project/pull/183777
shl-based reduced values often serve as the root of a bitcast/bswap-based
transformation, but the analysis needs to be improved for better matching.
This patch merges reduction candidates into a single reduced value
array if there are only 2 different candidate arrays, one of which has
a single element and the other is a list of shl instructions. It also
sorts these shl instructions by their shift amount and merges them with
the single candidate if it is profitable to have a copyable reduction.
The original support for copyables led to a regression in x264 on
RISCV. This patch improves detection of the copyable candidates by
checking profitability more precisely and adds an extra check for
splitnode reduction, if it is profitable.
Fixes #184313
Reviewers: hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/185697
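The merge-and-sort step described above can be sketched roughly as follows. This is a simplified model, not the patch itself: `ReducedValue`, `mergeAndSortByShift`, and the `Stride` parameter are hypothetical names, and the "contiguous shifts" test is only a stand-in for the real profitability checks.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of a reduced value: Shift is the shl amount in bits.
// A value that is not a shl is treated as a "copyable" shl by 0.
struct ReducedValue {
  int Id;         // identifies the scalar being shifted
  unsigned Shift; // shift amount in bits
};

// Merge the single non-shl candidate with the shl candidates, sort by shift
// amount, and accept the merged list only if the shifts are exactly
// 0, Stride, 2 * Stride, ... (the shape a bitcast/bswap root would need).
std::optional<std::vector<ReducedValue>>
mergeAndSortByShift(int SingleId, std::vector<ReducedValue> Shls,
                    unsigned Stride) {
  Shls.push_back({SingleId, 0}); // model the lone value as "shl %v, 0"
  std::stable_sort(Shls.begin(), Shls.end(),
                   [](const ReducedValue &A, const ReducedValue &B) {
                     return A.Shift < B.Shift;
                   });
  for (std::size_t I = 0; I < Shls.size(); ++I)
    if (Shls[I].Shift != I * Stride)
      return std::nullopt; // gap or duplicate: do not merge in this sketch
  return Shls;
}
```

With shifts 24, 8, and 16 plus one unshifted value and a stride of 8, the merged list comes back ordered by shift amount; a repeated shift amount makes the sketch reject the merge.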
If the current buildvector node is part of the combined nodes of the
matching candidate node, this matching candidate must be considered
non-matching to prevent a wrong def-use chain.
Reviewers:
Pull Request: https://github.com/llvm/llvm-project/pull/187491
Currently, the SLP vectorizer does not take loops and their trip counts
into account, which may lead to inefficient vectorization in some cases.
This patch adds loop nest-aware tree building and cost estimation.
During tree building, it now checks that the tree does not span
different loop nests. The nodes from other loop nests are
immediate buildvector nodes.
The cost model now takes the loop trip count into account. If it is
unknown, the default value is used, controlled by the
-slp-cost-loop-min-trip-count=<value> option. The cost of the vector
nodes in the loop is multiplied by the number of iterations (trip count),
because each vector node is executed once per iteration. This allows
better cost estimation.
Original Reviewers:
jdenny-ornl, vporpo, hiraditya, RKSimon
Original PR: https://github.com/llvm/llvm-project/pull/150450
Recommit after revert in c7bd3062f1dac975cf9b706f457b3c55b4bf57ff and in 4e500bd0015042b0cd4b7c87b81caeea06072d24
Reviewers:
Pull Request: https://github.com/llvm/llvm-project/pull/187391
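The trip-count scaling described above can be sketched as follows. This is a minimal model under stated assumptions: the helper names, the `DefaultTripCount` parameter (standing in for whatever -slp-cost-loop-min-trip-count supplies), and the split into in-loop and out-of-loop cost are illustrative, not the vectorizer's actual interface.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: scale the cost of a vector node that lives inside a
// loop by the loop's trip count. When the trip count is unknown, fall back
// to a caller-supplied default.
std::int64_t scaleCostByTripCount(std::int64_t NodeCost,
                                  std::optional<std::uint64_t> KnownTripCount,
                                  std::uint64_t DefaultTripCount) {
  const std::uint64_t TripCount = KnownTripCount.value_or(DefaultTripCount);
  return NodeCost * static_cast<std::int64_t>(TripCount);
}

// Total tree cost under this model: nodes inside the loop are scaled, nodes
// outside it (for example, buildvector nodes feeding the loop) are not.
std::int64_t treeCost(std::int64_t InLoopCost, std::int64_t OutOfLoopCost,
                      std::optional<std::uint64_t> KnownTripCount,
                      std::uint64_t DefaultTripCount) {
  return scaleCostByTripCount(InLoopCost, KnownTripCount, DefaultTripCount) +
         OutOfLoopCost;
}
```

For example, an in-loop saving of 3 with an out-of-loop buildvector cost of 10 looks unprofitable without scaling (-3 + 10 = 7), but becomes profitable once the in-loop part is multiplied by a trip count of 8 (-24 + 10 = -14).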
Currently, the SLP vectorizer does not take loops and their trip counts
into account, which may lead to inefficient vectorization in some cases.
This patch adds loop nest-aware tree building and cost estimation.
During tree building, it now checks that the tree does not span
different loop nests. The nodes from other loop nests are
immediate buildvector nodes.
The cost model now takes the loop trip count into account. If it is
unknown, the default value is used, controlled by the
-slp-cost-loop-min-trip-count=<value> option. The cost of the vector
nodes in the loop is multiplied by the number of iterations (trip count),
because each vector node is executed once per iteration. This allows
better cost estimation.
Reviewers: jdenny-ornl, vporpo, hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/150450
Recommit after revert in c7bd3062f1dac975cf9b706f457b3c55b4bf57ff
Fix the checks for the non-power-of-2 base bswaps by checking whether
the source type, not the target scalar type, is a power of 2. Also, add
cost estimation for zext if the source type does not match the scalar
type, and fix the final bitcasting for the reduced values.
Fixes https://github.com/llvm/llvm-project/pull/184018#issuecomment-4053477562
Fix the checks for the non-power-of-2 base bswaps by checking whether
the source type, not the target scalar type, is a power of 2. Also, add
cost estimation for zext if the source type does not match the scalar
type.
Fixes https://github.com/llvm/llvm-project/pull/184018#issuecomment-4053477562
When looking for a match of a gather/buildvector node, if its root is
the first node, which is also a buildvector/gather and has no state,
skip the analysis for such nodes to prevent a compiler crash.
Fixes #185851
Currently, the SLP vectorizer does not take loops and their trip counts
into account, which may lead to inefficient vectorization in some cases.
This patch adds loop nest-aware tree building and cost estimation.
During tree building, it now checks that the tree does not span
different loop nests. The nodes from other loop nests are
immediate buildvector nodes.
The cost model now takes the loop trip count into account. If it is
unknown, the default value is used, controlled by the
-slp-cost-loop-min-trip-count=<value> option. The cost of the vector
nodes in the loop is multiplied by the number of iterations (trip count),
because each vector node is executed once per iteration. This allows
better cost estimation.
Reviewers: jdenny-ornl, vporpo, hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/150450
**Summary**
Fixes a miscompilation where commutative operations (e.g., or, and, mul)
with a left-hand side constant were incorrectly transformed into
non-commutative operations (e.g., shl, sub).
**The Problem**
In `BinOpSameOpcodeHelper::getOperand`, when a constant is at
`Pos == 0`, the helper failed to swap the operand order for new
non-commutative target opcodes. This resulted in inverted logic, such as
transforming `or 0, %x` into `shl 0, %x` (resulting in 0) instead of the
correct `%x << 0`.
**The Fix**
The existing logic only protected the Sub opcode. This patch generalizes
the fix to all non-commutative instructions by using
`!Instruction::isCommutative(ToOpcode)`. This ensures that for any
directional operation, the variable is correctly placed on the LHS and
the constant on the RHS.
**Changes**
SLPVectorizer.cpp: Replaced the specific Sub check with a general
isCommutative check.
Regression Test: Added lhs-constant-non-cummutative.ll to cover shl,
sub, and ashr targets.
Fixes #185186
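The operand-order rule from **The Fix** can be sketched outside of LLVM as follows. This is a simplified model, not the code in SLPVectorizer.cpp: the `Opcode` enum, the local `isCommutative`, and the `getOperands`/`evaluate` helpers are hypothetical stand-ins that operate on plain integers.

```cpp
#include <cstdint>
#include <utility>

enum class Opcode { Or, And, Mul, Shl, Sub, AShr };

// Simplified stand-in for Instruction::isCommutative(Opcode).
bool isCommutative(Opcode Op) {
  return Op == Opcode::Or || Op == Opcode::And || Op == Opcode::Mul;
}

struct Operands {
  std::int64_t LHS;
  std::int64_t RHS;
};

// Build the operands of the converted operation. ConstPos is the position of
// the constant in the original instruction (0 = LHS, 1 = RHS). For a
// non-commutative target opcode the variable must end up on the LHS.
Operands getOperands(Opcode ToOpcode, std::int64_t X, std::int64_t Const,
                     unsigned ConstPos) {
  Operands Ops = ConstPos == 0 ? Operands{Const, X} : Operands{X, Const};
  if (ConstPos == 0 && !isCommutative(ToOpcode))
    std::swap(Ops.LHS, Ops.RHS);
  return Ops;
}

// Evaluate the operation; shift amounts are assumed to be in [0, 63].
std::int64_t evaluate(Opcode Op, const Operands &Ops) {
  switch (Op) {
  case Opcode::Or:   return Ops.LHS | Ops.RHS;
  case Opcode::And:  return Ops.LHS & Ops.RHS;
  case Opcode::Mul:  return Ops.LHS * Ops.RHS;
  case Opcode::Shl:
    return static_cast<std::int64_t>(static_cast<std::uint64_t>(Ops.LHS)
                                     << Ops.RHS);
  case Opcode::Sub:  return Ops.LHS - Ops.RHS;
  case Opcode::AShr: return Ops.LHS >> Ops.RHS;
  }
  return 0;
}
```

In this model, converting `or 0, %x` with `%x = 5` to a shl yields the operands (5, 0), so the result stays 5 rather than becoming 0.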
Added support for zero-extending the bitcasted/bswapped type to the
original type, if it is larger than the original scalar type.
Reviewers: hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/184018
Be careful when filling the mask for fully matched nodes: the masks may
differ in size.
Fixes a crash reported in test/Transforms/SLPVectorizer/X86/mask-size-less-common-mask.ll
If the select/zext comparison has a negated predicate and is used in
several places, it should not be considered a candidate for the inverted
zext/select pattern. Otherwise it would be replaced by a negated vector
predicate, leading to incorrect codegen for the other uses.
In the reordered RHS path of matchesShlZExt, the code never checked that
each shift amount (0, Stride, 2×Stride, …) appears at most once. When
the same shift appeared in multiple lanes, it still filled Order,
producing a non-permutation (e.g. Order = [0,0,0,1]). That led to bad
shuffle masks and miscompilation (e.g. shuffles with poison).
The patch adds an explicit duplicate check: before setting Order[Idx] =
Pos, it ensures Pos has not been seen before, using a SmallBitVector
SeenPositions(VF). If a position is seen twice, the function returns
false and the optimization is not applied.
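The duplicate check described above can be sketched in standalone C++ as follows. This is an approximation, not the body of matchesShlZExt: `computeOrder` is a hypothetical name, the shift amounts are passed in as plain integers, and `std::vector<bool>` stands in for SmallBitVector.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Shifts[Idx] is the shift amount found in lane Idx. The result maps each
// lane to its position (Shift / Stride) and is returned only when it is a
// valid permutation of 0 .. VF-1.
std::optional<std::vector<unsigned>>
computeOrder(const std::vector<unsigned> &Shifts, unsigned Stride) {
  const std::size_t VF = Shifts.size();
  if (Stride == 0)
    return std::nullopt;
  std::vector<unsigned> Order(VF, 0);
  std::vector<bool> SeenPositions(VF, false);
  for (std::size_t Idx = 0; Idx < VF; ++Idx) {
    if (Shifts[Idx] % Stride != 0)
      return std::nullopt; // not a multiple of the stride
    const unsigned Pos = Shifts[Idx] / Stride;
    if (Pos >= VF || SeenPositions[Pos])
      return std::nullopt; // out of range, or the same shift seen twice
    SeenPositions[Pos] = true;
    Order[Idx] = Pos;
  }
  return Order;
}
```

With shifts {0, 0, 0, 8} and a stride of 8, the second lane hits an already-seen position and the function bails out instead of producing the non-permutation [0,0,0,1].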
Converts reduced or(select %cmp, bitmask, 0) to zext(bitcast %vector_cmp to
i<num_reduced_values>) to in
Reviewers: RKSimon, hiraditya
Pull Request: https://github.com/llvm/llvm-project/pull/181940
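The equivalence behind this conversion can be checked with a small scalar model. The sketch below assumes that the bitmask for lane i is `1 << i` and that lane 0 maps to the least significant bit after the bitcast; the 8-lane width, the i32 result type, and the function names are assumptions chosen for the example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t NumLanes = 8;
using Mask = std::array<bool, NumLanes>;

// Scalar form: or-reduction over select(%cmp[i], 1 << i, 0).
std::uint32_t reduceOrOfSelects(const Mask &Cmp) {
  std::uint32_t Result = 0;
  for (std::size_t I = 0; I < NumLanes; ++I)
    Result |= Cmp[I] ? (std::uint32_t{1} << I) : std::uint32_t{0};
  return Result;
}

// Vector form: bitcast <8 x i1> to i8 (lane 0 in bit 0), then zext to i32.
std::uint32_t zextOfBitcast(const Mask &Cmp) {
  std::uint8_t Packed = 0;
  for (std::size_t I = 0; I < NumLanes; ++I)
    Packed = static_cast<std::uint8_t>(Packed |
                                       (static_cast<unsigned>(Cmp[I]) << I));
  return static_cast<std::uint32_t>(Packed);
}
```

Both functions return the same value for every mask, which is the property the conversion relies on under these assumptions.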
Some of the zext i1 (cmp) + select sequences can be transformed by
inverting compare predicates to remove extra shuffles. For example,
zext i1 (cmp ne) + select (cmp eq), 0, 2 can be modeled as
select <2 x i1> (cmp ne), <1, 2>, zeroinitializer.
Reviewers: RKSimon, hiraditya
Pull Request: https://github.com/llvm/llvm-project/pull/181580
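The predicate inversion in the example above can be verified with a two-lane scalar model. The sketch is illustrative only: `Vec2`, `scalarForm`, and `vectorForm` are hypothetical names, and the i32 element type is an assumption.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec2 = std::array<std::int32_t, 2>;

// Scalar form: lane 0 is zext i1 (cmp ne), lane 1 is select (cmp eq), 0, 2.
Vec2 scalarForm(const Vec2 &A, const Vec2 &B) {
  const std::int32_t Lane0 = static_cast<std::int32_t>(A[0] != B[0]);
  const std::int32_t Lane1 = (A[1] == B[1]) ? 0 : 2;
  return {Lane0, Lane1};
}

// Vector form: the eq predicate of lane 1 is inverted to ne and the select
// arms are swapped, so a single vector compare feeds
// select (cmp ne), <1, 2>, zeroinitializer.
Vec2 vectorForm(const Vec2 &A, const Vec2 &B) {
  const Vec2 TrueValues = {1, 2};
  Vec2 Result = {0, 0}; // zeroinitializer
  for (std::size_t I = 0; I < Result.size(); ++I)
    if (A[I] != B[I])
      Result[I] = TrueValues[I];
  return Result;
}
```

Because both lanes now use the same predicate, no extra shuffle is needed to combine results from two different vector compares.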
If revec is enabled, the combined node may span several parts
(registers) instead of being a single-element node, so we need to check
for a potential out-of-bounds access.
Fixes #181798
The patch changes the maximum tree size analysis:
1. Do not increase the depth for type-changing nodes (like casts and
compares), allowing deeper trees to be built.
2. Remove the NotProfitableForVectorization workaround, which is no
longer needed now that throttling is enabled.
Reviewers: hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/180950
Added basic estimations for the external uses when calculating the cost
of the non-profitable trees. Stores and insertelements are excluded, as
they are very good candidates for vectorization. Also, tuned the
buildvector/gather cost with minimum bitwidth analysis data.
Reviewers: hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/178024