Extract values state and operands analysis/building into a separate
class. This class allows to localize instrutions state and operands
building for future support of copyable elements vectorization.
Reviewers: HanKuanChen, RKSimon
Reviewed By: HanKuanChen
Pull Request: https://github.com/llvm/llvm-project/pull/138724
This NFC aims to simplify the interfaces used in `buildTree()` to make
it easier to understand where decisions for legality are made.
In particular, there is now a single point of definition for legality
decisions. This makes it clear where all those decisions are made.
Previously, multiple variables with a large scope were passed by
reference.
If one user node is non-schedulable and another one is schedulable, such
nodes should be considered matched. The selection of the actual insert
point in this case differs and the insert points may match, which may
cause a compiler crash because of the broken def-use chain.
Fixes#137797
Most callers want a constant index. Instead of making every caller
create a ConstantInt, we can do it in IRBuilder. This is similar to
createInsertElement/createExtractElement.
`ad9909d "[SLP]Fix perfect diamond match with extractelements in scalars" `
changed SLPVectorizer getScalarizationOverhead() to call
TTI.getVectorInstrCost() instead of TTI.getScalarizationOverhead() in some
cases. This was due to X86 specific handlings in these (overridden) methods,
and unfortunately the general preference of TTI.getScalarizationOverhead()
was dropped. If VL is available it should always be preferred to use
getScalarizationOverhead(), and this is indeed the case for SystemZ which
has a special insertion instruction that can insert two GPR64s.
Then ` 33af951 "[SLP]Synchronize cost of gather/buildvector nodes with
codegen"` reworked SLPVectorizer getGatherCost() which together with
ad9909d caused the SystemZ test vec-elt-insertion.ll to fail.
This patch restores the SystemZ test and reverts the change in SLPVectorizer
getScalarizationOverhead() so that TTI.getScalarizationOverhead() is always
called again. The ForPoisonSrc argument is now passed on to the TTI method
so that X86 can handle this as required.
Fixes: #135346
This NFC aims to simplify the control-flow and interfaces used in tryToFindDuplicates(). The point is to make it easier to understand where decisions for scalar de-duplication are made.
In particular:
- Limit indentation
- Rename some variables to better match their use case
- Always give consistent outputs for VL and ReuseShuffleIndices. This makes it possible to use the same code for building gather TreeEntry everywhere. This also allows to remove the TryToFindDuplicates lambda.
Add a new reduction recurrence kind for reductions with
minimumnum/maximumnum. Such reductions can be vectorized without
nsz/nnans, same as reductions with maximum/minimum intrinsics.
Note that a new reduction kind is needed to make sure partial reductions
are also combined with minimumnum/maximumnum.
Note that the final reduction to a scalar value is performed with
vector.reduce.fmin/fmax. This should be fine, as the results of the
partial reductions with maximumnum/minimumnum silences any sNaNs.
In-loop and reductions in SLP are not supported yet, as there's no
reduction version of maximumnum/minimumnum yet and fmax may be
incorrect.
PR: https://github.com/llvm/llvm-project/pull/137335
Better to preserve the original order of the alternate nodes to avoid
inter-lane shuffling, select/insert subvector patterns provide better
perf.
Reviewers: RKSimon, hiraditya
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/136329
Need to estimate, which one is preferable, deinterleaved/segmented
loads or strided. Segmented loads can be combined, improving
the overall performance.
Reviewers: RKSimon, hiraditya
Reviewed By: hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/135058
When we have the remaining unique scalar, that should be inserted into
non-poison vector and into non-zero position:
```
%vec1 = insertelement %vec, %v, pos1
%res = shuffle %vec1, poison, <0, 1, 2,..., pos1, pos1 + 1, ..., pos1,
...>
```
better to estimate if it is profitable to model it as is or model it as:
```
%bv = insertelement poison, %v, 0
%splat = shuffle %bv, poison, <poison, ..., 0, ..., 0, ...>
%res = shuffle %vec, %splat, <0, 1, 2,..., pos1 + VF, pos1 + 1, ...>
```
Reviewers: preames, hiraditya, RKSimon
Reviewed By: preames
Pull Request: https://github.com/llvm/llvm-project/pull/136590
This will likely not affect much with the current uses of the function,
but if we have getExtractWithExtendCost we can plumb CostKind through it
in the same way as other costmodel functions.
Need to consider the ordering for all nodes with the specified ordering,
not only loads/store/extracts.
Reviewers: hiraditya, RKSimon
Reviewed By: hiraditya
Pull Request: https://github.com/llvm/llvm-project/pull/136185
Need to cache and use cached data for compressed loads before codegen to
avoid side-effects, caused by the earlier vectorization, which may
affect the analysis.
Need to pre-cache last instruction to avoid unexpected changes in the
last instruction detection during the vectorization, caused by adding
the new vector instructions, which add new uses and may affect the
analysis.
Need to check if the operand node of the split vectorize node has reuses
and check if it is possible to build the order for this node to reorder
it correctly.
Fixes#135912
Duplicates are handled in BoUpSLP::processBuildVector (see TryPackScalars), support for duplicates in getGatherCost is not needed anymore.
Reviewers: hiraditya, RKSimon
Reviewed By: hiraditya, RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/135834
If the potential split node is a perfect/shuffled match of another split
node, need to skip creation of the another split node with the same
scalars, it should be a buildvector.
Fixes#135800
We use the term "interchangeable instructions" to refer to different
operators that have the same meaning (e.g., `add x, 0` is equivalent to
`mul x, 1`).
Non-constant values are not supported, as they may incur high costs with
little benefit.
---------
Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
We should align REVEC with the SLP algorithm as closely as possible. For
example, by applying REVEC-specific handling when calling IRBuilder's
Create methods, performing cost analysis via TTI, and expanding shuffle
masks using transformScalarShuffleIndicesToVector.
reference commit: 3b18d47ecbaba4e519ebf0d1bc134a404a56a9da
We use the term "interchangeable instructions" to refer to different
operators that have the same meaning (e.g., `add x, 0` is equivalent to
`mul x, 1`).
Non-constant values are not supported, as they may incur high costs with
little benefit.
---------
Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
If the buildvector node contains constants and non-constants, need to
consider shuffling of the constant vec and insertion of unique elements
into the vector. Also, if there is an input vector, need to consider the
cost of shuffling source vector and constant vector and then insertion
and shuffling of the non-constant elements.
Reviewers: hiraditya, RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/135245
When REVEC is enabled, ScalarTy may be a FixedVectorType. Compare its
element type to decide if casting is needed. Also apply mask
transformation accordingly.
Moved check from buildTree_rec function to a separate
isLegalToVectorizeScalars function.
Reviewers: RKSimon, hiraditya
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/134132
Patch removes the restriction for the revectorization of the previously
vectorized scalars in split nodes, and moves the cost profitability
check to avoid regressions.
Reviewers: hiraditya, RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/134286