Added some extra checks for comapreCMP function if IsCompatibility is
false to make it meat the strict weak ordering requirements to be
correctly used in sort functions.
ec146cb7c0b4 added two new ReducKinds for float maximum and minimum
(along with support in loop vectorizer).
Enumerate these kinds in SLPVectorizer to fix mlir windows build bot
errors:
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Vectorize\SLPVectorizer.cpp(13883): error C2220: the following warning is treated as an error
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Vectorize\SLPVectorizer.cpp(13883): warning C4062: enumerator 'llvm::RecurKind::FMinimum' in switch of enum 'llvm::RecurKind' is not handled
We don't want the existence of debug instructions affect codegen so we now
ignore debug instructions and other "isAssumeLikeIntrinsics in the
"extend schedule region" search loop in
BoUpSLP::BlockScheduling::extendSchedulingRegion.
Differential Revision: https://reviews.llvm.org/D152441
There are several issues in the current implementation. The instructions
are not properly ordered, if they are placed in different basic blocks,
need to reverse the order of blocks. Also, need to exclude
non-vectorizable nodes and check for CallBase, not CallInst, otherwise
invoke calls are not handled correctly.
For a GEP in a pointer chain, if:
1) a pointer chain is unit-strided
2) the base pointer wasn't folded and is sitting in a register somewhere
3) the distance between the GEP and the base pointer is small enough and
can be folded into the addressing mode of the using load/store
Then we can exclude that GEP from the total cost of the pointer chain,
as it will likely be folded away.
In order to check if 3) holds, we need to know the type of memory access
being made by the users of the pointer chain. For that, we need to pass
along a new argument to getPointersChainCost. (Using the source pointer
type of the GEP isn't accurate, see https://reviews.llvm.org/D149889 for
more details).
Also note that 2) is currently an assumption, and could be modelled more
accurately.
This prevents some unprofitable cases from being SLP vectorized on
RISC-V by making the scalar costs cheaper and closer to the actual
codegen.
For now the getPointersChainCost hook is duplicated for RISC-V to prevent
disturbing other targets, but could be merged back in and shared with
other targets in a following patch.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D149654
`tryToVectorizePair()` adds a level of indirection over `tryToVectorizeList()`.
I am not really sure why it is needed, it looks redundant.
I replaced all calls to `tryToVectorizePair()` with calls to
`tryToVectorizeList()` and I am not seeing any failures.
Differential Revision: https://reviews.llvm.org/D151004
IsUniformStride is only used when the stride is a unit-stride, i.e. in a
plain wide vector load. This tightens the condition and renames it to
isUnitStride. It removes the old unused getUniformStrided() variant, as
isUnitStride should now imply that the stride is known.
Reviewed By: vdmitrie, ABataev
Differential Revision: https://reviews.llvm.org/D150662
This deprecates `vectorizeSimpleInstructions()` and replaces it with separate
functions that vectorize CmpInsts and Inserts.
Differential Revision: https://reviews.llvm.org/D149993
- Rename `LimitForRegisterSize` to `MaxVFOnly` to make the meaning of the limit less ambiguous
- Rename `OpsWidth` to `ActualVF`, which makes it clear that this is the VF we are using for vectorization.
- Replace the if-else code for the initialization of OpsWidth with an std::min.
Differential Revision: https://reviews.llvm.org/D150241
Introduce processBuildVector as a next step to generalize code for cost
estimation and code emission for gather/buildvector nodes.
Differential Revision: https://reviews.llvm.org/D149973
The pass should not try to revectorize instructions with constant
operands, which were not folded by the IRBuilder. It prevents the
non-terminating loop in the SLP vectorizer for non foldable constant
operations.
insts.
If the vectorizable GEP node is built, which should not be scheduled,
and at least one node is a non-gep instruction, need to insert the
vectorized instructions before the last instruction in the list, not
before the first one, otherwise the instructions may be emitted in the
wrong order.
This includes a couple of changes:
1. Moves the code that changes the root node out of the `TryToReduce` lambda and out of the traversal loop.
2. Since that code moved, there isn't much left in `TryToReduce` so the code was inlined.
3. The phi node variable `P` was also being used as a flag that turns on/off the exploration of operands as new seeds. This patch uses a new variable `TryOperandsAsNewSeeds` for this.
4. Simplifies the code executed when vectorization fails.
The logic of the code should be identical to the original, but I may be missing something not caught by tests.
Differential Revision: https://reviews.llvm.org/D149627
Added basic implementation of ShuffleCostBuilder class in
ShuffleCostEstimator and generalized BaseShuffleAnalysis::createShuffle
function to support emission of Value */InstructionCost for the
vectorization/cost estimation.
Differential Revision: https://reviews.llvm.org/D149171
This makes it explicit that these functions work with instructions, and avoids
calling them if the operand is not an instruction.
Differential Revision: https://reviews.llvm.org/D149465
Following the change in shufflevector semantics,
poison will be used to represent undefined elements in shufflevector masks.
Differential Revision: https://reviews.llvm.org/D149256
If two nodes share the same value, which is replaced in one of the
nodes, need to automatically replace same value in all nodes. Btter to
use WeakTrackingVH for this to fix compiler crash.
llvm.is.fpclass is different from other vectorizable intrinsics in that
it is overloaded on an argument type, not on the return type.
Differential Revision: https://reviews.llvm.org/D148905
Currently the compiler calculates the compensation cost for the
extractelements, removed during vectorization. But if the extractelement
instruction is used in several nodes, we can calculate the compensation
for them several times.
Differential Revision: https://reviews.llvm.org/D148806
Moved computeExtractCost to ShuffleCostEstimator class as another step
for unifying actual codegen/cost estimation for buildvectors.
Differential Revision: https://reviews.llvm.org/D148864
There are 2 problems in the cost estimation for buildvector/gather.
1. If the buildvector/gather node is the same as another one node, need
to estimate the cost of this node as 0.
2. The cost of inserting float point register to non-poison vector is
not 0, it should not be considered free.
Differential Revision: https://reviews.llvm.org/D148801
If the partial matching is found and some other scalars must be
inserted, need to account the cost of the extractelements, transformed
to shuffles, and/or reused entries and calculate the cost of inserting
constants properly into the non-poison vectors.
Also, fixed the cost calculation for final gather/buildvector sequence.
Differential Revision: https://reviews.llvm.org/D148362
Implemented the reshuffling in finalize member function + add basic
support for add member functions, used during vector build.
Part of D110978
Differential Revision: https://reviews.llvm.org/D148279
Implemented the reshuffling in finalize member function + add basic
support for add member functions, used during vector build.
Part of D110978
Differential Revision: https://reviews.llvm.org/D148279
Removed definitions of vectorizeBasicBlock and VectorizeConfig
(possibly a remnant from the BBVectorize pass that was removed
way back in 2017).
Also reduced amount of include dependencies to Transforms/Vectorize.h.
Limit turns out to be implemented in the exact same way for all calls to
tryToVectorizeSequence(). So this patch removes it and implements it internally
as a lambda function.
Differential Revision: https://reviews.llvm.org/D148382
Introduced BoUpSLP::ShuffleCostEstimator::gather function as an initial
implementation of the gather/buildvector cost estimation for buildvector
nodes. It will allow to use general codegen infrastructure for better
cost estimation + it improves the cost estimation for the
gathers/buildvectors.
Improved part of D110978.
Differential Revision: https://reviews.llvm.org/D148174
By default these will expand back to cmp/sel, but some targets (X86) has optimized costs for scalar integer min/max patterns which are lower than the default expansion (pre-SSE41 is particularly weak for vector min/max support).
Differential Revision: [SLP] Compute min/max scalar reduction costs using min/max intrinsics instead of expanded cmp+sel
Similar to the getArithmeticReductionCost / getExtendedReductionCost calls (which really don't need to use std::optional<>).
This will be necessary to correct recognize fast/nnan fmax/fmul reductions which can avoid nan handling - which will allow us to remove the fmax/fmin special case in X86TTIImpl::getMinMaxCost and use getIntrinsicInstrCost like we do for integer reductions (63c3895327839ba5b57f5b99ec9e888abf976ac6).
Differential Revision: https://reviews.llvm.org/D148149
Instead of abstract cost of the scalar reduction ops, try to use the
cost of actual reduction operation instructions, where possible. Also,
remove the estimation of the vectorized GEPs pointers for reduced loads,
since it is already handled in the tree.
Differential Revision: https://reviews.llvm.org/D148036
Added ShuffleCostEstimator class and the first adjustExtracts member,
which is just a copy of previous AdjustExtractCost lambda.
Differential Revision: https://reviews.llvm.org/D147787