Now that IR flags are modeled as part of VPRecipeWithIRFlags, include
the flags when printing recipes.
Depends on D150027.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D150029
For a GEP in a pointer chain, if:
1) a pointer chain is unit-strided
2) the base pointer wasn't folded and is sitting in a register somewhere
3) the distance between the GEP and the base pointer is small enough and
can be folded into the addressing mode of the using load/store
Then we can exclude that GEP from the total cost of the pointer chain,
as it will likely be folded away.
In order to check if 3) holds, we need to know the type of memory access
being made by the users of the pointer chain. For that, we need to pass
along a new argument to getPointersChainCost. (Using the source pointer
type of the GEP isn't accurate, see https://reviews.llvm.org/D149889 for
more details).
Also note that 2) is currently an assumption, and could be modelled more
accurately.
This prevents some unprofitable cases from being SLP vectorized on
RISC-V by making the scalar costs cheaper and closer to the actual
codegen.
For now the getPointersChainCost hook is duplicated for RISC-V to prevent
disturbing other targets, but could be merged back in and shared with
other targets in a following patch.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D149654
`tryToVectorizePair()` adds a level of indirection over `tryToVectorizeList()`.
I am not really sure why it is needed, it looks redundant.
I replaced all calls to `tryToVectorizePair()` with calls to
`tryToVectorizeList()` and I am not seeing any failures.
Differential Revision: https://reviews.llvm.org/D151004
I was unable to find a case where this actually changes generated code,
but it enables the bug fix in D144434. It also brings codegen in line
with the handling of stores to uniform addresses in the cost model
(D134460).
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D144491
IsUniformStride is only used when the stride is a unit-stride, i.e. in a
plain wide vector load. This tightens the condition and renames it to
isUnitStride. It removes the old unused getUniformStrided() variant, as
isUnitStride should now imply that the stride is known.
Reviewed By: vdmitrie, ABataev
Differential Revision: https://reviews.llvm.org/D150662
This deprecates `vectorizeSimpleInstructions()` and replaces it with separate
functions that vectorize CmpInsts and Inserts.
Differential Revision: https://reviews.llvm.org/D149993
Update VPReplicateRecipe to use VPRecipeWithIRFlags for IR flag
handling. Retire separate MayGeneratePoisonRecipes map.
Depends on D149082.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D150027
Split off from D143938. This moves the planning logic to select the
vectorization factor to LoopVectorizationPlanner as a step towards only
computing costs for individual VFs in LoopVectorizationCostModel and do
planning in LVP.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D150197
When generating code for the epilogue vector loop, we need to re-use the
expansion results for induction steps generated for the main vector
loop, as the pre-header of the epilogue vector loop may not dominate the
vector preheader of the epilogue.
This fixes a reported crash. Note that this is a workaround which should
be removed soon once induction resume value creation is handled in VPlan
directly.
LV/LAA will speculate that (some) strided access patterns have unit stride, and insert runtime checks if required.
LV cost models a multiply by such a stride as free. We did this by keeping around the StrideSet structure, just to check if one of the operands were one of the strides we speculated.
We can instead just ask PredicatedScalarEvolution if either of the operands are one (after predicates are applied). We get mostly the same result - PSE can prove it in more cases in theory - and simpler code.
The original commit wasn't quite NFC, and this was caught by an arguably overly strong assert. Specifically, I'd failed to strip off the integer cast off the SCEV before saving it in the map. The result - other than a failed assert - is that we'd speculate on the casted unknown, not the unknown. The only case I can think of where that might change behavior would be a sext(i1 load). I doubt that case is interesting in practice, but it's good to be strictly NFC on this change regardless.
Original commit message follows..
The existing code makes it hard to tell that collectStridedAccess is really about identifying some loop invariant SCEV which is *profitable* to speculate is equal to one. The odd dual usage structure of Value and SCEV confuses this point.
We could choose to loosen the profitability analysis if desired. I'm not proposing doing so at this time as it exposes too many cases where the speculation is unprofitable.
Differential Revision: https://reviews.llvm.org/D147750
This reverts commit d5b840131223f2ffef4e48ca769ad1eb7bb1869a. Running this through broader testing after rebasing is revealing a crash. Reverting while I investigate.
Update skeleton creation logic to use SCEV expansion results from
expanding the pre-header. This avoids another set of SCEV expansions
that may happen after the CFG has been modified.
Fixes#58811.
Depends on D147964.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D147965
The existing code makes it hard to tell that collectStridedAccess is really about identifying some loop invariant SCEV which is *profitable* to speculate is equal to one. The odd dual usage structure of Value and SCEV confuses this point.
We could choose to loosen the profitability analysis if desired. I'm not proposing doing so at this time as it exposes too many cases where the speculation is unprofitable.
Differential Revision: https://reviews.llvm.org/D147750
A pseudo probe is created with dwarf line information shared with its nearest instruction. If the instruction comes with a dwarf discriminator, it will be shared with the probe as well. This can confuse the later FS-AFDO discriminator assignment pass. To fix this, I'm cleaning up the discriminator fields for probes when they are inserted.
I also notice another possibility to change the discriminator field of pseudo probes in the pipeline before the FS discriminator assignment pass. That is the loop unroller, which assigns duplication factor to instruction being vectorized. I'm disabling that for pseudo probe intrinsics specifically, also for callsites with probes.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D148569
- Rename `LimitForRegisterSize` to `MaxVFOnly` to make the meaning of the limit less ambiguous
- Rename `OpsWidth` to `ActualVF`, which makes it clear that this is the VF we are using for vectorization.
- Replace the if-else code for the initialization of OpsWidth with an std::min.
Differential Revision: https://reviews.llvm.org/D150241
Extend VPRecipeWithIRFlags to also include InBounds and use for VPWidenGEPRecipe.
The last remaining recipe that needs updating for
MayGeneratePoisonRecipes is VPReplicateRecipe.
Depends on D149081.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D149082
This patch introduces a VPRecipeWithIRFlags class to record various IR
flags for a recipe. This allows de-coupling of IR flags from the
underlying instructions. The main benefit is that it allows dropping of
IR flags from recipes directly, without the need to go through
State::MayGeneratePoisonRecipes. The plan is to remove
MayGeneratePoisonRecipes once all relevant recipes are transitioned.
It also allows dropping IR flags during VPlan-to-VPlan transforms, which
will be used in a follow-up patch to implement truncateToMinimalBitwidths
as VPlan-to-VPlan transform.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D149079
Recipes for interleave group members are recorded directly in the
RecipeBuilder. Use it directly instead of going indirectly through
VPlan's Value->VPValue mapping.
New that def-use chains are modeled directly in VPlan, we can simply use
the operands of the recipe we are replacing. There is no need to use the
operands of the underlying instruction to look up a VPValue.
Introduce processBuildVector as a next step to generalize code for cost
estimation and code emission for gather/buildvector nodes.
Differential Revision: https://reviews.llvm.org/D149973
To generate cast instructions, the result type is needed. To allow
creating widened casts without underlying instruction, introduce a new
VPWidenCastRecipe that also holds the result type.
This functionality will be used in a follow-up patch to
implement truncateToMinimalBitwidths as VPlan-to-VPlan transform.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D149081
The pass should not try to revectorize instructions with constant
operands, which were not folded by the IRBuilder. It prevents the
non-terminating loop in the SLP vectorizer for non foldable constant
operations.
insts.
If the vectorizable GEP node is built, which should not be scheduled,
and at least one node is a non-gep instruction, need to insert the
vectorized instructions before the last instruction in the list, not
before the first one, otherwise the instructions may be emitted in the
wrong order.
This patch adds a new preheader block the VPlan to place SCEV expansions
expansions like the trip count. This preheader block is disconnected
at the moment, as the bypass blocks of the skeleton are not yet modeled
in VPlan.
The preheader block is executed before skeleton creation, so the SCEV
expansion results can be used during skeleton creation. At the moment,
the trip count expression and induction steps are expanded in the new
preheader. The remainder of SCEV expansions will be moved gradually in
the future.
D147965 will update skeleton creation to use the steps expanded in the
pre-header to fix#58811.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D147964
The step is already expanded in the VPlan. Use this expansion instead.
This is a step towards modeling fixing up IV users in VPlan.
It also fixes a crash casued by SCEV-expanding the Step expression in
fixupIVUsers, where the IR is in an incomplete state
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D147963
This includes a couple of changes:
1. Moves the code that changes the root node out of the `TryToReduce` lambda and out of the traversal loop.
2. Since that code moved, there isn't much left in `TryToReduce` so the code was inlined.
3. The phi node variable `P` was also being used as a flag that turns on/off the exploration of operands as new seeds. This patch uses a new variable `TryOperandsAsNewSeeds` for this.
4. Simplifies the code executed when vectorization fails.
The logic of the code should be identical to the original, but I may be missing something not caught by tests.
Differential Revision: https://reviews.llvm.org/D149627
Added basic implementation of ShuffleCostBuilder class in
ShuffleCostEstimator and generalized BaseShuffleAnalysis::createShuffle
function to support emission of Value */InstructionCost for the
vectorization/cost estimation.
Differential Revision: https://reviews.llvm.org/D149171