Introduce processBuildVector as a next step to generalize code for cost
estimation and code emission for gather/buildvector nodes.
Differential Revision: https://reviews.llvm.org/D149973
To generate cast instructions, the result type is needed. To allow
creating widened casts without underlying instruction, introduce a new
VPWidenCastRecipe that also holds the result type.
This functionality will be used in a follow-up patch to
implement truncateToMinimalBitwidths as VPlan-to-VPlan transform.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D149081
The pass should not try to revectorize instructions with constant
operands, which were not folded by the IRBuilder. It prevents the
non-terminating loop in the SLP vectorizer for non foldable constant
operations.
insts.
If the vectorizable GEP node is built, which should not be scheduled,
and at least one node is a non-gep instruction, need to insert the
vectorized instructions before the last instruction in the list, not
before the first one, otherwise the instructions may be emitted in the
wrong order.
This patch adds a new preheader block the VPlan to place SCEV expansions
expansions like the trip count. This preheader block is disconnected
at the moment, as the bypass blocks of the skeleton are not yet modeled
in VPlan.
The preheader block is executed before skeleton creation, so the SCEV
expansion results can be used during skeleton creation. At the moment,
the trip count expression and induction steps are expanded in the new
preheader. The remainder of SCEV expansions will be moved gradually in
the future.
D147965 will update skeleton creation to use the steps expanded in the
pre-header to fix#58811.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D147964
The step is already expanded in the VPlan. Use this expansion instead.
This is a step towards modeling fixing up IV users in VPlan.
It also fixes a crash casued by SCEV-expanding the Step expression in
fixupIVUsers, where the IR is in an incomplete state
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D147963
This includes a couple of changes:
1. Moves the code that changes the root node out of the `TryToReduce` lambda and out of the traversal loop.
2. Since that code moved, there isn't much left in `TryToReduce` so the code was inlined.
3. The phi node variable `P` was also being used as a flag that turns on/off the exploration of operands as new seeds. This patch uses a new variable `TryOperandsAsNewSeeds` for this.
4. Simplifies the code executed when vectorization fails.
The logic of the code should be identical to the original, but I may be missing something not caught by tests.
Differential Revision: https://reviews.llvm.org/D149627
Added basic implementation of ShuffleCostBuilder class in
ShuffleCostEstimator and generalized BaseShuffleAnalysis::createShuffle
function to support emission of Value */InstructionCost for the
vectorization/cost estimation.
Differential Revision: https://reviews.llvm.org/D149171
The comment is stale, as UserVF is handled before selectVectorizationFactor
is called. Clarify the comment by remove the mention of UserVF.
Suggested as independent improvement in D143938.
LVP operates on the loop it stores in TheLoop. Use it instead of the
argument, to be in line with other member functions.
Suggested as independent improvement in D143938.
The function checks if there's a plan with the specified VF. Update the
comment to match the implementation.
Pointed out as independent improvement in D143938.
Move calls of collect* helpers closer to where the cost-model is used.
Should help simplifying D142669 & D142670.
Differential Revision: https://reviews.llvm.org/D142674
This makes it explicit that these functions work with instructions, and avoids
calling them if the operand is not an instruction.
Differential Revision: https://reviews.llvm.org/D149465
The entry to the plan is the preheader of the vector loop and
guaranteed to be a VPBasicBlock. Make sure this is the case by
adjusting the type.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D149005
Following the change in shufflevector semantics,
poison will be used to represent undefined elements in shufflevector masks.
Differential Revision: https://reviews.llvm.org/D149256
If two nodes share the same value, which is replaced in one of the
nodes, need to automatically replace same value in all nodes. Btter to
use WeakTrackingVH for this to fix compiler crash.
As far as I can tell, LoopVectorize preserves SCEV, mainly by dint
of forgetting the loop being vectorized. We should mark it as
preserved in the pass manager.
This is a very small compile-time improvement.
Differential Revision: https://reviews.llvm.org/D149147
This is a follow on to D147117 and D147355. In both cases, we were adding special cases to compute zext(BTC+1) instead of zext(BTC)+1 when the BTC+1 computation was known not to overflow.
Differential Revision: https://reviews.llvm.org/D148661
Now that we store the ScalarCost in the VectorizationFactor it is possible to
use it to get a slightly more accurate cost in isMoreProfitable between two
vector factors. This extends the logic added in D101726 to non-tail-folded
cases, using the costs of `VecCost * (TripCount / VF) + ScalarCost * (TripCount % VF)`
to compare VFs where the TripCount is known and we are not folding the tail.
This shouldn't alter very much as small trip counts are usually not vectorized,
but does seem to help in the testcase where 4 * VF4 is chosen as profitable
compared to 2 * VF8 + 4 * scalar.
Differential Revision: https://reviews.llvm.org/D147720
llvm.is.fpclass is different from other vectorizable intrinsics in that
it is overloaded on an argument type, not on the return type.
Differential Revision: https://reviews.llvm.org/D148905
Currently the compiler calculates the compensation cost for the
extractelements, removed during vectorization. But if the extractelement
instruction is used in several nodes, we can calculate the compensation
for them several times.
Differential Revision: https://reviews.llvm.org/D148806
Moved computeExtractCost to ShuffleCostEstimator class as another step
for unifying actual codegen/cost estimation for buildvectors.
Differential Revision: https://reviews.llvm.org/D148864
There are 2 problems in the cost estimation for buildvector/gather.
1. If the buildvector/gather node is the same as another one node, need
to estimate the cost of this node as 0.
2. The cost of inserting float point register to non-poison vector is
not 0, it should not be considered free.
Differential Revision: https://reviews.llvm.org/D148801
This reverts the revert commit 3d8ed8b5192a59104bfbd5bf7ac84d035ee0a4a5.
The new version of the patch adds a set to avoid duplicating work in
isFixedOrderRecurrence, which was previously done through the removed
SinkAfter map.
Original commit message:
Building on D142885 and D142589, retire the SinkAfter map from the
recurrence handling code. It is replaced by checking whether it is
possible to sink all users of a recurrence directly in VPlan. This
results in simpler code overall and allows to handle additional cases
(see the improvements in @test_crash).
Depends on D142885.
Depends on D142589.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D142886
If the partial matching is found and some other scalars must be
inserted, need to account the cost of the extractelements, transformed
to shuffles, and/or reused entries and calculate the cost of inserting
constants properly into the non-poison vectors.
Also, fixed the cost calculation for final gather/buildvector sequence.
Differential Revision: https://reviews.llvm.org/D148362
Implemented the reshuffling in finalize member function + add basic
support for add member functions, used during vector build.
Part of D110978
Differential Revision: https://reviews.llvm.org/D148279
Implemented the reshuffling in finalize member function + add basic
support for add member functions, used during vector build.
Part of D110978
Differential Revision: https://reviews.llvm.org/D148279
This reverts the revert commit 8c2276f89887d0a27298a1bbbd2181fa54bbb509.
The updated patch re-orders the getDefiningRecipe check in getVPalue to avoid
a use-after-free.
Original commit message:
Before this patch, a VPlan contained 2 mappings for Values -> VPValue:
1) Value2VPValue and 2) VPExternalDefs.
This duplication is unnecessary and there are already cases where
external defs are added to Value2VPValue. This patch replaces all uses
of VPExternalDefs with Value2VPValue.
It clarifies the naming of getOrAddVPValue (to getOrAddExternalVPValue)
and addVPValue (to addExternalVPValue).
At the moment, this is NFC, but will enable additional simplifications
in D147783.
Depends on D147891.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D147892