This patch removes the member TTI from VPReductionRecipe, as the
generation of reduction operations no longer requires TTI.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D158148
This commit refactors the implementation of VPReductionRecipe to use
reference instead of pointer for member RdxDesc. Because the member
RdxDesc in VPReductionRecipe should not be a nullptr, using a reference
will provide clearer semantics.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D158058
Use the printOperands for printing VPInstruction's operands to be more
in line with other recipes and ensure consistent printing after D15719.
Also removes some stray spaces in print output.
Update VPReplicateRecipe to use VPRecipeWithIRFlags for IR flag
handling. Retire separate MayGeneratePoisonRecipes map.
Depends on D149082.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D150027
This patch adds a new preheader block the VPlan to place SCEV expansions
expansions like the trip count. This preheader block is disconnected
at the moment, as the bypass blocks of the skeleton are not yet modeled
in VPlan.
The preheader block is executed before skeleton creation, so the SCEV
expansion results can be used during skeleton creation. At the moment,
the trip count expression and induction steps are expanded in the new
preheader. The remainder of SCEV expansions will be moved gradually in
the future.
D147965 will update skeleton creation to use the steps expanded in the
pre-header to fix#58811.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D147964
The entry to the plan is the preheader of the vector loop and
guaranteed to be a VPBasicBlock. Make sure this is the case by
adjusting the type.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D149005
This reverts the revert commit 8c2276f89887d0a27298a1bbbd2181fa54bbb509.
The updated patch re-orders the getDefiningRecipe check in getVPalue to avoid
a use-after-free.
Original commit message:
Before this patch, a VPlan contained 2 mappings for Values -> VPValue:
1) Value2VPValue and 2) VPExternalDefs.
This duplication is unnecessary and there are already cases where
external defs are added to Value2VPValue. This patch replaces all uses
of VPExternalDefs with Value2VPValue.
It clarifies the naming of getOrAddVPValue (to getOrAddExternalVPValue)
and addVPValue (to addExternalVPValue).
At the moment, this is NFC, but will enable additional simplifications
in D147783.
Depends on D147891.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D147892
Asan detects heap-use-after-free, see D147892.
This reverts commit 4fc190351e5af901b6107d162d07e1fbca90934f.
This reverts commit 668045eb77628be13e448ffbb855473ffca1cc43.
Before this patch, a VPlan contained 2 mappings for Values -> VPValue:
1) Value2VPValue and 2) VPExternalDefs.
This duplication is unnecessary and there are already cases where
external defs are added to Value2VPValue. This patch replaces all uses
of VPExternalDefs with Value2VPValue.
It clarifies the naming of getOrAddVPValue (to getOrAddExternalVPValue)
and addVPValue (to addExternalVPValue).
At the moment, this is NFC, but will enable additional simplifications
in D147783.
Depends on D147891.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D147892
This patch adds a NeedsMaskForGaps field to VPInterleaveRecipe to record
whether a mask for gaps is needed. This removes a dependence on the cost
model in VPlan code-generation.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D147467
There is no need to store information about invariance in the recipe.
Replace the fields with checks of the operands using
isDefinedOutsideVectorRegions.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D144489
This patch adds the predicate as additional operand to VPReplicateRecipe
during initial construction. The predicated recipes are later moved into
replicate regions. This simplifies constructions and some VPlan
transformations, like fixed-order recurrence handling.
It also improves codegen in some cases (e.g. for in-loop reductions),
because the recipes remain in the same block.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D143865
Similar to vp_depth_first_shallow (D140512) add vp_depth_first_deep to
make existing code clearer and more compact.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D142055
This updates the VPAllSuccessorsIterator to not connect the
VPRegionBlock itself to its successors. The successors are connected to
the exit block of the region. At the moment, this doesn't change any
exisint functionality.
But the new schema ensures the following property when used for
VPDominatorTree:
1. Entry & exit blocks of regions dominate the successors of the region.
This allows for convenient checking of dominance between defs and uses
that are not defined in the same region. I will share a follow-up patch
to use it for the VPDominatorTree soon.
Depends on D140500.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D140511
This reduces the size of VPlan.h and avoids future growth of the file
when the graph traits are extended in future patches.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D140500
Explicitly track the UFs supported in a VPlan. This is needed to
allow transformations to restrict the UFs which are supported.
Discussed as separate improvement in D135017.
After 32f1c5531b, getDefiningRecipe returns a VPRecipeBase* so there's
no need to cast to VPRecipeBase.
Suggested by @Ayal during review of D136068, thanks!
The return value of getDef is guaranteed to be a VPRecipeBase and all
users can also accept a VPRecipeBase *. Most users actually case to
VPRecipeBase or a specific recipe before using it, so this change
removes a number of redundant casts.
Also rename it to getDefiningRecipe to make the name a bit clearer.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D136068
This patch moves the cost-based decision whether to use an intrinsic or
library call to the point where the recipe is created. This untangles
code-gen from the cost model and also avoids doing some extra work as
the information is already computed at construction.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D132585
This patch updates the VPlan native path to use VPRegionBlocks for all
loops in a loop nest. Up to now, only the outermost loop used a region.
This is a step towards unifying both paths and keep things consistent
between them. It also prepares various code-gen parts for modeling the
pre-header in the inner loop vectorizer (D121624).
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D123005
The implementations of VPlanDominatorTree, VPlanLoopInfo and VPlanPredicator
are all incompatible with modeling loops in VPlans as region without
explicit back-edges.
Those pieces are not actively used and only exercised by a few gtest
unit tests. They are at the moment blocking progress towards unifying
the native and inner-loop vectorizer paths in D121624 and D123005.
I think we should not block forward progress on unused pieces of code,
so this patch removes the utilities for now. The plan is to re-introduce
them as needed in a way that is compatible with the unified VPlan scheme
used in both the inner loop vectorizer and the native path.
Reviewed By: sguggill
Differential Revision: https://reviews.llvm.org/D123017
This addresses an existing TODO by keeping a mapping of external IR
Value * definitions wrapped in VPValues for use in a VPlan.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D123700
Record widening decisions for memory operations within the planned recipes and
use the recorded decisions in code-gen rather than querying the cost model.
Differential Revision: https://reviews.llvm.org/D110479
As suggested in D99294, this adds a getVPSingleValue helper to use for
recipes that are guaranteed to define a single value. This replaces uses
of getVPValue() which used to default to I = 0.
When iterating over const blocks, the base type in the lambdas needs
to use const VPBlockBase *, otherwise it cannot be used with input
iterators over const VPBlockBase.
Also adjust the type of the input iterator range to const &, as it
does not take ownership of the input range.
This patch adds a blocksOnly helpers which take an iterator range
over VPBlockBase * or const VPBlockBase * and returns an interator
range that only include BlockTy blocks. The accesses are casted to
BlockTy.
Reviewed By: a.elovikov
Differential Revision: https://reviews.llvm.org/D101093
This patch adds a new iterator to traverse through VPRegionBlocks and a
GraphTraits specialization using the iterator to traverse through
VPRegionBlocks.
Because there is already a GraphTraits specialization for VPBlockBase *
and co, a new VPBlockRecursiveTraversalWrapper helper is introduced.
This allows us to provide a new GraphTraits specialization for that
type. Users can use the new recursive traversal by using this wrapper.
The graph trait visits both the entry block of a region, as well as all
its successors. Exit blocks of a region implicitly have their parent
region's successors. This ensures all blocks in a region are visited
before any blocks in a successor region when doing a reverse post-order
traversal of the graph.
Reviewed By: a.elovikov
Differential Revision: https://reviews.llvm.org/D100175
Add an initial version of a helper to determine whether a recipe may
have side-effects.
Reviewed By: a.elovikov
Differential Revision: https://reviews.llvm.org/D100259
I foresee two uses for this:
1) It's easier to use those in debugger.
2) Once we start implementing more VPlan-to-VPlan transformations (especially
inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in
LIT test would become too obscure. I can imagine that we'd want to CHECK
against VPlan dumps after multiple transformations instead. That would be
easier with plain text dumps than with DOT format.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D96628
This reverts commit 6b053c9867a3ede32e51cef3ed972d5ce5b38bc0.
The build is broken:
ld.lld: error: undefined symbol: llvm::VPlan::printDOT(llvm::raw_ostream&) const
>>> referenced by LoopVectorize.cpp
>>> LoopVectorize.cpp.o:(llvm::LoopVectorizationPlanner::printPlans(llvm::raw_ostream&)) in archive lib/libLLVMVectorize.a
I foresee two uses for this:
1) It's easier to use those in debugger.
2) Once we start implementing more VPlan-to-VPlan transformations (especially
inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in
LIT test would become too obscure. I can imagine that we'd want to CHECK
against VPlan dumps after multiple transformations instead. That would be
easier with plain text dumps than with DOT format.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D96628
The individual recipes have been updated to manage their operands using
VPUser a while back. Now that the transition is done, we can instead
make VPRecipeBase a VPUser and get rid of the toVPUser helper.
I am trying to untangle the fast-math-flags propagation logic
in the vectorizers (see a6f022127 for SLP).
The loop vectorizer has a mix of checking FP function attributes,
IR-level FMF, and just wrong assumptions.
I am trying to avoid regressions while fixing this, and I think
the IR-level logic is good enough for that, but it's hard to say
for sure. This would be the 1st step in the clean-up.
The existing test that I changed to include 'fast' actually shows
a miscompile: the function only had the equivalent of nnan, but we
created new instructions that had fast (all FMF set). This is
similar to the example in https://llvm.org/PR35538
Differential Revision: https://reviews.llvm.org/D95452
This patch unifies the way recipes and VPValues are printed after the
transition to VPDef.
VPSlotTracker has been updated to iterate over all recipes and all
their defined values to number those. There is no need to number
values in Value2VPValue.
It also updates a few places that only used slot numbers for
VPInstruction. All recipes now can produce numbered VPValues.
The new test case here contains a first order recurrences and an
instruction that is replicated. The first order recurrence forces an
instruction to be sunk _into_, as opposed to after the replication
region. That causes several things to go wrong including registering
vector instructions multiple times and failing to create dominance
relations correctly.
Instead we should be sinking to after the replication region, which is
what this patch makes sure happens.
Differential Revision: https://reviews.llvm.org/D93629