This patch implements a simple pass that tries to de-duplicate packs. If
there are two packing patterns inserting the exact same values in the
exact same order, then we will keep the top-most one of them. Even
though such patterns may be optimized away by subsequent passes it is
still useful to do this within the vectorizer because otherwise the cost
estimation may be off, making the vectorizer over conservative.
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either using a
clever caching technique, or writing a easily-invalidable KnownBits
analysis, make the Depth argument in APIs in ValueTracking uniformly the
last argument with a default value. This would aid in removing the
argument when the time comes, as many callers that currently pass 0
explicitly are now updated to omit the argument altogether.
Apart from the stylistic improvement, lookup has the nice property of
returning a default-constructed object on failure-to-find, while find
returns the end iterator, which cannot be dereferenced.
After updating #118638 on tip of tree, expanding
VPWidenIntOrFpInductionRecipes fails because it needs the loop region to
get the latch to insert the increment into:
VPBasicBlock *ExitingBB =
Plan->getVectorLoopRegion()->getExitingBasicBlock();
Builder.setInsertPoint(ExitingBB,
ExitingBB->getTerminator()->getIterator());
auto *Next = Builder.createNaryOp(AddOp, {Prev, Inc}, Flags,
WidenIVR->getDebugLoc(), "vec.ind.next");
However after #117506, the region is dissolved so it doesn't work.
This shuffles the dissolveLoopRegions steps to be after
convertToConcreteRecipes so we can use the region when expanding
VPWidenIntOrFpInductionRecipes
Directly replace the canonical IV when we dissolve the containing
region. That ensures that it won't get removed before the region gets
removed, which would result in an invalid region.
This removes the current ordering constraint between
convertToConcreteRecipes and dissolving regions.
PR: https://github.com/llvm/llvm-project/pull/142372
Move VPlan-based calculateRegisterUsage from LoopVectorize
to VPlanAnalysis.cpp. It is a VPlan-based analysis and this helps
to reduce the size of LoopVectorize.
PR: https://github.com/llvm/llvm-project/pull/135673
5f39be5 ([VPlan] Use InstSimplifyFolder instead of TargetFolder) updated
simplifyRecipe to fold live-ins to Values that are not necessarily
Constant, but forgot to update the corresponding PredPHI folder, which
still folds PredPHI constant -> constant. Update it to fold PredPHI
LiveIn -> LiveIn.
Fixes#141968.
CodeRegion's were previously passed as Value*, but then immediately
upcast to BasicBlock. Let's keep the type information around until the
use cases for non-BasicBlock code regions actually materialize.
Move code to get the index of a predecessor and successor to helpers in
VPBlockBase, to avoid duplication and enable future reuse.
Split off from https://github.com/llvm/llvm-project/pull/140409.
Gather nodes with parents, which scalar instructions are used only
outside, are generated before the whole tree vectorization. Need to
teach isGatherShuffledSingleRegisterEntry to check that such nodes are
emitted first and they cannot depend on other nodes, which are emitted
later.
Fixes#141628
This patch implement the VPlan-based cost model for VPReduction,
VPExtendedReduction and VPMulAccumulateReduction.
With this patch, we can calculate the reduction cost by the VPlan-based
cost model so remove the reduction costs in `precomputeCost()`.
Ref: Original instruction based implementation:
https://reviews.llvm.org/D93476
Now that there is only a single FindLastIV recurrence kind, simply pass
the sentinel value instead of the full recurrence descriptor to tighten
the interface.
Now that there is only a single AnyOf recurrence kind, simply pass the
start value instead of the full recurrence descriptor, to tighten the
interface.
Manage fast-math flags using VPIRFlags from VPInstruciton, in inline
with other VPInstructions. With this change, we now print the correctly
flags for ComputeReductionResult, other than that NFC.
Hopefully this fixes the build failure at
https://lab.llvm.org/buildbot/#/builders/116/builds/13423. gcc-14
seems to be able to deduce the type and compile this fine, but for
gcc-7 we need to avoid the Use/Value mismatch I guess.
For more powerful folding with operands that are not necessarily
all-constant, use InstSimplifyFolder instead of TargetFolder in
tryToConstantFold, and rename the function tryToFoldLiveIns.
This adds support for unary operands, and unary + ternary intrinsics in
scalarizeOpOrCmp (FKA scalarizeBinOpOrCmp).
The motivation behind this is to scalarize more intrinsics in
VectorCombine rather than in DAGCombine, so we can sink splats across
basic blocks: see https://github.com/llvm/llvm-project/pull/137786
The main change required is to generalize the existing VecC0/VecC1 rules
across n-ary ops:
- An operand can either be a constant vector or an insert of a scalar
into a constant vector
- If it's an insert, the index needs to be static and in bounds
- If it's an insert, all indices need to be the same across all operands
- If all the operands are constant vectors, bail as it will get constant
folded anyway
This addresses a TODO where previously scalarizeBinopOrCmp
conservatively bailed if one of the operands was a load.
getVectorInstrCost was updated to take in values in
https://reviews.llvm.org/D140498 so we can pass in the scalar value to
be inserted, which should return an accurate cost for a gather.
To prevent regressions on x86 this tries to constant fold NewVecC up
front so we can pass it into TTI and get a more accurate cost.
We want to remove this restriction on RISC-V since this is always
profitable whether or not the scalar is a load.
Update initial construction to connect the Plan's entry to the scalar
preheader during initial construction. This moves a small part of the
skeleton creation out of ILV and will also enable replacing
VPInstruction::ResumePhi with regular VPPhi recipes.
Resume phis need 2 incoming values to start with, the second being the
bypass value from the scalar ph (and used to replicate the incoming
value for other bypass blocks). Adding the extra edge ensures we
incoming values for resume phis match the incoming blocks.
PR: https://github.com/llvm/llvm-project/pull/140132
Update to only build an initial, plain-CFG VPlan once, and then
transform & optimize clones.
This requires changes to ::clone() for VPInstruction and
VPWidenPHIRecipe to allow for proper cloning of the recipes in the
initial VPlan.
PR: https://github.com/llvm/llvm-project/pull/141363
This patch moves the logic to manage IR flags to a separate VPIRFlags
class. For now, VPRecipeWithIRFlags is the only class that inherits
VPIRFlags. The new class allows for simpler passing of flags when
constructing recipes, simplifying the constructors for various recipes
(VPInstruction in particular, which now just has 2 constructors, one
taking an extra VPIRFlags argument.
This mirrors the approach taken for VPIRMetadata and makes it easier to
extend in the future. The patch also adds a unified flagsValidForOpcode
to check if the flags in a VPIRFlags match the provided opcode.
PR: https://github.com/llvm/llvm-project/pull/140621
Building on top of https://github.com/llvm/llvm-project/pull/114305,
replace VPRegionBlocks with explicit CFG before executing.
This brings the final VPlan closer to the IR that is generated and
helps to simplify codegen.
It will also enable further simplifications of phi handling during
execution and transformations that do not have to preserve the
canonical IV required by loop regions. This for example could include
replacing the canonical IV with an EVL based phi while completely
removing the original canonical IV.
PR: https://github.com/llvm/llvm-project/pull/117506
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.