A VPInstruction only has its first lane used if all users use its first
lane only. Use vputils::onlyFirstLaneUsed to continue checking the
recipe's users to handle more cases.
Besides allowing additional introduction of scalar steps when
interleaving in some cases, this also enables using an Add VPInstruction
to model the increment as a follow-up.
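
The recursive user check can be sketched with a toy model. ToyValue and
its fields are hypothetical stand-ins, not the real VPlan classes, and
the sketch assumes the def-use graph is acyclic:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a VPlan value/recipe; not the real LLVM classes.
struct ToyValue {
  // True if this node itself reads only lane 0 of its operands
  // (e.g. a scalar address computation).
  bool ReadsOnlyFirstLane = false;
  // True if the node computes each result lane from the matching operand
  // lane (e.g. an add), so its demand depends on its own users.
  bool IsLaneWise = false;
  std::vector<const ToyValue *> Users;
};

// Returns true if every user of Def needs only lane 0 of Def.
bool onlyFirstLaneUsed(const ToyValue &Def) {
  for (const ToyValue *U : Def.Users) {
    assert(U && "user must not be null");
    if (U->ReadsOnlyFirstLane)
      continue;
    // A lane-wise user needs only lane 0 of Def if only lane 0 of the user
    // itself is needed, so continue checking the user's users.
    if (U->IsLaneWise && !U->Users.empty() && onlyFirstLaneUsed(*U))
      continue;
    return false;
  }
  return true;
}
```

In this model an add whose only user is a first-lane-only consumer no
longer forces all lanes of its operands to be computed.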
Currently when interleaving vector calls with linear arguments,
the Part is ignored and all vector calls use the initial value
from the first lane of the current iteration.
Fix this to extract from the correct part of the linear vector.
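
A minimal sketch of the intended behavior, assuming a linear value with
a constant step. widenLinear and linearArgForPart are illustrative
helpers, not functions from the patch:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Models a linear value widened for interleaving: Parts[P][L] holds the
// value for lane L of unrolled part P.
std::vector<std::vector<int64_t>> widenLinear(int64_t Start, int64_t Step,
                                              unsigned VF, unsigned UF) {
  std::vector<std::vector<int64_t>> Parts(UF, std::vector<int64_t>(VF));
  for (unsigned P = 0; P < UF; ++P)
    for (unsigned L = 0; L < VF; ++L)
      Parts[P][L] = Start + static_cast<int64_t>(P * VF + L) * Step;
  return Parts;
}

// The scalar passed for a linear parameter of the call in part Part is the
// first lane of that part, not the first lane of part 0.
int64_t linearArgForPart(const std::vector<std::vector<int64_t>> &Parts,
                         unsigned Part) {
  assert(Part < Parts.size() && !Parts[Part].empty());
  return Parts[Part][0];
}
```

With Start = 100, Step = 4, VF = 4 and UF = 2, the call for part 1
receives 100 + 4 * 4 rather than 100.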
Add a new recipe to model scalar cast instructions, without relying on
an underlying instruction.
This allows creating scalar casts, without relying on an underlying
instruction (like the current VPReplicateRecipe). The new recipe is
used to explicitly model both truncating the induction step and the
VPDerivedIVRecipe, thus simplifying both the recipe and code
needed to introduce it.
Truncating VPWidenIntOrFpInductionRecipes should also be modeled using
the new recipe, as follow-up.
PR: https://github.com/llvm/llvm-project/pull/78113
Instead of using the debug location of the underlying instruction, use
the debug location from the recipe. This removes an unneeded dependency
on the underlying instruction.
This patch introduces a new common base class for recipes defining a
single result VPValue. This has been discussed/mentioned at various
previous reviews as potential follow-up and helps to replace various
getVPSingleValue calls.
PR: https://github.com/llvm/llvm-project/pull/77023
At the moment, block and edge masks are created on demand, which means
that they are inserted at the point where they are demanded and then
cached. It is possible that the mask for a block is looked up later at a
point that's not dominated by the point where the mask has been
inserted.
To avoid this, create masks up front on entry to the corresponding basic
block and leave it to VPlan simplification to remove unneeded masks.
Note that we need to create masks for all blocks, if any of the blocks
in the loop needs predication, as computing the mask of a block depends
on the masks of its predecessors.
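
The dependency on predecessor masks can be sketched as follows. This is
a toy model, not the patch's code: ToyBlock and ToyEdge are hypothetical
names, lanes are modeled as bits, and blocks are assumed to be listed
with predecessors before successors:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct ToyEdge {
  unsigned Pred; // index of the predecessor block
  uint32_t Cond; // lanes for which the branch in Pred takes this edge
};

struct ToyBlock {
  std::vector<ToyEdge> Incoming;
};

// Creates the mask of every block up front. A block without incoming edges
// (the header in this model) is active for all lanes. Any other block is
// active for a lane if some predecessor is active for it and the edge
// condition holds.
std::vector<uint32_t> createBlockMasks(const std::vector<ToyBlock> &Blocks,
                                       uint32_t AllLanes) {
  std::vector<uint32_t> Masks(Blocks.size(), 0);
  for (unsigned B = 0; B < Blocks.size(); ++B) {
    if (Blocks[B].Incoming.empty()) {
      Masks[B] = AllLanes;
      continue;
    }
    for (const ToyEdge &E : Blocks[B].Incoming) {
      assert(E.Pred < B && "predecessor mask must already exist");
      Masks[B] |= Masks[E.Pred] & E.Cond;
    }
  }
  return Masks;
}
```

Because every mask is computed when its block is entered, a later lookup
cannot hit a mask that was inserted at a non-dominating point.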
Needed for #76090.
https://github.com/llvm/llvm-project/pull/76635
As suggested as follow-up in
https://github.com/llvm/llvm-project/pull/72164, manage inbounds via
VPRecipeWithIRFlags.
Note that this now allows preserving inbounds in a few more cases.
With #70253 landed, selects for reduction results are explicitly used by
ComputeReductionResult and Selects can be marked as not having
side-effects again.
This reverts the revert commit 173032902c960d4d0d67b521d8c149553d8e8ba3.
This patch introduces a new ComputeReductionResult opcode to compute the
final reduction result in the middle block. The code from fixReduction
has been moved to ComputeReductionResult, after some earlier cleanup
changes to model parts of fixReduction explicitly elsewhere as needed.
The recipe may be broken down further in the future.
Note that the phi nodes to merge the reduction result from the trip
count check and the middle block, to be used as resume value for the
scalar remainder loop are also generated based on
ComputeReductionResult.
Once we have a VPValue for the reduction result, this can also be
modeled explicitly and moved out of the recipe.
llvm/lib/IR/Type.cpp:694:
Assertion `isValidElementType(ElementType) && "Element type of a
VectorType must be an integer, floating point, or pointer type."'
failed.
Stack dump:
llvm::FixedVectorType::get(llvm::Type*, unsigned int)
llvm::VPWidenCallRecipe::execute(llvm::VPTransformState&)
llvm::VPBasicBlock::execute(llvm::VPTransformState*)
llvm::VPRegionBlock::execute(llvm::VPTransformState*)
llvm::VPlan::execute(llvm::VPTransformState*)
...
Happens with function calls of void return type.
Move vector pointer generation to a separate VPVectorPointerRecipe.
This further untangles address computation from the memory recipes
and is also needed to enable explicit unrolling in VPlan.
https://github.com/llvm/llvm-project/pull/72164
This reverts commit 19918ac34dc5d304ec6ad413ceae1d4394abe28f.
Fixes #75298. There is still a case where we miss the correct users
outside the main vector loop for reductions, and that is tail-folded
loops with reductions where the final value is stored after the loop.
This should be handled explicitly in #70253.
This patch starts initial modeling of VF * UF in VPlan.
Initially, introduce a dedicated VFxUF VPValue, which is then
populated during VPlan::prepareToExecute. Initially, the VF * UF
applies only to the main vector loop region. Once we extend the
scope of VPlan in the future, we may want to associate different VFxUFs
with different vector loop regions (e.g. the epilogue vector loop).
This allows explicitly parameterizing recipes that rely on the
VF * UF, like the canonical induction increment. At the moment, this
mainly helps to avoid generating some duplicated calls to vscale with
scalable vectors. It should also allow using EVL as induction increments
explicitly in D99750. Referring to VF * UF is also needed in other
places that we plan to migrate to VPlan, like the minimum trip count
check during skeleton creation.
The first version creates the value for VF * UF directly in
prepareToExecute to limit the scope of the patch. A follow-on patch will
model VF * UF computation explicitly in VPlan using recipes.
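
As a rough illustration of the idea, assuming a toy plan type (ToyPlan
and countVectorIterations are hypothetical, not the real VPlan API), the
value is materialized once and then shared by its users:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a plan; not the real VPlan class.
struct ToyPlan {
  uint64_t VFxUF = 0; // placeholder until the plan is prepared

  // Materialize VF * UF once, before execution.
  void prepareToExecute(uint64_t VF, uint64_t UF) { VFxUF = VF * UF; }
};

// Counts vector loop iterations when the canonical induction is incremented
// by the shared VFxUF value instead of recomputing VF * UF per use.
uint64_t countVectorIterations(const ToyPlan &Plan,
                               uint64_t VectorTripCount) {
  assert(Plan.VFxUF != 0 && "prepareToExecute must run first");
  uint64_t Iterations = 0;
  for (uint64_t IV = 0; IV < VectorTripCount; IV += Plan.VFxUF)
    ++Iterations;
  return Iterations;
}
```

With VF = 4 and UF = 2, a vector trip count of 32 takes 4 iterations.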
Moved from Phabricator (https://reviews.llvm.org/D157322)
A new disjoint flag was added for OR instructions in #72583.
Update VPRecipeWithIRFlags to also support the new flag. This
allows printing and preserving the disjoint flag in vectorized code.
The compiler crashes for zext nneg instructions when the assertion that
checks that the instruction cannot produce poison triggers. Change the
base class of the widen cast recipe so that the nneg flag can be
dropped, avoiding the crash.
This patch replaces the IR based truncateToMinimalBitwidths with a VPlan
version. This has 3 benefits:
1) the VPlan-based version is simpler; we don't need to implement
special codegen for each supported instruction type like the IR based
one.
2) Removes a dependency on the cost-model after VPlan execution and
3) Removes a use of getVPValue that uses underlying values after VPlan
execution (See removed FIXME).
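
For intuition on why narrowing is safe when only the low bits of a
result are demanded, here is a small sketch. addInMinimalWidth is an
illustrative helper, not code from the patch, and it models the narrow
type with a bit mask on 32-bit values:

```cpp
#include <cassert>
#include <cstdint>

// Performs an add in a narrower type of MinBits bits and zero-extends the
// result back to 32 bits: zext(trunc(A) + trunc(B)).
uint32_t addInMinimalWidth(uint32_t A, uint32_t B, unsigned MinBits) {
  assert(MinBits > 0 && MinBits < 32 && "narrow width must be in (0, 32)");
  const uint32_t Mask = (1u << MinBits) - 1;
  const uint32_t NarrowA = A & Mask;                     // trunc
  const uint32_t NarrowB = B & Mask;                     // trunc
  const uint32_t NarrowSum = (NarrowA + NarrowB) & Mask; // add in narrow type
  return NarrowSum;                                      // zext
}
```

The low MinBits bits of the result match those of the full-width add,
which is all that users demanding only those bits can observe.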
Depends on D149081.
Depends on D149079.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D149903
Parameters marked as uniform take a scalar value, assuming the value is
invariant in the scalar loop. In order to support this, we need to stop
asking for a vector function variant with a default shape assuming that all
arguments will become vector arguments, and instead consider all available
variants and their parameter types.
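
A minimal sketch of such a selection, under simplifying assumptions:
ToyVariant, ToyParamKind and pickVariant are hypothetical names, only
vector and uniform parameters are modeled, and the first compatible
variant wins:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class ToyParamKind { Vector, Uniform };

// Hypothetical description of one available vector variant.
struct ToyVariant {
  std::string Name;
  unsigned VF;
  std::vector<ToyParamKind> Params;
};

// Considers all variants for the requested VF. A uniform parameter is only
// compatible with a loop-invariant argument; a vector parameter accepts any
// argument. Returns nullptr if no variant fits.
const ToyVariant *pickVariant(const std::vector<ToyVariant> &Variants,
                              unsigned VF,
                              const std::vector<bool> &ArgIsLoopInvariant) {
  assert(VF > 0 && "VF must be positive");
  for (const ToyVariant &V : Variants) {
    if (V.VF != VF || V.Params.size() != ArgIsLoopInvariant.size())
      continue;
    bool Compatible = true;
    for (std::size_t I = 0; I < V.Params.size(); ++I)
      if (V.Params[I] == ToyParamKind::Uniform && !ArgIsLoopInvariant[I])
        Compatible = false;
    if (Compatible)
      return &V;
  }
  return nullptr;
}
```

A variant with a uniform second parameter is chosen only when the second
argument is invariant; otherwise the all-vector variant is used.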
This patch adds initial type inference for VPValues. It infers the
scalar type of a VPValue, by bottom-up traversing through defining
recipes until root nodes with known types are reached (e.g. live-ins or
load recipes). The types are then propagated top down through
operations.
This is intended as building block for a VPlan-based cost model, which
will need access to type information for VPValues/recipes.
Initial testing is done by asserting the inferred type matches the type
of the result value generated for widen and replicate recipes.
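
The traversal can be sketched with a toy model. ToyDef, ToyKind and
ToyTypeAnalysis are hypothetical stand-ins, types are modeled as
strings, and only a few node kinds are covered:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

enum class ToyKind { LiveIn, Load, Binary, Cast };

// Hypothetical stand-in for a value and its defining recipe.
struct ToyDef {
  ToyKind Kind;
  std::string KnownTy; // set for LiveIn, Load and Cast (result type)
  std::vector<const ToyDef *> Operands;
};

class ToyTypeAnalysis {
  std::unordered_map<const ToyDef *, std::string> Cache;

public:
  // Walks through defining nodes until nodes with known types are reached,
  // then propagates the type back through the operations. Results are
  // cached so each node is inferred once.
  std::string inferScalarType(const ToyDef *D) {
    assert(D && "cannot infer the type of a null definition");
    auto It = Cache.find(D);
    if (It != Cache.end())
      return It->second;

    std::string Ty;
    switch (D->Kind) {
    case ToyKind::LiveIn:
    case ToyKind::Load:
    case ToyKind::Cast:
      Ty = D->KnownTy;
      break;
    case ToyKind::Binary:
      assert(D->Operands.size() == 2 && "binary op needs two operands");
      Ty = inferScalarType(D->Operands[0]);
      assert(Ty == inferScalarType(D->Operands[1]) &&
             "operand types must match");
      break;
    }
    Cache.emplace(D, Ty);
    return Ty;
  }
};
```

For an add of an i64 live-in and an i64 load, the inferred type is i64;
a cast of that add to i32 yields i32.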
Header phi recipes have the start value (incoming from outside the loop)
as first operand. This wasn't the case for VPWidenPHIRecipes. Instead
the start value was picked during execute() by doing extra work.
To be in line with other recipes, ensure the operand order is as
expected during construction.
Now that VPInstruction can manage fast math flags via
VPRecipeWithIRFlags, use them directly to model the fast-math flags of
the select created for the final reduction value instead of adding them
late.
Continuing the patch series to get rid of debug intrinsics [0], instruction
insertion needs to be done with iterators rather than instruction pointers,
so that we can communicate information in the iterator class. This patch
adds an iterator-taking insertBefore method and converts various call sites
to take iterators. These are all sites where such debug-info needs to be
preserved so that a stage2 clang can be built identically; it's likely that
many more will need to be changed in the future.
At this stage, this is just changing the spelling of a few operations,
which will eventually become significant once the debug-info bearing
iterator is used.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Differential Revision: https://reviews.llvm.org/D152537
This directly models the flags as part of the recipe, which allows
dropping them using the VPlan infrastructure when required.
It also allows removing the full reference to InductionDescriptor and
limiting it to only the opcode.
VPWidenRecipe only needs the opcode to widen, all other information
(flags, debug loc and operands) is already modeled directly via the
recipe.
This removes the remaining uses of the underlying instruction from
VPWidenRecipe::execute.
Add a dedicated debug location to VPRecipeBase to remove another
unneeded use of the underlying LLVM IR instruction and also consolidate
various DL fields in sub classes.
Each recipe can have a debug location and shouldn't rely on a reference
to the underlying LLVM IR instruction to retain it. See various recipes
that had separate DL fields already.
Update VPBlendRecipe::print() to print the result directly, instead of
relying on the stored Phi pointer. This brings the recipe in line with
how other recipes are printed.
Extend VPRecipeWithIRFlags to also manage predicates for compares. This
allows removing the custom ICmpULE opcode from VPInstruction which was a
workaround for missing proper predicate handling.
This simplifies the code a bit while also allowing compares with any
predicates. It also fixes a case where the compare predicate wasn't
printed properly for VPReplicateRecipes.
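
To illustrate the difference from a dedicated ICmpULE opcode, here is a
toy compare that carries its predicate as data. ToyPredicate, ToyCompare,
evaluate and print are hypothetical names, and only a few unsigned
predicates are modeled:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

enum class ToyPredicate { EQ, NE, ULT, ULE, UGT, UGE };

// A compare that stores its predicate alongside the recipe, so a single
// compare opcode can express any predicate.
struct ToyCompare {
  ToyPredicate Pred;
};

bool evaluate(const ToyCompare &C, uint64_t A, uint64_t B) {
  switch (C.Pred) {
  case ToyPredicate::EQ:  return A == B;
  case ToyPredicate::NE:  return A != B;
  case ToyPredicate::ULT: return A < B;
  case ToyPredicate::ULE: return A <= B;
  case ToyPredicate::UGT: return A > B;
  case ToyPredicate::UGE: return A >= B;
  }
  assert(false && "unknown predicate");
  return false;
}

// Prints the predicate together with the opcode.
std::string print(const ToyCompare &C) {
  switch (C.Pred) {
  case ToyPredicate::EQ:  return "icmp eq";
  case ToyPredicate::NE:  return "icmp ne";
  case ToyPredicate::ULT: return "icmp ult";
  case ToyPredicate::ULE: return "icmp ule";
  case ToyPredicate::UGT: return "icmp ugt";
  case ToyPredicate::UGE: return "icmp uge";
  }
  assert(false && "unknown predicate");
  return "icmp";
}
```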
Discussed/split off from D150398.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D158992
ICmp codegen for VPInstruction will be extended for other predicates,
and the operands could be any values (not just IV and TC as implied by
the names). Suggested cleanup from D150398.
All dependencies on code from LoopVectorize.cpp have been
removed/refactored. Move the ::execute implementations to other recipe
definitions in VPlanRecipes.cpp.
This commit refactors the implementation of VPReductionRecipe to use a
reference instead of a pointer for the member RdxDesc. Because the member
RdxDesc in VPReductionRecipe should not be a nullptr, using a reference
will provide clearer semantics.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D158058
Address post-commit simplification suggestion for 8a56179bcd8c: Replace
IsTruncated by conditionally setting TruncResultTy only if truncation
is required.
Update VPInstruction to use VPRecipeWithIRFlags to manage FMFs for
VPInstruction.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D157144
Model wrap flags directly using VPRecipeWithIRFlags and clean up the
duplicated *NUW opcodes.
D157144 will build on this and also model FMFs for VPInstruction.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D157194
Use the printOperands for printing VPInstruction's operands to be more
in line with other recipes and ensure consistent printing after D15719.
Also removes some stray spaces in print output.
Update generateInstruction to return the produced value instead of
setting it for each opcode. This reduces the amount of duplicated code
and is a preparation for D153696.
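
The shape of the change can be sketched with a toy model. ToyOpcode and
the State vector are hypothetical and the signatures do not match the
real VPInstruction code:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class ToyOpcode { Not, Add };

// Returns the produced value instead of storing it separately in each case.
int64_t generateInstruction(ToyOpcode Op, int64_t A, int64_t B) {
  switch (Op) {
  case ToyOpcode::Not:
    return ~A;
  case ToyOpcode::Add:
    return A + B;
  }
  assert(false && "unhandled opcode");
  return 0;
}

// The caller records the result in a single place for every opcode.
void execute(ToyOpcode Op, int64_t A, int64_t B, std::vector<int64_t> &State,
             unsigned Part) {
  assert(Part < State.size() && "part out of range");
  State[Part] = generateInstruction(Op, A, B);
}
```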
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D154240