Update VPInterleaveRecipe to always use the pointer to member 0 as
pointer argument. This in many cases helps to remove unneeded index
adjustments and simplifies VPInterleaveRecipe::execute.
In some rare cases, the address of member 0 does not dominate the insert
position of the interleave group. In those cases a PtrAdd VPInstruction
is emitted to compute the address of member 0 based on the address of
the insert position. Alternatively we could hoist the recipe computing
the address of member 0.
Use the reduction resume values from the phis in the scalar header,
instead of collecting them in a map. This removes some complexity from
the general executePlan code paths and pushes it to only the epilogue
vectorization part.
This patch implements explicit unrolling by UF as VPlan transform. In
follow up patches this will allow simplifying VPTransform state (no need
to store unrolled parts) as well as recipe execution (no need to
generate code for multiple parts in an each recipe). It also allows for
more general optimziations (e.g. avoid generating code for recipes that
are uniform-across parts).
It also unifies the logic dealing with unrolled parts in a single place,
rather than spreading it out across multiple places (e.g. VPlan post
processing for header-phi recipes previously.)
In the initial implementation, a number of recipes still take the
unrolled part as additional, optional argument, if their execution
depends on the unrolled part.
The computation for start/step values for scalable inductions changed
slightly. Previously the step would be computed as scalar and then
splatted, now vscale gets splatted and multiplied by the step in a
vector mul.
This has been split off https://github.com/llvm/llvm-project/pull/94339
which also includes changes to simplify VPTransfomState and recipes'
::execute.
The current version mostly leaves existing ::execute untouched and
instead sets VPTransfomState::UF to 1.
A follow-up patch will clean up all references to VPTransformState::UF.
Another follow-up patch will simplify VPTransformState to only store a
single vector value per VPValue.
PR: https://github.com/llvm/llvm-project/pull/95842
Extend VPBuilder to allow creating VPDerivedIVRecipe, VPScalarCastRecipe
and VPScalarIVStepsRecipe.
Use them to simplify the code to create scalar IV steps slightly.
Use getBestVF to select VF up-front and only use
selectVectorizationFactor to get the VF legacy VF to check the
vectorization decision matches the VPlan-based cost model.
PR: https://github.com/llvm/llvm-project/pull/103033
As suggested in https://github.com/llvm/llvm-project/pull/103033, more
accurately rename to getPlanFor , as it simplify returns the VPlan for
VF, relying on the fact that there is a single VPlan for each VF at the
moment.
This patch moves the logic to create remarks for instructions with
invalid costs to work on recipes and decoupling it from
selectVectorizationFactor. This is needed to replace the remaining uses
of selectVectorizationFactor with getBestPlan using the VPlan-based cost
model.
The current implementation iterates over all VPlans and their recipes
again, to find recipes with invalid costs, which is more work but will
only be done when remarks for LV are enabled. Once the remaining uses of
selectVectorizationFactor are retired, we can collect VPlans with
invalid costs as part of getBestPlan if we want to optimize the remarks
case a bit, at the cost of adding additional complexity.
PR: https://github.com/llvm/llvm-project/pull/99322
Replace getBestPlan by getBestVF which simply finds the best
VF out of the VFs for the available VPlans.
Then use getBestPlan to retrieve the corresponding VPlan.
This allows using getBestVF & getBestPlan for epilogue vectorization
as well. As the same plan may be used to vectorize both the main
and epilogue loop, restricting the VF of the best plan would cause
issues.
PR: https://github.com/llvm/llvm-project/pull/98821
This reverts commit 6f538f6a2d3224efda985e9eb09012fa4275ea92.
A number of crashes have been fixed by separate fixes, including
ttps://github.com/llvm/llvm-project/pull/96622. This version of the
PR also pre-computes the costs for branches (except the latch) instead
of computing their costs as part of costing of replicate regions, as
there may not be a direct correspondence between original branches and
number of replicate regions.
Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.
It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.
The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.
Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.
PR: https://github.com/llvm/llvm-project/pull/92555
The HeaderVPBB is retrieved in a similar fashion already.
This is in preparation for splitting up the very large
tryToBuildVPlanWithVPRecipes into several distinct functions, as
suggested multiple times, including in
https://github.com/llvm/llvm-project/pull/94760
This patch moves the check if any vector instructions will be generated
from getInstructionCost to be based on VPlan. This simplifies
getInstructionCost, is more accurate as we check the final result and
also allows us to exit early once we visit a recipe that generates
vector instructions.
The helper can then be re-used by the VPlan-based cost model to match
the legacy selectVectorizationFactor behavior, this fixing a crash and
paving the way to recommit
https://github.com/llvm/llvm-project/pull/92555.
PR: https://github.com/llvm/llvm-project/pull/96622
This reverts commit 242cc200ccb24e22eaf54aed7b0b0c84cfc54c0b and
eea150c84053035163f307b46549a2997a343ce9, as it is causing a build bot
failure and there have been a number of crashes reported at
https://github.com/llvm/llvm-project/pull/92555
This reverts commit 6f538f6a2d3224efda985e9eb09012fa4275ea92.
Extra tests for crashes discovered when building Chromium have been
added in fb86cb7ec157689e, 3be7312f81ad2.
Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.
It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.
The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.
Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.
PR: https://github.com/llvm/llvm-project/pull/92555
This reverts commit 90fd99c0795711e1cf762a02b29b0a702f86a264.
This reverts commit 43e6f46936e177e47de6627a74b047ba27561b44.
Causes crashes, see comments on https://github.com/llvm/llvm-project/pull/92555.
This reverts commit 46080abe9b136821eda2a1a27d8a13ceac349f8c.
Extra tests have been added in 52d29eb287.
Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.
It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.
The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.
Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.
PR: https://github.com/llvm/llvm-project/pull/92555
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.
It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.
The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.
Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.
PR: https://github.com/llvm/llvm-project/pull/92555
This reverts the revert commit c6e01627acf859.
This patch includes a fix for any-of reductions and epilogue
vectorization. Extra test coverage for the issue that caused the revert
has been added in bce3bfced5fe0b019 and an assertion has been added in
c7209cbb8be7a3c65813.
--------------------------------
Original commit message:
Update AnyOf reduction code generation to only keep track of the AnyOf
property in a boolean vector in the loop, only selecting either the new
or start value in the middle block.
The patch incorporates feedback from https://reviews.llvm.org/D153697.
This fixes the #62565, as now there aren't multiple uses of the
start/new values.
Fixes https://github.com/llvm/llvm-project/issues/62565
PR: https://github.com/llvm/llvm-project/pull/78304
This reverts the revert commit 589c7abb03448.
This patch includes a fix for any-of reductions and epilogue
vectorization. Extra test coverage for the issue that caused the revert
has been added in 399ff08e29d.
--------------------------------
Original commit message:
Update AnyOf reduction code generation to only keep track of the AnyOf
property in a boolean vector in the loop, only selecting either the new
or start value in the middle block.
The patch incorporates feedback from https://reviews.llvm.org/D153697.
This fixes the #62565, as now there aren't multiple uses of the
start/new values.
Fixes https://github.com/llvm/llvm-project/issues/62565
PR: https://github.com/llvm/llvm-project/pull/78304
Recommit with a fix for the use-after-free causing the revert.
This reverts the revert commit f872043e055f4163c3c4b1b86ca0354490174987.
Original commit message:
Dropping disjoint from an OR may yield incorrect results, as some
analysis may have converted it to an Add implicitly (e.g. SCEV used for
dependence analysis). Instead, replace it with an equivalent Add.
This is possible as all users of the disjoint OR only access lanes where
the operands are disjoint or poison otherwise.
Note that replacing all disjoint ORs with ADDs instead of dropping the
flags is not strictly necessary. It is only needed for disjoint ORs that
SCEV treated as ADDs, but those are not tracked.
There are other places that may drop poison-generating flags; those
likely need similar treatment.
Fixes https://github.com/llvm/llvm-project/issues/81872
PR: https://github.com/llvm/llvm-project/pull/83821
This reverts commit c2c1e6ee4ce0df3d000ba880fa6cf58441da6462. It creates
a use after free.
==8342==ERROR: AddressSanitizer: heap-use-after-free on address 0x50f000001760 at pc 0x55b9fb84a8fb bp 0x7ffc18468a10 sp 0x7ffc18468a08
READ of size 1 at 0x50f000001760 thread T0
#0 0x55b9fb84a8fa in dropPoisonGeneratingFlags llvm/lib/Transforms/Vectorize/VPlan.h:1040:13
#1 0x55b9fb84a8fa in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock*)>)::$_0::operator()(llvm::VPRecipeBase*) const llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:1236:23
#2 0x55b9fb84a196 in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock*)>) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
Can be reproduced with asan on
Transforms/LoopVectorize/AArch64/sve-interleaved-masked-accesses.ll
Transforms/LoopVectorize/X86/pr81872.ll
Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll
Dropping disjoint from an OR may yield incorrect results, as some
analysis may have converted it to an Add implicitly (e.g. SCEV used for
dependence analysis). Instead, replace it with an equivalent Add.
This is possible as all users of the disjoint OR only access lanes where
the operands are disjoint or poison otherwise.
Note that replacing all disjoint ORs with ADDs instead of dropping the
flags is not strictly necessary. It is only needed for disjoint ORs that
SCEV treated as ADDs, but those are not tracked.
There are other places that may drop poison-generating flags; those
likely need similar treatment.
Fixes https://github.com/llvm/llvm-project/issues/81872
PR: https://github.com/llvm/llvm-project/pull/83821
This patch introduces a new ComputeReductionResult opcode to compute the
final reduction result in the middle block. The code from fixReduction
has been moved to ComputeReductionResult, after some earlier cleanup
changes to model parts of fixReduction explicitly elsewhere as needed.
The recipe may be broken down further in the future.
Note that the phi nodes to merge the reduction result from the trip
count check and the middle block, to be used as resume value for the
scalar remainder loop are also generated based on
ComputeReductionResult.
Once we have a VPValue for the reduction result, this can also be
modeled explicitly and moved out of the recipe.
Reductions with intermediate stores currently need to be fixed in order
of their intermediate stores. Instead of doing this at fixup time after
code has been generated, sort the reductions in adjustRecipesForReductions.
This makes the order explicit in VPlan and will enable removing
fixReductions with modeling computing the final reduction result in
VPlan, followed by also modeling the intermediate stores explicitly.
This patch updates the mask creation code to always create compares of
the form (ICMP_ULE, wide canonical IV, backedge-taken-count) up front
when tail folding and introduce active-lane-mask as later
transformation.
This effectively makes (ICMP_ULE, wide canonical IV, backedge-taken-count)
the canonical form for tail-folding early on. Introducing more specific
active-lane-mask recipes is treated as a VPlan-to-VPlan optimization.
This has the advantage of keeping the logic (and complexity) of
introducing active-lane-mask recipes in a single place, instead of
spreading the logic out across multiple functions. It also simplifies
initial VPlan construction and enables treating introducing EVL as
similar optimization.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D158779
Extend VPRecipeWithIRFlags to also manage predicates for compares. This
allows removing the custom ICmpULE opcode from VPInstruction which was a
workaround for missing proper predicate handling.
This simplifies the code a bit while also allowing compares with any
predicates. It also fixes a case where the compare predixcate wasn't
printed properly for VPReplicateRecipes.
Discussed/split off from D150398.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D158992
Replace ConditionalAssume set by treating conditional assumes like other
predicated instructions (i.e. create a VPReplicateRecipe with a mask)
and later remove any assume recipes with masks during VPlan cleanup.
This reduces coupling of VPlan construction and Legal by removing a
shared set between the 2 and results in a cleaner code structure
overall.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D157034
If a candidate VF for epilogue vectorization is greater than the number of
remaining iterations, the epilogue loop would be dead. Skip such factors.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D154264
If a candidate VF for epilogue vectorization is less than the number of
remaining iterations, the epilogue loop would be dead. Skip such factors.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D154264