Check the VPlan directly to determine if a VPValue is an optimiziable IV
or IV use instead of checking the underlying IR instructions.
Split off from https://github.com/llvm/llvm-project/pull/112147. This
refactoring enables moving IV end value creation from the legacy
fixupIVUsers to a VPlan-based transform.
There is one case we now won't optimize, that is IVs with subtracts and
non-constant steps. But as this is a minor optimization and doesn't
impact correctness, the benefits of performing the check in VPlan should
outweigh the missed case.
Split DerivedIV simplification off from
https://github.com/llvm/llvm-project/pull/112145 and use to remove the
need for extra checks in createScalarIVSteps. Required an extra
simplification run after IV transforms.
Following #90184, this patch emits vp.merge intrinsic, which is used to
set the inactive lanes in a select operation to the RHS instead of
undef. Currently, it is applied to out-loop reduction for EVL
vectorization.
This patch performs transformation to convert
select(header_mask, LHS, RHS)
into
vp.merge(all-true, LHS, RHS, EVL)
And always use the predicated reduction select to set the incoming value
of the reduction phi to support out-loop reduction when using tail
folding with EVL.
TODO: Postpone the adjustment of the predicated reduction select to
VPlanTransform. The current adjustment might be too early, which could
lead to a situation where the predicated reduction select is adjusted,
but the EVL recipes cannot be successfully generated during
VPlanTransform.
This patch implements explicit unrolling by UF as VPlan transform. In
follow up patches this will allow simplifying VPTransform state (no need
to store unrolled parts) as well as recipe execution (no need to
generate code for multiple parts in an each recipe). It also allows for
more general optimziations (e.g. avoid generating code for recipes that
are uniform-across parts).
It also unifies the logic dealing with unrolled parts in a single place,
rather than spreading it out across multiple places (e.g. VPlan post
processing for header-phi recipes previously.)
In the initial implementation, a number of recipes still take the
unrolled part as additional, optional argument, if their execution
depends on the unrolled part.
The computation for start/step values for scalable inductions changed
slightly. Previously the step would be computed as scalar and then
splatted, now vscale gets splatted and multiplied by the step in a
vector mul.
This has been split off https://github.com/llvm/llvm-project/pull/94339
which also includes changes to simplify VPTransfomState and recipes'
::execute.
The current version mostly leaves existing ::execute untouched and
instead sets VPTransfomState::UF to 1.
A follow-up patch will clean up all references to VPTransformState::UF.
Another follow-up patch will simplify VPTransformState to only store a
single vector value per VPValue.
PR: https://github.com/llvm/llvm-project/pull/95842
Introduce explicit ExtractFromEnd recipes to extract the final values
for live-outs instead of implicitly extracting in VPLiveOut::fixPhi.
This is a follow-up to the recent changes of modeling extracts for
recurrences and consolidates live-out extract creation for fixed-order
recurrences at a single place: addLiveOutsForFirstOrderRecurrences.
It is also in preparation of replacing VPLiveOut with VPIRInstructions
wrapping the original scalar phis.
PR: https://github.com/llvm/llvm-project/pull/100658
Ignore incoming values with constant false masks when trying to simplify
VPBlendRecipes.
As a follow-on optimization, we should also be able to drop all incoming
values with false masks by creating a new VPBlendRecipe with those
operands dropped.
PR: https://github.com/llvm/llvm-project/pull/89384
In #88217 a large set of matchers was changed to only accept poison
values in splats, but not undef values. This is because we now use
poison for non-demanded vector elements, and allowing undef can cause
correctness issues.
This patch covers the remaining matchers by changing the AllowUndef
parameter of getSplatValue() to AllowPoison instead. We also carry out
corresponding renames in matchers.
As a followup, we may want to change the default for things like m_APInt
to m_APIntAllowPoison (as this is much less risky when only allowing
poison), but this change doesn't do that.
There is one caveat here: We have a single place
(X86FixupVectorConstants) which does require handling of vector splats
with undefs. This is because this works on backend constant pool
entries, which currently still use undef instead of poison for
non-demanded elements (because SDAG as a whole does not have an explicit
poison representation). As it's just the single use, I've open-coded a
getSplatValueAllowUndef() helper there, to discourage use in any other
places.
Recommit with a fix for the use-after-free causing the revert.
This reverts the revert commit f872043e055f4163c3c4b1b86ca0354490174987.
Original commit message:
Dropping disjoint from an OR may yield incorrect results, as some
analysis may have converted it to an Add implicitly (e.g. SCEV used for
dependence analysis). Instead, replace it with an equivalent Add.
This is possible as all users of the disjoint OR only access lanes where
the operands are disjoint or poison otherwise.
Note that replacing all disjoint ORs with ADDs instead of dropping the
flags is not strictly necessary. It is only needed for disjoint ORs that
SCEV treated as ADDs, but those are not tracked.
There are other places that may drop poison-generating flags; those
likely need similar treatment.
Fixes https://github.com/llvm/llvm-project/issues/81872
PR: https://github.com/llvm/llvm-project/pull/83821
This reverts commit c2c1e6ee4ce0df3d000ba880fa6cf58441da6462. It creates
a use after free.
==8342==ERROR: AddressSanitizer: heap-use-after-free on address 0x50f000001760 at pc 0x55b9fb84a8fb bp 0x7ffc18468a10 sp 0x7ffc18468a08
READ of size 1 at 0x50f000001760 thread T0
#0 0x55b9fb84a8fa in dropPoisonGeneratingFlags llvm/lib/Transforms/Vectorize/VPlan.h:1040:13
#1 0x55b9fb84a8fa in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock*)>)::$_0::operator()(llvm::VPRecipeBase*) const llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:1236:23
#2 0x55b9fb84a196 in llvm::VPlanTransforms::dropPoisonGeneratingRecipes(llvm::VPlan&, llvm::function_ref<bool (llvm::BasicBlock*)>) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
Can be reproduced with asan on
Transforms/LoopVectorize/AArch64/sve-interleaved-masked-accesses.ll
Transforms/LoopVectorize/X86/pr81872.ll
Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll
Dropping disjoint from an OR may yield incorrect results, as some
analysis may have converted it to an Add implicitly (e.g. SCEV used for
dependence analysis). Instead, replace it with an equivalent Add.
This is possible as all users of the disjoint OR only access lanes where
the operands are disjoint or poison otherwise.
Note that replacing all disjoint ORs with ADDs instead of dropping the
flags is not strictly necessary. It is only needed for disjoint ORs that
SCEV treated as ADDs, but those are not tracked.
There are other places that may drop poison-generating flags; those
likely need similar treatment.
Fixes https://github.com/llvm/llvm-project/issues/81872
PR: https://github.com/llvm/llvm-project/pull/83821
Generalize pattern matchers to take recipe types to match as template
arguments and use it to provide matchers for unary and binary recipes
with specific opcodes and a list of recipe types (VPWidenRecipe,
VPReplicateRecipe, VPWidenCastRecipe, VPInstruction)
The new matchers are used to simplify and generalize the code in
simplifyRecipes.