Following #90184, this patch emits vp.merge intrinsic, which is used to
set the inactive lanes in a select operation to the RHS instead of
undef. Currently, it is applied to out-loop reduction for EVL
vectorization.
This patch performs transformation to convert
select(header_mask, LHS, RHS)
into
vp.merge(all-true, LHS, RHS, EVL)
And always use the predicated reduction select to set the incoming value
of the reduction phi to support out-loop reduction when using tail
folding with EVL.
TODO: Postpone the adjustment of the predicated reduction select to
VPlanTransform. The current adjustment might be too early, which could
lead to a situation where the predicated reduction select is adjusted,
but the EVL recipes cannot be successfully generated during
VPlanTransform.
Update VPlan to include the scalar loop header. This allows retiring
VPLiveOut, as the remaining live-outs can now be handled by adding
operands to the wrapped phis in the scalar loop header.
Note that the current version only includes the scalar loop header, no
other loop blocks and also does not wrap it in a region block.
PR: https://github.com/llvm/llvm-project/pull/109975
Infers member MayReadFromMemory, MayWriteToMemory, and
MayHaveSideEffects based on intrinsic attributes.
---------
Co-authored-by: Florian Hahn <flo@fhahn.com>
In some cases, Previous (and its operands) can be hoisted. This allows
supporting additional cases where sinking of all users of to FOR fails,
e.g. due having to sink recipes with side-effects.
This fixes a crash where we fail to create a scalar VPlan for a
first-order recurrence, but can create a vector VPlan, because the trunc
instruction of an IV which generates the previous value of the
recurrence has been optimized to a truncated induction recipe, thus
hoisting it to the beginning.
Fixes https://github.com/llvm/llvm-project/issues/106523.
PR: https://github.com/llvm/llvm-project/pull/108945
Enabled initial support for max safe distance in DataWithEVL mode. If
max safe distance is required, need to emit special code:
CMP = icmp ult AVL, MAX_SAFE_DISTANCE
SAFE_AVL = select CMP, AVL, MAX_SAFE_DISTANCE
EVL = call i32 @llvm.experimental.get.vector.length(i64 SAFE_AVL)
while vectorize the loop in DataWithEVL tail folding mode.
Reviewers: fhahn
Reviewed By: fhahn
Pull Request: https://github.com/llvm/llvm-project/pull/102897
Use getAllocTypeSize to get compute the offset to the start of
interleave groups instead getScalarSizeInBits, which may return 0 for
pointers. This is in line with the analysis building the interleave
groups and fixes a mis-compile reported for
https://github.com/llvm/llvm-project/pull/106431.
Use VPWidenIntrinsicRecipe
(https://github.com/llvm/llvm-project/pull/110486)
to create vp.select intrinsics. This potentially offers an alternative
to duplicating EVL recipes for all existing recipes.
There are some recipes that will need duplicates (at least at the
moment), due to extra code-gen needs (e.g. widening loads and stores).
But in cases the intrinsic can directly be used, creating the widened
intrinsic directly would reduce the need to duplicate some recipes.
PR: https://github.com/llvm/llvm-project/pull/110489
This patch splits off intrinsic hanlding to a new
VPWidenIntrinsicRecipe. VPWidenIntrinsicRecipes only need access to the
intrinsic ID to widen and the scalar result type (in case the intrinsic
is overloaded on the result type). It does not need access to an
underlying IR call instruction or function.
This means VPWidenIntrinsicRecipe can be created easily without access
to underlying IR.
Update VPInterleaveRecipe to always use the pointer to member 0 as
pointer argument. This in many cases helps to remove unneeded index
adjustments and simplifies VPInterleaveRecipe::execute.
In some rare cases, the address of member 0 does not dominate the insert
position of the interleave group. In those cases a PtrAdd VPInstruction
is emitted to compute the address of member 0 based on the address of
the insert position. Alternatively we could hoist the recipe computing
the address of member 0.
Patch explicitly models AVL as sub original TC, EVL_PHI instead of
having it in EXPLICIT-VECTOR-LENGTH VPInstruction. Required for correct
safe dependence distance suport.
Reviewers: fhahn, ayalz
Reviewed By: ayalz
Pull Request: https://github.com/llvm/llvm-project/pull/108869
This patch implements explicit unrolling by UF as VPlan transform. In
follow up patches this will allow simplifying VPTransform state (no need
to store unrolled parts) as well as recipe execution (no need to
generate code for multiple parts in an each recipe). It also allows for
more general optimziations (e.g. avoid generating code for recipes that
are uniform-across parts).
It also unifies the logic dealing with unrolled parts in a single place,
rather than spreading it out across multiple places (e.g. VPlan post
processing for header-phi recipes previously.)
In the initial implementation, a number of recipes still take the
unrolled part as additional, optional argument, if their execution
depends on the unrolled part.
The computation for start/step values for scalable inductions changed
slightly. Previously the step would be computed as scalar and then
splatted, now vscale gets splatted and multiplied by the step in a
vector mul.
This has been split off https://github.com/llvm/llvm-project/pull/94339
which also includes changes to simplify VPTransfomState and recipes'
::execute.
The current version mostly leaves existing ::execute untouched and
instead sets VPTransfomState::UF to 1.
A follow-up patch will clean up all references to VPTransformState::UF.
Another follow-up patch will simplify VPTransformState to only store a
single vector value per VPValue.
PR: https://github.com/llvm/llvm-project/pull/95842
This moves licm after expanding replicate regions. This fixes a crash
when trying to hoist a predicated VPReplicateRecipes which later get
expanded to replicate regions.
Hoisting replicate regions out was not intended (see the discussion and
at the review and comment on shallow traversal in licm()).
Fixes https://github.com/llvm/llvm-project/issues/109510.
Add a new getSCEVExprForVPValue utility which can be used to get a SCEV
expression for a VPValue. The initial implementation only returns SCEVs
for live-in IR values (by constructing a SCEV based on the live-in IR
value) and VPExpandSCEVRecipe. This is enough to serve its first use,
getting a SCEV for a VPlan's trip count, but will be extended in the
future.
It also removes createTripCountSCEV, as the new helper can be used to
retrieve the SCEV from the VPlan.
PR: https://github.com/llvm/llvm-project/pull/94464
We already pass a Type object into the VPTypeAnalysis constructor, which
can be used to obtain the context. While in the same area it also made
sense to avoid passing the context into the VPTransformState and
VPCostContext constructors.
Extend VPBuilder to allow creating VPDerivedIVRecipe, VPScalarCastRecipe
and VPScalarIVStepsRecipe.
Use them to simplify the code to create scalar IV steps slightly.
Whilst trying to write some VPlan unit tests I realised
that we don't need to pass a ScalarEvolution object into
VPlanTransforms::optimize because the only thing we
actually need is a LLVMContext.
Similar to VFxUF, also add a VF VPValue to VPlan and use it to get the
runtime VF in VPWidenIntOrFpInductionRecipe. Code for VF is only
generated if there are users of VF, to avoid unnecessary test changes.
PR: https://github.com/llvm/llvm-project/pull/95305
The patch adds `VPWidenEVLRecipe` which represents `VPWidenRecipe` + EVL
argument. The new recipe replaces `VPWidenRecipe` in
`tryAddExplicitVectorLength` for each binary and unary operations.
Follow up patches will extend support for remaining cases, like `FCmp`
and `ICmp`
This is a step towards further breaking up the rather large
tryToBuildVPlanWithVPRecipes. It moves logic create interleave groups to
VPlanTransforms.cpp, where similar replacements for other recipes are
defined as well (e.g. EVL-based ones)
It's not possible to pick the best mask to remove when optimising
VPBlend at construction and so this patch refactors the code to move the
decision (and thus transformation) to VPlanTransforms.
NOTE: This patch does not change the decision of which mask to pick.
That will be done in a following PR to keep this patch as NFC from an
output point of view.
Introduce explicit ExtractFromEnd recipes to extract the final values
for live-outs instead of implicitly extracting in VPLiveOut::fixPhi.
This is a follow-up to the recent changes of modeling extracts for
recurrences and consolidates live-out extract creation for fixed-order
recurrences at a single place: addLiveOutsForFirstOrderRecurrences.
It is also in preparation of replacing VPLiveOut with VPIRInstructions
wrapping the original scalar phis.
PR: https://github.com/llvm/llvm-project/pull/100658
This patch introduces a new ResumePhi VPInstruction which creates a phi
in a leaf block of a VPlan. The first use is to create the phi node for
fixed-order recurrence resume values in the scalar preheader.
The VPInstruction takes 2 operands: 1) the incoming value from the
middle-block and a default value to be used for all other incoming
blocks.
In follow-up changes, it will also be used to create phis for reduction
and induction resume values.
Depends on https://github.com/llvm/llvm-project/pull/92651
PR: https://github.com/llvm/llvm-project/pull/94760
This reverts commit 6f538f6a2d3224efda985e9eb09012fa4275ea92.
A number of crashes have been fixed by separate fixes, including
ttps://github.com/llvm/llvm-project/pull/96622. This version of the
PR also pre-computes the costs for branches (except the latch) instead
of computing their costs as part of costing of replicate regions, as
there may not be a direct correspondence between original branches and
number of replicate regions.
Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.
It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.
The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.
Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.
PR: https://github.com/llvm/llvm-project/pull/92555
This patch moves branch condition creation to enter the scalar epilogue
loop to VPlan. Modeling the branch in the middle block also requires
modeling the successor blocks. This is done using the recently
introduced VPIRBasicBlock.
Note that the middle.block is still created as part of the skeleton and
then patched in during VPlan execution. Unfortunately the skeleton needs
to create the middle.block early on, as it is also used for induction
resume value creation and is also needed to properly update the
dominator tree during skeleton creation.
After this patch lands, I plan to move induction resume value and phi
node creation in the scalar preheader to VPlan. Once that is done, we
should be able to create the middle.block in VPlan directly.
This is a re-worked version based on the earlier
https://reviews.llvm.org/D150398 and the main change is the use of
VPIRBasicBlock.
Depends on https://github.com/llvm/llvm-project/pull/92525
PR: https://github.com/llvm/llvm-project/pull/92651
This reverts commit 242cc200ccb24e22eaf54aed7b0b0c84cfc54c0b and
eea150c84053035163f307b46549a2997a343ce9, as it is causing a build bot
failure and there have been a number of crashes reported at
https://github.com/llvm/llvm-project/pull/92555
This reverts commit 6f538f6a2d3224efda985e9eb09012fa4275ea92.
Extra tests for crashes discovered when building Chromium have been
added in fb86cb7ec157689e, 3be7312f81ad2.
Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.
It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.
The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.
Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.
PR: https://github.com/llvm/llvm-project/pull/92555
This reverts commit 90fd99c0795711e1cf762a02b29b0a702f86a264.
This reverts commit 43e6f46936e177e47de6627a74b047ba27561b44.
Causes crashes, see comments on https://github.com/llvm/llvm-project/pull/92555.
This reverts commit 46080abe9b136821eda2a1a27d8a13ceac349f8c.
Extra tests have been added in 52d29eb287.
Original message:
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.
It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.
The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.
Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.
PR: https://github.com/llvm/llvm-project/pull/92555
This adds a new interface to compute the cost of recipes, VPBasicBlocks,
VPRegionBlocks and VPlan, initially falling back to the legacy cost model
for all recipes. Follow-up patches will gradually migrate recipes to
compute their own costs step-by-step.
It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF.
The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree. Even though I checked a number of different build
configurations on AArch64 and X86, there may be some differences
that have been missed.
Additional discussions and context can be found in @arcbbb's
https://github.com/llvm/llvm-project/pull/67647 and
https://github.com/llvm/llvm-project/pull/67934 which is an earlier
version of the current PR.
PR: https://github.com/llvm/llvm-project/pull/92555
This patch uses the ExtractFromEnd VPInstruction opcode
to extract the value of a FOR to be used as resume value for the ph in
the scalar loop.
It adds a new live-out that temporarily wraps the FOR phi in the scalar
loop. fixFixedOrderRecurrence will process live outs for fixed order
recurrence phis by creating a new phi node in the scalar preheader,
using the generated value for the live-out as incoming value from the
middle block and the original start value as incoming value for the
other edge. Creation of the phi in the preheader, as well as updating
the phi in the scalar loop will also be moved to VPlan in the future,
eventually retiring fixFixedOrderRecurrence
Depends on https://github.com/llvm/llvm-project/pull/93395
PR: https://github.com/llvm/llvm-project/pull/93396
This patch introduces a new ExtractFromEnd VPInstruction opcode to
extract the value of a FOR for users outside the loop (i.e. in the
scalar loop's exits). This moves the first part of fixing first order
recurrences to VPlan, and removes some additional code to patch up
live-outs, which is now handled automatically.
The majority of test changes is due to changes in the order of which the
extracts are generated now. As we are now using VPTransformState to
generate the extracts, we may be able to re-use existing extracts in the
loop body in some cases. For scalable vectors, in some cases we now have
to compute the runtime VF twice, as each extract is now independent, but
those should be trivial to clean up for later passes (and in line with
other places in the code that also liberally re-compute runtime VFs).
PR: https://github.com/llvm/llvm-project/pull/93395
The transform updates all users of inductions to work based on EVL,
instead
of the VF directly. At the moment, widened inductions cannot be updated,
so
bail out if the plan contains any.
This patch introduces a check before applying EVL transform. If any
recipes in loop rely on RuntimeVF, the plan is discarded.