When building SPEC CPU 2017 with RISC-V and EVL tail folding, this
assertion in VPTypeAnalysis would trigger during the transformation to
EVL recipes:
d8a0709b10/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (L135-L142)
It was caused by this recipe:
```
WIDEN ir<%shr> = vp.or ir<%add33>, ir<0>, vp<%6>
```
Having its type inferred as i16, when ir<%add33> and ir<0> had inferred
types of i32 somehow.
The cause of this turned out to be because the VPTypeAnalysis cache was
getting clobbered: In this transform we were erasing recipes but keeping
around the same mapping from VPValue* to Type*. In the meantime, new
recipes would be created which would have the same address as the old
value. They would then incorrectly get the old erased VPValue*'s cached
type:
```
--- before ---
0x600001ec5030: WIDEN ir<%mul21.neg> = vp.mul vp<%11>, ir<0>, vp<%6>
0x600001ec5450: <badref> <- some value that was erased
--- after ---
0x600001ec5030: WIDEN ir<%mul21.neg> = vp.mul vp<%11>, ir<0>, vp<%6>
0x600001ec5450: WIDEN ir<%shr> = vp.or ir<%add33>, ir<0>, vp<%6> <- a new value that happens to have the same address
```
This fixes this by deferring the erasing of recipes till after the
transformation.
The test case might be a bit flakey since it just happens to have the
right conditions to recreate this. I tried to add an assert in
inferScalarType that every VPValue in the cache was valid, but couldn't
find a way of telling if a VPValue had been erased.
---------
Co-authored-by: Florian Hahn <flo@fhahn.com>
This fixes a crash that shows up when building SPEC CPU 2017 with EVL
tail folding on RISC-V.
A VPWidenCastRecipe doesn't always have an underlying value, and in the
case of this crash this happens whenever a widened cast is created via
truncateToMinimalBitwidths.
Fix this by just using the opcode stored in the recipe itself.
I think a similar issue exists with VPWidenIntrinsicRecipe and how it's
widened, but I haven't run into any crashes with it just yet.
Properly set VPWidenIntOrFpInductionRecipe's debug location in the
recipe and use it, instead of using the debug location of the underlying
IR instruction.
As a first step to move towards modeling the full skeleton in VPlan,
start by wrapping IR blocks created during legacy skeleton creation in
VPIRBasicBlocks and hook them into the VPlan. This means the skeleton
CFG is represented in VPlan, just before execute. This allows moving
parts of skeleton creation into recipes in the VPBBs gradually.
Note that this allows retiring some manual DT updates, as this will be
handled automatically during VPlan execution.
PR: https://github.com/llvm/llvm-project/pull/114292
A more lightweight variant of
https://github.com/llvm/llvm-project/pull/109193,
which dispatches to multiple exit blocks via the middle blocks.
The patch also introduces a bit of required scaffolding to enable
early-exit vectorization, including an option. At the moment, early-exit
vectorization doesn't come with legality checks, and is only used if the
option is provided and the loop has metadata forcing vectorization. This
is only intended to be used for testing during bring-up, with @david-arm
enabling auto early-exit vectorization plugging in the changes from
https://github.com/llvm/llvm-project/pull/88385.
PR: https://github.com/llvm/llvm-project/pull/112138
Introduce a general recipe to generate a scalar phi. Lower
VPCanonicalIVPHIRecipe and VPEVLBasedIVRecipe to VPScalarIVPHIrecipe
before plan execution, avoiding the need for duplicated ::execute
implementations. There are other cases that could benefit, including
in-loop reduction phis and pointer induction phis.
Builds on a similar idea as
https://github.com/llvm/llvm-project/pull/82270.
PR: https://github.com/llvm/llvm-project/pull/114305
VPReverseVectorPointer relies on the runtime VF, but in DataWithEVL
tail-folding, EVL (which can be less than VF at runtime) should be used
instead.
This patch updates the logic to check the users of VF and replaces the
second operand if the user is VPReverseVectorPointer.
Prior to this patch, optimizeForVFAndUF may optimize the conditional
branch for a VPBasicblock to have a constant condition, but
unnecessarily drops the DILocation attachment when it does so; this
patch changes it to preserve the DILocation.
Add extra arguments to connectBlocks which allow selecting which
existing predecessor/successor to update. This avoids having to
disconnect blocks first unnecessarily.
Suggested in https://github.com/llvm/llvm-project/pull/114292.
Following #90184, this patch emits vp.merge intrinsic, which is used to
set the inactive lanes in a select operation to the RHS instead of
undef. Currently, it is applied to out-loop reduction for EVL
vectorization.
This patch performs transformation to convert
select(header_mask, LHS, RHS)
into
vp.merge(all-true, LHS, RHS, EVL)
And always use the predicated reduction select to set the incoming value
of the reduction phi to support out-loop reduction when using tail
folding with EVL.
TODO: Postpone the adjustment of the predicated reduction select to
VPlanTransform. The current adjustment might be too early, which could
lead to a situation where the predicated reduction select is adjusted,
but the EVL recipes cannot be successfully generated during
VPlanTransform.
Update VPlan to include the scalar loop header. This allows retiring
VPLiveOut, as the remaining live-outs can now be handled by adding
operands to the wrapped phis in the scalar loop header.
Note that the current version only includes the scalar loop header, no
other loop blocks and also does not wrap it in a region block.
PR: https://github.com/llvm/llvm-project/pull/109975
Infers member MayReadFromMemory, MayWriteToMemory, and
MayHaveSideEffects based on intrinsic attributes.
---------
Co-authored-by: Florian Hahn <flo@fhahn.com>
In some cases, Previous (and its operands) can be hoisted. This allows
supporting additional cases where sinking of all users of to FOR fails,
e.g. due having to sink recipes with side-effects.
This fixes a crash where we fail to create a scalar VPlan for a
first-order recurrence, but can create a vector VPlan, because the trunc
instruction of an IV which generates the previous value of the
recurrence has been optimized to a truncated induction recipe, thus
hoisting it to the beginning.
Fixes https://github.com/llvm/llvm-project/issues/106523.
PR: https://github.com/llvm/llvm-project/pull/108945
Enabled initial support for max safe distance in DataWithEVL mode. If
max safe distance is required, need to emit special code:
CMP = icmp ult AVL, MAX_SAFE_DISTANCE
SAFE_AVL = select CMP, AVL, MAX_SAFE_DISTANCE
EVL = call i32 @llvm.experimental.get.vector.length(i64 SAFE_AVL)
while vectorize the loop in DataWithEVL tail folding mode.
Reviewers: fhahn
Reviewed By: fhahn
Pull Request: https://github.com/llvm/llvm-project/pull/102897
Use getAllocTypeSize to get compute the offset to the start of
interleave groups instead getScalarSizeInBits, which may return 0 for
pointers. This is in line with the analysis building the interleave
groups and fixes a mis-compile reported for
https://github.com/llvm/llvm-project/pull/106431.
Use VPWidenIntrinsicRecipe
(https://github.com/llvm/llvm-project/pull/110486)
to create vp.select intrinsics. This potentially offers an alternative
to duplicating EVL recipes for all existing recipes.
There are some recipes that will need duplicates (at least at the
moment), due to extra code-gen needs (e.g. widening loads and stores).
But in cases the intrinsic can directly be used, creating the widened
intrinsic directly would reduce the need to duplicate some recipes.
PR: https://github.com/llvm/llvm-project/pull/110489
This patch splits off intrinsic hanlding to a new
VPWidenIntrinsicRecipe. VPWidenIntrinsicRecipes only need access to the
intrinsic ID to widen and the scalar result type (in case the intrinsic
is overloaded on the result type). It does not need access to an
underlying IR call instruction or function.
This means VPWidenIntrinsicRecipe can be created easily without access
to underlying IR.
Update VPInterleaveRecipe to always use the pointer to member 0 as
pointer argument. This in many cases helps to remove unneeded index
adjustments and simplifies VPInterleaveRecipe::execute.
In some rare cases, the address of member 0 does not dominate the insert
position of the interleave group. In those cases a PtrAdd VPInstruction
is emitted to compute the address of member 0 based on the address of
the insert position. Alternatively we could hoist the recipe computing
the address of member 0.
Patch explicitly models AVL as sub original TC, EVL_PHI instead of
having it in EXPLICIT-VECTOR-LENGTH VPInstruction. Required for correct
safe dependence distance suport.
Reviewers: fhahn, ayalz
Reviewed By: ayalz
Pull Request: https://github.com/llvm/llvm-project/pull/108869
This patch implements explicit unrolling by UF as VPlan transform. In
follow up patches this will allow simplifying VPTransform state (no need
to store unrolled parts) as well as recipe execution (no need to
generate code for multiple parts in an each recipe). It also allows for
more general optimziations (e.g. avoid generating code for recipes that
are uniform-across parts).
It also unifies the logic dealing with unrolled parts in a single place,
rather than spreading it out across multiple places (e.g. VPlan post
processing for header-phi recipes previously.)
In the initial implementation, a number of recipes still take the
unrolled part as additional, optional argument, if their execution
depends on the unrolled part.
The computation for start/step values for scalable inductions changed
slightly. Previously the step would be computed as scalar and then
splatted, now vscale gets splatted and multiplied by the step in a
vector mul.
This has been split off https://github.com/llvm/llvm-project/pull/94339
which also includes changes to simplify VPTransfomState and recipes'
::execute.
The current version mostly leaves existing ::execute untouched and
instead sets VPTransfomState::UF to 1.
A follow-up patch will clean up all references to VPTransformState::UF.
Another follow-up patch will simplify VPTransformState to only store a
single vector value per VPValue.
PR: https://github.com/llvm/llvm-project/pull/95842
This moves licm after expanding replicate regions. This fixes a crash
when trying to hoist a predicated VPReplicateRecipes which later get
expanded to replicate regions.
Hoisting replicate regions out was not intended (see the discussion and
at the review and comment on shallow traversal in licm()).
Fixes https://github.com/llvm/llvm-project/issues/109510.
Add a new getSCEVExprForVPValue utility which can be used to get a SCEV
expression for a VPValue. The initial implementation only returns SCEVs
for live-in IR values (by constructing a SCEV based on the live-in IR
value) and VPExpandSCEVRecipe. This is enough to serve its first use,
getting a SCEV for a VPlan's trip count, but will be extended in the
future.
It also removes createTripCountSCEV, as the new helper can be used to
retrieve the SCEV from the VPlan.
PR: https://github.com/llvm/llvm-project/pull/94464
We already pass a Type object into the VPTypeAnalysis constructor, which
can be used to obtain the context. While in the same area it also made
sense to avoid passing the context into the VPTransformState and
VPCostContext constructors.
Extend VPBuilder to allow creating VPDerivedIVRecipe, VPScalarCastRecipe
and VPScalarIVStepsRecipe.
Use them to simplify the code to create scalar IV steps slightly.
Whilst trying to write some VPlan unit tests I realised
that we don't need to pass a ScalarEvolution object into
VPlanTransforms::optimize because the only thing we
actually need is a LLVMContext.
Similar to VFxUF, also add a VF VPValue to VPlan and use it to get the
runtime VF in VPWidenIntOrFpInductionRecipe. Code for VF is only
generated if there are users of VF, to avoid unnecessary test changes.
PR: https://github.com/llvm/llvm-project/pull/95305
The patch adds `VPWidenEVLRecipe` which represents `VPWidenRecipe` + EVL
argument. The new recipe replaces `VPWidenRecipe` in
`tryAddExplicitVectorLength` for each binary and unary operations.
Follow up patches will extend support for remaining cases, like `FCmp`
and `ICmp`
This is a step towards further breaking up the rather large
tryToBuildVPlanWithVPRecipes. It moves logic create interleave groups to
VPlanTransforms.cpp, where similar replacements for other recipes are
defined as well (e.g. EVL-based ones)
It's not possible to pick the best mask to remove when optimising
VPBlend at construction and so this patch refactors the code to move the
decision (and thus transformation) to VPlanTransforms.
NOTE: This patch does not change the decision of which mask to pick.
That will be done in a following PR to keep this patch as NFC from an
output point of view.