This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.
Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
make it easier to use old IR files and somewhat reduce the test churn in
this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
attribute. The representation in the LLVM IR dialect should be updated
separately.
There are a lot of tests that do not depend upon the IR output
for validation, relying instead on the debug output. For these
tests we can add the -disable-output command line argument.
This reverts the revert commit 58326f1d5b5b379590af92dd129b2f3b3e96af46.
The build failure in sanitizer stage2 builds has been fixed with
0d39fe6f5bb3edf0bddec09a8c6417377390aeac.
Original commit message:
Model updating IV users directly in VPlan, replace fixupIVUsers.
Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform
(optimizeInductionExitUsers) replaces extracts of inductions with
their pre-computed values if possible.
This completes the transition towards modeling all live-outs directly in
VPlan.
There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them
tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.
Depends on https://github.com/llvm/llvm-project/pull/110004,
https://github.com/llvm/llvm-project/pull/109975 and
https://github.com/llvm/llvm-project/pull/112145.
Model updating IV users directly in VPlan, replace fixupIVUsers.
Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform
(optimizeInductionExitUsers) replaces extracts of inductions with
their pre-computed values if possible.
This completes the transition towards modeling all live-outs directly in
VPlan.
There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them
tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.
Depends on https://github.com/llvm/llvm-project/pull/110004,
https://github.com/llvm/llvm-project/pull/109975 and
https://github.com/llvm/llvm-project/pull/112145.
This commit relands the changes from "[LV]: Teach LV to recursively
(de)interleave. #89018"
Reason for revert:
- The patch exposed a bug in the IA pass, the bug is now fixed and landed by commit: #122643
This fixes a miscompilation extracted from 525.x264_r, where we were
failing to update the runtime VF of a VPReverseVectorPointerRecipe.
We were removing a use of VF whilst iterating over the users() iterator,
which messed up the iterator in-flight and caused us to miss some
recipes. This fixes it by copying the users into a SmallVector first.
Fixes#122681Fixes#122682
This just copies the same conservative definition from mayWriteToMemory,
and enables more VPInstructions to be hoisted out in LICM.
I think this should give more accurate costs, and I was able to build
llvm-test-suite without the legacy-vplan cost model assertion going off.
Update optimizeForVFAndUF to completely remove the vector loop region
when possible. At the moment, we cannot remove the region if it contains
* widened IVs: the recipe is needed to generate the step vector
* reductions: ComputeReductionResults requires the reduction phi recipe
for codegen.
Both cases can be addressed by more explicit modeling.
The patch also includes a number of updates to allow executing VPlans
without a vector loop region.
Depends on https://github.com/llvm/llvm-project/pull/110004
This fixes a crash when building SPEC CPU 2017 with EVL tail folding
when widening @llvm.log10 intrinsics.
@llvm.log10 and some other intrinsics don't have a corresponding VP
intrinsic, so this fixes the crash by removing the assert and bailing
instead.
Currently available intrinsics are only ld2/st2, which don't support interleaving factor > 2.
This patch teaches the LV to use ld2/st2 recursively to support high
interleaving factors.
When building SPEC CPU 2017 with RISC-V and EVL tail folding, this
assertion in VPTypeAnalysis would trigger during the transformation to
EVL recipes:
d8a0709b10/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (L135-L142)
It was caused by this recipe:
```
WIDEN ir<%shr> = vp.or ir<%add33>, ir<0>, vp<%6>
```
Having its type inferred as i16, when ir<%add33> and ir<0> had inferred
types of i32 somehow.
The cause of this turned out to be because the VPTypeAnalysis cache was
getting clobbered: In this transform we were erasing recipes but keeping
around the same mapping from VPValue* to Type*. In the meantime, new
recipes would be created which would have the same address as the old
value. They would then incorrectly get the old erased VPValue*'s cached
type:
```
--- before ---
0x600001ec5030: WIDEN ir<%mul21.neg> = vp.mul vp<%11>, ir<0>, vp<%6>
0x600001ec5450: <badref> <- some value that was erased
--- after ---
0x600001ec5030: WIDEN ir<%mul21.neg> = vp.mul vp<%11>, ir<0>, vp<%6>
0x600001ec5450: WIDEN ir<%shr> = vp.or ir<%add33>, ir<0>, vp<%6> <- a new value that happens to have the same address
```
This fixes this by deferring the erasing of recipes till after the
transformation.
The test case might be a bit flakey since it just happens to have the
right conditions to recreate this. I tried to add an assert in
inferScalarType that every VPValue in the cache was valid, but couldn't
find a way of telling if a VPValue had been erased.
---------
Co-authored-by: Florian Hahn <flo@fhahn.com>
This fixes a crash that shows up when building SPEC CPU 2017 with EVL
tail folding on RISC-V.
A VPWidenCastRecipe doesn't always have an underlying value, and in the
case of this crash this happens whenever a widened cast is created via
truncateToMinimalBitwidths.
Fix this by just using the opcode stored in the recipe itself.
I think a similar issue exists with VPWidenIntrinsicRecipe and how it's
widened, but I haven't run into any crashes with it just yet.
This was originally done to reduce the diff for the change. Remove it
and update the remaining tests. NFC modulo reordering of incoming
values.
Clean up after https://github.com/llvm/llvm-project/pull/114292.
I'm not sure if getStepVector was used for other things in the past
where StartIdx was non-zero, but nowadays VPWidenIntOrFpInductionRecipe
is the only user of it, and just passes zero to it. I presume
InstCombine was already catching this so hopefully removing this won't
affect codegen.
As a first step to move towards modeling the full skeleton in VPlan,
start by wrapping IR blocks created during legacy skeleton creation in
VPIRBasicBlocks and hook them into the VPlan. This means the skeleton
CFG is represented in VPlan, just before execute. This allows moving
parts of skeleton creation into recipes in the VPBBs gradually.
Note that this allows retiring some manual DT updates, as this will be
handled automatically during VPlan execution.
PR: https://github.com/llvm/llvm-project/pull/114292
This moves printing of the final VPlan to ::execute. This ensures the
final VPlan is printed, including recipes that get introduced by late,
lowering transforms and skeleton construction.
Split off from https://github.com/llvm/llvm-project/pull/114292, to
simplify the diff.
This reverts commit f09b16e2671cbcdf7cb7dc7ed705db092a9deda1.
The crash when building llvm-test-suite with stage2 should have been
fixed by 1091fad31a83d5ab87eb6fa11fe3bdb3f0d152ea.
This reverts commit 0678e2058364ec10b94560d27ec7138dfa003287.
This reverts commit 1091fad31a83d5ab87eb6fa11fe3bdb3f0d152ea.
Causes crashes in llvm-test-suite when using stage 2 clang.
Updated ILV.createInductionResumeValues (now createInductionResumeVPValue)
to directly update the VPIRInstructions wrapping the original phis with the
created resume values.
This is the first step towards modeling them completely in VPlan.
Subsequent patches will move creation of the resume values completely
into VPlan.
Depends on https://github.com/llvm/llvm-project/pull/109975.
PR: https://github.com/llvm/llvm-project/pull/110577
If IVUpdateMayOverflow is false, we proved that the induction increment
cannot overflow in the vector loop. This allows setting NUW in some
cases when folding the tail.
PR: https://github.com/llvm/llvm-project/pull/111758
SLEEF math vector library now supports RISC-V target.
Commit: https://github.com/shibatch/sleef/pull/477
This patch enables the use of auto-vectorization with
subsequent replacement by the corresponding SLEEF function.
VPReverseVectorPointer relies on the runtime VF, but in DataWithEVL
tail-folding, EVL (which can be less than VF at runtime) should be used
instead.
This patch updates the logic to check the users of VF and replaces the
second operand if the user is VPReverseVectorPointer.
Currently it's very difficult to improve the cost model for tail-folded
loops because as soon as you add a VPInstruction::computeCost function
that adds the costs of instructions such as
VPInstruction::ActiveLaneMask
and VPInstruction::ExplicitVectorLength the assert in
LoopVectorizationPlanner::computeBestVF fails for some tests. This is
because the VF chosen by the legacy cost model doesn't match the vplan
cost model. See PR #90191. This assert is currently making it difficult
to improve the cost model.
Hopefully we will be in a position to remove the assert soon, however
in order to do that we have to fix up a whole bunch of tests that rely
upon the legacy cost model output. I've tried my best to update
these tests to use vplan output instead.
There is still work needed for the VF=1 case because the vplan cost
model is not printed out in this case. I've not attempted to fix those
in this patch.
In #111310 an assert was added that for the IV overflow check used with
tail folding, the overflow check is never known.
However when applying the loop guards, it looks like it's possible that
we might actually know the IV won't overflow: this occurs in
500.perlbench_r from SPEC CPU 2017 and triggers the assertion:
Assertion failed: (!isIndvarOverflowCheckKnownFalse(Cost, VF * UF) &&
!SE.isKnownPredicate(CmpInst::getInversePredicate(ICmpInst::ICMP_ULT),
TC2OverflowSCEV, SE.getSCEV(Step)) && "unexpectedly proved overflow
check to be known"), function emitIterationCountCheck, file
LoopVectorize.cpp, line 2501.
There is a discrepancy between `isIndvarOverflowCheckKnownFalse` and the
ICMP_ULT check, because the former uses `getSmallConstantMaxTripCount`
which only takes into trip counts that fit into 32 bits. There doesn't
seem to be an easy way to make the assertion aware of this, so this PR
just removes it for now.
There are two potential follow up things from this PR:
1. We miss calculating the max trip count in `@trip_count_max_1024`, it
looks like we might need to apply loop guards somewhere in
`ScalarEvolution::computeExitLimitFromICmp`
2. In `@overflow_at_0`, if `%tc == 0` then we the overflow check will
always return false, even though it will overflow
Fixes https://github.com/llvm/llvm-project/issues/115755
In #101641, support for out of loop reductions with EVL tail folding was
added by transforming selects to vp_merges in
transformRecipestoEVLRecipes.
Whilst the select was previously free, the vp_merge wasn't and incurs a
cost on RISC-V with the VPlan cost model. But this diverged from the
legacy cost model and caused the "VPlan cost model and legacy cost model
disagreed" assertion to trigger when building 502.gcc_r from SPEC CPU
2017.
Neither the select nor vp_merge recipes from the VPlan exist in the
underlying instructions, so I thought it would make the most sense to
fix this by adding the cost to the underlying phi instruction in
getInstructionCost.
It's worth noting that on RISC-V this vp_merge won't actually generate
any instructions because the mask is all true, and will be folded away.
So we should update the cost model at some point to reflect that.
This PR enables scalable loop vectorization for f16 with zvfhmin and
bf16 with zvfbfmin.
Enabling this was dependent on filling out the gaps for scalable
zvfhmin/zvfbfmin codegen, but everything that the loop vectorizer might
emit should now be handled.
It does this by marking f16 and bf16 as legal in
`isLegalElementTypeForRVV`. There are a few users of
`isLegalElementTypeForRVV` that have already been enabled in other PRs:
- `isLegalStridedLoadStore` #115264
- `isLegalInterleavedAccessType` #115257
- `isLegalMaskedLoadStore` #115145
- `isLegalMaskedGatherScatter` #114945
The remaining user is `isLegalToVectorizeReduction`. We can't promote
f16/bf16 reductions to f32 so we need to disable them for scalable
vectors. The cost model actually marks these as invalid, but for
out-of-tree reductions `ComputeReductionResult` doesn't get costed and
it will end up emitting a reduction intrinsic regardless, so we still
need to mark them as illegal. We might be able to remove this
restriction later for fmax and fmin reductions.
The plan for the VF chosen by the legacy cost model could also contain
additional simplifications that cause cost differences. Also check if it
contains simplifications.
Fixes https://github.com/llvm/llvm-project/issues/114860.
Following #90184, this patch emits vp.merge intrinsic, which is used to
set the inactive lanes in a select operation to the RHS instead of
undef. Currently, it is applied to out-loop reduction for EVL
vectorization.
This patch performs transformation to convert
select(header_mask, LHS, RHS)
into
vp.merge(all-true, LHS, RHS, EVL)
And always use the predicated reduction select to set the incoming value
of the reduction phi to support out-loop reduction when using tail
folding with EVL.
TODO: Postpone the adjustment of the predicated reduction select to
VPlanTransform. The current adjustment might be too early, which could
lead to a situation where the predicated reduction select is adjusted,
but the EVL recipes cannot be successfully generated during
VPlanTransform.