This reverts commit 8dd160f4767f971572eac065c8650d9202ff5bf9.
The recommit contains an adjustment to planContainsAdditionalSimplifications,
which considers changes to the original predicate for compares.
Original commit message:
Add simplification to fold negation into a compare, if the negation is
the only user of the compare. This removes a number of redundant
negations.
Alive2 Proofs for FPCMP test changes: https://alive2.llvm.org/ce/z/WGDz9U
PR: https://github.com/llvm/llvm-project/pull/129430
Remove legacy ILV sinkScalarOperands, which is superseded by the
sinkScalarOperands VPlan transforms.
There are a few cases that aren't handled by VPlan's sinkScalarOperands,
because the recipes doesn't support replicating. Those are pointer
inductions and blends.
We could probably improve this further, by allowing replication for more
recipes, but I don't think the extra complexity is warranted.
Depends on https://github.com/llvm/llvm-project/pull/136021.
PR: https://github.com/llvm/llvm-project/pull/136023
Extend sinking logic to duplicate scalar steps recipe if it enables
sinking, that is if all users in a destination block require all lanes.
This should be the last step before removing legacy sinkScalarOperands.
PR: https://github.com/llvm/llvm-project/pull/136021
Any VPlan we generate that contains a replicator region will result in
replicated blocks in the output, causing a large code size increase.
Reject such VPlans when optimizing for size, as the code size impact is
usually worse than having a scalar epilogue, which we already forbid
with optsize.
This change requires a lot of test changes. For tests of optsize
specifically I've updated the test with the new output, otherwise the
tests have been adjusted to not rely on optsize.
Fixes#66652
Fixed-order recurrence phis cannot be forced to be scalar, they will
always be widened at the moment.
Make sure we don't add them to ForcedScalars, otherwise the legacy cost
model will compute incorrect costs.
This fixes an assertion reported with
https://github.com/llvm/llvm-project/pull/129645.
Support auto-vectorize for fminimum_num and fmaximum_num.
For ARM64 with SVE, scalable vector cannot support yet.
---------
Co-authored-by: Your Name <you@example.com>
Considering that "or disjoint" is the canonical for certain add
operations, then I think we want to support such "add like" operations
when doing ADD+GEP->GEP+GEP rewrites to make things more consistent.
Problem was found when improving ValueTracking, which turned an ADD into
OR, and then suddenly optimizations got worse due to these rewrites no
longer triggering.
Add additional OR simplification to fix a divergence between legacy and
VPlan-based cost model.
This adds a new m_AllOnes matcher by generalizing specific_intval to
int_pred_ty, which takes a predicate to check to support matching both
specific APInts and other APInt predices, like isAllOnes.
Fixes https://github.com/llvm/llvm-project/issues/131359.
This PR accounts for scaled reductions in `calculateRegisterUsage` to
reflect the fact that the number of lanes in their output is smaller
than the VF.
Depends on https://github.com/llvm/llvm-project/pull/126437
Add a version of calculateRegisterUsage that works estimates register
usage for a VPlan. This mostly just ports the existing code, with some
updates to figure out what recipes will generate vectors vs scalars.
There are number of changes in the computed register usages, but they
should be more accurate w.r.t. to the generated vector code.
There are the following changes:
* Scalar usage increases in most cases by 1, as we always create a
scalar canonical IV, which is alive across the loop and is not
considered by the legacy implementation
* Output is ordered by insertion, now scalar registers are added first
due the canonical IV phi.
* Using the VPlan, we now also more precisely know if an induction will
be vectorized or scalarized.
Depends on https://github.com/llvm/llvm-project/pull/126415
PR: https://github.com/llvm/llvm-project/pull/126437
Add an initial CFG simplification transform, which removes the dead
edges for blocks terminated with BranchOnCond true.
At the moment, this removes the edge between middle block and scalar
preheader when folding the tail.
PR: https://github.com/llvm/llvm-project/pull/106748
As noted in 1a9358c090d0507be21c5e9b2d97a23ef1de8ab0, some
simplifications can produce a redundant select where the true and false
operands are the same, which this patch removes.
The is_fpclass test was changed so the condition wasn't made dead.
Vectorizing of fminimumnum and fminimumnum have not support yet. Let's
add the testcase for it now, and we will update the testcase when we
support it.
Similarly to other recipes, update VPScalarIVStepsRecipe to also take
the runtime VF as argument. This removes some unnecessary runtime VF
computations for scalable vectors. It will also allow dropping the
UF == 1 restriction for narrowing interleave groups required in
577631f0a528.
Optimize the IR generated for a VPWidenIntOrFpInductionRecipe to use the
narrowest type necessary, when the trip-count of a loop is known to be
constant and the only use of the recipe is the condition used by the
vector loop's backedge branch.
Follow-up to dfca6c0d3bf9d1a056 to extend isUnrolled handle any unrolled
VPlan, which means there's a single UF, but it will be > 1 if unrolling
took place.
This reverts commit ff3e2ba9eb94217f3ad3525dc18b0c7b684e0abf.
The recommmitted version limits to transform to cases where no
interleaving is taking place, to avoid a mis-compile when interleaving.
Original commit message:
This patch adds a new narrowInterleaveGroups transfrom, which tries
convert a plan with interleave groups with VF elements to a plan that
instead replaces the interleave groups with wide loads and stores
processing VF elements.
This effectively is a very simple form of loop-aware SLP, where we
use interleave groups to identify candidates.
This initial version is quite restricted and hopefully serves as a
starting point for how to best model those kinds of transforms.
Depends on https://github.com/llvm/llvm-project/pull/106431.
Fixes https://github.com/llvm/llvm-project/issues/82936.
PR: https://github.com/llvm/llvm-project/pull/106441
After unrolling, there may be additional simplifications that can be
applied. One example is removing SCALAR-STEPS for the first part where
only the first lane is demanded.
This removes redundant adds of 0 from a large number of tests (~200),
many which I am still working on updating.
In preparation for removing redundant WideIV steps added in
https://github.com/llvm/llvm-project/pull/119284.
PR: https://github.com/llvm/llvm-project/pull/123655
This patch adds a new narrowInterleaveGroups transfrom, which tries
convert a plan with interleave groups with VF elements to a plan that
instead replaces the interleave groups with wide loads and stores
processing VF elements.
This effectively is a very simple form of loop-aware SLP, where we
use interleave groups to identify candidates.
This initial version is quite restricted and hopefully serves as a
starting point for how to best model those kinds of transforms.
Depends on https://github.com/llvm/llvm-project/pull/106431.
Fixes https://github.com/llvm/llvm-project/issues/82936.
PR: https://github.com/llvm/llvm-project/pull/106441
createInductionAdditionalBypassValues is only used for epilogue
vectorization now. Move it out of ILV, which means we do not have to
thread through ExpandedSCEVs and also don't have to track the bypass
values in ILV. Instead, directly create them if needed after executing
the epilogue plan. This moves more the epilogue specific logic out of
the generic executePlan.
Fixes#131359
After #129645, a first-order recurrence will no longer have it's splice
costed if the VPInstruction::FirstOrderRecurrenceSplice has no users and
is dead.
The legacy cost model didn't account for this, so this accounts for it
in planContainsAdditionalSimplifications to avoid the "VPlan cost model
and legacy cost model disagreed" assertion.
This updates VPWidenPointerInductionRecipe::execute to not use the
canonical IV to determine the insert point. Instead, it relies on the
current recipe position. In cases where this is not sufficient, set the
insert point to the first non-phi instruction, to ensure phis are
created together.
Update and generalize materializeBroadcasts to also introduce explicit
broadcasts for VPValues defined in the Plans Entry block.
This fixes a crash when trying to insert the broadcasts generated by
VPTransformState::get after the generating instruction, which isn't
possible after invoke instructions.
Fixes https://github.com/llvm/llvm-project/issues/128838.
Slightly generalize materializeLiveInBroadcasts to also introduce
broadcasts for live-ins used in the vector preheader. This should cover
all live-ins.
If the live-in is used in the vector preheader, insert the broadcast at
the beginning of the block.
emitSCEVChecks checks if SCEVCheckCond matches zero, and returns
nullptr. However, it sets SCEVCheckCond as used before it does this,
which prevents it from being removed during cleanup, resulting in
unreachable blocks being emitted. Fix this.
Add a new VPInstruction::Broadcast opcode and use it to materialize
explicit broadcasts of live-ins. The initial patch only materlizes the
broadcasts if the vector preheader dominates all uses that need it.
Later patches will pick the best valid insert point, thus retiring
implicit hoisting of broadcasts from VPTransformsState::get().
PR: https://github.com/llvm/llvm-project/pull/124644
Currently we only check if the pointers involved in runtime checks do
not wrap if we need to perform dependency checks. If that's not the
case, we generate runtime checks, even if the pointers may wrap (see
test/Analysis/LoopAccessAnalysis/runtime-checks-may-wrap.ll).
If the pointer wraps, then we swap start and end of the runtime check,
leading to incorrect checks.
An Alive2 proof of what the runtime checks are checking conceptually (on
i4 to have it complete in reasonable time) showing the incorrect result
should be https://alive2.llvm.org/ce/z/KsHzn8
Depends on https://github.com/llvm/llvm-project/pull/127410 to avoid
more regressions.
PR: https://github.com/llvm/llvm-project/pull/127543
Create a IR BB directly for the middle.block, instead of creating the IR
BB during skeleton creation and then replacing the middle VPBB with a
VPIRBB.
This moves another part of skeleton creation to VPlan and simplififes
the code slightly by removing code to disconnect the middle block and
vector preheader + the corresponding DT update.
NFC modulo IR block naming and block creation order, which changes the
IR names for the blocks.
Update getOrCreateVPValueForSCEVExpr to only skip expansion of
SCEVUnknown if the underlying value isn't an instruction. Instructions
may be defined in a loop and using them without expansion may break
LCSSA form. SCEVExpander will take care of preserving LCSSA if needed.
We could also try to pass LoopInfo, but there are some users of the
function where it won't be available and main benefit from skipping
expansion is slightly more concise VPlans.
Note that SCEVExpander is now used to expand SCEVUnknown with floats.
Adjust the check in expandCodeFor to only check the types and casts if
the type of the value is different to the requested type. Otherwise we
crash when trying to expand a float and requesting a float type.
Fixes https://github.com/llvm/llvm-project/issues/121518.
PR: https://github.com/llvm/llvm-project/pull/125235
Follow-up as discussed when using VPInstruction::ResumePhi for all resume
values (#112147). This patch explicitly adds incoming values for each
predecessor in VPlan. This simplifies codegen and allows transformations
adjusting the predecessors of blocks with
NFC modulo incoming block order in phis.
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.
Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
make it easier to use old IR files and somewhat reduce the test churn in
this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
attribute. The representation in the LLVM IR dialect should be updated
separately.
There are a lot of tests that do not depend upon the IR output
for validation, relying instead on the debug output. For these
tests we can add the -disable-output command line argument.
The last attempt failed a sanitiser build because we were
creating a reference to a null Predicates pointer in
isDereferenceableAndAlignedInLoop. This was exposed by
the unit test IsDerefReadOnlyLoop in
unittests/Analysis/LoadsTest.cpp. I fixed this by falling
back on getConstantMaxBackedgeTakenCount if Predicates is
null - see line 316 in llvm/lib/Analysis/Loads.cpp. There
are no other changes.
This reverts the revert commit 58326f1d5b5b379590af92dd129b2f3b3e96af46.
The build failure in sanitizer stage2 builds has been fixed with
0d39fe6f5bb3edf0bddec09a8c6417377390aeac.
Original commit message:
Model updating IV users directly in VPlan, replace fixupIVUsers.
Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform
(optimizeInductionExitUsers) replaces extracts of inductions with
their pre-computed values if possible.
This completes the transition towards modeling all live-outs directly in
VPlan.
There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them
tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.
Depends on https://github.com/llvm/llvm-project/pull/110004,
https://github.com/llvm/llvm-project/pull/109975 and
https://github.com/llvm/llvm-project/pull/112145.