Update target-specific test to not force VF/UF, but instead use the
cost-model. There are similar tests arleady outside X86 and those force
VF & UF.
With this change, the target specific test checks the cost model.
Changes in picked VF/UF are limited to test_pr62954_scalar_epilogue_required,
and should preserve the original spirit of the test.
A half with only zvfhmin or bfloat will end up getting promoted to a f32
for most instructions.
Unless the loop consists only of memory ops and permutation instructions
which don't need promoted (is this common?), we'll end up using double
the LMUL than what's currently being returned by getRegUsageForType.
Since this is used by the loop vectorizer, it seems better to be
conservative and assume that any usage of a zvfhmin half/bfloat will end
up being widened to a f32
Add a new VPIRInstruction recipe to wrap existing IR instructions not to
be modified during execution, execept for PHIs. For PHIs, a single
VPValue
operand is allowed, and it is used to add a new incoming value for the
single predecessor VPBB. Expect PHIs, VPIRInstructions cannot have any
operands.
Depends on https://github.com/llvm/llvm-project/pull/100658.
PR: https://github.com/llvm/llvm-project/pull/100735
RuntimePointerChecking::reset() is used to reset its state between
subsequent analysis invocations. Also reset CanUseDiffCheck to its
default (true). Otherwise it might have been set to false during a
previous analysis invocation, which unnecessarily pessimizes the
subsequent analysis invocations with a pruned set of dependences.
This is in line with the other fields being reset.
At the moment, the full cost of all interleave group members is assigned
to the instruction at the group's insert position, even if the decision
was to not form an interleave group.
This can lead to inaccurate cost estimates, e.g. if the instruction at
the insert position is dead. If the decision is to not vectorize but
scalarize or scather/gather, then the cost will be to total cost for all
members. In those cases, assign individual the cost per member, to more
closely reflect to choice per instruction.
This fixes a divergence between legacy and VPlan-based cost model.
Fixes https://github.com/llvm/llvm-project/issues/108098.
The check for IV increments in collectUsersInEntryBlock currently
triggers for exit-block PHIs which use the IV start value, resulting in
us failing to add the input value for the middle block to these PHIs.
Fix this by amending the check for IV increments to only include
incoming values that are instructions inside the loop.
Fixes#108004
Update planContainsAdditionalSimplifications to also check phis not in
the loop header. This ensures we don't miss cases where VPBlendRecipes
(which correspond to such phis) have been simplified.
Fixes https://github.com/llvm/llvm-project/issues/107473.
Similar to VFxUF, also add a VF VPValue to VPlan and use it to get the
runtime VF in VPWidenIntOrFpInductionRecipe. Code for VF is only
generated if there are users of VF, to avoid unnecessary test changes.
PR: https://github.com/llvm/llvm-project/pull/95305
Update some tests with loop-invariant instructions, where hoisting them
out of the loop changes the vectorization decision. This should preserve
their original spirit when making further improvements.
The patch adds `VPWidenEVLRecipe` which represents `VPWidenRecipe` + EVL
argument. The new recipe replaces `VPWidenRecipe` in
`tryAddExplicitVectorLength` for each binary and unary operations.
Follow up patches will extend support for remaining cases, like `FCmp`
and `ICmp`
The code makes assumptions later on the operations and their inputs
being scalar in the loops that are processed, so we should make sure
this is the case in the legalizer.
There are some cases where only the first operand is marked for
truncation. In that case, the compare won't be truncated which would
incorrectly trigger the assertion.
It also shows that the check pre 3fe6a064f15c also considered compares
truncated that cannot be truncated.
The current check for truncated compares in getInstructionCost misses
cases where either the first or both operands are constants.
Check directly if the compare is marked for truncation. In that case,
the minimum bitwidth is that of the operands.
The patch also adds asserts to ensure that.
This fixes a divergence between legacy and VPlan-based cost model, where
the legacy cost model incorrectly estimated the cost of compares with
truncated operands.
Fixes https://github.com/llvm/llvm-project/issues/107171.
Successful vectorization message is emitted even
after "Result" is false. "Result" = false indicates failure of one of
the legality check and thus
successful message should not be printed.
Similarly to dd94537b4, setVectorizedCallDecision also did not consider
ForcedScalars. This lead to VPlans not reflecting the decision by the
legacy cost model (cost computation would use scalar cost, VPlan would
have VPWidenCallRecipe).
To fix this, check if the call has been forced to scalar in
setVectorizedCallDecision.
Note that this requires moving setVectorizedCallDecision after
collectLoopUniforms (which sets ForcedScalars). collectLoopUniforms does
not depend on call decisions and can safely be moved.
Fixes https://github.com/llvm/llvm-project/issues/107051.
Analogous to 2c7786e94a1058bd4f96794a1d4f70dcb86e5cc5, cleanup a case
where the vectorizer is emitting a non-canonical identity value given
the available flags. We use largest/smallest value during ISEL, and VP
expansion, but not during vectorization.
Since the fmin/fmax/fminimum/fmaximum intrinsics don't require a start
value, this difference is only visible when masking of inactive lanes is
required.
Primary motivation of this change is simply to remove a difference
between version of code which reason about the identity value of a
reduction so I can kill all but one off.
In review, it was pointed out that this is actually a functional fix as well.
The old code used inf on a noinf reduction instruction - whose
result is poison! That wasn't the intent of the code.
This is a follow up to 924907bc6, and is mostly motivated by consistency
but does include one additional optimization. In general, we prefer 0.0
over -0.0 as the identity value for an fadd. We use that value in
several places, but don't in others. So, let's be consistent and use the
same identity (when nsz allows) everywhere.
This creates a bunch of test churn, but due to 924907bc6, most of that
churn doesn't actually indicate a change in codegen. The exception is
that this change enables the use of 0.0 for nsz, but *not* reasoc, fadd
reductions. Or said differently, it allows the neutral value of an
ordered fadd reduction to be 0.0.
collectInstsToScalarize may decide to scalarize a call. If so, we have
to update the widening decision for the call, otherwise the call won't
be scalarized as expected during VPlan construction.
This issue was uncovered by f82543d509.
This moves the logic to create simplified operands using SCEV to MUL
recipe creation. This is needed to match the behavior of the legacy's cost
model. TODOs are to extend to other opcodes and move to a transform.
Note that this also restricts the number of SCEV simplifications we
apply to more precisely match the cases handled by the legacy cost
model.
Fixes https://github.com/llvm/llvm-project/issues/107015.
Follow-up to 9ccf825, adjust computeCost to also pass IntrinsicInst to
TTI if available, as there are multiple places in TTI which use the
IntrinsicInst.
Fixes https://github.com/llvm/llvm-project/issues/107016.
The op of phi transform wants to prevent moving an operation across a
backedge, as this may lead to an infinite combine loop.
Currently, this is done using isPotentiallyReachable(). The problem with
that is that all blocks inside a loop are reachable from each other.
This means that the op of phi transform is effectively completely
disabled for code inside loops, even when it's not actually operating on
a loop phi (just a phi that happens to be in a loop).
Fix this by explicitly computing the backedges inside the function
instead. Do this via RPOT, which is a bit more efficient than using
FindFunctionBackedges() (which does it without any pre-computed
analyses).
For irreducible cycles, there may be multiple possible choices of
backedge, and this just picks one of them. This is still sufficient to
prevent combine loops.
This also removes the last use of LoopInfo in InstCombine -- I'll drop
the analysis in a followup.
Branches exiting the loop will remain regardless, so don't consider them
in collectValuesToIgnore.
This fixes another divergence between legacy and VPlan-based cost model.
Fixes https://github.com/llvm/llvm-project/issues/106780.
This patch replaces all dominated uses of condition with true/false to
improve context-sensitive optimizations. It eliminates a bunch of
branches in llvm-opt-benchmark.
As a side effect, it may introduce new phi nodes in some corner cases.
See the following case:
```
define i1 @test(i1 %cmp, i1 %cond) {
entry:
br i1 %cond, label %bb1, label %bb2
bb1:
br i1 %cmp, label %if.then, label %if.else
if.then:
br %bb2
if.else:
br %bb2
bb2:
%res = phi i1 [%cmp, %entry], [%cmp, %if.then], [%cmp, %if.else]
ret i1 %res
}
```
It will be simplified into:
```
define i1 @test(i1 %cmp, i1 %cond) {
entry:
br i1 %cond, label %bb1, label %bb2
bb1:
br i1 %cmp, label %if.then, label %if.else
if.then:
br %bb2
if.else:
br %bb2
bb2:
%res = phi i1 [%cmp, %entry], [true, %if.then], [false, %if.else]
ret i1 %res
}
```
I am planning to fix this in late pipeline/CGP since this problem exists
before the patch.
A optimizable cast can also be removed by VPlan simplifications. Remove
the restriction from planContainsAdditionalSimplifications, as this
causes it to miss relevant simplifications, triggering false positives
for the cost decision verification.
Also adds debug output for printing additional cost-precomputations.
Fixes https://github.com/llvm/llvm-project/issues/106641.
This ensures we skip any instructions identified to be ignored by the
legacy cost model as well. Fixes a divergence between legacy and
VPlan-based cost model.
Fixes https://github.com/llvm/llvm-project/issues/106417.
This should cover most amdlibm functions, but still not added every VF combo (e.g. 2f32/16f64 often vectorises to the llvm intrinsic for that vector type)
This cleans up the existing tests and shows the gaps in the test checks (for instance we're often testing VF4 + VF16 but not VF8 even though amdlibm supports it).
Improve operand analysis using SCEV for cost purposes. This fixes a
divergence between legacy and VPlan-based cost-modeling after
533e6bbd0d34.
Fixes https://github.com/llvm/llvm-project/issues/106248.
Live-ins that are used as exit values don't need to be extracted, they
can be passed through directly. This fixes a crash when trying to
extract from a live-in.
Fixes https://github.com/llvm/llvm-project/issues/106257.
This patch is moving out stepvector intrinsic from the experimental
namespace.
This intrinsic exists in LLVM for several years now, and is widely used.
The legacy cost model in some parts checks if any of the operands are
constants via SCEV. Update VPlan construction to replace live-ins that
are constants via SCEV with such constants. This means VPlans (and
codegen) reflects what we computing the cost of and removes another case
where the legacy and VPlan cost model diverged.
Fixes https://github.com/llvm/llvm-project/issues/105722.