Update fixupIVUsers to compute the value for escaped inductions using
the already computed end value of the induction (EndValue), but
subtracting the step.
This results in slightly simpler codegen, as we avoid computing the full
transformed index at VectorTripCount - 1.
PR: https://github.com/llvm/llvm-project/pull/110576
After 7f74651, the pointer operand may be replicated of a PtrAdd. Instead
of requesting a single scalar, request lane 0, which correctly handles the
case when there is a scalar-per-lane.
Fixes https://github.com/llvm/llvm-project/issues/111606.
Implement VPBlendRecipe::computeCost. VPBlendRecipe is currently is also
used if only the first lane is used.
This also requires pre-computing costs for forced scalars and
instructions considered profitable to scalarize. For those, the cost
will be computed separately in the legacy cost model. This will also be
needed when implementing VPReplicateRecipe::computeCost.
Update VPInterleaveRecipe to always use the pointer to member 0 as
pointer argument. This in many cases helps to remove unneeded index
adjustments and simplifies VPInterleaveRecipe::execute.
In some rare cases, the address of member 0 does not dominate the insert
position of the interleave group. In those cases a PtrAdd VPInstruction
is emitted to compute the address of member 0 based on the address of
the insert position. Alternatively we could hoist the recipe computing
the address of member 0.
The legacy cost model always computes the cost for uniforms as cost of
VF = 1, but VPWidenCallRecipes would be created, as
setVectorizedCallDecisions would not consider uniform calls.
Fix setVectorizedCallDecision to set to Scalarize, if the call is
uniform-after-vectorization.
This fixes a bug in VPlan construction uncovered by the VPlan-based
cost model.
Fixes https://github.com/llvm/llvm-project/issues/111040.
This fixes another case where the VPlan-based and legacy cost models
disagree. If any of the operands is predicated, it can't be trivially
hoisted and we should consider the cost for evaluating it each loop
iteration.
Fixes https://github.com/llvm/llvm-project/issues/108697.
Predicated instructions cannot hoisted trivially, so don't treat them as
uniform value in the cost model.
This fixes a difference between legacy and VPlan-based cost model.
Fixes https://github.com/llvm/llvm-project/issues/110295.
Unify logic for mayWriteToMemory and mayHaveSideEffects for
VPInstruction, with the later relying on the former. Also extend to
handle binary operators.
Split off from https://github.com/llvm/llvm-project/pull/106441
This patch implements explicit unrolling by UF as VPlan transform. In
follow up patches this will allow simplifying VPTransform state (no need
to store unrolled parts) as well as recipe execution (no need to
generate code for multiple parts in an each recipe). It also allows for
more general optimziations (e.g. avoid generating code for recipes that
are uniform-across parts).
It also unifies the logic dealing with unrolled parts in a single place,
rather than spreading it out across multiple places (e.g. VPlan post
processing for header-phi recipes previously.)
In the initial implementation, a number of recipes still take the
unrolled part as additional, optional argument, if their execution
depends on the unrolled part.
The computation for start/step values for scalable inductions changed
slightly. Previously the step would be computed as scalar and then
splatted, now vscale gets splatted and multiplied by the step in a
vector mul.
This has been split off https://github.com/llvm/llvm-project/pull/94339
which also includes changes to simplify VPTransfomState and recipes'
::execute.
The current version mostly leaves existing ::execute untouched and
instead sets VPTransfomState::UF to 1.
A follow-up patch will clean up all references to VPTransformState::UF.
Another follow-up patch will simplify VPTransformState to only store a
single vector value per VPValue.
PR: https://github.com/llvm/llvm-project/pull/95842
This patch is split off from PR #88385 and concerns only the code
related to the legality of vectorising early exit loops. It is the first
step in adding support for vectorisation of a simple class of loops that
typically involves searching for something, i.e.
for (int i = 0; i < n; i++) {
if (p[i] == val)
return i;
}
return n;
or
for (int i = 0; i < n; i++) {
if (p1[i] != p2[i])
return i;
}
return n;
In this initial commit LoopVectorizationLegality will only consider
early exit loops legal for vectorising if they follow these criteria:
1. There are no stores in the loop.
2. The loop must have only one early exit like those shown in the above
example. I have referred to such exits as speculative early exits, to
distinguish from existing support for early exits where the
exit-not-taken count is known exactly at compile time.
3. The early exit block dominates the latch block.
4. The latch block must have an exact exit count.
5. There are no loads after the early exit block.
6. The loop must not contain reductions or recurrences. I don't see
anything fundamental blocking vectorisation of such loops, but I just
haven't done the work to support them yet.
7. We must be able to prove at compile-time that loops will not contain
faulting loads.
Tests have been added here:
Transforms/LoopVectorize/AArch64/simple_early_exit.ll
Update target-specific test to not force VF/UF, but instead use the
cost-model. There are similar tests arleady outside X86 and those force
VF & UF.
With this change, the target specific test checks the cost model.
Changes in picked VF/UF are limited to test_pr62954_scalar_epilogue_required,
and should preserve the original spirit of the test.
This is a follow up to 924907bc6, and is mostly motivated by consistency
but does include one additional optimization. In general, we prefer 0.0
over -0.0 as the identity value for an fadd. We use that value in
several places, but don't in others. So, let's be consistent and use the
same identity (when nsz allows) everywhere.
This creates a bunch of test churn, but due to 924907bc6, most of that
churn doesn't actually indicate a change in codegen. The exception is
that this change enables the use of 0.0 for nsz, but *not* reasoc, fadd
reductions. Or said differently, it allows the neutral value of an
ordered fadd reduction to be 0.0.
This should cover most amdlibm functions, but still not added every VF combo (e.g. 2f32/16f64 often vectorises to the llvm intrinsic for that vector type)
This cleans up the existing tests and shows the gaps in the test checks (for instance we're often testing VF4 + VF16 but not VF8 even though amdlibm supports it).
Improve operand analysis using SCEV for cost purposes. This fixes a
divergence between legacy and VPlan-based cost-modeling after
533e6bbd0d34.
Fixes https://github.com/llvm/llvm-project/issues/106248.
The legacy cost model in some parts checks if any of the operands are
constants via SCEV. Update VPlan construction to replace live-ins that
are constants via SCEV with such constants. This means VPlans (and
codegen) reflects what we computing the cost of and removes another case
where the legacy and VPlan cost model diverged.
Fixes https://github.com/llvm/llvm-project/issues/105722.
The idea behind this canonicalization is that it allows us to handle less
patterns, because we know that some will be canonicalized away. This is
indeed very useful to e.g. know that constants are always on the right.
However, this is only useful if the canonicalization is actually
reliable. This is the case for constants, but not for arguments: Moving
these to the right makes it look like the "more complex" expression is
guaranteed to be on the left, but this is not actually the case in
practice. It fails as soon as you replace the argument with another
instruction.
The end result is that it looks like things correctly work in tests,
while they actually don't. We use the "thwart complexity-based
canonicalization" trick to handle this in tests, but it's often a
challenge for new contributors to get this right, and based on the
regressions this PR originally exposed, we clearly don't get this right
in many cases.
For this reason, I think that it's better to remove this complexity
canonicalization. It will make it much easier to write tests for
commuted cases and make sure that they are handled.
Include chain of ops feeding inductions in cost precomputation for
inductions, not just the induction increment. In VPlan, those
instructions will be cleaned up, as both phi and increment are generated
by VPWidenIntOrFpInductionRecipe independently.
Fixes https://github.com/llvm/llvm-project/issues/101337.
After f0df4fbd0c7b, isPredicatedInst needs to handle SwitchInst as well.
Handle it the same as BranchInst.
This fixes a crash in the newly added test and improves the results for
one of the existing tests in predicate-switch.ll
Should fix https://lab.llvm.org/buildbot/#/builders/113/builds/2099.
Update createEdgeMask to created masks where the terminator in Src is a
switch. We need to handle 2 separate cases:
1. Dst is not the default desintation. Dst is reached if any of the
cases with destination == Dst are taken. Join the conditions for each
case where destination == Dst using a logical OR.
2. Dst is the default destination. Dst is reached if none of the cases
with destination != Dst are taken. Join the conditions for each case
where the destination is != Dst using a logical OR and negate it.
Edge masks are created for every destination of cases and/or
default when requesting a mask where the source is a switch.
Fixes https://github.com/llvm/llvm-project/issues/48188.
PR: https://github.com/llvm/llvm-project/pull/99808
For invariant stores to an address of a reduction, only the latest store
will be generated outside the loop. Consider earlier stores as dead.
This fixes a difference between the legacy and VPlan-based cost model.
Fixes https://github.com/llvm/llvm-project/issues/96294.
This change is part 2 x86 Loop Vectorization of :
https://github.com/llvm/llvm-project/pull/96222
It also has veclib call loop vectorization hence the test cases in
`llvm/test/Transforms/LoopVectorize/X86/veclib-calls.ll`
finally the last pr missed tests for
`llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll` and
`llvm/test/CodeGen/X86/vec-libcalls.ll` so added those aswell.
No evidence was found for arc and hyperbolic trig glibc vector math
functions
https://github.com/lattera/glibc/blob/master/sysdeps/x86/fpu/bits/math-vector.h
so no new `_ZGVbN2v_*` and `_ZGVdN4v_*` .
So no new tests in
`llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-VF2-VF8.ll`
Also no new svml and no new tests to:
`llvm/test/Transforms/LoopVectorize/X86/svml-calls.ll`
There was not enough evidence that there were svml arc and hyperbolic
trig vector implementations, Documentation was scarces so looked at test
cases in
[numpy](32bf2a9842/linux/avx512/svml_z0_acos_d_la.s (L8)).
Someone with more experience with svml should investigate.
## Note
amd libm doesn't have a vector hyperbolic sine api hence why youi might
notice there are no tests for `sinh`.
## History
This change is part of
https://github.com/llvm/llvm-project/issues/87367's investigation on
supporting IEEE math operations as intrinsics.
Which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
This change adds loop vectorization for `acos`, `asin`, `atan`, `cosh`,
`sinh`, and `tanh`.
resolves#70079resolves#70080resolves#70081resolves#70083resolves#70084resolves#95966
Follow up to d216615518 to update dead interleave group pointer detection
to allow re-processing of operands of instructions determined to only feed
interleave groups.
This is needed because instructions feeding interleave group pointers
can become dead in any order, as per the newly added test case.
Process dead interleave pointer ops in reverse order. This also catches
cases where the same base pointer is used by multiple different
interleave groups.
This fixes another case where the legacy cost model inaccuarately
estimates cost, surfaced by b841e2eca3b5c8.
Resume and exit values for inductions are currently still created
outside of VPlan and independent of the induction recipes. Don't add
live-outs for now, as the additional unneeded users can pessimize other
anlysis.
Fixes https://github.com/llvm/llvm-project/issues/98660.
When collecting candidates to pre-compute cost for operands of exit
conditions, skip users outside the loop when checking if they are in
ExistInstrs. The users outside the loop should be ignored, as they won't
make a value live in the VPlan.
This fixes a failure when building for X86 with sanitizers on macOS
after b841e2eca3b5c
(https://green.lab.llvm.org/job/llvm.org/job/clang-stage2-cmake-RgSan/287/)