Update target-specific test to not force VF/UF, but instead use the
cost-model. There are similar tests arleady outside X86 and those force
VF & UF.
With this change, the target specific test checks the cost model.
Changes in picked VF/UF are limited to test_pr62954_scalar_epilogue_required,
and should preserve the original spirit of the test.
This is a follow up to 924907bc6, and is mostly motivated by consistency
but does include one additional optimization. In general, we prefer 0.0
over -0.0 as the identity value for an fadd. We use that value in
several places, but don't in others. So, let's be consistent and use the
same identity (when nsz allows) everywhere.
This creates a bunch of test churn, but due to 924907bc6, most of that
churn doesn't actually indicate a change in codegen. The exception is
that this change enables the use of 0.0 for nsz, but *not* reasoc, fadd
reductions. Or said differently, it allows the neutral value of an
ordered fadd reduction to be 0.0.
This should cover most amdlibm functions, but still not added every VF combo (e.g. 2f32/16f64 often vectorises to the llvm intrinsic for that vector type)
This cleans up the existing tests and shows the gaps in the test checks (for instance we're often testing VF4 + VF16 but not VF8 even though amdlibm supports it).
Improve operand analysis using SCEV for cost purposes. This fixes a
divergence between legacy and VPlan-based cost-modeling after
533e6bbd0d34.
Fixes https://github.com/llvm/llvm-project/issues/106248.
The legacy cost model in some parts checks if any of the operands are
constants via SCEV. Update VPlan construction to replace live-ins that
are constants via SCEV with such constants. This means VPlans (and
codegen) reflects what we computing the cost of and removes another case
where the legacy and VPlan cost model diverged.
Fixes https://github.com/llvm/llvm-project/issues/105722.
The idea behind this canonicalization is that it allows us to handle less
patterns, because we know that some will be canonicalized away. This is
indeed very useful to e.g. know that constants are always on the right.
However, this is only useful if the canonicalization is actually
reliable. This is the case for constants, but not for arguments: Moving
these to the right makes it look like the "more complex" expression is
guaranteed to be on the left, but this is not actually the case in
practice. It fails as soon as you replace the argument with another
instruction.
The end result is that it looks like things correctly work in tests,
while they actually don't. We use the "thwart complexity-based
canonicalization" trick to handle this in tests, but it's often a
challenge for new contributors to get this right, and based on the
regressions this PR originally exposed, we clearly don't get this right
in many cases.
For this reason, I think that it's better to remove this complexity
canonicalization. It will make it much easier to write tests for
commuted cases and make sure that they are handled.
Include chain of ops feeding inductions in cost precomputation for
inductions, not just the induction increment. In VPlan, those
instructions will be cleaned up, as both phi and increment are generated
by VPWidenIntOrFpInductionRecipe independently.
Fixes https://github.com/llvm/llvm-project/issues/101337.
After f0df4fbd0c7b, isPredicatedInst needs to handle SwitchInst as well.
Handle it the same as BranchInst.
This fixes a crash in the newly added test and improves the results for
one of the existing tests in predicate-switch.ll
Should fix https://lab.llvm.org/buildbot/#/builders/113/builds/2099.
Update createEdgeMask to created masks where the terminator in Src is a
switch. We need to handle 2 separate cases:
1. Dst is not the default desintation. Dst is reached if any of the
cases with destination == Dst are taken. Join the conditions for each
case where destination == Dst using a logical OR.
2. Dst is the default destination. Dst is reached if none of the cases
with destination != Dst are taken. Join the conditions for each case
where the destination is != Dst using a logical OR and negate it.
Edge masks are created for every destination of cases and/or
default when requesting a mask where the source is a switch.
Fixes https://github.com/llvm/llvm-project/issues/48188.
PR: https://github.com/llvm/llvm-project/pull/99808
For invariant stores to an address of a reduction, only the latest store
will be generated outside the loop. Consider earlier stores as dead.
This fixes a difference between the legacy and VPlan-based cost model.
Fixes https://github.com/llvm/llvm-project/issues/96294.
This change is part 2 x86 Loop Vectorization of :
https://github.com/llvm/llvm-project/pull/96222
It also has veclib call loop vectorization hence the test cases in
`llvm/test/Transforms/LoopVectorize/X86/veclib-calls.ll`
finally the last pr missed tests for
`llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll` and
`llvm/test/CodeGen/X86/vec-libcalls.ll` so added those aswell.
No evidence was found for arc and hyperbolic trig glibc vector math
functions
https://github.com/lattera/glibc/blob/master/sysdeps/x86/fpu/bits/math-vector.h
so no new `_ZGVbN2v_*` and `_ZGVdN4v_*` .
So no new tests in
`llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-VF2-VF8.ll`
Also no new svml and no new tests to:
`llvm/test/Transforms/LoopVectorize/X86/svml-calls.ll`
There was not enough evidence that there were svml arc and hyperbolic
trig vector implementations, Documentation was scarces so looked at test
cases in
[numpy](32bf2a9842/linux/avx512/svml_z0_acos_d_la.s (L8)).
Someone with more experience with svml should investigate.
## Note
amd libm doesn't have a vector hyperbolic sine api hence why youi might
notice there are no tests for `sinh`.
## History
This change is part of
https://github.com/llvm/llvm-project/issues/87367's investigation on
supporting IEEE math operations as intrinsics.
Which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
This change adds loop vectorization for `acos`, `asin`, `atan`, `cosh`,
`sinh`, and `tanh`.
resolves#70079resolves#70080resolves#70081resolves#70083resolves#70084resolves#95966
Follow up to d216615518 to update dead interleave group pointer detection
to allow re-processing of operands of instructions determined to only feed
interleave groups.
This is needed because instructions feeding interleave group pointers
can become dead in any order, as per the newly added test case.
Process dead interleave pointer ops in reverse order. This also catches
cases where the same base pointer is used by multiple different
interleave groups.
This fixes another case where the legacy cost model inaccuarately
estimates cost, surfaced by b841e2eca3b5c8.
Resume and exit values for inductions are currently still created
outside of VPlan and independent of the induction recipes. Don't add
live-outs for now, as the additional unneeded users can pessimize other
anlysis.
Fixes https://github.com/llvm/llvm-project/issues/98660.
When collecting candidates to pre-compute cost for operands of exit
conditions, skip users outside the loop when checking if they are in
ExistInstrs. The users outside the loop should be ignored, as they won't
make a value live in the VPlan.
This fixes a failure when building for X86 with sanitizers on macOS
after b841e2eca3b5c
(https://green.lab.llvm.org/job/llvm.org/job/clang-stage2-cmake-RgSan/287/)
This patch introduces a new ResumePhi VPInstruction which creates a phi
in a leaf block of a VPlan. The first use is to create the phi node for
fixed-order recurrence resume values in the scalar preheader.
The VPInstruction takes 2 operands: 1) the incoming value from the
middle-block and a default value to be used for all other incoming
blocks.
In follow-up changes, it will also be used to create phis for reduction
and induction resume values.
Depends on https://github.com/llvm/llvm-project/pull/92651
PR: https://github.com/llvm/llvm-project/pull/94760
Port collectEphemeralValues to VPlan as collectEphemeralRecipesForVPlan,
use it in willGenerateVectors. This fixes a regression caused by
29b8b72117 for loops where the only vector values are ephemeral.
Add tests with loops with ephemeral values that are widened.
After 29b8b72117, @ephemeral_load_and_compare_another_load_used_outside
is vectorized even though the only vector values that are generated are
ephemeral.
This patch moves the check if any vector instructions will be generated
from getInstructionCost to be based on VPlan. This simplifies
getInstructionCost, is more accurate as we check the final result and
also allows us to exit early once we visit a recipe that generates
vector instructions.
The helper can then be re-used by the VPlan-based cost model to match
the legacy selectVectorizationFactor behavior, this fixing a crash and
paving the way to recommit
https://github.com/llvm/llvm-project/pull/92555.
PR: https://github.com/llvm/llvm-project/pull/96622
This patch moves branch condition creation to enter the scalar epilogue
loop to VPlan. Modeling the branch in the middle block also requires
modeling the successor blocks. This is done using the recently
introduced VPIRBasicBlock.
Note that the middle.block is still created as part of the skeleton and
then patched in during VPlan execution. Unfortunately the skeleton needs
to create the middle.block early on, as it is also used for induction
resume value creation and is also needed to properly update the
dominator tree during skeleton creation.
After this patch lands, I plan to move induction resume value and phi
node creation in the scalar preheader to VPlan. Once that is done, we
should be able to create the middle.block in VPlan directly.
This is a re-worked version based on the earlier
https://reviews.llvm.org/D150398 and the main change is the use of
VPIRBasicBlock.
Depends on https://github.com/llvm/llvm-project/pull/92525
PR: https://github.com/llvm/llvm-project/pull/92651
Previously we only handled the `L0 == R0` case if both `L1` and `R1`
where constant.
We can get more out of the analysis using general constant ranges
instead.
For example, `X u> Y` implies `X != 0`.
In general, any strict comparison on `X` implies that `X` is not equal
to the boundary value for the sign and constant ranges with/without
sign bits can be useful in deducing implications.
Closes#85557
This is a small canonicalization for `gep i32, p, (mul x, C)` -> `gep
i8, p, (mul x, C*4)`, so that the mul can combine both of the constant
multiplications, and we take a small step towards canonicalizing more
geps to i8.
It currently doesn't attempt to check for multiple uses on the mul, but
that should be possible if it sounds better. Let me know what you think
of the idea in general.
Use VPIRBasicBlock to wrap the middle block and implement patching up
branches in predecessors in VPIRBasicBlock::execute. The IR middle block
is only created after skeleton creation. Initially a regular
VPBasicBlock is created, which will later be replaced by a
VPIRBasicBlock once the middle IR basic block has been created.
Note that this slightly changes the order of instructions created in the
middle block; code generated by recipe execution in the middle block
will now be inserted before the terminator (and in between the compare
to used by the terminator). The original order will be restored in
https://github.com/llvm/llvm-project/pull/92651.
PR: https://github.com/llvm/llvm-project/pull/95816
For interleave groups, we only generate a pointer for the start of the
interleave group (the instruction at the insert position). The other
addresses for other members are alreayd considered free, but so are
their operands, if they are only used in address computations for
other interleave group members.
Now that FOR exit and resume value creation is explicitly modeled in
VPlan (05e1b5340b0caf1, 07b330132c0b) it doesn't depend on the first
order recurrence splice being preserved and it can now be marked as not
having side-effects. This allows removal of first-order-recurrence-splce
if the FOR is only used in the exit or as scalar ph resume value.