810 Commits

Author SHA1 Message Date
Florian Hahn
b021464d35
[VPlan] Introduce scalar loop header in plan, remove VPLiveOut. (#109975)
Update VPlan to include the scalar loop header. This allows retiring
VPLiveOut, as the remaining live-outs can now be handled by adding
operands to the wrapped phis in the scalar loop header.

Note that the current version only includes the scalar loop header, no
other loop blocks and also does not wrap it in a region block.

PR: https://github.com/llvm/llvm-project/pull/109975
2024-10-31 21:36:44 +01:00
David Sherwood
7f498a865f
[CostModel][LoopVectorize] Move some loop vectoriser tests (#113702)
Many tests that were in test/Analysis/CostModel were actually
loop vectoriser tests. I've moved them as follows:

Analysis/CostModel/X86 -> Transforms/LoopVectorize/X86/CostModel
Analysis/CostModel/AArch64/arith-fp-frem.ll ->
  Transforms/LoopVectorize/AArch64/arith-fp-frem-costs.ll
2024-10-30 13:50:02 +00:00
Rohit Aggarwal
dfb60bb919
Adding more vector calls for -fveclib=AMDLIBM (#109662)
AMD has it's own implementation of vector calls.
New vector calls are introduced in the library for exp10, log10, sincos and finite asin/acos
Please refer [https://github.com/amd/aocl-libm-ose]

---------

Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal@amd.com>
2024-10-29 10:09:55 +00:00
Florian Hahn
e724226da7
[VPlan] Return cost of 0 for VPWidenCastRecipe without underlying value.
In some cases, VPWidenCastRecipes are created but not considered in the
legacy cost model, including truncates/extends when evaluating a reduction
in a smaller type. Return 0 for such casts for now, to avoid divergences
between VPlan and legacy cost models.

Fixes https://github.com/llvm/llvm-project/issues/113526.
2024-10-25 21:25:44 +02:00
Florian Hahn
2dfb1c664c
[VPlan] Try to hoist Previous (and operands), if sinking fails for FORs. (#108945)
In some cases, Previous (and its operands) can be hoisted. This allows
supporting additional cases where sinking of all users of to FOR fails,
e.g. due having to sink recipes with side-effects.

This fixes a crash where we fail to create a scalar VPlan for a
first-order recurrence, but can create a vector VPlan, because the trunc
instruction of an IV which generates the previous value of the
recurrence has been optimized to a truncated induction recipe, thus
hoisting it to the beginning.

Fixes https://github.com/llvm/llvm-project/issues/106523.

PR: https://github.com/llvm/llvm-project/pull/108945
2024-10-23 13:12:03 -07:00
Florian Hahn
ddbb382a7c
[LV] Regenerate check-lines for some tests. 2024-10-23 04:34:13 +01:00
Florian Hahn
2a6b09e0d3
[LV] Use type from InsertPos for cost computation of interleave groups.
Previously the legacy cost model would pick the type for the cost
computation depending on the order of the members in the input IR.
This is incompatible with the VPlan-based cost model (independent of
original IR order) and also doesn't match code-gen, which uses the type
of the insert position.

Update the legacy cost model to use the type (and address space) from
the Group's insert position.

This brings the legacy cost model in line with the legacy cost model and
fixes a divergence between both models.

Note that the X86 cost model seems to assign different costs to groups
with i64 and double types. Added a TODO to check.

Fixes https://github.com/llvm/llvm-project/issues/112922.
2024-10-18 19:12:40 -07:00
Florian Hahn
b497010854
[VPlan] Use VPInstruction::Name when assigning names (NFCI).
This slightly improves the printing of VPInstructions. NFC except debug
output.
2024-10-18 05:52:35 +01:00
Yingwei Zheng
095d49da76
[InstCombine] Set samesign when converting signed predicates into unsigned (#112642)
Alive2: https://alive2.llvm.org/ce/z/6cqdt-
2024-10-17 20:43:48 +08:00
Florian Hahn
3860e29e0e
[VPlan] Mark VPVectorPointerRecipe as not having sideeffects.
VectorPointer doesn't read from memory or have any sideeffects. Mark it
accordingly.
2024-10-16 06:10:19 +01:00
Florian Hahn
bb937e276d
[LV] Compute value of escaped induction based on the computed end value. (#110576)
Update fixupIVUsers to compute the value for escaped inductions using
the already computed end value of the induction (EndValue), but
subtracting the step.

This results in slightly simpler codegen, as we avoid computing the full
transformed index at VectorTripCount - 1.

PR: https://github.com/llvm/llvm-project/pull/110576
2024-10-10 20:04:46 +01:00
Florian Hahn
01cbbc52dc
[VPlan] Request lane 0 for pointer arg in PtrAdd.
After 7f74651, the pointer operand may be replicated of a PtrAdd. Instead
of requesting a single scalar, request lane 0, which correctly handles the
case when there is a scalar-per-lane.

Fixes https://github.com/llvm/llvm-project/issues/111606.
2024-10-09 13:18:54 +01:00
Florian Hahn
36fc291b6e
[VPlan] Implement VPBlendRecipe::computeCost.
Implement VPBlendRecipe::computeCost. VPBlendRecipe is currently is also
used if only the first lane is used.

This also requires pre-computing costs for forced scalars and
instructions considered profitable to scalarize. For those, the cost
will be computed separately in the legacy cost model. This will also be
needed when implementing VPReplicateRecipe::computeCost.
2024-10-08 21:33:42 +01:00
Florian Hahn
3ec6f805c5
[VPlan] Don't created GEP x, 0 for interleave group pointers.
The GEP with offet 0 is redundant, remove it. This addresses a TODO
from 7f74651837b ((#106431).
2024-10-08 12:08:13 +01:00
Florian Hahn
7f74651837
[VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. (#106431)
Update VPInterleaveRecipe to always use the pointer to member 0 as
pointer argument. This in many cases helps to remove unneeded index
adjustments and simplifies VPInterleaveRecipe::execute.

In some rare cases, the address of member 0 does not dominate the insert
position of the interleave group. In those cases a PtrAdd VPInstruction
is emitted to compute the address of member 0 based on the address of
the insert position. Alternatively we could hoist the recipe computing
the address of member 0.
2024-10-06 22:53:13 +01:00
Florian Hahn
45b526afa2
[LV] Honor uniform-after-vectorization in setVectorizedCallDecision.
The legacy cost model always computes the cost for uniforms as cost of
VF = 1, but VPWidenCallRecipes would be created, as
setVectorizedCallDecisions would not consider uniform calls.

Fix setVectorizedCallDecision to set to Scalarize, if the call is
uniform-after-vectorization.

This fixes a bug in VPlan construction uncovered by the VPlan-based
cost model.

Fixes https://github.com/llvm/llvm-project/issues/111040.
2024-10-06 10:35:06 +01:00
Florian Hahn
68210c7c26
[VPlan] Only generate first lane for VPPredInstPHI if no others used.
IF only the first lane of the result is used, only generate the first
lane.

Fixes https://github.com/llvm/llvm-project/issues/111042.
2024-10-05 19:15:05 +01:00
Florian Hahn
9de327c94d
[LV] Generalize predication checks from 2c8836c899 for operands.
This fixes another case where the VPlan-based and legacy cost models
disagree. If any of the operands is predicated, it can't be trivially
hoisted and we should consider the cost for evaluating it each loop
iteration.

Fixes https://github.com/llvm/llvm-project/issues/108697.
2024-10-02 20:16:41 +01:00
Florian Hahn
6d6eea92e3
[LV] Use SCEV to simplify wide binop operand to constant.
The legacy cost model uses SCEV to determine if the second operand of a
binary op is a constant. Update the VPlan construction logic to mirror
the current legacy behavior, to fix a difference in the cost models.

Fixes https://github.com/llvm/llvm-project/issues/109528.
Fixes https://github.com/llvm/llvm-project/issues/110440.
2024-10-02 13:45:49 +01:00
Florian Hahn
2c8836c899
[LV] Don't consider predicated insts as invariant unconditionally in CM.
Predicated instructions cannot hoisted trivially, so don't treat them as
uniform value in the cost model.

This fixes a difference between legacy and VPlan-based cost model.

Fixes https://github.com/llvm/llvm-project/issues/110295.
2024-09-29 20:31:24 +01:00
Florian Hahn
68ed1728bf
[VPlan] Unify mayWriteToMemory and mayHaveSideEffects logic for VPInst.
Unify logic for mayWriteToMemory and mayHaveSideEffects for
VPInstruction, with the later relying on the former. Also extend to
handle binary operators.

Split off from https://github.com/llvm/llvm-project/pull/106441
2024-09-26 19:16:43 +01:00
Florian Hahn
53266f73f0
[VPlan] Run DCE after unrolling.
This cleans up a number of dead recipes after unrolling if only their
first or last parts are used. This simplifies a number of tests.

Fixes https://github.com/llvm/llvm-project/issues/109581.
2024-09-22 22:08:46 +01:00
Florian Hahn
8ec406757c
[VPlan] Implement unrolling as VPlan-to-VPlan transform. (#95842)
This patch implements explicit unrolling by UF  as VPlan transform. In
follow up patches this will allow simplifying VPTransform state (no need
to store unrolled parts) as well as recipe execution (no need to
generate code for multiple parts in an each recipe). It also allows for
more general optimziations (e.g. avoid generating code for recipes that
are uniform-across parts).

It also unifies the logic dealing with unrolled parts in a single place,
rather than spreading it out across multiple places (e.g. VPlan post
processing for header-phi recipes previously.)

In the initial implementation, a number of recipes still take the
unrolled part as additional, optional argument, if their execution
depends on the unrolled part.

The computation for start/step values for scalable inductions changed
slightly. Previously the step would be computed as scalar and then
splatted, now vscale gets splatted and multiplied by the step in a
vector mul.

This has been split off https://github.com/llvm/llvm-project/pull/94339
which also includes changes to simplify VPTransfomState and recipes'
::execute.

The current version mostly leaves existing ::execute untouched and
instead sets VPTransfomState::UF to 1.

A follow-up patch will clean up all references to VPTransformState::UF.

Another follow-up patch will simplify VPTransformState to only store a
single vector value per VPValue.

PR: https://github.com/llvm/llvm-project/pull/95842
2024-09-21 19:47:37 +01:00
Florian Hahn
4eb9838409
[VPlan] Generalize VPValue::isDefinedOutsideLoopRegions.
Update isDefinedOutsideLoopRegions to check if a recipe is defined
outside any region. Split off already approved
https://github.com/llvm/llvm-project/pull/95842 now that this can be
tested separately after landing VPlan-based LICM
https://github.com/llvm/llvm-project/issues/107501
2024-09-20 15:34:00 +01:00
Florian Hahn
a861ed411a
[VPlan] Add initial loop-invariant code motion transform. (#107894)
Add initial transform to move out loop-invariant recipes.

This also helps to fix a divergence between legacy and VPlan-based cost
model due to legacy using ScalarEvolution::isLoopInvariant in some
cases.

Fixes https://github.com/llvm/llvm-project/issues/107501.

PR: https://github.com/llvm/llvm-project/pull/107894
2024-09-20 11:22:03 +01:00
David Sherwood
e762d4dac7
[LoopVectorize] Teach LoopVectorizationLegality about more early exits (#107004)
This patch is split off from PR #88385 and concerns only the code
related to the legality of vectorising early exit loops. It is the first
step in adding support for vectorisation of a simple class of loops that
typically involves searching for something, i.e.

  for (int i = 0; i < n; i++) {
    if (p[i] == val)
      return i;
  }
  return n;

or

  for (int i = 0; i < n; i++) {
    if (p1[i] != p2[i])
      return i;
  }
  return n;

In this initial commit LoopVectorizationLegality will only consider
early exit loops legal for vectorising if they follow these criteria:

1. There are no stores in the loop.
2. The loop must have only one early exit like those shown in the above
example. I have referred to such exits as speculative early exits, to
distinguish from existing support for early exits where the
exit-not-taken count is known exactly at compile time.
3. The early exit block dominates the latch block.
4. The latch block must have an exact exit count.
5. There are no loads after the early exit block.
6. The loop must not contain reductions or recurrences. I don't see
anything fundamental blocking vectorisation of such loops, but I just
haven't done the work to support them yet.
7. We must be able to prove at compile-time that loops will not contain
faulting loads.

Tests have been added here:

  Transforms/LoopVectorize/AArch64/simple_early_exit.ll
2024-09-19 09:41:25 +01:00
Florian Hahn
c48a1ebec1
[LV] Remove force-vector-width/force-vector-interleave from X86 test.
Update target-specific test to not force VF/UF, but instead use the
cost-model. There are similar tests arleady outside X86 and those force
VF & UF.

With this change, the target specific test checks the cost model.
Changes in picked VF/UF are limited to test_pr62954_scalar_epilogue_required,
and should preserve the original spirit of the test.
2024-09-17 08:59:24 +01:00
Philip Reames
2c7786e94a
Prefer use of 0.0 over -0.0 for fadd reductions w/nsz (in IR) (#106770)
This is a follow up to 924907bc6, and is mostly motivated by consistency
but does include one additional optimization. In general, we prefer 0.0
over -0.0 as the identity value for an fadd. We use that value in
several places, but don't in others. So, let's be consistent and use the
same identity (when nsz allows) everywhere.

This creates a bunch of test churn, but due to 924907bc6, most of that
churn doesn't actually indicate a change in codegen. The exception is
that this change enables the use of 0.0 for nsz, but *not* reasoc, fadd
reductions. Or said differently, it allows the neutral value of an
ordered fadd reduction to be 0.0.
2024-09-03 09:16:37 -07:00
Simon Pilgrim
6c8746b6e3
[Analysis] getIntrinsicForCallSite - add vectorization support for acos/asin/atan and cosh/sinh/tanh libcalls (#106844)
Followup to #106584 - ensure acos/asin/atan and cosh/sinh/tanh libcalls correctly map to the llvm intrinsic equivalents
2024-09-03 10:05:56 +01:00
Simon Pilgrim
4d412bedcc [LoopVectorize][X86] amdlibm-calls.ll - add missing sinh and f64 test coverage to all functions
Shows failure to vectorise acos/asin/atan and cosh/sinh/tanh libcalls if they don't have a corresponding veclib mapping
2024-08-31 11:48:22 +01:00
Philip Reames
4b553f4916 Regen a bunch of vectorizer tests to avoid naming churn in upcoming review 2024-08-30 10:13:02 -07:00
Simon Pilgrim
d58d105cda
[Analysis] isTriviallyVectorizable - add vectorization support for acos/asin/atan and cosh/sinh/tanh intrinsics (#106584)
Show fallback cases in amdlibm tests where it doesn't have that specific op
2024-08-30 16:49:23 +01:00
Simon Pilgrim
81acc84997 [LoopVectorize][X86] amdlibm-calls.ll - add 2/4/8/16 vector widths test checks for fallback to llvm intrinsics
Check for cases where there isn't a amdlib call but it still vectorises the math call
2024-08-29 17:31:55 +01:00
Simon Pilgrim
2f95298727 [LoopVectorize][X86] amdlibm-calls.ll - add additional 2/4/8/16 vector widths test checks
This should cover most amdlibm functions, but still not added every VF combo (e.g. 2f32/16f64 often vectorises to the llvm intrinsic for that vector type)
2024-08-29 14:27:31 +01:00
Simon Pilgrim
c57abc66e2 [LoopVectorize][X86] amdlibm-calls.ll - cleanup test checks for 2/4/8/16 vector widths
This cleans up the existing tests and shows the gaps in the test checks (for instance we're often testing VF4 + VF16 but not VF8 even though amdlibm supports it).
2024-08-29 14:27:31 +01:00
Florian Hahn
0a272d3a17
[LV] Use SCEV to analyze second operand for cost query.
Improve operand analysis using SCEV for cost purposes. This fixes a
divergence between legacy and VPlan-based cost-modeling after
533e6bbd0d34.

Fixes https://github.com/llvm/llvm-project/issues/106248.
2024-08-29 12:08:27 +01:00
Florian Hahn
533e6bbd0d
[VPlan] Simplify live-ins if they are SCEVConstant.
The legacy cost model in some parts checks if any of the operands are
constants via SCEV. Update VPlan construction to replace live-ins that
are constants via SCEV with such constants. This means VPlans (and
codegen) reflects what we computing the cost of and removes another case
where the legacy and VPlan cost model diverged.

Fixes https://github.com/llvm/llvm-project/issues/105722.
2024-08-26 09:15:58 +01:00
Nikita Popov
a105877646
[InstCombine] Remove some of the complexity-based canonicalization (#91185)
The idea behind this canonicalization is that it allows us to handle less
patterns, because we know that some will be canonicalized away. This is
indeed very useful to e.g. know that constants are always on the right.

However, this is only useful if the canonicalization is actually
reliable. This is the case for constants, but not for arguments: Moving
these to the right makes it look like the "more complex" expression is
guaranteed to be on the left, but this is not actually the case in
practice. It fails as soon as you replace the argument with another
instruction.

The end result is that it looks like things correctly work in tests,
while they actually don't. We use the "thwart complexity-based
canonicalization" trick to handle this in tests, but it's often a
challenge for new contributors to get this right, and based on the
regressions this PR originally exposed, we clearly don't get this right
in many cases.

For this reason, I think that it's better to remove this complexity
canonicalization. It will make it much easier to write tests for
commuted cases and make sure that they are handled.
2024-08-21 12:02:54 +02:00
Florian Hahn
2ab910c08c
[LV] Check pointer user are in loop when checking for uniform pointers.
Widening decisions are not set for users outside the loop. Avoid
crashing by only calling isVectorizedMemAccessUse for users in the loop.

Fixes https://github.com/llvm/llvm-project/issues/102934.
2024-08-13 09:23:44 +01:00
Florian Hahn
cd08fadd03
[LV] Include chains feeding inductions in cost precomputation.
Include chain of ops feeding inductions in cost precomputation for
inductions, not just the induction increment. In VPlan, those
instructions will be cleaned up, as both phi and increment are generated
by VPWidenIntOrFpInductionRecipe independently.

Fixes https://github.com/llvm/llvm-project/issues/101337.
2024-08-12 14:45:43 +01:00
Florian Hahn
db0603cb7b
[LV] Only OR unique edges when creating block-in masks.
This removes redundant ORs of matching masks.

Follow-up to f0df4fbd0c7b to reduce the number of redundant ORs for
masks.
2024-08-12 10:17:40 +01:00
Florian Hahn
5a42a677aa
[VPlan] Mark VPVectorPointer as only using the first part of the ptr.
VPVectorPointerRecipe only uses the first part of the pointer operand,
so mark it accordingly.

Follow-up suggested as part of
https://github.com/llvm/llvm-project/pull/99808.
2024-08-12 08:46:55 +01:00
Farzon Lotfi
efc6b50d2d
[LoopVectorize][X86][AMDLibm] Add Missing AMD LibM trig vector intrinsics (#101125)
Adding the following linked to their docs:
-
[amd_vrs16_acosf](9c0b67293b/scripts/libalm.def (L221))
-
[amd_vrd2_cosh](9c0b67293b/scripts/libalm.def (L124))
-
[amd_vrs16_tanhf](9c0b67293b/scripts/libalm.def (L224))
2024-08-11 22:11:09 -04:00
Florian Hahn
60680f7181
[LV] Handle SwitchInst in ::isPredicatedInst.
After f0df4fbd0c7b, isPredicatedInst needs to handle SwitchInst as well.
Handle it the same as BranchInst.

This fixes a crash in the newly added test and improves the results for
one of the existing tests in predicate-switch.ll

Should fix https://lab.llvm.org/buildbot/#/builders/113/builds/2099.
2024-08-11 20:56:58 +01:00
Florian Hahn
f0df4fbd0c
[LV] Support generating masks for switch terminators. (#99808)
Update createEdgeMask to created masks where the terminator in Src is a
switch. We need to handle 2 separate cases:

1. Dst is not the default desintation. Dst is reached if any of the
cases with destination == Dst are taken. Join the conditions for each
case where destination == Dst using a logical OR.
2. Dst is the default destination. Dst is reached if none of the cases
with destination != Dst are taken. Join the conditions for each case
where the destination is != Dst using a logical OR and negate it.

Edge masks are created for every destination of cases and/or 
default when requesting a mask where the source is a switch.

Fixes https://github.com/llvm/llvm-project/issues/48188.

PR: https://github.com/llvm/llvm-project/pull/99808
2024-08-11 20:38:36 +02:00
Florian Hahn
fdb9f96fa2
[LV] Consider earlier stores to invariant reduction address as dead.
For invariant stores to an address of a reduction, only the latest store
will be generated outside the loop. Consider earlier stores as dead.

This fixes a difference between the legacy and VPlan-based cost model.

Fixes https://github.com/llvm/llvm-project/issues/96294.
2024-08-04 20:54:26 +01:00
Florian Hahn
855703537e
[LV] Add more tests with switches.
Extra tests for
https://github.com/llvm/llvm-project/pull/99808, including cost model
tests.
2024-08-01 19:30:48 +01:00
Farzon Lotfi
378fe2fc23
[X86][LoopVectorize] Add support for arc and hyperbolic trig functions (#99383)
This change is part 2 x86 Loop Vectorization of :
https://github.com/llvm/llvm-project/pull/96222

It also has veclib call loop vectorization hence the test cases in
`llvm/test/Transforms/LoopVectorize/X86/veclib-calls.ll`

finally the last pr missed tests for
`llvm/test/CodeGen/X86/fp-strict-libcalls-msvc32.ll` and
`llvm/test/CodeGen/X86/vec-libcalls.ll` so added those aswell.

No evidence was found for arc and hyperbolic trig glibc vector math
functions

https://github.com/lattera/glibc/blob/master/sysdeps/x86/fpu/bits/math-vector.h
so no  new `_ZGVbN2v_*` and  `_ZGVdN4v_*` .
So no new tests in
`llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-VF2-VF8.ll`

Also no new svml and no new tests to:
`llvm/test/Transforms/LoopVectorize/X86/svml-calls.ll`
There was not enough evidence that there were svml arc and hyperbolic
trig vector implementations, Documentation was scarces so looked at test
cases in
[numpy](32bf2a9842/linux/avx512/svml_z0_acos_d_la.s (L8)).
Someone with more experience with svml should investigate.

## Note 
amd libm doesn't have a vector hyperbolic sine api hence why youi might
notice there are no tests for `sinh`.

## History
This change is part of
https://github.com/llvm/llvm-project/issues/87367's investigation on
supporting IEEE math operations as intrinsics.
Which was discussed in this RFC:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

This change adds loop vectorization for `acos`, `asin`, `atan`, `cosh`,
`sinh`, and `tanh`.
resolves #70079
resolves #70080
resolves #70081
resolves #70083
resolves #70084
resolves #95966
2024-07-28 20:57:43 -04:00
Florian Hahn
a3092152ac
[VPlan] Don't create live-outs for induction increments.
Follow up to fc9cd3272b5 to also skip creating live-outs for IV
increments, as those are also generated independent of VPlan for now.
2024-07-25 21:34:55 +01:00
Simon Pilgrim
010dcfd85f [CostModel][X86] Improve add/sub/mul overflow intrinsic costs
Noticed due to x86 changes in #97463
2024-07-25 16:01:05 +01:00