2685 Commits

Author SHA1 Message Date
Florian Hahn
7f06d8afb0
[SCEV] Retain SCEVSequentialMinMaxExpr if an operand may trigger UB. (#110824)
Retain SCEVSequentialMinMaxExpr if an operand may trigger UB, e.g. if
there is an UDiv operand that may divide by 0 or poison

PR: https://github.com/llvm/llvm-project/pull/110824
2024-10-14 13:08:49 +01:00
Florian Hahn
65da32c634
[LV] Account for any-of reduction when computing costs of blend phis.
Any-of reductions are narrowed to i1. Update the legacy cost model to
use the correct type when computing the cost of a phi that gets lowered
to selects (BLEND).

This fixes a divergence between legacy and VPlan-based cost models after
36fc291b6ec6d.

Fixes https://github.com/llvm/llvm-project/issues/111874.
2024-10-11 11:27:22 +01:00
David Sherwood
72f339de45
[LoopVectorize] Use predicated version of getSmallConstantMaxTripCount (#109928)
There are a number of places where we call getSmallConstantMaxTripCount
without passing a vector of predicates:

getSmallBestKnownTC
isIndvarOverflowCheckKnownFalse
computeMaxVF
isMoreProfitable

I've changed all of these to now pass in a predicate vector so that
we get the benefit of making better vectorisation choices when we
know the max trip count for loops that require SCEV predicate checks.

I've tried to add tests that cover all the cases affected by these
changes.
2024-10-11 10:10:15 +01:00
Florian Hahn
bb937e276d
[LV] Compute value of escaped induction based on the computed end value. (#110576)
Update fixupIVUsers to compute the value for escaped inductions using
the already computed end value of the induction (EndValue), but
subtracting the step.

This results in slightly simpler codegen, as we avoid computing the full
transformed index at VectorTripCount - 1.

PR: https://github.com/llvm/llvm-project/pull/110576
2024-10-10 20:04:46 +01:00
Florian Hahn
01cbbc52dc
[VPlan] Request lane 0 for pointer arg in PtrAdd.
After 7f74651, the pointer operand may be replicated of a PtrAdd. Instead
of requesting a single scalar, request lane 0, which correctly handles the
case when there is a scalar-per-lane.

Fixes https://github.com/llvm/llvm-project/issues/111606.
2024-10-09 13:18:54 +01:00
Florian Hahn
6fbbe152fa
[VPlan] Introduce VPWidenIntrinsicRecipe to separate from libcall. (#110486)
This patch splits off intrinsic hanlding to a new
VPWidenIntrinsicRecipe. VPWidenIntrinsicRecipes only need access to the
intrinsic ID to widen and the scalar result type (in case the intrinsic
is overloaded on the result type). It does not need access to an
underlying IR call instruction or function.

This means VPWidenIntrinsicRecipe can be created easily without access
to underlying IR.
2024-10-08 22:37:20 +01:00
Florian Hahn
36fc291b6e
[VPlan] Implement VPBlendRecipe::computeCost.
Implement VPBlendRecipe::computeCost. VPBlendRecipe is currently is also
used if only the first lane is used.

This also requires pre-computing costs for forced scalars and
instructions considered profitable to scalarize. For those, the cost
will be computed separately in the legacy cost model. This will also be
needed when implementing VPReplicateRecipe::computeCost.
2024-10-08 21:33:42 +01:00
Florian Hahn
3ec6f805c5
[VPlan] Don't created GEP x, 0 for interleave group pointers.
The GEP with offet 0 is redundant, remove it. This addresses a TODO
from 7f74651837b ((#106431).
2024-10-08 12:08:13 +01:00
Luke Lau
366e469db9 [RISCV] Add cost tests for more interleave factors. NFC
This shows how we're not properly scaling the cost with the number of
factors, i.e. a factor 8 interleave costs the same as a factor 2
interleave at VF=2.
2024-10-08 17:16:17 +08:00
Luke Lau
e98875af4c [RISCV] Add scalable interleave cost tests. NFC
This gets the cost from the recipe output rather than the individual
instruction cost.

The factor 3 test was left alone since we don't support anything else
other than factor 2 for scalable vectors currently.
2024-10-08 13:39:13 +08:00
Florian Hahn
7f74651837
[VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. (#106431)
Update VPInterleaveRecipe to always use the pointer to member 0 as
pointer argument. This in many cases helps to remove unneeded index
adjustments and simplifies VPInterleaveRecipe::execute.

In some rare cases, the address of member 0 does not dominate the insert
position of the interleave group. In those cases a PtrAdd VPInstruction
is emitted to compute the address of member 0 based on the address of
the insert position. Alternatively we could hoist the recipe computing
the address of member 0.
2024-10-06 22:53:13 +01:00
Florian Hahn
89d2a9de05
[VPlan] Add additional FOR hoisting test.
Additional tests for https://github.com/llvm/llvm-project/pull/108945.
2024-10-06 14:36:33 +01:00
Florian Hahn
45b526afa2
[LV] Honor uniform-after-vectorization in setVectorizedCallDecision.
The legacy cost model always computes the cost for uniforms as cost of
VF = 1, but VPWidenCallRecipes would be created, as
setVectorizedCallDecisions would not consider uniform calls.

Fix setVectorizedCallDecision to set to Scalarize, if the call is
uniform-after-vectorization.

This fixes a bug in VPlan construction uncovered by the VPlan-based
cost model.

Fixes https://github.com/llvm/llvm-project/issues/111040.
2024-10-06 10:35:06 +01:00
Florian Hahn
68210c7c26
[VPlan] Only generate first lane for VPPredInstPHI if no others used.
IF only the first lane of the result is used, only generate the first
lane.

Fixes https://github.com/llvm/llvm-project/issues/111042.
2024-10-05 19:15:05 +01:00
Shih-Po Hung
26fca7256e
[VPlan][NFC] Use patterns in test check (#111086) 2024-10-04 17:19:07 +08:00
Benjamin Maxwell
01a1398971
[AArch64][Test] Update test variable names (NFC) (#110667)
Simply by running update_test_checks.py with no changes. This is to make
updating these tests for later changes easier.
2024-10-03 16:14:21 +01:00
Florian Hahn
9de327c94d
[LV] Generalize predication checks from 2c8836c899 for operands.
This fixes another case where the VPlan-based and legacy cost models
disagree. If any of the operands is predicated, it can't be trivially
hoisted and we should consider the cost for evaluating it each loop
iteration.

Fixes https://github.com/llvm/llvm-project/issues/108697.
2024-10-02 20:16:41 +01:00
Florian Hahn
6d6eea92e3
[LV] Use SCEV to simplify wide binop operand to constant.
The legacy cost model uses SCEV to determine if the second operand of a
binary op is a constant. Update the VPlan construction logic to mirror
the current legacy behavior, to fix a difference in the cost models.

Fixes https://github.com/llvm/llvm-project/issues/109528.
Fixes https://github.com/llvm/llvm-project/issues/110440.
2024-10-02 13:45:49 +01:00
Nikita Popov
9f3d1695eb
[SCEVExpander] Preserve gep nuw during expansion (#102133)
When expanding SCEV adds to geps, transfer the nuw flag to the resulting
gep. (Note that this doesn't apply to IV increment GEPs, which go
through a different code path.)
2024-10-02 11:45:00 +02:00
David Sherwood
0b2403197f
[LoopVectorize] In LoopVectorize.cpp start using getSymbolicMaxBackedgeTakenCount (#108833)
LoopVectorizationLegality currently only treats a loop as legal to vectorise
if PredicatedScalarEvolution::getBackedgeTakenCount returns a valid
SCEV, or more precisely that the loop must have an exact backedge taken
count. Therefore, in LoopVectorize.cpp we can safely replace all calls to
getBackedgeTakenCount with calls to getSymbolicMaxBackedgeTakenCount,
since the result is the same.

This also helps prepare the loop vectoriser for PR #88385.
2024-10-02 10:28:54 +01:00
Florian Hahn
0344123ffb
[VPlan] Manage FMFs for VPWidenCall via VPRecipeWithIRFlags. (NFC)
Update VPWidenCallRecipe to manage fast-math flags directly via
VPRecipeWithIRFlags. This addresses a TODO and allows adjusting the FMFs
directly on the recipe. Also fixes printing for flags for
VPWidenCallRecipe.
2024-10-01 13:20:34 +01:00
Ramkumar Ramachandra
f2ad39b77b
LV/test: improve a couple of tests, regen with UTC (#107225)
Add noalias, where applicable, to eliminate unnecessary memory check,
and regen with UTC.
2024-09-30 15:47:17 +01:00
Mel Chen
f8373cb0f9
[LV] Reuse VPReplicateRecipe to handle scalar stores in exit block. (#106342)
This patch separates the computation of the final reduction result and
the intermediate stores of reduction.

---------

Co-authored-by: Florian Hahn <flo@fhahn.com>
2024-09-30 15:35:09 +08:00
Florian Hahn
2c8836c899
[LV] Don't consider predicated insts as invariant unconditionally in CM.
Predicated instructions cannot hoisted trivially, so don't treat them as
uniform value in the cost model.

This fixes a difference between legacy and VPlan-based cost model.

Fixes https://github.com/llvm/llvm-project/issues/110295.
2024-09-29 20:31:24 +01:00
Florian Hahn
2f7ccaf4a8
[SCEV] Add predicate in SolveLinEq to ensure B is a multiple of A. (#108777)
This can help in cases where pointer alignment info is missing, e.g.
https://github.com/llvm/llvm-project/pull/108210

The predicate is formed for the complex expression that's passed to
SolveLinEquationWithOverflow and the checks could probably be pushed
closer to the root nodes, which in some cases may be cheaper to check.


PR: https://github.com/llvm/llvm-project/pull/108777
2024-09-28 14:19:57 +01:00
Graham Hunter
6f1a8c2da2
[LV] Vectorize histogram operations (#99851)
This patch implements autovectorization support for the 'all-in-one'
histogram intrinsic, which seems to have more support than the
'standalone' intrinsic. See
https://discourse.llvm.org/t/rfc-vectorization-support-for-histogram-count-operations/74788/
for an overview of the work and my notes on the tradeoffs between the
two approaches.
2024-09-27 13:08:55 +01:00
Florian Hahn
68ed1728bf
[VPlan] Unify mayWriteToMemory and mayHaveSideEffects logic for VPInst.
Unify logic for mayWriteToMemory and mayHaveSideEffects for
VPInstruction, with the later relying on the former. Also extend to
handle binary operators.

Split off from https://github.com/llvm/llvm-project/pull/106441
2024-09-26 19:16:43 +01:00
Elvis Wang
a068b974b1
[VPlan] Implement VPWidenLoad/StoreEVLRecipe::computeCost(). (#109644)
Currently the EVL recipes transfer the tail masking to the EVL.
But in the legacy cost model, the mask exist and will calculate the
instruction cost of the mask.
To fix the difference between the VPlan-based cost model and the legacy
cost model, we always calculate the instruction cost for the mask in the
EVL recipes.

Note that we should remove the mask cost in the EVL recipes when we
don't need to compare to the legacy cost model.

This patch also fixes #109468.
2024-09-26 07:10:25 +08:00
Alexey Bataev
60ed2361c0
[LV][EVL]Explicitly model AVL as sub, original TC, EVL_PHI.
Patch explicitly models AVL as sub original TC, EVL_PHI instead of
having it in EXPLICIT-VECTOR-LENGTH VPInstruction. Required for correct
safe dependence distance suport.

Reviewers: fhahn, ayalz

Reviewed By: ayalz

Pull Request: https://github.com/llvm/llvm-project/pull/108869
2024-09-25 08:58:29 -04:00
Florian Hahn
040bb37195
[VPlan] Fix incorrect argument for CreateBinOp after 06c3a7d2d764.
06c3a7d2d764 incorrectly updated CreateBinOp to pass the debug location,
which gets interpreted as FPMath node. Remove the argument.
2024-09-24 11:18:50 +01:00
Benjamin Maxwell
50a1ab12ab
[LAA] Don't assume libcalls with output/input pointers can be vectorized (#108980)
LoopAccessAnalysis currently does not check/track aliasing from the
output pointers, but assumes vectorizing library calls with a mapping is
safe.

This can result in incorrect codegen if something like the following is
vectorized:

```
for(int i=0; i<N; i++) {
  // No aliasing between input and output pointers detected.
  sincos(cos_out[0], sin_out+i, cos_out+i);
}
```

Where for VF >= 2 `cos_out[1]` to `cos_out[VF-1]` is the cosine of the
original value of `cos_out[0]` not the updated value.
2024-09-23 16:05:55 +01:00
David Sherwood
f4eeae1244
[LoopVectorize] Address comments on PR #107004 left post-commit (#109300)
* Rename Speculative -> Uncountable and update tests.
* Add comments explaining why it's safe to ignore the predicates when
building up a list of exiting blocks.
* Reshuffle some code to do (hopefully) cheaper checks first.
2024-09-23 13:36:25 +01:00
Graham Hunter
785337e2d9
[LV][AArch64] Don't query registers for illegal scalable vector elts (#109411)
When trying to maximize vector bandwidth we ask TTI for the number of
registers required for a given operation. If the type of that operation
happens to be something illegal for scalable vectors (e.g.
<vscale x 4 x fp128>) then we would see a crash.

Instead, just return a default value and let the cost model reject the
invalid operation later.
2024-09-23 13:35:23 +01:00
David Sherwood
02ee96eca9
[Analysis] Teach isDereferenceableAndAlignedInLoop about SCEV predicates (#106562)
Currently if a loop contains loads that we can prove at compile time
are dereferenceable when certain conditions are satisfied the function
isDereferenceableAndAlignedInLoop will still return false because
getSmallConstantMaxTripCount will return 0 when SCEV predicates
are required. This patch changes getSmallConstantMaxTripCount to take
an optional Predicates pointer argument so that we can permit
functions such as isDereferenceableAndAlignedInLoop to consider more
cases.
2024-09-23 09:56:37 +01:00
Florian Hahn
53266f73f0
[VPlan] Run DCE after unrolling.
This cleans up a number of dead recipes after unrolling if only their
first or last parts are used. This simplifies a number of tests.

Fixes https://github.com/llvm/llvm-project/issues/109581.
2024-09-22 22:08:46 +01:00
Florian Hahn
0e21c8e598
[LV] Auto-generate check lines for test. 2024-09-22 22:08:46 +01:00
Florian Hahn
8ec406757c
[VPlan] Implement unrolling as VPlan-to-VPlan transform. (#95842)
This patch implements explicit unrolling by UF  as VPlan transform. In
follow up patches this will allow simplifying VPTransform state (no need
to store unrolled parts) as well as recipe execution (no need to
generate code for multiple parts in an each recipe). It also allows for
more general optimziations (e.g. avoid generating code for recipes that
are uniform-across parts).

It also unifies the logic dealing with unrolled parts in a single place,
rather than spreading it out across multiple places (e.g. VPlan post
processing for header-phi recipes previously.)

In the initial implementation, a number of recipes still take the
unrolled part as additional, optional argument, if their execution
depends on the unrolled part.

The computation for start/step values for scalable inductions changed
slightly. Previously the step would be computed as scalar and then
splatted, now vscale gets splatted and multiplied by the step in a
vector mul.

This has been split off https://github.com/llvm/llvm-project/pull/94339
which also includes changes to simplify VPTransfomState and recipes'
::execute.

The current version mostly leaves existing ::execute untouched and
instead sets VPTransfomState::UF to 1.

A follow-up patch will clean up all references to VPTransformState::UF.

Another follow-up patch will simplify VPTransformState to only store a
single vector value per VPValue.

PR: https://github.com/llvm/llvm-project/pull/95842
2024-09-21 19:47:37 +01:00
Florian Hahn
58e05779b4
[LV] Move test requiring AArch64 to target subdir.
The test added in bd8fe9972e3f depends on the AArch64. Move it.
2024-09-21 12:54:59 +01:00
Florian Hahn
bd8fe9972e
[VPlan] Mov licm to end of VPlan optimizations.
This moves licm after expanding replicate regions. This fixes a crash
when trying to hoist a predicated VPReplicateRecipes which later get
expanded to replicate regions.

Hoisting replicate regions out was not intended (see the discussion and
at the review and comment on shallow traversal in licm()).

Fixes https://github.com/llvm/llvm-project/issues/109510.
2024-09-21 12:45:45 +01:00
Florian Hahn
4eb9838409
[VPlan] Generalize VPValue::isDefinedOutsideLoopRegions.
Update isDefinedOutsideLoopRegions to check if a recipe is defined
outside any region. Split off already approved
https://github.com/llvm/llvm-project/pull/95842 now that this can be
tested separately after landing VPlan-based LICM
https://github.com/llvm/llvm-project/issues/107501
2024-09-20 15:34:00 +01:00
Florian Hahn
a861ed411a
[VPlan] Add initial loop-invariant code motion transform. (#107894)
Add initial transform to move out loop-invariant recipes.

This also helps to fix a divergence between legacy and VPlan-based cost
model due to legacy using ScalarEvolution::isLoopInvariant in some
cases.

Fixes https://github.com/llvm/llvm-project/issues/107501.

PR: https://github.com/llvm/llvm-project/pull/107894
2024-09-20 11:22:03 +01:00
Florian Hahn
413b12a5ab
[LV] Add additional tests for UDiv expansion.
Additional tests for https://github.com/llvm/llvm-project/issues/89958,
following discussion at https://github.com/llvm/llvm-project/pull/92177.
2024-09-20 10:51:56 +01:00
Florian Hahn
e584278289
[LV] Update tests to avoid loop invariant instructions.
Update some tests with loop invariant instructions so the instructions
cannot be hoisted out.

This preserves the original test intention after
https://github.com/llvm/llvm-project/pull/107894.
2024-09-19 18:50:10 +01:00
David Sherwood
d4536bf5c9
Fix test issue introduced by e762d4dac762a3fc27c6e251086b6645d7543bb2 (#109254) 2024-09-19 10:06:48 +01:00
David Sherwood
e762d4dac7
[LoopVectorize] Teach LoopVectorizationLegality about more early exits (#107004)
This patch is split off from PR #88385 and concerns only the code
related to the legality of vectorising early exit loops. It is the first
step in adding support for vectorisation of a simple class of loops that
typically involves searching for something, i.e.

  for (int i = 0; i < n; i++) {
    if (p[i] == val)
      return i;
  }
  return n;

or

  for (int i = 0; i < n; i++) {
    if (p1[i] != p2[i])
      return i;
  }
  return n;

In this initial commit LoopVectorizationLegality will only consider
early exit loops legal for vectorising if they follow these criteria:

1. There are no stores in the loop.
2. The loop must have only one early exit like those shown in the above
example. I have referred to such exits as speculative early exits, to
distinguish from existing support for early exits where the
exit-not-taken count is known exactly at compile time.
3. The early exit block dominates the latch block.
4. The latch block must have an exact exit count.
5. There are no loads after the early exit block.
6. The loop must not contain reductions or recurrences. I don't see
anything fundamental blocking vectorisation of such loops, but I just
haven't done the work to support them yet.
7. We must be able to prove at compile-time that loops will not contain
faulting loads.

Tests have been added here:

  Transforms/LoopVectorize/AArch64/simple_early_exit.ll
2024-09-19 09:41:25 +01:00
Shih-Po Hung
ffcff2f465
[VPlan][NFC] Fix the value name of VECTOR_GEP (#107544)
This patch passes the string `"vector.gep"` to CreateGEP instead of
CreateMul.
2024-09-18 19:22:36 +08:00
LiqinWeng
a2994b2999
[LV][NFC] Unify printing for WidenEVLReicpe with other EVL recipes (#108177) 2024-09-18 15:03:37 +08:00
Florian Hahn
3c5c61a414
[LV] Add first order rec test where hoisting can improve over sinking. 2024-09-17 09:25:39 +01:00
Florian Hahn
c48a1ebec1
[LV] Remove force-vector-width/force-vector-interleave from X86 test.
Update target-specific test to not force VF/UF, but instead use the
cost-model. There are similar tests arleady outside X86 and those force
VF & UF.

With this change, the target specific test checks the cost model.
Changes in picked VF/UF are limited to test_pr62954_scalar_epilogue_required,
and should preserve the original spirit of the test.
2024-09-17 08:59:24 +01:00
Luke Lau
30d7dcc1db [RISCV] Add asserts requirement to loop vectorizer tests
Hopefully this fixes a buildbot failure on fuchsia where opt doesn't
have -debug-only
2024-09-17 14:18:36 +08:00