2820 Commits

Author SHA1 Message Date
Florian Hahn
3026ecaff5
[LV] Also verify loops in vector loop removal tests.
Also verify loop info in tests added in 7d6ec3b9680.
2024-12-31 20:11:23 +00:00
Florian Hahn
7d6ec3b968
[LV] Add more tests for vector loop removal.
Add missing test coverage of loops where the vector loop region can be
removed that include replicate recipes as well as nested loops.

Extra test coverage for https://github.com/llvm/llvm-project/pull/108378.
2024-12-31 20:08:54 +00:00
Muhammad Omair Javaid
332d2647ff Revert "[LV]: Teach LV to recursively (de)interleave. (#89018)"
This reverts commit ccfe0de0e1e37ed369c9bf89dd0188ba0afb2e9a.

This breaks LLVM build on AArch64 SVE Linux buildbots
https://lab.llvm.org/buildbot/#/builders/143/builds/4462
https://lab.llvm.org/buildbot/#/builders/17/builds/4902
https://lab.llvm.org/buildbot/#/builders/4/builds/4399
https://lab.llvm.org/buildbot/#/builders/41/builds/4299
2024-12-31 03:12:24 +05:00
Florian Hahn
b20b6e9ea9
[LV] Check IR generated for both interleaving and vectorizing in test.
Currently the tests would in some cases would only check the vectorized
IR, but not the interleaved IR, if they are different.
2024-12-30 19:57:26 +00:00
Florian Hahn
c2be48a6ce
[LV] Add additional tests with induction users.
Adds test coverage of post-inc IV users with different opcodes.
2024-12-30 17:36:48 +00:00
Florian Hahn
7f3428d3ed
[VPlan] Compute induction end values in VPlan. (#112145)
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on https://github.com/llvm/llvm-project/pull/110004 and
https://github.com/llvm/llvm-project/pull/109975.

PR: https://github.com/llvm/llvm-project/pull/112145
2024-12-29 19:05:08 +00:00
Zequan Wu
4d8f9594b2 Revert "Reland "[LoopVectorizer] Add support for partial reductions" (#120721)"
This reverts commit c858bf620c3ab2a4db53e84b9365b553c3ad1aa6 as it casuse optimization crash on -O2, see https://github.com/llvm/llvm-project/pull/120721#issuecomment-2563192057
2024-12-27 11:51:54 -08:00
Hassnaa Hamdi
ccfe0de0e1
[LV]: Teach LV to recursively (de)interleave. (#89018)
Currently available intrinsics are only ld2/st2, which don't support interleaving factor > 2.
This patch teaches the LV to use ld2/st2 recursively to support high
interleaving factors.
2024-12-27 12:42:07 +00:00
Elvis Wang
47e1c87a61
[VPlan] Set debug location for VPReduction/VPWidenIntrinsicRecipe. (#120054)
This patch add missing debug location for
VPReduction/VPWidenIntrinsicRecipe.
2024-12-27 10:37:21 +08:00
Florian Hahn
2d038caeeb
[VPlan] Remove stray space when printing VPWidenCastRecipe.
printFlags() already takes care of printing a single space if there are
no flags. Remove the extra space when printing a recipe without flags.
2024-12-24 20:23:48 +00:00
Sam Tebbs
c858bf620c
Reland "[LoopVectorizer] Add support for partial reductions" (#120721)
This re-lands the reverted #92418 

When the VF is small enough so that dividing the VF by the scaling
factor results in 1, the reduction phi execution thinks the VF is scalar
and sets the reduction's output as a scalar value, tripping assertions
expecting a vector value. The latest commit in this PR fixes that by
using `State.VF` in the scalar check, rather than the divided VF.

---------

Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>
2024-12-24 12:08:17 +00:00
Florian Hahn
db2307d2d7
[LV] Add tests with dereferenceable assumptions.
Add a number of tests with dereferenceable assumptions and different
alignment info.
2024-12-22 16:32:40 +00:00
LiqinWeng
86fa35ce7e
[LV][VPlan] Use opcode to retrieve the VPID of the CallRecipe, rather than underlying instruction (#120816)
This patch may cause the flags in the CallRecipe to be lost after EVL
transformation, and it has been addressed in the patch: #119847
2024-12-22 10:28:20 +08:00
Florian Hahn
bc23ef3feb
[LV] Add test showing incorrect debug location for scalar casts. 2024-12-21 22:30:36 +00:00
Florian Hahn
9b496deb90
[VPlan] Set and use debug location for VPPredInstPHIRecipe.
Update the recipe it always set its debug location and use it during IR
generation.
2024-12-21 21:57:47 +00:00
Florian Hahn
df8efbdbbf
[SCEV] Remove existing predicates implied by newly added ones. (#118185)
When adding a new predicate to a union predicate, some of the existing
predicates may be implied by the new predicate. Remove any existing
predicates that are already implied by the new predicate.

Depends on https://github.com/llvm/llvm-project/pull/118184 to show the
main benefit.

PR: https://github.com/llvm/llvm-project/pull/118185
2024-12-20 20:49:37 +00:00
David Sherwood
5845298f94
[LoopVectorize] Teach some X86 cost model tests to use new vplan costs (#120738)
I've only fixed up the tests where I was able to use a simple sed script
to replace the text. Even after this patch lands, there are still over
50 tests that need updating in X86/CostModel!
2024-12-20 15:08:08 +00:00
Florian Hahn
5f096fd221
Revert "[LoopVectorizer] Add support for partial reductions (#92418)"
This reverts commit 060d62b48aeb5080ffcae1dc56e41a06c6f56701.

It looks like this is triggering an assertion when build llvm-test-suite
on ARM64 macOS.

Reproducer from MultiSource/Benchmarks/Ptrdist/bc/number.c

    target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-n32:64-S128-Fn32"
    target triple = "arm64-apple-macosx15.0.0"

    define void @test(i64 %idx.neg, i8 %0) #0 {
    entry:
      br label %while.body

    while.body:                                       ; preds = %while.body, %entry
      %n1ptr.0.idx131 = phi i64 [ %n1ptr.0.add, %while.body ], [ %idx.neg, %entry ]
      %n2ptr.0.idx130 = phi i64 [ %n2ptr.0.add, %while.body ], [ 0, %entry ]
      %sum.1129 = phi i64 [ %add99, %while.body ], [ 0, %entry ]
      %n1ptr.0.add = add i64 %n1ptr.0.idx131, 1
      %conv = sext i8 %0 to i64
      %n2ptr.0.add = add i64 %n2ptr.0.idx130, 1
      %1 = load i8, ptr null, align 1
      %conv97 = sext i8 %1 to i64
      %mul = mul i64 %conv97, %conv
      %add99 = add i64 %mul, %sum.1129
      %cmp94 = icmp ugt i64 %n1ptr.0.idx131, 0
      %cmp95 = icmp ne i64 %n2ptr.0.idx130, -1
      %2 = and i1 %cmp94, %cmp95
      br i1 %2, label %while.body, label %while.end.loopexit

    while.end.loopexit:                               ; preds = %while.body
      %add99.lcssa = phi i64 [ %add99, %while.body ]
      ret void
    }

    attributes #0 = { "target-cpu"="apple-m1" }

> opt -p loop-vectorize
Assertion failed: ((VF.isScalar() || V->getType()->isVectorTy()) && "scalar values must be stored as (0, 0)"), function set, file VPlan.h, line 284.
2024-12-19 21:46:51 +00:00
Nicholas Guy
060d62b48a
[LoopVectorizer] Add support for partial reductions (#92418)
Following on from https://github.com/llvm/llvm-project/pull/94499, this
patch adds support to the Loop Vectorizer to emit the partial reduction
intrinsics where they may be beneficial for the target.

---------

Co-authored-by: Samuel Tebbs <samuel.tebbs@arm.com>
2024-12-19 11:42:40 +00:00
David Sherwood
c18fda02e1
[LoopVectorize] Use new single string variant of reportVectorizationFailure (#120414) 2024-12-19 10:07:13 +00:00
Alexander Kornienko
23a239267e
Revert "[InstCombine] Infer nuw for gep inbounds from base of object" (#120460)
Reverts llvm/llvm-project#119225 due to the lack of sanitizer support,
large potential of breaking code containing latent UB, non-trivial
localization and investigation, and what seems to be a bad interaction
with msan (a test is in the works).

Related discussions:
https://github.com/llvm/llvm-project/pull/119225#issuecomment-2551904822
https://github.com/llvm/llvm-project/pull/118472#issuecomment-2549986255
2024-12-18 19:06:34 +01:00
Florian Hahn
0e8d022ffe
[VPlan] Handle exit phis with multiple operands in addUsersInExitBlocks. (#120260)
Currently the addUsersInExitBlocks incorrectly assumes exit phis only
have a single operand, which may not be the case for loops with early
exits when they share a common exit block.

Also further relax the assertion in fixupIVUsers to allow exit values if
they come from theloop latch/middle.block.

PR: https://github.com/llvm/llvm-project/pull/120260
2024-12-18 14:47:16 +00:00
Florian Hahn
3e02038948
[LV] Fixup check lines after 13107cb09441. 2024-12-18 09:37:30 +00:00
David Sherwood
13107cb094
[LoopVectorize] Enable more early exit vectorisation tests (#117008)
PR #112138 introduced initial support for dispatching to
multiple exit blocks via split middle blocks. This patch
fixes a few issues so that we can enable more tests to use
the new enable-early-exit-vectorization flag. Fixes are:

1. The code to bail out for any loop live-out values happens
too late. This is because collectUsersInExitBlocks ignores
induction variables, which get dealt with in fixupIVUsers.
I've moved the check much earlier in processLoop by looking
for outside users of loop-defined values.
2. We shouldn't yet be interleaving when vectorising loops
with uncountable early exits, since we've not added support
for this yet.
3. Similarly, we also shouldn't be creating vector epilogues.
4. Similarly, we shouldn't enable tail-folding.
5. The existing implementation doesn't yet support loops
that require scalar epilogues, although I plan to add that
as part of PR #88385.
6. The new split middle blocks weren't being added to the
parent loop.
2024-12-18 09:25:45 +00:00
Luke Lau
b1f4a0201a [LV] Update failing test with middle block. NFC 2024-12-18 11:51:48 +08:00
Luke Lau
c2a879ecaa
[VPlan] Fix VPTypeAnalysis cache clobbering in EVL transform (#120252)
When building SPEC CPU 2017 with RISC-V and EVL tail folding, this
assertion in VPTypeAnalysis would trigger during the transformation to
EVL recipes:


d8a0709b10/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (L135-L142)

It was caused by this recipe:

```
WIDEN ir<%shr> = vp.or ir<%add33>, ir<0>, vp<%6>
```

Having its type inferred as i16, when ir<%add33> and ir<0> had inferred
types of i32 somehow.

The cause of this turned out to be because the VPTypeAnalysis cache was
getting clobbered: In this transform we were erasing recipes but keeping
around the same mapping from VPValue* to Type*. In the meantime, new
recipes would be created which would have the same address as the old
value. They would then incorrectly get the old erased VPValue*'s cached
type:

```
  --- before ---
  0x600001ec5030: WIDEN ir<%mul21.neg> = vp.mul vp<%11>, ir<0>, vp<%6>
  0x600001ec5450: <badref> <- some value that was erased
  --- after ---
  0x600001ec5030: WIDEN ir<%mul21.neg> = vp.mul vp<%11>, ir<0>, vp<%6>
  0x600001ec5450: WIDEN ir<%shr> = vp.or ir<%add33>, ir<0>, vp<%6>  <- a new value that happens to have the same address
```

This fixes this by deferring the erasing of recipes till after the
transformation.

The test case might be a bit flakey since it just happens to have the
right conditions to recreate this. I tried to add an assert in
inferScalarType that every VPValue in the cache was valid, but couldn't
find a way of telling if a VPValue had been erased.

---------

Co-authored-by: Florian Hahn <flo@fhahn.com>
2024-12-18 11:28:28 +08:00
Luke Lau
4a7f60d328
[VPlan] Handle VPWidenCastRecipe without underlying value in EVL transform (#120194)
This fixes a crash that shows up when building SPEC CPU 2017 with EVL
tail folding on RISC-V.

A VPWidenCastRecipe doesn't always have an underlying value, and in the
case of this crash this happens whenever a widened cast is created via
truncateToMinimalBitwidths.

Fix this by just using the opcode stored in the recipe itself.

I think a similar issue exists with VPWidenIntrinsicRecipe and how it's
widened, but I haven't run into any crashes with it just yet.
2024-12-18 11:28:07 +08:00
Florian Hahn
4ad0fdd163
[VPlan] Remove reverse() of predecessors from VPInstruction::generate.
This was originally done to reduce the diff for the change. Remove it
and update the remaining tests. NFC modulo reordering of incoming
values.

Clean up after https://github.com/llvm/llvm-project/pull/114292.
2024-12-17 20:44:32 +00:00
Nikita Popov
1157187496
[VPlan] Propagate all GEP flags (#119899)
Store GEPNoWrapFlags instead of only InBounds and propagate them.
2024-12-17 13:48:50 +01:00
Florian Hahn
0e528ac404
[VPlan] Use start value operand for FindLastIV reduction phis.
Update VPReductionPHIRecipe::execute to use the start value from the
start value operand of the recipe. This is needed to make sure we resume
from the correct value during epilogue vectorization.

At the moment, the start value is set to the sentinel value in
adjustRecipesForReductions, as the original start value needs to be used
when creating ResumePhi recipes.

Fixes a mis-compile introduced by b3cba9be41bfa8 in SPEC2017 on AArch64.
2024-12-16 23:29:49 +00:00
Florian Hahn
0f6d93f8d5
[LV] Add test showing bug in epilogue vectorization of selects.
This is causing mis-compiles when in SPEC2017 on AArch64 after
b3cba9be41bfa8.
2024-12-16 23:23:38 +00:00
Florian Hahn
95e509a989
[VPlan] Add VPWidenInduction recipe as common base class (NFC). (#120008)
This helps to simplify some existing code and new code
(https://github.com/llvm/llvm-project/pull/112145)

PR: https://github.com/llvm/llvm-project/pull/120008
2024-12-16 09:40:03 +00:00
Luke Lau
4746395bd7
[VPlan] Omit zero add in VPWidenIntOrFpInductionRecipe (#119668)
I'm not sure if getStepVector was used for other things in the past
where StartIdx was non-zero, but nowadays VPWidenIntOrFpInductionRecipe
is the only user of it, and just passes zero to it. I presume
InstCombine was already catching this so hopefully removing this won't
affect codegen.
2024-12-16 11:55:48 +08:00
Florian Hahn
43045051d4
[VPlan] Modernize VPWidenIntOrFpInductionRecipe printing (NFC).
Modernize VPWidenIntOrFpInductionRecipe printing by including the result
VPValue and all operand VPValues, similar to VPScalarIVStepsRecipe and
VPDerivedIVRecipe.
2024-12-15 20:46:52 +00:00
Florian Hahn
e64650d702
[VPlan] Get types and step from VPWidenPointerInductionRecipe (NFC).
Use information directly from operands instead of going through
IVDescriptor.
2024-12-15 18:52:10 +00:00
Florian Hahn
2067e604a4
[VPlan] Manage VPWidenPointerInduction debug location via recipe.
Update VPWidenPointerInduction to manage its debug location via recipe.
This makes sure we emit a proper debug location for
VPWidenPointerInductionRecipes.
2024-12-15 14:41:07 +00:00
Florian Hahn
6c98f70b30
[LV] Add test with missing debug location for pointer IV in vector loop. 2024-12-15 14:36:03 +00:00
Florian Hahn
2564f1e199
[VPlan] Simplify Not(Not(A)) -> A.
Follow-up simplification to 5fae408d3a4c073ee4.
2024-12-14 20:08:26 +00:00
Florian Hahn
d1dff1dc18
[LV] Remove hard-coded VPValue numbers in test check lines. (NFC)
Make tests independent of VPlan value numbers.
2024-12-12 22:33:00 +00:00
Florian Hahn
6c8f41d336
[VPlan] Hook IR blocks into VPlan during skeleton creation (NFC) (#114292)
As a first step to move towards modeling the full skeleton in VPlan,
start by wrapping IR blocks created during legacy skeleton creation in
VPIRBasicBlocks and hook them into the VPlan. This means the skeleton
CFG is represented in VPlan, just before execute. This allows moving
parts of skeleton creation into recipes in the VPBBs gradually.

Note that this allows retiring some manual DT updates, as this will be
handled automatically during VPlan execution.

PR: https://github.com/llvm/llvm-project/pull/114292
2024-12-12 15:58:16 +00:00
Mel Chen
b3cba9be41
[LoopVectorize] Vectorize select-cmp reduction pattern for increasing integer induction variable (#67812)
Consider the following loop:
```
  int rdx = init;
  for (int i = 0; i < n; ++i)
    rdx = (a[i] > b[i]) ? i : rdx;
```
We can vectorize this loop if `i` is an increasing induction variable.
The final reduced value will be the maximum of `i` that the condition
`a[i] > b[i]` is satisfied, or the start value `init`.

This patch added new RecurKind enums - IFindLastIV and FFindLastIV.

---------

Co-authored-by: Alexey Bataev <5361294+alexey-bataev@users.noreply.github.com>
2024-12-12 16:48:31 +08:00
Florian Hahn
4993a30365
[LV] Add missing REQUIRES: asserts to test using -debug.
Fixup for test added in 5fae408d3a4c07.
2024-12-11 21:39:45 +00:00
Florian Hahn
5fae408d3a
[VPlan] Dispatch to multiple exit blocks via middle blocks. (#112138)
A more lightweight variant of
https://github.com/llvm/llvm-project/pull/109193,
which dispatches to multiple exit blocks via the middle blocks.

The patch also introduces a bit of required scaffolding to enable
early-exit vectorization, including an option. At the moment, early-exit
vectorization doesn't come with legality checks, and is only used if the
option is provided and the loop has metadata forcing vectorization. This
is only intended to be used for testing during bring-up, with @david-arm
enabling auto early-exit vectorization plugging in the changes from
https://github.com/llvm/llvm-project/pull/88385.

PR: https://github.com/llvm/llvm-project/pull/112138
2024-12-11 21:11:05 +00:00
LiqinWeng
77b6910b27
[Test] Fix the failed test of #108351 (#119495) 2024-12-11 11:43:25 +08:00
LiqinWeng
b759020cc8
[LV][EVL] Support cast instruction with EVL-vectorization (#108351) 2024-12-11 10:01:41 +08:00
Mel Chen
f4081711f0
[LV][NFC] Add test cases for FindLastIV reduction idiom. (#118519)
Pre-commit for #67812
2024-12-10 20:09:18 +08:00
Nikita Popov
e21ab4d16b
[InstCombine] Infer nuw for gep inbounds from base of object (#119225)
When we have a gep inbounds from the base of an object (e.g. alloca or
global), we know that the index cannot be negative, as this would go out
of bounds. As such, we can infer nuw as well.

The implementation is a bit stricter than necessary, we could also
accept one unknown index followed by known-non-negative indices.

Proof: https://alive2.llvm.org/ce/z/Hp7-6w (Note that alive2 currently
incorrectly doesn't require the inbounds for the alloca case, see
https://github.com/AliveToolkit/alive2/issues/1138).
2024-12-10 10:00:50 +01:00
Florian Hahn
0e70289f37
[VPlan] Create canonical IV resume value for epilogue in VPlan. (NFCI)
Update the code to create induction resume PHIs to also create a resume
phi for the canonical induction during epilogue vectorization. This
unifies the code for handling induction resume values and removes the
need to explicitly create manually resume PHI and return it during
epilogue creation.

Overall it helps to move the code for updating the canonical induction
resume value to the place where all other header phi resume values are
updated.

This is NFC, modulo order of the created phis.
2024-12-09 23:11:38 +00:00
Igor Kirillov
337936a83b
[LV] Ignore some costs when loop gets fully unrolled (#106699)
When VF has a fixed width and equals the number of iterations, and we are not
tail folding by masking, comparison instruction and induction operation will be DCEed later.
Ignoring the costs of these instructions improves the cost model.
2024-12-09 18:17:52 +00:00
Nikita Popov
10f315dc9c
[ConstantFolding] Infer getelementptr nuw flag (#119214)
Infer nuw from nusw and nneg. This is the constant expression variant of
https://github.com/llvm/llvm-project/pull/111144.

Proof: https://alive2.llvm.org/ce/z/ihztLy
2024-12-09 16:44:05 +01:00