725 Commits

Author SHA1 Message Date
Nikita Popov
29441e4f5f
[IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.
2025-01-29 16:56:47 +01:00
David Sherwood
776ef9d1be
[LoopVectorize][NFC] Regenerate some early exit test CHECK lines (#124900) 2025-01-29 09:48:55 +00:00
David Sherwood
c836b8956d
[LoopVectorize][NFC] Disable output for tests that don't need it (#124747)
There are a lot of tests that do not depend upon the IR output
for validation, relying instead on the debug output. For these
tests we can add the -disable-output command line argument.
2025-01-29 08:09:50 +00:00
Nicholas Guy
cdea38f91a
Reland "[LoopVectorizer] Add support for chaining partial reductions #120272" (#124282)
Change `getScaledReduction` to take an existing vector, rather than
creating and returning a new one each call.
Rename `getScaledReduction` to `getScaledReductions` to more accurately
reflect what it's now doing.

---------

Co-authored-by: Karlo Basioli <68535415+basioli-k@users.noreply.github.com>
2025-01-28 10:40:35 +00:00
Florian Hahn
713482fccf [VPlan] Use State.get to extract lane mask for BranchOnMask.
Simplifies the code slightly and avoids redundant extracts/broadcasts if
the operand is live-in or already scalar.
2025-01-27 21:35:36 +00:00
Florian Hahn
81d38da65e [LV] Add more tests for narrowing interleave groups for AArch64.
Add additional tests for
https://github.com/llvm/llvm-project/pull/106441.
2025-01-26 13:52:18 +00:00
Vitaly Buka
0e213834df
Revert "[LoopVectorizer] Add support for chaining partial reductions (#120272)" (#124198)
Introduced stack buffer overflow, see #120272.

`getScaledReduction` can return an empty vector, and there is no check for
that.

This reverts commit c9b7303b9b18129c4ee6b56aaa2a0a9f59be2d09.
This reverts commit caf0540b91b0fee31353dc7049ae836e0f814cff.
2025-01-23 14:00:33 -08:00
Nicholas Guy
caf0540b91
[LoopVectorizer] Add support for chaining partial reductions (#120272)
Chaining partial reductions, where multiple partial reductions share an
accumulator, allows for more values to be combined together as part of
the reduction without discarding the semantics of the partial reduction
itself.
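
The idea of several partial reductions sharing one accumulator can be sketched in standalone C++. This is a toy model, not the vectorizer's implementation: the names `partialReduceAdd`, `chainedPartialReduce` and `total` are hypothetical, and the lane assignment is an arbitrary choice made for illustration. Only the total sum is treated as meaningful.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

using Vec = std::vector<long>;

// Toy model of a partial add-reduction (hypothetical helper): folds a wide
// Input into a narrower accumulator. The lane used here (I % Acc.size()) is
// one arbitrary choice; only the sum over all lanes is relied upon.
Vec partialReduceAdd(Vec Acc, const Vec &Input) {
  assert(!Acc.empty() && Input.size() % Acc.size() == 0 &&
         "input width must be a multiple of the accumulator width");
  for (std::size_t I = 0; I < Input.size(); ++I)
    Acc[I % Acc.size()] += Input[I];
  return Acc;
}

// Chaining: two partial reductions feed the same accumulator in sequence.
Vec chainedPartialReduce(Vec Acc, const Vec &A, const Vec &B) {
  return partialReduceAdd(partialReduceAdd(Acc, A), B);
}

// Final horizontal sum of the accumulator lanes.
long total(const Vec &V) { return std::accumulate(V.begin(), V.end(), 0L); }
```

With a two-lane accumulator starting at zero, chaining the inputs {1, 2, 3, 4} and {10, 20, 30, 40} leaves a two-lane result whose lanes sum to the same value as summing both inputs directly.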
2025-01-23 17:24:57 +00:00
Nicholas Guy
26b61e143b
[LoopVectorizer] Propagate underlying instruction to the cloned instances of VPPartialReductionRecipes (#123638) 2025-01-23 14:57:31 +00:00
Florian Hahn
3418cd082a
[LV] Add test showing cost-model difference after 9491f75e1d9.
Reduced test case from
https://lab.llvm.org/buildbot/#/builders/143/builds/4847.
2025-01-21 21:37:51 +00:00
Florian Hahn
6c787ff6cf
Revert "[LV]: Teach LV to recursively (de)interleave. (#122989)"
This reverts commit 9491f75e1d912b277247450d1c7b6d56f7faf885.

This triggers an assert when building with SVE enabled.
https://lab.llvm.org/buildbot/#/builders/143/builds/4795
2025-01-21 21:36:16 +00:00
David Green
8552c49046
[AArch64] Enable UseFixedOverScalableIfEqualCost for more Cortex-x cpus. (#122807)
For reasons similar to those for preferring fixed-width over scalable on
Neoverse V2, this patch enables the UseFixedOverScalableIfEqualCost
feature when using -mcpu=cortex-x2, x3, x4 and x925, which are similar to
Neoverse V2.
2025-01-20 15:05:15 +00:00
Hassnaa Hamdi
9491f75e1d
Reland: [LV]: Teach LV to recursively (de)interleave. (#122989)
This commit relands the changes from "[LV]: Teach LV to recursively
(de)interleave. #89018"

Reason for revert:
- The patch exposed a bug in the IA pass; that bug has since been fixed by #122643.
2025-01-17 10:34:57 +00:00
Luke Lau
e83e0c300d [LV] Add test case for #119173. NFC
This showcases a miscompile involving a widened reduction-phi.
2025-01-17 09:45:42 +08:00
Florian Hahn
1de3dc7d23 [LV] Bail out early if BTC+1 wraps.
Currently we fail to detect the case where BTC + 1 wraps, i.e. the
vector trip count is 0. In those cases, the minimum iteration count
check will fail, and the vector code will never be executed.

Explicitly check for this condition in computeMaxVF and avoid trying to
vectorize altogether.
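
The wrap condition can be illustrated with plain unsigned arithmetic. This is a minimal standalone sketch, not the check added to computeMaxVF: the helper names `tripCountWraps` and `vectorIterations` are hypothetical, and a 64-bit counter is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not an LLVM API): returns true when the trip count,
// computed as backedge-taken count (BTC) + 1, wraps around to 0 in a
// 64-bit unsigned type.
bool tripCountWraps(std::uint64_t BTC) {
  std::uint64_t TripCount = BTC + 1; // unsigned overflow is well defined
  return TripCount == 0;
}

// Hypothetical helper: number of full vector iterations for a given VF.
// When BTC + 1 wraps, the computed trip count is 0, so this returns 0 and
// a vector loop guarded by a minimum-iteration check would never run.
std::uint64_t vectorIterations(std::uint64_t BTC, std::uint64_t VF) {
  assert(VF != 0 && "VF must be non-zero");
  std::uint64_t TripCount = BTC + 1;
  return TripCount / VF;
}
```

For a BTC equal to the maximum 64-bit value, `tripCountWraps` returns true and `vectorIterations` returns 0 for any VF, which is the situation the early bail-out is meant to catch.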

Note that a number of tests needed to be updated, because the vector
loop would never be executed given the input IR.

Fixes https://github.com/llvm/llvm-project/issues/122558.
2025-01-14 22:07:38 +00:00
Sam Tebbs
795e35a653
Reland "[LoopVectorizer] Add support for partial reductions" with non-phi operand fix. (#121744)
This relands the reverted #120721 with a fix for cases where neither
reduction operand is the reduction phi. Only
63114239cc8d26225a0ef9920baacfc7cc00fc58 and
63114239cc8d26225a0ef9920baacfc7cc00fc58 are new on top of the reverted
PR.

---------

Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>
2025-01-13 11:20:35 +00:00
Florian Hahn
3ff1d01985 Recommit "[VPlan] Try to narrow wide and replicating recipes to uniform recipes."
This reverts commit 0ebb3ac7c92c4c1c44e7f3d17832d75ec5a42a67.

Re-applies commit with typos fixed.
2025-01-12 20:10:28 +00:00
Florian Hahn
0ebb3ac7c9 Revert "[VPlan] Try to narrow wide and replicating recipes to uniform recipes."
This reverts commit 1afba19913253dda865a8e57b37b9f4dabead1ac.

Typo breaking the build
2025-01-12 19:37:45 +00:00
Florian Hahn
1afba19913 [VPlan] Try to narrow wide and replicating recipes to uniform recipes.
Use the existing VPlan-based analysis to identify recipes that only have
their first lane demanded and transform them to uniform replicate
recipes. This simplifies the generated code in some places and prepares
for fixing https://github.com/llvm/llvm-project/issues/122496.
2025-01-12 19:32:01 +00:00
Benjamin Maxwell
f88ef1bd1b
[LV] Teach LoopVectorizationLegality about struct vector calls (#119221)
This is a split-off from #109833 and only adds code relating to checking
if a struct-returning call can be vectorized.

This initial patch only allows the case where all users of the struct
return are `extractvalue` operations that can be widened.

```
%call = tail call { float, float } @foo(float %in_val)
%extract_a = extractvalue { float, float } %call, 0
%extract_b = extractvalue { float, float } %call, 1
```

Note: The tests require the VFABI changes from #119000 to pass.
2025-01-09 09:27:29 +00:00
Luke Lau
f0d5104c94
[VPlan] Handle some VPInstructions in may{Read,Write}FromMemory (#120058)
This just copies the same conservative definition from mayWriteToMemory,
and enables more VPInstructions to be hoisted out in LICM.

I think this should give more accurate costs, and I was able to build
llvm-test-suite without the legacy-vplan cost model assertion going off.
2025-01-08 15:17:26 +08:00
David Sherwood
a3fff3a14d
[LoopVectorize][NFC] Fix arith-fp-frem-costs.ll test to use new vplan cost model (#120742) 2025-01-06 10:26:51 +00:00
Florian Hahn
f48884ded8
[VPlan] Remove loop region in optimizeForVFAndUF. (#108378)
Update optimizeForVFAndUF to completely remove the vector loop region
when possible. At the moment, we cannot remove the region if it contains

* widened IVs: the recipe is needed to generate the step vector
* reductions: ComputeReductionResults requires the reduction phi recipe
for codegen.

Both cases can be addressed by more explicit modeling.

The patch also includes a number of updates to allow executing VPlans
without a vector loop region.

Depends on https://github.com/llvm/llvm-project/pull/110004
2025-01-05 15:50:42 +00:00
John Brawn
073e65a8e5
[LoopVectorize] Make needsExtract notice scalarized instructions (#119720)
LoopVectorizationCostModel::needsExtract should recognise instructions
that have been widened by scalarizing as scalar instructions, and thus
not needing an extract when used by later scalarized instructions.

This fixes an incorrect cost calculation in computePredInstDiscount,
where we are adding a scalarization overhead cost when we shouldn't,
though I haven't come up with a test case where it makes a difference.
It will make a difference when the cost model switches to using the cost
kind TCK_CodeSize for optsize, as not doing this causes the test
LoopVectorize/X86/small-size.ll to get worse.
2025-01-02 14:31:36 +00:00
Muhammad Omair Javaid
332d2647ff Revert "[LV]: Teach LV to recursively (de)interleave. (#89018)"
This reverts commit ccfe0de0e1e37ed369c9bf89dd0188ba0afb2e9a.

This breaks LLVM build on AArch64 SVE Linux buildbots
https://lab.llvm.org/buildbot/#/builders/143/builds/4462
https://lab.llvm.org/buildbot/#/builders/17/builds/4902
https://lab.llvm.org/buildbot/#/builders/4/builds/4399
https://lab.llvm.org/buildbot/#/builders/41/builds/4299
2024-12-31 03:12:24 +05:00
Florian Hahn
7f3428d3ed
[VPlan] Compute induction end values in VPlan. (#112145)
Use createDerivedIV to compute IV end values directly in VPlan, instead
of creating them up-front.

This allows updating IV users outside the loop as follow-up.

Depends on https://github.com/llvm/llvm-project/pull/110004 and
https://github.com/llvm/llvm-project/pull/109975.

PR: https://github.com/llvm/llvm-project/pull/112145
2024-12-29 19:05:08 +00:00
Zequan Wu
4d8f9594b2 Revert "Reland "[LoopVectorizer] Add support for partial reductions" (#120721)"
This reverts commit c858bf620c3ab2a4db53e84b9365b553c3ad1aa6 as it causes an optimization crash at -O2, see https://github.com/llvm/llvm-project/pull/120721#issuecomment-2563192057
2024-12-27 11:51:54 -08:00
Hassnaa Hamdi
ccfe0de0e1
[LV]: Teach LV to recursively (de)interleave. (#89018)
The only intrinsics currently available are ld2/st2, which don't support interleaving factors > 2.
This patch teaches the LV to use ld2/st2 recursively to support high
interleaving factors.
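
The recursive scheme can be sketched on plain vectors. This is a standalone illustration under stated assumptions, not the vectorizer's code: `deinterleave2` stands in for the factor-2 operation (the role ld2 plays above), `deinterleave` is a hypothetical helper, and only power-of-two factors are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<int>;

// Models a factor-2 deinterleave: splits V into its even-indexed and
// odd-indexed elements.
std::pair<Vec, Vec> deinterleave2(const Vec &V) {
  assert(V.size() % 2 == 0 && "length must be even");
  Vec Even, Odd;
  for (std::size_t I = 0; I < V.size(); I += 2) {
    Even.push_back(V[I]);
    Odd.push_back(V[I + 1]);
  }
  return {Even, Odd};
}

// Hypothetical recursive helper: deinterleaves V by a power-of-two Factor
// using only deinterleave2. Result[K] holds elements K, K + Factor, ...
std::vector<Vec> deinterleave(const Vec &V, unsigned Factor) {
  assert(Factor != 0 && (Factor & (Factor - 1)) == 0 && "power of two only");
  if (Factor == 1)
    return {V};
  std::pair<Vec, Vec> Halves = deinterleave2(V);
  std::vector<Vec> E = deinterleave(Halves.first, Factor / 2);
  std::vector<Vec> O = deinterleave(Halves.second, Factor / 2);
  // Field K of the even half is field 2K of V; field K of the odd half is
  // field 2K + 1 of V, so the two result lists are merged alternately.
  std::vector<Vec> Result(Factor);
  for (unsigned K = 0; K < Factor / 2; ++K) {
    Result[2 * K] = E[K];
    Result[2 * K + 1] = O[K];
  }
  return Result;
}
```

Calling `deinterleave` on the eight elements 0..7 with a factor of 4 yields four fields of two elements each, with field K holding elements K and K + 4. Note the reordering step after the recursion: the fields do not come out in order if the halves are simply concatenated.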
2024-12-27 12:42:07 +00:00
Florian Hahn
2d038caeeb
[VPlan] Remove stray space when printing VPWidenCastRecipe.
printFlags() already takes care of printing a single space if there are
no flags. Remove the extra space when printing a recipe without flags.
2024-12-24 20:23:48 +00:00
Sam Tebbs
c858bf620c
Reland "[LoopVectorizer] Add support for partial reductions" (#120721)
This re-lands the reverted #92418 

When the VF is small enough so that dividing the VF by the scaling
factor results in 1, the reduction phi execution thinks the VF is scalar
and sets the reduction's output as a scalar value, tripping assertions
expecting a vector value. The latest commit in this PR fixes that by
using `State.VF` in the scalar check, rather than the divided VF.
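
The arithmetic behind the misclassification can be shown in a few lines. This is a simplified sketch with hypothetical function names; it models only the two ways of deciding "scalar" described above, not the recipe's actual code.

```cpp
#include <cassert>

// Problematic check as described above: decides "scalar" from the VF after
// dividing by the partial reduction's scaling factor.
bool isScalarUsingDividedVF(unsigned VF, unsigned ScaleFactor) {
  assert(ScaleFactor != 0 && VF % ScaleFactor == 0 &&
         "VF must be a multiple of the scaling factor");
  return VF / ScaleFactor == 1;
}

// Fixed check: decides "scalar" from the VF of the plan itself.
bool isScalarUsingPlanVF(unsigned VF) { return VF == 1; }
```

With a VF of 4 and a scaling factor of 4, the first check reports a scalar VF even though the plan is vectorized, while the second does not.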

---------

Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>
2024-12-24 12:08:17 +00:00
Florian Hahn
df8efbdbbf
[SCEV] Remove existing predicates implied by newly added ones. (#118185)
When adding a new predicate to a union predicate, some of the existing
predicates may be implied by the new predicate. Remove any existing
predicates that are already implied by the new predicate.
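
The pruning step can be sketched with a toy predicate type. Everything here is an assumption for illustration: `UpperBound` (meaning "X < Bound") and `UnionPredicate` are hypothetical stand-ins, not the SCEV classes, and the implication test is deliberately trivial.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy predicate (illustration only): "X < Bound".
struct UpperBound {
  unsigned Bound;
  // "X < Bound" implies "X < Other.Bound" whenever Bound <= Other.Bound.
  bool implies(const UpperBound &Other) const { return Bound <= Other.Bound; }
};

// Toy union of predicates that must all hold.
struct UnionPredicate {
  std::vector<UpperBound> Preds;

  void add(UpperBound New) {
    // Drop existing predicates that the new one already guarantees.
    Preds.erase(std::remove_if(Preds.begin(), Preds.end(),
                               [&](const UpperBound &P) {
                                 return New.implies(P);
                               }),
                Preds.end());
    Preds.push_back(New);
  }
};
```

Adding "X < 10" and then "X < 20" keeps both, since the second does not imply the first. Adding "X < 5" afterwards removes both earlier predicates and leaves a single entry.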

Depends on https://github.com/llvm/llvm-project/pull/118184 to show the
main benefit.

PR: https://github.com/llvm/llvm-project/pull/118185
2024-12-20 20:49:37 +00:00
Florian Hahn
5f096fd221
Revert "[LoopVectorizer] Add support for partial reductions (#92418)"
This reverts commit 060d62b48aeb5080ffcae1dc56e41a06c6f56701.

It looks like this is triggering an assertion when building llvm-test-suite
on ARM64 macOS.

Reproducer from MultiSource/Benchmarks/Ptrdist/bc/number.c

    target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-n32:64-S128-Fn32"
    target triple = "arm64-apple-macosx15.0.0"

    define void @test(i64 %idx.neg, i8 %0) #0 {
    entry:
      br label %while.body

    while.body:                                       ; preds = %while.body, %entry
      %n1ptr.0.idx131 = phi i64 [ %n1ptr.0.add, %while.body ], [ %idx.neg, %entry ]
      %n2ptr.0.idx130 = phi i64 [ %n2ptr.0.add, %while.body ], [ 0, %entry ]
      %sum.1129 = phi i64 [ %add99, %while.body ], [ 0, %entry ]
      %n1ptr.0.add = add i64 %n1ptr.0.idx131, 1
      %conv = sext i8 %0 to i64
      %n2ptr.0.add = add i64 %n2ptr.0.idx130, 1
      %1 = load i8, ptr null, align 1
      %conv97 = sext i8 %1 to i64
      %mul = mul i64 %conv97, %conv
      %add99 = add i64 %mul, %sum.1129
      %cmp94 = icmp ugt i64 %n1ptr.0.idx131, 0
      %cmp95 = icmp ne i64 %n2ptr.0.idx130, -1
      %2 = and i1 %cmp94, %cmp95
      br i1 %2, label %while.body, label %while.end.loopexit

    while.end.loopexit:                               ; preds = %while.body
      %add99.lcssa = phi i64 [ %add99, %while.body ]
      ret void
    }

    attributes #0 = { "target-cpu"="apple-m1" }

> opt -p loop-vectorize
Assertion failed: ((VF.isScalar() || V->getType()->isVectorTy()) && "scalar values must be stored as (0, 0)"), function set, file VPlan.h, line 284.
2024-12-19 21:46:51 +00:00
Nicholas Guy
060d62b48a
[LoopVectorizer] Add support for partial reductions (#92418)
Following on from https://github.com/llvm/llvm-project/pull/94499, this
patch adds support to the Loop Vectorizer to emit the partial reduction
intrinsics where they may be beneficial for the target.

---------

Co-authored-by: Samuel Tebbs <samuel.tebbs@arm.com>
2024-12-19 11:42:40 +00:00
Alexander Kornienko
23a239267e
Revert "[InstCombine] Infer nuw for gep inbounds from base of object" (#120460)
Reverts llvm/llvm-project#119225 due to the lack of sanitizer support,
large potential of breaking code containing latent UB, non-trivial
localization and investigation, and what seems to be a bad interaction
with msan (a test is in the works).

Related discussions:
https://github.com/llvm/llvm-project/pull/119225#issuecomment-2551904822
https://github.com/llvm/llvm-project/pull/118472#issuecomment-2549986255
2024-12-18 19:06:34 +01:00
Florian Hahn
3e02038948
[LV] Fixup check lines after 13107cb09441. 2024-12-18 09:37:30 +00:00
David Sherwood
13107cb094
[LoopVectorize] Enable more early exit vectorisation tests (#117008)
PR #112138 introduced initial support for dispatching to
multiple exit blocks via split middle blocks. This patch
fixes a few issues so that we can enable more tests to use
the new enable-early-exit-vectorization flag. Fixes are:

1. The code to bail out for any loop live-out values happens
too late. This is because collectUsersInExitBlocks ignores
induction variables, which get dealt with in fixupIVUsers.
I've moved the check much earlier in processLoop by looking
for outside users of loop-defined values.
2. We shouldn't yet be interleaving when vectorising loops
with uncountable early exits, since we've not added support
for this yet.
3. Similarly, we also shouldn't be creating vector epilogues.
4. Similarly, we shouldn't enable tail-folding.
5. The existing implementation doesn't yet support loops
that require scalar epilogues, although I plan to add that
as part of PR #88385.
6. The new split middle blocks weren't being added to the
parent loop.
2024-12-18 09:25:45 +00:00
Florian Hahn
4ad0fdd163
[VPlan] Remove reverse() of predecessors from VPInstruction::generate.
This was originally done to reduce the diff for the change. Remove it
and update the remaining tests. NFC modulo reordering of incoming
values.

Clean up after https://github.com/llvm/llvm-project/pull/114292.
2024-12-17 20:44:32 +00:00
Nikita Popov
1157187496
[VPlan] Propagate all GEP flags (#119899)
Store GEPNoWrapFlags instead of only InBounds and propagate them.
2024-12-17 13:48:50 +01:00
Luke Lau
4746395bd7
[VPlan] Omit zero add in VPWidenIntOrFpInductionRecipe (#119668)
I'm not sure if getStepVector was used for other things in the past
where StartIdx was non-zero, but nowadays VPWidenIntOrFpInductionRecipe
is the only user of it, and just passes zero to it. I presume
InstCombine was already catching this so hopefully removing this won't
affect codegen.
2024-12-16 11:55:48 +08:00
Florian Hahn
e64650d702
[VPlan] Get types and step from VPWidenPointerInductionRecipe (NFC).
Use information directly from operands instead of going through
IVDescriptor.
2024-12-15 18:52:10 +00:00
Florian Hahn
d1dff1dc18
[LV] Remove hard-coded VPValue numbers in test check lines. (NFC)
Make tests independent of VPlan value numbers.
2024-12-12 22:33:00 +00:00
Florian Hahn
6c8f41d336
[VPlan] Hook IR blocks into VPlan during skeleton creation (NFC) (#114292)
As a first step to move towards modeling the full skeleton in VPlan,
start by wrapping IR blocks created during legacy skeleton creation in
VPIRBasicBlocks and hook them into the VPlan. This means the skeleton
CFG is represented in VPlan, just before execute. This allows moving
parts of skeleton creation into recipes in the VPBBs gradually.

Note that this allows retiring some manual DT updates, as this will be
handled automatically during VPlan execution.

PR: https://github.com/llvm/llvm-project/pull/114292
2024-12-12 15:58:16 +00:00
Nikita Popov
e21ab4d16b
[InstCombine] Infer nuw for gep inbounds from base of object (#119225)
When we have a gep inbounds from the base of an object (e.g. alloca or
global), we know that the index cannot be negative, as this would go out
of bounds. As such, we can infer nuw as well.

The implementation is a bit stricter than necessary, we could also
accept one unknown index followed by known-non-negative indices.

Proof: https://alive2.llvm.org/ce/z/Hp7-6w (Note that alive2 currently
incorrectly doesn't require the inbounds for the alloca case, see
https://github.com/AliveToolkit/alive2/issues/1138).
2024-12-10 10:00:50 +01:00
Florian Hahn
0e70289f37
[VPlan] Create canonical IV resume value for epilogue in VPlan. (NFCI)
Update the code to create induction resume PHIs to also create a resume
phi for the canonical induction during epilogue vectorization. This
unifies the code for handling induction resume values and removes the
need to manually create a resume PHI and return it during
epilogue creation.

Overall it helps to move the code for updating the canonical induction
resume value to the place where all other header phi resume values are
updated.

This is NFC, modulo order of the created phis.
2024-12-09 23:11:38 +00:00
Igor Kirillov
337936a83b
[LV] Ignore some costs when loop gets fully unrolled (#106699)
When the VF has a fixed width and equals the number of iterations, and we are not
tail folding by masking, the comparison instruction and induction operation will be DCEed later.
Ignoring the costs of these instructions improves the cost model.
2024-12-09 18:17:52 +00:00
Igor Kirillov
646d1854c0 [LV] Pre-commit extra test for fixed width VF fully unrolled loop cost model change 2024-12-09 15:31:38 +00:00
Florian Hahn
0cedb8f215
[LV] Update checks to actually check VPlan for correct function.
The checks should apply to test_getVectorCallCost, which doesn't have a
wide add.
2024-12-07 12:05:17 +00:00
Florian Hahn
6797b0f0c0
[VPlan] Use RPOT for VPlan codegen and printing.
This splits off the changes for more complex CFGs in VPlan from both
    https://github.com/llvm/llvm-project/pull/114292
    https://github.com/llvm/llvm-project/pull/112138

This simplifies their respective diffs.
2024-12-06 21:49:00 +00:00
Florian Hahn
7f7f540a48
Reapply "[VPlan] Update scalar induction resume values in VPlan. (#110577)"
This reverts commit f09b16e2671cbcdf7cb7dc7ed705db092a9deda1.

The crash when building llvm-test-suite with stage2 should have been
fixed by 1091fad31a83d5ab87eb6fa11fe3bdb3f0d152ea.
2024-12-06 19:41:51 +00:00
Nikita Popov
f09b16e267 Revert "[VPlan] Update scalar induction resume values in VPlan. (#110577)"
This reverts commit 0678e2058364ec10b94560d27ec7138dfa003287.
This reverts commit 1091fad31a83d5ab87eb6fa11fe3bdb3f0d152ea.

Causes crashes in llvm-test-suite when using stage 2 clang.
2024-12-06 18:01:42 +01:00