2906 Commits

Author SHA1 Message Date
David Sherwood
efc72347fd
[AArch64] Improve getPartialReductionCost for fixed-width VFs (#126538)
NEON does not have a version of udot/sdot that accumulates into
64-bit integer values, so we should return Invalid from
getPartialReductionCost for 64-bit types and fixed-width VFs.
In theory, if the 64-bit versions of SVE udot/sdot are available 
we could use those, but we don't currently have lowering support
for that.
2025-02-11 15:10:39 +00:00
Florian Hahn
e258bca950
[VPlan] Only skip expansion for SCEVUnknown if it isn't an instruction. (#125235)
Update getOrCreateVPValueForSCEVExpr to only skip expansion of
SCEVUnknown if the underlying value isn't an instruction. Instructions
may be defined in a loop and using them without expansion may break
LCSSA form. SCEVExpander will take care of preserving LCSSA if needed.

We could also try to pass LoopInfo, but there are some users of the
function where it won't be available and main benefit from skipping
expansion is slightly more concise VPlans.

Note that SCEVExpander is now used to expand SCEVUnknown with floats.
Adjust the check in expandCodeFor to only check the types and casts if
the type of the value is different to the requested type. Otherwise we
crash when trying to expand a float and requesting a float type.

Fixes https://github.com/llvm/llvm-project/issues/121518.

PR: https://github.com/llvm/llvm-project/pull/125235
2025-02-11 13:03:12 +01:00
Florian Hahn
3706dfef66
[LV] Forget LCSSA phi with new pred before other SCEV invalidation. (#119897)
`forgetLcssaPhiWithNewPredecessor` performs additional invalidation if
there is an existing SCEV for the phi, but earlier
`forgetBlockAndLoopDispositions` or `forgetLoop` may already invalidate
the SCEV for the phi.

Change the order to first call `forgetLcssaPhiWithNewPredecessor` to
ensure it runs before its SCEV gets invalidated too eagerly.

Fixes https://github.com/llvm/llvm-project/issues/119665.

PR: https://github.com/llvm/llvm-project/pull/119897
2025-02-10 16:29:42 +00:00
David Sherwood
0010a3c97e
[NFC][LoopVectorize] Add more partial reduction tests (#126525)
* Adds variants of dotp (dotp_i8_to_i64_has_neon_dotprod,
dotp_i16_to_i64_has_neon_dotprod) that show how the loop
vectoriser has generated fixed-width partial reductions
without any matching NEON udot instruction.
* Adds loops that could also benefit from partial
reductions once the work is done to recognise patterns
such as
  %zext = zext i8 %load to i32
  %acc.next = add i32 %acc, %zext
See zext_add_reduc_i8_i32, etc. I intend to follow up with
a patch to add support for vectorising such patterns.
2025-02-10 16:04:43 +00:00
Elvis Wang
2e3729bf40
[LV] Prevent query the computeCost() when VF=1 in emitInvalidCostRemarks(). (#117288)
We should only query the computeCost() when the VF is vector.
2025-02-10 08:40:28 +08:00
Hassnaa Hamdi
e9a20f77ee
Reland "[LV]: Teach LV to recursively (de)interleave." (#125094)
This patch relands the changes from "[LV]: Teach LV to recursively
(de)interleave.#122989"
    Reason for revert:
- The patch exposed an assert in the vectorizer related to VF difference
between
legacy cost model and VPlan-based cost model because of uncalculated
cost for
VPInstruction which is created by VPlanTransforms as a replacement to
'or disjoint'
       instruction.
VPlanTransforms do that instructions change when there are memory
interleaving and
predicated blocks, but that change didn't cause problems because at most
cases the cost
      difference between legacy/new models is not noticeable.
    - Issue is fixed by #125434

Original patch: https://github.com/llvm/llvm-project/pull/89018
Reviewed-by: paulwalker-arm, Mel-Chen
2025-02-09 19:21:54 +00:00
Simon Pilgrim
70906f0514 [LV][X86] Regenerate interleaved load/store costs. NFC.
update_analyze_test_checks has improved the checks since these were last updated.

Reduce noise diffs in future patches.
2025-02-09 15:02:41 +00:00
Florian Hahn
32c4493d5f
[VPlan] Add incoming values for all predecessor to ResumePHI (NFCI).
Follow-up as discussed when using VPInstruction::ResumePhi for all resume
values (#112147). This patch explicitly adds incoming values for each
predecessor in VPlan. This simplifies codegen and allows transformations
adjusting the predecessors of blocks with

NFC modulo incoming block order in phis.
2025-02-09 11:20:20 +00:00
Florian Hahn
9266b48c5b
[VPlan] Add outer loop tests with wide phis in inner loop.
Add test coverage with phis outside a header block with multiple
incoming values.
2025-02-08 18:09:45 +00:00
Florian Hahn
cea799afc6
[LV] Add ordered reduction test with live-in.
Extra test for https://github.com/llvm/llvm-project/pull/124644.
2025-02-07 20:50:46 +00:00
Florian Hahn
1611059f5d
[VPlan] Compute cost for binary op VPInstruction with underlying values. (#125434)
As exposed by https://github.com/llvm/llvm-project/pull/125094, we are
missing cost computation for some binary VPInstructions we created based
on original IR instructions. Their cost should be considered.

PR: https://github.com/llvm/llvm-project/pull/125434
2025-02-07 15:27:31 +00:00
David Sherwood
1930524bbd
[LoopVectorize] Fix cost model assert when vectorising calls (#125716)
The legacy and vplan cost models did not agree because
VPWidenCallRecipe::computeCost only calculates the cost of the
call instruction, whereas
LoopVectorizationCostModel::setVectorizedCallDecision in some
cases adds on the cost of a synthesised mask argument. However,
this mask is always 'splat(i1 true)' which should be hoisted out
of the loop during codegen. In order to synchronise the two cost
models I have two options:

1) Also add the cost of the splat to the vplan model, or
2) Remove the cost of the splat from the legacy model.

I chose 2) because I feel this more closely represents what the
final code will look like. There is an argument that we should
take account of such broadcast costs in the preheader when
deciding if it's profitable to vectorise a loop, however there
isn't currently a mechanism to do this. We currently only take
account of the runtime checks when assessing profitability and
what the minimum trip count should be. However, I don't believe
this work needs doing as part of this PR.
2025-02-07 09:36:52 +00:00
James Chesterman
ac158aa13b
[LoopVectorizer] Allow partial reductions to be made in predicated loops (#124268)
Does a select on the input rather than the output. This way the mask has
the same number of lanes as the other operand in the select instruction.
2025-02-07 09:09:10 +00:00
Mel Chen
4d3148d926
[LV][EVL] Fix the check for legality of folding with EVL. (#125678)
The current legality check for folding with EVL has incomplete
verification for VF.
This patch fixes the VF check, ensuring that tail folding with EVL is
enabled only when a scalable VF is available. This allows loops that
prefer tail folding with EVL but cannot use scalable VF vectorization to
still be vectorized using a fixed VF, rather than abandoning
vectorization entirely.
2025-02-07 12:53:10 +08:00
David Sherwood
f07cd36a5d
[LoopVectorize] Add the cost of VPInstruction::AnyOf to vplan (#125058)
This patch adds an initial implementation of
VPInstruction::computeCost with support for only one
instruction so far - VPInstruction::AnyOf. This is only
used when vectorising loops with uncountable early exits.
2025-02-05 16:31:14 +00:00
Sam Tebbs
c7995a6905
[AArch64] Disallow vscale x 1 partial reductions (#125252)
We don't want to allow partial reductions resulting in a vscale x 1 type
as we can't lower it in the backend.
2025-02-05 13:34:43 +00:00
Mikhail R. Gadelha
e78be31639
[RISCV] Added cost model for fmuladd (#125683)
This patch updates the cost model for fmuladd on vector types to scale with LMUL. This was found when analyzing a hot loop in 519.lbm_r that was unprofitably vectorized, but doesn't directly impact that case and is split off so it doesn't get forgotten.

Unlike other FP arithmetic ops, it's not scaled by 2 because the scalar cost isn't scaled by 2.
2025-02-05 09:33:24 -03:00
Mel Chen
8d037b9256
[LV][EVL] Skip tryAddExplicitVectorLength for plans with scalar VF. (#125497)
The plans with scalar VF should not be transformed the plans folded by
EVL.

TODO: Move the scalar VF checking into `LoopVectorizationCostModel
::foldTailWithEVL()`.
2025-02-05 15:02:33 +08:00
Mel Chen
b9fa35fc07
[LV][EVL] Pre-commit test cases for preventing to transform plans with scalar VF. NFC (#125499)
Pre-commit for #125497.
2025-02-04 15:00:35 +08:00
Florian Hahn
30f3752e54
[VPlan] Only use SCEV for live-ins in tryToWiden. (#125436)
Replacing a recipe with a live-in may not be correct in all cases,
e.g. when replacing recipes involving header-phi recipes, like
reductions.

For now, only use SCEV to simplify live-ins.

More powerful input simplification can be built in top of
https://github.com/llvm/llvm-project/pull/124432 in the future.


Fixes https://github.com/llvm/llvm-project/issues/119173.
Fixes https://github.com/llvm/llvm-project/issues/125374.

PR: https://github.com/llvm/llvm-project/pull/125436
2025-02-03 17:01:02 +00:00
Florian Hahn
75b922dccf [VPlan] Check VPWidenIntrinsicSC in VPRecipeWithIRFlags::classof.
When VPWidenIntrinsicRecipe was changed to inhert from VPRecipeWithIRFlags,
VPRecipeWithIRFlags::classof wasn't updated accordingly. Also check for
VPWidenIntrinsicSC in VPRecipeWithIRFlags::classof.

Fixes https://github.com/llvm/llvm-project/issues/125301.
2025-02-01 21:42:19 +00:00
Sam Parker
28d7880618
[WebAssembly] getMemoryOpCost and getCastInstrCost (#122896)
Add inital implementations of these TTI methods for SIMD types. For
casts, The costing covers the free extensions provided by extmul_low as
well as extend_low. For memory operations we consider the use of
load32_zero and load64_zero, as well as full width v128 loads.
2025-01-31 10:33:31 +00:00
David Sherwood
3bc2dade36
[LoopVectorize] Enable vectorisation of early exit loops with live-outs (#120567)
This work feeds part of PR
https://github.com/llvm/llvm-project/pull/88385, and adds support for
vectorising
loops with uncountable early exits and outside users of loop-defined
variables. When calculating the final value from an uncountable early
exit we need to calculate the vector lane that triggered the exit,
and hence determine the value at the point we exited.

All code for calculating the last value when exiting the loop early
now lives in a new vector.early.exit block, which sits between the
middle.split block and the original exit block. Doing this required
two fixes:

1. The vplan verifier incorrectly assumed that the block containing
a definition always dominates the block of the user. That's not true
if you can arrive at the use block from multiple incoming blocks.
This is possible for early exit loops where both the early exit and
the latch jump to the same block.
2. We were adding the new vector.early.exit to the wrong parent loop.
It needs to have the same parent as the actual early exit block from
the original loop.

I've added a new ExtractFirstActive VPInstruction that extracts the
first active lane of a vector, i.e. the lane of the vector predicate
that triggered the exit.

NOTE: The IR generated for dealing with live-outs from early exit
loops is unoptimised, as opposed to normal loops. This inevitably
leads to poor quality code, but this can be fixed up later.
2025-01-30 10:37:00 +00:00
Nikita Popov
29441e4f5f
[IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.
2025-01-29 16:56:47 +01:00
David Sherwood
776ef9d1be
[LoopVectorize][NFC] Regenerate some early exit test CHECK lines (#124900) 2025-01-29 09:48:55 +00:00
David Sherwood
c836b8956d
[LoopVectorize][NFC] Disable output for tests that don't need it (#124747)
There are a lot of tests that do not depend upon the IR output
for validation, relying instead on the debug output. For these
tests we can add the -disable-output command line argument.
2025-01-29 08:09:50 +00:00
Nicholas Guy
cdea38f91a
Reland "[LoopVectorizer] Add support for chaining partial reductions #120272" (#124282)
Change `getScaledReduction` to take an existing vector, rather than
creating and returning a new one each call.
Rename `getScaledReduction` to `getScaledReductions` to more accurately
reflect what it's now doing.

---------

Co-authored-by: Karlo Basioli <68535415+basioli-k@users.noreply.github.com>
2025-01-28 10:40:35 +00:00
Florian Hahn
713482fccf [VPlan] Use State.get to extract lane mask for BranchOnMask.
Simplifies the code slightly and avoids redundant extracts/broadcasts if
the operand is live-in or already scalar.
2025-01-27 21:35:36 +00:00
Florian Hahn
09a29fcc8d
[VPlan] Don't collect live-ins in collectUsersInExitBlocks. (NFC) (#123819)
Live-ins don't need to be handled, other than adding to the exit phi
recipe. Do that early and assert that otherwise the exit value is
defined in the vector loop region.

This should enable simply skipping other exit values that do not need
further fixing, e.g. if handling the exit value from the early exit
directly in handleUncountableEarlyExit.

PR: https://github.com/llvm/llvm-project/pull/123819
2025-01-27 16:12:07 +00:00
David Sherwood
b7286dbef9
Reland "[LoopVectorize] Add support for reverse loops in isDereferenceableAndAlignedInLoop #96752" (#123616)
The last attempt failed a sanitiser build because we were
creating a reference to a null Predicates pointer in
isDereferenceableAndAlignedInLoop. This was exposed by
the unit test IsDerefReadOnlyLoop in
unittests/Analysis/LoadsTest.cpp. I fixed this by falling
back on getConstantMaxBackedgeTakenCount if Predicates is
null - see line 316 in llvm/lib/Analysis/Loads.cpp. There
are no other changes.
2025-01-27 11:59:38 +00:00
Florian Hahn
81d38da65e [LV] Add more tests for narrowing interleave groups for AArch64.
Add additional tests for
https://github.com/llvm/llvm-project/pull/106441.
2025-01-26 13:52:18 +00:00
Florian Hahn
6383a12e3b
[VPlan] Refactor HCFG builder to preserve original vector latch (NFC).
Update HCFG builder to preserve the original latch block of the initial
VPlan, ensuring there is always a latch.

It also skips creating the BranchOnCond for the latch of the top-level
loop, instead of removing it later. Exiting via the latch is controlled
by later recipes.

This further unifies HCFG construction and prepares for use to also
build an initial VPlan (VPlan0) for inner loops.
2025-01-25 13:32:01 +00:00
Elvis Wang
aff1242b8e
[LV] Align debug location of the widen-phi to the original phi. (#120338)
This patch align the debug location of the widen-phi to the debug
location of original phi.

Split from: #120054
2025-01-24 17:49:54 +08:00
David Blaikie
42043c423f Reapply "Verifier: Add check for DICompositeType elements being null"
This remove some erroneous debug info from tests that should address the
test failures that showed up when the this was previously committed.

This reverts commit 6716ce8b641f0e42e2343e1694ee578b027be0c4.
2025-01-23 22:29:30 +00:00
Vitaly Buka
0e213834df
Revert "[LoopVectorizer] Add support for chaining partial reductions (#120272)" (#124198)
Introduced stack buffer overflow, see #120272.

`getScaledReduction` can return empty vector, and there is not check for
that.

This reverts commit c9b7303b9b18129c4ee6b56aaa2a0a9f59be2d09.
This reverts commit caf0540b91b0fee31353dc7049ae836e0f814cff.
2025-01-23 14:00:33 -08:00
Nicholas Guy
caf0540b91
[LoopVectorizer] Add support for chaining partial reductions (#120272)
Chaining partial reductions, where multiple partial reductions share an
accumulator, allow for more values to be combined together as part of
the reduction without discarding the semantics of the partial reduction
itself.
2025-01-23 17:24:57 +00:00
Nicholas Guy
26b61e143b
[LoopVectorizer] Propagate underlying instruction to the cloned instances of VPPartialReductionRecipes (#123638) 2025-01-23 14:57:31 +00:00
Florian Hahn
3418cd082a
[LV] Add test showing cost-model difference after 9491f75e1d9.
Reduced test case from
https://lab.llvm.org/buildbot/#/builders/143/builds/4847.
2025-01-21 21:37:51 +00:00
Florian Hahn
6c787ff6cf
Revert "[LV]: Teach LV to recursively (de)interleave. (#122989)"
This reverts commit 9491f75e1d912b277247450d1c7b6d56f7faf885.

This triggers an assert when building with SVE enabled.
https://lab.llvm.org/buildbot/#/builders/143/builds/4795
2025-01-21 21:36:16 +00:00
David Green
8552c49046
[AArch64] Enable UseFixedOverScalableIfEqualCost for more Cortex-x cpus. (#122807)
For similar reasons for fixed-width being prefered to scalable for
Neoverse V2, this patch enables the UseFixedOverScalableIfEqualCost
feature when using -mcpu=cortex-x2, x3, x4 and x925 that are similar to
Neoverse V2.
2025-01-20 15:05:15 +00:00
Mel Chen
84c89d0aa4
[LV][EVL] Address post-commit comments for 9720be9. (NFC) (#123311) 2025-01-20 14:20:40 +08:00
Florian Hahn
2c87133c62
Reapply "[VPlan] Update final IV exit value via VPlan. (#112147)"
This reverts the revert commit 58326f1d5b5b379590af92dd129b2f3b3e96af46.

The build failure in sanitizer stage2 builds has been fixed with
0d39fe6f5bb3edf0bddec09a8c6417377390aeac.

Original commit message:
Model updating IV users directly in VPlan, replace fixupIVUsers.

Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform
(optimizeInductionExitUsers) replaces extracts of inductions with
their pre-computed values if possible.

This completes the transition towards modeling all live-outs directly in
VPlan.

There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them
   tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.

Depends on https://github.com/llvm/llvm-project/pull/110004,
https://github.com/llvm/llvm-project/pull/109975 and
https://github.com/llvm/llvm-project/pull/112145.
2025-01-19 19:32:03 +00:00
Florian Hahn
8d90473c3e
[LV] Add tests with outisde IV users where vector region can e removed.
Tests for crash caused by initial version of
https://github.com/llvm/llvm-project/pull/112147.
2025-01-19 19:15:45 +00:00
Florian Hahn
58326f1d5b
Revert "[VPlan] Update final IV exit value via VPlan. (#112147)"
This reverts commit c2d15ac4d4432788557e77c15ce572ac655a8fec.

Causes build failures on PPC stage2 & fuchsia bots
    https://lab.llvm.org/buildbot/#/builders/168/builds/7650
    https://lab.llvm.org/buildbot/#/builders/11/builds/11248
2025-01-18 13:40:33 +00:00
Florian Hahn
c2d15ac4d4
[VPlan] Update final IV exit value via VPlan. (#112147)
Model updating IV users directly in VPlan, replace fixupIVUsers.

Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform 
(optimizeInductionExitUsers) replaces extracts of inductions with 
their pre-computed values if possible.

This completes the transition towards modeling all live-outs directly in
VPlan.

There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them 
   tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.


Depends on https://github.com/llvm/llvm-project/pull/110004,
https://github.com/llvm/llvm-project/pull/109975 and
https://github.com/llvm/llvm-project/pull/112145.
2025-01-18 13:22:34 +00:00
Florian Hahn
22637a877a
[Loads] Respect UseDerefAtPointSemantics in isDerefAndAlignedPointer. (#123196)
If a pointer gets freed, it may not be dereferenceable any longer, even
though there is a dominating dereferenceable assumption. As first step,
only consider assumptions if the pointer value cannot be freed if
UseDerefAtPointSemantics is used.

PR: https://github.com/llvm/llvm-project/pull/123196
2025-01-17 12:52:24 +00:00
Hassnaa Hamdi
9491f75e1d
Reland: [LV]: Teach LV to recursively (de)interleave. (#122989)
This commit relands the changes from "[LV]: Teach LV to recursively
(de)interleave. #89018"

Reason for revert:
- The patch exposed a bug in the IA pass, the bug is now fixed and landed by commit: #122643
2025-01-17 10:34:57 +00:00
Mel Chen
9720be95d6
[LV][EVL] Disable fixed-order recurrence idiom with EVL tail folding. (#122458)
The currently llvm.splice may occurs unexpected behavior if the evl of
the second-to-last iteration is not VF*UF.

Issue #122461
2025-01-17 16:55:35 +08:00
Luke Lau
e83e0c300d [LV] Add test case for #119173. NFC
This showcases a miscompile involving a widened reduction-phi.
2025-01-17 09:45:42 +08:00
Florian Hahn
4481030a03 [Loads] Use use-dereferenceable-at-point-semantics=1 in test.
Update the test to use use-dereferenceable-at-point-semantics=1.
Existing tests are updated with the nofree attribute and a new one has
been added showing that the dereferenceable assumption is used after the
pointer may be freed.
2025-01-16 12:49:50 +00:00