889 Commits

Author SHA1 Message Date
Luke Lau
8107b430ed
[VPlan] Simplify select c, x, x -> x (#133731)
As noted in 1a9358c090d0507be21c5e9b2d97a23ef1de8ab0, some
simplifications can produce a redundant select where the true and false
operands are the same, which this patch removes.

The is_fpclass test was changed so the condition wasn't made dead.
2025-04-02 10:26:48 +01:00
YunQiang Su
e25187bc3e
LLVM/Test: Add vectorizing testcases for fminimumnum and fminimumnum (#133843)
Vectorizing of fminimumnum and fminimumnum have not support yet. Let's
add the testcase for it now, and we will update the testcase when we
support it.
2025-04-02 08:46:02 +08:00
Ramkumar Ramachandra
3a66760d9b
[LV] Improve a test, regen with UTC (#130092) 2025-04-01 14:11:20 +01:00
Florian Hahn
783a846507
[VPlan] Add VF as operand to VPScalarIVStepsRecipe.
Similarly to other recipes, update VPScalarIVStepsRecipe to also take
the runtime VF as argument. This removes some unnecessary runtime VF
computations for scalable vectors. It will also allow dropping the
UF == 1 restriction for narrowing interleave groups required in
577631f0a528.
2025-03-28 21:48:59 +00:00
Hari Limaye
bf5627c85e
[LV] Optimize VPWidenIntOrFpInductionRecipe for known TC (#118828)
Optimize the IR generated for a VPWidenIntOrFpInductionRecipe to use the
narrowest type necessary, when the trip-count of a loop is known to be
constant and the only use of the recipe is the condition used by the
vector loop's backedge branch.
2025-03-28 14:47:40 +00:00
Florian Hahn
5c26e80e57
[LV] Make cost model tests independent of VPValue numbers.
Update tests to not rely on hard-coded VPValue numbers.
2025-03-27 21:15:32 +00:00
Florian Hahn
2c7d40b2f0
[VPlan] Generalize SCALAR-STEPS removal to any unroll factor.
Follow-up to dfca6c0d3bf9d1a056 to extend isUnrolled handle any unrolled
VPlan, which means there's a single UF, but it will be > 1 if unrolling
took place.
2025-03-26 21:03:50 +00:00
Florian Hahn
577631f0a5
Reapply "[VPlan] Add transformation to narrow interleave groups. (#106441)"
This reverts commit ff3e2ba9eb94217f3ad3525dc18b0c7b684e0abf.

The recommmitted version limits to transform to cases where no
interleaving is taking place, to avoid a mis-compile when interleaving.

Original commit message:

This patch adds a new narrowInterleaveGroups transfrom, which tries
convert a plan with interleave groups with VF elements to a plan that
instead replaces the interleave groups with wide loads and stores
processing VF elements.

This effectively is a very simple form of loop-aware SLP, where we
use interleave groups to identify candidates.

This initial version is quite restricted and hopefully serves as a
starting point for how to best model those kinds of transforms.

Depends on https://github.com/llvm/llvm-project/pull/106431.

Fixes https://github.com/llvm/llvm-project/issues/82936.

PR: https://github.com/llvm/llvm-project/pull/106441
2025-03-25 20:57:10 +00:00
Florian Hahn
dfca6c0d3b
[VPlan] Remove no-op SCALAR-STEPS after unrolling. (#123655)
After unrolling, there may be additional simplifications that can be
applied. One example is removing SCALAR-STEPS for the first part where
only the first lane is demanded.

This removes redundant adds of 0 from a large number of tests (~200),
many which I am still working on updating.

In preparation for removing redundant WideIV steps added in
https://github.com/llvm/llvm-project/pull/119284.

PR: https://github.com/llvm/llvm-project/pull/123655
2025-03-25 12:57:24 +00:00
Martin Storsjö
ff3e2ba9eb Revert "[VPlan] Add transformation to narrow interleave groups. (#106441)"
This reverts commit dfa665f19c52d98b8d833a8e9073427ba5641b19.

This commit caused miscompilations in ffmpeg, see
https://github.com/llvm/llvm-project/pull/106441 for details.
2025-03-23 23:27:39 +02:00
Florian Hahn
dfa665f19c
[VPlan] Add transformation to narrow interleave groups. (#106441)
This patch adds a new narrowInterleaveGroups transfrom, which tries
convert a plan with interleave groups with VF elements to a plan that
instead replaces the interleave groups with wide loads and stores
processing VF elements.

This effectively is a very simple form of loop-aware SLP, where we
use interleave groups to identify candidates.

This initial version is quite restricted and hopefully serves as a
starting point for how to best model those kinds of transforms.

Depends on https://github.com/llvm/llvm-project/pull/106431.

Fixes https://github.com/llvm/llvm-project/issues/82936.

PR: https://github.com/llvm/llvm-project/pull/106441
2025-03-22 21:40:17 +00:00
Florian Hahn
0d3ba087f7
[LV] Move IV bypass value creation out of ILV (NFC)
createInductionAdditionalBypassValues is only used for epilogue
vectorization now. Move it out of ILV, which means we do not have to
thread through ExpandedSCEVs and also don't have to track the bypass
values in ILV. Instead, directly create them if needed after executing
the epilogue plan. This moves more the epilogue specific logic out of
the generic executePlan.
2025-03-22 20:36:45 +00:00
Florian Hahn
870f753f1f
[VPlan] Also materialize broadcasts for backedge-taken-counts (NFC).
Also include VPlan's BTC in the set of VPValues to materialize
broadcasts for, if it is used.
2025-03-18 22:35:18 +00:00
Luke Lau
eef5ea0c42
[VPlan] Account for dead FOR splice simplification in cost model (#131486)
Fixes #131359

After #129645, a first-order recurrence will no longer have it's splice
costed if the VPInstruction::FirstOrderRecurrenceSplice has no users and
is dead.

The legacy cost model didn't account for this, so this accounts for it
in planContainsAdditionalSimplifications to avoid the "VPlan cost model
and legacy cost model disagreed" assertion.
2025-03-18 00:00:54 +08:00
Florian Hahn
6a8d5f22ff
[VPlan] Don't access canonical IV in VPWidenPointerInduction::execute.
This updates VPWidenPointerInductionRecipe::execute to not use the
canonical IV to determine the insert point. Instead, it relies on the
current recipe position. In cases where this is not sufficient, set the
insert point to the first non-phi instruction, to ensure phis are
created together.
2025-03-15 21:32:48 +00:00
Florian Hahn
aadfa9f6c8
[LV] Add additional tests for narrowing interleave groups.
Extend test coverage for https://github.com/llvm/llvm-project/pull/106441.
2025-03-15 21:13:49 +00:00
Florian Hahn
62994c3291
[VPlan] Also introduce explicit broadcasts for values from entry VPBB.
Update and generalize materializeBroadcasts to also introduce explicit
broadcasts for VPValues defined in the Plans Entry block.

This fixes a crash when trying to insert the broadcasts generated by
VPTransformState::get after the generating instruction, which isn't
possible after invoke instructions.

Fixes https://github.com/llvm/llvm-project/issues/128838.
2025-03-12 22:03:19 +00:00
Florian Hahn
8132c4f554
[VPlan] Also introduce broadcasts for live-ins used in vec preheader.
Slightly generalize materializeLiveInBroadcasts to also introduce
broadcasts for live-ins used in the vector preheader. This should cover
all live-ins.

If the live-in is used in the vector preheader, insert the broadcast at
the beginning of the block.
2025-03-11 21:19:14 +00:00
Florian Hahn
8dd160f476
Revert "[VPlan] Fold NOT into predicate of wide compares." (#130347)
Reverts llvm/llvm-project#129430

this seems to have introduced a divergence between legacy and
VPlan-based cost model

https://lab.llvm.org/buildbot/#/builders/30/builds/17159
2025-03-07 21:18:49 +00:00
Florian Hahn
cb3ce30ca8
[VPlan] Fold NOT into predicate of wide compares. (#129430)
Add simplification to fold negation into a compare, if the negation is
the only user of the compare. This removes a number of redundant
negations.

Alive2 Proofs for FPCMP test changes:  https://alive2.llvm.org/ce/z/WGDz9U

PR: https://github.com/llvm/llvm-project/pull/129430
2025-03-07 20:32:43 +00:00
Ramkumar Ramachandra
ddffb74afd
[LV] Strip unreachable SCEV-check blocks (#130079)
emitSCEVChecks checks if SCEVCheckCond matches zero, and returns
nullptr. However, it sets SCEVCheckCond as used before it does this,
which prevents it from being removed during cleanup, resulting in
unreachable blocks being emitted. Fix this.
2025-03-06 19:30:25 +00:00
Florian Hahn
f937b17e85
[LV] Don't query SCEV for non-invariant values in cost model.
This fixes a divergence between VPlan and legacy cost model, matching
behavior further up in getInstructionCost as well.

Fixes https://github.com/llvm/llvm-project/issues/129236.
2025-03-02 10:55:52 +00:00
Florian Hahn
1e1b9bccc0
[VPlan] Simplify BLEND %a, %b, NOT(%m) -> BLEND %b, %a, %m. (#128375)
Avoid negations for normalized blends by reordering operands.

PR: https://github.com/llvm/llvm-project/pull/128375
2025-02-27 17:43:24 +00:00
Florian Hahn
4277c21059
[VPlan] Introduce explicit broadcasts for live-ins. (#124644)
Add a new VPInstruction::Broadcast opcode and use it to materialize
explicit broadcasts of live-ins. The initial patch only materlizes the
broadcasts if the vector preheader dominates all uses that need it.
Later patches will pick the best valid insert point, thus retiring
implicit hoisting of broadcasts from VPTransformsState::get().

PR: https://github.com/llvm/llvm-project/pull/124644
2025-02-26 13:57:51 +00:00
Elvis Wang
8009c1fd81
[LV][VPlan] Prevent calculate cost for skiped instructions in precomputeCosts(). (#127966)
Skip calculating instruction costs for exit conditions in
precomputeCosts() when it should be skipped.

Reported from:
https://github.com/llvm/llvm-project/issues/115744#issuecomment-2670479463
Godbolt for reduced test cases: https://godbolt.org/z/fr4YMeqcv
2025-02-25 11:09:09 +08:00
Florian Hahn
52ded67249
[LAA] Always require non-wrapping pointers for runtime checks. (#127543)
Currently we only check if the pointers involved in runtime checks do
not wrap if we need to perform dependency checks. If that's not the
case, we generate runtime checks, even if the pointers may wrap (see
test/Analysis/LoopAccessAnalysis/runtime-checks-may-wrap.ll).

If the pointer wraps, then we swap start and end of the runtime check,
leading to incorrect checks.

An Alive2 proof of what the runtime checks are checking conceptually (on
i4 to have it complete in reasonable time) showing the incorrect result
should be https://alive2.llvm.org/ce/z/KsHzn8

Depends on https://github.com/llvm/llvm-project/pull/127410 to avoid
more regressions.

PR: https://github.com/llvm/llvm-project/pull/127543
2025-02-20 19:00:23 +01:00
Florian Hahn
04b5c63ddf [LV] Add inbounds to interleave test.
In preparation for https://github.com/llvm/llvm-project/pull/127543
2025-02-20 16:33:01 +01:00
Florian Hahn
e5f5517f91 [VPlan] Create IR basic block for middle.block in VPlan.
Create a IR BB directly for the middle.block, instead of creating the IR
BB during skeleton creation and then replacing the middle VPBB with a
VPIRBB.

This moves another part of skeleton creation to VPlan and simplififes
the code slightly by removing code to disconnect the middle block and
vector preheader + the corresponding DT update.

NFC modulo IR block naming and block creation order, which changes the
IR names for the blocks.
2025-02-15 21:54:16 +01:00
Florian Hahn
e258bca950
[VPlan] Only skip expansion for SCEVUnknown if it isn't an instruction. (#125235)
Update getOrCreateVPValueForSCEVExpr to only skip expansion of
SCEVUnknown if the underlying value isn't an instruction. Instructions
may be defined in a loop and using them without expansion may break
LCSSA form. SCEVExpander will take care of preserving LCSSA if needed.

We could also try to pass LoopInfo, but there are some users of the
function where it won't be available and main benefit from skipping
expansion is slightly more concise VPlans.

Note that SCEVExpander is now used to expand SCEVUnknown with floats.
Adjust the check in expandCodeFor to only check the types and casts if
the type of the value is different to the requested type. Otherwise we
crash when trying to expand a float and requesting a float type.

Fixes https://github.com/llvm/llvm-project/issues/121518.

PR: https://github.com/llvm/llvm-project/pull/125235
2025-02-11 13:03:12 +01:00
Simon Pilgrim
70906f0514 [LV][X86] Regenerate interleaved load/store costs. NFC.
update_analyze_test_checks has improved the checks since these were last updated.

Reduce noise diffs in future patches.
2025-02-09 15:02:41 +00:00
Florian Hahn
32c4493d5f
[VPlan] Add incoming values for all predecessor to ResumePHI (NFCI).
Follow-up as discussed when using VPInstruction::ResumePhi for all resume
values (#112147). This patch explicitly adds incoming values for each
predecessor in VPlan. This simplifies codegen and allows transformations
adjusting the predecessors of blocks with

NFC modulo incoming block order in phis.
2025-02-09 11:20:20 +00:00
Florian Hahn
1611059f5d
[VPlan] Compute cost for binary op VPInstruction with underlying values. (#125434)
As exposed by https://github.com/llvm/llvm-project/pull/125094, we are
missing cost computation for some binary VPInstructions we created based
on original IR instructions. Their cost should be considered.

PR: https://github.com/llvm/llvm-project/pull/125434
2025-02-07 15:27:31 +00:00
Nikita Popov
29441e4f5f
[IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.
2025-01-29 16:56:47 +01:00
David Sherwood
c836b8956d
[LoopVectorize][NFC] Disable output for tests that don't need it (#124747)
There are a lot of tests that do not depend upon the IR output
for validation, relying instead on the debug output. For these
tests we can add the -disable-output command line argument.
2025-01-29 08:09:50 +00:00
Florian Hahn
713482fccf [VPlan] Use State.get to extract lane mask for BranchOnMask.
Simplifies the code slightly and avoids redundant extracts/broadcasts if
the operand is live-in or already scalar.
2025-01-27 21:35:36 +00:00
David Sherwood
b7286dbef9
Reland "[LoopVectorize] Add support for reverse loops in isDereferenceableAndAlignedInLoop #96752" (#123616)
The last attempt failed a sanitiser build because we were
creating a reference to a null Predicates pointer in
isDereferenceableAndAlignedInLoop. This was exposed by
the unit test IsDerefReadOnlyLoop in
unittests/Analysis/LoadsTest.cpp. I fixed this by falling
back on getConstantMaxBackedgeTakenCount if Predicates is
null - see line 316 in llvm/lib/Analysis/Loads.cpp. There
are no other changes.
2025-01-27 11:59:38 +00:00
Florian Hahn
2c87133c62
Reapply "[VPlan] Update final IV exit value via VPlan. (#112147)"
This reverts the revert commit 58326f1d5b5b379590af92dd129b2f3b3e96af46.

The build failure in sanitizer stage2 builds has been fixed with
0d39fe6f5bb3edf0bddec09a8c6417377390aeac.

Original commit message:
Model updating IV users directly in VPlan, replace fixupIVUsers.

Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform
(optimizeInductionExitUsers) replaces extracts of inductions with
their pre-computed values if possible.

This completes the transition towards modeling all live-outs directly in
VPlan.

There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them
   tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.

Depends on https://github.com/llvm/llvm-project/pull/110004,
https://github.com/llvm/llvm-project/pull/109975 and
https://github.com/llvm/llvm-project/pull/112145.
2025-01-19 19:32:03 +00:00
Florian Hahn
58326f1d5b
Revert "[VPlan] Update final IV exit value via VPlan. (#112147)"
This reverts commit c2d15ac4d4432788557e77c15ce572ac655a8fec.

Causes build failures on PPC stage2 & fuchsia bots
    https://lab.llvm.org/buildbot/#/builders/168/builds/7650
    https://lab.llvm.org/buildbot/#/builders/11/builds/11248
2025-01-18 13:40:33 +00:00
Florian Hahn
c2d15ac4d4
[VPlan] Update final IV exit value via VPlan. (#112147)
Model updating IV users directly in VPlan, replace fixupIVUsers.

Now simple extracts are created for all phis in the exit block during
initial VPlan construction. A later VPlan transform 
(optimizeInductionExitUsers) replaces extracts of inductions with 
their pre-computed values if possible.

This completes the transition towards modeling all live-outs directly in
VPlan.

There are a few follow-ups:
* emit extracts initially also for resume phis, and optimize them 
   tougher with IV exit users
* support for VPlans with multiple exits in optimizeInductionExitUsers.


Depends on https://github.com/llvm/llvm-project/pull/110004,
https://github.com/llvm/llvm-project/pull/109975 and
https://github.com/llvm/llvm-project/pull/112145.
2025-01-18 13:22:34 +00:00
David Sherwood
a00938eedd
Revert "[LoopVectorize] Add support for reverse loops in isDereferenceableAndAlignedInLoop (#96752)" (#123057)
This reverts commit bfedf6460c2cad6e6f966b457d8d27084579dcd8.
2025-01-15 13:56:42 +00:00
David Sherwood
bfedf6460c
[LoopVectorize] Add support for reverse loops in isDereferenceableAndAlignedInLoop (#96752)
Currently when we encounter a negative step in the induction
variable isDereferenceableAndAlignedInLoop bails out because
the element size is signed greater than the step. This patch
adds support for negative steps in cases where we detect the
start address for the load is of the form base + offset. In
this case the address decrements in each iteration so we need
to calculate the access size differently. I have done this by
caling getStartAndEndForAccess from LoopAccessAnalysis.cpp.

The motivation for this patch comes from PR #88385 where a
reviewer requested reusing isDereferenceableAndAlignedInLoop,
but that PR itself does support reverse loops.

The changed test in LoopVectorize/X86/load-deref-pred.ll now
passes because previously we were calculating the total access
size incorrectly, whereas now it is 412 bytes and fits
perfectly into the alloca.
2025-01-15 12:47:43 +00:00
LiqinWeng
0294dab79e
[LV][VPlan] Add fast flags for selectRecipe (#121023)
Change the inheritance of class VPWidenSelectRecipe to class
VPRecipeWithIRFlags, which allows recipe of the select to pass the
fastmath flags.The patch of #119847 will add the fastmath flag to for
recipe
2025-01-15 10:10:11 +08:00
Florian Hahn
1de3dc7d23 [LV] Bail out early if BTC+1 wraps.
Currently we fail to detect the case where BTC + 1 wraps, i.e. the
vector trip count is 0, In those cases, the minimum iteration count
check will fail, and the vector code will never be executed.

Explicitly check for this condition in computeMaxVF and avoid trying to
vectorize alltogether.

Note that a number of tests needed to be updated, because the vector
loop would never be executed given the input IR.

Fixes https://github.com/llvm/llvm-project/issues/122558.
2025-01-14 22:07:38 +00:00
Florian Hahn
8df64ed777 [LV] Don't consider IV increments uniform if exit value is used outside.
In some cases, there might be a chain of uniform instructions producing
the exit value. To generate correct code in all cases, consider the IV
increment not uniform, if there are users outside the loop.

Instead, let VPlan narrow the IV, if possible using the logic from
3ff1d01985752.

Test case from #122602 verified with Alive2:
    https://alive2.llvm.org/ce/z/bA4EGj

Fixes https://github.com/llvm/llvm-project/issues/122496.
Fixes https://github.com/llvm/llvm-project/issues/122602.
2025-01-12 22:03:21 +00:00
Florian Hahn
3ff1d01985 Recommit "[VPlan] Try to narrow wide and replicating recipes to uniform recipes."
This reverts commit 0ebb3ac7c92c4c1c44e7f3d17832d75ec5a42a67.

Re-applies commit with typos fixed.
2025-01-12 20:10:28 +00:00
Florian Hahn
0ebb3ac7c9 Revert "[VPlan] Try to narrow wide and replicating recipes to uniform recipes."
This reverts commit 1afba19913253dda865a8e57b37b9f4dabead1ac.

Typo breaking the build
2025-01-12 19:37:45 +00:00
Florian Hahn
1afba19913 [VPlan] Try to narrow wide and replicating recipes to uniform recipes.
Use the existing VPlan-based analysis to identify recipes that only have
their first lane demanded and transform them to uniform recpliate
recipes. This simplifies the generated code in some places and prepares
for fixing https://github.com/llvm/llvm-project/issues/122496.
2025-01-12 19:32:01 +00:00
Florian Hahn
44058e5b5f [LV] Precommit tests for #106441.
Tests for https://github.com/llvm/llvm-project/pull/106441
from https://github.com/llvm/llvm-project/issues/82936.
2025-01-10 18:49:44 +00:00
Florian Hahn
b0697dc1de
[LV] Only check isVectorizableEarlyExitLoop with multiple exits. (#121994)
Currently we emit early-exit related debug messages/remarks even when
there is a single exit. Update to only check isVectorizableEarlyExitLoop
if there isn't a single exit block.

PR: https://github.com/llvm/llvm-project/pull/121994
2025-01-09 12:05:19 +00:00
Luke Lau
f0d5104c94
[VPlan] Handle some VPInstructions in may{Read,Write}FromMemory (#120058)
This just copies the same conservative definition from mayWriteToMemory,
and enables more VPInstructions to be hoisted out in LICM.

I think this should give more accurate costs, and I was able to build
llvm-test-suite without the legacy-vplan cost model assertion going off.
2025-01-08 15:17:26 +08:00