25 Commits

Author SHA1 Message Date
Florian Hahn
6d905e41bc
[SCEV] Use getConstantMultiple in to get divisibility info from guards. (#162617)
Simplify and generalize the code to get a common constant multiple for
expressions when collecting guards, replacing the manual implementation.

Split off from https://github.com/llvm/llvm-project/pull/160012.

PR: https://github.com/llvm/llvm-project/pull/162617
2025-10-09 10:51:36 +01:00
Florian Hahn
8907b6d393
[VPlan] Remove original loop blocks if dead. (#155497)
Build on top of https://github.com/llvm/llvm-project/pull/154510 to
completely remove the blocks of dead scalar loops.

Depends on https://github.com/llvm/llvm-project/pull/154510. 

PR: https://github.com/llvm/llvm-project/pull/155497
2025-10-01 16:53:59 +00:00
Florian Hahn
129c6836b7
[LV] Add test showing missed optimization due to missing info from guard
Add test for SCEVUMaxExpr handling in
https://github.com/llvm/llvm-project/pull/160012.
2025-09-22 20:58:52 +01:00
Florian Hahn
50b9ca4dda
[VPlan] Simplify Plan's entry in removeBranchOnConst. (#154510)
After https://github.com/llvm/llvm-project/pull/153643, there may be a
BranchOnCond with constant condition in the entry block.

Simplify those in removeBranchOnConst. This removes a number of
redundant conditional branch from entry blocks.

In some cases, it may also make the original scalar loop unreachable,
because we know it will never execute. In that case, we need to remove
the loop from LoopInfo, because all unreachable blocks may dominate each
other, making LoopInfo invalid. In those cases, we can also completely
remove the loop, for which I'll share a follow-up patch.

Depends on https://github.com/llvm/llvm-project/pull/153643.

PR: https://github.com/llvm/llvm-project/pull/154510
2025-09-18 19:25:05 +01:00
Florian Hahn
b16930204b
[LV] Add additional tests for reasoning about dereferenceable loads.
Includes a test for the crash exposed by 08001cf340185877.
2025-09-03 10:46:46 +01:00
Florian Hahn
876a2a9287
[LV] Add early-exit test needing loop guards to prove dereferenceable. 2025-08-27 13:33:15 +01:00
Florian Hahn
0abc8b07e5
[LV] Add early-exit test where the inner loop IV depends on outer loop. 2025-08-26 10:52:53 +01:00
Florian Hahn
f492eb9509
[VPlan] Make VPInstruction::AnyOf poison-safe. (#154156)
AnyOf reduces multiple input vectors to a single boolean value. When
used for early-exit vectorization, we need to consider any lane after
the early exit being poison. Any poison lane would result in poison
after the AnyOf reduction. To prevent this, freeze all inputs to AnyOf.

Fixes https://github.com/llvm/llvm-project/issues/153946.
Fixes https://github.com/llvm/llvm-project/issues/155162.

https://alive2.llvm.org/ce/z/FD-XxA

PR: https://github.com/llvm/llvm-project/pull/154156
2025-08-25 18:55:23 +01:00
Florian Hahn
351d398a37
[VPlan] Run final VPlan simplifications before codegen.
Dissolving the hierarchical VPlan CFG and converting abstract to
concrete recipes can expose additional simplification opportunities.

Do a final run of simplifyRecipes before executing the VPlan.
2025-08-16 18:54:27 +01:00
Florian Hahn
89ae085859
[VPlan] Remove VPVectorPointer for part 0 after unrolling. (#149735)
VPVectorPointer for part 0 is just the pointer operand. Simplify it
after unrolling. This removes a large number of redundant GEPs with
index 0.

PR: https://github.com/llvm/llvm-project/pull/149735
2025-07-27 13:53:26 +01:00
Florian Hahn
fa3ec0c17c
[VPlan] Materialize constant vector trip counts before final opts. (#142309)
Materialize constant vector trip counts before ::execute, if the trip
count can be computed as Original (TC / (VF * UF)) * (VF * UF). For now
this excludes when the tail is folded or scalar epilogues are required.

This enables removing a number of redundant branches from the middle
block.

For now this is also only done when not vectorizing the epilogue, as the
simplification complicates stitching the 2 plans together.

PR: https://github.com/llvm/llvm-project/pull/142309
2025-07-26 17:16:36 +01:00
David Sherwood
bf2b14acf3
[LV] Enable auto-vectorisation of loops with uncountable exits (#133099)
Until now the feature to enable vectorisation of some early exit
loops with uncountable exits was controlled under a flag, off by
default. Now that we have efficient code generation for
vectorising such loops (see PR #130766) and we still have some
time from the next LLVM release it seems like a good time point
to enable the feature by default. If any issues arise post-commit
it can be easily reverted.

Using this patch I built and ran the LLVM test suite successfully,
which on neoverse-v1 led to the vectorisation of 114 additional
early exit loops. I also built and ran SPEC2017 successfully for
both neoverse-v1 and neoverse-v2.
2025-06-27 10:39:33 +01:00
Florian Hahn
043b04acff
Reapply "[VPlan] Fold NOT into predicate of wide compares." (#130347)
This reverts commit 8dd160f4767f971572eac065c8650d9202ff5bf9.

The recommit contains an adjustment to planContainsAdditionalSimplifications,
which considers changes to the original predicate for compares.

Original commit message:

Add simplification to fold negation into a compare, if the negation is
the only user of the compare. This removes a number of redundant
negations.

Alive2 Proofs for FPCMP test changes:  https://alive2.llvm.org/ce/z/WGDz9U

PR: https://github.com/llvm/llvm-project/pull/129430
2025-04-28 20:01:37 +01:00
Florian Hahn
dfca6c0d3b
[VPlan] Remove no-op SCALAR-STEPS after unrolling. (#123655)
After unrolling, there may be additional simplifications that can be
applied. One example is removing SCALAR-STEPS for the first part where
only the first lane is demanded.

This removes redundant adds of 0 from a large number of tests (~200),
many which I am still working on updating.

In preparation for removing redundant WideIV steps added in
https://github.com/llvm/llvm-project/pull/119284.

PR: https://github.com/llvm/llvm-project/pull/123655
2025-03-25 12:57:24 +00:00
Florian Hahn
8dd160f476
Revert "[VPlan] Fold NOT into predicate of wide compares." (#130347)
Reverts llvm/llvm-project#129430

this seems to have introduced a divergence between legacy and
VPlan-based cost model

https://lab.llvm.org/buildbot/#/builders/30/builds/17159
2025-03-07 21:18:49 +00:00
Florian Hahn
cb3ce30ca8
[VPlan] Fold NOT into predicate of wide compares. (#129430)
Add simplification to fold negation into a compare, if the negation is
the only user of the compare. This removes a number of redundant
negations.

Alive2 Proofs for FPCMP test changes:  https://alive2.llvm.org/ce/z/WGDz9U

PR: https://github.com/llvm/llvm-project/pull/129430
2025-03-07 20:32:43 +00:00
Florian Hahn
e5f5517f91 [VPlan] Create IR basic block for middle.block in VPlan.
Create a IR BB directly for the middle.block, instead of creating the IR
BB during skeleton creation and then replacing the middle VPBB with a
VPIRBB.

This moves another part of skeleton creation to VPlan and simplififes
the code slightly by removing code to disconnect the middle block and
vector preheader + the corresponding DT update.

NFC modulo IR block naming and block creation order, which changes the
IR names for the blocks.
2025-02-15 21:54:16 +01:00
Florian Hahn
32c4493d5f
[VPlan] Add incoming values for all predecessor to ResumePHI (NFCI).
Follow-up as discussed when using VPInstruction::ResumePhi for all resume
values (#112147). This patch explicitly adds incoming values for each
predecessor in VPlan. This simplifies codegen and allows transformations
adjusting the predecessors of blocks with

NFC modulo incoming block order in phis.
2025-02-09 11:20:20 +00:00
David Sherwood
3bc2dade36
[LoopVectorize] Enable vectorisation of early exit loops with live-outs (#120567)
This work feeds part of PR
https://github.com/llvm/llvm-project/pull/88385, and adds support for
vectorising
loops with uncountable early exits and outside users of loop-defined
variables. When calculating the final value from an uncountable early
exit we need to calculate the vector lane that triggered the exit,
and hence determine the value at the point we exited.

All code for calculating the last value when exiting the loop early
now lives in a new vector.early.exit block, which sits between the
middle.split block and the original exit block. Doing this required
two fixes:

1. The vplan verifier incorrectly assumed that the block containing
a definition always dominates the block of the user. That's not true
if you can arrive at the use block from multiple incoming blocks.
This is possible for early exit loops where both the early exit and
the latch jump to the same block.
2. We were adding the new vector.early.exit to the wrong parent loop.
It needs to have the same parent as the actual early exit block from
the original loop.

I've added a new ExtractFirstActive VPInstruction that extracts the
first active lane of a vector, i.e. the lane of the vector predicate
that triggered the exit.

NOTE: The IR generated for dealing with live-outs from early exit
loops is unoptimised, as opposed to normal loops. This inevitably
leads to poor quality code, but this can be fixed up later.
2025-01-30 10:37:00 +00:00
David Sherwood
776ef9d1be
[LoopVectorize][NFC] Regenerate some early exit test CHECK lines (#124900) 2025-01-29 09:48:55 +00:00
Luke Lau
f0d5104c94
[VPlan] Handle some VPInstructions in may{Read,Write}FromMemory (#120058)
This just copies the same conservative definition from mayWriteToMemory,
and enables more VPInstructions to be hoisted out in LICM.

I think this should give more accurate costs, and I was able to build
llvm-test-suite without the legacy-vplan cost model assertion going off.
2025-01-08 15:17:26 +08:00
Florian Hahn
4ad0fdd163
[VPlan] Remove reverse() of predecessors from VPInstruction::generate.
This was originally done to reduce the diff for the change. Remove it
and update the remaining tests. NFC modulo reordering of incoming
values.

Clean up after https://github.com/llvm/llvm-project/pull/114292.
2024-12-17 20:44:32 +00:00
Florian Hahn
2564f1e199
[VPlan] Simplify Not(Not(A)) -> A.
Follow-up simplification to 5fae408d3a4c073ee4.
2024-12-14 20:08:26 +00:00
Florian Hahn
5fae408d3a
[VPlan] Dispatch to multiple exit blocks via middle blocks. (#112138)
A more lightweight variant of
https://github.com/llvm/llvm-project/pull/109193,
which dispatches to multiple exit blocks via the middle blocks.

The patch also introduces a bit of required scaffolding to enable
early-exit vectorization, including an option. At the moment, early-exit
vectorization doesn't come with legality checks, and is only used if the
option is provided and the loop has metadata forcing vectorization. This
is only intended to be used for testing during bring-up, with @david-arm
enabling auto early-exit vectorization plugging in the changes from
https://github.com/llvm/llvm-project/pull/88385.

PR: https://github.com/llvm/llvm-project/pull/112138
2024-12-11 21:11:05 +00:00
David Sherwood
76f3776185
[NFC][LoopVectorize] Restructure simple early exit tests (#112721)
The previous simple_early_exit.ll was growing too large and difficult to
manage. Instead I've decided to refactor the tests by splitting out into
notional groups:

1. single_early_exit.ll: loops with a single uncountable exit that do
not have live-outs from the loop.
2. single_early_exit_live_outs.ll: loops with a single uncountable exit
with live-outs.
3. multi_early_exit.ll: loops with multiple early exits, i.e. a mixture
of countable and uncountable exits, but with no live-outs from the loop.
4. multi_early_exit_live_outs.ll: as above, but with live-outs.
5. single_early_exit_unsafe_ptrs.ll: loops with a single uncountable
exit, but with pointers that are not unconditionally dereferenceable.
6. unsupported_early_exit.ll: loops with uncountable exits that we
cannot yet vectorise.
7. early_exit_legality.ll: tests the debug output from
LoopVectorizationLegality to make sure we handle different scenarios
correctly.

Only the last test now requires asserts. Over time some of these tests
should start vectorising as more support is added.

I also tried to rename the multi early exit tests to make it clear there
what mixture of countable and uncountable exits are present.
2024-10-17 16:50:59 +01:00