692 Commits

Author SHA1 Message Date
Mingjie Xu
159f1c048e
[IR] Optimize PHINode::removeIncomingValue() by swapping removed incoming value with the last incoming value. (#171963)
The current implementation uses `std::copy` to shift all incoming values
after the removed index. This patch optimizes
`PHINode::removeIncomingValue()` by replacing that linear shift of
incoming values with a swap-with-last strategy.

After this change, the relative order of incoming values after removal
is not preserved.

This improves compile-time for PHI nodes with many predecessors.
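A minimal sketch of the two removal strategies on a plain C array. This is not LLVM's `PHINode` code; the function names and the flat-array layout are hypothetical. Shifting costs O(n) element moves but keeps the order; swapping with the last element costs O(1) but reorders what remains, which is the trade-off described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: incoming values kept in a flat array of length *n. */

/* Old strategy: shift every later element down by one (O(n), keeps order). */
static void remove_by_shift(int *vals, size_t *n, size_t idx) {
    memmove(&vals[idx], &vals[idx + 1], (*n - idx - 1) * sizeof vals[0]);
    --*n;
}

/* New strategy: overwrite the removed slot with the last element (O(1)),
   so the relative order of the remaining elements is not preserved. */
static void remove_by_swap_with_last(int *vals, size_t *n, size_t idx) {
    vals[idx] = vals[*n - 1];
    --*n;
}
```

For `{10, 20, 30, 40, 50}` with index 1 removed, the shift leaves `{10, 30, 40, 50}` while the swap leaves `{10, 50, 30, 40}`.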

Depends:
https://github.com/llvm/llvm-project/pull/171955
https://github.com/llvm/llvm-project/pull/171956
https://github.com/llvm/llvm-project/pull/171960
https://github.com/llvm/llvm-project/pull/171962
2025-12-17 19:44:01 +08:00
Craig Topper
ef21740781
[LoopPeel] Check for onlyAccessesInaccessibleMemory instead of llvm.assume in peelToTurnInvariantLoadsDereferenceable. (#171910)
A call for which onlyAccessesInaccessibleMemory holds can't alias with a
load. This allows us to ignore more intrinsics than just llvm.assume.

Follow up from #171547
2025-12-12 10:45:41 -08:00
Craig Topper
ccc3835ffa
[LoopPeel] Ignore assume intrinsics for the mayWriteToMemory check in peelToTurnInvariantLoadsDereferenceable. (#171547)
llvm.assume intrinsics have the mayWriteToMemory property, but
won't prevent the load from becoming dereferenceable.
2025-12-10 13:14:19 -08:00
Pengcheng Wang
a0b6638c85
[RISCV] Don't unroll vectorized loops with vector operands (#171089)
We disabled unrolling for vectorized loops in #151525, but that PR
only checked the instruction type.

Some loops contain no instruction with a vector type yet still perform
vector operations (as in the memset zero test in the precommit test).

Here we check the operands as well to cover these cases.
2025-12-09 12:42:41 +08:00
Pengcheng Wang
893479adcc [RISCV] Precommit test for unrolling loops with vector operands 2025-12-09 11:51:33 +08:00
Florian Hahn
7470d721c6
[AArch64] Add isAppleMLike helper to check for M cores and aligned CPUs. (#170553)
Add a new isAppleMLike helper, that returns true if the core is part of
the Apple M core family or Apple A14 or later. Used to apply cost
decisions consistently to those groups of cores.

The function is now a single place to update when new cores are added.
It also makes sure we apply unrolling decisions for newer Apple cores to
Apple A17.

PR: https://github.com/llvm/llvm-project/pull/170553
2025-12-05 20:05:29 +00:00
Florian Hahn
c5e6f4e99d
[AArch64] Add unrolling test with -mcpu=apple-a17.
Currently Apple unrolling preferences are not applied to apple-a17.
2025-12-03 20:15:58 +00:00
Philip Reames
c752bb9203
[IndVars] Strengthen inference of samesign flags (#170363)
When reviewing another change, I noticed that we were failing to infer
samesign for two cases: 1) an unsigned comparison, and 2) when both
arguments were known negative.

Using CVP and InstCombine as a reference, we need to be careful to not
allow eq/ne comparisons. I'm a bit unclear on the why of that, and for
now am going with the low risk change. I may return to investigate that
in a follow up.
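As an illustration only (not CVP, InstCombine or IndVars code; the helper names are hypothetical), the exhaustive 8-bit check below shows why knowing both operands share a sign bit is useful: for every such pair, signed and unsigned "less than" give the same answer. That covers both newly handled cases, including two known-negative operands.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a and b have the same sign bit (both >= 0 or both < 0). */
static bool same_sign(int8_t a, int8_t b) {
    return (a < 0) == (b < 0);
}

/* For every same-sign pair of 8-bit values, compare signed and unsigned
   "less than". Returns the number of pairs where the two disagree. */
static unsigned count_slt_ult_mismatches(void) {
    unsigned mismatches = 0;
    for (int x = INT8_MIN; x <= INT8_MAX; ++x) {
        for (int y = INT8_MIN; y <= INT8_MAX; ++y) {
            int8_t a = (int8_t)x, b = (int8_t)y;
            if (!same_sign(a, b))
                continue;
            bool slt = a < b;                      /* signed compare */
            bool ult = (uint8_t)a < (uint8_t)b;    /* unsigned compare */
            if (slt != ult)
                ++mismatches;
        }
    }
    return mismatches;
}
```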

Compile time results look like noise to me, see:
https://llvm-compile-time-tracker.com/compare.php?from=49a978712893fcf9e5f40ac488315d029cf15d3d&to=2ddb263604fd7d538e09dc1f805ebc30eb3ffab0&stat=instructions:u
2025-12-03 16:16:22 +00:00
Philip Reames
49a9787128 [SCEV] Regenerate a subset of auto updated tests
Reducing spurious diff in an upcoming change.
2025-12-02 12:16:53 -08:00
Julian Nagele
b641509637
[LoopUnroll] Introduce parallel accumulators when unrolling FP reductions. (#166630)
This is building on top of
https://github.com/llvm/llvm-project/pull/149470, also introducing
parallel accumulator PHIs when the reduction is for floating points,
provided we have the reassoc flag. See also
https://github.com/llvm/llvm-project/pull/166353, which aims to
introduce parallel accumulators for reductions with vector instructions.
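The reassoc requirement exists because splitting an FP sum across parallel accumulators changes the order of the additions, and FP addition is not associative. A small C illustration with two accumulators (hypothetical function names), assuming IEEE-754 doubles evaluated in double precision and no `-ffast-math`:

```c
#include <stddef.h>

/* In-order reduction, as the original loop computes it. */
static double sum_serial(const double *x, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += x[i];
    return acc;
}

/* Two parallel accumulators (even / odd elements) combined after the loop,
   mirroring an unroll-by-2 with parallel accumulator phis. Assumes n is
   even; a real unroller would handle an odd tail separately. */
static double sum_two_accumulators(const double *x, size_t n) {
    double acc0 = 0.0, acc1 = 0.0;
    for (size_t i = 0; i + 1 < n; i += 2) {
        acc0 += x[i];
        acc1 += x[i + 1];
    }
    return acc0 + acc1;
}
```

For `{1e16, 1.0, -1e16, 1.0}` the serial sum is 1.0 (the first `1.0` is lost when added to `1e16`), while the two-accumulator sum is 2.0, so the rewrite is only legal when the reduction permits reassociation.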
2025-11-27 15:03:36 +00:00
Julian Nagele
c73de9777e
[IVDescriptors] Support detecting reductions with vector instructions. (#166353)
In combination with https://github.com/llvm/llvm-project/pull/149470
this will introduce parallel accumulators when unrolling reductions with
vector instructions. See also
https://github.com/llvm/llvm-project/pull/166630, which aims to
introduce parallel accumulators for FP reductions.
2025-11-24 11:12:06 +00:00
Joel E. Denny
21fedcbf89
[LoopPeel] Fix BFI when peeling last iteration without guard (#168250)
LoopPeel sometimes proves that, when reached, the original loop always
executes at least two iterations. LoopPeel then unconditionally executes
both the remaining loop's initial iteration and the peeled final
iteration. But that increases the latter's frequency above its frequency
in the original loop. To maintain the total frequency, this patch
compensates by decreasing the remaining loop's latch probability.

This is another step in issue #135812 and was discussed at
<https://github.com/llvm/llvm-project/pull/166858#discussion_r2528968542>.
2025-11-20 10:45:53 -05:00
Vladi Krapp
42a1184e42
[AArch64] Allow forcing unrolling of small loops (#167488)
- Introduce the -aarch64-force-unroll-threshold option; when a loop’s
cost is below this value we set UP.Force = true (default 0 keeps current
behaviour)
- Add an AArch64 loop-unroll regression test that runs once at the
default threshold and once with the flag raised, confirming forced
unrolling
2025-11-17 08:59:44 +00:00
Mircea Trofin
358e9a56af
[LP] Assign weights when peeling last iteration. (#166858) 2025-11-15 10:01:04 -08:00
Joel E. Denny
1aa86ca521
[LoopUnroll] Fix division by zero (#166258)
PR #159163's probability computation for epilogue loops does not handle
the possibility of an original loop probability of one. Runtime loop
unrolling does not make sense for such an infinite loop, and a division
by zero results. This patch works around that case.

Issue #165998.
2025-11-04 12:49:33 -05:00
Ivan Kelarev
37825ad4f6
[LoopUnroll] Prevent LoopFullUnrollPass from performing partial unrolling when trip counts are unknown (#165013)
Currently, `LoopFullUnrollPass` incorrectly performs partial unrolling
when `#pragma unroll` is specified and both `TripCount` and
`MaxTripCount` are unknown. This patch adds a check to prevent partial
unrolling when `OnlyFullUnroll` parameter is true and both trip count
values are zero.
2025-11-04 09:20:01 -08:00
Joel E. Denny
bb9bd5f263
[LoopUnroll] Fix assert fail on zeroed branch weights (#165938)
BranchProbability fails an assert when its denominator is zero.

Reported at
<https://github.com/llvm/llvm-project/pull/159163#pullrequestreview-3406318423>.
2025-11-03 10:19:12 -05:00
Joel E. Denny
cc8ff73fba
[LoopUnroll] Fix block frequencies for epilogue (#159163)
As another step in issue #135812, this patch fixes block frequencies for
partial loop unrolling with an epilogue remainder loop. It does not
fully handle the case when the epilogue loop itself is unrolled. That
will be handled in the next patch.

For the guard and latch of each of the unrolled loop and epilogue loop,
this patch sets branch weights derived directly from the original loop
latch branch weights. The total frequency of the original loop body,
summed across all its occurrences in the unrolled loop and epilogue
loop, is the same as in the original loop. This patch also sets
`llvm.loop.estimated_trip_count` for the epilogue loop instead of
relying on the epilogue's latch branch weights to imply it.

This patch fixes branch weights in tests that PR #157754 adversely
affected.
2025-10-31 11:01:42 -04:00
Joel E. Denny
24557cce40
[LoopUnroll] Fix block frequencies when no runtime (#157754)
This patch implements the LoopUnroll changes discussed in [[RFC] Fix
Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785)
and is thus another step in addressing issue #135812.

In summary, for the case of partial loop unrolling without a remainder
loop, this patch changes LoopUnroll to:

- Maintain branch weights consistently with the original loop for the
sake of preserving the total frequency of the original loop body.
- Store the new estimated trip count in the
`llvm.loop.estimated_trip_count` metadata, introduced by PR #148758.
- Correct the new estimated trip count (e.g., 3 instead of 2) when the
original estimated trip count (e.g., 10) divided by the unroll count
(e.g., 4) leaves a remainder (e.g., 2).
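The correction in the last bullet amounts to a ceiling division: a leftover partial group of original iterations still costs one more trip through the unrolled loop. A minimal sketch (the function name is hypothetical, not a LoopUnroll helper):

```c
/* Estimated trip count of the unrolled loop when there is no remainder loop:
   each unrolled iteration covers `unroll_count` original iterations, and a
   leftover partial group still takes one more unrolled iteration. */
static unsigned unrolled_estimated_trip_count(unsigned orig_trip_count,
                                              unsigned unroll_count) {
    return orig_trip_count / unroll_count
         + (orig_trip_count % unroll_count != 0);
}
```

With an original estimate of 10 and an unroll count of 4 this yields 3, matching the example above.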

There are loop unrolling cases this patch does not fully fix, such as
partial unrolling with a remainder loop and complete unrolling, and
there are two associated tests whose branch weights this patch adversely
affects. They will be addressed in future patches that should land with
this patch.
2025-10-31 10:44:27 -04:00
Joel E. Denny
8d186e2195
[LoopUnroll][NFCI] Clean up remainder followup metadata handling (#165272)
Followup metadata for remainder loops is handled by two implementations,
both added by 7244852557ca6:

1. `tryToUnrollLoop` in `LoopUnrollPass.cpp`.
2. `CloneLoopBlocks` in `LoopUnrollRuntime.cpp`.

As far as I can tell, 2 is useless: I added `assert(!NewLoopID)` for the
`NewLoopID` returned by the `makeFollowupLoopID` call, and it never
fails throughout check-all for my build.

Moreover, if 2 were useful, it appears it would have a bug caused by
7cd826a321d9. That commit skips adding loop metadata to a new remainder
loop if the remainder loop itself is to be completely unrolled because
it will then no longer be a loop. However, that commit incorrectly
assumes that `UnrollRemainder` dictates complete unrolling of a
remainder loop, and thus it skips adding loop metadata even if the
remainder loop will be only partially unrolled.

To avoid further confusion here, this patch removes 2. check-all
continues to pass for my build. If 2 actually is useful, please advise
so we can create a test that covers that usage.

Near 2, this patch retains the `UnrollRemainder` guard on the
`setLoopAlreadyUnrolled` call, which adds `llvm.loop.unroll.disable` to
the remainder loop. That behavior exists both before and after
7cd826a321d9. The logic appears to be that remainder loop unrolling
(whether complete or partial) is opt-in. That is, unless
`UnrollRemainder` is true, `UnrollRuntimeLoopRemainder` skips running
remainder loop unrolling, and `llvm.loop.unroll.disable` suppresses any
later attempt at it.

This patch also extends testing of remainder loop followup metadata to
be sure remainder loop partial unrolling is handled correctly by 1.
2025-10-30 10:57:27 -04:00
paperchalice
249883d0c5
[test][Transforms] Remove unsafe-fp-math uses part 2 (NFC) (#164786)
Post cleanup for #164534.
2025-10-23 20:31:31 +08:00
Nikita Popov
573ca36753
[IR] Replace alignment argument with attribute on masked intrinsics (#163802)
The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter`
intrinsics currently accept a separate alignment immarg. Replace this
with an `align` attribute on the pointer / vector of pointers argument.

This is the standard representation for alignment information on
intrinsics, and is already used by all other memory intrinsics. This
means the signatures now match llvm.expandload, llvm.vp.load, etc.
(Things like llvm.memcpy used to have a separate alignment argument as
well, but were already migrated a long time ago.)

It's worth noting that the masked.gather and masked.scatter intrinsics
previously accepted a zero alignment to indicate the ABI type alignment
of the element type. This special case is gone now: If the align
attribute is omitted, the implied alignment is 1, as usual. If ABI
alignment is desired, it needs to be explicitly emitted (which the
IRBuilder API already requires anyway).
2025-10-20 08:50:09 +00:00
Florian Hahn
2d027260b0
[SCEV] Collect guard info for ICMP NE w/o constants. (#160500)
When collecting information from loop guards, use UMax(1, %b - %a) for
ICMP NE %a, %b, if neither is constant.
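A quick exhaustive check over 8-bit unsigned values (an illustration, not SCEV code; the helper names are hypothetical) of why the rewrite is sound: when %a != %b, the wrapping difference %b - %a is nonzero, so it is already at least 1 and UMax(1, %b - %a) equals %b - %a.

```c
#include <stdbool.h>
#include <stdint.h>

static uint8_t umax8(uint8_t x, uint8_t y) { return x > y ? x : y; }

/* Returns true if, for every pair a != b, UMax(1, b - a) == b - a, with
   b - a computed modulo 2^8 like a wrapping IR sub. */
static bool umax_rewrite_holds(void) {
    for (unsigned a = 0; a <= UINT8_MAX; ++a) {
        for (unsigned b = 0; b <= UINT8_MAX; ++b) {
            if (a == b)
                continue;               /* only the ICMP NE case is guarded */
            uint8_t diff = (uint8_t)(b - a);
            if (umax8(1, diff) != diff)
                return false;
        }
    }
    return true;
}
```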

This improves results in some cases, and will be even more useful
together with
 * https://github.com/llvm/llvm-project/pull/160012
 * https://github.com/llvm/llvm-project/pull/159942

https://alive2.llvm.org/ce/z/YyBvoT

PR: https://github.com/llvm/llvm-project/pull/160500
2025-10-14 14:20:34 +00:00
Joel E. Denny
6d44b9082e
[LoopUnroll] Skip remainder loop guard if skip unrolled loop (#156549)
The original loop (OL) that serves as input to LoopUnroll has basic
blocks that are arranged as follows:

```
OLPreHeader
OLHeader <-.
...        |
OLLatch ---'
OLExit
```

In this depiction, every block has an implicit edge to the next block
below, so any explicit edge indicates a conditional branch.

Given OL and unroll count N, LoopUnroll sometimes creates an unrolled
loop (UL) with a remainder loop (RL) epilogue arranged like this:

```
,-- ULGuard
|   ULPreHeader
|   ULHeader <-.
|   ...        |
|   ULLatch ---'
|   ULExit
`-> RLGuard -----.
    RLPreHeader  |
,-> RLHeader     |
|   ...          |
`-- RLLatch      |
    RLExit       |
    OLExit <-----'
```

Each UL iteration executes N OL iterations, but each RL iteration
executes 1 OL iteration. ULGuard or RLGuard checks whether the first
iteration of UL or RL should execute, respectively. If so, ULLatch or
RLLatch checks whether to execute each subsequent iteration.

Once reached, OL always executes its first iteration but not necessarily
the next N-1 iterations. Thus, ULGuard is always required before the
first UL iteration. However, when control flows from ULGuard directly to
RLGuard, the first OL iteration has yet to execute, so RLGuard is then
redundant before the first RL iteration.

Thus, this patch makes the following changes:
- Adjust ULGuard to branch to RLPreHeader instead of RLGuard, thus
eliminating RLGuard's unnecessary branch instruction for that path.
- Eliminate the creation of RLGuard phi node poison values. Without this
patch, RLGuard has such a phi node for each value that is defined by any
OL iteration and used in OLExit. The poison value is required where
ULGuard is the predecessor. The poison value indicates that control flow
from ULGuard to RLGuard to Exit has no counterpart in OL because the
first OL iteration must execute either in UL or RL.
- Simplify the CFG by not splitting ULExit and RLGuard because, without
the ULGuard predecessor, the single block can now be a dedicated UL
exit.
- To RLPreHeader, add an `llvm.assume` call that asserts the RL trip
count is non-zero. Without this patch, RLPreHeader is reachable only
when RLGuard guarantees that assertion is true. With this patch, RLGuard
guarantees it only when RLGuard is the predecessor, and the OL structure
guarantees it when ULGuard is the predecessor. If RL itself is unrolled
later, this guarantee somehow prevents ScalarEvolution from giving up
when trying to compute a maximum trip count for RL. That maximum trip
count enables the branch instruction in the final unrolled instance of
RLLatch to be eliminated. Without the `llvm.assume` call, some existing
unroll tests start to fail because that instruction is not eliminated.
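A minimal C model of the control flow described above. It assumes, for simplicity, that ULGuard tests `tc >= N` and that the remainder count is `tc % N`; the function name and the counting body are hypothetical. The only path from ULGuard straight to RLPreHeader is taken when `tc < N`, where the remainder count equals `tc` and is therefore nonzero, so the `assert` plays the role of the added `llvm.assume`.

```c
#include <assert.h>

/* Returns how many times the original loop body runs. Precondition:
   tc >= 1, because OL is bottom-tested and always runs its first iteration. */
static unsigned run_with_epilogue(unsigned tc, unsigned N) {
    unsigned executed = 0;
    unsigned rem = tc % N;              /* iterations left for RL */
    if (tc >= N) {                      /* ULGuard */
        unsigned ul_iters = tc / N;
        do {                            /* ULHeader ... ULLatch */
            executed += N;              /* each UL iteration runs N OL bodies */
        } while (--ul_iters != 0);
        if (rem == 0)                   /* RLGuard, reached only from ULExit */
            return executed;
    }
    /* RLPreHeader. When reached directly from ULGuard, tc < N, so
       rem == tc >= 1 and no extra guard is needed on that path. */
    assert(rem != 0);                   /* mirrors the llvm.assume */
    do {                                /* RLHeader ... RLLatch */
        executed += 1;
    } while (--rem != 0);
    return executed;
}
```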

The original motivation for this patch is to facilitate later patches
that fix LoopUnroll's computation of branch weights so that they
maintain the block frequency of OL's body (see #135812). Specifically,
this patch ensures RLGuard's branch weights do not affect RL's
contribution to the block frequency of OL's body in the case that
ULGuard skips UL.
2025-10-07 10:45:49 -04:00
Joel E. Denny
afb262855e
[LoopPeel] Fix branch weights' effect on block frequencies (#128785)

This patch implements the LoopPeel changes discussed in [[RFC] Fix Loop
Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).

In summary, a loop's latch block can have branch weight metadata that
encodes an estimated trip count that is derived from application profile
data. Initially, the loop body's block frequencies agree with the
estimated trip count, as expected. However, sometimes loop
transformations adjust those branch weights in a way that correctly
maintains the estimated trip count but that corrupts the block
frequencies. This patch addresses that problem in LoopPeel, which it
changes to:

- Maintain branch weights consistently with the original loop for the
sake of preserving the total frequency of the original loop body.
- Store the new estimated trip count in the
`llvm.loop.estimated_trip_count` metadata, introduced by PR #148758.
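A simple numeric model of why reusing the original latch probability keeps the body's total frequency intact. It assumes a single-latch, bottom-tested loop entered once, with a fixed backedge probability `p`, and that every peeled exit test uses that same `p`; this is an illustration under those assumptions, not LoopPeel's code, and the function names are hypothetical.

```c
/* Original loop: the body runs 1 + p + p^2 + ... = 1 / (1 - p) times on
   average, for 0 <= p < 1. */
static double original_body_frequency(double p) {
    return 1.0 / (1.0 - p);
}

/* After peeling k iterations: peeled copy i is reached with probability
   p^i, and the remaining loop is entered with probability p^k and then
   behaves like the original loop. */
static double peeled_body_frequency(double p, unsigned k) {
    double total = 0.0;
    double reach = 1.0;                /* probability of reaching this point */
    for (unsigned i = 0; i < k; ++i) {
        total += reach;                /* peeled copy i */
        reach *= p;                    /* take the "continue" edge */
    }
    return total + reach / (1.0 - p);  /* remaining loop's body */
}
```

For `p = 0.9` and `k = 3` both functions give 10 (up to rounding): 1 + 0.9 + 0.81 for the peeled copies plus 0.729 / 0.1 for the remaining loop.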
2025-10-02 16:07:55 +00:00
Florian Hahn
3c4f611791
[LoopPeel] Add test with branch that can be simplified with guards.
Add test where a branch can be removed after peeling by applying info
from loop guards. It unfortunately requires running IndVars first, to
strengthen flags of the induction.
2025-09-24 11:51:55 +01:00
Florian Hahn
8693ef16f6
[SCEV] Add tests that benefit from rewriting SCEVAddExpr with guards.
Add additional tests benefiting from rewriting existing SCEVAddExprs with
guards.
2025-09-20 19:24:19 +01:00
Florian Hahn
3ea089ba19
[AArch64] Enable RT and partial unrolling with reductions for Apple CPUs. (#149699)
Update unrolling preferences for Apple Silicon CPUs to enable partial
unrolling and runtime unrolling for small loops with reductions.

This builds on top of unroller changes to introduce parallel reduction
phis, if possible: https://github.com/llvm/llvm-project/pull/149470.

PR: https://github.com/llvm/llvm-project/pull/149699
2025-09-09 13:23:30 +00:00
Florian Hahn
2d9e452ab0
[LoopUnroll] Introduce parallel reduction phis when unrolling. (#149470)
When partially or runtime unrolling loops with reductions, currently the
reductions are performed in-order in the loop, negating most benefits
from unrolling such loops.

This patch extends unrolling code-gen to keep a parallel reduction phi
per unrolled iteration and combine the final result after the loop.
For out-of-order CPUs, this allows executing multiple reduction chains
in parallel.
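A source-level C sketch of the effect for an integer reduction with an unroll factor of 4 (hypothetical function names; the pass itself works on IR phis, not C). Unsigned addition wraps modulo 2^32 and is associative, so reordering the additions into four independent chains does not change the result.

```c
#include <stddef.h>
#include <stdint.h>

/* In-order integer sum: each addition depends on the previous one. */
static uint32_t sum_in_order(const uint32_t *x, size_t n) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc += x[i];
    return acc;
}

/* Unrolled by 4 with one accumulator per unrolled copy, giving four
   independent addition chains. The partial sums are combined after the
   loop, and a scalar remainder loop handles the last n % 4 elements. */
static uint32_t sum_parallel4(const uint32_t *x, size_t n) {
    uint32_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += x[i];
        acc1 += x[i + 1];
        acc2 += x[i + 2];
        acc3 += x[i + 3];
    }
    uint32_t acc = (acc0 + acc1) + (acc2 + acc3);
    for (; i < n; ++i)                 /* remainder */
        acc += x[i];
    return acc;
}
```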

For now, the initial transformation is restricted to cases where we
unroll a small number of iterations (hard-coded to 4, but should maybe
be capped by TTI depending on the execution units), to avoid introducing
an excessive amount of parallel phis.

It also requires single block loops for now, where the unrolled
iterations are known to not exit the loop (either due to runtime
unrolling or partial unrolling). This ensures that the unrolled loop
will have a single basic block, with a single exit block where we can
place the final reduction value computation.

The initial implementation also only supports parallelizing loops with a
single reduction and only integer reductions. Those restrictions are
just to keep the initial implementation simpler, and can easily be
lifted as follow-ups.

With corresponding TTI changes to the AArch64 unrolling preferences,
which I will also share soon, this triggers in ~300 loops across a wide
range of workloads, including LLVM itself, ffmpeg, av1aom, sqlite,
blender, brotli, zstd and more.

PR: https://github.com/llvm/llvm-project/pull/149470
2025-09-04 20:54:09 +01:00
Ryotaro Kasuga
2330fd2f73
[LoopPeel] Add new option to peeling loops to convert PHI into IV (#121104)
LoopPeel currently considers PHI nodes that become loop invariants
through peeling. However, in some cases, peeling transforms PHI nodes
into induction variables (IVs), potentially enabling further
optimizations such as loop vectorization. For example:

```c
// TSVC s292
int im = N-1;
for (int i=0; i<N; i++) {
  a[i] = b[i] + b[im];
  im = i;
}
```

In this case, peeling one iteration converts `im` into an IV, allowing
it to be handled by the loop vectorizer.
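A C-level sketch of the peeled form of s292 (the function names are hypothetical; the pass operates on IR). Once the first iteration is peeled, `im` always equals `i - 1` inside the loop, i.e. it becomes an induction variable, which is the form the loop vectorizer can handle.

```c
/* Original TSVC s292 loop: `im` is a PHI that holds n - 1 on the first
   iteration and i - 1 afterwards. */
static void s292(int n, float *a, const float *b) {
    int im = n - 1;
    for (int i = 0; i < n; i++) {
        a[i] = b[i] + b[im];
        im = i;
    }
}

/* First iteration peeled: inside the loop `im` is replaced by i - 1. */
static void s292_peeled(int n, float *a, const float *b) {
    if (n <= 0)
        return;
    a[0] = b[0] + b[n - 1];            /* peeled iteration i = 0 */
    for (int i = 1; i < n; i++)
        a[i] = b[i] + b[i - 1];
}
```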

This patch adds a new feature to peel loops when doing so converts PHIs
into IVs. At the moment this feature is disabled by default.

Enabling it allows the above example to be vectorized. I have measured on
neoverse-v2 and observed a speedup of more than 60% (options: `-O3
-ffast-math -mcpu=neoverse-v2 -mllvm -enable-peeling-for-iv`).

This PR is taken over from #94900
Related #81851
2025-08-20 13:44:56 +00:00
Ahmad Yasin
1f2fb8e979
[AArch64] Tune unrolling prefs for more patterns on Apple CPUs (#149358)
Enhance the heuristics in `getAppleRuntimeUnrollPreferences` to allow a
few more loops to be unrolled.

Specifically, this patch adjusts two checks:
I. Tune the loop size budget from 8 to 10
II. Include immediate in-loop users of loaded values in the load/store
dependency predicate

---------

Co-authored-by: Florian Hahn <flo@fhahn.com>

PR: https://github.com/llvm/llvm-project/pull/149358
2025-08-13 11:16:54 +01:00
Florian Hahn
d10dc67fc3
[LoopUnroll] Add additional reduction unroll tests for #149470.
Add additional tests from https://github.com/llvm/llvm-project/pull/149470.
2025-08-01 15:06:22 +01:00
Ramkumar Ramachandra
fd175fafa6
[RISCV] Adjust unroll prefs for loops with vectors (#151525)
Adjust the unrolling preferences to unroll hand-vectorized code, as well
as the scalar remainder of a vectorized loop. Inspired by a similar
effort in AArch64: see #147420 and #151164.
2025-07-31 21:11:56 +01:00
Joel E. Denny
37e03b56b8
Revert "[PGO] Add llvm.loop.estimated_trip_count metadata" (#151585)
Reverts llvm/llvm-project#148758

[As
requested.](https://github.com/llvm/llvm-project/pull/148758#pullrequestreview-3076627201)
2025-07-31 15:56:31 -04:00
Joel E. Denny
f7b65011de
[PGO] Add llvm.loop.estimated_trip_count metadata (#148758)
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.

An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
2025-07-31 12:28:25 -04:00
John Brawn
9a9b8b7d1c
[AArch64] Allow unrolling of scalar epilogue loops (#151164)
#147420 changed the unrolling preferences to permit unrolling of
non-auto-vectorized loops by checking for the isvectorized attribute.
However, when a loop is vectorized, this attribute is put on both the
vector loop and the scalar epilogue, so this change prevented the scalar
epilogue from being unrolled.

Restore the previous behaviour of unrolling the scalar epilogue by
checking both for the isvectorized attribute and vector instructions in
the loop.
2025-07-31 11:03:41 +01:00
Florian Hahn
f9f68af4b8
[SCEV] Make sure LCSSA is preserved when re-using phi if needed.
If we insert a new add instruction, it may introduce a new use outside
the loop that contains the phi node we re-use. Use fixupLCSSAFormFor to
fix LCSSA form, if needed.

This fixes a crash reported in
https://github.com/llvm/llvm-project/pull/147824#issuecomment-3124670997.
2025-07-28 16:24:46 +01:00
Florian Hahn
90f733ce6e
[LoopUnroll] Add tests for unrolling loops with reductions.
Add tests for unrolling loops with reductions. In some cases, multiple
parallel reduction phis could be retained to improve performance.
2025-07-18 07:39:28 +01:00
Ahmad Yasin
671072e830
[AArch64] Unrolling of loops with vector instructions. (#147420)
This patch permits loops with vector instructions to be unrolled.

Today there is an early exit in `getUnrollingPreferences()` of AArch64
targets if a vector instruction is observed in any of the loop blocks.
This patch fixes that so common loops like this one get a chance to be
unrolled:

```c
#include <arm_neon.h>

void saxpy(float *dst, const float *src, const float a, const int len) {
    float32x4_t *vdst = (float32x4_t *)dst;
    const float32x4_t *vsrc = (const float32x4_t *)src;
    float32x4_t vk = vdupq_n_f32(a);
    /* Processes len / 4 groups of four floats; a len % 4 tail is not handled. */
    for (int i = 0; i < (len >> 2); i++) {
        vdst[i] = vaddq_f32(vdst[i], vmulq_f32(vsrc[i], vk));
    }
}
```

Auto-vectorized loops are still not unrolled, unless they were not
interleaved when vectorized.

The provided test case shows the enhancement on top of runtime/partial
unrolling, depending on the CPU.

PR: https://github.com/llvm/llvm-project/pull/147420
2025-07-14 20:53:09 +01:00
macurtis-amd
cff4a00d3f
AMDGPU: Fix runtime unrolling when cascaded GEPs present (#147700)
Cascaded GEPs (i.e., GEPs of GEPs) are not handled when determining
whether it is OK to runtime-unroll loops.

This change simply uses `getUnderlyingObjects` to look through cascaded
GEPs.
2025-07-10 03:44:04 -05:00
Philip Reames
bb288de4e0
[LoopPeel] Support last iteration peeling of min/max intrinsics (#143598)
This isn't terribly useful at the moment because of the step=1
restriction but it should be functionally sound. This is mostly just
making sure the codepaths don't diverge as we make other changes.
2025-06-17 11:22:23 -07:00
Philip Reames
719e7bea8a [LoopPeel] Add tests for last iteration peeling of min/max intrinsics 2025-06-10 13:08:36 -07:00
Philip Reames
4e706adc5e [LoopPeel] Add test coverage for edge case for peel last
Add coverage for two cases:
1) Handling of the two-transition edge case with equality conditions,
   when the last iteration is both the first and second transition.
2) The need to handle inverted predicates.
2025-06-10 11:46:06 -07:00
Florian Hahn
e5ff7055be
[LoopPeel] Use loop guards when checking if last iter can be peeled. (#142605)
Apply loop guards to BTC before checking if the last iteration should be
peeled off. This also adds an assert to make sure applying the guards
does not pessimize the results. I checked on a large test set and it did
not trigger there, but it adds an additional guard to catch potential
cases where loop-guards pessimize results.

Peels ~15% more loops.

PR: https://github.com/llvm/llvm-project/pull/142605
2025-06-10 08:29:42 +01:00
Yingwei Zheng
4eac8daa38
[LoopPeel] Handle non-local instructions/arguments when updating exiting values (#142993)
Similar to
7e14161f49,
the exiting value may be a non-local instruction or an argument.

Closes https://github.com/llvm/llvm-project/issues/142895.
2025-06-06 12:56:28 +08:00
Florian Hahn
3a8b48862a
[LoopPeel] Add tests for peeling last iteration with loop guards.
Add additional test coverage for peeling the last iteration where
information from loop guards is needed.
2025-06-03 14:29:44 +01:00
Florian Hahn
f98bdd94e6
Reapply "[LoopPeel] Remove known trip count restriction when peeling last. (#140792)"
This reverts commit 580454526b936f7a576ddbc9bb932cf9be376ec4.

The recommitted version contains an extra check to not peel if the
latch exit is controlled by a pointer induction.

Original message:
Remove the restriction that the loop must be known to execute at least 2
iterations when peeling the last iteration. If we cannot prove at least
2 iterations are executed, a check and branch to skip the peeled loop is
inserted.
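A C-level sketch of peeling the last iteration when the trip count `n` is only known to be at least 1 (the function names and the loop body are hypothetical; the pass operates on IR). The `n >= 2` test corresponds to the inserted check and branch that skips the remaining loop when it would run zero times.

```c
/* Hypothetical body whose behaviour differs on the last iteration. */
static void body(int *a, int i, int n) {
    a[i] = (i == n - 1) ? -1 : i;
}

/* Original loop; assumes n >= 1. */
static void original(int *a, int n) {
    for (int i = 0; i < n; i++)
        body(a, i, n);
}

/* Last iteration peeled. Inside the remaining loop i < n - 1, so the
   `i == n - 1` test in the body is always false and has been folded. */
static void peeled_last(int *a, int n) {
    if (n >= 2) {                      /* inserted check */
        for (int i = 0; i < n - 1; i++)
            a[i] = i;
    }
    body(a, n - 1, n);                 /* peeled final iteration */
}
```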

PR: https://github.com/llvm/llvm-project/pull/140792
2025-05-28 13:02:03 +01:00
Florian Hahn
f0f666bc32
[LoopPeel] Add peeling tests with debug value and pointer inductions
Adds extra test coverage for https://github.com/llvm/llvm-project/pull/140792.
2025-05-28 10:07:02 +01:00
Florian Hahn
580454526b
Revert "[LoopPeel] Remove known trip count restriction when peeling last. (#140792)"
This reverts commit 24b97756decb7bf0e26dcf0e30a7a9aaf27f417c.
Also reverts ac9a466e39bf97ffeab127982aa7c405cb257551.

Building CMake triggers a crash with the patch, revert while I
investigate.
2025-05-27 21:25:32 +01:00
Florian Hahn
ac9a466e39
[LoopPeel] Insert new phis before first non-PHI when peeling last iter.
Make sure the new phis are inserted before any non-phi instructions.
This fixes a crash when dbg_value instructions are present in the
original exit block.
2025-05-27 10:46:28 +01:00