llvm-project

Author	SHA1	Message	Date
Mingjie Xu	159f1c048e	[IR] Optimize PHINode::removeIncomingValue() by swapping removed incoming value with the last incoming value. (#171963 ) Current implementation uses `std::copy` to shift all incoming values after the removed index. This patch optimizes `PHINode::removeIncomingValue()` by replacing the linear shift of incoming values with a swap-with-last strategy. After this change, the relative order of incoming values after removal is not preserved. This improves compile-time for PHI nodes with many predecessors. Depends: https://github.com/llvm/llvm-project/pull/171955 https://github.com/llvm/llvm-project/pull/171956 https://github.com/llvm/llvm-project/pull/171960 https://github.com/llvm/llvm-project/pull/171962	2025-12-17 19:44:01 +08:00
Craig Topper	ef21740781	[LoopPeel] Check for onlyAccessesInaccessibleMemory instead of llvm.assume in peelToTurnInvariantLoadsDereferenceable. (#171910 ) onlyAccessesInaccessibleMemory can't alias with a load. This allows us to ignore more intrinsics than llvm.assume. Follow up from #171547	2025-12-12 10:45:41 -08:00
Craig Topper	ccc3835ffa	[LoopPeel] Ignore assume intrinsics for the mayWriteToMemory check in peelToTurnInvariantLoadsDereferenceable. (#171547 ) llvm.assume intrinsics have the mayWriteToMemory property, but won't prevent the load from becoming dereferenceable.	2025-12-10 13:14:19 -08:00
Pengcheng Wang	a0b6638c85	[RISCV] Don't unroll vectorized loops with vector operands (#171089 ) We have disabled unrolling for vectorized loops in #151525 but this PR only checked the instruction type. For some loops, there is no instruction with vector type but they are still vector operations (just like the memset zero test in the precommit test). Here we check the operands as well to cover these cases.	2025-12-09 12:42:41 +08:00
Pengcheng Wang	893479adcc	[RISCV] Precommit test for unrolling loops with vector operands	2025-12-09 11:51:33 +08:00
Florian Hahn	7470d721c6	[AArch64] Add isAppleMLike helper to check for M cores and aligned CPUs. (#170553 ) Add a new isAppleMLike helper, that returns true if the core is part of the Apple M core family or Apple A14 or later. Used to apply cost decisions consistently to those groups of cores. The function is now a single place to update when new cores are added. It also makes sure we apply unrolling decisions for newer Apple cores to Apple A17. PR: https://github.com/llvm/llvm-project/pull/170553	2025-12-05 20:05:29 +00:00
Florian Hahn	c5e6f4e99d	[AArch64] Add unrolling test with -mcpu=apple-a17. Currently Apple unrolling preferences are not applied to apple-a17.	2025-12-03 20:15:58 +00:00
Philip Reames	c752bb9203	[IndVars] Strengthen inference of samesign flags (#170363 ) When reviewing another change, I noticed that we were failing to infer samesign for two cases: 1) an unsigned comparison, and 2) when both arguments were known negative. Using CVP and InstCombine as a reference, we need to be careful to not allow eq/ne comparisons. I'm a bit unclear on the why of that, and for now am going with the low risk change. I may return to investigate that in a follow up. Compile time results look like noise to me, see: https://llvm-compile-time-tracker.com/compare.php?from=49a978712893fcf9e5f40ac488315d029cf15d3d&to=2ddb263604fd7d538e09dc1f805ebc30eb3ffab0&stat=instructions:u	2025-12-03 16:16:22 +00:00
Philip Reames	49a9787128	[SCEV] Regenerate a subset of auto updated tests Reducing spurious diff in an upcoming change.	2025-12-02 12:16:53 -08:00
Julian Nagele	b641509637	[LoopUnroll] Introduce parallel accumulators when unrolling FP reductions. (#166630 ) This is building on top of https://github.com/llvm/llvm-project/pull/149470, also introducing parallel accumulator PHIs when the reduction is for floating points, provided we have the reassoc flag. See also https://github.com/llvm/llvm-project/pull/166353, which aims to introduce parallel accumulators for reductions with vector instructions.	2025-11-27 15:03:36 +00:00
Julian Nagele	c73de9777e	[IVDescriptors] Support detecting reductions with vector instructions. (#166353 ) In combination with https://github.com/llvm/llvm-project/pull/149470 this will introduce parallel accumulators when unrolling reductions with vector instructions. See also https://github.com/llvm/llvm-project/pull/166630, which aims to introduce parallel accumulators for FP reductions.	2025-11-24 11:12:06 +00:00
Joel E. Denny	21fedcbf89	[LoopPeel] Fix BFI when peeling last iteration without guard (#168250 ) LoopPeel sometimes proves that, when reached, the original loop always executes at least two iterations. LoopPeel then unconditionally executes both the remaining loop's initial iteration and the peeled final iteration. But that increases the latter's frequency above its frequency in the original loop. To maintain the total frequency, this patch compensates by decreasing the remaining loop's latch probability. This is another step in issue #135812 and was discussed at <https://github.com/llvm/llvm-project/pull/166858#discussion_r2528968542>.	2025-11-20 10:45:53 -05:00
Vladi Krapp	42a1184e42	[AArch64] Allow forcing unrolling of small loops (#167488 ) - Introduce the -aarch64-force-unroll-threshold option; when a loop’s cost is below this value we set UP.Force = true (default 0 keeps current behaviour) - Add an AArch64 loop-unroll regression test that runs once at the default threshold and once with the flag raised, confirming forced unrolling	2025-11-17 08:59:44 +00:00
Mircea Trofin	358e9a56af	[LP] Assign weights when peeling last iteration. (#166858 )	2025-11-15 10:01:04 -08:00
Joel E. Denny	1aa86ca521	[LoopUnroll] Fix division by zero (#166258 ) PR #159163's probability computation for epilogue loops does not handle the possibility of an original loop probability of one. Runtime loop unrolling does not make sense for such an infinite loop, and a division by zero results. This patch works around that case. Issue #165998.	2025-11-04 12:49:33 -05:00
Ivan Kelarev	37825ad4f6	[LoopUnroll] Prevent LoopFullUnrollPass from performing partial unrolling when trip counts are unknown (#165013 ) Currently, `LoopFullUnrollPass` incorrectly performs partial unrolling when `#pragma unroll` is specified and both `TripCount` and `MaxTripCount` are unknown. This patch adds a check to prevent partial unrolling when `OnlyFullUnroll` parameter is true and both trip count values are zero.	2025-11-04 09:20:01 -08:00
Joel E. Denny	bb9bd5f263	[LoopUnroll] Fix assert fail on zeroed branch weights (#165938 ) BranchProbability fails an assert when its denominator is zero. Reported at <https://github.com/llvm/llvm-project/pull/159163#pullrequestreview-3406318423>.	2025-11-03 10:19:12 -05:00
Joel E. Denny	cc8ff73fba	[LoopUnroll] Fix block frequencies for epilogue (#159163 ) As another step in issue #135812, this patch fixes block frequencies for partial loop unrolling with an epilogue remainder loop. It does not fully handle the case when the epilogue loop itself is unrolled. That will be handled in the next patch. For the guard and latch of each of the unrolled loop and epilogue loop, this patch sets branch weights derived directly from the original loop latch branch weights. The total frequency of the original loop body, summed across all its occurrences in the unrolled loop and epilogue loop, is the same as in the original loop. This patch also sets `llvm.loop.estimated_trip_count` for the epilogue loop instead of relying on the epilogue's latch branch weights to imply it. This patch fixes branch weights in tests that PR #157754 adversely affected.	2025-10-31 11:01:42 -04:00
Joel E. Denny	24557cce40	[LoopUnroll] Fix block frequencies when no runtime (#157754 ) This patch implements the LoopUnroll changes discussed in [[RFC] Fix Loop Transformations to Preserve Block Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785) and is thus another step in addressing issue #135812. In summary, for the case of partial loop unrolling without a remainder loop, this patch changes LoopUnroll to: - Maintain branch weights consistently with the original loop for the sake of preserving the total frequency of the original loop body. - Store the new estimated trip count in the `llvm.loop.estimated_trip_count` metadata, introduced by PR #148758. - Correct the new estimated trip count (e.g., 3 instead of 2) when the original estimated trip count (e.g., 10) divided by the unroll count (e.g., 4) leaves a remainder (e.g., 2). There are loop unrolling cases this patch does not fully fix, such as partial unrolling with a remainder loop and complete unrolling, and there are two associated tests whose branch weights this patch adversely affects. They will be addressed in future patches that should land with this patch.	2025-10-31 10:44:27 -04:00
Joel E. Denny	8d186e2195	[LoopUnroll][NFCI] Clean up remainder followup metadata handling (#165272 ) Followup metadata for remainder loops is handled by two implementations, both added by 7244852557ca6: 1. `tryToUnrollLoop` in `LoopUnrollPass.cpp`. 2. `CloneLoopBlocks` in `LoopUnrollRuntime.cpp`. As far as I can tell, 2 is useless: I added `assert(!NewLoopID)` for the `NewLoopID` returned by the `makeFollowupLoopID` call, and it never fails throughout check-all for my build. Moreover, if 2 were useful, it appears it would have a bug caused by 7cd826a321d9. That commit skips adding loop metadata to a new remainder loop if the remainder loop itself is to be completely unrolled because it will then no longer be a loop. However, that commit incorrectly assumes that `UnrollRemainder` dictates complete unrolling of a remainder loop, and thus it skips adding loop metadata even if the remainder loop will be only partially unrolled. To avoid further confusion here, this patch removes 2. check-all continues to pass for my build. If 2 actually is useful, please advise so we can create a test that covers that usage. Near 2, this patch retains the `UnrollRemainder` guard on the `setLoopAlreadyUnrolled` call, which adds `llvm.loop.unroll.disable` to the remainder loop. That behavior exists both before and after 7cd826a321d9. The logic appears to be that remainder loop unrolling (whether complete or partial) is opt-in. That is, unless `UnrollRemainder` is true, `UnrollRuntimeLoopRemainder` skips running remainder loop unrolling, and `llvm.loop.unroll.disable` suppresses any later attempt at it. This patch also extends testing of remainder loop followup metadata to be sure remainder loop partial unrolling is handled correctly by 1.	2025-10-30 10:57:27 -04:00
paperchalice	249883d0c5	[test][Transforms] Remove unsafe-fp-math uses part 2 (NFC) (#164786 ) Post cleanup for #164534.	2025-10-23 20:31:31 +08:00
Nikita Popov	573ca36753	[IR] Replace alignment argument with attribute on masked intrinsics (#163802 ) The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter` intrinsics currently accept a separate alignment immarg. Replace this with an `align` attribute on the pointer / vector of pointers argument. This is the standard representation for alignment information on intrinsics, and is already used by all other memory intrinsics. This means the signatures now match llvm.expandload, llvm.vp.load, etc. (Things like llvm.memcpy used to have a separate alignment argument as well, but were already migrated a long time ago.) It's worth noting that the masked.gather and masked.scatter intrinsics previously accepted a zero alignment to indicate the ABI type alignment of the element type. This special case is gone now: If the align attribute is omitted, the implied alignment is 1, as usual. If ABI alignment is desired, it needs to be explicitly emitted (which the IRBuilder API already requires anyway).	2025-10-20 08:50:09 +00:00
Florian Hahn	2d027260b0	[SCEV] Collect guard info for ICMP NE w/o constants. (#160500 ) When collecting information from loop guards, use UMax(1, %b - %a) for ICMP NE %a, %b, if neither are constant. This improves results in some cases, and will be even more useful together with * https://github.com/llvm/llvm-project/pull/160012 * https://github.com/llvm/llvm-project/pull/159942 https://alive2.llvm.org/ce/z/YyBvoT PR: https://github.com/llvm/llvm-project/pull/160500	2025-10-14 14:20:34 +00:00
Joel E. Denny	6d44b9082e	[LoopUnroll] Skip remainder loop guard if skip unrolled loop (#156549 ) The original loop (OL) that serves as input to LoopUnroll has basic blocks that are arranged as follows:
```
OLPreHeader
OLHeader <-.
...        |
OLLatch ---'
OLExit
```
In this depiction, every block has an implicit edge to the next block below, so any explicit edge indicates a conditional branch. Given OL and unroll count N, LoopUnroll sometimes creates an unrolled loop (UL) with a remainder loop (RL) epilogue arranged like this:
```
,-- ULGuard
|   ULPreHeader
|   ULHeader <-.
|   ...        |
|   ULLatch ---'
|   ULExit
`-> RLGuard -----.
    RLPreHeader  |
,-> RLHeader     |
|   ...          |
`-- RLLatch      |
    RLExit       |
    OLExit <-----'
```
Each UL iteration executes N OL iterations, but each RL iteration executes 1 OL iteration. ULGuard or RLGuard checks whether the first iteration of UL or RL should execute, respectively. If so, ULLatch or RLLatch checks whether to execute each subsequent iteration. Once reached, OL always executes its first iteration but not necessarily the next N-1 iterations. Thus, ULGuard is always required before the first UL iteration. However, when control flows from ULGuard directly to RLGuard, the first OL iteration has yet to execute, so RLGuard is then redundant before the first RL iteration. Thus, this patch makes the following changes:
- Adjust ULGuard to branch to RLPreHeader instead of RLGuard, thus eliminating RLGuard's unnecessary branch instruction for that path.
- Eliminate the creation of RLGuard phi node poison values. Without this patch, RLGuard has such a phi node for each value that is defined by any OL iteration and used in OLExit. The poison value is required where ULGuard is the predecessor. The poison value indicates that control flow from ULGuard to RLGuard to Exit has no counterpart in OL because the first OL iteration must execute either in UL or RL.
- Simplify the CFG by not splitting ULExit and RLGuard because, without the ULGuard predecessor, the single block can now be a dedicated UL exit.
- To RLPreHeader, add an `llvm.assume` call that asserts the RL trip count is non-zero. Without this patch, RLPreHeader is reachable only when RLGuard guarantees that assertion is true. With this patch, RLGuard guarantees it only when RLGuard is the predecessor, and the OL structure guarantees it when ULGuard is the predecessor. If RL itself is unrolled later, this guarantee somehow prevents ScalarEvolution from giving up when trying to compute a maximum trip count for RL. That maximum trip count enables the branch instruction in the final unrolled instance of RLLatch to be eliminated. Without the `llvm.assume` call, some existing unroll tests start to fail because that instruction is not eliminated.
The original motivation for this patch is to facilitate later patches that fix LoopUnroll's computation of branch weights so that they maintain the block frequency of OL's body (see #135812). Specifically, this patch ensures RLGuard's branch weights do not affect RL's contribution to the block frequency of OL's body in the case that ULGuard skips UL.	2025-10-07 10:45:49 -04:00
Joel E. Denny	afb262855e	[LoopPeel] Fix branch weights' effect on block frequencies (#128785 ) This patch implements the LoopPeel changes discussed in [[RFC] Fix Loop Transformations to Preserve Block Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785). In summary, a loop's latch block can have branch weight metadata that encodes an estimated trip count that is derived from application profile data. Initially, the loop body's block frequencies agree with the estimated trip count, as expected. However, sometimes loop transformations adjust those branch weights in a way that correctly maintains the estimated trip count but that corrupts the block frequencies. This patch addresses that problem in LoopPeel, which it changes to: - Maintain branch weights consistently with the original loop for the sake of preserving the total frequency of the original loop body. - Store the new estimated trip count in the `llvm.loop.estimated_trip_count` metadata, introduced by PR #148758.	2025-10-02 16:07:55 +00:00
Florian Hahn	3c4f611791	[LoopPeel] Add test with branch that can be simplified with guards. Add test where a branch can be removed after peeling by applying info from loop guards. It unfortunately requires running IndVars first, to strengthen flags of the induction.	2025-09-24 11:51:55 +01:00
Florian Hahn	8693ef16f6	[SCEV] Add tests that benefit from rewriting SCEVAddExpr with guards. Add additional tests benefiting from rewriting existing SCEVAddExprs with guards.	2025-09-20 19:24:19 +01:00
Florian Hahn	3ea089ba19	[AArch64] Enable RT and partial unrolling with reductions for Apple CPUs. (#149699 ) Update unrolling preferences for Apple Silicon CPUs to enable partial unrolling and runtime unrolling for small loops with reductions. This builds on top of unroller changes to introduce parallel reduction phis, if possible: https://github.com/llvm/llvm-project/pull/149470. PR: https://github.com/llvm/llvm-project/pull/149699	2025-09-09 13:23:30 +00:00
Florian Hahn	2d9e452ab0	[LoopUnroll] Introduce parallel reduction phis when unrolling. (#149470 ) When partially or runtime unrolling loops with reductions, currently the reductions are performed in-order in the loop, negating most benefits from unrolling such loops. This patch extends unrolling code-gen to keep a parallel reduction phi per unrolled iteration and combining the final result after the loop. For out-of-order CPUs, this allows executing multiple reduction chains in parallel. For now, the initial transformation is restricted to cases where we unroll a small number of iterations (hard-coded to 4, but should maybe be capped by TTI depending on the execution units), to avoid introducing an excessive amount of parallel phis. It also requires single block loops for now, where the unrolled iterations are known to not exit the loop (either due to runtime unrolling or partial unrolling). This ensures that the unrolled loop will have a single basic block, with a single exit block where we can place the final reduction value computation. The initial implementation also only supports parallelizing loops with a single reduction and only integer reductions. Those restrictions are just to keep the initial implementation simpler, and can easily be lifted as follow-ups. With corresponding TTI to the AArch64 unrolling preferences which I will also share soon, this triggers in ~300 loops across a wide range of workloads, including LLVM itself, ffmpeg, av1aom, sqlite, blender, brotli, zstd and more. PR: https://github.com/llvm/llvm-project/pull/149470	2025-09-04 20:54:09 +01:00
Ryotaro Kasuga	2330fd2f73	[LoopPeel] Add new option to peeling loops to convert PHI into IV (#121104 ) LoopPeel currently considers PHI nodes that become loop invariants through peeling. However, in some cases, peeling transforms PHI nodes into induction variables (IVs), potentially enabling further optimizations such as loop vectorization. For example:
```c
// TSVC s292
int im = N-1;
for (int i=0; i<N; i++) {
  a[i] = b[i] + b[im];
  im = i;
}
```
In this case, peeling one iteration converts `im` into an IV, allowing it to be handled by the loop vectorizer. This patch adds a new feature that peels loops in order to convert PHIs into IVs. At the moment this feature is disabled by default. Enabling it allows the above example to be vectorized. I have measured on neoverse-v2 and observed a speedup of more than 60% (options: `-O3 -ffast-math -mcpu=neoverse-v2 -mllvm -enable-peeling-for-iv`). This PR is taken over from #94900. Related: #81851	2025-08-20 13:44:56 +00:00
Ahmad Yasin	1f2fb8e979	[AArch64] Tune unrolling prefs for more patterns on Apple CPUs (#149358 ) Enhance the heuristics in `getAppleRuntimeUnrollPreferences` to let a few more loops be unrolled. Specifically, this patch adjusts two checks: I. Tune the loop size budget from 8 to 10 II. Include immediate in-loop users of loaded values in the load/stores dependencies predicate --------- Co-authored-by: Florian Hahn <flo@fhahn.com> PR: https://github.com/llvm/llvm-project/pull/149358	2025-08-13 11:16:54 +01:00
Florian Hahn	d10dc67fc3	[LoopUnroll] Add additional reduction unroll tests for #149470 . Add additional tests from https://github.com/llvm/llvm-project/pull/149470.	2025-08-01 15:06:22 +01:00
Ramkumar Ramachandra	fd175fafa6	[RISCV] Adjust unroll prefs for loops with vectors (#151525 ) Adjust the unrolling preferences to unroll hand-vectorized code, as well as the scalar remainder of a vectorized loop. Inspired by a similar effort in AArch64: see #147420 and #151164.	2025-07-31 21:11:56 +01:00
Joel E. Denny	37e03b56b8	Revert "[PGO] Add `llvm.loop.estimated_trip_count` metadata" (#151585 ) Reverts llvm/llvm-project#148758 [As requested.](https://github.com/llvm/llvm-project/pull/148758#pullrequestreview-3076627201)	2025-07-31 15:56:31 -04:00
Joel E. Denny	f7b65011de	[PGO] Add `llvm.loop.estimated_trip_count` metadata (#148758 ) This patch implements the `llvm.loop.estimated_trip_count` metadata discussed in [[RFC] Fix Loop Transformations to Preserve Block Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785). As [suggested in the RFC comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4), it adds the new metadata to all loops at the time of profile ingestion and estimates each trip count from the loop's `branch_weights` metadata. As [suggested in the PR #128785 review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036), it does so via a new `PGOEstimateTripCountsPass` pass, which creates the new metadata for each loop but omits the value if it cannot estimate a trip count due to the loop's form. An important observation not previously discussed is that `PGOEstimateTripCountsPass` often cannot estimate a loop's trip count, but later passes can sometimes transform the loop in a way that makes it possible. Currently, such passes do not necessarily update the metadata, but eventually that should be fixed. Until then, if the new metadata has no value, `llvm::getLoopEstimatedTripCount` disregards it and tries again to estimate the trip count from the loop's current `branch_weights` metadata.	2025-07-31 12:28:25 -04:00
John Brawn	9a9b8b7d1c	[AArch64] Allow unrolling of scalar epilogue loops (#151164 ) #147420 changed the unrolling preferences to permit unrolling of non-auto vectorized loops by checking for the isvectorized attribute, however when a loop is vectorized this attribute is put on both the vector loop and the scalar epilogue, so this change prevented the scalar epilogue from being unrolled. Restore the previous behaviour of unrolling the scalar epilogue by checking both for the isvectorized attribute and vector instructions in the loop.	2025-07-31 11:03:41 +01:00
Florian Hahn	f9f68af4b8	[SCEV] Make sure LCSSA is preserved when re-using phi if needed. If we insert a new add instruction, it may introduce a new use outside the loop that contains the phi node we re-use. Use fixupLCSSAFormFor to fix LCSSA form, if needed. This fixes a crash reported in https://github.com/llvm/llvm-project/pull/147824#issuecomment-3124670997.	2025-07-28 16:24:46 +01:00
Florian Hahn	90f733ce6e	[LoopUnroll] Add tests for unrolling loops with reductions. Add tests for unrolling loops with reductions. In some cases, multiple parallel reduction phis could be retained to improve performance.	2025-07-18 07:39:28 +01:00
Ahmad Yasin	671072e830	[AArch64] Unrolling of loops with vector instructions. (#147420 ) This patch permits loops with vector instructions to be unrolled. Today there is an early exit in `getUnrollingPreferences()` of AArch64 targets if a vector instruction is observed in any of the loop blocks. This patch fixes that so common loops like this one get a chance to be unrolled: void saxpy (float * dst, const float * src, const float a, const int len) { float32x4_t * vdst = (float32x4_t *)dst; float32x4_t * vsrc = (float32x4_t *)src; float32x4_t vk = vdupq_n_f32(a); for (int i = 0; i < (len >> 2); i++) { vdst[i] = vaddq_f32(vdst[i], vmulq_f32(vsrc[i], vk)); } } Auto-vectorized loops are still not unrolled, unless they were not interleaved when vectorized. The provided test case shows the enhancement on top of runtime/partial unrolling, depending on the CPU. PR: https://github.com/llvm/llvm-project/pull/147420	2025-07-14 20:53:09 +01:00
macurtis-amd	cff4a00d3f	AMDGPU: Fix runtime unrolling when cascaded GEPs present (#147700 ) Cascaded GEPs (i.e. GEP of GEP) are not handled when determining if it is ok to runtime unroll loops. This change simply uses `getUnderlyingObjects` to look through cascaded GEPs.	2025-07-10 03:44:04 -05:00
Philip Reames	bb288de4e0	[LoopPeel] Support last iteration peeling of min/max intrinsics (#143598 ) This isn't terribly useful at the moment because of the step=1 restriction but it should be functionally sound. This is mostly just making sure the codepaths don't diverge as we make other changes.	2025-06-17 11:22:23 -07:00
Philip Reames	719e7bea8a	[LoopPeel] Add tests for last iteration peeling of min/max intrinsics	2025-06-10 13:08:36 -07:00
Philip Reames	4e706adc5e	[LoopPeel] Add test coverage for edge case for peel last Add coverage for two cases: 1) Handling of the two transition edge case with equality conditions when last iteration is both first and second transition. 2) Need to handle inverted predicates	2025-06-10 11:46:06 -07:00
Florian Hahn	e5ff7055be	[LoopPeel] Use loop guards when checking if last iter can be peeled. (#142605 ) Apply loop guards to BTC before checking if the last iteration should be peeled off. This also adds an assert to make sure applying the guards does not pessimize the results. I checked on a large test set and it did not trigger there, but it adds an additional guard to catch potential cases where loop-guards pessimize results. Peels ~15% more loops. PR: https://github.com/llvm/llvm-project/pull/142605	2025-06-10 08:29:42 +01:00
Yingwei Zheng	4eac8daa38	[LoopPeel] Handle non-local instructions/arguments when updating exiting values (#142993 ) Similar to `7e14161f49`, the exiting value may be a non-local instruction or an argument. Closes https://github.com/llvm/llvm-project/issues/142895.	2025-06-06 12:56:28 +08:00
Florian Hahn	3a8b48862a	[LoopPeel] Add tests for peeling last iteration with loop guards. Add additional test coverage for peeling the last iteration where information from loop guards is needed.	2025-06-03 14:29:44 +01:00
Florian Hahn	f98bdd94e6	Reapply "[LoopPeel] Remove known trip count restriction when peeling last. (#140792 )" This reverts commit 580454526b936f7a576ddbc9bb932cf9be376ec4. The recommitted version contains an extra check to not peel if the latch exit is controlled by a pointer induction. Original message: Remove the restriction that the loop must be known to execute at least 2 iterations when peeling the last iteration. If we cannot prove at least 2 iterations are executed, a check and branch to skip the peeled loop is inserted. PR: https://github.com/llvm/llvm-project/pull/140792	2025-05-28 13:02:03 +01:00
Florian Hahn	f0f666bc32	[LoopPeel] Add peeling tests with debug value and pointer inductions Adds extra test coverage for https://github.com/llvm/llvm-project/pull/140792.	2025-05-28 10:07:02 +01:00
Florian Hahn	580454526b	Revert "[LoopPeel] Remove known trip count restriction when peeling last. (#140792 )" This reverts commit 24b97756decb7bf0e26dcf0e30a7a9aaf27f417c. Also reverts ac9a466e39bf97ffeab127982aa7c405cb257551. Building CMake triggers a crash with the patch, revert while I investigate.	2025-05-27 21:25:32 +01:00
Florian Hahn	ac9a466e39	[LoopPeel] Insert new phis before first non-PHI when peeling last iter. Make sure the new phis are inserted before any non-phi instructions. This fixes a crash when dbg_value instructions are present in the original exit block.	2025-05-27 10:46:28 +01:00
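
As a rough illustration of the swap-with-last strategy that 159f1c048e describes for `PHINode::removeIncomingValue()`, here is a minimal sketch. It does not use LLVM's actual data structures or API: `Incoming`, `removeSwapWithLast`, and `removeShift` are hypothetical names introduced only for this example, and a plain `std::vector` stands in for the PHI node's incoming list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one PHI incoming entry: a value name paired with
// the predecessor block it arrives from. This is NOT LLVM's real layout.
struct Incoming {
  std::string Value;
  std::string Block;
};

// Swap-with-last removal: overwrite slot Idx with the last entry, then shrink
// the list by one. Constant time, but the relative order of the remaining
// entries is not preserved.
inline Incoming removeSwapWithLast(std::vector<Incoming> &List,
                                   std::size_t Idx) {
  assert(Idx < List.size() && "index out of range");
  Incoming Removed = std::move(List[Idx]);
  if (Idx + 1 != List.size())
    List[Idx] = std::move(List.back());
  List.pop_back();
  return Removed;
}

// Order-preserving baseline for comparison: erase shifts every later entry
// down by one slot, which is linear in the number of entries after Idx.
inline Incoming removeShift(std::vector<Incoming> &List, std::size_t Idx) {
  assert(Idx < List.size() && "index out of range");
  Incoming Removed = std::move(List[Idx]);
  List.erase(List.begin() + static_cast<std::ptrdiff_t>(Idx));
  return Removed;
}
```

Removing index 1 from the entries `a, b, c, d` leaves `a, d, c` with the swap-with-last version and `a, c, d` with the shifting version, which is the ordering difference the commit message warns about.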

692 Commits