llvm-project

Author	SHA1	Message	Date
Yingwei Zheng	4eac8daa38	[LoopPeel] Handle non-local instructions/arguments when updating exiting values (#142993 ) Similar to `7e14161f49`, the exiting value may be a non-local instruction or an argument. Closes https://github.com/llvm/llvm-project/issues/142895.	2025-06-06 12:56:28 +08:00
Florian Hahn	3a8b48862a	[LoopPeel] Add tests for peeling last iteration with loop guards. Add additional test coverage for peeling the last iteration where information from loop guards is needed.	2025-06-03 14:29:44 +01:00
Florian Hahn	f98bdd94e6	Reapply "[LoopPeel] Remove known trip count restriction when peeling last. (#140792 )" This reverts commit 580454526b936f7a576ddbc9bb932cf9be376ec4. The recommitted version contains an extra check to not peel if the latch exit is controlled by a pointer induction. Original message: Remove the restriction that the loop must be known to execute at least 2 iterations when peeling the last iteration. If we cannot prove at least 2 iterations are executed, a check and branch to skip the peeled loop is inserted. PR: https://github.com/llvm/llvm-project/pull/140792	2025-05-28 13:02:03 +01:00
Florian Hahn	f0f666bc32	[LoopPeel] Add peeling tests with debug value and pointer inductions Adds extra test coverage for https://github.com/llvm/llvm-project/pull/140792.	2025-05-28 10:07:02 +01:00
Florian Hahn	580454526b	Revert "[LoopPeel] Remove known trip count restriction when peeling last. (#140792 )" This reverts commit 24b97756decb7bf0e26dcf0e30a7a9aaf27f417c. Also reverts ac9a466e39bf97ffeab127982aa7c405cb257551. Building CMake triggers a crash with the patch, revert while I investigate.	2025-05-27 21:25:32 +01:00
Florian Hahn	ac9a466e39	[LoopPeel] Insert new phis before first non-PHI when peeling last iter. Make sure the new phis are inserted before any non-phi instructions. This fixes a crash when dbg_value instructions are present in the original exit block.	2025-05-27 10:46:28 +01:00
Florian Hahn	24b97756de	[LoopPeel] Remove known trip count restriction when peeling last. (#140792 ) Remove the restriction that the loop must be known to execute at least 2 iterations when peeling the last iteration. If we cannot prove at least 2 iterations are executed, a check and branch to skip the peeled loop is inserted. PR: https://github.com/llvm/llvm-project/pull/140792	2025-05-26 20:08:02 +01:00
Florian Hahn	3c9812eeea	[LoopPeel] Add tests for peeling last iteration with multiple exits.	2025-05-23 15:46:34 +01:00
Florian Hahn	4f869e0f5c	[LoopPeel] Add test for peeling last iteration with non-trivial BTC. Additional test to https://github.com/llvm/llvm-project/pull/140792 with different SCEV expansion costs.	2025-05-21 22:28:26 +01:00
Florian Hahn	705e27c234	[LoopPeel] Add tests for peeling from end with variable trip counts. Add more test coverage for peeling the last iteration with variable trip counts. Separate test cases for constant and variable trip counts in different files.	2025-05-20 21:07:21 +01:00
Florian Hahn	a0a2a1e095	[LoopPeel] Make sure exit condition has a single use when peeling last. Update the check in canPeelLastIteration to make sure the exiting condition has a single use. When peeling the last iteration, we adjust the condition in the loop body to be true one iteration early, which would be incorrect for other users. Fixes https://github.com/llvm/llvm-project/issues/140444.	2025-05-18 11:47:12 +01:00
Florian Hahn	7e14161f49	[LoopPeel] Handle constants when updating exit values when peeling last. Account for constant values when updating exit values after peeling an iteration from the end. This can happen if the inner loop gets unrolled and simplified. Fixes https://github.com/llvm/llvm-project/issues/140442.	2025-05-18 10:17:21 +01:00
Florian Hahn	3fcfce4c5e	Reapply "[LoopPeel] Implement initial peeling off the last loop iteration. (#139551 )" This reverts the revert commit bf92b127d2637948f53d11a187e865aa10e2e74c. This adds missing initialization of PeelLast in gatherPeelingPreferences. Original message: Generalize countToEliminateCompares to also consider peeling off the last iteration if it eliminates a compare. At the moment, codegen for peeling off the last iteration is quite restrictive and callers have to make sure that the exit condition can be adjusted when peeling and that the loop executes at least 2 iterations. Both will be relaxed in follow-ups. PR: https://github.com/llvm/llvm-project/pull/139551	2025-05-17 10:51:05 +01:00
Florian Hahn	bf92b127d2	Revert "[LoopPeel] Implement initial peeling off the last loop iteration. (#139551 )" This reverts commit bb10c3ba7f77d40a7fbfd4ac815015d3a4ae476a. Also reverts 4f663cca15f2b53c2bc6a84d1b1f5bd81679356d: Revert "[LoopPeel] Make sure PeelLast is always initialized." Revert for now to bring msan bots back to green https://lab.llvm.org/buildbot/#/builders/164/builds/9992 https://lab.llvm.org/buildbot/#/builders/94/builds/7158	2025-05-16 08:33:12 +01:00
Florian Hahn	bb10c3ba7f	[LoopPeel] Implement initial peeling off the last loop iteration. (#139551 ) Generalize countToEliminateCompares to also consider peeling off the last iteration if it eliminates a compare. At the moment, codegen for peeling off the last iteration is quite restrictive and callers have to make sure that the exit condition can be adjusted when peeling and that the loop executes at least 2 iterations. Both will be relaxed in follow-ups. PR: https://github.com/llvm/llvm-project/pull/139551	2025-05-15 19:15:48 +01:00
Florian Hahn	310ed2b070	[LoopUnroll] Add tests with multiple exiting/latches and small BTCs. Extra test coverage for cases mentioned during review of https://github.com/llvm/llvm-project/pull/139551.	2025-05-15 12:54:00 +01:00
Florian Hahn	d39ca81fdd	[LoopPeel] Add initial tests for peeling the last iteration. Precommit tests for upcoming PR.	2025-05-12 14:56:21 +01:00
Matt Arsenault	9bdd9dc895	AMDGPU: Mark workitem ID intrinsics with range attribute (#136196 ) This avoids the need to have special handling at every use site. Unfortunately, this means we unnecessarily emit AssertZext in the DAG (where we already directly understand the range of the intrinsic), and we regress in undefined cases as we don't fold out asserts on undef.	2025-04-18 12:27:38 +02:00
Sirish Pande	7f107c3019	[IndVarsSimplify] sinkUnusedInvariants is skipping instructions while sinking. (#135205 ) While sinking loop-invariant instructions from the preheader to the exit block, we were skipping instructions because the instruction iterator was decremented twice.	2025-04-17 19:21:18 -05:00
Yingwei Zheng	7e5317139d	[PowerPC] Pre-commit tests for PR130742. NFC. (#135606 ) Needed by https://github.com/llvm/llvm-project/pull/130742.	2025-04-17 17:52:49 +08:00
Björn Pettersson	092b6e73e6	[InstCombine] Handle "add like" in ADD+GEP->GEP+GEP rewrites (#135156 ) Considering that "or disjoint" is the canonical form for certain add operations, I think we want to support such "add like" operations when doing ADD+GEP->GEP+GEP rewrites to make things more consistent. The problem was found when improving ValueTracking, which turned an ADD into an OR, and then optimizations suddenly got worse because these rewrites no longer triggered.	2025-04-14 17:11:13 +02:00
David Sherwood	712c21336f	[AArch64] Enable unrolling for small multi-exit loops (#131998 ) It can be highly beneficial to unroll small, two-block search loops that look for a value in an array. An example of this would be something that uses std::find to find a value in libc++. Older versions of std::find in the libstdc++ headers are manually unrolled in the source code, but this might change in newer releases where the compiler is expected to either vectorise or unroll itself.	2025-04-09 10:34:27 +01:00
Florian Hahn	a4573ee38d	[LoopUnroll] UnrollRuntimeMultiExit takes precedence over TTI. (#134259 ) Update UnrollRuntimeLoopRemainder to always give priority to the UnrollRuntimeMultiExit option, if provided. After ad9da92cf6f7357 (https://github.com/llvm/llvm-project/pull/124462), we would ignore the option if the backend indicates multi-exit is profitable. This means it cannot be used to disable runtime unrolling. To be consistent with canProfitablyRuntimeUnrollMultiExitLoop, always respect the option. This surfaced while discussing https://github.com/llvm/llvm-project/pull/131998. PR: https://github.com/llvm/llvm-project/pull/134259	2025-04-04 10:16:50 +01:00
David Sherwood	aaf398c2e7	[AArch64] Regenerate apple-unrolling-multi-exit.ll test checks (#134257 )	2025-04-04 09:03:49 +01:00
Yingwei Zheng	c5a491e9ea	[SCEV] Check whether the start is non-zero in `ScalarEvolution::howFarToZero` (#131522 ) https://github.com/llvm/llvm-project/pull/94525 assumes that the loop will be infinite when the stride is zero. However, it doesn't hold when the start value of addrec is also zero. Closes https://github.com/llvm/llvm-project/issues/131465.	2025-03-17 13:59:16 +08:00
Jeremy Morse	792a6f8119	[RemoveDIs] Remove "try-debuginfo-iterators..." test flags (#130298 ) These date back to when the non-intrinsic format of variable locations was still being tested and was behind a compile-time flag, so not all builds / bots would correctly run them. The solution at the time, to get at least some test coverage, was to have tests opt-in to non-intrinsic debug-info if it was built into LLVM. Nowadays, non-intrinsic format is the default and has been on for more than a year, there's no need for this flag to exist. (I've downgraded the flag from "try" to explicitly requesting non-intrinsic format in some places, so that we can deal with tests that are explicitly about non-intrinsic format in their own commit).	2025-03-14 15:50:49 +00:00
Florian Hahn	46a13a5b17	[AArch64] Runtime-unroll small multi-exit loops on Apple Silicon. (#124751 ) Extend unrolling preferences to allow more aggressive unrolling of search loops with 2 exits, building on the TTI hook added in `ad9da92cf6`. In combination with `eac23a5b97` this enables unrolling loops like std::find, which can improve performance significantly (+15% end-to-end on a workload that makes heavy use of std::find). It increases the total number of unrolled loops by ~2.5% across a very large corpus of workloads. For SPEC2017, 1.6% more loops are unrolled and the following workloads increase in size (`__text`, base → patch): 500.perlbench_r 1682884.00 → 1694104.00 (+0.7%); 523.xalancbmk_r 3001716.00 → 3003832.00 (+0.1%). PR: https://github.com/llvm/llvm-project/pull/124751	2025-02-27 14:42:45 +00:00
Nikita Popov	29441e4f5f	[IR] Convert from nocapture to captures(none) (#123181 ) This PR removes the old `nocapture` attribute, replacing it with the new `captures` attribute introduced in #116990. This change is intended to be essentially NFC, replacing existing uses of `nocapture` with `captures(none)` without adding any new analysis capabilities. Making use of non-`none` values is left for a followup. Some notes: * `nocapture` will be upgraded to `captures(none)` by the bitcode reader. * `nocapture` will also be upgraded by the textual IR reader. This is to make it easier to use old IR files and somewhat reduce the test churn in this PR. * Helper APIs like `doesNotCapture()` will check for `captures(none)`. * MLIR import will convert `captures(none)` into an `llvm.nocapture` attribute. The representation in the LLVM IR dialect should be updated separately.	2025-01-29 16:56:47 +01:00
Florian Hahn	3007f31e74	[LoopUnroll] Add AArch64 tests for multi-exit loop unrolling. Test coverage to https://github.com/llvm/llvm-project/pull/124751.	2025-01-28 14:25:27 +00:00
Florian Hahn	d486b76823	[AArch64] Unroll some loops with early-continues on Apple Silicon. (#118499 ) Try to runtime-unroll loops with early-continues depending on loop-varying loads; this helps with branch prediction for the early-continues and can significantly improve performance for such loops. Builds on top of https://github.com/llvm/llvm-project/pull/118317. PR: https://github.com/llvm/llvm-project/pull/118499.	2024-12-22 13:10:54 +00:00
Vladi Krapp	f8d270474c	[ARM] Reduce loop unroll when low overhead branching is available (#120065 ) For processors with low overhead branching (LOB), runtime unrolling the innermost loop is often detrimental to performance. In these cases the loop remainder gets unrolled into a series of compare-and-jump blocks, which in deeply nested loops get executed multiple times, negating the benefits of LOB. This is particularly noticeable when the loop trip count of the innermost loop varies within the outer loop, such as in the case of triangular matrix decompositions. In these cases we will prefer to not unroll the innermost loop, with the intention for it to be executed as a low overhead loop.	2024-12-18 10:10:51 +00:00
Florian Hahn	0bb7bd4b4e	[AArch64] Runtime-unroll small load/store loops for Apple Silicon CPUs. (#118317 ) Add initial heuristics to selectively enable runtime unrolling for loops where doing so is expected to be highly beneficial on Apple Silicon CPUs. To start with, we try to runtime-unroll small, single-block loops, if they have load/store dependencies, to expose more parallel memory access streams [1] and to improve instruction delivery [2]. We also explicitly avoid runtime-unrolling for loop structures that may limit the expected gains from runtime unrolling. Such loops include loops with complex control flow (loops that aren't innermost, have multiple exits, or have a large number of blocks), loops whose trip count expansion is expensive, and loops expected to execute a small number of iterations. Note that the heuristics here may be overly conservative; we err on the side of avoiding runtime unrolling rather than unrolling excessively. They are all subject to further refinement. Across a large set of workloads, this increases the total number of unrolled loops by 2.9%. [1] 4.6.10 in Apple Silicon CPU Optimization Guide [2] 4.4.4 in Apple Silicon CPU Optimization Guide Depends on https://github.com/llvm/llvm-project/pull/118316 for TTI changes. PR: https://github.com/llvm/llvm-project/pull/118317	2024-12-09 14:28:31 +00:00
VladiKrapp-Arm	bb3eb0ca0c	[ARM] Test unroll behaviour on machines with low overhead branching (#118692 ) Add test for existing loop unroll behaviour. Currently, the single loop with fmul gets runtime-unrolled by a count of 4, with the loop remainder unrolled into the 3 for.body9.us.prol sections. This is quite a lot of compare-and-branch code, negating the benefits of the low overhead loop mechanism.	2024-12-06 15:04:56 +00:00
Nikita Popov	f7685af4a5	[InstCombine] Move gep of phi fold into separate function This makes sure that an early return during this fold doesn't end up skipping later gep folds.	2024-12-05 15:20:56 +01:00
Nikita Popov	462cb3cd6c	[InstCombine] Infer nusw + nneg -> nuw for getelementptr (#111144 ) If the gep is nusw (usually via inbounds) and the offset is non-negative, we can infer nuw. Proof: https://alive2.llvm.org/ce/z/ihztLy	2024-12-05 14:36:40 +01:00
Florian Hahn	21d27b3aab	[LoopUnroll] Add tests for loop unrolling on Apple platforms. Add first set of tests where runtime unrolling can be highly beneficial on Apple Silicon CPUs.	2024-12-02 15:48:48 +00:00
Lee Wei	abb9f9fa06	[llvm] Remove `br i1 undef` from some regression tests [NFC] (#117112 ) This PR removes `br i1 undef` from tests under `llvm/tests/Transforms/Loop, Lower`.	2024-11-21 08:06:56 +00:00
Stephen Tozer	92e0fb0c94	[DebugInfo][LoopUnroll] Preserve DebugLocs on optimized cond branches (#114225 ) This patch fixes a simple error where, as part of loop unrolling, we optimize conditional loop-exiting branches into unconditional branches when we know that they will or won't exit the loop, but do not propagate the source location of the original branch to the new one. Found using https://github.com/llvm/llvm-project/pull/107279.	2024-11-08 16:52:30 +00:00
Yingwei Zheng	0b9f1cc024	[SCEV] Disallow simplifying phi(undef, X) to X (#115109 ) See the following case: ``` @GlobIntONE = global i32 0, align 4 define ptr @src() { entry: br label %for.body.peel.begin for.body.peel.begin: ; preds = %entry br label %for.body.peel for.body.peel: ; preds = %for.body.peel.begin br i1 true, label %cleanup.peel, label %cleanup.loopexit.peel cleanup.loopexit.peel: ; preds = %for.body.peel br label %cleanup.peel cleanup.peel: ; preds = %cleanup.loopexit.peel, %for.body.peel %retval.2.peel = phi ptr [ undef, %for.body.peel ], [ @GlobIntONE, %cleanup.loopexit.peel ] br i1 true, label %for.body.peel.next, label %cleanup7 for.body.peel.next: ; preds = %cleanup.peel br label %for.body.peel.next1 for.body.peel.next1: ; preds = %for.body.peel.next br label %entry.peel.newph entry.peel.newph: ; preds = %for.body.peel.next1 br label %for.body for.body: ; preds = %cleanup, %entry.peel.newph %retval.0 = phi ptr [ %retval.2.peel, %entry.peel.newph ], [ %retval.2, %cleanup ] br i1 false, label %cleanup, label %cleanup.loopexit cleanup.loopexit: ; preds = %for.body br label %cleanup cleanup: ; preds = %cleanup.loopexit, %for.body %retval.2 = phi ptr [ %retval.0, %for.body ], [ @GlobIntONE, %cleanup.loopexit ] br i1 false, label %for.body, label %cleanup7.loopexit cleanup7.loopexit: ; preds = %cleanup %retval.2.lcssa.ph = phi ptr [ %retval.2, %cleanup ] br label %cleanup7 cleanup7: ; preds = %cleanup7.loopexit, %cleanup.peel %retval.2.lcssa = phi ptr [ %retval.2.peel, %cleanup.peel ], [ %retval.2.lcssa.ph, %cleanup7.loopexit ] ret ptr %retval.2.lcssa } define ptr @tgt() { entry: br label %for.body.peel.begin for.body.peel.begin: ; preds = %entry br label %for.body.peel for.body.peel: ; preds = %for.body.peel.begin br i1 true, label %cleanup.peel, label %cleanup.loopexit.peel cleanup.loopexit.peel: ; preds = %for.body.peel br label %cleanup.peel cleanup.peel: ; preds = %cleanup.loopexit.peel, %for.body.peel %retval.2.peel = phi ptr [ undef, %for.body.peel ], [ @GlobIntONE, %cleanup.loopexit.peel ] br i1 true, label %for.body.peel.next, label %cleanup7 for.body.peel.next: ; preds = %cleanup.peel br label %for.body.peel.next1 for.body.peel.next1: ; preds = %for.body.peel.next br label %entry.peel.newph entry.peel.newph: ; preds = %for.body.peel.next1 br label %for.body for.body: ; preds = %cleanup, %entry.peel.newph br i1 false, label %cleanup, label %cleanup.loopexit cleanup.loopexit: ; preds = %for.body br label %cleanup cleanup: ; preds = %cleanup.loopexit, %for.body br i1 false, label %for.body, label %cleanup7.loopexit cleanup7.loopexit: ; preds = %cleanup %retval.2.lcssa.ph = phi ptr [ %retval.2.peel, %cleanup ] br label %cleanup7 cleanup7: ; preds = %cleanup7.loopexit, %cleanup.peel %retval.2.lcssa = phi ptr [ %retval.2.peel, %cleanup.peel ], [ %retval.2.lcssa.ph, %cleanup7.loopexit ] ret ptr %retval.2.lcssa } ``` 1. `simplifyInstruction(%retval.2.peel)` returns `@GlobIntONE`. Thus, `ScalarEvolution::createNodeForPHI` returns SCEV expr `@GlobIntONE` for `%retval.2.peel`. 2. `SimplifyIndvar::replaceIVUserWithLoopInvariant` tries to replace the use of `%retval.2.peel` in `%retval.2.lcssa.ph` with `@GlobIntONE`. 3. `simplifyLoopAfterUnroll -> simplifyLoopIVs -> SCEVExpander::expand` reuses `%retval.2.peel = phi ptr [ undef, %for.body.peel ], [ @GlobIntONE, %cleanup.loopexit.peel ]` to generate code for `@GlobIntONE`. It is incorrect. This patch disallows simplifying `phi(undef, X)` to `X` by setting `CanUseUndef` to false. Closes https://github.com/llvm/llvm-project/issues/114879.	2024-11-07 15:53:51 +08:00
Paul Walker	38fffa630e	[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548 )	2024-11-06 11:53:33 +00:00
Florian Hahn	2f7ccaf4a8	[SCEV] Add predicate in SolveLinEq to ensure B is a multiple of A. (#108777 ) This can help in cases where pointer alignment info is missing, e.g. https://github.com/llvm/llvm-project/pull/108210 The predicate is formed for the complex expression that's passed to SolveLinEquationWithOverflow and the checks could probably be pushed closer to the root nodes, which in some cases may be cheaper to check. PR: https://github.com/llvm/llvm-project/pull/108777	2024-09-28 14:19:57 +01:00
Nikita Popov	5bcc82d433	[LoopPeel] Fix LCSSA phi node invalidation In the test case, the BECount of the second loop uses %load, but we only have an LCSSA phi node for %add, so that is what gets invalidated. Use the forgetLcssaPhiWithNewPredecessor() API instead, which invalidates the roots of the expression. Fixes https://github.com/llvm/llvm-project/issues/109333.	2024-09-20 17:01:41 +02:00
Nikita Popov	4ec4ac15ed	[SCEVExpander] Fix addrec cost model (#106704 ) The current isHighCostExpansion cost model for addrecs computes the cost for some kind of polynomial expansion that does not appear to have any relation to addrec expansion whatsoever. A literal expansion of an affine addrec is a phi and add (plus the expansion of start and step). For a non-affine addrec, we get another phi+add for each additional addrec nested in the step recurrence. This partially `fixes` https://github.com/llvm/llvm-project/issues/53205 (the runtime unroll test case in this PR).	2024-09-19 09:39:35 +02:00
Ganesh	02e4186d0b	[X86] AMD Zen 5 Initial enablement (#107964 ) This patch adds basic skeleton enablement for AMD next-gen Zen 5 CPUs.	2024-09-13 17:45:33 +01:00
Nikita Popov	52b879594f	[LoopUnroll] Avoid undef values in test (NFC) Avoid most of the code being optimized away as a result of optimization improvements.	2024-09-03 12:10:29 +02:00
Nikita Popov	fe1a1eee2f	[Tests] Regenerate test checks (NFC)	2024-09-03 11:42:47 +02:00
Nikita Popov	9edd998e10	[LoopUnroll] Add test for #53205 (NFC)	2024-08-29 16:43:56 +02:00
Nikita Popov	fe182ddf1f	[LoopUnrollAnalyzer] Use constant folding API for loads Use ConstantFoldLoadFromConst() instead of a partial re-implementation. This makes the code slightly more generic by not depending on the exact structure of the constant.	2024-08-28 11:53:25 +02:00
James Y Knight	dfeb3991fb	Remove the `x86_mmx` IR type. (#98505 ) It is now translated to `<1 x i64>`, which allows the removal of a bunch of special casing. This _incompatibly_ changes the ABI of any LLVM IR function with `x86_mmx` arguments or returns: instead of passing in mmx registers, they will now be passed via integer registers. However, the real-world incompatibility caused by this is expected to be minimal, because Clang never uses the x86_mmx type -- it lowers `__m64` to either `<1 x i64>` or `double`, depending on ABI. This change does _not_ eliminate the SelectionDAG `MVT::x86mmx` type. That type simply no longer corresponds to an IR type, and is used only by MMX intrinsics and inline-asm operands. Because SelectionDAGBuilder only knows how to generate the operands/results of intrinsics based on the IR type, it thus now generates the intrinsics with the type MVT::v1i64, instead of MVT::x86mmx. We need to fix this before the DAG LegalizeTypes, and thus have the X86 backend fix them up in DAGCombine. (This may be a short-lived hack, if all the MMX intrinsics can be removed in upcoming changes.) Works towards issue #98272.	2024-07-25 09:19:22 -04:00
v01dXYZ	cff8d716bd	[SCEV] forgetValue: support (with-overflow-inst op0, op1) (#98015 ) The use-def walk in forgetValue() was skipping instructions with non-SCEVable types. However, SCEV may look past with.overflow intrinsics returning aggregates. Fixes #97586.	2024-07-09 09:14:33 +02:00

648 Commits