llvm-project

Author	SHA1	Message	Date
Graham Hunter	03f852f704	[AArch64] Improve cost model for legal subvec insert/extract (#81135) Currently we model subvector inserts and extracts as shuffles, potentially going as far as scalarizing. If the types are legal then they can just be simple zip/unzip operations, or possibly even no-ops. Change the cost to a relatively small one to ensure that simple loops featuring such operations between fixed and scalable vector types that are effectively the same at a given SVE width can be unrolled and further optimized.	2024-03-04 16:17:01 +00:00
Vedant Paranjape	e209178d64	[SimplifyIndVar] LCSSA form is destroyed by simplifyLoopIVs, preserve it (#78696) In LoopUnroll, peelLoop is called on the loop. After the loop is peeled it calls simplifyLoopAfterUnroll on the loop. This call to simplifyLoopAfterUnroll doesn't preserve the LCSSA form of the parent loop and thus during the next call to peelLoop the LCSSA form is already broken. LoopPeel util takes in the PreserveLCSSA argument and it passes on the same argument to simplifyLoop which checks if the loop is in a valid LCSSA form, when (PreserveLCSSA = true). This causes an assert in simplifyLoop when (PreserveLCSSA = true), as during the last call LCSSA for the loop wasn't preserved, and thus crashes at the following assert. assert(L->isRecursivelyLCSSAForm(DT, LI) && "Requested to preserve LCSSA, but it's already broken."); Upon debugging, it is evident that simplifyLoopIVs call inside simplifyLoopAfterUnroll breaks the LCSSA form. This patch fixes llvm#77118, it checks if the replacement of IV Users with Loop Invariant preserves the LCSSA form. If it does not, it emits the required LCSSA Phi instructions.	2024-02-21 17:51:56 +05:30
Graham Hunter	ad78e210bd	[NFC][AArch64] Tests for guarding unrolling with scalable vec ins/ext (#81132)	2024-02-19 09:47:49 +00:00
Sergey Kachkov	ffd79b3312	[LoopUnroll] Consider simplified operands while retrieving TTI instruction cost (#70929) Get more precise cost of instruction after LoopUnroll considering that some operands of it can be simplified, e.g. induction variable will be replaced by constant after full unrolling.	2024-02-06 17:01:38 +03:00
modiking	99ddd77ed9	[LoopUnroll] Introduce PragmaUnrollFullMaxIterations as a hard cap on how many iterations we try to unroll (#78648) Fixes [PR77842](https://github.com/llvm/llvm-project/issues/77842) where UBSAN causes pragma full unroll to try and unroll INT_MAX times. This sets a cap to make sure we don't attempt this and crash the compiler. Testing: ninja check-all with new test --------- Co-authored-by: Nikita Popov <github@npopov.com>	2024-02-05 17:01:00 -08:00
Nikita Popov	2d69827c5c	[Transforms] Convert tests to opaque pointers (NFC)	2024-02-05 11:57:34 +01:00
Nikita Popov	62ae7d976f	[LoopUnroll] Fix missing sign extension For integers larger than 64-bit, this would zero-extend a -1 value, instead of sign-extending it. Fixes https://github.com/llvm/llvm-project/issues/80289.	2024-02-01 16:08:25 +01:00
Nikita Popov	f7b05e055f	[LoopUnroll] Add test for #80289 (NFC)	2024-02-01 16:08:25 +01:00
Nikita Popov	90ba33099c	[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882) This patch canonicalizes getelementptr instructions with constant indices to use the `i8` source element type. This makes it easier for optimizations to recognize that two GEPs are identical, because they don't need to see past many different ways to express the same offset. This is a first step towards https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699. This is limited to constant GEPs only for now, as they have a clear canonical form, while we're not yet sure how exactly to deal with variable indices. The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives two representative examples of the kind of optimization improvement we expect from this change. In the first test SimplifyCFG can now realize that all switch branches are actually the same. In the second test it can convert it into simple arithmetic. These are representative of common optimization failures we see in Rust. Fixes https://github.com/llvm/llvm-project/issues/69841.	2024-01-24 15:25:29 +01:00
Yingwei Zheng	b7f50e13d8	[InstCombine] Improve `foldICmpWithDominatingICmp` with DomConditionCache (#75370) This patch uses affected values from DomConditionCache (introduced by #73662), instead of a cheap/incomplete check `getSinglePredecessor`.	2023-12-14 21:02:10 +08:00
XiangZhang	1d6a678591	[LoopUnroll] Make use of MaxTripCount for loops with "#pragma unroll" (#74703) Fix a loop unroll failure caused by branch folding. For example, SimplifyCFG folds loop branches, which then causes loop unrolling to fail for a "#pragma unroll" loop: ``` #pragma unroll for (int I = 0; I < ConstNum; ++I) { // folding "I < ConstNum" and "Cond2" if (Cond2) { break; } xxx loop body; } ``` The pragma unroll metadata only takes effect if there is an exact trip count, but not if there is an upper bound trip count. This patch makes it work with an upper bound trip count as well in shouldPragmaUnroll(). Loop unrolling is important on stack-sensitive devices (e.g. GPUs, which is why a lot of GPU code marks loops with "#pragma unroll"). It usually greatly simplifies the address (offset) calculations in old iterations, which then enables many other optimizations, e.g. SROA, for these simplified addresses (escape alloca the whole aggregates).	2023-12-08 19:43:10 +08:00
Philip Reames	ffb2af3ed6	[SCEVExpander] Attempt to reinfer flags dropped due to CSE (#72431) LSR uses SCEVExpander to generate induction formulas. The expander internally tries to reuse existing IR expressions. To do that, it needs to strip any poison generating flags (nsw, nuw, exact, nneg, etc..) which may not be valid for the newly added users. This is conservatively correct, but has the effect that LSR will strip nneg flags on zext instructions involved in trip counts in loop preheaders. To avoid this, this patch adjusts the expander to reinfer the flags on the CSE candidate if legal for all possible users. This should fix the regression reported in https://github.com/llvm/llvm-project/issues/71200. This should arguably be done inside canReuseInstruction instead, but doing it outside is more conservative compile time wise. Both canReuseInstruction and isGuaranteedNotToBePoison walk operand lists, so right now we are performing work which is roughly O(N^2) in the size of the operand graph. We should fix that before making the per operand step more expensive. My tentative plan is to land this, and then rework the code to sink the logic into more core interfaces.	2023-12-07 13:20:36 -08:00
Nikita Popov	eecb99c5f6	[Tests] Add disjoint flag to some tests (NFC) These tests rely on SCEV recognizing an "or" with no common bits as an "add". Add the disjoint flag to relevant or instructions in preparation for switching SCEV to use the flag instead of the ValueTracking query. The IR with disjoint flag matches what InstCombine would produce.	2023-12-05 14:09:36 +01:00
Joshua Cao	5602636835	[LoopPeel] Peel iterations based on and, or conditions (#73413) For example, this allows us to peel this loop with an `and`: ``` for (int i = 0; i < N; ++i) { if (i % 2 == 0 && i < 3) // can peel based on || as well f1(); f2(); } ``` into: ``` for (int i = 0; i < 3; ++i) { // peel three iterations if (i % 2 == 0) f1(); f2(); } for (int i = 3; i < N; ++i) f2(); ```	2023-12-02 11:24:02 -08:00
Jeremy Morse	d2d9dc8eb4	[DebugInfo][RemoveDIs] Make debugify pass convert to/from RemoveDIs mode (#73251) Debugify is extremely useful as a testing and debugging tool, and a good number of LLVM-IR transform tests use it. We need it to support "new" non-instruction debug-info to get test coverage, but it's not important enough to completely convert right now (and it'd be a large undertaking). Thus: convert to/from dbg.value/DPValue mode on entry and exit of the pass, which gives us the functionality without any further work. The cost is compile-time, but again this is only happening during tests. Tested by: the large set of debugify tests enabled here. Note the InstCombine test (cast-mul-select.ll) that hasn't been fully enabled: this is because there's a debug-info sinking piece of code there that hasn't been instrumented.	2023-11-29 13:19:50 +00:00
Nikita Popov	a86bce6577	[LoopPeel] Regenerate test checks (NFC)	2023-11-28 15:42:07 +01:00
Craig Topper	03d4a9d94d	[InstCombine] Set disjoint flag when turning Add into Or. (#72702) The disjoint flag was recently added to IR in #72583	2023-11-27 12:54:11 -08:00
Jeremy Morse	c672ba7dde	[DebugInfo][RemoveDIs] Instrument inliner for non-instr debug-info (#72884) With intrinsics representing debug-info, we just clone all the intrinsics when inlining a function and don't think about it any further. With non-instruction debug-info however we need to be a bit more careful and manually move the debug-info from one place to another. For the most part, this means keeping a "cursor" during block cloning of where we last copied debug-info from, and performing debug-info copying whenever we successfully clone another instruction. There are several utilities in LLVM for doing this, all of which now need to manually call cloneDebugInfo. The testing story for this is not well covered as we could rely on normal instruction-cloning mechanisms to do all the hard stuff. Thus, I've added a few tests to explicitly test dbg.value behaviours, ahead of them becoming not-instructions.	2023-11-26 21:24:29 +00:00
Jeremy Morse	59fab22642	[DebugInfo][RemoveDIs] Support cloning and remapping DPValues (#72546) This patch adds support for CloneBasicBlock duplicating the DPValues attached to instructions, and adds facilities to remap them into their new context. The plumbing to achieve this is fairly straightforward and mechanical. I've also added illustrative uses to LoopUnrollRuntime, SimpleLoopUnswitch and SimplifyCFG. The former only updates for the epilogue right now so I've added CHECK lines just for the end of an unrolled loop (further updates coming later). SimpleLoopUnswitch had no debug-info tests so I've added a new one. The two modified parts of SimplifyCFG are covered by the two modified SimplifyCFG tests. These are scenarios where we have to do extra cloning for copying of DPValues because they're no longer instructions, and remap them too.	2023-11-24 15:17:32 +00:00
Ramkumar Ramachandra	4c01a58008	update_analyze_test_checks: support output from LAA (#67584) update_analyze_test_checks.py is an invaluable tool in updating tests. Unfortunately, it only supports output from the CostModel, ScalarEvolution, and LoopVectorize analyses. Many LoopAccessAnalysis tests use hand-crafted CHECK lines, and it is moreover tedious to generate these CHECK lines, as the output from the analysis is not stable, and requires the test-writer to hand-craft FileCheck matches. Alleviate this pain, and support output from: $ opt -passes='print<loop-accesses>' This patch includes several non-trivial changes including: - Preserving whitespace at the beginning of the line, so that the LAA output can be properly indented. - Regexes matching the unstable output, which is basically a pointer address hex. - Separating is_analyze from preserve_names clearly, as the former was formerly used as an overload for the latter. To demonstrate the utility of this patch, several tests in LoopAccessAnalysis have been auto-generated by update_analyze_test_checks.py.	2023-10-31 14:33:53 +00:00
Aleksandr Popov	e8d5db206c	[LoopPeeling] Fix weights updating of peeled off branches (#70094) In https://reviews.llvm.org/D64235 a new algorithm has been introduced for updating the branch weights of latch blocks and their copies. It increases the probability of going to the exit block for each next peel iteration, calculating weights by (F - I * E, E), where: - F is a weight of the edge from latch to header. - E is a weight of the edge from latch to exit. - I is a number of peeling iteration. E.g: Let's say the latch branch weights are (100,300) and the estimated trip count is 4. If we peel off all 4 iterations the weights of the copied branches will be: 0: (100,300) 1: (100,200) 2: (100,100) 3: (100,1) https://godbolt.org/z/93KnoEsT6 So we make the original loop almost unreachable from the 3rd peeled copy according to the profile data. But that's only true if the profiling data is accurate. Underestimated trip count can lead to performance issues with the register allocator, which may decide to spill intervals inside the loop assuming it's unreachable. Since we don't know how accurate the profiling data is, it seems better to set neutral 1/1 weights on the last peeled latch branch. After this change, the weights in the example above will look like this: 0: (100,300) 1: (100,200) 2: (100,100) 3: (100,100) Co-authored-by: Aleksandr Popov <apopov@azul.com>	2023-10-31 14:02:42 +01:00
David Green	75b3c3d267	[ARM] Disable UpperBound loop unrolling for MVE tail predicated loops. (#69709) For MVE tail predicated loops, better code can be generated by keeping the loop whole than to unroll to an upper bound, which requires the expansion of active lane masks that can be difficult to generate good code for. This patch disables UpperBound unrolling when we find an active_lane_mask in the loop.	2023-10-31 09:51:30 +00:00
Alex Richardson	e39f6c1844	[opt] Infer DataLayout from triple if not specified There are many tests that specify a target triple/CPU flags but no DataLayout which can lead to IR being generated that has unusual behaviour. This commit attempts to use the default DataLayout based on the relevant flags if there is no explicit override on the command line or in the IR file. One thing that is not currently possible is to differentiate a missing datalayout from an explicit `target datalayout = ""` in the IR file, since the current APIs don't allow detecting this case. If it is considered useful to support this case (instead of passing "-data-layout=" on the command line), I can change IR parsers to track whether they have seen such a directive and change the callback type. Differential Revision: https://reviews.llvm.org/D141060	2023-10-26 12:07:37 -07:00
Alex Richardson	e86d6a43f0	Regenerate test checks for tests affected by D141060	2023-10-04 10:51:35 -07:00
Nikita Popov	4d5525e0cb	[LoopUnroll] Add tests for excessive znver3 unrolls (NFC)	2023-09-28 12:10:38 +02:00
Matthias Braun	b30c9c9378	LoopUnrollRuntime: Add weights to all branches Make sure every conditional branch constructed by `LoopUnrollRuntime` code sets branch weights. - Add new 1:127 weights for the conditional jumps checking whether the whole (unrolled) loop should be skipped in the generated prolog or epilog code. - Remove `updateLatchBranchWeightsForRemainderLoop` function and just add weights immediately when constructing the relevant branches. This leads to simpler code and makes the code more obvious as every call to `CreateCondBr` now has a `BranchWeights` parameter. - Rework formula for epilogue latch weights, to assume equal distribution of remainders and remove `assert` (as I was able to reach this code when forcing small unroll factors on the commandline). Differential Revision: https://reviews.llvm.org/D158642	2023-09-11 14:23:29 -07:00
Nikita Popov	1c6e6432ca	[SCEVExpander] Fix incorrect reuse of more poisonous instructions (PR63763) SCEVExpander tries to reuse existing instruction with the same SCEV expression. However, doing this replacement blindly is not safe, because the instruction might be more poisonous. What we were already doing is to drop poison-generating flags on the reused instruction. But this is not the only way that more poison can be introduced. The poison-generating flag might not be directly on the reused instruction, or the poison contribution might come from something like 0 * %var, which folds to 0 but can still introduce poison. This patch fixes the issue in a principled way, by determining which values can contribute poison to the SCEV expression, and then checking whether any additional values can contribute poison to the instruction being reused. Poison-generating flags are dropped if doing that enables reuse. This is a pretty big hammer and does cause some regressions in tests, but less than I would have expected. I wasn't able to come up with a less intrusive fix that still satisfies the correctness requirements. Fixes https://github.com/llvm/llvm-project/issues/63763. Fixes https://github.com/llvm/llvm-project/issues/63926. Fixes https://github.com/llvm/llvm-project/issues/64333. Fixes https://github.com/llvm/llvm-project/issues/63727. Differential Revision: https://reviews.llvm.org/D158181	2023-08-22 09:27:07 +02:00
Ganesh Gopalasubramanian	536e805e4d	[X86] AMD Genoa (znver4) Change LoopMicroOpBufferSize to handle minimal unrolling of loops	2023-07-26 16:01:56 +05:30
Nikita Popov	a6705053c3	[LoopPeel] Clear dispositions after peeling Block dispositions of values defined inside the loop may change during peeling, so clear them. We already do this for other kinds of unrolling. Differential Revision: https://reviews.llvm.org/D153762	2023-07-19 10:39:59 +02:00
Nikita Popov	b9808e5660	[LoopUnroll] Fold add chains during unrolling Loop unrolling tends to produce chains of `%x1 = add %x0, 1; %x2 = add %x1, 1; ...` with one add per unrolled iteration. This patch simplifies these adds to `%xN = add %x0, N` directly during unrolling, rather than waiting for InstCombine to do so. The motivation for this is that having a single add (rather than an add chain) on the induction variable makes it a simple recurrence, which we specially recognize in a number of places. This allows InstCombine to directly perform folds with that knowledge, instead of first folding the add chains, and then doing other folds in another InstCombine iteration. Due to the reduced number of InstCombine iterations, this also results in a small compile-time improvement. Differential Revision: https://reviews.llvm.org/D153540	2023-07-05 09:54:28 +02:00
Nikita Popov	7905c48a84	[LoopUnroll] Add test for early add folding (NFC) Test for D153540, with adds that have different overflow flags.	2023-07-05 09:46:41 +02:00
Nikita Popov	d179421099	[LoopUnroll] Avoid undef indices in test (NFC) Doesn't really matter for the larger purpose of the test, but avoid the use of undef indices and instead use the loop induction variable as index, which is what was likely intended here.	2023-06-22 16:56:08 +02:00
Nikita Popov	4c748821cd	[LoopUnroll] Regenerate test checks (NFC)	2023-06-22 13:00:02 +02:00
Yevgeny Rouban	1ebbbf1614	[LoopUnrollRuntime] Allow indirect transition to deopt non-latch exit blocks Relax condition on runtime trip count unrolling loops with 1 non-latch exit that leads to a deopt block. There are cases when the deopt blocks are common exits for different loops. LoopSimplify pass splits such edges to the common deopting blocks to make sure that all exit nodes of the loop only have predecessors that are inside of the loop (See simplifyOneLoop()). This breaks the current condition for unrolling. This patch allows the split transitive blocks that still lead to the deopting blocks. Differential Revision: https://reviews.llvm.org/D152639	2023-06-19 11:10:01 +07:00
Nikita Popov	79115aebb7	[LoopUnroll] Add test for SCEV invalidation issue (NFC) Test for the issue reported at https://reviews.llvm.org/D149331#4387931.	2023-06-05 17:28:32 +02:00
Joshua Cao	849d01bf3d	[LoopUnroll] Peel iterations based on select conditions This also allows us to peel loops with a `select`: ``` for (int i = 0; i <= N; ++i) f3(i == 0 ? a : b); // select instruction ``` into: ``` f3(a); // peel one iteration for (int i = 1; i <= N; ++i) f3(b); ``` Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D151052	2023-05-24 00:57:57 -07:00
Joshua Cao	b76f0a0b15	[LoopUnroll] Add tests for peeling iterations based on select, and, or conditions	2023-05-24 00:57:57 -07:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
ManuelJBrito	8b56da5e9f	[IR] Change shufflevector undef mask to poison With this patch an undefined mask in a shufflevector will be printed as poison. This change is done to support the new shufflevector semantics for undefined mask elements. Differential Revision: https://reviews.llvm.org/D149210	2023-04-27 14:41:10 +01:00
Yashwant Singh	aea2a14736	[LoopUnroll] Prevent LoopFullUnrollPass to perform partial/runtime unrolling FullLoopUnroll was performing runtime unrolling in certain cases when '#pragma unroll' was specified. Patch to fix this by introducing new parameter to tryToUnrollLoop() to differentiate between LoopUnrollPass and FullLoopUnrollPass. Based on the discussion here (https://discourse.llvm.org/t/loop-unroller-fails-to-unroll-loop/69834) Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D148071	2023-04-13 10:21:24 +05:30
Joshua Cao	898a9ca5e9	[SCEV] Strengthen huge constant trip multiples. SCEV determines that loops with trip count >=2^32 have a trip multiple of 1 to guard against huge multiples. This patch strengthens this to instead find the greatest power of 2 divisor that is less than the threshold. Differential Revision: https://reviews.llvm.org/D147868	2023-04-10 20:00:46 -07:00
Joshua Cao	60470e163a	[LoopUnroll] Add script test checks for LoopUnroll/X86/mmx.ll	2023-04-10 19:59:01 -07:00
Max Kazantsev	d12af65d46	[TTI] Treat AND/OR with widenable conditions as free of cost Because widenable conditions will eventually lower into a constant, such instructions as `and`, `or` etc. will also be optimized away. Treat them as free. This is an important thing to have if we want guards represented as experimental.guard calls and guards in their explicit form (branch by `and` with widenable condition) to have the same cost for the unroller and other passes like it. Differential Revision: https://reviews.llvm.org/D146034 Reviewed By: nikic	2023-03-14 20:55:17 +07:00
Max Kazantsev	b0ea210b35	[TTI] Evaluate cost of experimental_widenable_condition as zero This intrinsic is not supposed to live through lowering, eventually it should turn into `true` constant and be optimized away. Differential Revision: https://reviews.llvm.org/D146027 Reviewed By: skatkov	2023-03-14 17:11:07 +07:00
Max Kazantsev	77308dd400	[Test] Add missing REQUIRES: asserts in test	2023-03-14 16:28:23 +07:00
Max Kazantsev	a7bbeba74a	[Test] Add test showing difference in cost models for guards	2023-03-14 15:50:45 +07:00
Florian Hahn	7019624ee1	[SCEV] Strengthen nowrap flags via ranges for ARs on construction. At the moment, proveNoWrapViaConstantRanges is only used when creating SCEV[Zero,Sign]ExtendExprs. We can get significant improvements by strengthening flags after creating the AddRec. I'll also share a follow-up patch that removes the code to strengthen flags when creating SCEV[Zero,Sign]ExtendExprs. Modifying AddRecs while creating those can lead to surprising changes. Compile-time looks neutral: https://llvm-compile-time-tracker.com/compare.php?from=94676cf8a13c511a9acfc24ed53c98964a87bde3&to=aced434e8b103109104882776824c4136c90030d&stat=instructions:u Reviewed By: mkazantsev, nikic Differential Revision: https://reviews.llvm.org/D144050	2023-03-07 17:10:34 +01:00
Zhongyunde	15d5c59280	[InstCombine] Improvement the analytics through the dominating condition Address the dominating condition; the urem fold benefits from the analysis improvements. Fix https://github.com/llvm/llvm-project/issues/60546 NOTE: the calls in simplifyBinaryIntrinsic and foldICmpWithDominatingICmp are deleted to reduce compile time. Reviewed By: nikic, arsenm, erikdesjardins Differential Revision: https://reviews.llvm.org/D144248	2023-03-01 17:03:34 +08:00
Mircea Trofin	8cf1524cbc	[loop unroll] Fix `branch-weights` for unrolled loop. The branch weights of the unrolled loop need to be reduced by the unroll factor. Differential Revision: https://reviews.llvm.org/D143948	2023-02-14 12:00:53 -08:00
Florian Hahn	f92b35392e	[LoopUnroll] Add test case exposing crash with d0907ce7ed9f.	2023-01-20 16:08:25 +00:00

583 Commits