llvm-project

Author	SHA1	Message	Date
Nikita Popov	fb5683449e	[Pipelines] Restore old DAE position in LTO pipeline This is a partial revert of D128830, restoring the previous position of DeadArgElim in the fat LTO pipeline. The motivation for this is a major code size regression observed in Rust and illustrated in the PhaseOrdering test. This is a conservative fix restoring the previous pipeline order. The real problem is that the LTO pipeline is conceptually broken: It doesn't have a CGSCC function simplification pipeline. The inliner is just being run by itself. This wouldn't be a problem if fat LTO used a standard design where ArgPromotion and DAE are only run after functions have already been simplified by the CGSCC inliner pipeline. Differential Revision: https://reviews.llvm.org/D146051	2023-03-14 17:00:17 +01:00
Nikita Popov	8df140c860	[PhaseOrdering] Add test for DAE/GlobalDCE interaction (NFC)	2023-03-14 15:19:39 +01:00
Arthur Eubanks	0d4a709bb8	[Pipeline] Adjust PostOrderFunctionAttrs placement in simplification pipeline We can infer more attribute information once functions are fully simplified, so move the PostOrderFunctionAttrs pass after the function simplification pipeline. However, just doing this can impact simplification of recursive functions since function simplification takes advantage of function attributes of callees (some LLVM tests are actually impacted by this), so keep a copy of PostOrderFunctionAttrs before the function simplification pipeline that only runs on recursive functions. For example, this fixes the small regression noticed in https://reviews.llvm.org/D128830. This requires some restructuring of the CGSCC NoRerun feature. We need to cache the ShouldNotRunFunctionPassesAnalysis analysis after the simplification is done, which now is after the second PostOrderFunctionAttrs run, rather than after the function simplification pipeline. Compile time impact: https://llvm-compile-time-tracker.com/compare.php?from=33cf40122279342b50f92a3a53f5c185390b6018&to=1bb2a07875634e508a6bdf2ca1b130f55510f060&stat=instructions:u Compile time increase from unconditionally running the first PostOrderFunctionAttrs: https://llvm-compile-time-tracker.com/compare.php?from=1bb2a07875634e508a6bdf2ca1b130f55510f060&to=f4f87e89cc7a35c64e3a103a8036192a84ae002b&stat=instructions:u Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D145210	2023-03-06 09:01:45 -08:00
Arthur Eubanks	bd80dbf284	[test] Precommit test for D145210	2023-03-03 11:22:32 -08:00
Sanjay Patel	6c7b2eef47	[PhaseOrdering] add test for vector load and cast transforms; NFC issue #51397	2023-03-01 13:07:16 -05:00
Alexey Bataev	e03d254bbd	[SLP]Do not reduce repeated values, use scalar red ops instead. Metric: size..text size..text results results0 diff SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-980605-1.test 445.00 461.00 3.6% SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 428477.00 428445.00 -0.0% External/SPEC/CFP2006/447.dealII/447.dealII.test 618849.00 618785.00 -0.0% For all tests some extra code was optimized, GCC-C-execute has some more inlining after Differential Revision: https://reviews.llvm.org/D132261	2023-02-17 07:19:35 -08:00
David Green	86bfeb906e	Revert "Inlining: Run the legacy AlwaysInliner before the regular inliner." This seems to cause large regressions in existing code, as much as 75% slower (4x the time taken). Small always inline functions seem to be used a lot in the cmsis-dsp library. I would add a phase ordering test to show the problems, but one already exists! The llvm/test/Transforms/PhaseOrdering/ARM/arm_mult_q15.ll was just changed by removing alwaysinline to hide the problems that existed. This reverts commit cae033dcf227aeecf58fca5af6fc7fde1fd2fb4f. This reverts commit 8e33c41e72ad42e4c27f8cbc3ad2e02b169637a1.	2023-02-10 15:01:49 +00:00
Amara Emerson	cae033dcf2	Inlining: Run the legacy AlwaysInliner before the regular inliner. We have several situations where it's beneficial for code size to ensure that every call to always-inline functions are inlined before normal inlining decisions are made. While the normal inliner runs in a "MandatoryOnly" mode to try to do this, it only does it on a per-SCC basis, rather than the whole module. Ensuring that all mandatory inlinings are done before any heuristic based decisions are made just makes sense. Despite being referred to the "legacy" AlwaysInliner pass, it's already necessary for -O0 because the CGSCC inliner is too expensive in compile time to run at -O0. This also fixes an exponential compile time blow up in https://github.com/llvm/llvm-project/issues/59126 Differential Revision: https://reviews.llvm.org/D143624	2023-02-09 16:49:29 -08:00
Florian Hahn	43acb61a08	Revert "[SCCP] Support NUW/NSW inference for all overflowing binary operators." This reverts commit 024115ab14822a97c09adcd2545c14e78b843b36. I suspect that this may be causing some buildbot bootstrapping failures. Revert while I investigate.	2023-01-28 21:33:28 +00:00
Florian Hahn	024115ab14	[SCCP] Support NUW/NSW inference for all overflowing binary operators. Extend the NUW/NSW inference logic add in 72121a20cd and cdeaf5f28c3dc to all overflowing binary operators. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D142721	2023-01-28 17:40:41 +00:00
Jamie Hill-Daniel	6b9317f52a	[InstCombine] Fold zero check followed by decrement to usub.sat Fold (a == 0) : 0 ? a - 1 into usub.sat(a, 1). Differential Revision: https://reviews.llvm.org/D140798	2023-01-09 14:22:25 +01:00
Florian Hahn	68469a80cb	[LV] Disable runtime unrolling for vectorized loops. This patch adds metadata to disable runtime unrolling to the vectorized loop. If runtime unrolling/interleaving is considered profitable, LV will interleave the loop directly. There should be no need to perform runtime unrolling at a later stage. Note that we already add metadata to disable runtime unrolling to the scalar loop after vectorization. The additional unrolling unnecessarily increases code size and compile time. In addition to that we have several bug reports of unncessary runtime unrolling for vectorized loops, e.g. PR40961 Compile-time improvements: NewPM-O3: -1.04% NewPM-ReleaseThinLTO: -0.59% NewPM-ReleaseLTO-g: -0.97% https://llvm-compile-time-tracker.com/compare.php?from=ce1be13a868d0f8afa367975558c1a6175cce33a&to=78bc2e67f22e9e10e61cdb6cdac4bb857d95eb1b&stat=instructions:u Fixes #40306. Reviewed By: lebedev.ri, nikic Differential Revision: https://reviews.llvm.org/D115261	2023-01-06 10:56:17 +00:00
Roman Lebedev	08c2f4eb7a	[CVP] When expanding `urem`, always freeze the nominator As per the post-commit feedback - that was not the correct precondition to avoid it here. I think we should generally start changing mentality about `freeze`, the fact that we have been conditioned to be afraid of it (or of anything in LLVM in general) is the key problem here.	2022-12-31 05:00:43 +03:00
Roman Lebedev	66efb98632	[CVP] Expand bound `urem`s This kind of thing happens really frequently in LLVM's very own shuffle combining methods, and it is even considered bad practice to use `%` there, instead of using this expansion directly. Though, many of the cases there have variable divisors, so this won't help everything. Simple case: https://alive2.llvm.org/ce/z/PjvYf- There's alternative expansion via `umin`: https://alive2.llvm.org/ce/z/hWCVPb BUT while we can transform the first expansion into the `umin` one (e.g. for SCEV): https://alive2.llvm.org/ce/z/iNxKmJ ... we can't go in the opposite direction. Also, the non-`umin` expansion seems somewhat more codegen-friendly: https://godbolt.org/z/qzjx5bqWK https://godbolt.org/z/a7bj1axbx There's second variant of precondition: https://alive2.llvm.org/ce/z/zE6cbM but there the numerator must be non-undef / must be frozen.	2022-12-30 19:40:46 +03:00
Roman Lebedev	3d852d1e74	[NFC][PhaseOrdering] Re-autogenerate check lines in one test	2022-12-30 19:40:46 +03:00
Nikita Popov	2d742ef7a7	[PhaseOrdering] Add test for vector promotion regression (NFC) Sample test where cfd594f8bb5e779c81171e7c1e61ae8436efabd3 causes optimization regressions.	2022-12-20 14:42:37 +01:00
Roman Lebedev	4def99e642	[InstCombine] Try to fold `not` into `cmp` iff other users of `cmp` are freely invertible There is still some such patterns that require collaboration of folds to handle,that we don't currently do.	2022-12-19 00:24:28 +03:00
Sanjay Patel	d5f8878a6e	[InstCombine] canonicalize insertelement order based on index This puts lower insert indexes before higher. This is independent of endian, so it requires an adjustment to a fold added with 4446f71ce392, but it makes that fold more robust. That's also where this patch was suggested - D139668. This matches what we already do in DAGCombiner, but there is one more constraint because there's an existing canonicalization for insert-of-scalar-constant. I'm not sure if that is still needed, so it may be adjusted/removed as a follow-up.	2022-12-18 07:08:48 -05:00
Roman Lebedev	96d3c82645	Revert "[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes (take 3)" While the PPC litte-endian miscompile did get addressed by https://reviews.llvm.org/D140046 the PPV big-endian bots are still unhappy. https://lab.llvm.org/buildbot/#/builders/93/builds/12560 This reverts commit 7bd358bcb4e358b4351c69e02ef76939e08acdc7.	2022-12-16 22:58:41 +03:00
Roman Lebedev	cfd594f8bb	[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes (take 3) * This is a recommit of 3c4d2a03968ccf5889bacffe02d6fa2443b0260f, * which was reverted in 25f01d593ce296078f57e872778b77d074ae5888, because it exposed a miscompile in PPC backend, which was resolved in https://reviews.llvm.org/D140089 / cb3f415cd2019df7d14683842198bc4b7a492bc5. * which was a recommit of cf624b23bc5d5a6161706d1663def49380ff816a, * which was reverted in 5cfc22cafe3f2465e0bb324f8daba82ffcabd0df, because the cut-off on the number of vector elements was not low enough, and it triggered both SDAG SDNode operand number assertions, 5and caused compile time explosions in some cases. Let's try with something really REALLY conservative first, just to get somewhere, and try to bump it later. FIXME: should this respect TTI reg width * num vec regs? Original commit message: Now, there's a big caveat here - these bytes are abstract bytes, not the i8 we have in LLVM, so strictly speaking this is not exactly legal, see e.g. https://github.com/AliveToolkit/alive2/issues/860 ^ the "bytes" "could" have been a pointer, and loading it as an integer inserts an implicit ptrtoint. But at the same time, InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()` would expand a memtransfer of 1/2/4/8 bytes into integer-typed load+store, so this isn't exactly a new problem. Note that in memory, poison is byte-wise, so we really can't widen elements, but SROA seems to be inconsistent here. Fixes #59116.	2022-12-16 19:27:38 +03:00
Sanjay Patel	4446f71ce3	[InstCombine] try to fold a pair of insertelements into one insertelement This replaces patches that tried to convert related patterns to shuffles (D138872, D138873, D138874 - reverted/abandoned) but caused codegen problems and were questionable as a canonicalization because an insertelement is a simpler op than a shuffle. This detects a larger pattern -- insert-of-insert -- and replaces with another insert, so this hopefully does not cause any problems. As noted by TODO items in the code and tests, this could go a lot further. But this is enough to reduce the motivating test from issue #17113. Example proofs: https://alive2.llvm.org/ce/z/NnUv3a I drafted a version of this for AggressiveInstCombine, but it seems that would uncover yet another phase ordering gap. If we do generalize this to handle the full range of potential patterns, that may be worth looking at again. Differential Revision: https://reviews.llvm.org/D139668	2022-12-12 10:39:58 -05:00
Sanjay Patel	05dbdb0088	Revert "[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1 (2nd try)" This reverts commit e71b81cab09bf33e3b08ed600418b72cc4117461. As discussed in the planned follow-on to this patch (D138874), this and the subsequent patches in this set can cause trouble for the backend, and there's probably no quick fix. We may even want to canonicalize in the opposite direction (towards insertelt).	2022-12-08 14:16:46 -05:00
Roman Lebedev	75d1a815c3	[NFC] Port all PhaseOrdering tests to `-passes=` syntax	2022-12-08 02:38:50 +03:00
Sanjay Patel	e71b81cab0	[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1 (2nd try) The first attempt was reverted because a clang test changed unexpectedly - the file is already marked with a FIXME, so I just updated it this time to pass. Original commit message: This is the main patch for converting a truncated scalar that is inserted into a vector to bitcast+shuffle. We could go either way on patterns like this, but this direction will allow collapsing a pair of these sequences on the motivating example from issue The patch is split into 3 parts to make it easier to see the progression of tests diffs. We allow inserting/shuffling into a different size vector for flexibility, so there are several test variations. The length-changing is handled by shortening/padding the shuffle mask with undef elements. In part 1, handle the basic pattern: inselt undef, (trunc T), IndexC --> shuffle (bitcast T), IdentityMask Proof for the endian-dependency behaving as expected: https://alive2.llvm.org/ce/z/BsA7yC The TODO items for handling shifts and insert into an arbitrary base vector value are implemented as follow-ups. Differential Revision: https://reviews.llvm.org/D138872	2022-11-30 14:52:20 -05:00
Sanjay Patel	5eacdcff06	Revert "[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1" This reverts commit a4c466766db77cd1fb42d7f98f32bb87a3d38829. This broke clang tests that are wrongly dependent on the optimizer.	2022-11-30 14:10:50 -05:00
Sanjay Patel	a4c466766d	[InstCombine] canonicalize trunc + insert as bitcast + shuffle, part 1 This is the main patch for converting a truncated scalar that is inserted into a vector to bitcast+shuffle. We could go either way on patterns like this, but this direction will allow collapsing a pair of these sequences on the motivating example from issue The patch is split into 3 parts to make it easier to see the progression of tests diffs. We allow inserting/shuffling into a different size vector for flexibility, so there are several test variations. The length-changing is handled by shortening/padding the shuffle mask with undef elements. In part 1, handle the basic pattern: inselt undef, (trunc T), IndexC --> shuffle (bitcast T), IdentityMask Proof for the endian-dependency behaving as expected: https://alive2.llvm.org/ce/z/BsA7yC The TODO items for handling shifts and insert into an arbitrary base vector value are implemented as follow-ups. Differential Revision: https://reviews.llvm.org/D138872	2022-11-30 13:22:04 -05:00
William Huang	be4b1dd35b	[InstCombine] Revert D125845 Reverting D125845 `[InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the back` because multiple users reported performance regression Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D138950	2022-11-29 22:02:40 +00:00
Sanjay Patel	c7bd82dfd8	[PhaseOrdering] add test for vector load combining; NFC This is another example from issue #17113	2022-11-28 16:00:06 -05:00
Matt Arsenault	1c55cc600e	PhaseOrdering: Convert tests to opaque pointers Required manually running update_test_checks: AArch64/hoisting-sinking-required-for-vectorization.ll AArch64/peel-multiple-unreachable-exits-for-vectorization.ll ARM/arm_mult_q15.ll X86/hoist-load-of-baseptr.ll X86/spurious-peeling.ll	2022-11-27 21:26:41 -05:00
Roman Lebedev	25f01d593c	Revert "[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes (take 2)" TableGen is still getting miscompiled on PPC buildbots. Sent a mail with request for help. This reverts commit 3c4d2a03968ccf5889bacffe02d6fa2443b0260f.	2022-11-27 00:00:06 +03:00
Roman Lebedev	3c4d2a0396	[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes (take 2) This is a recommit of cf624b23bc5d5a6161706d1663def49380ff816a, which was reverted in 5cfc22cafe3f2465e0bb324f8daba82ffcabd0df, because the cut-off on the number of vector elements was not low enough, and it triggered both SDAG SDNode operand number assertions, and caused compile time explosions in some cases. Let's try with something really REALLY conservative first, just to get somewhere, and try to bump it (to 64/128) later. FIXME: should this respect TTI reg width * num vec regs? Original commit message: Now, there's a big caveat here - these bytes are abstract bytes, not the i8 we have in LLVM, so strictly speaking this is not exactly legal, see e.g. https://github.com/AliveToolkit/alive2/issues/860 ^ the "bytes" "could" have been a pointer, and loading it as an integer inserts an implicit ptrtoint. But at the same time, InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()` would expand a memtransfer of 1/2/4/8 bytes into integer-typed load+store, so this isn't exactly a new problem. Note that in memory, poison is byte-wise, so we really can't widen elements, but SROA seems to be inconsistent here. Fixes #59116.	2022-11-26 23:19:15 +03:00
Max Kazantsev	211d941188	[SCEV] Rename max backedge-taken count -> constant max backedge taken-count in printout This is a preparatory step for introducing symbolic max backedge-taken count.	2022-11-24 18:43:42 +07:00
Benjamin Kramer	5cfc22cafe	Revert "[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes" This reverts commit cf624b23bc5d5a6161706d1663def49380ff816a. It triggers crashes in clang, see the comments on github on the original change.	2022-11-23 13:11:16 +01:00
Roman Lebedev	cf624b23bc	[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes Now, there's a big caveat here - these bytes are abstract bytes, not the i8 we have in LLVM, so strictly speaking this is not exactly legal, see e.g. https://github.com/AliveToolkit/alive2/issues/860 ^ the "bytes" "could" have been a pointer, and loading it as an integer inserts an implicit ptrtoint. But at the same time, InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()` would expand a memtransfer of 1/2/4/8 bytes into integer-typed load+store, so this isn't exactly a new problem. Note that in memory, poison is byte-wise, so we really can't widen elements, but SROA seems to be inconsistent here. Fixes #59116.	2022-11-23 02:38:25 +03:00
Roman Lebedev	6a77477d53	[NFC][SROA] Autogenerate check lines in some tests being affected by upcoming change	2022-11-23 02:38:25 +03:00
Sanjay Patel	163bb6d64e	[Passes][VectorCombine] enable early run generally and try load folds An early run of VectorCombine was added with D102496 specifically to deal with unnecessary vector ops produced with the C matrix extension. This patch is proposing to try those folds in general and add a pair of load folds to the menu. The load transform will partly solve (see PhaseOrdering diffs) a longstanding vectorization perf bug by removing redundant loads via GVN: issue #17113 The main reason for not enabling the extra pass generally in the initial patch was compile-time cost. The cost of VectorCombine was significantly (surprisingly) improved with: 87debdadaf18 https://llvm-compile-time-tracker.com/compare.php?from=ffe05b8f57d97bc4340f791cb386c8d00e0739f2&to=87debdadaf18f8a5c7e5d563889e10731dc3554d&stat=instructions:u ...so the extra run is going to cost very little now - the total cost of the 2 runs should be less than the 1 run before that micro-optimization: https://llvm-compile-time-tracker.com/compare.php?from=5e8c2026d10e8e2c93c038c776853bed0e7c8fc1&to=2c4b68eab5ae969811f422714e0eba44c5f7eefb&stat=instructions:u It may be possible to reduce the cost slightly more with a few more earlier-exits like that, but it's probably in the noise based on timing experiments. Differential Revision: https://reviews.llvm.org/D138353	2022-11-21 13:57:55 -05:00
Roman Lebedev	8adfa29706	[Pipelines] Introduce SROA after (final, run-time) loop unrolling Now that we are done with loop unrolling, be it either by LoopVectorizer, or LoopUnroll passes, some variable-offset GEP's into alloca's could have become constant-offset, thus enabling SROA and alloca promotion, yet we don't capitalize on that, which is surprizing. While it would be good to not introduce one more SROA invocation, but instead move the one from `PassBuilder::buildFunctionSimplificationPipeline()`, the existing test coverage says that is a bad idea, though it would be fine compile-time wise: https://llvm-compile-time-tracker.com/compare.php?from=b150d34c47efbd8fa09604bce805c0920360f8d7&to=5a9a5c855158b482552be8c7af3e73d67fa44805&stat=instructions So instead, i add yet another SROA run. I have checked, and it needs to be at least after said final loop unrolling. This is still fine compile-time wise: https://llvm-compile-time-tracker.com/compare.php?from=70324cd88328c0924e605fa81b696572560aa5c9&to=fb489bbef687ad821c3173a931709f9cad9aee8a&stat=instructions I've encountered this in a real code, `SROA-after-final-loop-unrolling.ll` has been reduced from https://godbolt.org/z/fsdMhETh3 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D136806	2022-11-17 21:31:30 +03:00
Sanjay Patel	3682180e5f	[PhaseOrdering] add test for load combining; NFC Based on an example in issue #17113	2022-11-17 10:33:35 -05:00
Arthur Eubanks	70dc3b811e	[AggressiveInstCombine] Remove legacy PM pass As part of legacy PM optimization pipeline removal. This shouldn't be used in codegen pipelines so it should be ok to remove. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D137116	2022-11-15 14:35:15 -08:00
Bjorn Pettersson	893e351f2f	[test] Avoid legacy PM default pipelines (O0,O1 etc) when running opt Two lit tests were found running something like this: opt -O<n> -pass-locked-to-legacy-PM ... The expand-atomicrmw-xchg-fp.ll seem to have used -O1 just to ensure that the -atomic-expand pass were thinking that it wasn't running at O0 level. Same thing can be ensured by using the -codegen-opt-level=1 option, making it possible to avoid using O1 in that test case. In the vector-reductions-expanded.ll test case it was possible to split the RUN line into using two opt invocations. First running "opt -O2" using the new PM, and then running "opt -expand-reductions" using the legacy PM. I think that given this patch we get closer to removing code related to 'AddOptimizationPasses' in opt.cpp. Differential Revision: https://reviews.llvm.org/D137626	2022-11-09 09:57:57 +01:00
Sanjay Patel	115d2f69a5	[InstCombine] canonicalize branch with logical-and-not condition https://alive2.llvm.org/ce/z/EfHlWN In the motivating case from issue #58313, this allows forming a duplicate 'not' op which then gets CSE'd and simplifyCFG'd and combined into the expected 'xor'.	2022-10-31 15:51:45 -04:00
Sanjay Patel	a7265f9102	[InstCombine] add tests for branch on logical and/or; NFC	2022-10-31 15:51:45 -04:00
David Green	5dd7d2ce67	[InstCombine] Don't change switch table from desirable to illegal types In InstCombine we treat i8/i16 as desirable, even if they are not legal. The current logic in shouldChangeType will decide to convert from an illegal but desirable type (such as an i8) to an illegal and undesirable type (such as i3). This patch prevents changing the switch conditions to an irregular type on like Arm/AArch64 where i8/i16 are not legal. This is the same issue as https://reviews.llvm.org/D54115. In the case I was looking it is was converting an i32 switch to an i8 switch, which then became a i3 switch. Differential Revision: https://reviews.llvm.org/D136763	2022-10-28 10:15:41 +01:00
Roman Lebedev	e14f30584d	[NFC][PhaseOrdering] Add one more test for SROA after partial unroll https://reviews.llvm.org/D136806	2022-10-27 19:10:27 +03:00
Roman Lebedev	70324cd883	[NFC][PhaseOrdering] Add new test for SROA misplacement	2022-10-27 03:11:23 +03:00
Yaxun (Sam) Liu	9d5adc7e49	Revert "reland e5581df60a35 [SimplifyCFG] accumulate bonus insts cost" This reverts commit bd7949bcd86633bd4203b2ba6f891aea00fce4d1. Revert this patch since reviwers have different opinions regarding the approach in post-commit review. Will open RFC for further discussion. Differential Revision: https://reviews.llvm.org/D132408	2022-10-25 12:15:39 -04:00
Yaxun (Sam) Liu	bd7949bcd8	reland e5581df60a35 [SimplifyCFG] accumulate bonus insts cost Fixed compile time increase due to always constructing LocalCostTracker. Now only construct LocalCostTracker when needed.	2022-10-24 15:43:53 -04:00
William Huang	6c767cef5a	[InstCombine] Canonicalize GEP of GEP by swapping constant-indexed GEP to the back Canonicalize GEP of GEP by swapping GEP with some suffix constant indices to the back (and GEP with all constant indices to the back of that), this allows more constant index GEP merging to happen. Exceptions are: If swapping violates use-def relations, or anti-optimizes LICM For constant indexed GEP of GEP, if they cannot be merged directly, they will be casted to i8* and merged. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D125845	2022-10-20 17:41:26 +00:00
bipmis	38f3e44997	[AggressiveInstCombine] Load merge the reverse load pattern of consecutive loads. This patch extends the load merge/widen in AggressiveInstCombine() to handle reverse load patterns. Differential Revision: https://reviews.llvm.org/D135137	2022-10-19 11:22:58 +01:00
bipmis	82e3056255	Add test for combinations of four i8-loads spliced into a 32-bit value	2022-10-18 15:40:56 +01:00

1 2 3 4 5 ...

458 Commits