llvm-project

Author	SHA1	Message	Date
Philip Reames	091422adc1	[LSR] Fix wrapping bug in lsr-term-fold logic The existing logic was unsound, in two ways. First, due to wrapping on the trip count computation, it could compute a value which convert a loop which exiting on iteration 256, to one which exited at 255. (With i8 trip counts.) Second, it allowed rewriting when the trip count implies wrapping around the alternate IV. As a trivial example, it allowed rewriting an i128 exit test in terms of an i64 IV. This is obviously wrong. Note that the test change is fairly minimal - i.e. only the targeted test - but that's only because I precommitted a change which switched the test from 32 to 64 bit pointers. For 32 bit point architectures with 32 bit primary inductions, this transform is almost always unsound to perform. Differential Revision: https://reviews.llvm.org/D146429	2023-03-20 13:47:21 -07:00
Philip Reames	272ebd6957	[LSR] Inline getAlternateIVEnd and simplify [nfc] Also, add a comment to highlight that the "good" result on this test is accidental, and not based on a principled decision. I matched the original behavior to make this nfc, but selecting the last legal IV is not well motivated here.	2023-03-20 11:22:21 -07:00
Philip Reames	b9521484ec	[LSR] Rewrite IV match for term-fold using existing utilities Main benefit here is making the logic easier to follow, slightly more efficient, and more in line with LFTR. This is not NFC. There are three semantic changes here. First, we drop handling for constants on the LHS of the comparison. These are non-canonical, and we're very late in the optimization pipeline here, so there's no point in supporting this. I removed a test which covered this case. Second, we don't need the almost dead IV to be an addrec. We just need SCEV to be able to compute a trip count for it. Third, we require a simple IV for the almost dead IV. In theory, this removes cases we could have previously handled, but given a) zero testing and b) multiple known correctness issues, I'm adopting an attidute of narrowing this down to something which works correctly, and then expanding.	2023-03-20 10:41:01 -07:00
Philip Reames	67089a39a2	[LSR] Regen tests to adjust for naming in SCEVExpander [nfc]	2023-03-20 09:39:29 -07:00
Mark Goncharov	e4dd7ec39f	[LSR] Fold terminating condition not only for eq and ne. Add opportunity to fold any icmp instruction.	2023-03-20 13:42:27 +03:00
Philip Reames	6f00170159	[LSR] Rework term-fold tests There were two major problems with the tests. First, with the pointer size being 32 bit and the original IVs also being 32 bit, almost all of the positive tests were actually unsound. An upcoming change will add the appropriate safety check, but the test diffs are really hard to understand without switching the tests to 64 bit pointers first. Second, checking debug messages for failures is a major bad practice. This should not have been accepted in review at all. The reason is that it makes the order of legality checks visibile and modifying any of them becomes annoying and tedious.	2023-03-17 16:03:08 -07:00
Philip Reames	3a1fb672f7	[LSR] Cleanup term-fold tests Autogen for naming change, and remove comments about C code inspiration. Multiple of these are out of sync with the actual IR, and these are IR tests anyways.	2023-03-17 08:40:11 -07:00
Nikita Popov	687b5b9a0c	[SCEVExpander] Always use scevgep as name With opaque pointers the scevgep / uglygep distinction no longer makes sense -- GEPs are always emitted in offset-based representation.	2023-03-17 14:27:03 +01:00
Philip Reames	502ab13eb0	[LSR] Add tests which demonstrate miscompiles in the current term-fold code	2023-03-16 14:10:09 -07:00
Arthur Eubanks	7c3c981442	[Passes] Remove some legacy passes DFAJumpThreading JumpThreading LibCallsShrink LoopVectorize SLPVectorizer DeadStoreElimination AggressiveDCE CorrelatedValuePropagation IndVarSimplify These are part of the optimization pipeline, of which the legacy version is deprecated and being removed.	2023-03-10 17:17:00 -08:00
Florian Hahn	7019624ee1	[SCEV] Strengthen nowrap flags via ranges for ARs on construction. At the moment, proveNoWrapViaConstantRanges is only used when creating SCEV[Zero,Sign]ExtendExprs. We can get significant improvements by strengthening flags after creating the AddRec. I'll also share a follow-up patch that removes the code to strengthen flags when creating SCEV[Zero,Sign]ExtendExprs. Modifying AddRecs while creating those can lead to surprising changes. Compile-time looks neutral: https://llvm-compile-time-tracker.com/compare.php?from=94676cf8a13c511a9acfc24ed53c98964a87bde3&to=aced434e8b103109104882776824c4136c90030d&stat=instructions:u Reviewed By: mkazantsev, nikic Differential Revision: https://reviews.llvm.org/D144050	2023-03-07 17:10:34 +01:00
wangpc	5fdab3c81b	[RISCV] Enable machine copy propagation for copy-like instructions Like what has been done in AArch64 (D125335). We enable this under `-O2` to show the codegen diffs here but we may only do this under `-O3` like AArch64. There are two cases that we may produce these eliminable copies: 1. ISel of `FrameIndex`. Like `rvv/fixed-vectors-calling-conv.ll`. 2. Tail duplication. Like `select-optimize-multiple.ll`. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D144535	2023-03-07 17:54:05 +08:00
Tiwari Abhinav Ashok Kumar	bfb1559fbe	[NFC] Fix missing colon in CHECK directives Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D144412	2023-02-21 00:13:04 +05:30
Florian Hahn	1538761303	[LSR] Add test case which shows additional LSR with D144050.	2023-02-16 16:12:07 +00:00
Craig Topper	7638409d43	[RISCV] Make vsetvli intrinsics default to MA. The vsetvli insertion pass can replace it with MU if needed by a using instruction. The vsetvli insertion pass will not convert MU to MA so we need to start at MA. Reviewed By: eopXD Differential Revision: https://reviews.llvm.org/D143790	2023-02-13 10:39:55 -08:00
chenglin.bi	14dedd9cf5	[Reland][LSR] Hoist IVInc to loop header if its all uses are in the loop header Original code will cause crash when the load/store memory type is structure because isIndexedLoadLegal/isIndexedStore doesn't support struct type. So we limit the load/store memory type to integer. Origin commit message: When the latch block is different from header block, IVInc will be expanded in the latch loop. We can't generate the post index load/store this case. But if the IVInc only used in the loop, actually we still can use the post index load/store because when exit loop we don't care the last IVInc value. So, trying to hoist IVInc to help backend to generate more post index load/store. Fix #53625 Reviewed By: eopXD Differential Revision: https://reviews.llvm.org/D138636	2023-02-10 16:52:00 +08:00
Matt Arsenault	aa65dba05c	LoopStrengthReduce: Convert AMDGPU tests to opaque pointers	2023-01-27 22:17:20 -04:00
Philip Reames	a9871772a8	[RISCV][LSR] Treat number of instructions as dominate factor in LSR cost decisions This matches the behavior from a number of other targets, including e.g. X86. This does have the effect of increasing register pressure slightly, but we have a relative abundance of registers in the ISA compared to other targets which use the same heuristic. The motivation here is that our current cost heuristic treats number of registers as the dominant cost. As a result, an extra use outside of a loop can radically change the LSR result. As an example consider test4 from the recently added test/Transforms/LoopStrengthReduce/RISCV/lsr-cost-compare.ll. Without a use outside the loop (see test3), we convert the IV into a pointer increment. With one, we leave the gep in place. The pointer increment version both decreases number of instructions in some loops, and creates parallel chains of computation (i.e. decreases critical path depth). Both are generally profitable. Arguably, we should really be using a more sophisticated model here - such as e.g. using profile information or explicitly modeling parallelism gains. However, as a practical matter starting with the same mild hack that other targets have used seems reasonable. Differential Revision: https://reviews.llvm.org/D142227	2023-01-24 11:42:37 -08:00
Philip Reames	7ad786a29e	[LSR] Generalize one aspect of terminator folding (recently introduced in D132443) There's no need to require the start value to come directly from the loop predecessor. This was sometimes covering up a latent miscompile in this off-by-default option, but the miscompile needs fixed anyways and the issue has been raised on the original review. Differential Revision: https://reviews.llvm.org/D142240	2023-01-20 12:19:43 -08:00
Philip Reames	b94e5ff248	[RISCV][LSR] Precommit test coverage for an upcoming change Main point of these is to show the difference between a loop with and without a use outside the loop.	2023-01-20 08:22:30 -08:00
Nikita Popov	9ed2f14c87	[AsmParser] Remove typed pointer auto-detection IR is now always parsed in opaque pointer mode, unless -opaque-pointers=0 is explicitly given. There is no automatic detection of typed pointers anymore. The -opaque-pointers=0 option is added to any remaining IR tests that haven't been migrated yet. Differential Revision: https://reviews.llvm.org/D141912	2023-01-18 09:58:32 +01:00
Florian Hahn	20ecc07991	[MachineCombiner] Lift same-bb restriction for reassociable ops. This patch relaxes the restriction that both reassociate operands must be in the same block as the root instruction. The comment indicates that the reason for this restriction was that the operands not in the same block won't have a depth in the trace. I believe this is outdated; if the operand is in a different block, it must dominate the current block (otherwise it would need to be phi), which in turn means the operand's block must be included in the current rance, and depths must be available. There's a test case (no_reassociate_different_block) added in 70520e2f1c5fc4 which shows that we have accurate depths for operands defined in other blocks. This allows reassociation of code that computes the final reduction value after vectorization, among other things. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D141302	2023-01-13 15:32:44 +00:00
chenglin.bi	b84ab1f7c9	Revert "[LSR] Hoist IVInc to loop header if its all uses are in the loop header" The original commit seems to cause a regression in numba test. This reverts commit b1b4758e7f4b2ffe1faa28b00eb037832e5d26a7.	2023-01-11 01:24:34 +08:00
chenglin.bi	b1b4758e7f	[LSR] Hoist IVInc to loop header if its all uses are in the loop header When the latch block is different from header block, IVInc will be expanded in the latch loop. We can't generate the post index load/store this case. But if the IVInc only used in the loop, actually we still can use the post index load/store because when exit loop we don't care the last IVInc value. So, trying to hoist IVInc to help backend to generate more post index load/store. Fix #53625 Reviewed By: eopXD Differential Revision: https://reviews.llvm.org/D138636	2023-01-10 18:34:00 +08:00
Nikita Popov	5867241eac	[Transforms] Convert some tests to opaque pointers (NFC)	2023-01-06 12:14:45 +01:00
OCHyams	7ea47f9e41	[DebugInfo] Replace UndefValue with PoisonValue in setKillLocation This helps towards the effort to remove UndefValue from LLVM. Related to https://discourse.llvm.org/t/auto-undef-debug-uses-of-a-deleted-value Reviewed By: nlopes Differential Revision: https://reviews.llvm.org/D140905	2023-01-06 10:51:02 +00:00
Nikita Popov	055fb7795a	[Transforms] Convert some tests to opaque pointers (NFC) These are all tests where conversion worked automatically, and required no manual fixup.	2023-01-05 12:43:45 +01:00
Anshil Gandhi	4bbcbdaee5	[AMDGPU] Unify divergent nodes if the PostDom tree has one root This patch allows AMDGPUUnifyDivergenceExitNodes pass to transform a function whose PDT has exactly one root and ends in a branch instruction. Fixes https://github.com/llvm/llvm-project/issues/58861. Reviewed By: ruiling, arsenm Differential Revision: https://reviews.llvm.org/D139780	2023-01-04 10:45:03 -07:00
Nikita Popov	25c338ccb6	[LSR] Convert test to check IR (NFC) Convert this llc -O3 test to instead check the IR after -loop-reduce.	2023-01-03 14:35:10 +01:00
Roman Lebedev	3a8e009f97	Revert "Reland "[SimplifyCFG] `FoldBranchToCommonDest()`: deal with mismatched IV's in PHI's in common successor block"" One of these two changes is exposing (or causing) some more miscompiles. A reproducer is in progress, so reverting until resolved. This reverts commit 428f36401b1b695fd501ebfdc8773bed8ced8d4e.	2022-12-20 18:36:42 +03:00
Roman Lebedev	428f36401b	Reland "[SimplifyCFG] `FoldBranchToCommonDest()`: deal with mismatched IV's in PHI's in common successor block" This reverts commit 37b8f09a4b61bf9bf9d0b9017d790c8b82be2e17, and returns commit 1bd0b82e508d049efdb07f4f8a342f35818df341. The miscompile was in InstCombine, and it has been addressed. This tries to approach the problem noted by @arsenm: terrible codegen for `__builtin_fpclassify()`: https://godbolt.org/z/388zqdE37 Just because the PHI in the common successor happens to have different incoming values for these two blocks, doesn't mean we have to give up. It's quite easy to deal with this, we just need to produce a select: https://alive2.llvm.org/ce/z/000srb Now, the cost model for this transform is rather overly strict, so this will basically never fire. We tally all (over all preds) the selects needed to the NumBonusInsts Differential Revision: https://reviews.llvm.org/D139275	2022-12-17 05:18:54 +03:00
Alexander Kornienko	37b8f09a4b	Revert "[SimplifyCFG] `FoldBranchToCommonDest()`: deal with mismatched IV's in PHI's in common successor block" This reverts commit 1bd0b82e508d049efdb07f4f8a342f35818df341, since it leads to miscompiles. See https://reviews.llvm.org/D139275#3993229 and https://reviews.llvm.org/D139275#4001580.	2022-12-16 17:23:35 +01:00
Roman Lebedev	1bd0b82e50	[SimplifyCFG] `FoldBranchToCommonDest()`: deal with mismatched IV's in PHI's in common successor block This tries to approach the problem noted by @arsenm: terrible codegen for `__builtin_fpclassify()`: https://godbolt.org/z/388zqdE37 Just because the PHI in the common successor happens to have different incoming values for these two blocks, doesn't mean we have to give up. It's quite easy to deal with this, we just need to produce a select: https://alive2.llvm.org/ce/z/000srb Now, the cost model for this transform is rather overly strict, so this will basically never fire. We tally all (over all preds) the selects needed to the NumBonusInsts Differential Revision: https://reviews.llvm.org/D139275	2022-12-12 18:20:03 +03:00
chenglin.bi	15e41467f0	[LSR] precommit test for D138636; NFC	2022-11-25 13:46:51 +08:00
eopXD	c0ef83e3b9	[LSR] Check if terminating value is safe to expand before transformation According to report by @JojoR, the assertion error was hit hence we need to have this check before the actual transformation. Reviewed By: Meinersbur, #loopoptwg Differential Revision: https://reviews.llvm.org/D136415	2022-11-15 14:56:47 -08:00
eopXD	10da9844d0	[LSR] Drop LSR solution if it is less profitable than baseline The LSR may suggest less profitable transformation to the loop. This patch adds check to prevent LSR from generating worse code than what we already have. Since LSR affects nearly all targets, the patch is guarded by the option 'lsr-drop-solution' and default as disable for now. The next step should be extending an TTI interface to allow target(s) to enable this enhancememnt. Debug log is added to remind user of such choice to skip the LSR solution. Reviewed By: Meinersbur, #loopoptwg Differential Revision: https://reviews.llvm.org/D126043	2022-10-27 10:13:57 -07:00
eopXD	c9cd5bcf72	[LSR][RISCV] Pre-commit test case for D126043 Pre-commit test case for D126043 Reviewed By: Meinersbur, #loopoptwg Differential Revision: https://reviews.llvm.org/D134823	2022-10-27 01:54:10 -07:00
Arthur Eubanks	f3a928e233	[opt] Don't translate legacy -analysis flag to require<analysis> Tests relying on this should explicitly use -passes='require<analysis>,foo'.	2022-10-07 14:54:34 -07:00
Matthias Braun	189900eb14	X86: Stop assigning register costs for longer encodings. This stops reporting CostPerUse 1 for `R8`-`R15` and `XMM8`-`XMM31`. This was previously done because instruction encoding require a REX prefix when using them resulting in longer instruction encodings. I found that this regresses the quality of the register allocation as the costs impose an ordering on eviction candidates. I also feel that there is a bit of an impedance mismatch as the actual costs occure when encoding instructions using those registers, but the order of VReg assignments is not primarily ordered by number of Defs+Uses. I did extensive measurements with the llvm-test-suite wiht SPEC2006 + SPEC2017 included, internal services showed similar patterns. Generally there are a log of improvements but also a lot of regression. But on average the allocation quality seems to improve at a small code size regression. Results for measuring static and dynamic instruction counts: Dynamic Counts (scaled by execution frequency) / Optimization Remarks: Spills+FoldedSpills -5.6% Reloads+FoldedReloads -4.2% Copies -0.1% Static / LLVM Statistics: regalloc.NumSpills mean -1.6%, geomean -2.8% regalloc.NumReloads mean -1.7%, geomean -3.1% size..text mean +0.4%, geomean +0.4% Static / LLVM Statistics: mean -2.2%, geomean -3.1%) regalloc.NumSpills mean -2.6%, geomean -3.9%) regalloc.NumReloads mean +0.6%, geomean +0.6%) size..text Static / LLVM Statistics: regalloc.NumSpills mean -3.0% regalloc.NumReloads mean -3.3% size..text mean +0.3%, geomean +0.3% Differential Revision: https://reviews.llvm.org/D133902	2022-09-30 16:01:33 -07:00
Florian Hahn	3abaa3760d	[LSR] Preserve LCSSA in expander when rewriting loop exit values. The expanded values when rewriting exit values need to preserve LCSSA. Ask SCEVExpander to preserve LCSSA to ensure that. Fixes #58007.	2022-09-27 09:58:48 +01:00
eopXD	3b2011fd4f	[LSR] Fold terminating condition to other IV when possible When the IV is only used by the terminating condition (say IV-A) and the loop has a predictable back-edge count and we have another IV (say IV-B) that is an affine add recursion, we will be able to calculate the terminating value of IV-B in the loop pre-header. This patch adds attempts to replace IV-B as the new terminating condition and remove IV-A. It is safe to do so since IV-A is only used as the terminating condition. This transformation is suitable to be appended after LSR as it may optimize the loop into the situation mentioned above. The transformation can reduce number of IV-s in the loop by one. A cli option `lsr-term-fold` is added and default disabled. Reviewed By: mcberg2021, craig.topper Differential Revision: https://reviews.llvm.org/D132443	2022-09-20 01:38:47 -07:00
eopXD	eff5c3e6c6	[LSR] Precommit test for D132443 Pre-commit test for D132443 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D132452	2022-09-19 19:06:52 -07:00
Nuno Lopes	9df0b254d2	[NFC] Switch a few uses of undef to poison as placeholders for unreachable code	2022-07-23 21:50:11 +01:00
Philip Reames	6ab686eb86	[LSR] Allow already invariant operand for ICmpZero matching [try 2] Changes since initial commit: * Wrapping a pointer in an SCEV unknown hides the base, and SCEV is only able to compute a subtraction when the bases are known to be equal. This results in a SCEVCouldNotCompute flowing forward and triggering asserts. Test case added in d767b392. * isLoopInvariant returns true for instructions outside the loop, but not necessarily above the loop. Since this code is allowed to visit uses of an IV outside of a loop, we have to make sure the operands of the compare are both invariant and dominating the header. Test case added in 2aed3cdb. Original commit message follows... The ICmpZero matching is checking to see if the expression is loop invariant per SCEV and expandable. This allows expressions inside the loop which can be made loop invariant to be seamlessly expanded, but is overly conservative for expressions which already are loop invariant. As a simple justification for why this is correct, consider a loop invariant urem as RHS vs an alternate function with that same urem wrapped inside a helper call. Why would it be legal to match the later, but not the former? Differential Revision: https://reviews.llvm.org/D129793	2022-07-15 13:29:43 -07:00
Philip Reames	2aed3cdb5e	[test] Reduced test for second distinct issue triggering revert of 9153515	2022-07-15 12:13:27 -07:00
Philip Reames	d767b392d0	[test] Reduced test which triggered revert of 9153515	2022-07-15 11:31:35 -07:00
Philip Reames	6fe766beba	Revert "[LSR] Allow already invariant operand for ICmpZero matching" This reverts commit 9153515a7bea9fb9dd4c76f70053a170bf825f35. Builtbot crash was reported in the commit thread, reverting while investigating.	2022-07-15 10:47:57 -07:00
Philip Reames	9153515a7b	[LSR] Allow already invariant operand for ICmpZero matching The ICmpZero matching is checking to see if the expression is loop invariant per SCEV and expandable. This allows expressions inside the loop which can be made loop invariant to be seamlessly expanded, but is overly conservative for expressions which already are loop invariant. As a simple justification for why this is correct, consider a loop invariant urem as RHS vs an alternate function with that same urem wrapped inside a helper call. Why would it be legal to match the later, but not the former? Differential Revision: https://reviews.llvm.org/D129793	2022-07-15 09:51:00 -07:00
Nikita Popov	2a721374ae	[IR] Don't use blockaddresses as callbr arguments Following some recent discussions, this changes the representation of callbrs in IR. The current blockaddress arguments are replaced with `!` label constraints that refer directly to callbr indirect destinations: ; Before: %res = callbr i8* asm "", "=r,r,i"(i8* %x, i8* blockaddress(@test8, %foo)) to label %asm.fallthrough [label %foo] ; After: %res = callbr i8* asm "", "=r,r,!i"(i8* %x) to label %asm.fallthrough [label %foo] The benefit of this is that we can easily update the successors of a callbr, without having to worry about also updating blockaddress references. This should allow us to remove some limitations: * Allow unrolling/peeling/rotation of callbr, or any other clone-based optimizations (https://github.com/llvm/llvm-project/issues/41834) * Allow duplicate successors (https://github.com/llvm/llvm-project/issues/45248) This is just the IR representation change though, I will follow up with patches to remove limtations in various transformation passes that are no longer needed. Differential Revision: https://reviews.llvm.org/D129288	2022-07-15 10:18:17 +02:00
Philip Reames	c0df6bc949	[RISCV][LSR] Add coverage for ICmpZero with scaled vscale values Follow up to 3bc09c7da5 - remove a fixme I forgot to remove, and add test cases showing remaining work. Note that scaled vscales show up in vectorized code from a couple of sources: * Element types smaller than vector block size (i.e. everything under i64) * Unrolling * LMUL > 1 The largest scaling we can currently have is 256 (e8 in every possible vector register). More practically useful scales are in the 2-16 range.	2022-07-14 10:55:31 -07:00

1 2 3 4 5 ...

528 Commits