llvm-project

Author	SHA1	Message	Date
Philip Reames	27a62ec72a	[LSR] Split the -lsr-term-fold transformation into its own pass (#104234) This transformation doesn't actually use any of the internal state of LSR and recomputes all information from SCEV. Splitting it out makes it easier to test. Note that long term I would like to write a version of this transform which is integrated with LSR's solver, but if that happens, we'll just delete the extra pass. Integration wise, I switched from using TTI to using a pass configuration variable. This seems slightly more idiomatic, and means we don't run the extra logic on any target other than RISCV.	2024-08-17 18:34:23 -07:00
Luke Lau	557ef043af	[RISCV] Copy AVLs whose LiveIntervals aren't extendable in insertVSETVLI (#98342) Currently before forwarding an AVL we do a simple non-exhaustive check to see if its LiveInterval is extendable. But we also need to check for this when we're extending an AVL's LiveInterval via merging the VSETVLIInfos in transferBefore with equally zero AVLs. Rather than trying to conservatively prevent these cases, this inserts a copy of the AVL instead if we don't know we'll be able to extend it. This is likely to be more robust, and even if the extra copy is undesirable these cases should be rare in practice.	2024-07-15 13:18:12 +08:00
Alex Bradbury	f8dbe1d09d	Revert "[RISCV] Enable TTI::shouldDropLSRSolutionIfLessProfitable by default" (#98328) Reverts llvm/llvm-project#89927 while we investigate performance regressions reported by @dtcxzyw	2024-07-10 15:33:20 +01:00
Alex Bradbury	af47a4ec50	[RISCV] Enable TTI::shouldDropLSRSolutionIfLessProfitable by default (#89927) This avoids some cases where LSR produces results that lead to very poor codegen. There's a chance we'll see minor degradations for some inputs in the case that our metrics say the found solution is worse, but in reality it's better than the starting point. Per the review thread, at least one vendor has been enabling this by default for some time and found overall it's an improvement. As such, we'll enable by default and aim to fix any as-yet-unknown regressions in-tree.	2024-07-10 13:23:31 +01:00
Luke Lau	db782b44b3	[RISCV] Don't forward AVL in VSETVLIInfo if it would clobber other definitions (#97264) This fixes a crash found when compiling OpenBLAS with -mllvm -verify-machineinstrs. When we "forward" the AVL from the output of a vsetvli, we might have to extend the LiveInterval of the AVL to where we insert the new vsetvli. Most of the time we are able to extend the LiveInterval because there's only one val num (definition) for the register. But PHI elimination can assign multiple values to the same register, in which case we end up clobbering a different val num when extending: %x = PseudoVSETVLI %avl, ... %avl = ADDI ... %v = PseudoVADD ..., avl=%x ; %avl is forwarded to PseudoVADD: %x = PseudoVSETVLI %avl, ... %avl = ADDI ... %v = PseudoVADD ..., avl=%avl Here there's no way to extend the %avl from the vsetvli since %avl is redefined, i.e. we have two val nums. This fixes it by only forwarding it when we have exactly one val num, where it should be safe to extend it.	2024-07-05 11:44:59 +08:00
Luke Lau	eb76bc38ff	[RISCV] Relax RISCVInsertVSETVLI output VL peeking to cover registers (#96200) If the AVL in a VSETVLIInfo is the output VL of a vsetvli with the same VLMAX, we treat it as the AVL of said vsetvli. This allows us to remove a true dependency as well as treating VSETVLIInfos as equal in more places and avoid toggles. We do this in two places, needVSETVLI and computeInfoForInstr. However we don't do this in computeInfoForInstr's vsetvli equivalent, getInfoForVSETVLI. We also have a restriction only in computeInfoForInstr that the AVL can't be a register as we want to avoid extending live ranges. This patch does two interlinked things: 1) It adds this AVL "peeking" to getInfoForVSETVLI 2) It relaxes the constraint that the AVL can't be a register in computeInfoForInstr, since it removes a use of the output VL which can actually reduce register pressure. E.g. see the diff in @vector_init_vsetvli_N and @test6 Now that getInfoForVSETVLI and computeInfoForInstr are consistent, we can remove the check in needVSETVLI. We also need to update how we update LiveIntervals in insertVSETVLI, as we can now end up needing to extend the LiveRange of the AVL across blocks.	2024-06-23 20:20:59 +08:00
Alex Bradbury	3ac6a646d4	[RISCV][test] Precommit LSR test that partially motivates #89927	2024-06-05 21:01:50 +02:00
Philip Reames	baca93fc83	[LSR] Tweak debug output to always print initial cost	2024-05-14 13:34:20 -07:00
Nikita Popov	72763521c3	[LSR] Clear SCEVExpander before calling DeleteDeadPHIs To avoid an assertion failure when an AssertingVH is removed, as reported in: https://github.com/llvm/llvm-project/pull/82362#issuecomment-1960067147 Also remove an unnecessary use of SCEVExpanderCleaner.	2024-02-22 22:50:26 +01:00
Philip Reames	5ce067d592	Revert "[LSR][TTI][RISCV] Disable terminator folding for RISC-V." This reverts commit fdb87640ee2be63af9b0e0cd943cb13d79686a03, and thus re-enables terminator folding for RISCV. The reported miscompile has been fixed in f5dd70c58277d925710e5a7c25c86d7565cc3c6c.	2024-01-11 13:20:02 -08:00
Craig Topper	fdb87640ee	[LSR][TTI][RISCV] Disable terminator folding for RISC-V. This is a partial revert of e947f953370abe8ffc8713b8f3250a3ec39599fe. It caused a miscompile in downstream testing. Spoke with Philip offline. We believe the issue is that LSR needs to make sure the Step of the other AddRec is non-zero. Reverting until Philip is back from vacation.	2023-12-27 15:13:32 -08:00
Philip Reames	e947f95337	[LSR][TTI][RISCV] Enable terminator folding for RISC-V If looking for a miscompile revert candidate, look here! The transform being enabled prefers comparing to a loop invariant exit value for a secondary IV over using an otherwise dead primary IV. This increases register pressure (by requiring the exit value to be live through the loop), but reduces the number of instructions within the loop by one. On RISC-V which has a large number of scalar registers, this is generally a profitable transform. We lose the ability to use a beqz on what is typically a count down IV, and pay the cost of computing the exit value on the secondary IV in the loop preheader, but save an add or sub in the loop body. For anything except an extremely short running loop, or one with extreme register pressure, this is profitable. On spec2017, we see a 0.42% geomean improvement in dynamic icount, with no individual workload regressing by more than 0.25%. Code size wise, we trade a (possibly compressible) beqz and a (possibly compressible) addi for an uncompressible beq. We also add instructions in the preheader. Net result is a slight regression overall, but neutral or better inside the loop. Previous versions of this transform had numerous corner-case correctness bugs. All of the ones I can spot by inspection have been fixed, and I have run this through all of spec2017, but there may be further issues lurking. Adding uses to an IV is a fraught thing to do given poison semantics, so this transform is somewhat inherently risky. This patch is a reworked version of D134893 by @eop. That patch has been abandoned since May, so I picked it up, reworked it a bit, and am landing it.	2023-11-29 12:04:06 -08:00
Tobias Hieta	f84bac329b	[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0 since I forgot the lit.local.cfg files in that one. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: barannikov88, kwk Differential Revision: https://reviews.llvm.org/D150762	2023-05-17 17:03:15 +02:00
wangpc	5fdab3c81b	[RISCV] Enable machine copy propagation for copy-like instructions Like what has been done in AArch64 (D125335). We enable this under `-O2` to show the codegen diffs here but we may only do this under `-O3` like AArch64. There are two cases that we may produce these eliminable copies: 1. ISel of `FrameIndex`. Like `rvv/fixed-vectors-calling-conv.ll`. 2. Tail duplication. Like `select-optimize-multiple.ll`. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D144535	2023-03-07 17:54:05 +08:00
Craig Topper	7638409d43	[RISCV] Make vsetvli intrinsics default to MA. The vsetvli insertion pass can replace it with MU if needed by a using instruction. The vsetvli insertion pass will not convert MU to MA so we need to start at MA. Reviewed By: eopXD Differential Revision: https://reviews.llvm.org/D143790	2023-02-13 10:39:55 -08:00
Philip Reames	a9871772a8	[RISCV][LSR] Treat number of instructions as dominant factor in LSR cost decisions This matches the behavior from a number of other targets, including e.g. X86. This does have the effect of increasing register pressure slightly, but we have a relative abundance of registers in the ISA compared to other targets which use the same heuristic. The motivation here is that our current cost heuristic treats number of registers as the dominant cost. As a result, an extra use outside of a loop can radically change the LSR result. As an example consider test4 from the recently added test/Transforms/LoopStrengthReduce/RISCV/lsr-cost-compare.ll. Without a use outside the loop (see test3), we convert the IV into a pointer increment. With one, we leave the gep in place. The pointer increment version both decreases number of instructions in some loops, and creates parallel chains of computation (i.e. decreases critical path depth). Both are generally profitable. Arguably, we should really be using a more sophisticated model here - such as e.g. using profile information or explicitly modeling parallelism gains. However, as a practical matter starting with the same mild hack that other targets have used seems reasonable. Differential Revision: https://reviews.llvm.org/D142227	2023-01-24 11:42:37 -08:00
Philip Reames	b94e5ff248	[RISCV][LSR] Precommit test coverage for an upcoming change Main point of these is to show the difference between a loop with and without a use outside the loop.	2023-01-20 08:22:30 -08:00
eopXD	10da9844d0	[LSR] Drop LSR solution if it is less profitable than baseline LSR may suggest a less profitable transformation to the loop. This patch adds a check to prevent LSR from generating worse code than what we already have. Since LSR affects nearly all targets, the patch is guarded by the option 'lsr-drop-solution' and is disabled by default for now. The next step should be extending a TTI interface to allow target(s) to enable this enhancement. A debug log is added to remind the user of the choice to skip the LSR solution. Reviewed By: Meinersbur, #loopoptwg Differential Revision: https://reviews.llvm.org/D126043	2022-10-27 10:13:57 -07:00
eopXD	c9cd5bcf72	[LSR][RISCV] Pre-commit test case for D126043 Pre-commit test case for D126043 Reviewed By: Meinersbur, #loopoptwg Differential Revision: https://reviews.llvm.org/D134823	2022-10-27 01:54:10 -07:00
Philip Reames	6ab686eb86	[LSR] Allow already invariant operand for ICmpZero matching [try 2] Changes since initial commit: * Wrapping a pointer in an SCEV unknown hides the base, and SCEV is only able to compute a subtraction when the bases are known to be equal. This results in a SCEVCouldNotCompute flowing forward and triggering asserts. Test case added in d767b392. * isLoopInvariant returns true for instructions outside the loop, but not necessarily above the loop. Since this code is allowed to visit uses of an IV outside of a loop, we have to make sure the operands of the compare are both invariant and dominating the header. Test case added in 2aed3cdb. Original commit message follows... The ICmpZero matching is checking to see if the expression is loop invariant per SCEV and expandable. This allows expressions inside the loop which can be made loop invariant to be seamlessly expanded, but is overly conservative for expressions which already are loop invariant. As a simple justification for why this is correct, consider a loop invariant urem as RHS vs an alternate function with that same urem wrapped inside a helper call. Why would it be legal to match the latter, but not the former? Differential Revision: https://reviews.llvm.org/D129793	2022-07-15 13:29:43 -07:00
Philip Reames	2aed3cdb5e	[test] Reduced test for second distinct issue triggering revert of 9153515	2022-07-15 12:13:27 -07:00
Philip Reames	d767b392d0	[test] Reduced test which triggered revert of 9153515	2022-07-15 11:31:35 -07:00
Philip Reames	6fe766beba	Revert "[LSR] Allow already invariant operand for ICmpZero matching" This reverts commit 9153515a7bea9fb9dd4c76f70053a170bf825f35. Buildbot crash was reported in the commit thread, reverting while investigating.	2022-07-15 10:47:57 -07:00
Philip Reames	9153515a7b	[LSR] Allow already invariant operand for ICmpZero matching The ICmpZero matching is checking to see if the expression is loop invariant per SCEV and expandable. This allows expressions inside the loop which can be made loop invariant to be seamlessly expanded, but is overly conservative for expressions which already are loop invariant. As a simple justification for why this is correct, consider a loop invariant urem as RHS vs an alternate function with that same urem wrapped inside a helper call. Why would it be legal to match the latter, but not the former? Differential Revision: https://reviews.llvm.org/D129793	2022-07-15 09:51:00 -07:00
Philip Reames	c0df6bc949	[RISCV][LSR] Add coverage for ICmpZero with scaled vscale values Follow up to 3bc09c7da5 - remove a fixme I forgot to remove, and add test cases showing remaining work. Note that scaled vscales show up in vectorized code from a couple of sources: * Element types smaller than vector block size (i.e. everything under i64) * Unrolling * LMUL > 1 The largest scaling we can currently have is 256 (e8 in every possible vector register). More practically useful scales are in the 2-16 range.	2022-07-14 10:55:31 -07:00
Philip Reames	3bc09c7da5	[SCEVExpander] Allow udiv with isKnownNonZero(RHS) + add vscale case Motivation here is to unblock LSR's ability to use ICmpZero uses - the major effect of which is to enable count down IVs. The test changes reflect this goal, but the potential impact is much broader since this isn't a change in LSR at all. SCEVExpander needs(*) to prove that expanding the expression is safe anywhere the SCEV expression is valid. In general, we can't expand any node which might fault (or exhibit UB) unless we can either a) prove it won't fault, or b) guard the faulting case. We'd been allowing non-zero constants here; this change extends it to non-zero values. vscale is never zero. This is already implemented in ValueTracking, and this change just adds the same logic in SCEV's range computation (which in turn drives isKnownNonZero). We should common up some logic here, but let's do that in separate changes. (*) As an aside, "needs" is such an interesting word here. First, we don't actually need to guard this at all; we could choose to emit a select for the RHS of every udiv and remove this code entirely. Secondly, the property being checked here is way too strong. What the client actually needs is to expand the SCEV at some particular point in some particular loop. In the examples, the original urem dominates that loop and yet we completely ignore that information when analyzing legality. I don't plan to actively pursue either direction, just noting it for future reference. Differential Revision: https://reviews.llvm.org/D129710	2022-07-14 08:56:58 -07:00
Philip Reames	ddd4ed9944	[LSR] Add test coverage for ICmpZero cases involving urem RHS For the moment, we're pretty conservative here. My motivating case is the vscale one (as that is idiomatic for scalable vectorized loops on RISCV). There are two obvious approaches to fixing this, and I tried to add reasonable coverage for both even though I'll likely only fix one.	2022-07-13 17:15:07 -07:00

27 Commits