27 Commits

Author SHA1 Message Date
Philip Reames
27a62ec72a
[LSR] Split the -lsr-term-fold transformation into it's own pass (#104234)
This transformation doesn't actually use any of the internal state of
LSR and recomputes all information from SCEV.  Splitting it out makes
it easier to test.
    
Note that long term I would like to write a version of this transform
which *is* integrated with LSR's solver, but if that happens, we'll
just delete the extra pass.
    
Integration wise, I switched from using TTI to using a pass configuration
variable.  This seems slightly more idiomatic, and means we don't run
the extra logic on any target other than RISCV.
2024-08-17 18:34:23 -07:00
Luke Lau
557ef043af
[RISCV] Copy AVLs whose LiveIntervals aren't extendable in insertVSETVLI (#98342)
Currently before forwarding an AVL we do a simple non-exhaustive check
to see if its LiveInterval is extendable. But we also need to check for
this when we're extending an AVL's LiveInterval via merging the
VSETVLIInfos in transferBefore with equally zero AVLs.

Rather than trying to conservatively prevent these cases, this inserts a
copy of the AVL instead if we don't know we'll be able to extend it.
This is likely to be more robust, and even if the extra copy is
undesirable these cases should be rare in practice.
2024-07-15 13:18:12 +08:00
Alex Bradbury
f8dbe1d09d
Revert "[RISCV] Enable TTI::shouldDropLSRSolutionIfLessProfitable by default" (#98328)
Reverts llvm/llvm-project#89927 while we investigate performance
regressions reported by @dtcxzyw
2024-07-10 15:33:20 +01:00
Alex Bradbury
af47a4ec50
[RISCV] Enable TTI::shouldDropLSRSolutionIfLessProfitable by default (#89927)
This avoids some cases where LSR produces results that lead to very poor
codegen. There's a chance we'll see minor degradations for some inputs
in the case that our metrics say the found solution is worse, but in
reality it's better than the starting point.

Per the review thread, at least one vendor has been enabling this by
defualt for some time and found overall it's an improvement. As such,
we'll enable by default and aim to fix any as-yet-unknown regressions
in-tree.
2024-07-10 13:23:31 +01:00
Luke Lau
db782b44b3
[RISCV] Don't forward AVL in VSETVLIInfo if it would clobber other definitions (#97264)
This fixes a crash found when compiling OpenBLAS with -mllvm
-verify-machineinstrs.

When we "forward" the AVL from the output of a vsetvli, we might have to
extend the LiveInterval of the AVL to where insert the new vsetvli.

Most of the time we are able to extend the LiveInterval because there's
only one val num (definition) for the register. But PHI elimination can
assign multiple values to the same register, in which case we end up
clobbering a different val num when extending:

    %x = PseudoVSETVLI %avl, ...
    %avl = ADDI ...
    %v = PseudoVADD ..., avl=%x
    ; %avl is forwarded to PseudoVADD:
    %x = PseudoVSETVLI %avl, ...
    %avl = ADDI ...
    %v = PseudoVADD ..., avl=%avl

Here there's no way to extend the %avl from the vsetvli since %avl is
redefined, i.e. we have two val nums.

This fixes it by only forwarding it when we have exactly one val num,
where it should be safe to extend it.
2024-07-05 11:44:59 +08:00
Luke Lau
eb76bc38ff
[RISCV] Relax RISCVInsertVSETVLI output VL peeking to cover registers (#96200)
If the AVL in a VSETVLIInfo is the output VL of a vsetvli with the same
VLMAX, we treat it as the AVL of said vsetvli.

This allows us to remove a true dependency as well as treating
VSETVLIInfos as equal in more places and avoid toggles.

We do this in two places, needVSETVLI and computeInfoForInstr. However
we don't do this in computeInfoForInstr's vsetvli equivalent,
getInfoForVSETVLI.

We also have a restriction only in computeInfoForInstr that the AVL
can't be a register as we want to avoid extending live ranges.

This patch does two interlinked things:

1) It adds this AVL "peeking" to getInfoForVSETVLI

2) It relaxes the constraint that the AVL can't be a register in
computeInfoForInstr, since it removes a use of the output VL which can
actually reduce register pressure. E.g. see the diff in
@vector_init_vsetvli_N and @test6

Now that getInfoForVSETVLI and computeInfoForInstr are consistent, we
can remove the check in needVSETVLI.

We also need to update how we update LiveIntervals in insertVSETVLI, as
we can now end up needing to extend the LiveRange of the AVL across
blocks.
2024-06-23 20:20:59 +08:00
Alex Bradbury
3ac6a646d4 [RISCV][test] Precommit LSR test that partially motivates #89927 2024-06-05 21:01:50 +02:00
Philip Reames
baca93fc83 [LSR] Tweak debug output to always print initial cost 2024-05-14 13:34:20 -07:00
Nikita Popov
72763521c3 [LSR] Clear SCEVExpander before calling DeleteDeadPHIs
To avoid an assertion failure when an AssertingVH is removed,
as reported in:
https://github.com/llvm/llvm-project/pull/82362#issuecomment-1960067147

Also remove an unnecessary use of SCEVExpanderCleaner.
2024-02-22 22:50:26 +01:00
Philip Reames
5ce067d592 Revert "[LSR][TTI][RISCV] Disable terminator folding for RISC-V."
This reverts commit fdb87640ee2be63af9b0e0cd943cb13d79686a03, and thus
re-enables terminator folding for RISCV.  The reported miscompile has
been fixed in f5dd70c58277d925710e5a7c25c86d7565cc3c6c.
2024-01-11 13:20:02 -08:00
Craig Topper
fdb87640ee [LSR][TTI][RISCV] Disable terminator folding for RISC-V.
This is a partial revert of e947f953370abe8ffc8713b8f3250a3ec39599fe.

It caused a miscompile in downstream testing.

Spoke with Philip offline. We believe the issue is that LSR needs to
make sure the Step of the other AddRec is non-zero. Reverting until
Philip is back from vacation.
2023-12-27 15:13:32 -08:00
Philip Reames
e947f95337 [LSR][TTI][RISCV] Enable terminator folding for RISC-V
If looking for a miscompile revert candidate, look here!

The transform being enabled prefers comparing to a loop invariant
exit value for a secondary IV over using an otherwise dead primary
IV.  This increases register pressure (by requiring the exit value
to be live through the loop), but reduces the number of instructions
within the loop by one.

On RISC-V which has a large number of scalar registers, this is
generally a profitable transform.  We loose the ability to use a beqz
on what is typically a count down IV, and pay the cost of computing
the exit value on the secondary IV in the loop preheader, but save
an add or sub in the loop body.  For anything except an extremely
short running loop, or one with extreme register pressure, this is
profitable.  On spec2017, we see a 0.42% geomean improvement in
dynamic icount, with no individual workload regressing by more than
0.25%.

Code size wise, we trade a (possibly compressible) beqz and a (possibly
compressible) addi for a uncompressible beq.  We also add instructions
in the preheader.  Net result is a slight regression overall, but
neutral or better inside the loop.

Previous versions of this transform had numerous cornercase correctness
bugs.  All of them ones I can spot by inspection have been fixed, and I
have run this through all of spec2017, but there may be further issues
lurking.  Adding uses to an IV is a fraught thing to do given poison
semantics, so this transform is somewhat inherently risky.

This patch is a reworked version of D134893 by @eop.  That patch has
been abandoned since May, so I picked it up, reworked it a bit, and
am landing it.
2023-11-29 12:04:06 -08:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
wangpc
5fdab3c81b [RISCV] Enable machine copy propagation for copy-like instructions
Like what has been done in AArch64 (D125335).

We enable this under `-O2` to show the codegen diffs here but we
may only do this under `-O3` like AArch64.

There are two cases that we may produce these eliminable copies:
1. ISel of `FrameIndex`. Like `rvv/fixed-vectors-calling-conv.ll`.
2. Tail duplication. Like `select-optimize-multiple.ll`.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D144535
2023-03-07 17:54:05 +08:00
Craig Topper
7638409d43 [RISCV] Make vsetvli intrinsics default to MA.
The vsetvli insertion pass can replace it with MU if needed by
a using instruction. The vsetvli insertion pass will not convert
MU to MA so we need to start at MA.

Reviewed By: eopXD

Differential Revision: https://reviews.llvm.org/D143790
2023-02-13 10:39:55 -08:00
Philip Reames
a9871772a8 [RISCV][LSR] Treat number of instructions as dominate factor in LSR cost decisions
This matches the behavior from a number of other targets, including e.g. X86. This does have the effect of increasing register pressure slightly, but we have a relative abundance of registers in the ISA compared to other targets which use the same heuristic.

The motivation here is that our current cost heuristic treats number of registers as the dominant cost. As a result, an extra use outside of a loop can radically change the LSR result. As an example consider test4 from the recently added test/Transforms/LoopStrengthReduce/RISCV/lsr-cost-compare.ll. Without a use outside the loop (see test3), we convert the IV into a pointer increment. With one, we leave the gep in place.

The pointer increment version both decreases number of instructions in some loops, and creates parallel chains of computation (i.e. decreases critical path depth). Both are generally profitable.

Arguably, we should really be using a more sophisticated model here - such as e.g. using profile information or explicitly modeling parallelism gains. However, as a practical matter starting with the same mild hack that other targets have used seems reasonable.

Differential Revision: https://reviews.llvm.org/D142227
2023-01-24 11:42:37 -08:00
Philip Reames
b94e5ff248 [RISCV][LSR] Precommit test coverage for an upcoming change
Main point of these is to show the difference between a loop with and without a use outside the loop.
2023-01-20 08:22:30 -08:00
eopXD
10da9844d0 [LSR] Drop LSR solution if it is less profitable than baseline
The LSR may suggest less profitable transformation to the loop. This
patch adds check to prevent LSR from generating worse code than what
we already have.

Since LSR affects nearly all targets, the patch is guarded by the
option 'lsr-drop-solution' and default as disable for now.

The next step should be extending an TTI interface to allow target(s)
to enable this enhancememnt.

Debug log is added to remind user of such choice to skip the LSR
solution.

Reviewed By: Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D126043
2022-10-27 10:13:57 -07:00
eopXD
c9cd5bcf72 [LSR][RISCV] Pre-commit test case for D126043
Pre-commit test case for D126043

Reviewed By: Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D134823
2022-10-27 01:54:10 -07:00
Philip Reames
6ab686eb86 [LSR] Allow already invariant operand for ICmpZero matching [try 2]
Changes since initial commit:

* Wrapping a pointer in an SCEV unknown hides the base, and SCEV is only able to compute a subtraction when the bases are known to be equal. This results in a SCEVCouldNotCompute flowing forward and triggering asserts. Test case added in d767b392.
* isLoopInvariant returns true for instructions outside the loop, but not necessarily *above* the loop. Since this code is allowed to visit uses of an IV outside of a loop, we have to make sure the operands of the compare are both invariant and dominating the header. Test case added in 2aed3cdb.

Original commit message follows...

The ICmpZero matching is checking to see if the expression is loop invariant per SCEV and expandable. This allows expressions inside the loop which can be made loop invariant to be seamlessly expanded, but is overly conservative for expressions which already *are* loop invariant.

As a simple justification for why this is correct, consider a loop invariant urem as RHS vs an alternate function with that same urem wrapped inside a helper call. Why would it be legal to match the later, but not the former?

Differential Revision: https://reviews.llvm.org/D129793
2022-07-15 13:29:43 -07:00
Philip Reames
2aed3cdb5e [test] Reduced test for second distinct issue triggering revert of 9153515 2022-07-15 12:13:27 -07:00
Philip Reames
d767b392d0 [test] Reduced test which triggered revert of 9153515 2022-07-15 11:31:35 -07:00
Philip Reames
6fe766beba Revert "[LSR] Allow already invariant operand for ICmpZero matching"
This reverts commit 9153515a7bea9fb9dd4c76f70053a170bf825f35.  Builtbot crash was reported in the commit thread, reverting while investigating.
2022-07-15 10:47:57 -07:00
Philip Reames
9153515a7b [LSR] Allow already invariant operand for ICmpZero matching
The ICmpZero matching is checking to see if the expression is loop invariant per SCEV and expandable. This allows expressions inside the loop which can be made loop invariant to be seamlessly expanded, but is overly conservative for expressions which already *are* loop invariant.

As a simple justification for why this is correct, consider a loop invariant urem as RHS vs an alternate function with that same urem wrapped inside a helper call. Why would it be legal to match the later, but not the former?

Differential Revision: https://reviews.llvm.org/D129793
2022-07-15 09:51:00 -07:00
Philip Reames
c0df6bc949 [RISCV][LSR] Add coverage for ICmpZero with scaled vscale values
Follow up to 3bc09c7da5 - remove a fixme I forgot to remove, and add test cases showing remaining work.

Note that scaled vscales show up in vectorized code from a couple of sources:
* Element types smaller than vector block size (i.e. everything under i64)
* Unrolling
* LMUL > 1

The largest scaling we can currently have is 256 (e8 in every possible vector register).  More practically useful scales are in the 2-16 range.
2022-07-14 10:55:31 -07:00
Philip Reames
3bc09c7da5 [SCEVExpander] Allow udiv with isKnownNonZero(RHS) + add vscale case
Motivation here is to unblock LSRs ability to use ICmpZero uses - the major effect of which is to enable count down IVs. The test changes reflect this goal, but the potential impact is much broader since this isn't a change in LSR at all.

SCEVExpander needs(*) to prove that expanding the expression is safe anywhere the SCEV expression is valid. In general, we can't expand any node which might fault (or exhibit UB) unless we can either a) prove it won't fault, or b) guard the faulting case. We'd been allowing non-zero constants here; this change extends it to non-zero values.

vscale is never zero. This is already implemented in ValueTracking, and this change just adds the same logic in SCEV's range computation (which in turn drives isKnownNonZero). We should common up some logic here, but let's do that in separate changes.

(*) As an aside, "needs" is such an interesting word here. First, we don't actually need to guard this at all; we could choose to emit a select for the RHS of ever udiv and remove this code entirely. Secondly, the property being checked here is way too strong. What the client actually needs is to expand the SCEV at some particular point in some particular loop. In the examples, the original urem dominates that loop and yet we completely ignore that information when analyzing legality. I don't plan to actively pursue either direction, just noting it for future reference.

Differential Revision: https://reviews.llvm.org/D129710
2022-07-14 08:56:58 -07:00
Philip Reames
ddd4ed9944 [LSR] Add test coverage for ICmpZero cases involving urem RHS
For the moment, we're pretty conservative here.  My motivating case is the vscale one (as that is idiomatic for scalable vectorized loops on RISCV).  There are two obvious approaches to fixing this, and I tried to add reasonable coverage for both even though I'll likely only fix one.
2022-07-13 17:15:07 -07:00