541 Commits

Author SHA1 Message Date
sgokhale
bb5befefc6 Revert "[CodeGen][ShrinkWrap] Split restore point"
This reverts commit 5f0bccc3d1a74111458c71f009817c9995f4bf83.

An issue has been reported here: https://github.com/ClangBuiltLinux/linux/issues/1833
2023-04-13 10:52:28 +05:30
Nikita Popov
e7f4ad13ae [Transforms] Convert some tests to opaque pointers (NFC) 2023-04-11 16:49:12 +02:00
sgokhale
5f0bccc3d1 [CodeGen][ShrinkWrap] Split restore point
This patch splits a restore point to allow it to only post-dominate blocks reachable by use
or def of CSRs (Callee Saved Registers) / FI (Frame Index).

Benchmarking this on SPEC2017, this gives around 4% improvement on povray and no significant change
for others.

Co-authored-by: junbuml

Differential Revision: https://reviews.llvm.org/D42600
2023-04-11 11:58:50 +05:30
Dmitry Makogon
3d7242f05e Reapply "[LSR] Preserve LCSSA when rewriting instruction with PHI user"
This reverts commit efd34ba60f3839b0a68b2e32ff9011b6823bc16f.

Reapplies 8ff4832679e1. Missed a failing test. Needed to just
update test checks.
2023-04-06 17:31:27 +07:00
Nico Weber
efd34ba60f Revert "[LSR] Preserve LCSSA when rewriting instruction with PHI user"
This reverts commit 8ff4832679e1ff2d2a1cfaa45bb5cb995b0685a1.
Breaks tests, see https://reviews.llvm.org/D146811#4232839
2023-03-30 06:40:16 -04:00
Dmitry Makogon
8ff4832679 [LSR] Preserve LCSSA when rewriting instruction with PHI user
Fixes https://github.com/llvm/llvm-project/issues/61182.

LoopStrengthReduce may sometimes break LCSSA form when applying a rewrite
for an instruction used in a PHI.
It happens if:
 - The PHI is in a loop exit block,
 - The edge from the corresponding exiting block to that exit is critical,
 - The PHI has at least two inputs coming from loop blocks,
 - and the rewritten instruction is inserted in the loop.

In such a case we split the critical edge and then replace PHI inputs
with the rewritten instruction. However, ExitBlock is no longer
a loop exit, so LCSSA form is broken.

This patch fixes it by collecting all inserted instructions for PHIs
whose parent block is not a loop exit and then forming LCSSA for them.
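How splitting breaks LCSSA can be seen on a toy CFG. The sketch below is a hypothetical model, not LLVM's API, and the block names are invented. A loop's exit blocks are the blocks outside the loop that have a predecessor inside it. Once every loop-to-exit edge into the PHI's block is split, that block has no in-loop predecessor and stops being an exit block.

```python
# Toy CFG (hypothetical): block name -> list of successor names.
def exit_blocks(cfg, loop):
    """Blocks outside `loop` that have at least one predecessor inside it."""
    return {succ for blk in loop for succ in cfg[blk] if succ not in loop}


def split_edge(cfg, src, dst, new_name):
    """Insert a fresh block on the edge src -> dst; the new block is not in the loop."""
    cfg[src] = [new_name if s == dst else s for s in cfg[src]]
    cfg[new_name] = [dst]


# Loop {header, latch}: both blocks branch to `exit`, so a PHI in `exit` has two
# inputs from loop blocks, and both edges into `exit` are critical.
cfg = {"header": ["latch", "exit"], "latch": ["header", "exit"], "exit": []}
loop = {"header", "latch"}
print(sorted(exit_blocks(cfg, loop)))  # ['exit']

split_edge(cfg, "header", "exit", "header.split")
split_edge(cfg, "latch", "exit", "latch.split")
# `exit` is no longer an exit block, so a loop-defined value used by its PHI
# now needs LCSSA PHIs in the new exit blocks.
print(sorted(exit_blocks(cfg, loop)))  # ['header.split', 'latch.split']
```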

Differential Revision: https://reviews.llvm.org/D146811
2023-03-30 14:46:28 +07:00
Dmitry Makogon
8e85bede79 [Test] Regenerate test checks for some LSR tests (NFC) 2023-03-24 21:24:22 +07:00
Dmitry Makogon
2ac5bf2272 [Test] Add test to check that LCSSA is preserved by LSR (NFC)
Currently it fails as LSR doesn't preserve LCSSA in some cases.
2023-03-24 21:24:21 +07:00
Dmitry Makogon
90eab480a9 [Test] Use autogenerated checks in uglygep.ll test for LSR (NFC) 2023-03-24 18:28:29 +07:00
Philip Reames
00fdd2cb6c [LSR] Don't crash on non-branch terminator in -lsr-term-fold
Reported in https://reviews.llvm.org/D146415. I rewrote the patch and added the test case. Per that report, spec2006.483.xalancbmk crashes without this fix.
2023-03-21 09:30:01 -07:00
Philip Reames
53e9a5ddc0 [LSR] Fix "new use of poison" problem in lsr-term-fold
This models the approach used in LFTR. The short summary is that we need to prove the IV is not dead first, and then we have to either prove the poison flag is valid after the new user or delete it.

There are two key differences between this and LFTR.

First, I allow a non-concrete start to the IV. The goal of LFTR is to canonicalize, and IVs with constant starts are canonical, so the very restrictive definition there is mostly okay. Here, on the other hand, we're explicitly moving *away* from the canonical form, and thus need to handle non-constant starts.

Second, LFTR bails out instead of removing inbounds on a GEP. This is a pragmatic tradeoff since inbounds is hard to infer and assists aliasing. This pass runs very late, and I think the tradeoff runs the other way.

A different approach we could take for the post-inc check would be to perform a pre-inc check instead of a post-inc check. We would still have to check the pre-inc IV, but that would avoid the need to drop inbounds. Doing the pre-inc check would basically trade killing a whole IV for an extra register move in the loop. I'm open to suggestions on the right approach here.

Note that this analysis is quite expensive in terms of compile time. I have made no effort to optimize it (yet).

Differential Revision: https://reviews.llvm.org/D146464
2023-03-21 08:23:40 -07:00
Philip Reames
b33f5e7ed3 [LSR] Use evaluateAtIteration in lsr-term-fold
This is a follow up to one of the side discussions on D146429.  There are two semantic changes contained here.

The motivation for the change to the legality condition introduced in D146429 comes from the fact that we only check the post-inc form. As such, as long as the values of the post-inc variable don't self wrap, it's actually okay if we wrap past the starting value of the pre-inc IV.

Second, Nikic noticed during review that the test changes changed behavior for TC=0 (i.e. N=0 in the tests).  On more careful inspection, it became apparent that the previous manual expansion code was incorrect in the case where the primary IV could wrap without poison, and started with the limit value (i.e. i8 post-inc starts at 255 for 0 exit test, implying pre-inc starts with 0).  See @wrap_around test for an example of the (previous) miscompile.
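The arithmetic behind the @wrap_around case can be sketched with an affine add recurrence {start,+,step} evaluated at an iteration number in modular arithmetic. This is the idea behind SCEV's evaluateAtIteration for the affine case. The model below is simplified and is not LLVM code, and the alternate IV and its constants are invented. The last lines only show what would happen if an end value were derived from a trip count that has wrapped in i8.

```python
def eval_affine_addrec(start, step, iteration, width):
    """Value of the affine recurrence {start,+,step} at `iteration`, in `width`-bit
    unsigned arithmetic (a simplified model of evaluateAtIteration)."""
    return (start + step * iteration) % (1 << width)


# Primary IV: post-inc form is the i8 recurrence {255,+,-1} (pre-inc starts at 0),
# and the loop exits when post-inc == 0, which happens after 255 backedges.
backedge_taken = 255
assert eval_affine_addrec(255, -1, backedge_taken, 8) == 0

# Hypothetical alternate IV: i64 {1000,+,4}, whose post-inc form is {1004,+,4}.
# Evaluate it at the backedge-taken count, in the alternate IV's own width.
end_value = eval_affine_addrec(1004, 4, backedge_taken, 64)
print(end_value)  # 2024 == 1000 + 4 * 256

# Contrast: a trip count computed in i8 wraps 256 -> 0, which would make the
# end value equal to the start value.
trip_count_i8 = (backedge_taken + 1) % 256
print(1000 + 4 * trip_count_i8)  # 1000
```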

Differential Revision: https://reviews.llvm.org/D146457
2023-03-21 08:11:36 -07:00
Philip Reames
b7af34c303 [LSR] Add a test case for (another) miscompile in lsr-term-fold
Derived from an observation by @nikic on D146457.
2023-03-21 08:11:36 -07:00
Philip Reames
091422adc1 [LSR] Fix wrapping bug in lsr-term-fold logic
The existing logic was unsound, in two ways.

First, due to wrapping in the trip count computation, it could compute a value which converted a loop exiting on iteration 256 into one which exited at 255 (with i8 trip counts).

Second, it allowed rewriting when the trip count implies wrapping around the alternate IV. As a trivial example, it allowed rewriting an i128 exit test in terms of an i64 IV. This is obviously wrong.

Note that the test change is fairly minimal (i.e. only the targeted test), but that's only because I precommitted a change which switched the test from 32-bit to 64-bit pointers. For 32-bit pointer architectures with 32-bit primary inductions, this transform is almost always unsound to perform.
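Both unsound cases come down to values not fitting in a fixed bit width. The sketch below is a conservative, hypothetical legality check, and its function names are invented rather than taken from LSR. It tests two things. First, the trip count must be representable in the type it is computed in. Second, an alternate IV {start,+,step} of width w must not travel 2^w or more in total, because past that point its values wrap around and an equality exit test on it can fire at the wrong iteration.

```python
def fits_unsigned(value, width):
    """True if `value` is representable as a `width`-bit unsigned integer."""
    return 0 <= value < (1 << width)


def alternate_iv_may_wrap(step, trip_count, width):
    """Conservative check: over `trip_count` iterations the IV moves by
    |step| * trip_count in total; at 2**width or more it wraps around and can
    revisit earlier values, so an equality exit test on it may fire early."""
    return abs(step) * trip_count >= (1 << width)


# A loop exiting on iteration 256 has a trip count that does not fit in i8.
print(fits_unsigned(256, 8))                    # False
# An i128-sized trip count cannot be tracked by an i64 IV.
print(alternate_iv_may_wrap(1, 2**70, 64))      # True
# 32-bit pointers stepping by 4: a trip count near the i32 limit wraps the IV.
print(alternate_iv_may_wrap(4, 2**32 - 1, 32))  # True
# The same loop with 64-bit pointers is fine.
print(alternate_iv_may_wrap(4, 2**32 - 1, 64))  # False
```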

Differential Revision: https://reviews.llvm.org/D146429
2023-03-20 13:47:21 -07:00
Philip Reames
272ebd6957 [LSR] Inline getAlternateIVEnd and simplify [nfc]
Also, add a comment to highlight that the "good" result on this test is accidental, and not based on a principled decision.  I matched the original behavior to make this nfc, but selecting the last legal IV is not well motivated here.
2023-03-20 11:22:21 -07:00
Philip Reames
b9521484ec [LSR] Rewrite IV match for term-fold using existing utilities
Main benefit here is making the logic easier to follow, slightly more efficient, and more in line with LFTR.  This is not NFC.  There are three semantic changes here.

First, we drop handling for constants on the LHS of the comparison.  These are non-canonical, and we're very late in the optimization pipeline here, so there's no point in supporting this.  I removed a test which covered this case.

Second, we don't need the almost dead IV to be an addrec.  We just need SCEV to be able to compute a trip count for it.

Third, we require a simple IV for the almost dead IV. In theory, this removes cases we could have previously handled, but given a) zero testing and b) multiple known correctness issues, I'm adopting an attitude of narrowing this down to something which works correctly, and *then* expanding.
2023-03-20 10:41:01 -07:00
Philip Reames
67089a39a2 [LSR] Regen tests to adjust for naming in SCEVExpander [nfc] 2023-03-20 09:39:29 -07:00
Mark Goncharov
e4dd7ec39f [LSR] Fold terminating condition not only for eq and ne.
Add opportunity to fold any icmp instruction.
2023-03-20 13:42:27 +03:00
Philip Reames
6f00170159 [LSR] Rework term-fold tests
There were two major problems with the tests.

First, with the pointer size being 32 bit and the original IVs also being 32 bit, almost all of the positive tests were actually unsound.  An upcoming change will add the appropriate safety check, but the test diffs are really hard to understand without switching the tests to 64 bit pointers first.

Second, checking debug messages for failures is a major bad practice. This should not have been accepted in review at all. The reason is that it makes the *order* of legality checks visible, and modifying any of them becomes annoying and tedious.
2023-03-17 16:03:08 -07:00
Philip Reames
3a1fb672f7 [LSR] Cleanup term-fold tests
Autogen for naming change, and remove comments about C code inspiration.  Multiple of these are out of sync with the actual IR, and these are IR tests anyways.
2023-03-17 08:40:11 -07:00
Nikita Popov
687b5b9a0c [SCEVExpander] Always use scevgep as name
With opaque pointers the scevgep / uglygep distinction no longer
makes sense -- GEPs are always emitted in offset-based representation.
2023-03-17 14:27:03 +01:00
Philip Reames
502ab13eb0 [LSR] Add tests which demonstrate miscompiles in the current term-fold code 2023-03-16 14:10:09 -07:00
Arthur Eubanks
7c3c981442 [Passes] Remove some legacy passes
DFAJumpThreading
JumpThreading
LibCallsShrink
LoopVectorize
SLPVectorizer
DeadStoreElimination
AggressiveDCE
CorrelatedValuePropagation
IndVarSimplify

These are part of the optimization pipeline, of which the legacy version is deprecated and being removed.
2023-03-10 17:17:00 -08:00
Florian Hahn
7019624ee1 [SCEV] Strengthen nowrap flags via ranges for ARs on construction.
At the moment, proveNoWrapViaConstantRanges is only used when creating
SCEV[Zero,Sign]ExtendExprs. We can get significant improvements by
strengthening flags after creating the AddRec.
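Strengthening flags via ranges can be sketched for the unsigned case with a positive constant step. If every value the AddRec takes lies in the region where adding the step cannot unsigned-wrap, the AddRec can be marked <nuw>. This is a simplified, hypothetical model in which the range is given as an inclusive pair. LLVM's version works on ConstantRange and also covers signed wrap.

```python
def nuw_region(step, width):
    """Half-open interval [0, 2**width - step) of values x such that
    x + step does not unsigned-wrap in `width` bits (requires step > 0)."""
    return 0, (1 << width) - step


def can_prove_nuw(iv_lo, iv_hi, step, width):
    """iv_lo..iv_hi (inclusive) bounds every unsigned value the AddRec takes."""
    lo, hi = nuw_region(step, width)
    return lo <= iv_lo and iv_hi < hi


# i8 AddRec {0,+,1} known to stay in [0, 99]: adding 1 never wraps.
print(can_prove_nuw(0, 99, 1, 8))   # True
# Full i8 range: 255 + 1 wraps, so nothing can be proven.
print(can_prove_nuw(0, 255, 1, 8))  # False
```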

I'll also share a follow-up patch that removes the code to strengthen
flags when creating SCEV[Zero,Sign]ExtendExprs. Modifying AddRecs while
creating those can lead to surprising changes.

Compile-time looks neutral:
https://llvm-compile-time-tracker.com/compare.php?from=94676cf8a13c511a9acfc24ed53c98964a87bde3&to=aced434e8b103109104882776824c4136c90030d&stat=instructions:u

Reviewed By: mkazantsev, nikic

Differential Revision: https://reviews.llvm.org/D144050
2023-03-07 17:10:34 +01:00
wangpc
5fdab3c81b [RISCV] Enable machine copy propagation for copy-like instructions
Like what has been done in AArch64 (D125335).

We enable this under `-O2` to show the codegen diffs here but we
may only do this under `-O3` like AArch64.

There are two cases that we may produce these eliminable copies:
1. ISel of `FrameIndex`. Like `rvv/fixed-vectors-calling-conv.ll`.
2. Tail duplication. Like `select-optimize-multiple.ll`.
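When a copy-like instruction is treated as a copy, forward copy propagation can rewrite later uses of its destination. The sketch below uses a hypothetical toy instruction format, not LLVM's MIR API. It assumes that `addi rd, rs, 0` counts as copy-like, and it only rewrites uses within one block. It does not delete the now-dead copies, which the real pass can also do.

```python
def is_copy_like(inst):
    """Toy rule: `addi rd, rs, 0` behaves like `mv rd, rs`."""
    op, _dst, _srcs, imm = inst
    return op == "addi" and imm == 0


def forward_copy_propagate(block):
    """Rewrite uses through available copies within one basic block.
    Each instruction is (opcode, dest, [sources], imm)."""
    copies = {}  # dest register -> source register of a still-valid copy
    out = []
    for op, dst, srcs, imm in block:
        srcs = [copies.get(s, s) for s in srcs]  # uses are read before the def
        inst = (op, dst, srcs, imm)
        # Redefining dst invalidates any copy into or out of it.
        copies = {d: s for d, s in copies.items() if dst not in (d, s)}
        if is_copy_like(inst) and srcs[0] != dst:
            copies[dst] = srcs[0]
        out.append(inst)
    return out


block = [
    ("addi", "a1", ["a0"], 0),       # copy-like: a1 = a0
    ("add", "a2", ["a1", "a1"], None),
    ("addi", "a0", ["a3"], 5),       # clobbers a0, so a1 = a0 is no longer usable
    ("add", "a4", ["a1"], None),
]
for inst in forward_copy_propagate(block):
    print(inst)
```

After propagation the first `add` reads `a0` directly. The last one keeps `a1` because `a0` was redefined in between.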

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D144535
2023-03-07 17:54:05 +08:00
Tiwari Abhinav Ashok Kumar
bfb1559fbe [NFC] Fix missing colon in CHECK directives
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D144412
2023-02-21 00:13:04 +05:30
Florian Hahn
1538761303 [LSR] Add test case which shows additional LSR with D144050. 2023-02-16 16:12:07 +00:00
Craig Topper
7638409d43 [RISCV] Make vsetvli intrinsics default to MA.
The vsetvli insertion pass can replace it with MU if needed by
a using instruction. The vsetvli insertion pass will not convert
MU to MA so we need to start at MA.

Reviewed By: eopXD

Differential Revision: https://reviews.llvm.org/D143790
2023-02-13 10:39:55 -08:00
chenglin.bi
14dedd9cf5 [Reland][LSR] Hoist IVInc to loop header if its all uses are in the loop header
The original code caused a crash when the load/store memory type is a structure, because isIndexedLoadLegal/isIndexedStore doesn't support struct types.
So we limit the load/store memory type to integers.

Original commit message:
When the latch block is different from the header block, IVInc will be expanded in the latch block. We can't generate the post-index load/store in this case.
But if IVInc is only used in the loop, we can still use the post-index load/store, because when exiting the loop we don't care about the last IVInc value.
So, try to hoist IVInc to help the backend generate more post-index loads/stores.

Fix #53625

Reviewed By: eopXD

Differential Revision: https://reviews.llvm.org/D138636
2023-02-10 16:52:00 +08:00
Matt Arsenault
aa65dba05c LoopStrengthReduce: Convert AMDGPU tests to opaque pointers 2023-01-27 22:17:20 -04:00
Philip Reames
a9871772a8 [RISCV][LSR] Treat number of instructions as dominant factor in LSR cost decisions
This matches the behavior from a number of other targets, including e.g. X86. This does have the effect of increasing register pressure slightly, but we have a relative abundance of registers in the ISA compared to other targets which use the same heuristic.

The motivation here is that our current cost heuristic treats number of registers as the dominant cost. As a result, an extra use outside of a loop can radically change the LSR result. As an example consider test4 from the recently added test/Transforms/LoopStrengthReduce/RISCV/lsr-cost-compare.ll. Without a use outside the loop (see test3), we convert the IV into a pointer increment. With one, we leave the gep in place.

The pointer increment version both decreases number of instructions in some loops, and creates parallel chains of computation (i.e. decreases critical path depth). Both are generally profitable.

Arguably, we should really be using a more sophisticated model here - such as e.g. using profile information or explicitly modeling parallelism gains. However, as a practical matter starting with the same mild hack that other targets have used seems reasonable.
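The difference between the two heuristics comes down to the order of a lexicographic comparison. The sketch below uses a much-reduced, hypothetical cost record that holds only an instruction count and a register count. The numbers for the two candidate solutions are invented. The real TTI cost record has more fields.

```python
from typing import NamedTuple


class Cost(NamedTuple):
    """Much-reduced, hypothetical LSR cost record."""
    insns: int
    regs: int


def registers_first(c):
    # Registers dominate; instructions only break ties (a simplification).
    return (c.regs, c.insns)


def instructions_first(c):
    # Number of instructions dominates; registers only break ties.
    return (c.insns, c.regs)


def pick(solutions, key):
    """Return the name of the cheapest solution under the given ordering."""
    return min(solutions, key=lambda name: key(solutions[name]))


# Invented numbers: the pointer-increment form saves instructions in the loop
# but keeps one more value live.
solutions = {"keep-gep": Cost(insns=6, regs=3), "ptr-increment": Cost(insns=4, regs=4)}
print(pick(solutions, registers_first))     # keep-gep
print(pick(solutions, instructions_first))  # ptr-increment
```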

Differential Revision: https://reviews.llvm.org/D142227
2023-01-24 11:42:37 -08:00
Philip Reames
7ad786a29e [LSR] Generalize one aspect of terminator folding (recently introduced in D132443)
There's no need to require the start value to come directly from the loop predecessor. This was sometimes covering up a latent miscompile in this off-by-default option, but the miscompile needs to be fixed anyway and the issue has been raised on the original review.

Differential Revision: https://reviews.llvm.org/D142240
2023-01-20 12:19:43 -08:00
Philip Reames
b94e5ff248 [RISCV][LSR] Precommit test coverage for an upcoming change
Main point of these is to show the difference between a loop with and without a use outside the loop.
2023-01-20 08:22:30 -08:00
Nikita Popov
9ed2f14c87 [AsmParser] Remove typed pointer auto-detection
IR is now always parsed in opaque pointer mode, unless
-opaque-pointers=0 is explicitly given. There is no automatic
detection of typed pointers anymore.

The -opaque-pointers=0 option is added to any remaining IR tests
that haven't been migrated yet.

Differential Revision: https://reviews.llvm.org/D141912
2023-01-18 09:58:32 +01:00
Florian Hahn
20ecc07991 [MachineCombiner] Lift same-bb restriction for reassociable ops.
This patch relaxes the restriction that both reassociate operands must
be in the same block as the root instruction.

The comment indicates that the reason for this restriction was that the
operands not in the same block won't have a depth in the trace.

I believe this is outdated; if the operand is in a different block, it
must dominate the current block (otherwise it would need to be a phi),
which in turn means the operand's block must be included in the current
trace, and depths must be available.

There's a test case (no_reassociate_different_block) added in
70520e2f1c5fc4 which shows that we have accurate depths for operands
defined in other blocks.

This allows reassociation of code that computes the final reduction
value after vectorization, among other things.
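A simple latency model shows why operand depths matter for reassociation. In this hypothetical sketch every add takes one cycle, and each input has a ready cycle, which is its depth. When one operand arrives late, for example from a block earlier in the trace, reassociating so that the late operand feeds the last add shortens the critical path.

```python
def add_ready(x, y, latency=1):
    """Cycle at which x + y is ready, given the ready cycles of x and y."""
    return max(x, y) + latency


# Operand depths: `late` comes from a long computation (e.g. in a dominating block).
late, b, c = 5, 0, 0

original = add_ready(add_ready(late, b), c)      # (late + b) + c
reassociated = add_ready(late, add_ready(b, c))  # late + (b + c)
print(original, reassociated)                    # 7 6
```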

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D141302
2023-01-13 15:32:44 +00:00
chenglin.bi
b84ab1f7c9 Revert "[LSR] Hoist IVInc to loop header if its all uses are in the loop header"
The original commit seems to cause a regression in numba test.
This reverts commit b1b4758e7f4b2ffe1faa28b00eb037832e5d26a7.
2023-01-11 01:24:34 +08:00
chenglin.bi
b1b4758e7f [LSR] Hoist IVInc to loop header if its all uses are in the loop header
When the latch block is different from the header block, IVInc will be expanded in the latch block. We can't generate the post-index load/store in this case.
But if IVInc is only used in the loop, we can still use the post-index load/store, because when exiting the loop we don't care about the last IVInc value.
So, try to hoist IVInc to help the backend generate more post-index loads/stores.

Fix #53625

Reviewed By: eopXD

Differential Revision: https://reviews.llvm.org/D138636
2023-01-10 18:34:00 +08:00
Nikita Popov
5867241eac [Transforms] Convert some tests to opaque pointers (NFC) 2023-01-06 12:14:45 +01:00
OCHyams
7ea47f9e41 [DebugInfo] Replace UndefValue with PoisonValue in setKillLocation
This helps towards the effort to remove UndefValue from LLVM.

Related to https://discourse.llvm.org/t/auto-undef-debug-uses-of-a-deleted-value

Reviewed By: nlopes

Differential Revision: https://reviews.llvm.org/D140905
2023-01-06 10:51:02 +00:00
Nikita Popov
055fb7795a [Transforms] Convert some tests to opaque pointers (NFC)
These are all tests where conversion worked automatically, and
required no manual fixup.
2023-01-05 12:43:45 +01:00
Anshil Gandhi
4bbcbdaee5 [AMDGPU] Unify divergent nodes if the PostDom tree has one root
This patch allows AMDGPUUnifyDivergenceExitNodes pass
to transform a function whose PDT has exactly one root
and ends in a branch instruction. Fixes
https://github.com/llvm/llvm-project/issues/58861.

Reviewed By: ruiling, arsenm

Differential Revision: https://reviews.llvm.org/D139780
2023-01-04 10:45:03 -07:00
Nikita Popov
25c338ccb6 [LSR] Convert test to check IR (NFC)
Convert this llc -O3 test to instead check the IR after -loop-reduce.
2023-01-03 14:35:10 +01:00
Roman Lebedev
3a8e009f97 Revert "Reland "[SimplifyCFG] FoldBranchToCommonDest(): deal with mismatched IV's in PHI's in common successor block""
One of these two changes is exposing (or causing) some more miscompiles.
A reproducer is in progress, so reverting until resolved.

This reverts commit 428f36401b1b695fd501ebfdc8773bed8ced8d4e.
2022-12-20 18:36:42 +03:00
Roman Lebedev
428f36401b Reland "[SimplifyCFG] FoldBranchToCommonDest(): deal with mismatched IV's in PHI's in common successor block"
This reverts commit 37b8f09a4b61bf9bf9d0b9017d790c8b82be2e17,
and returns commit 1bd0b82e508d049efdb07f4f8a342f35818df341.
The miscompile was in InstCombine, and it has been addressed.

This tries to approach the problem noted by @arsenm:
terrible codegen for `__builtin_fpclassify()`:
https://godbolt.org/z/388zqdE37

Just because the PHI in the common successor happens to have different
incoming values for these two blocks, doesn't mean we have to give up.
It's quite easy to deal with this, we just need to produce a select:
https://alive2.llvm.org/ce/z/000srb

Now, the cost model for this transform is rather overly strict,
so this will basically never fire. We add all the selects needed
(over all preds) to NumBonusInsts.
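The idea can be checked exhaustively on a toy model. In this hypothetical sketch, `pred` branches on c1 either to the common successor or to `bb`. `bb` then branches on c2 either to the common successor or elsewhere. The PHI in the successor receives `a` from `pred` and `b` from `bb`. After folding, `pred` branches on `c1 or c2`, and the mismatched PHI inputs become a single `select c1, a, b`. Only the `or` form of the fold is modeled, and the cost model is ignored.

```python
from itertools import product


def original(c1, c2, a, b):
    """Returns (block reached, PHI value) for the unfolded CFG."""
    if c1:                 # pred: br c1, succ, bb    (PHI gets a)
        return "succ", a
    if c2:                 # bb:   br c2, succ, other (PHI gets b)
        return "succ", b
    return "other", None


def folded(c1, c2, a, b):
    """bb's condition folded into pred; the PHI input becomes a select."""
    phi_in = a if c1 else b   # select c1, a, b
    if c1 or c2:
        return "succ", phi_in
    return "other", None


equivalent = all(
    original(c1, c2, 10, 20) == folded(c1, c2, 10, 20)
    for c1, c2 in product([False, True], repeat=2)
)
print(equivalent)  # True
```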

Differential Revision: https://reviews.llvm.org/D139275
2022-12-17 05:18:54 +03:00
Alexander Kornienko
37b8f09a4b Revert "[SimplifyCFG] FoldBranchToCommonDest(): deal with mismatched IV's in PHI's in common successor block"
This reverts commit 1bd0b82e508d049efdb07f4f8a342f35818df341, since it leads to
miscompiles. See https://reviews.llvm.org/D139275#3993229 and
https://reviews.llvm.org/D139275#4001580.
2022-12-16 17:23:35 +01:00
Roman Lebedev
1bd0b82e50 [SimplifyCFG] FoldBranchToCommonDest(): deal with mismatched IV's in PHI's in common successor block
This tries to approach the problem noted by @arsenm:
terrible codegen for `__builtin_fpclassify()`:
https://godbolt.org/z/388zqdE37

Just because the PHI in the common successor happens to have different
incoming values for these two blocks, doesn't mean we have to give up.
It's quite easy to deal with this, we just need to produce a select:
https://alive2.llvm.org/ce/z/000srb

Now, the cost model for this transform is rather overly strict,
so this will basically never fire. We add all the selects needed
(over all preds) to NumBonusInsts.

Differential Revision: https://reviews.llvm.org/D139275
2022-12-12 18:20:03 +03:00
chenglin.bi
15e41467f0 [LSR] precommit test for D138636; NFC 2022-11-25 13:46:51 +08:00
eopXD
c0ef83e3b9 [LSR] Check if terminating value is safe to expand before transformation
As reported by @JojoR, an assertion failure was hit, hence we need
to have this check before the actual transformation.

Reviewed By: Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D136415
2022-11-15 14:56:47 -08:00
eopXD
10da9844d0 [LSR] Drop LSR solution if it is less profitable than baseline
LSR may suggest a less profitable transformation for the loop. This
patch adds a check to prevent LSR from generating worse code than what
we already have.

Since LSR affects nearly all targets, the patch is guarded by the
option 'lsr-drop-solution', which is disabled by default for now.

The next step should be extending a TTI interface to allow target(s)
to enable this enhancement.

A debug log message is added to remind the user of the choice to skip
the LSR solution.

Reviewed By: Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D126043
2022-10-27 10:13:57 -07:00
eopXD
c9cd5bcf72 [LSR][RISCV] Pre-commit test case for D126043
Pre-commit test case for D126043

Reviewed By: Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D134823
2022-10-27 01:54:10 -07:00