932 Commits

Author SHA1 Message Date
Kazu Hirata
7b014a0732 [Scalar] Use range-based for loops (NFC) 2023-04-16 09:05:20 -07:00
Dmitry Makogon
3d7242f05e Reapply "[LSR] Preserve LCSSA when rewriting instruction with PHI user"
This reverts commit efd34ba60f3839b0a68b2e32ff9011b6823bc16f.

Reapplies 8ff4832679e1. Missed a failing test. Needed to just
update test checks.
2023-04-06 17:31:27 +07:00
Nico Weber
efd34ba60f Revert "[LSR] Preserve LCSSA when rewriting instruction with PHI user"
This reverts commit 8ff4832679e1ff2d2a1cfaa45bb5cb995b0685a1.
Breaks tests, see https://reviews.llvm.org/D146811#4232839
2023-03-30 06:40:16 -04:00
Dmitry Makogon
8ff4832679 [LSR] Preserve LCSSA when rewriting instruction with PHI user
Fixes https://github.com/llvm/llvm-project/issues/61182.

LoopStrengthReduce may sometimes break LCSSA form when applying a rewrite
for an instruction used in a PHI.
It happens if:
 - The PHI is in a loop exit block,
 - The edge from the corresponding exiting block to that exit is critical,
 - The PHI has at least two inputs coming from loop blocks,
 - and the rewritten instruction is inserted in the loop.

In such case we split the critical edge and then replace PHI inputs
with the rewritten instruction. However ExitBlock is no longer
a loop exit, so LCSSA form is broken.

This patch fixes it by collecting all inserted instructions for PHIs
whose parent block is not a loop exit and then forming LCSSA for them.

Differential Revision: https://reviews.llvm.org/D146811
2023-03-30 14:46:28 +07:00
Philip Reames
6afcc54ac7 [SCEV] Infer no-self-wrap via constant ranges
Without this, pointer IVs in loops with small constant trip counts couldn't be proven no-self-wrap. This came up in a new LSR transform, but may also benefit other SCEV consumers as well.

Differential Revision: https://reviews.llvm.org/D146596
2023-03-22 12:06:28 -07:00
Philip Reames
00fdd2cb6c [LSR] Don't crash on non-branch terminator in -lsr-term-fold
Reported in https://reviews.llvm.org/D146415.  I rewrote the patch and aded the test case.  Per that report, spec2006.483.xalancbmk crashes without this fix.
2023-03-21 09:30:01 -07:00
Philip Reames
e9df5d62c8 [LSR] Remove a couple stale comments in lsr-term-fold 2023-03-21 09:21:30 -07:00
Philip Reames
53e9a5ddc0 [LSR] Fix "new use of poison" problem in lsr-term-fold
This models the approach used in LFTR. The short summary is that we need to prove the IV is not dead first, and then we have to either prove the poison flag is valid after the new user or delete it.

There are two key differences between this and LFTR.

First, I allow a non-concrete start to the IV. The goal of LFTR is to canonicalize and IVs with constant starts are canonical, so the very restrictive definition there is mostly okay. Here on the other hand, we're explicitly moving *away* from the canonical form, and thus need to handle non-constant starts.

Second, LFTR bails out instead of removing inbounds on a GEP. This is a pragmatic tradeoff since inbounds is hard to infer and assists aliasing. This pass runs very late, and I think the tradeoff runs the other way.

A different approach we could take for the post-inc check would be to perform a pre-inc check instead of a post-inc check. We would still have to check the pre-inc IV, but that would avoid the need to drop inbounds. Doing the pre-inc check would basically trade killing a whole IV for an extra register move in the loop. I'm open to suggestions on the right approach here.

Note that this analysis is quite expensive compile time wise. I have made no effort to optimize (yet).

Differential Revision: https://reviews.llvm.org/D146464
2023-03-21 08:23:40 -07:00
Philip Reames
b33f5e7ed3 [LSR] Use evaluateAtIteration in lsr-term-fold
This is a follow up to one of the side discussions on D146429.  There are two semantic changes contained here.

The motivation for the change to the legality condition introduced in D146429 comes from the fact that we only check the post-inc form. As such, as long as the values of the post-inc variable don't self wrap, it's actually okay if we wrap past the starting value of the pre-inc IV.

Second, Nikic noticed during review that the test changes changed behavior for TC=0 (i.e. N=0 in the tests).  On more careful inspection, it became apparent that the previous manual expansion code was incorrect in the case where the primary IV could wrap without poison, and started with the limit value (i.e. i8 post-inc starts at 255 for 0 exit test, implying pre-inc starts with 0).  See @wrap_around test for an example of the (previous) miscompile.

Differential Revision: https://reviews.llvm.org/D146457
2023-03-21 08:11:36 -07:00
Philip Reames
091422adc1 [LSR] Fix wrapping bug in lsr-term-fold logic
The existing logic was unsound, in two ways.

First, due to wrapping on the trip count computation, it could compute a value which convert a loop which exiting on iteration 256, to one which exited at 255. (With i8 trip counts.)

Second, it allowed rewriting when the trip count implies wrapping around the alternate IV. As a trivial example, it allowed rewriting an i128 exit test in terms of an i64 IV. This is obviously wrong.

Note that the test change is fairly minimal - i.e. only the targeted test - but that's only because I precommitted a change which switched the test from 32 to 64 bit pointers. For 32 bit point architectures with 32 bit primary inductions, this transform is almost always unsound to perform.

Differential Revision: https://reviews.llvm.org/D146429
2023-03-20 13:47:21 -07:00
Philip Reames
272ebd6957 [LSR] Inline getAlternateIVEnd and simplify [nfc]
Also, add a comment to highlight that the "good" result on this test is accidental, and not based on a principled decision.  I matched the original behavior to make this nfc, but selecting the last legal IV is not well motivated here.
2023-03-20 11:22:21 -07:00
Philip Reames
b9521484ec [LSR] Rewrite IV match for term-fold using existing utilities
Main benefit here is making the logic easier to follow, slightly more efficient, and more in line with LFTR.  This is not NFC.  There are three semantic changes here.

First, we drop handling for constants on the LHS of the comparison.  These are non-canonical, and we're very late in the optimization pipeline here, so there's no point in supporting this.  I removed a test which covered this case.

Second, we don't need the almost dead IV to be an addrec.  We just need SCEV to be able to compute a trip count for it.

Third, we require a simple IV for the almost dead IV.  In theory, this removes cases we could have previously handled, but given a) zero testing and b) multiple known correctness issues, I'm adopting an attidute of narrowing this down to something which works correctly, and *then* expanding.
2023-03-20 10:41:01 -07:00
Mark Goncharov
e4dd7ec39f [LSR] Fold terminating condition not only for eq and ne.
Add opportunity to fold any icmp instruction.
2023-03-20 13:42:27 +03:00
Philip Reames
c60e0b66ad [LSR] Another minor code style improvement [nfc] 2023-03-17 08:09:01 -07:00
Philip Reames
cd47f5bb59 [LSR] Minor code style improvement [nfc] 2023-03-17 07:50:59 -07:00
Paul Walker
62d11b2cca Revert "Revert "[SCEV] Add SCEVType to represent vscale.""
Relanding after fixing Polly related build error.

This reverts commit 7b26dcae9eaf8cdcba7fef032fd83d060dffd4b4.
2023-03-02 13:14:07 +00:00
Paul Walker
7b26dcae9e Revert "[SCEV] Add SCEVType to represent vscale."
This reverts commit 7912f5cc92f65ad0d3c705f3683a0b69dbedcc57.
2023-03-02 11:59:50 +00:00
Paul Walker
7912f5cc92 [SCEV] Add SCEVType to represent vscale.
This is part of an effort to remove ConstantExpr based
representations of `vscale` so that its LangRef definiton can
be relaxed to accommodate a less strict definition of constant.

Differential Revision: https://reviews.llvm.org/D144891
2023-03-02 11:11:36 +00:00
David Green
74b67e53c6 [LSR] Fix incorrect check in 73cd3d4391ad47ae7
I missed that the test needed a icelake-server cpu to fail, and left a testing
"false &&" in the if condition. Hopefully this is now the correct fix.
2023-02-22 23:42:21 +00:00
David Green
73cd3d4391 [LSR] Prevent creating SCEVs of addrecs from mismatching loops
LSR can include Regs of AddRec SCEVs from different loops, which do not combine
well when added in Scalar Evolution. As they should never produce constant
differences so we can just guard against trying to create them.

Fixes #60927
2023-02-22 22:50:37 +00:00
Kazu Hirata
a28b252d85 Use APInt::getSignificantBits instead of APInt::getMinSignedBits (NFC)
Note that getMinSignedBits has been soft-deprecated in favor of
getSignificantBits.
2023-02-19 23:56:52 -08:00
Kazu Hirata
f8f3db2756 Use APInt::count{l,r}_{zero,one} (NFC) 2023-02-19 22:04:47 -08:00
David Green
7abe3497e7 [LSR] Improve filtered uses in NarrowSearchSpaceByPickingWinnerRegs
NarrowSearchSpaceByPickingWinnerRegs has an aggressive filtering method to
reduce the complexity of the search space down by picking a best formula with
the highest number of reuses and assuming it will yield profitable reuse. In
certain cases we can find a best formula like {X+30,+,1} and later check a
formula like {X,+,1} with the same number of Uses. On some architectures it
can be better to pick {X,+,1}, especially if an offset of 30 can be used as a
legal addressing mode, but -30 cannot. That happens under Thumb1 code, which
has fairly limited addressing modes. This patch adds a check to see if it can
pick the simpler formula, if it looks more profitable.

Differential Revision: https://reviews.llvm.org/D144014
2023-02-16 15:48:12 +00:00
chenglin.bi
14dedd9cf5 [Reland][LSR] Hoist IVInc to loop header if its all uses are in the loop header
Original code will cause crash when the load/store memory type is structure because isIndexedLoadLegal/isIndexedStore doesn't support struct type.
So we limit the load/store memory type to integer.

Origin commit message:
When the latch block is different from header block, IVInc will be expanded in the latch loop. We can't generate the post index load/store this case.
But if the IVInc only used in the loop, actually we still can use the post index load/store because when exit loop we don't care the last IVInc value.
So, trying to hoist IVInc to help backend to generate more post index load/store.

Fix #53625

Reviewed By: eopXD

Differential Revision: https://reviews.llvm.org/D138636
2023-02-10 16:52:00 +08:00
Kazu Hirata
55e2cd1609 Use llvm::count{lr}_{zero,one} (NFC) 2023-01-28 12:41:20 -08:00
Haojian Wu
778a582e8e Fix a -Wunused-variable warning in release build. 2023-01-20 23:40:33 +01:00
Philip Reames
915bcb0629 [LSR] Style cleanup for code recently added in D132443
Also add FIXMEs which highlight correctness bugs in this recently added off by default option.  These have also been raised on the original review.
2023-01-20 12:45:37 -08:00
Philip Reames
7ad786a29e [LSR] Generalize one aspect of terminator folding (recently introduced in D132443)
There's no need to require the start value to come directly from the loop predecessor.  This was sometimes covering up a latent miscompile in this off-by-default option, but the miscompile needs fixed anyways and the issue has been raised on the original review.

Differential Revision: https://reviews.llvm.org/D142240
2023-01-20 12:19:43 -08:00
chenglin.bi
b84ab1f7c9 Revert "[LSR] Hoist IVInc to loop header if its all uses are in the loop header"
The original commit seems to cause a regression in numba test.
This reverts commit b1b4758e7f4b2ffe1faa28b00eb037832e5d26a7.
2023-01-11 01:24:34 +08:00
Nikita Popov
094ccee2c8 Reapply [Dominators] Add findNearestCommonDominator() for Instructions (NFC)
Reapply with checks for instructions in unreachable blocks. A test
case for this was added in 1ee4a93b15bb.

-----

This is a recurring pattern: We want to find the nearest common
dominator (instruction) for two instructions, but currently only
provide an API for the nearest common dominator of two basic blocks.

Add an overload that accepts and return instructions.
2023-01-10 12:16:31 +01:00
chenglin.bi
b1b4758e7f [LSR] Hoist IVInc to loop header if its all uses are in the loop header
When the latch block is different from header block, IVInc will be expanded in the latch loop. We can't generate the post index load/store this case.
But if the IVInc only used in the loop, actually we still can use the post index load/store because when exit loop we don't care the last IVInc value.
So, trying to hoist IVInc to help backend to generate more post index load/store.

Fix #53625

Reviewed By: eopXD

Differential Revision: https://reviews.llvm.org/D138636
2023-01-10 18:34:00 +08:00
Nikita Popov
c60149b49e Revert "[Dominator] Add findNearestCommonDominator() for Instructions (NFC)"
This reverts commit 7f0de9573f758f5f9108795850337a5acbd17eef.

This is missing handling for !isReachableFromEntry() blocks, which
may be relevant for some callers. Revert for now.
2023-01-06 17:36:01 +01:00
Nikita Popov
7f0de9573f [Dominator] Add findNearestCommonDominator() for Instructions (NFC)
This is a recurring pattern: We want to find the nearest common
dominator (instruction) for two instructions, but currently only
provide an API for the nearest common dominator of two basic blocks.

Add an overload that accepts and return instructions.
2023-01-06 17:06:25 +01:00
OCHyams
042107494d [DebugInfo][NFC] Rename is/setUndef to is/setKilllocation
These names better reflect the semantics and also the implementation, since
it's not just "undef" operands that are sentinels used to signal that the debug
intrinsic terminates dominating locations definitions.

Related to https://discourse.llvm.org/t/auto-undef-debug-uses-of-a-deleted-value

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D140903
2023-01-06 09:15:02 +00:00
Fangrui Song
51b685734b [Transforms,CodeGen] std::optional::value => operator*/operator->
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).
2022-12-16 23:21:27 +00:00
Nikita Popov
04d652994d [SCEV] Return ArrayRef for SCEV operands() (NFC)
Use a consistent type for the operands() methods of different SCEV
types. Also make the API consistent by only providing operands(),
rather than also providin op_begin() and op_end() for some of them.
2022-12-16 15:36:19 +01:00
Vasileios Porpodas
32b38d248f [NFC] Rename Instruction::insertAt() to Instruction::insertInto(), to be consistent with BasicBlock::insertInto()
Differential Revision: https://reviews.llvm.org/D140085
2022-12-15 12:27:45 -08:00
Vasileios Porpodas
06911ba6ea [NFC] Cleanup: Replaces BB->getInstList().insert() with I->insertAt().
This is part of a series of cleanup patches towards making BasicBlock::getInstList() private.

Differential Revision: https://reviews.llvm.org/D138877
2022-12-12 13:33:05 -08:00
Krzysztof Parzyszek
ea6ed399b2 [SCEV] Convert Optional to std::optional 2022-12-08 08:35:11 -06:00
Kazu Hirata
343de6856e [Transforms] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 21:11:37 -08:00
Kazu Hirata
f207cf1d93 [Scalar] Use std::optional in LoopStrengthReduce.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-26 17:26:57 -08:00
eopXD
c0ef83e3b9 [LSR] Check if terminating value is safe to expand before transformation
According to report by @JojoR, the assertion error was hit hence we need
to have this check before the actual transformation.

Reviewed By: Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D136415
2022-11-15 14:56:47 -08:00
eopXD
10da9844d0 [LSR] Drop LSR solution if it is less profitable than baseline
The LSR may suggest less profitable transformation to the loop. This
patch adds check to prevent LSR from generating worse code than what
we already have.

Since LSR affects nearly all targets, the patch is guarded by the
option 'lsr-drop-solution' and default as disable for now.

The next step should be extending an TTI interface to allow target(s)
to enable this enhancememnt.

Debug log is added to remind user of such choice to skip the LSR
solution.

Reviewed By: Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D126043
2022-10-27 10:13:57 -07:00
Kazu Hirata
5ea3155565 [llvm] Use llvm::find (NFC) 2022-10-16 16:21:00 -07:00
eopXD
8cbdb1e081 [LSR][NFC] Add missing constness 2022-09-29 06:30:50 -07:00
Florian Hahn
3abaa3760d
[LSR] Preserve LCSSA in expander when rewriting loop exit values.
The expanded values when rewriting exit values need to preserve LCSSA.
Ask SCEVExpander to preserve LCSSA to ensure that.

Fixes #58007.
2022-09-27 09:58:48 +01:00
Dmitri Gribenko
5d7ff0d87c Fix an unused warning in release build 2022-09-20 11:29:39 +02:00
eopXD
3b2011fd4f [LSR] Fold terminating condition to other IV when possible
When the IV is only used by the terminating condition (say IV-A) and the loop
has a predictable back-edge count and we have another IV (say IV-B) that is an
affine add recursion, we will be able to calculate the terminating value of
IV-B in the loop pre-header. This patch adds attempts to replace IV-B as the
new terminating condition and remove IV-A. It is safe to do so since IV-A is
only used as the terminating condition.

This transformation is suitable to be appended after LSR as it may optimize the
loop into the situation mentioned above. The transformation can reduce number of
IV-s in the loop by one.

A cli option `lsr-term-fold` is added and default disabled.

Reviewed By: mcberg2021, craig.topper

Differential Revision: https://reviews.llvm.org/D132443
2022-09-20 01:38:47 -07:00
Kazu Hirata
21de2888a4 Use llvm::is_contained (NFC) 2022-08-27 09:53:11 -07:00
Kazu Hirata
ec5eab7e87 Use range-based for loops (NFC) 2022-08-20 21:18:32 -07:00