40 Commits

Author SHA1 Message Date
Graham Hunter
03f852f704
[AArch64] Improve cost model for legal subvec insert/extract (#81135)
Currently we model subvector inserts and extracts as shuffles,
potentially going as far as scalarizing. If the types are legal then
they can just be simple zip/unzip operations, or possibly even no-ops.
Change the cost to a relatively small one to ensure that simple loops
featuring such operations between fixed and scalable vector types that
are effectively the same at a given SVE width can be unrolled and
further optimized.
2024-03-04 16:17:01 +00:00
Graham Hunter
ad78e210bd
[NFC][AArch64] Tests for guarding unrolling with scalable vec ins/ext (#81132) 2024-02-19 09:47:49 +00:00
Craig Topper
03d4a9d94d
[InstCombine] Set disjoint flag when turning Add into Or. (#72702)
The disjoint flag was recently added to IR in #72583
2023-11-27 12:54:11 -08:00
Alex Richardson
e39f6c1844 [opt] Infer DataLayout from triple if not specified
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.

One case that cannot currently be distinguished from a missing
datalayout is `target datalayout = ""` in the IR file, since the current
APIs don't allow detecting this case. If it is considered useful to
support this case (instead of passing "-data-layout=" on the command
line), I can change IR parsers to track whether they have seen such a
directive and change the callback type.

Differential Revision: https://reviews.llvm.org/D141060
2023-10-26 12:07:37 -07:00
Nikita Popov
b9808e5660 [LoopUnroll] Fold add chains during unrolling
Loop unrolling tends to produce chains of
`%x1 = add %x0, 1; %x2 = add %x1, 1; ...` with one add per unrolled
iteration. This patch simplifies these adds to `%xN = add %x0, N`
directly during unrolling, rather than waiting for InstCombine to do so.

The motivation for this is that having a single add (rather than
an add chain) on the induction variable makes it a simple recurrence,
which we specially recognize in a number of places. This allows
InstCombine to directly perform folds with that knowledge, instead
of first folding the add chains, and then doing other folds in another
InstCombine iteration.

Due to the reduced number of InstCombine iterations, this also
results in a small compile-time improvement.
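
A minimal sketch of the fold described above, written as plain arithmetic rather than IR. The helper names `chainedValue` and `foldedValue` are hypothetical and only illustrate that the chain of per-iteration adds and the single folded add produce the same value; this is not the LoopUnroll implementation.

```cpp
#include <cstdint>

// Chained form: one add per unrolled iteration,
// i.e. %x1 = add %x0, step; %x2 = add %x1, step; ...
constexpr std::int64_t chainedValue(std::int64_t x0, std::int64_t step,
                                    unsigned n) {
  std::int64_t x = x0;
  for (unsigned k = 0; k < n; ++k)
    x += step;
  return x;
}

// Folded form: a single add on the original value,
// i.e. %xN = add %x0, N * step.
constexpr std::int64_t foldedValue(std::int64_t x0, std::int64_t step,
                                   unsigned n) {
  return x0 + step * static_cast<std::int64_t>(n);
}
```

With a step of 1, as in the message, both forms yield `x0 + N` for the N-th unrolled copy.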

Differential Revision: https://reviews.llvm.org/D153540
2023-07-05 09:54:28 +02:00
Nikita Popov
d179421099 [LoopUnroll] Avoid undef indices in test (NFC)
Doesn't really matter for the larger purpose of the test, but
avoid the use of undef indices and instead use the loop induction
variable as index, which is what was likely intended here.
2023-06-22 16:56:08 +02:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
Nikita Popov
ef992b6079 [LoopUnroll] Convert some tests to opaque pointers (NFC) 2022-12-23 16:35:26 +01:00
Roman Lebedev
5103ef64fe
[NFC] Port all (but one) LoopUnroll tests to -passes= syntax 2022-12-07 20:15:43 +03:00
Simon Pilgrim
09cb9fdef9 [InstCombine] Fold ult(add(x,-1),c) -> ule(x,c) iff x != 0 (PR57635)
Alive2: https://alive2.llvm.org/ce/z/sZ6wwS

As detailed on Issue #57635 and #37628 - for unsigned comparisons, we can compare prior to a decrement iff the value is known never to be zero.
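
The precondition can be checked by brute force on a small type. The sketch below assumes 8-bit unsigned values and uses hypothetical helper names; it models the two comparison forms only, not the InstCombine code.

```cpp
#include <cstdint>

// Original pattern: icmp ult (add x, -1), c, with 8-bit wraparound.
constexpr bool ultOfDecrement(std::uint8_t x, std::uint8_t c) {
  return static_cast<std::uint8_t>(x - 1) < c;
}

// Folded pattern: icmp ule x, c.
constexpr bool uleOfOriginal(std::uint8_t x, std::uint8_t c) {
  return x <= c;
}

// Returns true if both forms agree for every nonzero 8-bit x at this c.
constexpr bool agreeForAllNonZeroX(std::uint8_t c) {
  for (unsigned x = 1; x <= 255; ++x) {
    const std::uint8_t x8 = static_cast<std::uint8_t>(x);
    if (ultOfDecrement(x8, c) != uleOfOriginal(x8, c))
      return false;
  }
  return true;
}
```

For `x == 0` the decrement wraps to 255, so the original comparison is false for every `c` while `x <= c` is true, which is why the fold needs the nonzero guarantee.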

Differential Revision: https://reviews.llvm.org/D134172
2022-09-20 16:44:41 +01:00
William S. Moses
d9da6a535f [LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate
LICM will speculatively hoist code outside of loops. This requires removing information, such as alias analysis (https://github.com/llvm/llvm-project/issues/53794) and range information (https://bugs.llvm.org/show_bug.cgi?id=50550), among others. Prior to https://reviews.llvm.org/D99249 , LICM would only be run after LoopRotate. Running LoopRotate prior to LICM prevents an instruction hoist from being speculative if the instruction was conditionally executed by the iteration (as is commonly emitted by clang and other frontends). Adding the additional LICM pass first, however, forces all of these instructions to be considered speculative, even if they are not speculative after LoopRotate. This destroys information, and discarding it results in performance losses.

This PR modifies LICM to accept a "speculative" parameter, which controls whether LICM performs information-losing speculative hoists. Phase ordering is then modified to not perform the information-losing speculative hoists until after loop rotate is performed, preserving this additional information.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D119965
2022-02-17 20:13:07 -05:00
Philip Reames
37ead201e6 [runtime-unroll] Use incrementing IVs instead of decrementing ones
This is one of those wonderful "in theory X doesn't matter, but in practice it does" changes. In this particular case, we shift the IVs inserted by the runtime unroller to clamp iteration count of the loops* from decrementing to incrementing.

Why does this matter?  A couple of reasons:
* SCEV doesn't have a native subtract node.  Instead, all subtracts (A - B) are represented as A + -1 * B, dropping any flags invalidated by that rewrite.  As a result, SCEV is slightly less good at reasoning about edge cases involving decrementing addrecs than incrementing ones.  (You can see this in the inferred flags in some of the test cases.)
* Other parts of the optimizer produce incrementing IVs, and they're common in idiomatic source language.  We do have support for reversing IVs, but in general if we produce one of each, the pair will persist surprisingly far through the optimizer before being coalesced.  (You can see this looking at nearby phis in the test cases.)

Note that if the hardware prefers decrementing (i.e. zero tested) loops, LSR should convert back immediately before codegen.

* Mostly irrelevant detail: The main loop of the prolog case is handled independently and will simply use the original IV with a changed start value.  We could in theory use this scheme for all iteration clamping, but that's a larger and more invasive change.
2021-11-12 15:44:58 -08:00
Philip Reames
de2fed6152 [unroll] Keep unrolled iterations with initial iteration
The unrolling code was previously inserting new cloned blocks at the end of the function.  The result of this with typical loop structures is that the new iterations are placed far from the initial iteration.

With unrolling, the general assumption is that a) the loop is reasonably hot, and b) the first Count-1 copies of the loop are rarely (if ever) loop exiting.  As such, placing Count-1 copies out of line is a fairly poor code placement choice.  We'd much rather fall through into the hot (non-exiting) path.  For code with branch profiles, later layout would fix this, but this may have a positive impact on non-PGO compiled code.

However, the real motivation for this change isn't performance.  It's readability and human understanding.  Having to jump around long distances in an IR file to trace an unrolled loop structure is error prone and tedious.
2021-11-12 11:40:50 -08:00
Jingu Kang
94c4952951 [AArch64] Enable Upper bound unrolling universally
Differential Revision: https://reviews.llvm.org/D105996
2021-08-20 11:25:38 +01:00
Roman Lebedev
e52364532a
[NewPM] Remove SpeculateAroundPHIs pass
Addition of this pass has been botched.
There is no particular reason why it had to be sold as an inseparable part
of new-pm transition. It was added when old-pm was still the default,
and very *very* few users were actually tracking new-pm,
so its effects weren't measured.

Which means, some of the turmoil of the new-pm transition
is actually likely regressions due to this pass.

Likewise, there has been a fair amount of post-commit feedback
(post new-pm switch), namely
* https://reviews.llvm.org/D37467#2787157 (regresses HW-loops)
* https://reviews.llvm.org/D37467#2787259 (should not be in middle-end, should run after LSR, not before)
* https://reviews.llvm.org/D95789 (an attempt to fix bad loop backedge metadata)
and in the half year past, the pass authors (google) still haven't found time to respond to any of that.

Hereby it is proposed to backout the pass from the pipeline,
until someone who cares about it can address the issues reported,
and properly start the process of adding a new pass into the pipeline,
with proper performance evaluation.

Furthermore, neither google nor facebook reports any perf changes
from this change, so I'm dropping the pass completely.
It can always be re-reverted if anyone wants to pick it up again.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D104099
2021-06-15 20:35:55 +03:00
Philip Reames
449d14ebd2 Do actual DCE in LoopUnroll (try 4)
Turns out simplifyLoopIVs sometimes returns a non-dead instruction in its DeadInsts out param.  I had done a bit of NFC cleanup which was only NFC if simplifyLoopIVs obeyed its documentation.  I'm simply dropping that part of the change.

Commit message from try 3:

Recommitting after fixing a bug found post commit. Amusingly, try 1 had been correct, and by reverting to incorporate last minute review feedback, I introduced the bug. Oops. :)

Original commit message:

The problem was that recursively deleting an instruction can delete instructions beyond the current iterator (via a dead phi), thus invalidating iteration. Test case added in LoopUnroll/dce.ll to cover this case.

LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.
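
The difference between the limited and the recursive cleanup can be sketched on a toy instruction list. Everything below (`ToyBlock`, `makeDeadChain`, the two erase helpers) is hypothetical and assumes each instruction has at most one operand; it does not model phis or the iterator invalidation mentioned above, and it is not the LoopUnroll code.

```cpp
// Toy IR: instruction i optionally uses one earlier instruction.
constexpr int kMaxInsts = 8;

struct ToyBlock {
  int operand[kMaxInsts];  // index of the used instruction, or -1
  int uses[kMaxInsts];     // number of remaining users
  bool erased[kMaxInsts];
  int count;
};

// Builds a chain where instruction i uses instruction i-1 and the last
// result is unused, so the whole chain is dead. n must not exceed kMaxInsts.
constexpr ToyBlock makeDeadChain(int n) {
  ToyBlock b{};
  b.count = n;
  for (int i = 0; i < n; ++i) {
    b.operand[i] = i - 1;
    b.uses[i] = (i + 1 < n) ? 1 : 0;
  }
  return b;
}

// Limited DCE: erases only instructions that are unused at the start.
constexpr ToyBlock eraseUnusedOnce(ToyBlock b) {
  for (int i = 0; i < b.count; ++i)
    if (!b.erased[i] && b.uses[i] == 0)
      b.erased[i] = true;
  return b;
}

// Recursive DCE: after erasing an instruction, drop its use of the operand
// and keep going while the operand becomes unused as well.
constexpr ToyBlock eraseRecursively(ToyBlock b) {
  for (int i = 0; i < b.count; ++i) {
    int cur = i;
    while (cur >= 0 && !b.erased[cur] && b.uses[cur] == 0) {
      b.erased[cur] = true;
      const int op = b.operand[cur];
      if (op >= 0)
        --b.uses[op];
      cur = op;
    }
  }
  return b;
}

// Counts instructions that have not been erased.
constexpr int liveCount(const ToyBlock& b) {
  int live = 0;
  for (int i = 0; i < b.count; ++i)
    if (!b.erased[i])
      ++live;
  return live;
}
```

On a four-instruction dead chain, the single pass removes only the last instruction, while the recursive variant removes all four.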

Differential Revision: https://reviews.llvm.org/D102511
2021-05-19 10:25:31 -07:00
Amy Huang
517857421d Revert "Do actual DCE in LoopUnroll (try 3)"
This reverts commit b6320eeb8622f05e4a5d4c7f5420523357490fca
as it causes clang to assert; see
https://reviews.llvm.org/rGb6320eeb8622f05e4a5d4c7f5420523357490fca.
2021-05-19 08:53:38 -07:00
Philip Reames
b6320eeb86 Do actual DCE in LoopUnroll (try 3)
Recommitting after fixing a bug found post commit.  Amusingly, try 1 had been correct, and by reverting to incorporate last minute review feedback, I introduced the bug.  Oops.  :)

The problem was that recursively deleting an instruction can delete instructions beyond the current iterator (via a dead phi), thus invalidating iteration.  Test case added in LoopUnroll/dce.ll to cover this case.

LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.

Differential Revision: https://reviews.llvm.org/D102511
2021-05-17 14:47:02 -07:00
Philip Reames
6ae9893ed2 Revert "Do actual DCE in LoopUnroll (try 2)"
This reverts commit 653fa0b46ae34c06495b542414b704b30381cd02.

Reported to trigger pr50354.  Reverting until investigated.
2021-05-16 09:38:36 -07:00
Philip Reames
653fa0b46a Do actual DCE in LoopUnroll (try 2)
Recommitting after addressing a missed review comment, and updating an aarch64 test I'd missed.

LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.

Differential Revision: https://reviews.llvm.org/D102511
2021-05-14 10:42:36 -07:00
Nicholas Guy
2b6e0c90f9 [AArch64] Enable runtime unrolling for in-order sched models
Differential Revision: https://reviews.llvm.org/D97947
2021-04-27 13:22:10 +01:00
Florian Hahn
acd9cc7495
[AArch64] Use type-legalization cost for code size memop cost.
At the moment, getMemoryOpCost returns 1 for all inputs if CostKind is
CodeSize or SizeAndLatency. This fools LoopUnroll into thinking memory
operations on large vectors have a cost of one, even if they will get
expanded to a large number of memory operations in the backend.

This patch updates getMemoryOpCost to return the cost for the type
legalization for both CodeSize and SizeAndLatency. This should more
accurately reflect the number of memory operations required.
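
As a rough illustration of why a flat cost of 1 misleads the unroller, the sketch below counts how many legal-width memory operations a wide vector access would split into. The function name and the 128-bit default (the NEON register width) are assumptions for illustration; the real cost comes from the target's type legalization, not from this formula.

```cpp
// Hypothetical size model: number of legal-width memory operations needed
// to cover totalBits, rounding up. Not the actual getMemoryOpCost logic.
constexpr unsigned legalizedMemOpCost(unsigned totalBits,
                                      unsigned legalBits = 128) {
  return (totalBits + legalBits - 1) / legalBits;
}
```

Under this model a 2048-bit vector load costs 16 rather than 1, which is the kind of difference that keeps LoopUnroll from treating large vector memops as free.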

I am not sure how latency should properly be included in SizeAndLatency
from the description, but returning the size cost should be clearly more
accurate.

This does not cause any binary changes when building
MultiSource/SPEC2000/SPEC2006 with -O3 -flto for AArch64, likely because
large vector memops are not really formed by code emitted from Clang.
But using the C/C++ matrix extension can easily result in code with very
large vector operations directly from Clang, e.g.
https://clang.godbolt.org/z/6xzxcTGvb

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D100291
2021-04-15 10:11:05 +01:00
Florian Hahn
816cf41462
[LoopUnroll] Add AArch64 test case with large vector ops.
Add test case to illustrate over-eager unrolling on AArch64, due to the
cost-model not estimating the size of vector loads/stores accurately.
2021-04-11 21:39:52 +01:00
Sanjay Patel
99cf39bfed [LoopUnroll] add test for full unroll that is sensitive to cost-model; NFC
See discussion in D90554.

This is a partial un-revert of 32dd5870ee31. I'm adding
back the baseline tests first, so we don't have to
back-track as much in case there are still problems.
2020-11-20 08:15:46 -05:00
Eric Christopher
32dd5870ee Temporarily Revert "[CostModel] remove cost-kind predicate for intrinsics in basic TTI implementation"
as it's causing crashes in the optimizer. A reduced testcase has been posted as a follow-up.

This reverts commit f7eac51b9b3f780c96ca41913293851c5acb465b.

Temporarily Revert "[CostModel] make default size cost for libcalls small (again)" as it depends upon the primary revert.

This reverts commit 8ec7ea3ddce7379e13e8dfb4a5260a6d2004aa1c.

Temporarily Revert "[CostModel] add tests for math library calls; NFC" as it depends upon the primary revert.

This reverts commit df09f825995b10da03f148133c119f52c94fd6e4.

Temporarily Revert "[LoopUnroll] add test for full unroll that is sensitive to cost-model; NFC" as it depends upon the primary revert.

This reverts commit 618d555e8d926a83161774df2035519c387269db.
2020-11-19 22:10:23 -08:00
Sanjay Patel
8ec7ea3ddc [CostModel] make default size cost for libcalls small (again)
This was changed recently with D90554 / f7eac51b9b3f
...because we had a regression testing blindspot for intrinsics
that are expected to be lowered to libcalls.

In general, we want the *size* cost for a scalar call to be cheap
even if the other costs are expensive - we expect it to just be
a branch with some optional stack manipulation.

It is likely that we will want to carve out some
exceptions/overrides to this rule as follow-up patches for
calls that have some general and/or target-specific difference
to the expected lowering.

This was noticed as a regression in unrolling, so we have a test
for that now along with a couple of direct cost model tests.

If the assumed scalarization costs for the oversized vector
calls are not realistic, that would be another follow-up
refinement of the cost models.
2020-11-14 08:15:35 -05:00
Sanjay Patel
618d555e8d [LoopUnroll] add test for full unroll that is sensitive to cost-model; NFC
See discussion in D90554.
2020-11-13 17:15:23 -05:00
Florian Hahn
1bd58870e5 [LoopUnroll] Use LoopSize+1 as threshold, to allow unrolling loops matching LoopSize.
We use `< UP.Threshold` later on, so we should use LoopSize + 1, to
allow unrolling if the result won't exceed the loop size.
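
The off-by-one can be seen with a one-line model of the strict comparison. `fitsThreshold` is a hypothetical stand-in for the `< UP.Threshold` check quoted above, not the actual LoopUnroll function.

```cpp
// Models the strict comparison quoted in the message: the unrolled size
// must be strictly below the threshold to be accepted.
constexpr bool fitsThreshold(unsigned unrolledSize, unsigned threshold) {
  return unrolledSize < threshold;
}
```

With a loop size of 10, a threshold of 10 rejects an unrolled result of exactly 10, while a threshold of 10 + 1 accepts it and still rejects 11.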

Fixes PR43305.

Reviewers: efriedma, dmgreen, paquette

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D67594

llvm-svn: 372084
2019-09-17 09:02:48 +00:00
Fangrui Song
ac14f7b10c [lit] Delete empty lines at the end of lit.local.cfg NFC
llvm-svn: 363538
2019-06-17 09:51:07 +00:00
Florian Hahn
893aea58ea [LoopUnroll] Allow unrolling if the unrolled size does not exceed loop size.
Summary:
In the following cases, unrolling can be beneficial, even when
optimizing for code size:
 1) very low trip counts
 2) potential to constant fold most instructions after fully unrolling.

We can unroll in those cases, by setting the unrolling threshold to the
loop size. This might highlight some cost modeling issues and fixing
them will have a positive impact in general.

Reviewers: vsk, efriedma, dmgreen, paquette

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D60265

llvm-svn: 358586
2019-04-17 15:57:43 +00:00
Eric Christopher
cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher
a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Geoff Berry
378374d457 [AArch64][Falkor] Try to avoid exhausting HW prefetcher resources when unrolling.
Reviewers: t.p.northover, mcrosier

Subscribers: aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34533

llvm-svn: 306584
2017-06-28 18:53:09 +00:00
Haicheng Wu
1ef17e90b2 Reapply "[LoopUnroll] Use the upper bound of the loop trip count to fully unroll a loop"
Reapply r284044 after revert in r284051. Krzysztof fixed the error in r284049.

The original summary:

This patch tries to fully unroll loops that have a break statement, like this:

for (int i = 0; i < 8; i++) {
    if (a[i] == value) {
        found = true;
        break;
    }
}

GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
supports loops having exact constant trip counts.

The upper bound of the trip count can be obtained from calling
ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
refactoring work in SCEV to prevent duplicating code.

The feature of using the upper bound is enabled under the same circumstance
when runtime unrolling is enabled since both are used to unroll loops without
knowing the exact constant trip count.
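
For illustration, the loop from the message and a hand-unrolled equivalent are sketched below. The function names are hypothetical, and the unrolled form is written by hand to show the shape of the result; it is not output from the pass.

```cpp
// Rolled form: the exact trip count is unknown because of the break,
// but the upper bound is 8.
constexpr bool findRolled(const int (&a)[8], int value) {
  bool found = false;
  for (int i = 0; i < 8; i++) {
    if (a[i] == value) {
      found = true;
      break;
    }
  }
  return found;
}

// Fully unrolled by the upper bound: eight copies of the body, each
// keeping its own early exit, and no backedge.
constexpr bool findUnrolled(const int (&a)[8], int value) {
  if (a[0] == value) return true;
  if (a[1] == value) return true;
  if (a[2] == value) return true;
  if (a[3] == value) return true;
  if (a[4] == value) return true;
  if (a[5] == value) return true;
  if (a[6] == value) return true;
  if (a[7] == value) return true;
  return false;
}
```

Both functions return the same result for any input; the unrolled one simply trades the loop counter and backedge for straight-line code with exits.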

llvm-svn: 284053
2016-10-12 21:29:38 +00:00
Haicheng Wu
45e4ef737d Revert "[LoopUnroll] Use the upper bound of the loop trip count to fully unroll a loop"
This reverts commit r284044.

llvm-svn: 284051
2016-10-12 21:02:22 +00:00
Haicheng Wu
6cac34fd41 [LoopUnroll] Use the upper bound of the loop trip count to fully unroll a loop
This patch tries to fully unroll loops that have a break statement, like this:

for (int i = 0; i < 8; i++) {
    if (a[i] == value) {
        found = true;
        break;
    }
}

GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
supports loops having exact constant trip counts.

The upper bound of the trip count can be obtained from calling
ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
refactoring work in SCEV to prevent duplicating code.

The feature of using the upper bound is enabled under the same circumstance
when runtime unrolling is enabled since both are used to unroll loops without
knowing the exact constant trip count.

Differential Revision: https://reviews.llvm.org/D24790

llvm-svn: 284044
2016-10-12 20:24:32 +00:00
Michael Zolotukhin
b2738e41bf [LoopUnroll] Switch the default value of -unroll-runtime-epilog back to its original value.
As agreed in post-commit review of r265388, I'm switching the flag to
its original value until the 90% runtime performance regression on
SingleSource/Benchmarks/Stanford/Bubblesort is addressed.

llvm-svn: 277524
2016-08-02 21:24:14 +00:00
Evgeny Stupachenko
23ce61b663 The patch fixes PR27392.
Summary:
 It is incorrect to compare TripCount (which is BECount + 1)
  with extraiters (or Count) to check if we should enter the unrolled
  loop or not, because TripCount can potentially overflow
  (when BECount is max unsigned integer).
 Comparing BECount with (Count - 1) instead is overflow safe and
  therefore correct.
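
The overflow can be reproduced with 32-bit unsigned arithmetic. The sketch assumes the check means "enter the unrolled loop when at least Count iterations remain" and that Count is at least 1; both the comparison direction and the function names are assumptions, since the message does not spell out the exact predicate.

```cpp
#include <cstdint>

// Overflow-prone check: TripCount = BECount + 1 wraps to 0 when BECount
// is the maximum unsigned value.
constexpr bool enterViaTripCount(std::uint32_t beCount, std::uint32_t count) {
  const std::uint32_t tripCount = beCount + 1;  // may wrap around
  return tripCount >= count;
}

// Overflow-safe check: compare BECount with Count - 1 (Count >= 1 assumed).
constexpr bool enterViaBECount(std::uint32_t beCount, std::uint32_t count) {
  return beCount >= count - 1;
}
```

For ordinary values the two checks agree, but with `BECount == UINT32_MAX` the first one sees a trip count of 0 and gives the wrong answer.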

Reviewer: hfinkel

Differential Revision: http://reviews.llvm.org/D19256

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 267662
2016-04-27 03:04:54 +00:00
David L Kreitzer
188de5ae69 Adds the ability to use an epilog remainder loop during loop unrolling and makes
this the default behavior.

Patch by Evgeny Stupachenko (evstupac@gmail.com).

Differential Revision: http://reviews.llvm.org/D18158

llvm-svn: 265388
2016-04-05 12:19:35 +00:00
Kevin Qin
aef68418de [AArch64] Enable partial & runtime unrolling on cortex-a57
The inner loop of a loop nest is more likely to be hot, and the
runtime check can be promoted out of it (see patch 0001), so the
overhead is lower. We can therefore try a doubled threshold to unroll
more loops.

llvm-svn: 231632
2015-03-09 06:14:28 +00:00