36 Commits

Author SHA1 Message Date
Nikita Popov
90ba33099c
[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882)
This patch canonicalizes getelementptr instructions with constant
indices to use the `i8` source element type. This makes it easier for
optimizations to recognize that two GEPs are identical, because they
don't need to see past many different ways to express the same offset.

This is a first step towards
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699.
This is limited to constant GEPs only for now, as they have a clear
canonical form, while we're not yet sure how exactly to deal with
variable indices.

The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives
two representative examples of the kind of optimization improvement we
expect from this change. In the first test SimplifyCFG can now realize
that all switch branches are actually the same. In the second test it
can convert it into simple arithmetic. These are representative of
common optimization failures we see in Rust.

Fixes https://github.com/llvm/llvm-project/issues/69841.
2024-01-24 15:25:29 +01:00
Yingwei Zheng
b7f50e13d8
[InstCombine] Improve foldICmpWithDominatingICmp with DomConditionCache (#75370)
This patch uses affected values from DomConditionCache(introduced by #73662), instead of a cheap/incomplete check `getSinglePredecessor`.
2023-12-14 21:02:10 +08:00
David Green
75b3c3d267
[ARM] Disable UpperBound loop unrolling for MVE tail predicated loops. (#69709)
For MVE tail predicated loops, better code can be generated by keeping
the loop whole than to unroll to an upper bound, which requires the
expansion of active lane masks that can be difficult to generate good
code for. This patch disables UpperBound unrolling when we find a
active_lane_mask in the loop.
2023-10-31 09:51:30 +00:00
Alex Richardson
e39f6c1844 [opt] Infer DataLayout from triple if not specified
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.

One thing that is not currently possible to differentiate from a missing
datalayout `target datalayout = ""` in the IR file since the current
APIs don't allow detecting this case. If it is considered useful to
support this case (instead of passing "-data-layout=" on the command
line), I can change IR parsers to track whether they have seen such a
directive and change the callback type.

Differential Revision: https://reviews.llvm.org/D141060
2023-10-26 12:07:37 -07:00
Alex Richardson
e86d6a43f0 Regenerate test checks for tests affected by D141060 2023-10-04 10:51:35 -07:00
Nikita Popov
b9808e5660 [LoopUnroll] Fold add chains during unrolling
Loop unrolling tends to produce chains of
`%x1 = add %x0, 1; %x2 = add %x1, 1; ...` with one add per unrolled
iteration. This patch simplifies these adds to `%xN = add %x0, N`
directly during unrolling, rather than waiting for InstCombine to do so.

The motivation for this is that having a single add (rather than
an add chain) on the induction variable makes it a simple recurrence,
which we specially recognize in a number of places. This allows
InstCombine to directly perform folds with that knowledge, instead
of first folding the add chains, and then doing other folds in another
InstCombine iteration.

Due to the reduced number of InstCombine iterations, this also
results in a small compile-time improvement.

Differential Revision: https://reviews.llvm.org/D153540
2023-07-05 09:54:28 +02:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
Nikita Popov
b7b0ce6e76 [LoopUnroll] Convert test to opaque pointers (NFC) 2023-01-06 11:48:03 +01:00
Nikita Popov
9c7afbacd8 [LoopUnroll] Name instructions in test (NFC) 2023-01-06 11:48:03 +01:00
Nikita Popov
ef992b6079 [LoopUnroll] Convert some tests to opaque pointers (NFC) 2022-12-23 16:35:26 +01:00
Roman Lebedev
5103ef64fe
[NFC] Port all (but one) LoopUnroll tests to -passes= syntax 2022-12-07 20:15:43 +03:00
Roman Lebedev
371fcb720e
[SimplifyCFG][PhaseOrdering] Defer lowering switch into an integer range comparison and branch until after at least the IPSCCP
That transformation is lossy, as discussed in
https://github.com/llvm/llvm-project/issues/53853
and https://github.com/rust-lang/rust/issues/85133#issuecomment-904185574

This is an alternative to D119839,
which would add a limited IPSCCP into SimplifyCFG.

Unlike lowering switch to lookup, we still want this transformation
to happen relatively early, but after giving a chance for the things
like CVP to do their thing. It seems like deferring it just until
the IPSCCP is enough for the tests at hand, but perhaps we need to
be more aggressive and disable it until CVP.

Fixes https://github.com/llvm/llvm-project/issues/53853
Refs. https://github.com/rust-lang/rust/issues/85133

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D119854
2022-02-17 12:13:55 +03:00
Philip Reames
37ead201e6 [runtime-unroll] Use incrementing IVs instead of decrementing ones
This is one of those wonderful "in theory X doesn't matter, but in practice is does" changes. In this particular case, we shift the IVs inserted by the runtime unroller to clamp iteration count of the loops* from decrementing to incrementing.

Why does this matter?  A couple of reasons:
* SCEV doesn't have a native subtract node.  Instead, all subtracts (A - B) are represented as A + -1 * B and drops any flags invalidated by such.  As a result, SCEV is slightly less good at reasoning about edge cases involving decrementing addrecs than incrementing ones.  (You can see this in the inferred flags in some of the test cases.)
* Other parts of the optimizer produce incrementing IVs, and they're common in idiomatic source language.  We do have support for reversing IVs, but in general if we produce one of each, the pair will persist surprisingly far through the optimizer before being coalesced.  (You can see this looking at nearby phis in the test cases.)

Note that if the hardware prefers decrementing (i.e. zero tested) loops, LSR should convert back immediately before codegen.

* Mostly irrelevant detail: The main loop of the prolog case is handled independently and will simple use the original IV with a changed start value.  We could in theory use this scheme for all iteration clamping, but that's a larger and more invasive change.
2021-11-12 15:44:58 -08:00
Philip Reames
de2fed6152 [unroll] Keep unrolled iterations with initial iteration
The unrolling code was previously inserting new cloned blocks at the end of the function.  The result of this with typical loop structures is that the new iterations are placed far from the initial iteration.

With unrolling, the general assumption is that the a) the loop is reasonable hot, and b) the first Count-1 copies of the loop are rarely (if ever) loop exiting.  As such, placing Count-1 copies out of line is a fairly poor code placement choice.  We'd much rather fall through into the hot (non-exiting) path.  For code with branch profiles, later layout would fix this, but this may have a positive impact on non-PGO compiled code.

However, the real motivation for this change isn't performance.  Its readability and human understanding.  Having to jump around long distances in an IR file to trace an unrolled loop structure is error prone and tedious.
2021-11-12 11:40:50 -08:00
Philip Reames
f453e23e67 Autogen a bunch of unrolling tests for ease of update 2021-11-12 10:34:50 -08:00
Roman Lebedev
9c4c2f2472
[SimplifyCFG] Tail-merging all blocks with ret terminator
Based ontop of D104598, which is a NFCI-ish refactoring.
Here, a restriction, that only empty blocks can be merged, is lifted.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D104597
2021-06-24 13:15:39 +03:00
David Green
8cfc080132 [ARM] Limit v6m unrolling with multiple live outs
v6m cores only have a limited number of registers available. Unrolling
can mean we spend more on stack spills and reloads than we save from the
unrolling. This patch adds an extra heuristic to put a limit on the
unroll count for loops with multiple live out values, as measured from
the LCSSA phi nodes.

Differential Revision: https://reviews.llvm.org/D104659
2021-06-23 16:36:37 +01:00
Roman Lebedev
ff4b1d379f
[NFCI-ish][SimplifyCFGPass] Rework and generalize ret block tail-merging
This changes the approach taken to tail-merge the blocks
to always create a new block instead of trying to reuse some block,
and generalizes it to support dealing not with just the `ret` in the future.

This effectively lifts the CallBr restriction, although this isn't really intentional.
That is the only non-NFC change here, i'm not sure if it's reasonable/feasible to temporarily retain it.

Other restrictions of the transform remain.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D104598
2021-06-23 14:33:18 +03:00
David Green
da98177cda [ARM] Allow v6m runtime loop unrolling
This removes the restriction that only Thumb2 targets enable runtime
loop unrolling, allowing it for Thumb1 only cores as well. The existing
T2 heuristics are used (for the time being) to control when and how
unrolling is performed.

Differential Revision: https://reviews.llvm.org/D99588
2021-04-01 21:21:40 +01:00
David Green
14b2ec934e [ARM] Enable UpperBound unrolling for all loops
This UpperBound unrolling was already enabled so long as a series of
conditions in ARMTTIImpl::getUnrollingPreferences pass. This just always
enables it as it can help fully unroll loops that would not otherwise
pass those tests.

Differential Revision: https://reviews.llvm.org/D99174
2021-03-24 16:39:21 +00:00
David Green
003fab9e8d [ARM] Additional Upper bound unrolling test. NFC 2021-03-23 12:00:40 +00:00
David Green
c7e275388e [ARM] Don't aggressively unroll vector remainder loops
We already do not unroll loops with vector instructions under MVE, but
that does not include the remainder loops that the vectorizer produces.
These remainder loops will be rarely executed and are not worth
unrolling, as the trip count is likely to be low if they get executed at
all. Luckily they get llvm.loop.isvectorized to make recognizing them
simpler.

We have wanted to do this for a while but hit issues with low overhead
loops being reverted due to difficult registry allocation. With recent
changes that seems to be less of an issue now.

Differential Revision: https://reviews.llvm.org/D90055
2020-11-10 17:01:31 +00:00
David Green
44c1a56869 [ARM] Add extra MVE tests for various patches. NFC 2020-11-01 16:24:23 +00:00
Sam Parker
ea8448e361 [LoopUnroll] Adjust CostKind query
When TTI was updated to use an explicit cost, TCK_CodeSize was used
although the default implicit cost would have been the hand-wavey
cost of size and latency. So, revert back to this behaviour. This is
not expected to have (much) impact on targets since most (all?) of
them return the same value for SizeAndLatency and CodeSize.

When optimising for size, the logic has been changed to query
CodeSize costs instead of SizeAndLatency.

This patch also adds a testing option in the unroller so that
OptSize thresholds can be specified.

Differential Revision: https://reviews.llvm.org/D85723
2020-08-12 12:56:09 +01:00
Sjoerd Meijer
356685a1d8 Follow up of 67bf9a6154d4b82c, minor fix in test case, removed duplicate option 2020-01-10 09:41:41 +00:00
Sjoerd Meijer
67bf9a6154 [SVEV] Recognise hardware-loop intrinsic loop.decrement.reg
Teach SCEV about the @loop.decrement.reg intrinsic, which has exactly the same
semantics as a sub expression. This allows us to query hardware-loops, which
contain this @loop.decrement.reg intrinsic, so that we can calculate iteration
counts, exit values, etc. of hardwareloops.

This "int_loop_decrement_reg" intrinsic is defined as "IntrNoDuplicate". Thus,
while hardware-loops and tripcounts now become analysable by SCEV, this
prevents the usual loop transformations from applying transformations on
hardware-loops, which is what we want at this point, for which I have added
test cases for loopunrolling and IndVarSimplify and LFTR.

Differential Revision: https://reviews.llvm.org/D71563
2020-01-10 09:35:00 +00:00
Sam Parker
15c7fa4d11 [ARM][MVE] Don't unroll intrinsic loops.
We don't unroll vector loops for MVE targets, but we miss the case
when loops only contain intrinsic calls. So just move the logic a
bit to catch this case.

Differential Revision: https://reviews.llvm.org/D72440
2020-01-09 11:57:34 +00:00
David Green
11c4602fce [MVE] Don't try to unroll vectorised MVE loops
Due to the nature of the beat system in the MVE architecture, along with tail
predication and low-overhead loops, unrolling has less benefit compared to
normal loops. You can not, for example, hide the latency of a load with other
instructions as you can for scalar code. Preventing unrolling also makes the
code easier to read and reason about.

So if a loop contains vector code, don't enable the runtime unrolling. At least
for the time being.

Differential Revision: https://reviews.llvm.org/D65803

llvm-svn: 368530
2019-08-11 08:53:18 +00:00
Fangrui Song
ac14f7b10c [lit] Delete empty lines at the end of lit.local.cfg NFC
llvm-svn: 363538
2019-06-17 09:51:07 +00:00
David Green
d847aa573b [ARM] Enable Unroll UpperBound
This option allows loops with small max trip counts to be fully unrolled. This
can help with code like the remainder loops from manually unrolled loops like
those that appear in the cmsis dsp library. We would apparently previously
runtime unroll them with the default unroll count (4).

Differential Revision: https://reviews.llvm.org/D63064

llvm-svn: 362928
2019-06-10 10:22:14 +00:00
Eric Christopher
cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher
a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Sam Parker
a16667e79b [ARM] Use Cortex-A57 sched model for Cortex-A72
This mirrors what we already do for AArch64 as the cores are similar.
As discussed in the review, enabling the machine scheduler causes
more variations in performance changes so it is not enabled for now.
This patch improves LNT scores by a geomean of 1.57% at -O3.

Differential Revision: https://reviews.llvm.org/D53562

llvm-svn: 345272
2018-10-25 15:08:29 +00:00
Sam Parker
487ab86942 [ARM] Allow unrolling of multi-block loops.
Before, loop unrolling was only enabled for loops with a single
block. This restriction has been removed and replaced by:
- allow a maximum of two exiting blocks,
- a four basic block limit for cores with a branch predictor.

Differential Revision: https://reviews.llvm.org/D38952

llvm-svn: 316313
2017-10-23 08:05:14 +00:00
Sam Parker
84fd0c3bf2 [ARM] Improve loop unrolling for Cortex-M
- Set the default runtime unroll count to 4 and use the newly added
  UnrollRemainder option.
- Create loop cost and force unroll for a cost less than 12.
- Disable unrolling on Thumb1 only targets.

Differential Revision: https://reviews.llvm.org/D36134

llvm-svn: 310997
2017-08-16 07:42:44 +00:00
Sam Parker
19a08e42a8 [ARM] Enable partial and runtime unrolling
Enable runtime and partial loop unrolling of simple loops without
calls on M-class cores. The thresholds are calculated based on
whether the target is Thumb or Thumb-2.

Differential Revision: https://reviews.llvm.org/D34619

llvm-svn: 308956
2017-07-25 08:51:30 +00:00