13 Commits

Author SHA1 Message Date
Nikita Popov
90ba33099c
[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882)
This patch canonicalizes getelementptr instructions with constant
indices to use the `i8` source element type. This makes it easier for
optimizations to recognize that two GEPs are identical, because they
don't need to see past many different ways to express the same offset.

This is a first step towards
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699.
This is limited to constant GEPs only for now, as they have a clear
canonical form, while we're not yet sure how exactly to deal with
variable indices.

The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives
two representative examples of the kind of optimization improvement we
expect from this change. In the first test SimplifyCFG can now realize
that all switch branches are actually the same. In the second test it
can convert it into simple arithmetic. These are representative of
common optimization failures we see in Rust.

Fixes https://github.com/llvm/llvm-project/issues/69841.
2024-01-24 15:25:29 +01:00
Yingwei Zheng
b7f50e13d8
[InstCombine] Improve foldICmpWithDominatingICmp with DomConditionCache (#75370)
This patch uses affected values from DomConditionCache(introduced by #73662), instead of a cheap/incomplete check `getSinglePredecessor`.
2023-12-14 21:02:10 +08:00
Alex Richardson
e39f6c1844 [opt] Infer DataLayout from triple if not specified
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.

One thing that is not currently possible to differentiate from a missing
datalayout `target datalayout = ""` in the IR file since the current
APIs don't allow detecting this case. If it is considered useful to
support this case (instead of passing "-data-layout=" on the command
line), I can change IR parsers to track whether they have seen such a
directive and change the callback type.

Differential Revision: https://reviews.llvm.org/D141060
2023-10-26 12:07:37 -07:00
Alex Richardson
e86d6a43f0 Regenerate test checks for tests affected by D141060 2023-10-04 10:51:35 -07:00
Nikita Popov
ef992b6079 [LoopUnroll] Convert some tests to opaque pointers (NFC) 2022-12-23 16:35:26 +01:00
Roman Lebedev
5103ef64fe
[NFC] Port all (but one) LoopUnroll tests to -passes= syntax 2022-12-07 20:15:43 +03:00
Roman Lebedev
371fcb720e
[SimplifyCFG][PhaseOrdering] Defer lowering switch into an integer range comparison and branch until after at least the IPSCCP
That transformation is lossy, as discussed in
https://github.com/llvm/llvm-project/issues/53853
and https://github.com/rust-lang/rust/issues/85133#issuecomment-904185574

This is an alternative to D119839,
which would add a limited IPSCCP into SimplifyCFG.

Unlike lowering switch to lookup, we still want this transformation
to happen relatively early, but after giving a chance for the things
like CVP to do their thing. It seems like deferring it just until
the IPSCCP is enough for the tests at hand, but perhaps we need to
be more aggressive and disable it until CVP.

Fixes https://github.com/llvm/llvm-project/issues/53853
Refs. https://github.com/rust-lang/rust/issues/85133

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D119854
2022-02-17 12:13:55 +03:00
Philip Reames
de2fed6152 [unroll] Keep unrolled iterations with initial iteration
The unrolling code was previously inserting new cloned blocks at the end of the function.  The result of this with typical loop structures is that the new iterations are placed far from the initial iteration.

With unrolling, the general assumption is that the a) the loop is reasonable hot, and b) the first Count-1 copies of the loop are rarely (if ever) loop exiting.  As such, placing Count-1 copies out of line is a fairly poor code placement choice.  We'd much rather fall through into the hot (non-exiting) path.  For code with branch profiles, later layout would fix this, but this may have a positive impact on non-PGO compiled code.

However, the real motivation for this change isn't performance.  Its readability and human understanding.  Having to jump around long distances in an IR file to trace an unrolled loop structure is error prone and tedious.
2021-11-12 11:40:50 -08:00
Roman Lebedev
9c4c2f2472
[SimplifyCFG] Tail-merging all blocks with ret terminator
Based ontop of D104598, which is a NFCI-ish refactoring.
Here, a restriction, that only empty blocks can be merged, is lifted.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D104597
2021-06-24 13:15:39 +03:00
Roman Lebedev
ff4b1d379f
[NFCI-ish][SimplifyCFGPass] Rework and generalize ret block tail-merging
This changes the approach taken to tail-merge the blocks
to always create a new block instead of trying to reuse some block,
and generalizes it to support dealing not with just the `ret` in the future.

This effectively lifts the CallBr restriction, although this isn't really intentional.
That is the only non-NFC change here, i'm not sure if it's reasonable/feasible to temporarily retain it.

Other restrictions of the transform remain.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D104598
2021-06-23 14:33:18 +03:00
David Green
14b2ec934e [ARM] Enable UpperBound unrolling for all loops
This UpperBound unrolling was already enabled so long as a series of
conditions in ARMTTIImpl::getUnrollingPreferences pass. This just always
enables it as it can help fully unroll loops that would not otherwise
pass those tests.

Differential Revision: https://reviews.llvm.org/D99174
2021-03-24 16:39:21 +00:00
David Green
003fab9e8d [ARM] Additional Upper bound unrolling test. NFC 2021-03-23 12:00:40 +00:00
David Green
d847aa573b [ARM] Enable Unroll UpperBound
This option allows loops with small max trip counts to be fully unrolled. This
can help with code like the remainder loops from manually unrolled loops like
those that appear in the cmsis dsp library. We would apparently previously
runtime unroll them with the default unroll count (4).

Differential Revision: https://reviews.llvm.org/D63064

llvm-svn: 362928
2019-06-10 10:22:14 +00:00