73 Commits

Author SHA1 Message Date
Florian Hahn
dce77a3579
[IndVars] Preserve flags of narrow IV inc if replacing with wider inc. (#80446)
We are replacing a narrow IV increment with a wider one. If the original
(narrow) increment did not wrap, the wider one should not wrap either.
Set the flags to be the union of both wide increment and original
increment; this ensures we preserve flags SCEV could infer for the wider
increment.

Fixes https://github.com/llvm/llvm-project/issues/71517.
2024-02-10 18:11:17 +00:00
Nikita Popov
66b339aa6b [IndVars] Regenerate test checks (NFC) 2024-02-02 16:30:29 +01:00
paperchalice
e390c229a4
[Pass] Add hyphen to some pass names (#74287)
Here is the list of the renamed passes:
- `callbrprepare` -> `callbr-prepare`
- `dwarfehprepare` -> `dwarf-eh-prepare`
- `flattencfg` -> `flatten-cfg`
- `loweratomic` -> `lower-atomic`
- `lowerinvoke` -> `lower-invoke`
- `lowerswitch` -> `lower-switch`
- `winehprepare` -> `win-eh-prepare`
- `targetir` -> `target-ir`
- `targetlibinfo` -> `target-lib-info`

Legacy passes are not affected.
2024-01-25 16:05:54 +08:00
Markos Horro
9d2903c8e5
[IndVars] Add check of loop invariant for trunc instructions (#71072)
The same idea as in 34d380e1f63a7e2cdb9ab1e6498f727fcd710a14, but considering
truncation instructions.
Improvement for #59633.
2023-11-08 11:16:23 +00:00
Philip Reames
a7f35d54ee
[SCEV] Extend isImpliedCondOperandsViaRanges to independent predicates (#71110)
As far as I can tell, there's nothing in this code which actually
assumes the two predicates in (FoundLHS FoundPred FoundRHS) => (LHS Pred
RHS) are the same.

Noticed while investigating something else, this is purely an
oppurtunistic optimization while I'm looking at the code. Unfortunately,
this doesn't solve my original problem. :)
2023-11-07 07:25:47 -08:00
Philip Reames
5adf6ab7ff Revert "[IndVars] Generate zext nneg when locally obvious"
This reverts commit a6c8e27b3a052913a15a13ee0d4ac466c5ab3f92.  It appears likely to have caused https://lab.llvm.org/buildbot/#/builders/57/builds/30988.
2023-11-03 11:19:14 -07:00
Philip Reames
a6c8e27b3a [IndVars] Generate zext nneg when locally obvious
zext nneg was recently added to the IR in #67982.  This patch teaches
SimplifyIndVars to prefer zext nneg over *both* sext and plain zext,
when a local SCEV query indicates the source is non-negative.

The choice to prefer zext nneg over sext looks slightly aggressive
here, but probably isn't so much in practice.  For cases where we'd
"remember" the range fact, instcombine would convert the sext into
a zext nneg anyways.  The only cases where this produces a different
result overall are when SCEV knows a non-local fact, and it doesn't
get materialized into the IR.  Those are exactly the cases where
using zext nneg are most useful.  We do run the risk of e.g. a
missing combine - since we haven't updated most of them yet - but
that seems like a manageable risk.

Note that there are much deeper algorithmic changes we could make
to this code to exploit zext nneg, but this seemed like a reasonable
and low risk starting point.
2023-11-03 09:20:59 -07:00
Philip Reames
a78f5c0649
[IndVars] Use IRBuilder in eliminateTrunc [nfc-ish] (#70836)
Mostly a cleanup so that we don't need to manually emit instructions,
and can eagerly constant fold where relevant.
2023-10-31 14:37:57 -07:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
Max Kazantsev
f3e2f26378 [IndVars] Expand icmp in preheader rather than in loop
The motivation is that 'createInvariantCond' unconditionally
builds icmp in the loop block,  while it could always do it
in preheader. Build it in preheader instead.

Patch by Aleksandr Popov!

Differential Revision: https://reviews.llvm.org/D141994
Reviewed By: nikic
2023-01-25 14:41:29 +07:00
Max Kazantsev
602916d2de [IndVars] Apply more optimistic SkipLastIter for AND/OR conditions
When exit by condition `C1` dominates exit by condition `C2`, and
max symbolic exit count for `C1` matches those for loop, we will
apply more optimistic logic to `C2` by setting `SkipLastIter` for it,
meaning that it will do 1 iteration less because the dominating branch
must exit on the last loop iteration.

But when we have a single exit by condition `C1 & C2`, we cannot
apply the same logic, because there is no dominating condition.

However, if we can prove that the symbolic max exit count of `C1 & C2`
matches those of `C1`, it means that for `C2` we can assume that it
doesn't matter on the last iteration (because the whole thing is `false`
because `C1` must be `false`). Therefore, in this situation, we can handle
`C2` as if it had `SkipLastIter`.

Differential Revision: https://reviews.llvm.org/D139934
Reviewed By: nikic
2023-01-24 12:26:22 +07:00
Max Kazantsev
32aea4bb84 [Test] Add test showing one more missing case of turn-to-invariant with widening 2023-01-11 13:07:38 +07:00
Max Kazantsev
fae63a9a22 [Test] Give test variables more reasonable names 2023-01-11 12:29:26 +07:00
Max Kazantsev
d02c3b1358 [Test] Add test showing potential conflict b/w AND elimination and IV widening 2022-12-26 14:37:49 +07:00
Max Kazantsev
9a7286b61f [SCEV] Help getLoopInvariantExitCondDuringFirstIterations deal with complex umin exit counts. PR59615
Recent improvements in symbolic exit count computation revealed some problems with
SCEV's ability to find invariant predicate during first iterations. Ultimately it is based on its
ability to prove some facts for value on the last iteration. This last value, when it includes
`umin` as part of exit count, isn't always simplified enough. The motivating example is following:

https://github.com/llvm/llvm-project/issues/59615

Could not prove:
```
        Pred = 36, LHS = (-1 + (-1 * (2147483645 umin (-1 + %var)<nsw>))<nsw> + %var), RHS = %var
        FoundPred = 36, FoundLHS = {1,+,1}<nuw><nsw><%bb3>, FoundRHS = %var
```
Can prove:
```
        Pred = 36, LHS = (-1 + (-1 * (-1 + %var)<nsw>)<nsw> + %var), RHS = %var
        FoundPred = 36, FoundLHS = {1,+,1}<nuw><nsw><%bb3>, FoundRHS = %var
```

Here ` (2147483645 umin (-1 + %var)<nsw>)` is exit count composed of two parts from
two different exits: `2147483645 ` and `(-1 + %var)<nsw>`. When it was only one (latter)
analyzeable exit, for it everything was easily provable. Unfortunately, in general case `umin`
in one of `add`'s operands doesn't guarantee that the whole sum reduces, especially in presence
of negative steps and lack of `nuw`. I don't think there is a generic legal way to somehow play
around this `umin`.

So the ad-hoc solution is following: if we failed to find an equivalent predicate that is invariant
during first `MaxIter` iterations, and `MaxIter = umin(a, b, c...)`, try to find solution for at least one
of `a`, `b`, `c`... Because they all are `uge` than `MaxIter`, whatever is true during `a (b, c)` iterations
is also true during `MaxIter` iterations.

Differential Revision: https://reviews.llvm.org/D140456
Reviewed By: nikic
2022-12-21 18:12:17 +07:00
Max Kazantsev
474c8fe9b7 [Test] Precommit test for PR59615 2022-12-21 11:39:05 +07:00
Nikita Popov
3ce360f15b [IndVarSimplify] Convert more tests to opaque pointers (NFC) 2022-12-14 15:37:58 +01:00
Nikita Popov
8b7b5f9cfe [IndVarSimplify] Regenerate test checks (NFC) 2022-12-14 15:35:58 +01:00
Nikita Popov
864bb84a42 [IndVarSimplify] Convert tests to opaque pointers (NFC)
This leaves lftr.ll alone, because there is a suspicious test diff.
2022-12-13 14:50:13 +01:00
Roman Lebedev
67bbdd05c4
[NFC] Port all IndVarSimplify tests to -passes= syntax 2022-12-08 02:38:44 +03:00
Roman Lebedev
6017d9a628
[NFC] Port all IndVarSimplify tests to -passes= syntax 2022-12-07 22:22:09 +03:00
Bjorn Pettersson
a11faeed44 [test] Switch to use -passes syntax in various test cases 2022-12-01 21:25:59 +01:00
Max Kazantsev
57fd7ffeff [IndVarSimplify] Lift limitations on IV being a Phi for turn-to-invariant
These limitations are too strict, and their only purpose is to avoid code
size explosion. These restrictions seem obsolete, and the size problem
is solved in other places through cheap expansion limits.

The motivation is that the old code cannot deal with comparisons against
induction variant's increment.

Differential Revision: https://reviews.llvm.org/D138412
Reviewed By: lebedev.ri, reames
2022-11-22 12:53:37 +07:00
Max Kazantsev
41e41cc2c4 [Test] Add some test showing limitations of makeIVComparisonInvariant
The transform doesn't work if argument isn't immediately a Phi.
2022-11-21 16:20:36 +07:00
Bjorn Pettersson
f15ed06a65 [test][IndVarSimplify] Use -passes syntax in RUN lines. NFC 2022-10-13 10:44:37 +02:00
Max Kazantsev
86d5586d78 [SCEVExpander] Recompute poison-generating flags on hoisting. PR57187
Instruction being hoisted could have nuw/nsw flags inferred from the old
context, and we cannot simply move it to the new location keeping them
because we are going to introduce new uses to them that didn't exist before.

Example in https://github.com/llvm/llvm-project/issues/57187 shows how
this can produce branch by poison from initially well-defined program.

This patch forcefully recomputes poison-generating flag in the new context.

Differential Revision: https://reviews.llvm.org/D132022
Reviewed By: fhahn, nikic
2022-09-13 12:56:35 +07:00
Max Kazantsev
e351e8213b [Test] Remove addrspace1 ptr to not confuse alive2
addrspace here is not import for the test itself.
2022-08-19 13:25:50 +07:00
David Spickett
8375c3124d [LLVM][IndvarSimplify] Move test that requires X86
This is failing on our bots that only build Arm/AArch64.

https://lab.llvm.org/buildbot/#/builders/171/builds/19033/steps/5/logs/FAIL__LLVM__pr57187_ll
2022-08-17 11:14:08 +00:00
Max Kazantsev
ebabd6bf18 Return "[SCEV] Use context to strengthen flags of BinOps"
This reverts commit 354fa0b48008eca701a110badd6974bf449df257.

Returning as is. The patch was reverted due to a miscompile, but
this patch is not causing it. This patch made it possible to infer
some nuw flags in code guarded by `false` condition, and then someone
else to managed to propagate the flag from dead code outside.

Returning the patch to be able to reproduce the issue.
2022-08-16 14:12:36 +07:00
Max Kazantsev
354fa0b480 Revert "[SCEV] Use context to strengthen flags of BinOps"
This reverts commit 34ae308c73e4d76dbdab25a6206d3fbc5ebafdf5.

Our internal testing found a miscompile. Not sure if it's caused by
this patch or it revealed something else. Reverting while investigating.
2022-08-15 18:51:59 +07:00
Max Kazantsev
34ae308c73 [SCEV] Use context to strengthen flags of BinOps
Sometimes SCEV cannot infer nuw/nsw from something as simple as
```
  len in [0, MAX_INT]
...
  iv = phi(0, iv.next)
  guard(iv <s len)
  guard(iv <u len)
  iv.next = iv + 1
```
just because flag strenthening only relies on definition and does not use local facts.
This patch adds support for the simplest case: inference of flags of `add(x, constant)`
if we can contextually prove that `x <= max_int - constant`.

In case if it has negative CT impact, we can add an option to switch it off. I woudln't
expect that though.

Differential Revision: https://reviews.llvm.org/D129643
Reviewed By: apilipenko
2022-08-03 14:08:57 +07:00
Nuno Lopes
9df0b254d2 [NFC] Switch a few uses of undef to poison as placeholders for unreachable code 2022-07-23 21:50:11 +01:00
Nikita Popov
af49bed933 [IndVars] Simplify instructions after replacing header phi with preheader value
After replacing a loop phi with the preheader value, it's usually
possible to simplify some of the using instructions, so do that as
part of replaceLoopPHINodesWithPreheaderValues().

Doing this as part of IndVars is valuable, because it may make GEPs
in the loop have constant offsets and allow the following SROA run
to succeed (as demonstrated in the PhaseOrdering test).

Differential Revision: https://reviews.llvm.org/D129293
2022-07-13 10:27:04 +02:00
Nikita Popov
a5ee62a141 [IndVars] Call replaceLoopPHINodesWithPreheaderValues() for already constant exits
Currently we only call replaceLoopPHINodesWithPreheaderValues() if
optimizeLoopExits() replaces the exit with an unconditional exit.
However, it is very common that this already happens as part of
eliminateIVComparison(), in which case we're leaving behind the
dead header phi.

Tweak the early bailout for already-constant exits to also call
replaceLoopPHINodesWithPreheaderValues().

Differential Revision: https://reviews.llvm.org/D129214
2022-07-13 09:43:21 +02:00
Nikita Popov
d1e880acaa [SCEV] Enable verification in LoopPM
Currently, we hardly ever actually run SCEV verification, even in
tests with -verify-scev. This is because the NewPM LPM does not
verify SCEV. The reason for this is that SCEV verification can
actually change the result of subsequent SCEV queries, which means
that you see different transformations depending on whether
verification is enabled or not.

To allow verification in the LPM, this limits verification to
BECounts that have actually been cached. It will not calculate
new BECounts.

BackedgeTakenInfo::getExact() is still not entirely readonly,
it still calls getUMinFromMismatchedTypes(). But I hope that this
is not problematic in the same way. (This could be avoided by
performing the umin in the other SCEV instance, but this would
require duplicating some of the code.)

Differential Revision: https://reviews.llvm.org/D120551
2022-03-07 09:46:20 +01:00
Nikita Popov
aeab6167b0 [SCEV] Only verify BECounts for reachable loops (PR50523)
For unreachable loops, any BECount is legal, and since D98706 SCEV
can make use of this for loops that are unreachable due to constant
branches. To avoid false positives, adjust SCEV verification to only
check BECounts in reachable loops.

Fixes https://github.com/llvm/llvm-project/issues/50523.

Differential Revision: https://reviews.llvm.org/D120651
2022-03-01 11:52:35 +01:00
Nikita Popov
859567725d [IndVars] Don't run full optimization pipeline in test (NFC)
This extracts the IR prior to IndVarSimplify and only runs the
single pass.
2022-02-17 09:28:33 +01:00
Dmitry Makogon
62f86d4f95 Reapply 5ec2386 "Reapply db28934 "[IndVars] Pass TTI to replaceCongruentIVs""
This reverts commit 7cd273c339cfe8427404f881ae280bd9fae6ff78.

Several patches with tests fixes have been applied:
0cada82f0a30e5ae22dce66b58604ab9b47a3897 "[Test] Remove incorrect test in GVN"
97cb13615d6d9df254e3c0f3deef9eaedfe189b6 "[Test] Separate IndVars test into AArch64 and X86 parts"
985cc490f17d28b20392ee214895d947b85120ef "[Test] Remove separated test in IndVars",
and test failures caused by 5ec2386 should be resolved now.
2021-11-10 17:36:14 +07:00
Dmitry Makogon
97cb13615d [Test] Separate IndVars test into AArch64 and X86 parts
The widen-loop-comp.ll in indvars has a target triple with
specified aarch64 architecture. This caused test failures with
db28934 "[IndVars] Pass TTI to replaceCongruentIVs" applied, because
with the patch indvars performed some target-specific
transforms, and for example if a build supported only X86,
then indvars would not have applied those transforms.
However, the checks in the test were generated as for aarch64.
Thus the test failures on such builds.

This patch separates widen-loop-comp.ll into two parts.
The first one is intended to be run only if a build supports aarch64.
This is now in AArch64 directory with a lit config.

The second one was added recently to show db28934 improvements.
This one is now in X86 directory.

This patch should resolve build issues caused by
5ec23863320ca12bfabb6dcff1d0425cb614b7a5.
2021-11-10 16:15:20 +07:00
Douglas Yung
7cd273c339 Revert "Reapply db28934 "[IndVars] Pass TTI to replaceCongruentIVs""
This reverts commit 5ec23863320ca12bfabb6dcff1d0425cb614b7a5.

This change is causing test failures on the PS4 linux build bot: https://lab.llvm.org/buildbot/#/builders/139/builds/12871
2021-11-09 10:28:41 -08:00
Dmitry Makogon
5ec2386332 Reapply db28934 "[IndVars] Pass TTI to replaceCongruentIVs"
This reapplies patch db289340c841990055a164e8eb2a3b5ff25677bf.

The test failures on build with expensive checks caused by the patch happened due
to the fact that we sorted loop Phis in replaceCongruentIVs using llvm::sort,
which shuffles the given container if the expensive checks are enabled,
so equivalent Phis in the sorted vector had different mutual order from run
to run. replaceCongruentIVs tries to replace narrow Phis with truncations
of wide ones. In some test cases there were several Phis with the same
width, so if their order differs from run to run, the narrow Phis would
be replaced with a different Phi, depending on the shuffling result.

The patch ae14fae0ff4304022beda5ab484f84ac0fdda807 fixed this issue by
replacing llvm::sort with llvm::stable_sort.
2021-11-09 17:42:29 +07:00
Dmitry Makogon
8d4eba6c0d Revert "[IndVars] Pass TTI to replaceCongruentIVs"
This reverts commit db289340c841990055a164e8eb2a3b5ff25677bf.

The patch caused 2 crashes with expensive checks enabled.
2021-11-08 19:35:14 +07:00
Dmitry Makogon
db289340c8 [IndVars] Pass TTI to replaceCongruentIVs
In IndVarSimplify after simplifying and extending loop IVs we call 'replaceCongruentIVs'.
This function optionally takes a TTI argument to be able to replace narrow IVs uses
with truncates of the widest one.
For some reason the TTI wasn't passed to the function, so it couldn't perform such
transform.
This patch fixes it.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D113024
2021-11-08 19:20:53 +07:00
Philip Reames
e69f6476a8 Autogen tests for ease of future update 2021-11-05 12:46:07 -07:00
Dmitry Makogon
dd000e67f0 [Test] Regenerate IndVars test's checks
This just regenerates a certain IndVars test's checks.
2021-11-02 22:03:58 +07:00
Simon Pilgrim
c931d35216 [CostModel][X86] Increase i64 mul cost from 1 to 2
Only the most recent cpus support really 1cy 64-bit multiplies, and the X64 cost table represents a realistic worst case. The 1cy value was also discouraging vectorization when most vXi64 PMULDQ expansions aren't actually slower than scalarization.

Noticed while investigating PR51436.
2021-09-23 14:48:21 +01:00
Simon Pilgrim
6de42e104f [IndVarSimplify][X86] Regenerate loop-invariant-conditions.ll test checks 2021-07-07 13:58:28 +01:00
serge-sans-paille
4ab3041acb Revert "[NFC] remove explicit default value for strboolattr attribute in tests"
This reverts commit bda6e5bee04c75b1f1332b4fd1ac4e8ef6c3c247.

See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance
2021-05-24 19:43:40 +02:00
serge-sans-paille
bda6e5bee0 [NFC] remove explicit default value for strboolattr attribute in tests
Since d6de1e1a71406c75a4ea4d5a2fe84289f07ea3a1, no attributes is quivalent to
setting attribute to false.

This is a preliminary commit for https://reviews.llvm.org/D99080
2021-05-24 19:31:04 +02:00
Roman Lebedev
a26f1bf67e
[PassManager] Run additional LICM before LoopRotate
Loop rotation often has to perform code duplication
from header into preheader, which introduces PHI nodes.

>>! In D99204, @thopre wrote:
>
> With loop peeling, it is important that unnecessary PHIs be avoided or
> it will leads to spurious peeling. One source of such PHIs is loop
> rotation which creates PHIs for invariant loads. Those PHIs are
> particularly problematic since loop peeling is now run as part of simple
> loop unrolling before GVN is run, and are thus a source of spurious
> peeling.
>
> Note that while some of the load can be hoisted and eventually
> eliminated by instruction combine, this is not always possible due to
> alignment issue. In particular, the motivating example [1] was a load
> inside a class instance which cannot be hoisted because the `this'
> pointer has an alignment of 1.
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/attachment.cpp

Now, we could enhance LoopRotate to avoid duplicating code when not needed,
but instead hoist loop-invariant code, but isn't that a code duplication? (*sic*)
We have LICM, and in fact we already run it right after LoopRotation.

We could try to move it to before LoopRotation,
that is basically free from compile-time perspective:
https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b915bc955c03693b2cbb58&to=a4bee6d07732b1184c436da489040b912f0dc271&stat=instructions
But, looking at stats, i think it isn't great that we would no longer do LICM after LoopRotation, in particular:
| statistic name                                   | LoopRotate-LICM | LICM-LoopRotate |     Δ |       % | abs(%) |
| asm-printer.EmittedInsts                         | 9015930         | 9015799         |  -131 |   0.00% |  0.00% |
| indvars.NumElimCmp                               | 3536            | 3544            |     8 |   0.23% |  0.23% |
| indvars.NumElimExt                               | 36725           | 36580           |  -145 |  -0.39% |  0.39% |
| indvars.NumElimIV                                | 1197            | 1187            |   -10 |  -0.84% |  0.84% |
| indvars.NumElimIdentity                          | 143             | 136             |    -7 |  -4.90% |  4.90% |
| indvars.NumElimRem                               | 4               | 5               |     1 |  25.00% | 25.00% |
| indvars.NumLFTR                                  | 29842           | 29890           |    48 |   0.16% |  0.16% |
| indvars.NumReplaced                              | 2293            | 2227            |   -66 |  -2.88% |  2.88% |
| indvars.NumSimplifiedSDiv                        | 6               | 8               |     2 |  33.33% | 33.33% |
| indvars.NumWidened                               | 26438           | 26329           |  -109 |  -0.41% |  0.41% |
| instcount.TotalBlocks                            | 1178338         | 1173840         | -4498 |  -0.38% |  0.38% |
| instcount.TotalFuncs                             | 111825          | 111829          |     4 |   0.00% |  0.00% |
| instcount.TotalInsts                             | 9905442         | 9896139         | -9303 |  -0.09% |  0.09% |
| lcssa.NumLCSSA                                   | 425871          | 423961          | -1910 |  -0.45% |  0.45% |
| licm.NumHoisted                                  | 378357          | 378753          |   396 |   0.10% |  0.10% |
| licm.NumMovedCalls                               | 2193            | 2208            |    15 |   0.68% |  0.68% |
| licm.NumMovedLoads                               | 35899           | 31821           | -4078 | -11.36% | 11.36% |
| licm.NumPromoted                                 | 11178           | 11154           |   -24 |  -0.21% |  0.21% |
| licm.NumSunk                                     | 13359           | 13587           |   228 |   1.71% |  1.71% |
| loop-delete.NumDeleted                           | 8547            | 8402            |  -145 |  -1.70% |  1.70% |
| loop-instsimplify.NumSimplified                  | 12876           | 11890           |  -986 |  -7.66% |  7.66% |
| loop-peel.NumPeeled                              | 1008            | 925             |   -83 |  -8.23% |  8.23% |
| loop-rotate.NumNotRotatedDueToHeaderSize         | 368             | 365             |    -3 |  -0.82% |  0.82% |
| loop-rotate.NumRotated                           | 42015           | 42003           |   -12 |  -0.03% |  0.03% |
| loop-simplifycfg.NumLoopBlocksDeleted            | 240             | 242             |     2 |   0.83% |  0.83% |
| loop-simplifycfg.NumLoopExitsDeleted             | 497             | 20              |  -477 | -95.98% | 95.98% |
| loop-simplifycfg.NumTerminatorsFolded            | 618             | 336             |  -282 | -45.63% | 45.63% |
| loop-unroll.NumCompletelyUnrolled                | 11028           | 11032           |     4 |   0.04% |  0.04% |
| loop-unroll.NumUnrolled                          | 12608           | 12529           |   -79 |  -0.63% |  0.63% |
| mem2reg.NumDeadAlloca                            | 10222           | 10221           |    -1 |  -0.01% |  0.01% |
| mem2reg.NumPHIInsert                             | 192110          | 192106          |    -4 |   0.00% |  0.00% |
| mem2reg.NumSingleStore                           | 637650          | 637643          |    -7 |   0.00% |  0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 814             | 812             |    -2 |  -0.25% |  0.25% |
| scalar-evolution.NumTripCountsComputed           | 283108          | 282934          |  -174 |  -0.06% |  0.06% |
| scalar-evolution.NumTripCountsNotComputed        | 106712          | 106718          |     6 |   0.01% |  0.01% |
| simple-loop-unswitch.NumBranches                 | 5178            | 4752            |  -426 |  -8.23% |  8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped    | 914             | 503             |  -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches                 | 20              | 18              |    -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial                  | 183             | 95              |   -88 | -48.09% | 48.09% |

... but that actually regresses LICM (-12% `licm.NumMovedLoads`),
loop-simplifycfg (`NumLoopExitsDeleted`, `NumTerminatorsFolded`),
simple-loop-unswitch (`NumTrivial`).

What if we instead have LICM both before and after LoopRotate?
| statistic name                                | LoopRotate-LICM | LICM-LoopRotate-LICM |     Δ |       % | abs(%) |
| asm-printer.EmittedInsts                      | 9015930         | 9014474              | -1456 |  -0.02% |  0.02% |
| indvars.NumElimCmp                            | 3536            | 3546                 |    10 |   0.28% |  0.28% |
| indvars.NumElimExt                            | 36725           | 36681                |   -44 |  -0.12% |  0.12% |
| indvars.NumElimIV                             | 1197            | 1185                 |   -12 |  -1.00% |  1.00% |
| indvars.NumElimIdentity                       | 143             | 146                  |     3 |   2.10% |  2.10% |
| indvars.NumElimRem                            | 4               | 5                    |     1 |  25.00% | 25.00% |
| indvars.NumLFTR                               | 29842           | 29899                |    57 |   0.19% |  0.19% |
| indvars.NumReplaced                           | 2293            | 2299                 |     6 |   0.26% |  0.26% |
| indvars.NumSimplifiedSDiv                     | 6               | 8                    |     2 |  33.33% | 33.33% |
| indvars.NumWidened                            | 26438           | 26404                |   -34 |  -0.13% |  0.13% |
| instcount.TotalBlocks                         | 1178338         | 1173652              | -4686 |  -0.40% |  0.40% |
| instcount.TotalFuncs                          | 111825          | 111829               |     4 |   0.00% |  0.00% |
| instcount.TotalInsts                          | 9905442         | 9895452              | -9990 |  -0.10% |  0.10% |
| lcssa.NumLCSSA                                | 425871          | 425373               |  -498 |  -0.12% |  0.12% |
| licm.NumHoisted                               | 378357          | 383352               |  4995 |   1.32% |  1.32% |
| licm.NumMovedCalls                            | 2193            | 2204                 |    11 |   0.50% |  0.50% |
| licm.NumMovedLoads                            | 35899           | 35755                |  -144 |  -0.40% |  0.40% |
| licm.NumPromoted                              | 11178           | 11163                |   -15 |  -0.13% |  0.13% |
| licm.NumSunk                                  | 13359           | 14321                |   962 |   7.20% |  7.20% |
| loop-delete.NumDeleted                        | 8547            | 8538                 |    -9 |  -0.11% |  0.11% |
| loop-instsimplify.NumSimplified               | 12876           | 12041                |  -835 |  -6.48% |  6.48% |
| loop-peel.NumPeeled                           | 1008            | 924                  |   -84 |  -8.33% |  8.33% |
| loop-rotate.NumNotRotatedDueToHeaderSize      | 368             | 365                  |    -3 |  -0.82% |  0.82% |
| loop-rotate.NumRotated                        | 42015           | 42005                |   -10 |  -0.02% |  0.02% |
| loop-simplifycfg.NumLoopBlocksDeleted         | 240             | 241                  |     1 |   0.42% |  0.42% |
| loop-simplifycfg.NumTerminatorsFolded         | 618             | 619                  |     1 |   0.16% |  0.16% |
| loop-unroll.NumCompletelyUnrolled             | 11028           | 11029                |     1 |   0.01% |  0.01% |
| loop-unroll.NumUnrolled                       | 12608           | 12525                |   -83 |  -0.66% |  0.66% |
| mem2reg.NumPHIInsert                          | 192110          | 192073               |   -37 |  -0.02% |  0.02% |
| mem2reg.NumSingleStore                        | 637650          | 637652               |     2 |   0.00% |  0.00% |
| scalar-evolution.NumTripCountsComputed        | 283108          | 282998               |  -110 |  -0.04% |  0.04% |
| scalar-evolution.NumTripCountsNotComputed     | 106712          | 106691               |   -21 |  -0.02% |  0.02% |
| simple-loop-unswitch.NumBranches              | 5178            | 5185                 |     7 |   0.14% |  0.14% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914             | 925                  |    11 |   1.20% |  1.20% |
| simple-loop-unswitch.NumTrivial               | 183             | 179                  |    -4 |  -2.19% |  2.19% |
| simple-loop-unswitch.NumBranches              | 5178            | 4752                 |  -426 |  -8.23% |  8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914             | 503                  |  -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches              | 20              | 18                   |    -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial               | 183             | 95                   |   -88 | -48.09% | 48.09% |

I.e. we end up with less instructions, less peeling, more LICM activity,
also note how none of those 4 regressions are here. Namely:

| statistic name                                   | LICM-LoopRotate | LICM-LoopRotate-LICM |     Δ |        % |   abs(%) |
| asm-printer.EmittedInsts                         | 9015799         | 9014474              | -1325 |   -0.01% |    0.01% |
| indvars.NumElimCmp                               | 3544            | 3546                 |     2 |    0.06% |    0.06% |
| indvars.NumElimExt                               | 36580           | 36681                |   101 |    0.28% |    0.28% |
| indvars.NumElimIV                                | 1187            | 1185                 |    -2 |   -0.17% |    0.17% |
| indvars.NumElimIdentity                          | 136             | 146                  |    10 |    7.35% |    7.35% |
| indvars.NumLFTR                                  | 29890           | 29899                |     9 |    0.03% |    0.03% |
| indvars.NumReplaced                              | 2227            | 2299                 |    72 |    3.23% |    3.23% |
| indvars.NumWidened                               | 26329           | 26404                |    75 |    0.28% |    0.28% |
| instcount.TotalBlocks                            | 1173840         | 1173652              |  -188 |   -0.02% |    0.02% |
| instcount.TotalInsts                             | 9896139         | 9895452              |  -687 |   -0.01% |    0.01% |
| lcssa.NumLCSSA                                   | 423961          | 425373               |  1412 |    0.33% |    0.33% |
| licm.NumHoisted                                  | 378753          | 383352               |  4599 |    1.21% |    1.21% |
| licm.NumMovedCalls                               | 2208            | 2204                 |    -4 |   -0.18% |    0.18% |
| licm.NumMovedLoads                               | 31821           | 35755                |  3934 |   12.36% |   12.36% |
| licm.NumPromoted                                 | 11154           | 11163                |     9 |    0.08% |    0.08% |
| licm.NumSunk                                     | 13587           | 14321                |   734 |    5.40% |    5.40% |
| loop-delete.NumDeleted                           | 8402            | 8538                 |   136 |    1.62% |    1.62% |
| loop-instsimplify.NumSimplified                  | 11890           | 12041                |   151 |    1.27% |    1.27% |
| loop-peel.NumPeeled                              | 925             | 924                  |    -1 |   -0.11% |    0.11% |
| loop-rotate.NumRotated                           | 42003           | 42005                |     2 |    0.00% |    0.00% |
| loop-simplifycfg.NumLoopBlocksDeleted            | 242             | 241                  |    -1 |   -0.41% |    0.41% |
| loop-simplifycfg.NumLoopExitsDeleted             | 20              | 497                  |   477 | 2385.00% | 2385.00% |
| loop-simplifycfg.NumTerminatorsFolded            | 336             | 619                  |   283 |   84.23% |   84.23% |
| loop-unroll.NumCompletelyUnrolled                | 11032           | 11029                |    -3 |   -0.03% |    0.03% |
| loop-unroll.NumUnrolled                          | 12529           | 12525                |    -4 |   -0.03% |    0.03% |
| mem2reg.NumDeadAlloca                            | 10221           | 10222                |     1 |    0.01% |    0.01% |
| mem2reg.NumPHIInsert                             | 192106          | 192073               |   -33 |   -0.02% |    0.02% |
| mem2reg.NumSingleStore                           | 637643          | 637652               |     9 |    0.00% |    0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 812             | 814                  |     2 |    0.25% |    0.25% |
| scalar-evolution.NumTripCountsComputed           | 282934          | 282998               |    64 |    0.02% |    0.02% |
| scalar-evolution.NumTripCountsNotComputed        | 106718          | 106691               |   -27 |   -0.03% |    0.03% |
| simple-loop-unswitch.NumBranches                 | 4752            | 5185                 |   433 |    9.11% |    9.11% |
| simple-loop-unswitch.NumCostMultiplierSkipped    | 503             | 925                  |   422 |   83.90% |   83.90% |
| simple-loop-unswitch.NumSwitches                 | 18              | 20                   |     2 |   11.11% |   11.11% |
| simple-loop-unswitch.NumTrivial                  | 95              | 179                  |    84 |   88.42% |   88.42% |

{F15983613} {F15983615} {F15983616}
(this is vanilla llvm testsuite + rawspeed + darktable)

As an example of the code where early LICM only is bad, see:
https://godbolt.org/z/GzEbacs4K

This does have an observable compile-time regression of +~0.5% geomean
https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e5f614c9aefd0fa505f1e&to=5d81826c3411982ca26e46b9d0aff34c80577664&stat=instructions
but i think that's basically nothing, and there's potential that it might
be avoidable in the future by fixing clang to produce alignment information
on function arguments, thus making the second run unneeded.

Differential Revision: https://reviews.llvm.org/D99249
2021-04-02 11:11:42 +03:00