554 Commits

Author SHA1 Message Date
Nikita Popov
b9808e5660 [LoopUnroll] Fold add chains during unrolling
Loop unrolling tends to produce chains of
`%x1 = add %x0, 1; %x2 = add %x1, 1; ...` with one add per unrolled
iteration. This patch simplifies these adds to `%xN = add %x0, N`
directly during unrolling, rather than waiting for InstCombine to do so.

The motivation for this is that having a single add (rather than
an add chain) on the induction variable makes it a simple recurrence,
which we specially recognize in a number of places. This allows
InstCombine to directly perform folds with that knowledge, instead
of first folding the add chains, and then doing other folds in another
InstCombine iteration.

Due to the reduced number of InstCombine iterations, this also
results in a small compile-time improvement.

Differential Revision: https://reviews.llvm.org/D153540
2023-07-05 09:54:28 +02:00
Nikita Popov
7905c48a84 [LoopUnroll] Add test for early add folding (NFC)
Test for D153540, with adds that have different overflow flags.
2023-07-05 09:46:41 +02:00
Nikita Popov
d179421099 [LoopUnroll] Avoid undef indices in test (NFC)
Doesn't really matter for the larger purpose of the test, but
avoid the use of undef indices and instead use the loop induction
variable as index, which is what was likely intended here.
2023-06-22 16:56:08 +02:00
Nikita Popov
4c748821cd [LoopUnroll] Regenerate test checks (NFC) 2023-06-22 13:00:02 +02:00
Yevgeny Rouban
1ebbbf1614 [LoopUnrollRuntime] Allow indirect transition to deopt non-latch exit blocks
Relax condition on runtime trip count unrolling loops with 1 non-latch exit
that leads to a deop block.

There are cases when the deopt blocks are common exits for different loops.
LoopSimplify pass splits such edges to the common deopting blocks to make
sure that all exit nodes of the loop only have predecessors that are inside
of the loop (See simplifyOneLoop()). This breaks the current condition for
unrolling. This patch allows the split transitive blocks that still lead to
the deopting blocks.

Differential Revision: https://reviews.llvm.org/D152639
2023-06-19 11:10:01 +07:00
Nikita Popov
79115aebb7 [LoopUnroll] Add test for SCEV invalidation issue (NFC)
Test for the issue reported at https://reviews.llvm.org/D149331#4387931.
2023-06-05 17:28:32 +02:00
Joshua Cao
849d01bf3d [LoopUnroll] Peel iterations based on select conditions
This also allows us to peel loops with a `select`:
```
for (int i = 0; i <= N; ++i);
  f3(i == 0 ? a : b); // select instruction
```
into:
```
f3(a); // peel one iteration
for (int i = 1; i <= N; ++i)
  f3(b);
```

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D151052
2023-05-24 00:57:57 -07:00
Joshua Cao
b76f0a0b15 [LoopUnroll] Add tests for peeling iterations based on select, and, or
conditions
2023-05-24 00:57:57 -07:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
ManuelJBrito
8b56da5e9f [IR] Change shufflevector undef mask to poison
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.

Differential Revision: https://reviews.llvm.org/D149210
2023-04-27 14:41:10 +01:00
Yashwant Singh
aea2a14736 [LoopUnroll] Prevent LoopFullUnrollPass to perform partial/runtime unrolling
FullLoopUnroll was performing runtime unrolling in certain cases when
'#pragma unroll' was specified. Patch to fix this by introducing new parameter
to tryToUnrollLoop() to differentiate between LoopUnrollPass and
FullLoopUnrollPass. Based on the discussion here
(https://discourse.llvm.org/t/loop-unroller-fails-to-unroll-loop/69834)

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D148071
2023-04-13 10:21:24 +05:30
Joshua Cao
898a9ca5e9 [SCEV] Strengthen huge constant trip multiples.
SCEV determines that loops with trip count >=2^32 have a trip multiple
of 1 to guard against huge multiples. This patch stregthens this to
instead find the greatest power of 2 divisor that is less than the
threshold.

Differential Revision: https://reviews.llvm.org/D147868
2023-04-10 20:00:46 -07:00
Joshua Cao
60470e163a [LoopUnroll] Add script test checks for LoopUnroll/X86/mmx.ll 2023-04-10 19:59:01 -07:00
Max Kazantsev
d12af65d46 [TTI] Treat AND/OR with widenable conditions as free of cost
Because widenable conditions with eventually lower into a constant, such instructions
as `and`, `or` etc. will also be optimized away. Treat them as free.

This is an important thing to have if we want that guards represented as experimental.guard
calls and in their explicit form (branch by `and` with widenable condition) have the same cost
for unroller and other passes like this.

Differential Revision: https://reviews.llvm.org/D146034
Reviewed By: nikic
2023-03-14 20:55:17 +07:00
Max Kazantsev
b0ea210b35 [TTI] Evaluate cost of experimental_widenable_condition as zero
This intrinsic is not supposed to live through lowering, eventually it should turn
into `true` constant and be optimized away.

Differential Revision: https://reviews.llvm.org/D146027
Reviewed By: skatkov
2023-03-14 17:11:07 +07:00
Max Kazantsev
77308dd400 [Test] Add missing REQUIRES: asserts in test 2023-03-14 16:28:23 +07:00
Max Kazantsev
a7bbeba74a [Test] Add test showing difference in cost models for guards 2023-03-14 15:50:45 +07:00
Florian Hahn
7019624ee1
[SCEV] Strengthen nowrap flags via ranges for ARs on construction.
At the moment, proveNoWrapViaConstantRanges is only used when creating
SCEV[Zero,Sign]ExtendExprs. We can get significant improvements by
strengthening flags after creating the AddRec.

I'll also share a follow-up patch that removes the code to strengthen
flags when creating SCEV[Zero,Sign]ExtendExprs. Modifying AddRecs while
creating those can lead to surprising changes.

Compile-time looks neutral:
https://llvm-compile-time-tracker.com/compare.php?from=94676cf8a13c511a9acfc24ed53c98964a87bde3&to=aced434e8b103109104882776824c4136c90030d&stat=instructions:u

Reviewed By: mkazantsev, nikic

Differential Revision: https://reviews.llvm.org/D144050
2023-03-07 17:10:34 +01:00
Zhongyunde
15d5c59280 [InstCombine] Improvement the analytics through the dominating condition
Address the dominating condition, the urem fold is benefit from the analytics improvements.
Fix https://github.com/llvm/llvm-project/issues/60546

NOTE: delete the calls in simplifyBinaryIntrinsic and foldICmpWithDominatingICmp
is used to reduce compile time.

Reviewed By: nikic, arsenm, erikdesjardins
Differential Revision: https://reviews.llvm.org/D144248
2023-03-01 17:03:34 +08:00
Mircea Trofin
8cf1524cbc [loop unroll] Fix branch-weights for unrolled loop.
The branch weights of the unrolled loop need to be reduced by the
unroll factor.

Differential Revision: https://reviews.llvm.org/D143948
2023-02-14 12:00:53 -08:00
Florian Hahn
f92b35392e
[LoopUnroll] Add test case exposing crash with d0907ce7ed9f. 2023-01-20 16:08:25 +00:00
Florian Hahn
080a07199e
[LoopUnroll] Add additional DT verification test for D141487. 2023-01-11 17:26:13 +00:00
Nikita Popov
b7b0ce6e76 [LoopUnroll] Convert test to opaque pointers (NFC) 2023-01-06 11:48:03 +01:00
Nikita Popov
9c7afbacd8 [LoopUnroll] Name instructions in test (NFC) 2023-01-06 11:48:03 +01:00
Nikita Popov
ef992b6079 [LoopUnroll] Convert some tests to opaque pointers (NFC) 2022-12-23 16:35:26 +01:00
Roman Lebedev
3a8e009f97
Revert "Reland "[SimplifyCFG] FoldBranchToCommonDest(): deal with mismatched IV's in PHI's in common successor block""
One of these two changes is exposing (or causing) some more miscompiles.
A reproducer is in progress, so reverting until resolved.

This reverts commit 428f36401b1b695fd501ebfdc8773bed8ced8d4e.
2022-12-20 18:36:42 +03:00
Roman Lebedev
428f36401b
Reland "[SimplifyCFG] FoldBranchToCommonDest(): deal with mismatched IV's in PHI's in common successor block"
This reverts commit 37b8f09a4b61bf9bf9d0b9017d790c8b82be2e17,
and returns commit 1bd0b82e508d049efdb07f4f8a342f35818df341.
The miscompile was in InstCombine, and it has been addressed.

This tries to approach the problem noted by @arsenm:
terrible codegen for `__builtin_fpclassify()`:
https://godbolt.org/z/388zqdE37

Just because the PHI in the common successor happens to have different
incoming values for these two blocks, doesn't mean we have to give up.
It's quite easy to deal with this, we just need to produce a select:
https://alive2.llvm.org/ce/z/000srb

Now, the cost model for this transform is rather overly strict,
so this will basically never fire. We tally all (over all preds)
the selects needed to the NumBonusInsts

Differential Revision: https://reviews.llvm.org/D139275
2022-12-17 05:18:54 +03:00
Alexander Kornienko
37b8f09a4b Revert "[SimplifyCFG] FoldBranchToCommonDest(): deal with mismatched IV's in PHI's in common successor block"
This reverts commit 1bd0b82e508d049efdb07f4f8a342f35818df341, since it leads to
miscompiles. See https://reviews.llvm.org/D139275#3993229 and
https://reviews.llvm.org/D139275#4001580.
2022-12-16 17:23:35 +01:00
Roman Lebedev
da80639ee2
[NFC][IndVar] Autogenerate checklines in one test 2022-12-14 17:39:10 +03:00
Roman Lebedev
a64b2e9e3e
[NFC][SCEV][LoopUnroll] Add tests where treating or as add raises expansion cost
From https://reviews.llvm.org/rG46db90cc71d1#1154128
2022-12-12 20:41:56 +03:00
Roman Lebedev
1bd0b82e50
[SimplifyCFG] FoldBranchToCommonDest(): deal with mismatched IV's in PHI's in common successor block
This tries to approach the problem noted by @arsenm:
terrible codegen for `__builtin_fpclassify()`:
https://godbolt.org/z/388zqdE37

Just because the PHI in the common successor happens to have different
incoming values for these two blocks, doesn't mean we have to give up.
It's quite easy to deal with this, we just need to produce a select:
https://alive2.llvm.org/ce/z/000srb

Now, the cost model for this transform is rather overly strict,
so this will basically never fire. We tally all (over all preds)
the selects needed to the NumBonusInsts

Differential Revision: https://reviews.llvm.org/D139275
2022-12-12 18:20:03 +03:00
Bjorn Pettersson
3528e63d89 [test] Remove duplicate RUN lines in Transform tests 2022-12-08 11:47:16 +01:00
Roman Lebedev
86faf2cd88
[NFC] Port all LoopUnroll tests to -passes= syntax 2022-12-08 02:38:47 +03:00
Roman Lebedev
5103ef64fe
[NFC] Port all (but one) LoopUnroll tests to -passes= syntax 2022-12-07 20:15:43 +03:00
Jamie Schmeiser
2b6683fd5f Expand loop peeling phi computation to handle binary ops and casts
Summary:
Expand the capabilities of the code for computing how many peels are
needed to make phis determined.  A cast gets the peel count for the
value being casted while a binary op gets the maximum of the operands.

Respond to review comments: remove redundant asserts.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By:mkazantsev (Max Kazantsev),syzaara (Zaara Syeda)
Differential Revision: https://reviews.llvm.org/D138719
2022-12-05 12:10:53 -05:00
Roman Lebedev
b79921a4a8
[NFC] Re-autogenerate checklines in a few tests being affected 2022-12-04 20:58:55 +03:00
Jamie Schmeiser
be1ff1fe58 [NFC] Refactor loop peeling code for calculating phi invariance.
Summary:
Refactor loop peeling code by moving code for calculating phi invariance
into a separate class that does the calculation.  Redescribe and rework
the algorithm in preparation for adding increased functionality.  Add
test case that does not exhibit peeling that will be subsequently supported.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: mkazantsev (Max Kazantsev)
Differential Revision: https://reviews.llvm.org/D138232
2022-11-25 09:07:14 -05:00
Alina Sbirlea
d1b19da854 [LoopPeeling] Add flag to disable support for peeling loops with non-latch exits
Add a flag to allow disabling the changes in
https://reviews.llvm.org/D134803.

Differential Revision: https://reviews.llvm.org/D136643
2022-10-25 12:19:14 -07:00
Yaxun (Sam) Liu
9d5adc7e49 Revert "reland e5581df60a35 [SimplifyCFG] accumulate bonus insts cost"
This reverts commit bd7949bcd86633bd4203b2ba6f891aea00fce4d1.

Revert this patch since reviwers have different opinions regarding
the approach in post-commit review.

Will open RFC for further discussion.

Differential Revision: https://reviews.llvm.org/D132408
2022-10-25 12:15:39 -04:00
Yaxun (Sam) Liu
bd7949bcd8 reland e5581df60a35 [SimplifyCFG] accumulate bonus insts cost
Fixed compile time increase due to always constructing LocalCostTracker.
Now only construct LocalCostTracker when needed.
2022-10-24 15:43:53 -04:00
Florian Hahn
e302fa89aa
[LoopUnroll] Forget exit values when making changes.
When unrolling, the exit values in LCSSA phis will get updated.
Invalidate cached SCEV values for those phis in case SCEV looked through
a exit phi.

Fixes #58340.
2022-10-18 15:12:24 +01:00
Florian Hahn
b0ded70ebf
[LoopUnroll] Add test for mis-compile due to missing SCEV invalidation.
Test for #58340.
2022-10-18 14:56:44 +01:00
Arthur Eubanks
f3a928e233 [opt] Don't translate legacy -analysis flag to require<analysis>
Tests relying on this should explicitly use -passes='require<analysis>,foo'.
2022-10-07 14:54:34 -07:00
Florian Hahn
ec86e9a99b
[LoopUnroll] Add test for crash exposed by 9e931439. 2022-10-07 20:02:58 +01:00
Nikita Popov
b43a4d0850 [LoopPeeling] Support peeling loops with non-latch exits
Loop peeling currently requires that a) the latch is exiting
b) a branch and c) other exits are unreachable/deopt. This patch
removes all of these limitations, and adds the necessary branch
weight updating support. It essentially works the same way as
before with latch -> exiting terminator and
loop trip count -> per exit trip count.

It's worth noting that there are still other limitations in
profitability heuristics: This patch enables peeling of loops to
make conditions invariant (which is pretty much always highly
profitable if possible), while peeling to make loads dereferenceable
still checks that non-latch exits are unreachable and PGO-based
peeling has even more conditions. Those checks could be relaxed
later if we consider those cases profitable.

The motivation for this change is that loops using iterator adaptors
in Rust often optimize very badly, and end up with a loop phi of the
form phi(true, false) in the final result. Peeling eliminates that
phi and conditions based on it, which enables a lot of follow-on
simplification.

Differential Revision: https://reviews.llvm.org/D134803
2022-10-07 12:35:52 +02:00
Florian Hahn
7c0ff64b0f
[LAA] Change to function analysis for new PM.
At the moment, LoopAccessAnalysis is a loop analysis for the new pass
manager. The issue with that is that LAI caches SCEV expressions and
modifications in a loop may impact SCEV expressions in other loops, but
we do not have a convenient way to invalidate LAI for other loops
withing a loop pipeline.

To avoid this issue, turn it into a function analysis which returns a
manager object that keeps track of the individual LAI objects per loop.

Fixes #50940.

Fixes #51669.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D134606
2022-10-01 15:44:27 +01:00
Florian Hahn
27330882a1
[LoopUnroll] Add cache verification failure test case.
Test case for D134612.
2022-09-26 14:25:37 +01:00
Simon Pilgrim
09cb9fdef9 [InstCombine] Fold ult(add(x,-1),c) -> ule(x,c) iff x != 0 (PR57635)
Alive2: https://alive2.llvm.org/ce/z/sZ6wwS

As detailed on Issue #57635 and #37628 - for unsigned comparisons, we can compare prior to a decrement iff the value is known never to be zero.

Differential Revision: https://reviews.llvm.org/D134172
2022-09-20 16:44:41 +01:00
Nikita Popov
dd61726d5b Revert "[SimplifyCFG] accumulate bonus insts cost"
This reverts commit e5581df60a35fffb0c69589777e4e126c849405f.

This causes major compile-time regressions, about 2-3% end-to-end
on CTMark.
2022-09-19 14:46:43 +02:00
Yaxun (Sam) Liu
e5581df60a [SimplifyCFG] accumulate bonus insts cost
SimplifyCFG folds

bool foo() {
  if (cond1) return false;
  if (cond2) return false;
  return true;
}

as

bool foo() {
  if (cond1 | cond2) return false
  return true;
}

'cond2' is called 'bonus insts' in branch folding since they introduce overhead
since the original CFG could do early exit but the folded CFG always executes
them. SimplifyCFG calculates the costs of 'bonus insts' of a folding a BB into
its predecessor BB which shares the destination. If it is below bonus-inst-threshold,
SimplifyCFG will fold that BB into its predecessor and cond2 will always be executed.

When SimplifyCFG calculates the cost of 'bonus insts', it only consider 'bonus' insts
in the current BB to be considered for folding. This causes issue for unrolled loops
which share destinations, e.g.

bool foo(int *a) {
  for (int i = 0; i < 32; i++)
    if (a[i] > 0) return false;
  return true;
}

After unrolling, it becomes

bool foo(int *a) {
  if(a[0]>0) return false
  if(a[1]>0) return false;
  //...
  if(a[31]>0) return false;
  return true;
}

SimplifyCFG will merge each BB with its predecessor BB,
and ends up with 32 'bonus insts' which are always executed, which
is much slower than the original CFG.

The root cause is that SimplifyCFG does not consider the
accumulated cost of 'bonus insts' which are folded from
different BB's.

This patch fixes that by introducing a ValueMap to track
costs of 'bonus insts' coming from different BB's into
the same BB, and cuts off if the accumulated cost
exceeds a threshold.

Reviewed by: Artem Belevich, Florian Hahn, Nikita Popov, Matt Arsenault

Differential Revision: https://reviews.llvm.org/D132408
2022-09-18 20:21:14 -04:00