This also allows us to peel loops with a `select`:
```
for (int i = 0; i <= N; ++i);
f3(i == 0 ? a : b); // select instruction
```
into:
```
f3(a); // peel one iteration
for (int i = 1; i <= N; ++i)
f3(b);
```
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D151052
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.
Differential Revision: https://reviews.llvm.org/D149210
FullLoopUnroll was performing runtime unrolling in certain cases when
'#pragma unroll' was specified. Patch to fix this by introducing new parameter
to tryToUnrollLoop() to differentiate between LoopUnrollPass and
FullLoopUnrollPass. Based on the discussion here
(https://discourse.llvm.org/t/loop-unroller-fails-to-unroll-loop/69834)
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D148071
SCEV determines that loops with trip count >=2^32 have a trip multiple
of 1 to guard against huge multiples. This patch stregthens this to
instead find the greatest power of 2 divisor that is less than the
threshold.
Differential Revision: https://reviews.llvm.org/D147868
Because widenable conditions with eventually lower into a constant, such instructions
as `and`, `or` etc. will also be optimized away. Treat them as free.
This is an important thing to have if we want that guards represented as experimental.guard
calls and in their explicit form (branch by `and` with widenable condition) have the same cost
for unroller and other passes like this.
Differential Revision: https://reviews.llvm.org/D146034
Reviewed By: nikic
This intrinsic is not supposed to live through lowering, eventually it should turn
into `true` constant and be optimized away.
Differential Revision: https://reviews.llvm.org/D146027
Reviewed By: skatkov
Address the dominating condition, the urem fold is benefit from the analytics improvements.
Fix https://github.com/llvm/llvm-project/issues/60546
NOTE: delete the calls in simplifyBinaryIntrinsic and foldICmpWithDominatingICmp
is used to reduce compile time.
Reviewed By: nikic, arsenm, erikdesjardins
Differential Revision: https://reviews.llvm.org/D144248
One of these two changes is exposing (or causing) some more miscompiles.
A reproducer is in progress, so reverting until resolved.
This reverts commit 428f36401b1b695fd501ebfdc8773bed8ced8d4e.
This reverts commit 37b8f09a4b61bf9bf9d0b9017d790c8b82be2e17,
and returns commit 1bd0b82e508d049efdb07f4f8a342f35818df341.
The miscompile was in InstCombine, and it has been addressed.
This tries to approach the problem noted by @arsenm:
terrible codegen for `__builtin_fpclassify()`:
https://godbolt.org/z/388zqdE37
Just because the PHI in the common successor happens to have different
incoming values for these two blocks, doesn't mean we have to give up.
It's quite easy to deal with this, we just need to produce a select:
https://alive2.llvm.org/ce/z/000srb
Now, the cost model for this transform is rather overly strict,
so this will basically never fire. We tally all (over all preds)
the selects needed to the NumBonusInsts
Differential Revision: https://reviews.llvm.org/D139275
This tries to approach the problem noted by @arsenm:
terrible codegen for `__builtin_fpclassify()`:
https://godbolt.org/z/388zqdE37
Just because the PHI in the common successor happens to have different
incoming values for these two blocks, doesn't mean we have to give up.
It's quite easy to deal with this, we just need to produce a select:
https://alive2.llvm.org/ce/z/000srb
Now, the cost model for this transform is rather overly strict,
so this will basically never fire. We tally all (over all preds)
the selects needed to the NumBonusInsts
Differential Revision: https://reviews.llvm.org/D139275
Summary:
Expand the capabilities of the code for computing how many peels are
needed to make phis determined. A cast gets the peel count for the
value being casted while a binary op gets the maximum of the operands.
Respond to review comments: remove redundant asserts.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By:mkazantsev (Max Kazantsev),syzaara (Zaara Syeda)
Differential Revision: https://reviews.llvm.org/D138719
Summary:
Refactor loop peeling code by moving code for calculating phi invariance
into a separate class that does the calculation. Redescribe and rework
the algorithm in preparation for adding increased functionality. Add
test case that does not exhibit peeling that will be subsequently supported.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: mkazantsev (Max Kazantsev)
Differential Revision: https://reviews.llvm.org/D138232
This reverts commit bd7949bcd86633bd4203b2ba6f891aea00fce4d1.
Revert this patch since reviwers have different opinions regarding
the approach in post-commit review.
Will open RFC for further discussion.
Differential Revision: https://reviews.llvm.org/D132408
When unrolling, the exit values in LCSSA phis will get updated.
Invalidate cached SCEV values for those phis in case SCEV looked through
a exit phi.
Fixes#58340.
Loop peeling currently requires that a) the latch is exiting
b) a branch and c) other exits are unreachable/deopt. This patch
removes all of these limitations, and adds the necessary branch
weight updating support. It essentially works the same way as
before with latch -> exiting terminator and
loop trip count -> per exit trip count.
It's worth noting that there are still other limitations in
profitability heuristics: This patch enables peeling of loops to
make conditions invariant (which is pretty much always highly
profitable if possible), while peeling to make loads dereferenceable
still checks that non-latch exits are unreachable and PGO-based
peeling has even more conditions. Those checks could be relaxed
later if we consider those cases profitable.
The motivation for this change is that loops using iterator adaptors
in Rust often optimize very badly, and end up with a loop phi of the
form phi(true, false) in the final result. Peeling eliminates that
phi and conditions based on it, which enables a lot of follow-on
simplification.
Differential Revision: https://reviews.llvm.org/D134803
At the moment, LoopAccessAnalysis is a loop analysis for the new pass
manager. The issue with that is that LAI caches SCEV expressions and
modifications in a loop may impact SCEV expressions in other loops, but
we do not have a convenient way to invalidate LAI for other loops
withing a loop pipeline.
To avoid this issue, turn it into a function analysis which returns a
manager object that keeps track of the individual LAI objects per loop.
Fixes#50940.
Fixes#51669.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D134606
SimplifyCFG folds
bool foo() {
if (cond1) return false;
if (cond2) return false;
return true;
}
as
bool foo() {
if (cond1 | cond2) return false
return true;
}
'cond2' is called 'bonus insts' in branch folding since they introduce overhead
since the original CFG could do early exit but the folded CFG always executes
them. SimplifyCFG calculates the costs of 'bonus insts' of a folding a BB into
its predecessor BB which shares the destination. If it is below bonus-inst-threshold,
SimplifyCFG will fold that BB into its predecessor and cond2 will always be executed.
When SimplifyCFG calculates the cost of 'bonus insts', it only consider 'bonus' insts
in the current BB to be considered for folding. This causes issue for unrolled loops
which share destinations, e.g.
bool foo(int *a) {
for (int i = 0; i < 32; i++)
if (a[i] > 0) return false;
return true;
}
After unrolling, it becomes
bool foo(int *a) {
if(a[0]>0) return false
if(a[1]>0) return false;
//...
if(a[31]>0) return false;
return true;
}
SimplifyCFG will merge each BB with its predecessor BB,
and ends up with 32 'bonus insts' which are always executed, which
is much slower than the original CFG.
The root cause is that SimplifyCFG does not consider the
accumulated cost of 'bonus insts' which are folded from
different BB's.
This patch fixes that by introducing a ValueMap to track
costs of 'bonus insts' coming from different BB's into
the same BB, and cuts off if the accumulated cost
exceeds a threshold.
Reviewed by: Artem Belevich, Florian Hahn, Nikita Popov, Matt Arsenault
Differential Revision: https://reviews.llvm.org/D132408
Summary:
The code for generating a name for loops for various reporting scenarios
created a name by serializing the loop into a string. This may result in
a very large name for a loop containing many blocks. Use the getName()
function on the loop instead.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: Whitney (Whitney Tsang), aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D133587
This updates the naming for the LAA printing pass to be in line with
most other analysis printing passes.
The old name has come up as confusing multiple times already, e.g. in
D131924.
The previous implementation translated from names like sifive-7-series
to sifive-7-rv32 or sifive-7-rv64. This also required sifive-7-rv32
and sifive-7-rv64 to be valid CPU names. As those are not real
CPUs it doesn't make sense to accept them in -mcpu.
This patch does away with the translation and adds sifive-7-series
directly to RISCV.td. Removing sifive-7-rv32 and sifive-7-rv64.
sifive-7-series is only allowed in -mtune.
I've also added "rocket" to RISCV.td but have not removed rocket-rv32
or rocket-rv64.
To prevent -mcpu=sifive-7-series or -mcpu=rocket being used with llc,
I've added a Feature32Bit to all rv32 CPUs. And made it an error to
have an rv32 triple without Feature32Bit. sifive-7-series and rocket
do not have Feature32Bit or Feature64Bit set so the user would need
to provide -mattr=+32bit or -mattr=+64bit along with the -mcpu to
avoid the error.
SiFive no longer names their newer products with 3, 5, or 7 series.
Instead we have p200 series, x200 series, p500 series, and p600 series.
Following the previous behavior would require a sifive-p500-rv32 and
sifive-p500-rv64 in order to support -mtune=sifive-p500-series. There
is currently no p500 product, but it could start getting confusing if
there was in the future.
I'm open to hearing alternatives for how to achieve my main goal
of removing sifive-7-rv32/rv64 as a CPU name.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D131708
This reverts commit 354fa0b48008eca701a110badd6974bf449df257.
Returning as is. The patch was reverted due to a miscompile, but
this patch is not causing it. This patch made it possible to infer
some nuw flags in code guarded by `false` condition, and then someone
else to managed to propagate the flag from dead code outside.
Returning the patch to be able to reproduce the issue.
This reverts commit 34ae308c73e4d76dbdab25a6206d3fbc5ebafdf5.
Our internal testing found a miscompile. Not sure if it's caused by
this patch or it revealed something else. Reverting while investigating.