27 Commits

Author SHA1 Message Date
Yingwei Zheng
6681650025
[InstCombine] Revert the signed icmp -> unsigned icmp canonicalization when folding icmp Pred min|max(X, Y), Z (#76685)
This patch tries to flip the signedness of predicates when folding an
unsigned icmp with a signed min/max. It will enable more optimizations
as we canonicalizes a signed icmp into an unsigned icmp when both
operands are known to have the same sign.
Fixes #76672.

Compile-time impact:
http://llvm-compile-time-tracker.com/compare.php?from=949ec83eaf6fa6dbffb94c2ea9c0a4d5efdbd239&to=2deca1aea8a4e13609bab72c522a97d424f0fc2d&stat=instructions:u


|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|-0.00%|+0.01%|+0.05%|-0.12%|-0.01%|-0.03%|-0.00%|

NOTE: We can flip the signedness of predicate if both operands are
negative. But I don't see the benefit of handling these cases.
2024-01-05 14:39:16 +08:00
David Sherwood
49b0e6dcc2
[LoopVectorize] Enable hoisting of runtime checks by default (#71538)
With commit https://reviews.llvm.org/D152366 I introduced functionality
that permitted the hoisting of runtime memory checks from a vectorised
inner loop to the preheader of the next outer-most loop. This is useful
for benchmarks like SPEC2017's x264 where the inner loop is vectorised
and only has a small trip count. In such cases the runtime memory checks
become expensive and since the checks never fail in the case of x264 it
makes sense to do this. However, this behaviour was controlled by the
flag -hoist-runtime-checks which was off by default.

This patch enables this flag by default for all targets, since I believe
this is a generally beneficial thing to do. I have tested this with
SPEC2017 and I see 2.3% and 2.6% improvements with x264 on neoverse-v1
and neoverse-n1, respectively. Similarly, I saw slight improvements in
the overall geomean on both machines. The only other notable changes
were a 1% drop in the roms benchmark, which was compensated for by a 1%
improvement in fotonik3d.
2023-12-18 09:41:54 +00:00
Dmitriy Smirnov
e13bed4c5f [PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP
This patch tries to canonicalise add + gep to gep + gep.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D155688
2023-10-06 12:29:06 +01:00
Nikita Popov
745cfa3449 [InstCombine] Compute known bits for multi-use add/sub
We were failing to set the known bits for add/sub in the multi-use
case, resulting in odd behavioral differences depending on the
number of uses. Noticed while adding a consistency assertion.

The test changes are essentially a revert to the state before
d6498ab. These changes are not really desirable, but if we don't
want them, that needs to be handled as part of the heuristic for
demanded constant shrinking, not by artifically suppressing the
known bits in one specific case.
2023-05-17 17:50:00 +02:00
Florian Hahn
68469a80cb
[LV] Disable runtime unrolling for vectorized loops.
This patch adds metadata to disable runtime unrolling to the vectorized
loop. If runtime unrolling/interleaving is considered profitable, LV
will interleave the loop directly. There should be no need to perform
runtime unrolling at a later stage.

Note that we already add metadata to disable runtime unrolling to the
scalar loop after vectorization.

The additional unrolling unnecessarily increases code size and compile
time. In addition to that we have several bug reports of unncessary
runtime unrolling for vectorized loops, e.g. PR40961

Compile-time improvements:

  NewPM-O3: -1.04%
  NewPM-ReleaseThinLTO: -0.59%
  NewPM-ReleaseLTO-g: -0.97%

https://llvm-compile-time-tracker.com/compare.php?from=ce1be13a868d0f8afa367975558c1a6175cce33a&to=78bc2e67f22e9e10e61cdb6cdac4bb857d95eb1b&stat=instructions:u

Fixes #40306.

Reviewed By: lebedev.ri, nikic

Differential Revision: https://reviews.llvm.org/D115261
2023-01-06 10:56:17 +00:00
Nikita Popov
88419a30a0 [LICM] Allow load-only scalar promotion in the presence of aliasing loads
During scalar promotion, if there are additional potentially-aliasing
loads outside the promoted set, we can still perform a load-only
promotion. As the stores are retained, any potentially-aliasing
loads will still read the correct value.

This increases the number of load promotions in llvm-test-suite by
a factor of two:

                                |  Old |  New
    licm.NumPromotionCandidates | 4448 | 6038
    licm.NumLoadPromoted        |  479 | 1069
    licm.NumLoadStorePromoted   | 1459 | 1459

Unfortunately, this does have some impact on compile-time:
http://llvm-compile-time-tracker.com/compare.php?from=57f7f0d6cf0706a88e1ecb74f3d3e8891cceabfa&to=72b811738148aab399966a0435f13b695da1c1c8&stat=instructions
In part this is because we now have less early bailouts from
promotion, but also due to second order effects (e.g. for one case
I looked at we spend more time in SLP now).

Differential Revision: https://reviews.llvm.org/D133192
2022-12-20 10:02:46 +01:00
Nikita Popov
5b40015063 [LoopVectorize] Convert some tests to opaque pointers (NFC)
For these tests update_test_checks.py had to be rerun.
2022-12-14 15:27:31 +01:00
Roman Lebedev
be51fa4580
[NFC] Port all runlines for LoopVectorize pass tests to -passes syntax 2022-12-05 22:17:30 +03:00
Arthur Eubanks
d3d8465446 [opt] Stop treating alias analysis specially when translating legacy opt syntax
I've attempted to keep AA tests as close to their original intent as possible.
2022-10-07 11:50:43 -07:00
Philip Reames
e6ad9ef4e7 [instcombine] Canonicalize constant index type to i64 for extractelement/insertelement
The basic idea to this is that a) having a single canonical type makes CSE easier, and b) many of our transforms are inconsistent about which types we end up with based on visit order.

I'm restricting this to constants as for non-constants, we'd have to decide whether the simplicity was worth extra instructions. For constants, there are no extra instructions.

We chose the canonical type as i64 arbitrarily.  We might consider changing this to something else in the future if we have cause.

Differential Revision: https://reviews.llvm.org/D115387
2021-12-13 16:56:22 -08:00
Philip Reames
1a18de3d0a Autogen a bunch of instcombine and vectorizer tests
Done in advance of D115387.  These are all the ones which my local script could handle, there's a couple more which need manual updates.
2021-12-13 10:41:38 -08:00
Mindong Chen
f3814ed3e9 [LV] Re-generate check lines of some fragile tests (NFC)
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D105438
2021-07-19 19:38:24 +08:00
Florian Hahn
23c2f2e6b2
[LV] Mark increment of main vector loop induction variable as NUW.
This patch marks the induction increment of the main induction variable
of the vector loop as NUW when not folding the tail.

If the tail is not folded, we know that End - Start >= Step (either
statically or through the minimum iteration checks). We also know that both
Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV +
%Step == %End. Hence we must exit the loop before %IV + %Step unsigned
overflows and we can mark the induction increment as NUW.

This should make SCEV return more precise bounds for the created vector
loops, used by later optimizations, like late unrolling.

At the moment quite a few tests still need to be updated, but before
doing so I'd like to get initial feedback to make sure I am not missing
anything.

Note that this could probably be further improved by using information
from the original IV.

Attempt of modeling of the assumption in Alive2:
https://alive2.llvm.org/ce/z/H_DL_g

Part of a set of fixes required for PR50412.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D103255
2021-06-07 10:47:52 +01:00
Roman Lebedev
b46c085d2b
[NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions
These intrinsics, not the icmp+select are the canonical form nowadays,
so we might as well directly emit them.

This should not cause any regressions, but if it does,
then then they would needed to be fixed regardless.

Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`,
but that is a pessimization, not a correctness issue.

Additionally, the non-intrinsic form has issues with undef,
see https://reviews.llvm.org/D88287#2587863
2021-03-06 21:52:46 +03:00
Sanjay Patel
79b1b4a581 [Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics
The vector reduction intrinsics started life as experimental ops, so backend support
was lacking. As part of promoting them to 1st-class intrinsics, however, codegen
support was added/improved:
D58015
D90247

So I think it is safe to now remove this complication from IR.

Note that we still have an IR-level codegen expansion pass for these as discussed
in D95690. Removing that is another step in simplifying the logic. Also note that
x86 was already unconditionally forming reductions in IR, so there should be no
difference for x86.

I spot checked a couple of the tests here by running them through opt+llc and did
not see any asm diffs.

If we do find functional differences for other targets, it should be possible
to (at least temporarily) restore the shuffle IR with the ExpandReductions IR
pass.

Differential Revision: https://reviews.llvm.org/D96552
2021-02-12 08:13:50 -05:00
Juneyoung Lee
9b29610228 Use unary CreateShuffleVector if possible
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.

Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)

The order is swapped, but in terms of correctness it is still fine.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D93923
2020-12-30 22:36:08 +09:00
Juneyoung Lee
278aa65cc4 [IR] Let IRBuilder's CreateVectorSplat/CreateShuffleVector use poison as placeholder
This patch updates IRBuilder to create insertelement/shufflevector using poison as a placeholder.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93793
2020-12-30 04:21:04 +09:00
Eric Christopher
cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher
a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Martin Storsjo
bfd1d27585 Revert "[LICM] Enable control flow hoisting by default" and "[LICM] Reapply r347190 "Make LICM able to hoist phis" with fix"
This reverts commits r347776 and r347778.

The first one, r347776, caused significant compile time regressions
for certain input files, see PR39836 for details.

llvm-svn: 347867
2018-11-29 14:39:39 +00:00
John Brawn
4557ffeb63 [LICM] Enable control flow hoisting by default
Differential Revision: https://reviews.llvm.org/D54949

llvm-svn: 347778
2018-11-28 17:23:03 +00:00
Benjamin Kramer
2cad359c91 Revert "[LICM] Make LICM able to hoist phis"
This reverts commit r347190.

llvm-svn: 347225
2018-11-19 16:51:57 +00:00
Anna Thomas
5e9215f02b [LV] Avoid vectorizing unsafe dependencies in uniform address
Summary:
Currently, when vectorizing stores to uniform addresses, the only
instance we prevent vectorization is if there are multiple stores to the
same uniform address causing an unsafe dependency.
This patch teaches LAA to avoid vectorizing loops that have an unsafe
cross-iteration dependency between a load and a store to the same uniform address.

Fixes PR39653.

Reviewers: Ayal, efriedma

Subscribers: rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D54538

llvm-svn: 347220
2018-11-19 15:39:59 +00:00
John Brawn
12c046fba0 [LICM] Make LICM able to hoist phis
The general approach taken is to make note of loop invariant branches, then when
we see something conditional on that branch, such as a phi, we create a copy of
the branch and (empty versions of) its successors and hoist using that.

This has no impact by itself that I've been able to see, as LICM typically
doesn't see such phis as they will have been converted into selects by the time
LICM is run, but once we start doing phi-to-select conversion later it will be
important.

Differential Revision: https://reviews.llvm.org/D52827

llvm-svn: 347190
2018-11-19 11:31:24 +00:00
Sanjay Patel
747feb28e4 [InstCombine] use 'match' to handle vectors and simplify code
This is another step towards completely removing the fake 
binop queries for not/neg/fneg.

llvm-svn: 345036
2018-10-23 15:05:12 +00:00
Anna Thomas
6f732bfb79 [LV] Teach vectorizer about variant value store into uniform address
Summary:
Teach vectorizer about vectorizing variant value stores to uniform
address. Similar to rL343028, we do not allow vectorization if we have
multiple stores to the same uniform address.

Cost model already has the change for considering the extract
instruction cost for a variant value store. See added test cases for how
vectorization is done.
The patch also contains changes to the ORE messages.

Reviewers: Ayal, mkuper, anemet, hsaito

Subscribers: rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D52656

llvm-svn: 344613
2018-10-16 15:46:26 +00:00
Anna Thomas
b1e3d45318 [LV][LAA] Vectorize loop invariant values stored into loop invariant address
Summary:
We are overly conservative in loop vectorizer with respect to stores to loop
invariant addresses.
More details in https://bugs.llvm.org/show_bug.cgi?id=38546
This is the first part of the fix where we start with vectorizing loop invariant
values to loop invariant addresses.

This also includes changes to ORE for stores to invariant address.

Reviewers: anemet, Ayal, mkuper, mssimpso

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50665

llvm-svn: 343028
2018-09-25 20:57:20 +00:00