141 Commits

Author SHA1 Message Date
Muhammad Omair Javaid
a96638e50e Revert "[NFCI] Regenerate PhaseOrdering test checks"
This reverts commit e91fe08999d5f5d7e7777837c529bac692d06c1b.

Breaks following buildbots: https://lab.llvm.org/buildbot/#/builders/171
2022-04-04 15:30:57 +05:00
Dávid Bolvanský
e91fe08999 [NFCI] Regenerate PhaseOrdering test checks 2022-04-04 00:28:57 +02:00
Simon Pilgrim
5dde9c1286 [CostModel][X86] Reduce cost of extracting bool vector elements
For constant indices, these are now just a MOVMSK+TEST/BT
2022-03-18 19:02:47 +00:00
Nikita Popov
a266af7211 [InstCombine] Canonicalize SPF to min/max intrinsics
Now that integer min/max intrinsics have good support in both
InstCombine and other passes, start canonicalizing SPF min/max
to intrinsic min/max.

Once this sticks, we can stop matching SPF min/max in various
places, and can remove hacks we have for preventing infinite loops
and breaking of SPF canonicalization.

Differential Revision: https://reviews.llvm.org/D98152
2022-02-24 09:01:20 +01:00
William S. Moses
d9da6a535f [LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate
LICM will speculatively hoist code outside of loops. This requires removing information, like alias analysis (https://github.com/llvm/llvm-project/issues/53794), range information (https://bugs.llvm.org/show_bug.cgi?id=50550), among others. Prior to https://reviews.llvm.org/D99249 , LICM would only be run after LoopRotate. Running Loop Rotate prior to LICM prevents a instruction hoist from being speculative, if it was conditionally executed by the iteration (as is commonly emitted by clang and other frontends). Adding the additional LICM pass first, however, forces all of these instructions to be considered speculative, even if they are not speculative after LoopRotate. This destroys information, resulting in performance losses for discarding this additional information.

This PR modifies LICM to accept a ``speculative'' parameter which allows LICM to be set to perform information-loss speculative hoists or not. Phase ordering is then modified to not perform the information-losing speculative hoists until after loop rotate is performed, preserving this additional information.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D119965
2022-02-17 20:13:07 -05:00
Roman Lebedev
07cf95942f
[NFC][PhaseOrdering] Improve test coverage for D119975 2022-02-17 14:58:22 +03:00
Roman Lebedev
a5b9987aab
[NFC][PhaseOrdering] spurious-peeling.ll: also test -O1/-O2 results 2022-02-17 00:18:53 +03:00
William S. Moses
73ee82871e
[NFC][PhaseOrdering] Precommit tests from D119965 2022-02-17 00:18:53 +03:00
Roman Lebedev
ee4ba9f3a1
Revert "[SimplifyCFG] Start redesigning FoldTwoEntryPHINode()."
Unfortunately, it seems we really do need to take the long route;
start from the "merge" block, find (all the) "dispatch" blocks,
and deal with each "dispatch" block separately, instead of simply
starting from each "dispatch" block like it would logically make sense,
otherwise we run into a number of other missing folds around
`switch` formation, missing sinking/hoisting and phase ordering.

This reverts commit 85628ce75b3084dc0f185a320152baf85b59aba7.
This reverts commit c5fff9095342a792bf4b9a077fe3c3a83c4e566c.
This reverts commit 34a98e1046e3aa55e5f26ab20a15e96b4034d25a.
This reverts commit 1e353f092288309d74d380367aa50bbd383780ed.
2022-02-03 12:32:50 +03:00
Alexey Bataev
8a1dfbc4d8 Revert "[SLP]Alternate vectorization for cmp instructions."
This reverts commit 842a2360a84692f2e4c37cc3e652640e6627d004 to fix the
bugs reported by users in https://reviews.llvm.org/D115955#3291538.
2022-02-02 12:06:36 -08:00
Alexey Bataev
842a2360a8 [SLP]Alternate vectorization for cmp instructions.
Added support for alternate ops vectorization of the cmp instructions.
It allows to vectorize either cmp instructions with same/swapped
predicate but different (swapped) operands kinds or cmp instructions
with different predicates and compatible operands kinds.

Differential Revision: https://reviews.llvm.org/D115955
2022-02-02 10:32:52 -08:00
Roman Lebedev
1e353f0922
[SimplifyCFG] Start redesigning FoldTwoEntryPHINode().
The current `FoldTwoEntryPHINode()` is not quite designed correctly.
It starts from the merge point, and then tries to detect
the 'divergence' point.

Because of that, it is limited to the simple two-predecessor case,
where the PHI completely goes away. but that is rather pessimistic,
and it doesn't make much sense from the costmodel side of things.

For example if there is some other unrelated predecessor of
the merge point,  we could split the merge point so that
the then/else blocks first branch to an empty block
and then to the merge point, and then we'd be able to speculate
the then/else code.

But if we'd instead simply start at the divergence point,
and look for the merge point, then we'll just natively support this case.

There's also the fact that `SpeculativelyExecuteBB()` already does
just that, but only if there is a single block to speculate,
and with a much more restrictive cost model.
But that also means we have code duplication.

Now, sadly, while this is as much NFCI as possible,
there is just no way to cleanly migrate to
the proper implementation. The results *are* going to be different
somewhat because of various phase ordering effects and SimplifyCFG
block iteration strategy.
2022-02-02 17:53:56 +03:00
Benjamin Kramer
0c3d22a592 Revert "[SLP]Alternate vectorization for cmp instructions."
This reverts commit 83620bd2ad867f706c699d0f2b8be10e43d9f3d7.

It's causing miscompilations, see review comments at
https://reviews.llvm.org/D115955
2022-02-02 13:08:51 +01:00
Alexey Bataev
83620bd2ad [SLP]Alternate vectorization for cmp instructions.
Added support for alternate ops vectorization of the cmp instructions.
It allows to vectorize either cmp instructions with same/swapped
predicate but different (swapped) operands kinds or cmp instructions
with different predicates and compatible operands kinds.

Differential Revision: https://reviews.llvm.org/D115955
2022-02-01 09:54:20 -08:00
Benjamin Kramer
5281f0dab2 Revert "[SLP]Alternate vectorization for cmp instructions."
This reverts commit afaaecc88c6e5989de8a6a0266610860ef99d9d6.

Crashes when compiling SciPy, test case https://reviews.llvm.org/P8276
2022-02-01 11:40:43 +01:00
Alexey Bataev
afaaecc88c [SLP]Alternate vectorization for cmp instructions.
Added support for alternate ops vectorization of the cmp instructions.
It allows to vectorize either cmp instructions with same/swapped
predicate but different (swapped) operands kinds or cmp instructions
with different predicates and compatible operands kinds.

Differential Revision: https://reviews.llvm.org/D115955
2022-01-31 11:11:25 -08:00
Sanjay Patel
892c731681 [Support] improve known bits analysis for leading zeros of multiply
Instead of summing leading zeros on the input operands, multiply the
max possible values of those inputs and count the leading zeros of
the result. This can give us an extra zero bit (typically in cases
where one of the operands is a known constant).

This allows folding away the remaining 'add' ops in the motivating
bug (modeled in the PhaseOrdering IR test):
https://github.com/llvm/llvm-project/issues/48399

Fixes #48399

Differential Revision: https://reviews.llvm.org/D115969
2021-12-20 09:10:50 -05:00
Philip Reames
e6ad9ef4e7 [instcombine] Canonicalize constant index type to i64 for extractelement/insertelement
The basic idea to this is that a) having a single canonical type makes CSE easier, and b) many of our transforms are inconsistent about which types we end up with based on visit order.

I'm restricting this to constants as for non-constants, we'd have to decide whether the simplicity was worth extra instructions. For constants, there are no extra instructions.

We chose the canonical type as i64 arbitrarily.  We might consider changing this to something else in the future if we have cause.

Differential Revision: https://reviews.llvm.org/D115387
2021-12-13 16:56:22 -08:00
Nikita Popov
ae7f468073 [NewPM] Fix MergeFunctions scheduling
MergeFunctions (as well as HotColdSplitting an IROutliner) are
incorrectly scheduled under the new pass manager. The code makes
it look like they run towards the end of the module optimization
pipeline (as they should), while in reality the run at the start.
This is because the OptimizePM populated around them is only
scheduled later.

I'm fixing this by moving these three passes until after OptimizePM
to avoid splitting the function pass pipeline. It doesn't seem
important to me that some of the function passes run after these
late module passes.

Differential Revision: https://reviews.llvm.org/D115098
2021-12-04 17:30:30 +01:00
Anton Afanasyev
c34d157fc7 [Passes] Move AggressiveInstCombine after InstCombine
Swap AIC and IC neighbouring in pipeline. This looks more natural and even
almost has no effect for now (three slightly touched tests of test-suite). Also
this could be the first step towards merging AIC (or its part) to -O2 pipeline.

After several changes in AIC (like D108091, D108201, D107766, D109515, D109236)
there've been observed several regressions (like PR52078, PR52253, PR52289)
that were fixed in different passes (see D111330, D112721) by extending their
functionality, but these regressions were exposed since changed AIC prevents IC
from making some of early optimizations.

This is common problem and it should be fixed by just moving AIC after IC
which looks more logically by itself: make aggressive instruction combining
only after failed ordinary one.

Fixes PR52289

Reviewed By: spatel, RKSimon

Differential Revision: https://reviews.llvm.org/D113179
2021-12-04 14:22:43 +03:00
Nikita Popov
5b94037a30 [PhaseOrdering] Add test for incorrect merge function scheduling
Add an -enable-merge-functions option to allow testing of function
merging as it will actually happen in the optimization pipeline.
Based on that add a test where we currently produce two identical
functions without merging them due to incorrect pass scheduling
under the new pass manager.
2021-12-04 10:12:04 +01:00
Anton Afanasyev
728736e77e [Test][PhaseOrdering] Precommit test for PR52289 2021-12-04 10:58:16 +03:00
Alexey Bataev
ddce6e0561 [SLP]Improve vectorization of cmp instructions sequences.
Final attempt to vectorize bundles of comptatible cmp instructions after
all other instructions processing.

Metric: SLP.NumVectorInstructions

Program                                                                             results results0 diff
        test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test    1.00    5.00  400.0%
                              test-suite :: MultiSource/Benchmarks/PAQ8p/paq8p.test    8.00   11.00   37.5%
                    test-suite :: MultiSource/Benchmarks/Olden/voronoi/voronoi.test   20.00   26.00   30.0%
                test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1344.00 1648.00   22.6%
               test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1344.00 1648.00   22.6%
                              test-suite :: MultiSource/Benchmarks/Olden/bh/bh.test  102.00  124.00   21.6%
                test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test  118.00  133.00   12.7%
          test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 3233.00 3554.00    9.9%
           test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 3233.00 3554.00    9.9%
                        test-suite :: MultiSource/Benchmarks/Olden/power/power.test   64.00   70.00    9.4%
           test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 7879.00 8604.00    9.2%
           test-suite :: MultiSource/Benchmarks/Prolangs-C/simulator/simulator.test   50.00   54.00    8.0%
                        test-suite :: MultiSource/Applications/sqlite3/sqlite3.test   27.00   29.00    7.4%
             test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 8345.00 8955.00    7.3%
     test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test  694.00  738.00    6.3%
                        test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test  361.00  382.00    5.8%
                      test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  409.00  430.00    5.1%
     test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test  140.00  147.00    5.0%
      test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test  140.00  147.00    5.0%
             test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 4013.00 4206.00    4.8%
                       test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  966.00 1011.00    4.7%
                           test-suite :: SingleSource/Benchmarks/Misc/oourafft.test   65.00   68.00    4.6%
                            test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 4219.00 4381.00    3.8%
                    test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1911.00 1973.00    3.2%
      test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test   62.00   64.00    3.2%
     test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test   62.00   64.00    3.2%
                 test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test  852.00  877.00    2.9%
                  test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test  852.00  877.00    2.9%
                       test-suite :: MultiSource/Applications/JM/lencod/lencod.test 1624.00 1668.00    2.7%
                         test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test   39.00   40.00    2.6%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test  613.00  624.00    1.8%
      test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test  378.00  383.00    1.3%
      test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test  293.00  295.00    0.7%
            test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test  297.00  299.00    0.7%
      test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 5522.00 5534.00    0.2%
     test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 5522.00 5534.00    0.2%

Differential Revision: https://reviews.llvm.org/D114799
2021-12-01 07:26:29 -08:00
Philip Reames
37ead201e6 [runtime-unroll] Use incrementing IVs instead of decrementing ones
This is one of those wonderful "in theory X doesn't matter, but in practice is does" changes. In this particular case, we shift the IVs inserted by the runtime unroller to clamp iteration count of the loops* from decrementing to incrementing.

Why does this matter?  A couple of reasons:
* SCEV doesn't have a native subtract node.  Instead, all subtracts (A - B) are represented as A + -1 * B and drops any flags invalidated by such.  As a result, SCEV is slightly less good at reasoning about edge cases involving decrementing addrecs than incrementing ones.  (You can see this in the inferred flags in some of the test cases.)
* Other parts of the optimizer produce incrementing IVs, and they're common in idiomatic source language.  We do have support for reversing IVs, but in general if we produce one of each, the pair will persist surprisingly far through the optimizer before being coalesced.  (You can see this looking at nearby phis in the test cases.)

Note that if the hardware prefers decrementing (i.e. zero tested) loops, LSR should convert back immediately before codegen.

* Mostly irrelevant detail: The main loop of the prolog case is handled independently and will simple use the original IV with a changed start value.  We could in theory use this scheme for all iteration clamping, but that's a larger and more invasive change.
2021-11-12 15:44:58 -08:00
Dmitry Makogon
62f86d4f95 Reapply 5ec2386 "Reapply db28934 "[IndVars] Pass TTI to replaceCongruentIVs""
This reverts commit 7cd273c339cfe8427404f881ae280bd9fae6ff78.

Several patches with tests fixes have been applied:
0cada82f0a30e5ae22dce66b58604ab9b47a3897 "[Test] Remove incorrect test in GVN"
97cb13615d6d9df254e3c0f3deef9eaedfe189b6 "[Test] Separate IndVars test into AArch64 and X86 parts"
985cc490f17d28b20392ee214895d947b85120ef "[Test] Remove separated test in IndVars",
and test failures caused by 5ec2386 should be resolved now.
2021-11-10 17:36:14 +07:00
Douglas Yung
7cd273c339 Revert "Reapply db28934 "[IndVars] Pass TTI to replaceCongruentIVs""
This reverts commit 5ec23863320ca12bfabb6dcff1d0425cb614b7a5.

This change is causing test failures on the PS4 linux build bot: https://lab.llvm.org/buildbot/#/builders/139/builds/12871
2021-11-09 10:28:41 -08:00
Sanjay Patel
c36b7e21bd [InstCombine] enhance vector bitwise select matching
(Cond & C) | (~bitcast(Cond) & D) --> bitcast (select Cond, (bc C), (bc D))

This is part of fixing:
https://llvm.org/PR34047

That report shows a case where a bitcast is sitting between the select condition
candidate and its 'not' value due to current cast canonicalization rules.

There's a bitcast type restriction that might be violated in existing matching,
but I still need to investigate if that is possible -
Alive2 shows we can only do this transform safely when the bitcast is from
narrow to wide vector elements (otherwise poison could leak into elements
that were safe in the original code):
https://alive2.llvm.org/ce/z/Hf66qh

Differential Revision: https://reviews.llvm.org/D113035
2021-11-09 08:54:59 -05:00
Dmitry Makogon
5ec2386332 Reapply db28934 "[IndVars] Pass TTI to replaceCongruentIVs"
This reapplies patch db289340c841990055a164e8eb2a3b5ff25677bf.

The test failures on build with expensive checks caused by the patch happened due
to the fact that we sorted loop Phis in replaceCongruentIVs using llvm::sort,
which shuffles the given container if the expensive checks are enabled,
so equivalent Phis in the sorted vector had different mutual order from run
to run. replaceCongruentIVs tries to replace narrow Phis with truncations
of wide ones. In some test cases there were several Phis with the same
width, so if their order differs from run to run, the narrow Phis would
be replaced with a different Phi, depending on the shuffling result.

The patch ae14fae0ff4304022beda5ab484f84ac0fdda807 fixed this issue by
replacing llvm::sort with llvm::stable_sort.
2021-11-09 17:42:29 +07:00
Anton Afanasyev
ce4fa93db8 [SCCP] Tune cast instruction handling for overdefined operand
Extended value is known to be inside range smaller than full one.
Prevent SCCP to mark such value as overdefined.

Fixes PR52253

Differential Revision: https://reviews.llvm.org/D112721
2021-11-08 18:34:30 +03:00
Anton Afanasyev
fba1f36d13 [Test][SCCP] Precommit tests for PR52253 2021-11-08 16:59:38 +03:00
Dmitry Makogon
8d4eba6c0d Revert "[IndVars] Pass TTI to replaceCongruentIVs"
This reverts commit db289340c841990055a164e8eb2a3b5ff25677bf.

The patch caused 2 crashes with expensive checks enabled.
2021-11-08 19:35:14 +07:00
Dmitry Makogon
db289340c8 [IndVars] Pass TTI to replaceCongruentIVs
In IndVarSimplify after simplifying and extending loop IVs we call 'replaceCongruentIVs'.
This function optionally takes a TTI argument to be able to replace narrow IVs uses
with truncates of the widest one.
For some reason the TTI wasn't passed to the function, so it couldn't perform such
transform.
This patch fixes it.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D113024
2021-11-08 19:20:53 +07:00
Sanjay Patel
ff30394de8 [PhaseOrdering] add tests for x86 abs/max using SSE intrinsics (PR34047); NFC
D113035
2021-11-03 09:13:23 -04:00
Roman Lebedev
b554e41e2d
[CVP] Canonicalize signed relational comparisons of scalar integers to unsigned comparison predicates
Now that the reasoning was added to ConstantRange in D90924,
this replicates IndVars variant of this transform (D111836)
in a pass that uses value range reasoning for the transform.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D112895
2021-11-01 12:16:05 +03:00
Alexey Bataev
ce14d1b690 [SLP]Do not reorder reduction nodes.
The final reduction nodes should not be reordered, the order does not
matter for reductions. Also, it might be profitable to vectorize smaller
reduction trees, reduction cost may compensate small tree cost.

Part of D111574

Differential Revision: https://reviews.llvm.org/D112467
2021-10-26 07:41:24 -07:00
Anton Afanasyev
7b07c01351 [InstCombine] Support arbitrary const shift amount for lshr (sext i1 ...)
Add lshr (sext i1 X to iN), C --> select (X, -1 >> C, 0) case. This expands
C == N-1 case to arbitrary C.

Fixes PR52078.

Reviewed By: spatel, RKSimon, lebedev.ri

Differential Revision: https://reviews.llvm.org/D111330
2021-10-15 13:39:13 +03:00
Anton Afanasyev
e23351cdc9 [Test][InstCombine] Precommit tests for PR52078 2021-10-15 13:38:40 +03:00
hyeongyu kim
ec8311444a [InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (2/3)
This patch is for fixing potential shufflevector-related bugs like D93818.
As D93818, this patch change shufflevector's default placeholder to poison.
To reduce risk, it was divided into several patches, and this patch is for InstCombineCompares and InstructionCombining.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D110227
2021-09-23 00:14:50 +09:00
Arthur Eubanks
d49cb5b303 [SimplifyCFG] Add bonus when seeing vector ops to branch fold to common dest
This makes some tests in vector-reductions-logical.ll more stable when
applying D108837.

The cost of branching is higher when vector ops are involved due to
potential SLP transformations.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D108935
2021-09-16 10:50:36 -07:00
Sanjay Patel
be1028053e [PhaseOrdering] add tests for PR47023; NFC 2021-09-15 08:44:04 -04:00
Roman Lebedev
909cba9699
[SimplifyCFG] performBranchToCommonDestFolding(): require block-closed SSA form for bonus instructions (PR51125)
I can't seem to wrap my head around the proper fix here,
we should be fine without this requirement, iff we can form this form,
but the naive attempt (https://reviews.llvm.org/D106317) has failed.
So just to unblock the release, put up a restriction.

Fixes https://bugs.llvm.org/show_bug.cgi?id=51125
2021-09-09 12:28:09 +03:00
Anton Afanasyev
cfb6dfcbd1 [AggressiveInstCombine] Add logical shift right instr to TruncInstCombine DAG
Add `lshr` instruction to the DAG post-dominated by `trunc`, allowing
TruncInstCombine to reduce bitwidth of expressions containing
these instructions.

We should be shifting by less than the target bitwidth.
Also it is sufficient to require that all truncated bits
of the value-to-be-shifted are zeros: https://alive2.llvm.org/ce/z/_LytbB

Alive2 variable-length proof:
https://godbolt.org/z/1srE1aqzf => s/32/8/ => https://alive2.llvm.org/ce/z/StwPia

Part of https://reviews.llvm.org/D107766

Differential Revision: https://reviews.llvm.org/D108201
2021-08-18 22:20:58 +03:00
Anton Afanasyev
8f8f9260a9 [Test][AggressiveInstCombine] Add test for shifts
Precommit test for D107766/D108091. Also move fixed test for PR50555
from SLPVectorizer/X86/ to PhaseOrdering/X86/ subdirectory.
2021-08-17 12:39:53 +03:00
Sanjay Patel
a22c99c3c1 [InstCombine] canonicalize cmp-of-bitcast-of-vector-cmp to use zero constant
We can invert a compare constant and preserve the logic
as shown in this sampling:
https://alive2.llvm.org/ce/z/YAXbfs
(In theory, we could deal with non-all-ones/zero as well,
but it doesn't seem worthwhile.)

I noticed this as a part of the x86 codegen difference in
https://llvm.org/PR51259 - it ends up using "test"
instead of "not + cmp" in that example.

This pattern also shows up in https://llvm.org/PR41312
and https://llvm.org/PR50798 .

Differential Revision: https://reviews.llvm.org/D107170
2021-07-31 13:31:12 -04:00
Roman Lebedev
1901c98dd8
[SimplifyCFG] SwitchToLookupTable(): don't increase ret count
The very next SimplifyCFG pass invocation will tail-merge these two ret's
anyways, there is not much point in creating more work for ourselves.
2021-07-26 23:29:55 +03:00
hyeongyu kim
aca5aeb752 [InstCombine] Add freezeAllUsesOfArgument to visitFreeze
In D106041, a freeze was added before the branch condition to solve the miscompilation problem of SimpleLoopUnswitch.
However, I found that the added freeze disturbed other optimizations in the following situations.
```
arg.fr = freeze(arg)
use(arg.fr)
...
use(arg)
```
It is a problem that occurred when arg and arg.fr were recognized as different values.
Therefore, changing to use arg.fr instead of arg throughout the function eliminates the above problem.
Thus, I add a function that changes all uses of arg to freeze(arg) to visitFreeze of InstCombine.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D106233
2021-07-24 18:08:58 +09:00
Roman Lebedev
d7378259aa
[SimplifyCFG] SimplifyCondBranchToTwoReturns(): really only deal with different ret blocks
This function is called when some predecessor of an empty return block
ends with a conditional branch, with both successors being empty ret blocks.

Now, because of the way SimplifyCFG works, it might happen to simplify
one of the blocks in a way that makes a conditional branch
into an unconditional one, since it's destinations are now identical,
but it might not have actually simplified said conditional branch
into an unconditional one yet.

So, we have to check that ourselves first,
especially now that SimplifyCFG aggressively tail-merges
all ret and resume blocks.

Even if it was an unconditional branch already,
`SimplifyCFGOpt::simplifyReturn()` doesn't call `FoldReturnIntoUncondBranch()`
by default.
2021-07-23 00:36:59 +03:00
Sanjay Patel
0e15de2d0c [InstCombine] fold reassociative FP add into start value of fadd reduction
This pattern is visible in unrolled and vectorized loops.
Although the backend seems to be able to reassociate to
ideal form in the examples I looked at, we might as well
do that in IR for efficiency.
2021-07-18 06:26:20 -04:00
Sanjay Patel
81ce3aa30c [SLP] avoid leaking poison in reduction of safe boolean logic ops
This bug was introduced with D105730 / 25ee55c0baff .

If we are not converting all of the operations of a reduction
into a vector op, we need to preserve the existing select form
of the remaining ops. Otherwise, we are potentially leaking
poison where it did not in the original code.

Alive2 agrees that the version that freezes some inputs
and then falls back to scalar is correct:
https://alive2.llvm.org/ce/z/erF4K2
2021-07-15 17:33:06 -04:00
Roman Lebedev
3e6c383dc6
[SimplifyCFG] Rerun PHI deduplication after common code sinkinkg (PR51092)
`SinkCommonCodeFromPredecessors()` doesn't itself ensure that duplicate PHI nodes aren't created.
I suppose, we could teach it to do that on-the-fly (& account for the already-existing PHI nodes,
& adjust costmodel), the diff will be bigger than this.

The alternative is to schedule a new EarlyCSE pass invocation somewhere later in the pipeline.
Clearly, we don't have any EarlyCSE runs in module optimization passline, so this pattern isn't cleaned up...
That would perhaps better, but it will again have some compile time impact.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D106010
2021-07-15 16:34:34 +03:00