14 Commits

Author SHA1 Message Date
Nikita Popov
580210a0c9 [SLP] Convert some tests to opaque pointers (NFC) 2022-12-23 10:02:57 +01:00
Bjorn Pettersson
3be72f4029 [test][SLPVectorizer] Use -passes syntax in RUN lines. NFC 2022-10-13 10:44:38 +02:00
Philip Reames
7d6e8f2a96 [slp] Delete dead scalar instructions feeding vectorized instructions
If we vectorize a e.g. store, we leave around a bunch of getelementptrs for the individual scalar stores which we removed. We can go ahead and delete them as well.

This is purely for test output quality and readability. It should have no effect in any sane pipeline.

Differential Revision: https://reviews.llvm.org/D122493
2022-03-28 20:10:13 -07:00
Philip Reames
48cc9287f5 Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"" (try 3)
The original commit exposed several missing dependencies (e.g. latent bugs in SLP scheduling).  Most of these were fixed over the weekend and have had several days to bake.  The last was fixed this morning after being noticed in manual review of test changes yesterday.  See the review thread for links to each change.

Original commit message follows:

SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

   Before this patch:

   704357 SLP                          - Number of calcDeps actions
   699021 SLP                          - Number of schedule calls
   5598 SLP                          - Number of ReSchedule actions
   59 SLP                          - Number of ReScheduleOnFail actions
   10084 SLP                          - Number of schedule resets
   8523 SLP                          - Number of vector instructions generated

   After this patch:

   102895 SLP                          - Number of calcDeps actions
   161916 SLP                          - Number of schedule calls
   5637 SLP                          - Number of ReSchedule actions
   55 SLP                          - Number of ReScheduleOnFail actions
   10083 SLP                          - Number of schedule resets
   8403 SLP                          - Number of vector instructions generated

I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.

Differential Revision: https://reviews.llvm.org/D118538
2022-03-25 10:39:23 -07:00
Philip Reames
deae979a2c Revert "Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"""
This reverts commit 738042711bc08cde9135873200b1d088e6cf11c3. A second, apparently separate, issue has been reported on the original review.
2022-03-03 11:35:34 -08:00
Philip Reames
738042711b Reapply "[SLP] Schedule only sub-graph of vectorizable instructions""
Root issue which triggered the revert was fixed in 689bab.  No changes in the reapplied patch.

Original commit message follows:

SLP currently schedules all instructions within a scheduling window which stretches from the first instr
uction potentially vectorized to the last. This window can include a very large number of unrelated instruct
ions which are not being considered for vectorization. This change switches the code to only schedule the su
b-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

    Before this patch:

    704357 SLP                          - Number of calcDeps actions
     699021 SLP                          - Number of schedule calls
       5598 SLP                          - Number of ReSchedule actions
         59 SLP                          - Number of ReScheduleOnFail actions
      10084 SLP                          - Number of schedule resets
       8523 SLP                          - Number of vector instructions generated

    After this patch:

    102895 SLP                          - Number of calcDeps actions
     161916 SLP                          - Number of schedule calls
       5637 SLP                          - Number of ReSchedule actions
         55 SLP                          - Number of ReScheduleOnFail actions
      10083 SLP                          - Number of schedule resets
       8403 SLP                          - Number of vector instructions generated

I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.

Differential Revision: https://reviews.llvm.org/D118538
2022-03-02 10:47:20 -08:00
Arthur Eubanks
9c6250ee41 Revert "[SLP] Schedule only sub-graph of vectorizable instructions"
This reverts commit 0539a26d91a1b7c74022fa9cf33bd7faca87544d.

Causes a miscompile, see comments on D118538.

Required updating bottom-to-top-reorder.ll.
2022-03-01 17:31:16 -08:00
Philip Reames
0539a26d91 [SLP] Schedule only sub-graph of vectorizable instructions
SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

Before this patch:

704357 SLP                          - Number of calcDeps actions
 699021 SLP                          - Number of schedule calls
   5598 SLP                          - Number of ReSchedule actions
     59 SLP                          - Number of ReScheduleOnFail actions
  10084 SLP                          - Number of schedule resets
   8523 SLP                          - Number of vector instructions generated

After this patch:

102895 SLP                          - Number of calcDeps actions
 161916 SLP                          - Number of schedule calls
   5637 SLP                          - Number of ReSchedule actions
     55 SLP                          - Number of ReScheduleOnFail actions
  10083 SLP                          - Number of schedule resets
   8403 SLP                          - Number of vector instructions generated

I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.

Differential Revision: https://reviews.llvm.org/D118538
2022-02-22 10:15:55 -08:00
Nikita Popov
48ebe427c9 [SLPVectorizer] Make aliasing check more precise
SLPVectorizer currently uses AA::isNoAlias() to determine whether
two locations alias. This does not work if one of the instructions
is a call. Instead, we should check getModRefInfo(), which
determines whether an arbitrary instruction modifies or references
a given location.

Among other things, this prevents @llvm.experimental.noalias.scope.decl()
and other inaccessiblmemonly intrinsics from interfering with SLP
vectorization.

Differential Revision: https://reviews.llvm.org/D109012
2021-08-31 22:35:30 +02:00
Nikita Popov
bf8b69bb3a [SLPVectorizer] Add test for inaccessiblememonly call (NFC) 2021-08-31 20:23:26 +02:00
Eric Christopher
cee313d288 Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

llvm-svn: 358552
2019-04-17 04:52:47 +00:00
Eric Christopher
a863435128 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

llvm-svn: 358546
2019-04-17 02:12:23 +00:00
Alexey Bataev
ce2c8b3360 [SLP]Update test checks for the SPL vectorizer, NFC.
llvm-svn: 350967
2019-01-11 20:21:14 +00:00
Dan Gohman
2c74fe977d Add an @llvm.sideeffect intrinsic
This patch implements Chandler's idea [0] for supporting languages that
require support for infinite loops with side effects, such as Rust, providing
part of a solution to bug 965 [1].

Specifically, it adds an `llvm.sideeffect()` intrinsic, which has no actual
effect, but which appears to optimization passes to have obscure side effects,
such that they don't optimize away loops containing it. It also teaches
several optimization passes to ignore this intrinsic, so that it doesn't
significantly impact optimization in most cases.

As discussed on llvm-dev [2], this patch is the first of two major parts.
The second part, to change LLVM's semantics to have defined behavior
on infinite loops by default, with a function attribute for opting into
potential-undefined-behavior, will be implemented and posted for review in
a separate patch.

[0] http://lists.llvm.org/pipermail/llvm-dev/2015-July/088103.html
[1] https://bugs.llvm.org/show_bug.cgi?id=965
[2] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118632.html

Differential Revision: https://reviews.llvm.org/D38336

llvm-svn: 317729
2017-11-08 21:59:51 +00:00