24 Commits

Author SHA1 Message Date
Michael Kruse
42cd38c01e [Polly] Remove -polly-vectorizer=polly.
Polly's internal vectorizer is not well maintained and is known not to work in some cases, such as region ScopStmts. Unlike LLVM's LoopVectorize pass, it also lacks target-dependent cost heuristics, so we recommend using LoopVectorize instead of -polly-vectorizer=polly.

In the future, we hope that Polly can collaborate better with LoopVectorize instead of replicating its functionality, for instance by marking a loop as safe to vectorize with a specific SIMD width.

Reviewed By: grosser

Differential Revision: https://reviews.llvm.org/D142640
2023-03-08 12:51:42 -06:00
Nikita Popov
b332499a94 [Polly] Convert some tests to opaque pointers (NFC) 2023-01-17 10:15:18 +01:00
Michael Kruse
5c02808131 [polly] Introduce -polly-print-* passes to replace -analyze.
The `opt -analyze` option only works with the legacy pass manager and might be removed in the future, as explained in llvm.org/PR53733. This patch introduces -polly-print-* passes that print what the corresponding pass would print with the `-analyze` option, and replaces all uses of `-analyze` in the regression tests.

There are two exceptions: `CodeGen\single_loop_param_less_equal.ll` and `CodeGen\loop_with_condition_nested.ll` use `-analyze` on the `-loops` pass, which is not part of Polly.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D120782
2022-03-14 10:27:15 -05:00
Michael Kruse
d7851685a3 [polly] Remove trailing whitespace from tests. NFC. 2022-02-22 15:41:13 -06:00
serge-sans-paille
4ab3041acb Revert "[NFC] remove explicit default value for strboolattr attribute in tests"
This reverts commit bda6e5bee04c75b1f1332b4fd1ac4e8ef6c3c247.

See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance
2021-05-24 19:43:40 +02:00
serge-sans-paille
bda6e5bee0 [NFC] remove explicit default value for strboolattr attribute in tests
Since d6de1e1a71406c75a4ea4d5a2fe84289f07ea3a1, omitting a string boolean
attribute is equivalent to setting it to false.

This is a preliminary commit for https://reviews.llvm.org/D99080
2021-05-24 19:31:04 +02:00
Arthur Eubanks
b210c9899b [BasicAA] Replace -basicaa with -basic-aa in polly
Follow up to https://reviews.llvm.org/D82607.
2020-06-30 15:50:17 -07:00
Fangrui Song
502a77f125 Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351 2019-12-24 15:57:33 -08:00
Roman Gareev
96e1119a96 Make optimizations based on pattern matching be enabled by default
Currently, Polly's pattern-based optimizations can identify matrix
multiplication and optimize it according to the BLIS matmul optimization
pattern (see ScheduleTreeOptimizer for details). This patch enables these
pattern-matching-based optimizations by default.
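For reference, the loop nest this pattern matcher targets is the textbook matrix-multiplication kernel shown below. This is a minimal C sketch with row-major n x n matrices, and the function name `matmul` is illustrative. Whether Polly matches a particular nest depends on the checks in ScheduleTreeOptimizer. The BLIS-style tiling and packing applied afterwards are not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Textbook matrix multiplication, C += A * B, on row-major n x n
 * matrices: C[i][j] += A[i][k] * B[k][j]. Illustrative only; this is
 * the shape of kernel a matmul pattern matcher looks for, not Polly's
 * optimized output. */
static void matmul(size_t n, const double *A, const double *B, double *C) {
  assert(A != NULL && B != NULL && C != NULL);
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      for (size_t k = 0; k < n; k++)
        C[i * n + j] += A[i * n + k] * B[k * n + j];
}
```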

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D30293

llvm-svn: 295958
2017-02-23 11:44:12 +00:00
Roman Gareev
11001e1534 Annotation of SIMD loops
Use 'mark' nodes to annotate a SIMD loop during ScheduleTransformation and skip
parallelism checks.

The buildbot shows the following compile/execution time changes:

  Compile time:
    Improvements    Δ     Previous  Current  σ
    …/gesummv      -6.06% 0.2640    0.2480   0.0055
    …/gemver       -4.46% 0.4480    0.4280   0.0044
    …/covariance   -4.31% 0.8360    0.8000   0.0065
    …/adi          -3.23% 0.9920    0.9600   0.0065
    …/doitgen      -2.53% 0.9480    0.9240   0.0090
    …/3mm          -2.33% 1.0320    1.0080   0.0087

  Execution time:
    Regressions     Δ     Previous  Current  σ
    …/viterbi       1.70% 5.1840    5.2720   0.0074
    …/smallpt       1.06% 12.4920   12.6240  0.0040

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: http://reviews.llvm.org/D14491

llvm-svn: 261620
2016-02-23 09:00:13 +00:00
Tobias Grosser
f4ee371e60 tests: Drop -polly-detect-unprofitable and -polly-no-early-exit
These flags are now passed to all tests by default and need to be disabled
where they are not wanted. Disabling these flags, rather than passing them to
almost all tests, significantly simplifies our RUN: lines.

llvm-svn: 249422
2015-10-06 15:36:44 +00:00
Tobias Grosser
07c1c2fcc9 Make prevectorization width configurable
Polly uses 'prevectorization' to enable outer loop vectorization. When
vectorizing an outer loop, we strip-mine <number-of-prevec-dims> loop
iterations, which are then interchanged to the innermost level such that LLVM's
inner loop vectorizer (or Polly's simple vectorizer) can easily vectorize this
loop. The number of loop iterations to strip-mine is now configurable with the
option -polly-prevect-width=<number-of-prevec-dims>.

This is mostly a debugging option. We should probably add a heuristic that
derives the number of prevectorization dimensions from the target data and
the data types used.
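A heuristic of the kind suggested above could divide the vector register size by the element size. The sketch below uses a hypothetical helper, `prevect_width`, which is not part of Polly, and it only illustrates the arithmetic. For example, 256-bit registers with 4-byte floats give a width of 8.

```c
#include <assert.h>

/* Hypothetical heuristic (not Polly code): the prevectorization width is
 * the number of elements of the given size that fit into one vector
 * register of the target. */
static unsigned prevect_width(unsigned vector_register_bits,
                              unsigned element_bytes) {
  assert(element_bytes > 0);
  unsigned width = vector_register_bits / (8u * element_bytes);
  /* If an element is wider than a register, fall back to width 1. */
  return width > 0 ? width : 1u;
}
```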

llvm-svn: 245424
2015-08-19 08:46:11 +00:00
Tobias Grosser
b241d928bd Rewrite getPrevectorMap using schedule trees operations
Schedule trees are a lot easier to work with, for both humans and machines. For
humans the more structured schedule representation is easier to reason about.
Together with the more abstract isl programming interface, this can result in
much cleaner code (see this changeset). For machines, the structured schedule and
the fact that we now use explicit piecewise affine expressions instead of
integer maps make it easier to generate code from this schedule tree. As a
result, we can already see a slight compile-time improvement: for 3mm, from
0m0.593s to 0m0.551s (-7%). More importantly, future optimizations such
as full-partial tile separation will most likely result in more streamlined code
to be generated.

Contributed-by: Roman Gareev <gareevroman@gmail.com>
llvm-svn: 243458
2015-07-28 18:03:36 +00:00
Tobias Grosser
808cd69a92 Use schedule trees to represent execution order of statements
Instead of flat schedules, we now use so-called schedule trees to represent the
execution order of the statements in a SCoP. Schedule trees make it a lot easier
to analyze, understand and modify properties of a schedule, as specific nodes
in the tree can be chosen and possibly replaced.

This patch does not yet fully move our DependenceInfo pass to schedule trees,
as some additional performance analysis is needed here. (In general schedule
trees should be faster in compile-time, as the more structured representation
is generally easier to analyze and work with). We also cannot yet perform the
reduction analysis on schedule trees.

For more information regarding schedule trees, please see Section 6 of
https://lirias.kuleuven.be/handle/123456789/497238

llvm-svn: 242130
2015-07-14 09:33:13 +00:00
Tobias Grosser
173ecab705 Remove target triples from test cases
I just learned that target triples prevent test cases from being run on other
architectures. Polly test cases are so far sufficiently target-independent not
to require any target triples. Hence, we drop them.

llvm-svn: 235384
2015-04-21 14:28:02 +00:00
Tobias Grosser
4f6bceface Do not scale tile loops
We now generate tile loops as:

 for (int c1 = 0; c1 <= 47; c1 += 1)
   for (int c2 = 0; c2 <= 47; c2 += 1)
     for (int c3 = 0; c3 <= 31; c3 += 1)
       for (int c4 = 0; c4 <= 31; c4 += 4)
         #pragma simd
         for (int c5 = c4; c5 <= c4 + 3; c5 += 1)
           Stmt_for_body3(32 * c1 + c3, 32 * c2 + c5);

instead of

 for (int c1 = 0; c1 <= 1535; c1 += 32)
   for (int c2 = 0; c2 <= 1535; c2 += 32)
     for (int c3 = 0; c3 <= 31; c3 += 1)
       for (int c4 = 0; c4 <= 31; c4 += 4)
         #pragma simd
         for (int c5 = c4; c5 <= c4 + 3; c5 += 1)
           Stmt_for_body3(c1 + c3, c2 + c5);

In terms of run-time performance this makes little difference, but it gives a
large reduction in compile time (10-30% on 17 LNT benchmarks). Apparently, the
isl AST generator is not yet very efficient at generating the latter form.
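To check that the two forms enumerate the same statement instances, the sketch below parameterizes both loop nests. The parameters are tile count NT, tile size TS and vector width VW; the commit's example uses NT = 48, TS = 32 and VW = 4. Each nest records the (i, j) pairs it passes to the statement. The function names are illustrative, and the sketch assumes TS is a multiple of VW, as in the example.

```c
#include <assert.h>
#include <stddef.h>

/* New form: tile loops count tiles and the tile size appears in the
 * subscript, as in Stmt_for_body3(TS * c1 + c3, TS * c2 + c5).
 * Writes the visited (i, j) pairs to out and returns the ints written. */
static size_t unscaled_tiles(int NT, int TS, int VW, int *out) {
  assert(VW > 0 && TS % VW == 0);
  size_t n = 0;
  for (int c1 = 0; c1 <= NT - 1; c1 += 1)
    for (int c2 = 0; c2 <= NT - 1; c2 += 1)
      for (int c3 = 0; c3 <= TS - 1; c3 += 1)
        for (int c4 = 0; c4 <= TS - 1; c4 += VW)
          for (int c5 = c4; c5 <= c4 + VW - 1; c5 += 1) {
            out[n++] = TS * c1 + c3; /* i */
            out[n++] = TS * c2 + c5; /* j */
          }
  return n;
}

/* Old form: tile loops step by the tile size and are used directly,
 * as in Stmt_for_body3(c1 + c3, c2 + c5). */
static size_t scaled_tiles(int NT, int TS, int VW, int *out) {
  assert(VW > 0 && TS % VW == 0);
  size_t n = 0;
  for (int c1 = 0; c1 <= NT * TS - 1; c1 += TS)
    for (int c2 = 0; c2 <= NT * TS - 1; c2 += TS)
      for (int c3 = 0; c3 <= TS - 1; c3 += 1)
        for (int c4 = 0; c4 <= TS - 1; c4 += VW)
          for (int c5 = c4; c5 <= c4 + VW - 1; c5 += 1) {
            out[n++] = c1 + c3; /* i */
            out[n++] = c2 + c5; /* j */
          }
  return n;
}
```

Both nests visit the same pairs in the same order. Only the loop-counter arithmetic differs, which is why run time hardly changes while AST generation gets cheaper.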

llvm-svn: 233675
2015-03-31 07:52:36 +00:00
Tobias Grosser
bbb4cec2e8 Use schedule trees to perform post-scheduling transformations
Replacing the old band_tree based code with code that is based on the new
schedule tree [1] interface makes applying complex schedule transformations a lot
more straightforward. We no longer need to reason about the meaning of flat
schedules, but can instead work with a tree structure. We do not yet exploit
this much in the current code, but hopefully we will be able to do so
soon.

This change also allows us to drop some code, as isl now provides some higher
level interfaces to apply loop transformations such as tiling.

This change causes some small test case changes as isl uses a slightly different
way to perform loop tiling, but no significant functional changes are intended.

[1] http://impact.gforge.inria.fr/impact2014/papers/impact2014-verdoolaege.pdf

llvm-svn: 232911
2015-03-22 12:06:39 +00:00
Tobias Grosser
f2716ea7d5 Add -polly-vectorizer=stripmine
By strip-mining outer loops to the innermost level we can enable LLVM's loop
vectorizer to vectorize outer loops.
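As an illustration, assume a nest whose inner loop carries a dependence through an accumulation while the outer loop is parallel. The sketch strip-mines the outer loop by an assumed vector width VW and sinks the resulting point loop to the innermost level. The new innermost loop runs over independent elements, which an inner-loop vectorizer can handle. This is a hand-written C sketch of that transformation, not code generated by Polly.

```c
#include <assert.h>

#define N 8  /* outer, parallel trip count */
#define M 5  /* inner trip count */
#define VW 4 /* assumed vector width */
_Static_assert(N % VW == 0, "this sketch assumes N is a multiple of VW");

/* Original nest: the j loop accumulates into A[i], so it carries a
 * dependence; different i iterations are independent. */
static void original(double *A, const double *B) {
  for (int i = 0; i < N; i++)
    for (int j = 0; j < M; j++)
      A[i] += B[i * M + j];
}

/* Strip-mined nest: i is split into a strip loop (step VW) and a point
 * loop ii, which is moved innermost. For each A[ii] the j order is
 * unchanged, so results are identical, and the innermost loop is
 * parallel. */
static void stripmined(double *A, const double *B) {
  for (int i = 0; i < N; i += VW)
    for (int j = 0; j < M; j++)
      for (int ii = i; ii < i + VW; ii++)
        A[ii] += B[ii * M + j];
}
```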

llvm-svn: 232100
2015-03-12 20:48:07 +00:00
David Blaikie
c94eca0546 Update Polly tests to handle explicitly typed load changes in LLVM.
llvm-svn: 230796
2015-02-27 21:22:50 +00:00
David Blaikie
bad3ff207f Update Polly tests to handle explicitly typed gep changes in LLVM
llvm-svn: 230784
2015-02-27 19:20:19 +00:00
Tobias Grosser
d1e33e7061 ScopDetection: Only detect scops that have at least one read and one write
Scops that only read seem generally uninteresting, and scops that only write
are most likely initializations where there is also little to optimize. To
avoid wasting compile time, we bail out early.
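The rule can be summarized as a simple predicate. The `access_summary` struct and the `worth_optimizing` function are hypothetical names used for illustration; the actual check is part of Polly's ScopDetection.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the memory accesses in a candidate region. */
struct access_summary {
  unsigned reads;
  unsigned writes;
};

/* Read-only regions and write-only regions (typically initializations)
 * are rejected; only regions with at least one read and one write are
 * kept as scop candidates. */
static bool worth_optimizing(struct access_summary s) {
  return s.reads > 0 && s.writes > 0;
}
```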

Differential Revision: http://reviews.llvm.org/D7735

llvm-svn: 229820
2015-02-19 05:31:07 +00:00
Duncan P. N. Exon Smith
bd62edb20d Run upgrade script from PR21532 to match LLVM changes
Update tests for LLVM assembly format change in r224257 using the script
attached to PR21532.  I'm hoping this unsticks the bot [1].

[1]: http://lab.llvm.org:8011/builders/polly-amd64-linux/builds/25432

llvm-svn: 224269
2014-12-15 20:28:50 +00:00
Tobias Grosser
8b5344fda2 Explicitly annotate loops we want to run thread-parallel
We introduce a new flag -polly-parallel and use it to annotate the for-nodes in
the isl AST that we want to execute thread-parallel (e.g., using OpenMP). We
previously already emitted OpenMP annotations, but we did this for various
kinds of parallel loops, including some that we cannot run in parallel.

With this patch we now have three annotations:

  1) #pragma known-parallel [reduction]
  2) #pragma omp for
  3) #pragma simd

meaning:

  1) loop has no loop-carried dependences
  2) loop will be executed thread-parallel
  3) loop can possibly be vectorized
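The hand-written C loops below illustrate the dependence patterns that these annotations distinguish, using the meanings listed above. They are not Polly output. Which annotations Polly actually emits depends on its dependence analysis and on flags such as -polly-parallel.

```c
#include <assert.h>

#define N 16

/* Three loops with different dependence patterns; the comments relate
 * each one to the annotation meanings above. */
static double loop_kinds(double *a, const double *b, double *c) {
  /* No loop-carried dependences: each iteration writes its own a[i].
   * A candidate for "known-parallel". */
  for (int i = 0; i < N; i++)
    a[i] = 2.0 * b[i];

  /* The only cross-iteration dependence is a sum reduction: a candidate
   * for "known-parallel reduction". */
  double sum = 0.0;
  for (int i = 0; i < N; i++)
    sum += b[i];

  /* True loop-carried dependence (c[i] reads c[i - 1]): not parallel. */
  for (int i = 1; i < N; i++)
    c[i] = c[i - 1] + b[i];

  return sum;
}
```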

This patch introduces 1) and restricts the use of 2) to the cases where we
will actually generate thread-parallel code.

This is in preparation for OpenMP code generation in our isl backend.

Legacy:

- We also have a command line option -enable-polly-openmp. This option controls
  the OpenMP code generation in CLooG. It will become an alias of
  -polly-parallel after the CLooG code generation has been dropped.

http://reviews.llvm.org/D6142

llvm-svn: 221479
2014-11-06 19:35:21 +00:00
Tobias Grosser
4ba60fe9eb ScheduleOptimizer: Fix prevectorization.
If we are at the innermost band, we try to prepare for vectorization. This
means we look for the innermost parallel loop and strip-mine this loop to the
innermost level, using a strip-mine factor corresponding to the number of vector
iterations.
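The first step, selecting the innermost parallel loop of a band, can be sketched as follows. The helper `innermost_parallel_dim` and its boolean-array input are hypothetical. Polly's ScheduleOptimizer works on isl schedule bands rather than on plain arrays.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: parallel[d] tells whether band dimension d
 * (outermost first) is parallel. Returns the innermost parallel
 * dimension, i.e. the one prevectorization would strip-mine and sink
 * to the innermost level, or -1 if no dimension is parallel. */
static int innermost_parallel_dim(const bool *parallel, int n_dims) {
  assert(n_dims >= 0);
  for (int d = n_dims - 1; d >= 0; d--)
    if (parallel[d])
      return d;
  return -1;
}
```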

For whatever reason, the code that implemented this feature was broken. We now
add a comment, a test case, and, of course, the correct code.

llvm-svn: 203544
2014-03-11 06:27:36 +00:00