PR #125442 replaces the pass-based Polly architecture with a monolithic
pass consisting of phases. Reasons listed in
https://github.com/llvm/llvm-project/pull/125442.
With this change, the SCoP-passes became redundant problematic versions
of the same functionality and are removed.
Reapply of a22d1c2225543aa9ae7882f6b1a97ee7b2c95574. Using this PR for
pre-merge CI.
Instead of relying on any pass manager to schedule Polly's passes, add
Polly's own pipeline manager which is seen as a monolithic pass in
LLVM's pass manager. Polly's former passes are now phases of the new
PhaseManager component.
Relying on LLVM's pass manager (the legacy as well as the New Pass
Manager) to manage Polly's phases never was a good fit that the
PhaseManager resolves:
* Polly passes were modifying analysis results, in particular RegionInfo
and ScopInfo. This means that there was not just one unique and
"definite" analysis result, the actual result depended on which analyses
ran prior, and the pass manager was not allowed to throw away cached
analyses or prior SCoP optimizations would have been forgotten. The LLVM
pass manger's persistance of analysis results is not contractual but
designed for caching.
* Polly depends on a particular execution order of passes and regions
(e.g. regression tests, invalidation of consecutive SCoPs). LLVM's pass
manager does not guarantee any excecution order.
* Polly does not completely preserve DominatorTree, RegionInfo,
LoopInfo, or ScalarEvolution, but only as-needed for Polly's own uses.
Because the ScopDetection object stores references to those analyses, it
still had to lie to the pass manager that they would be preserved, or
the pass manager would have released and recomputed the invalidated
analysis objects that ScopDetection/ScopInfo was still referencing. To
ensure that no non-Polly pass would see these not-completely-preserved
analyses, all analyses still had to be thrown away after the
ScopPassManager, respectively with a BarrierNoopPass in case of the LPM.
* The NPM's PassInstrumentation wraps the IR unit into an `llvm::Any`
object, but implementations such as PrintIRInstrumentation call
llvm_unreachable on encountering an unknown IR unit, such as SCoPs, with
no extension points to add support. Hence LLVM crashes when dumping IR
between SCoP passes (such as `-print-before-changed` with Polly being
active).
The new PhaseManager uses some command line options that previously
belonged to Polly's legacy passes, such as `-polly-print-detect` (so the
option will continue to work). Hence the LPM support is incompatible
with the new approach and support for it is removed.
When ISL encounters an internal error, it sets the error flag, but it is
not isl_error_quota that was already checked. Check for general errors
and abort the schedule optimization if that happens, instead of
continuing on the good path.
The error occured when compiling llvm-test-suite's
MultiSource/Applications/JM/lencod/leaky_bucket.c with Polly enabled.
Not adding a test case because it depends on ISL internals. We do not
want to a test case to depend on which version of ISL is used.
Instead of relying on any pass manager to schedule Polly's passes, add
Polly's own pipeline manager which is seen as a monolithic pass in
LLVM's pass manager. Polly's former passes are now phases of the new
PhaseManager component.
Relying on LLVM's pass manager (the legacy as well as the New Pass
Manager) to manage Polly's phases never was a good fit that the
PhaseManager resolves:
* Polly passes were modifying analysis results, in particular RegionInfo
and ScopInfo. This means that there was not just one unique and
"definite" analysis result, the actual result depended on which analyses
ran prior, and the pass manager was not allowed to throw away cached
analyses or prior SCoP optimizations would have been forgotten. The LLVM
pass manger's persistance of analysis results is not contractual but
designed for caching.
* Polly depends on a particular execution order of passes and regions
(e.g. regression tests, invalidation of consecutive SCoPs). LLVM's pass
manager does not guarantee any excecution order.
* Polly does not completely preserve DominatorTree, RegionInfo,
LoopInfo, or ScalarEvolution, but only as-needed for Polly's own uses.
Because the ScopDetection object stores references to those analyses, it
still had to lie to the pass manager that they would be preserved, or
the pass manager would have released and recomputed the invalidated
analysis objects that ScopDetection/ScopInfo was still referencing. To
ensure that no non-Polly pass would see these not-completely-preserved
analyses, all analyses still had to be thrown away after the
ScopPassManager, respectively with a BarrierNoopPass in case of the LPM.
* The NPM's PassInstrumentation wraps the IR unit into an `llvm::Any`
object, but implementations such as PrintIRInstrumentation call
llvm_unreachable on encountering an unknown IR unit, such as SCoPs, with
no extension points to add support. Hence LLVM crashes when dumping IR
between SCoP passes (such as `-print-before-changed` with Polly being
active).
The new PhaseManager uses some command line options that previously
belonged to Polly's legacy passes, such as `-polly-print-detect` (so the
option will continue to work). Hence the LPM support is incompatible
with the new approach and support for it is removed.
Bound ISL operations during pre-vectorization to
prevent indefinite compilation. The MaxOpGuard
previously used for schedule computation is now
extended to also guard pre-vectorization optimizations.
This patch includes a reduced test case derived
from the original bug report.
---------
Co-authored-by: Michael Kruse <llvm-project@meinersbur.de>
Patch created using the following command line:
```bash
codespell polly --skip="*.pdf,polly/lib/External/*" --write-changes \
--ignore-words-list=couter,createor,distribues,doble,identty,indention,indx,olt,ore,padd,sais,te,theses
```
This flag enable the user to print debug Info from all the passes and
helpers inside polly at once. This will help a novice user as well to
work in polly without explicitly having to know which parts of polly has
actually kicked in and pass them via -debug-only.
To fix long compile time issue of Schedule optimizer, patch #77280 sets
the upper cap on max ISL operations. In case of bailing out when ISL
quota is hit, error handling behavior was restored manually. This commit
replaces the restoration code with IslMaxOperationsGuard helper and also
removes redundant early return.
There is no upper cap set on current Schedule Optimizer to compute
schedule. In some cases a very long compile time taken to compute the
schedule resulting in hang kind of behavior. This patch introduces a
flag 'polly-schedule-computeout' to pass the capwhich is initialized to
300000. This patch handles the compute out cases by bailing out and
exiting gracefully.
Fixed the test that failed in previous commit.
Fixes#69090
This reverts commit d6c4d4c9b910e8ad5ed7cd4825a143742041c1f4.
Broke buildldbots with asserts disabled; -debug-only is only available in
asserts builds.
There is no upper cap set on current Schedule Optimizer to compute
schedule. In some cases a very long compile time taken to compute the
schedule resulting in hang kind of behavior. This patch introduces a
flag 'polly-schedule-computeout' to pass the capwhich is initialized to
300000. This patch handles the compute out cases by bailing out and
exiting gracefully.
Fixes#69090
Polly-ACC is unmaintained and since it has never been ported to the NPM pipeline, since D136621 it is not even accessible anymore without manually specifying the passes on the `opt` command line.
Since there is no plan to put it to a maintainable state, remove it from Polly.
Reviewed By: grosser
Differential Revision: https://reviews.llvm.org/D142580
The pattern matching optimization of Polly detects and optimizes dense general
matrix-matrix multiplication. The generated code is close to high performance
implementations of matrix-matrix multiplications, which are contained in
manually tuned libraries. The described pattern matching optimization is
a particular case of tensor contraction optimization, which was
introduced in [1].
This patch generalizes the pattern matching to the case of tensor contractions
using the form of data dependencies and memory accesses produced by tensor
contractions [1].
Optimization of tensor contractions will be added in the next patch. Following
the ideas introduced in [2], it will logically represent tensor contraction
operands as matrix multiplication operands and use an approach for
optimization of matrix-matrix multiplications.
[1] - Gareev R., Grosser T., Kruse M. High-Performance Generalized Tensor
Operations: A Compiler-Oriented Approach // ACM Transactions on
Architecture and Code Optimization (TACO). 2018. Vol. 15, no. 3.
P. 34:1–34:27. DOI: 10.1145/3235029.
[2] - Matthews D. High-Performance Tensor Contraction without BLAS // SIAM
Journal on Scientific Computing. 2018. Vol. 40, no. 1. P. C 1—C 24.
DOI: 110.1137/16m108968x.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D114336
The copy statements inserted by the matrix-multiplication optimization
introduce new dependencies between the copy statements and other
statements. As a result, the DependenceInfo must be recomputed.
Not recomputing them caused IslAstInfo to deduce that some loops are
parallel but cause race conditions when accessing the packed arrays.
As a result, matrix-matrix multiplication currently cannot be
parallelized.
Also see discussion at https://reviews.llvm.org/D125202
This make is obivious that a class was not intended to be derived from.
NPM analysis pass can unfortunately not marked as final because they are
derived from a llvm::Checker<T> template internally by the NPM.
Also normalize the use of classes/structs
* NPM passes are structs
* Legacy passes are classes
* structs that have methods and are not a visitor pattern are classes
* structs have public inheritance by default, remove "public" keyword
* Use typedef'ed type instead of inline forward declaration
The `opt -analyze` option only works with the legacy pass manager and might be removed in the future, as explained in llvm.org/PR53733. This patch introduced -polly-print-* passes that print what the pass would print with the `-analyze` option and replaces all uses of `-analyze` in the regression tests.
There are two exceptions: `CodeGen\single_loop_param_less_equal.ll` and `CodeGen\loop_with_condition_nested.ll` use `-analyze on the `-loops` pass which is not part of Polly.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D120782
A prevectorized loop may contain multiple statements, in which case
isl_schedule_node_band_sink will sink the vector band to multiple
leaves. Instead of statically assuming a specific tree structure after
sinking, add a SIMD marker to all inner bands.
Fixes llvm.org/PR52637
This is part of an effort to reduce the differences between the custom C++ bindings used right now by polly in lib/External/isl/include/isl/isl-noxceptions.h and the official isl C++ interface.
In the official interface the type `isl::size` cannot be casted to an unsigned without previously having checked if it contains a valid value with the function `isl::size::is_error()`.
For this reason two helping functions have been added:
- `IslAssert`: assert that no errors are present in debug builds and just disables the mandatory error check in non-debug builds
- `unisgnedFromIslSIze`: cast the `isl::size` object to `unsigned`
Changes made:
- Add the functions `IslAssert` and `unsignedFromIslSize`
- Add the utility function `rangeIslSize()`
- Retype `MaxDisjunctsInDomain` from `int` to `unsigned`
- Retype `RunTimeChecksMaxAccessDisjuncts` from `int` to `unsigned`
- Retype `MaxDimensionsInAccessRange` from `int` to `unsigned`
- Replaced some usages of `isl_size` to `unsigned` since we aim not to use `isl_size` anymore
- `isl-noexceptions.h` has been generated by e704f73c88
No functional change intended.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D113101
When the option -polly-loopfusion-greedy is set, the ScheduleOptimizer
tries to aggressively fuse any band it can and does not violate any
dependences.
As part if the implementation, the functionalty for copying a band
into an new schedule was extracted out of the ScheduleTreeRewriter.
This is a simple version without the possibility to define distribute
points or followup-transformations. However, it is the first
transformation that has to check whether the transformation is correct.
It interprets the same metadata as the LoopDistribute pass.
Re-apply after revert in c7bcd72a38bcf99e03e4651ed5204d1a1f2bf695 with
fix: Take isBand out of #ifndef NDEBUG since it now is used
unconditionally.
The name of the option is misleading and has been renamed by isl to
"serialize-sccs". Instead of also renaming the option, remove it.
The option is still accessible using
-polly-isl-arg=--no-schedule-serialize-sccs
This is a simple version without the possibility to define distribute
points or followup-transformations. However, it is the first
transformation that has to check whether the transformation is correct.
It interprets the same metadata as the LoopDistribute pass.
This is part of an effort to reduce the differences between the custom C++ bindings used right now by polly in `lib/External/isl/include/isl/isl-noxceptions.h` and the official isl C++ interface.
With this commit we are moving from the `polly-generator` branch to the `new-polly-generator` branch that is more mantainable and is based on the official C++ interface `cpp-checked.h`.
Changes made:
- There are now many sublcasses for `isl::ast_node` representing different isl types. Use `isl::ast_node_for`, `isl::ast_node_user`, `isl::ast_node_block` and `isl::ast_node_mark` where needed.
- There are now many sublcasses for `isl::schedule_node` representing different isl types. Use `isl::schedule_node_mark`, `isl::schedule_node_extension`, `isl::schedule_node_band` and `isl::schedule_node_filter` where needed.
- Replace the `isl::*::dump` with `dumpIslObj` since the isl dump method is not exposed in the C++ interface.
- `isl::schedule_node::get_child` has been renamed to `isl::schedule_node::child`
- `isl::pw_multi_aff::get_pw_aff` has been renamed to `isl::pw_multi_aff::at`
- The constructor `isl::union_map(isl::union_pw_multi_aff)` has been replaced with the static method `isl::union_map::from()`
- Replace usages of `isl::val::add_ui` with `isl::val::add`
- `isl::union_set_list::alloc` is now a constructor
- All the `isl_size` values are now wrapped inside the class `isl::size` use `isl::size::release` to get the internal `isl_size` value where needed.
- `isl-noexceptions.h` has been generated by 73f5ed1f4d
No functional change intended.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D107225
This is part of an effort to reduce the differences between the custom C++ bindings used right now by polly in `lib/External/isl/include/isl/isl-noxceptions.h` and the official isl C++ interface.
Changes made:
- Use `isl::*::ctx()` instead of `isl::*::get_ctx()` (for example `isl::space::ctx()` instead of `isl::space::get_ctx()`)
- Add `isl::` namespace in front of isl types to avoid confusion (for example `isl::space::ctx` and `isl::ctx`
- `isl-noexceptions.h` has been generated by this b64e33c62d
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D105691
This is part of an effort to reduce the differences between the custom C++ bindings used right now by polly in `lib/External/isl/include/isl/isl-noxceptions.h` and the official isl C++ interface.
Changes made:
- Removing explicit operator bool() from all the classes in the isl C++ bindings.
- Replace each call to operator bool() to method `is_null()`.
- isl-noexceptions.h has been generated by this 27396daac5
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D103976
[Polly][Isl] Removing nullptr constructor from C++ bindings. NFC.
This is part of an effort to reduce the differences between the custom C++ bindings used right now by polly in `lib/External/isl/include/isl/isl-noxceptions.h` and the official isl C++ interface.
Changes made:
- Removed `std::nullptr_t` constructor from all the classes in the isl C++ bindings.
- `isl-noexceptions.h` has been generated by this a7e00bea38
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D103751
[Polly][Isl] Removing nullptr constructor from C++ bindings. NFC.
This is part of an effort to reduce the differences between the custom C++ bindings used right now by polly in `lib/External/isl/include/isl/isl-noxceptions.h` and the official isl C++ interface.
Changes made:
- Removed `std::nullptr_t` constructor from all the classes in the isl C++ bindings.
- `isl-noexceptions.h` has been generated by this a7e00bea38
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D103751
Functions shared between generalized matrix-multiplication optimization
and other post-reschedule optimizations (tiling, prevect) are moved into
the schedule tree transformation utility ScheduleTreeTransform.
This produced a compile error with GCC:
llvm-project/polly/lib/Transform/ScheduleOptimizer.cpp:1220:49: error: cannot convert ‘bool’ to ‘llvm::TargetTransformInfo::RegisterKind’
1220 | RegisterBitwidth = TTI->getRegisterBitWidth(true);
Make Polly look for unrolling metadata (https://llvm.org/docs/TransformMetadata.html#loop-unrolling) that is usually only interpreted by the LoopUnroll pass and apply it to the SCoP's schedule.
While not that useful by itself (there already is an unroll pass), it introduces mechanism to apply arbitrary loop transformation directives in arbitrary order to the schedule. Transformations are applied until no more directives are found. Since ISL's rescheduling would discard the manual transformations and it is assumed that when the user specifies the sequence of transformations, they do not want any other transformations to apply. Applying user-directed transformations can be controlled using the `-polly-pragma-based-opts` switch and is enabled by default.
This does not influence the SCoP detection heuristic. As a consequence, loop that do not fulfill SCoP requirements or the initial profitability heuristic will be ignored. `-polly-process-unprofitable` can be used to disable the latter.
Other than manually editing the IR, there is currently no way for the user to add loop transformations in an order other than the order in the default pipeline, or transformations other than the one supported by clang's LoopHint. See the `unroll_double.ll` test as example that clang currently is unable to emit. My own extension of `#pragma clang loop` allowing an arbitrary order and additional transformations is available here: https://github.com/meinersbur/llvm-project/tree/pragma-clang-loop. An effort to upstream this functionality as `#pragma clang transform` (because `#pragma clang loop` has an implicit transformation order defined by the loop pipeline) is D69088.
Additional transformations from my downstream pragma-clang-loop branch are tiling, interchange, reversal, unroll-and-jam, thread-parallelization and array packing. Unroll was chosen because it uses already-defined metadata and does not require correctness checks.
Reviewed By: sebastiankreutzer
Differential Revision: https://reviews.llvm.org/D97977
Regenerate the C++ wrapper header from the current isl version's
headers.
The most notable change is that some dimension sizes are represented by
an isl_size (instead of unsigned), which is a signed int. Additionally,
some function may return -1 in case of an error which already had been
fixed in the past. The C++ may no return -1 instead of UINT_MAX which
caused the problems.
Some types in Polly had been changed from unsigned to isl_size
(that were not already auto) and some loops/comparision had to be
changed to avoid unsigned/signed comparison warnings.
These are implementation details of the IslScheduleOptimizer pass
implementation and not use anywhere else. Hence, we can move them to the
cpp file and into an anonymous namespace.
Only getPartialTilePrefixes is, aside from the pass itself, used
externally (by the ScheduleOptimizerTest) and moved into the polly
namespace.
The schedule of a fused loop has one isl_space per statement, such that
a conversion to a isl_map fails. However, the prevectorization is
interested in the schedule space only: Converting to the non-union
representation only after extracting the schedule range fixes the problem.
This fixes llvm.org/PR46578
The member LastSchedule was never set, such that printScop would always
print "n/a" instead of the last schedule.
To ensure that the isl_ctx lives as least as long as the stored
schedule, also store a shared_ptr.
Also set the schedule tree output style to ISL_YAML_STYLE_BLOCK to avoid
printing everything on a single line.
`opt -polly-opt-isl -analyze` will be used in the next commit.
../polly/lib/Transform/ScheduleOptimizer.cpp:812:54: warning: comparison of integers of different signs: 'isl_size' (aka 'int') and 'const unsigned int' [-Wsign-compare]
isl_schedule_node_band_n_member(Node.get()) >
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^