1207 Commits

Author SHA1 Message Date
Michael Kruse
22c77f2354
[Polly] Use separate DT/LI/SE for outlined subfn. NFC. (#102460)
DominatorTree, LoopInfo, and ScalarEvolution are function-level analyses
that expect to be called only on instructions and basic blocks of the
function they were original created for. When Polly outlined a parallel
loop body into a separate function, it reused the same analyses seemed
to work until new checks to be added in #101198.

This patch creates new analyses for the subfunctions. GenDT, GenLI, and
GenSE now refer to the analyses of the current region of code. Outside
of an outlined function, they refer to the same analysis as used for the
SCoP, but are substituted within an outlined function.

Additionally to the cross-function queries of DT/LI/SE, we must not
create SCEVs that refer to a mix of expressions for old and generated
values. Currently, SCEVs themselves do not "remember" which
ScalarEvolution analysis they were created for, but mixing them is just
as unexpected as using DT/LI across function boundaries. Hence
`SCEVLoopAddRecRewriter` was combined into `ScopExpander`.
`SCEVLoopAddRecRewriter` only replaced induction variables but left
SCEVUnknowns to reference the old function. `SCEVParameterRewriter`
would have done so but its job was effectively superseded by
`ScopExpander`, and now also `SCEVLoopAddRecRewriter`. Some issues
persist put marked with a FIXME in the code. Changing them would
possibly cause this patch to be not NFC anymore.
2024-08-10 14:25:15 +02:00
Karthika Devi C
1e5334bcda
[Polly] Data flow reduction detection to cover more cases (#84901)
The base concept is same as existing reduction algorithm where we get
the list of candidate pairs <store,load>. But the existing algorithm
works only if there is single binary operation between the load and
store.
Example sum += a[i];

This algorithm extends to work with more than single binary operation as
well. It is implemented using data flow reduction detection on basic
block level. We propagate the loads, the number of times the load is
used(flows into instruction) and binary operation performed until we
reach a store.

Example sum += a[i] + b[i];
```
sum(Ld)     a[i](Ld)
      \  +  /
        tmp    b[i](Ld)
           \ + /
            sum(St)
```

In the above case the candidate pairs are formed by associating sum with
all of its load inputs which are sum, a[i] and b[i]. Then check
functions are used to filter a valid reduction pair ie {sum,sum}.

---------

Co-authored-by: Michael Kruse <github@meinersbur.de>
2024-07-30 09:43:24 -07:00
Stephen Tozer
80f881485a
[LLVM] Add InsertPosition union-type to remove overloads of Instruction-creation (#94226)
This patch simplifies instruction creation by replacing all overloads of
instruction constructors/Create methods that are identical other than
the Instruction *InsertBefore/BasicBlock *InsertAtEnd/BasicBlock::iterator
InsertBefore argument with a single version that takes an InsertPosition
argument. The InsertPosition class can be implicitly constructed from
any of the above, internally converting them to the appropriate
BasicBlock::iterator value which can then be used to insert the
instruction (or to not insert it if an invalid iterator is passed).

The upshot of this is that code will be deduplicated, and all callsites
will switch to calling the new unified version without any changes
needed to make the compiler happy. There is at least one exception to
this; the construction of InsertPosition is a user-defined conversion,
so any caller that was already relying on a different user-defined
conversion won't work. In all of LLVM and Clang this happens exactly
once: at clang/lib/CodeGen/CGExpr.cpp:123 we try to construct an alloca
with an AssertingVH<Instruction> argument, which must now be cast to an
Instruction* by using `&*`. If this is more common elsewhere, it could
be fixed by adding an appropriate constructor to InsertPosition.
2024-06-20 10:27:55 +01:00
Karthika Devi C
d33864d5d8
[polly] Fix cppcheck SA comment reported in #91235 (#93505)
This patch moves the unreachable assert before return statement.
Fixes #91235.
2024-05-28 11:41:58 -07:00
Karthika Devi C
601d7eab06
[polly] Add polly-debug flag to print debug info from all parts of polly (#78549)
This flag enable the user to print debug Info from all the passes and
helpers inside polly at once. This will help a novice user as well to
work in polly without explicitly having to know which parts of polly has
actually kicked in and pass them via -debug-only.
2024-03-26 12:02:27 -07:00
Brad Smith
12ed2c90a1
[llvm][NFC] A start at cleaning up zero byte files that should have been removed (#74404) 2023-12-05 01:57:14 -05:00
Brad Smith
2fd66e6eb6 Revert "[lldb] A start at cleaning up zero byte files that should have been removed"
This reverts commit 3223936dc512c9f4f87a230a4d2931e37186ca22.

Commited by accident while mixed in with another commit.
2023-12-05 00:27:11 -05:00
Brad Smith
3223936dc5 [lldb] A start at cleaning up zero byte files that should have been removed 2023-12-04 23:12:54 -05:00
Fangrui Song
678e3ee123 [lldb] Fix duplicate word typos; NFC
Those fixes were taken from https://reviews.llvm.org/D137338
2023-09-01 21:32:24 -07:00
Kazu Hirata
81e149aab9 Replace None with std::nullopt in comments (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2023-05-12 18:33:26 -07:00
Michael Kruse
19afbfe331 [Polly] Remove Polly-ACC.
Polly-ACC is unmaintained and since it has never been ported to the NPM pipeline, since D136621 it is not even accessible anymore without manually specifying the passes on the `opt` command line.

Since there is no plan to put it to a maintainable state, remove it from Polly.

Reviewed By: grosser

Differential Revision: https://reviews.llvm.org/D142580
2023-03-08 17:33:04 -06:00
Michael Kruse
42cd38c01e [Polly] Remove -polly-vectorizer=polly.
Polly's internal vectorizer is not well maintained and is known to not work in some cases such as region ScopStmts. Unlike LLVM's LoopVectorize pass it also does not have a target-dependent cost heuristics, and we recommend using LoopVectorize instead of -polly-vectorizer=polly.

In the future we hope that Polly can collaborate better with LoopVectorize, like Polly marking a loop is safe to vectorize with a specific simd width, instead of replicating its functionality.

Reviewed By: grosser

Differential Revision: https://reviews.llvm.org/D142640
2023-03-08 12:51:42 -06:00
Paul Walker
62d11b2cca Revert "Revert "[SCEV] Add SCEVType to represent vscale.""
Relanding after fixing Polly related build error.

This reverts commit 7b26dcae9eaf8cdcba7fef032fd83d060dffd4b4.
2023-03-02 13:14:07 +00:00
Florian Hahn
934c82d318
[Polly] Remove CodegenCleanupPass.
The pass uses a bunch of deprecated legacy passes and appears unused.
Remove it to unblock removing legacy passes.

Fixes https://github.com/llvm/llvm-project/issues/60852

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D144332
2023-02-24 13:39:32 +01:00
Kazu Hirata
ccdc271a08 [polly] Use std::optional instead of llvm::Optional (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2023-01-02 19:18:46 -08:00
Nikita Popov
e95ca5bb05 [AST] Make AliasSetTracker work on BatchAA
D138014 restricted AST to work on immutable IR. This means it is
also safe to use a single BatchAA instance for the entire AST
lifetime, instead of only batching parts of individual queries.

The primary motivation for this is not compile-time, but rather
having a central place to control cross-iteration AA, which will
be used by D137958.

Differential Revision: https://reviews.llvm.org/D137955
2022-12-05 08:12:26 +01:00
Michael Kruse
b4b7fa234c [Polly] Ensure -polly-detect-keep-going still eventually rejects invalid regions.
Fixes #58484
2022-10-20 13:35:09 -05:00
Gabriel Ravier
ea540bc210
[polly] Fixed a number of typos. NFC
I went over the output of the following mess of a command:

`(ulimit -m 2000000; ulimit -v 2000000; git ls-files -z | parallel --xargs -0 cat | aspell list --mode=none --ignore-case | grep -E '^[A-Za-z][a-z]*$' | sort | uniq -c | sort -n | grep -vE '.{25}' | aspell pipe -W3 | grep : | cut -d' ' -f2 | less)`

and proceeded to spend a few days looking at it to find probable typos
and fixed a few hundred of them in all of the llvm project (note, the
ones I found are not anywhere near all of them, but it seems like a
good start).

Reviewed By: inclyc

Differential Revision: https://reviews.llvm.org/D131167
2022-08-07 22:56:07 +08:00
Roman Gareev
b02c7e2b63 [Polly] Generalize the pattern matching to the case of tensor contractions
The pattern matching optimization of Polly detects and optimizes dense general
matrix-matrix multiplication. The generated code is close to high performance
implementations of matrix-matrix multiplications, which are contained in
manually tuned libraries. The described pattern matching optimization is
a particular case of tensor contraction optimization, which was
introduced in [1].

This patch generalizes the pattern matching to the case of tensor contractions
using the form of data dependencies and memory accesses produced by tensor
contractions [1].

Optimization of tensor contractions will be added in the next patch. Following
the ideas introduced in [2], it will logically represent tensor contraction
operands as matrix multiplication operands and use an approach for
optimization of matrix-matrix multiplications.

[1] - Gareev R., Grosser T., Kruse M. High-Performance Generalized Tensor
Op­erations: A Compiler-Oriented Approach // ACM Transactions on
Architec­ture and Code Optimization (TACO). 2018. Vol. 15, no. 3.
P. 34:1–34:27. DOI: 10.1145/3235029.

[2] - Matthews D. High-Performance Tensor Contraction without BLAS // SIAM
Journal on Scientific Computing. 2018. Vol. 40, no. 1. P. C 1—C 24.
DOI: 110.1137/16m108968x.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D114336
2022-08-07 13:10:32 +03:00
Michael Kruse
fe0e5b3e43 [Polly] Insert !dbg metadata for emitted CallInsts.
The IR Verifier requires that every call instruction to an inlineable
function (among other things, its implementation must be visible in the
translation unit) must also have !dbg metadata attached to it. When
parallelizing, Polly emits calls to OpenMP runtime function out of thin
air, or at least not directly derived from a bounded list of previous
instruction. While we could search for instructions in the SCoP that has
some debug info attached to it, there is no guarantee that we find any.
Our solution is to generate a new DILocation that points to line 0 to
represent optimized code.

The OpenMP function implementation is usually not available in the
user's translation unit, but can become visible in an LTO build. For
the bug to appear, libomp must also be built with debug symbols.

IMHO, the IR verifier rule is too strict. Runtime functions can
also be inserted by other optimization passes, such as
LoopIdiomRecognize. When inserting a call to e.g. memset, it uses the
DebugLoc from a StoreInst from the unoptimized code. It is not
required to have !dbg metadata attached either.

Fixes #56692
2022-07-26 19:43:53 -05:00
Kazu Hirata
3f3930a451 Remove redundaunt virtual specifiers (NFC)
Identified with tidy-modernize-use-override.
2022-07-25 23:00:59 -07:00
Kazu Hirata
360c1111e3 Use llvm::is_contained (NFC) 2022-07-20 09:09:19 -07:00
Michael Kruse
6fa65f8a98 [Polly][MatMul] Abandon dependence analysis.
The copy statements inserted by the matrix-multiplication optimization
introduce new dependencies between the copy statements and other
statements. As a result, the DependenceInfo must be recomputed.

Not recomputing them caused IslAstInfo to deduce that some loops are
parallel but cause race conditions when accessing the packed arrays.
As a result, matrix-matrix multiplication currently cannot be
parallelized.

Also see discussion at https://reviews.llvm.org/D125202
2022-06-29 17:20:05 -05:00
Arthur Eubanks
c80b88ee29 [polly] #include <algorithm>
For the usage of std::max in the header.

Speculative fix for
https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-windows-x64/b8810806780048763729/overview
reported in https://reviews.llvm.org/D125263.
2022-06-21 13:27:55 -07:00
Guillaume Chatelet
6e930503f4 [NFC][polly] Removed dead code 2022-06-13 07:50:35 +00:00
Yang Keao
02f640672e [Polly] Migrate -polly-mse to the new pass manager.
This patch implements the `MaximalStaticExpansion` and its printer in NPM.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D125870
2022-06-01 13:37:58 -05:00
Michael Kruse
bd93df937a [Polly] Mark classes as final by default. NFC.
This make is obivious that a class was not intended to be derived from.

NPM analysis pass can unfortunately not marked as final because they are
derived from a llvm::Checker<T> template internally by the NPM.

Also normalize the use of classes/structs
 * NPM passes are structs
 * Legacy passes are classes
 * structs that have methods and are not a visitor pattern are classes
 * structs have public inheritance by default, remove "public" keyword
 * Use typedef'ed type instead of inline forward declaration
2022-05-17 12:05:39 -05:00
Simon Pilgrim
3cc2c7deed [polly] Remove 'using namespace llvm/polly' from ScopGraphPrinter.h header.
As mentioned on D123678 this appears to be causing namespace resolution issues on some versions of gcc.
2022-05-16 16:19:03 +01:00
Michael Kruse
6b3b87376b [polly] migrate -polly-show to the new pass manager
Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D123678
2022-05-09 14:04:29 -05:00
Michael Kruse
5c02808131 [polly] Introduce -polly-print-* passes to replace -analyze.
The `opt -analyze` option only works with the legacy pass manager and might be removed in the future, as explained in llvm.org/PR53733. This patch introduced -polly-print-* passes that print what the pass would print with the `-analyze` option and replaces all uses of `-analyze` in the regression tests.

There are two exceptions: `CodeGen\single_loop_param_less_equal.ll` and `CodeGen\loop_with_condition_nested.ll` use `-analyze on the `-loops` pass which is not part of Polly.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D120782
2022-03-14 10:27:15 -05:00
Michael Kruse
ad84c6f657 [polly] Match function definitions and header declarations. NFC.
Ensure that function definitions match their declrations in header
files, even if they have no effect on linking. This includes

 1. Both have the same __isl_* annotations

 2. Both use the same type alias

 3. Remove unused declarations that have no definition

 4. Use explicit polly namespace qualifier for definitions; generally,
    the .cpp file should use at most an anon namespace region since
    only symbols declared in the header file can be accessed from other
    translation units anyway. For defintions that have been declared in
    the header file, the explicit namespace qualifier ensures that both
    match.
2022-02-16 12:52:17 -06:00
serge-sans-paille
e188aae406 Cleanup header dependencies in LLVMCore
Based on the output of include-what-you-use.

This is a big chunk of changes. It is very likely to break downstream code
unless they took a lot of care in avoiding hidden ehader dependencies, something
the LLVM codebase doesn't do that well :-/

I've tried to summarize the biggest change below:

- llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h
- llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h
- llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h
- llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h
- llvm/IR/LegacyPassManager.h no longer include llvm/Pass.h
- llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h
- llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h

And the usual count of preprocessed lines:
$ clang++ -E  -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
before: 6400831
after:  6189948

200k lines less to process is no that bad ;-)

Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup

Differential Revision: https://reviews.llvm.org/D118652
2022-02-02 06:54:20 +01:00
Roman Lebedev
82fb4f4b22
[SCEV] Sequential/in-order UMin expression
As discussed in https://github.com/llvm/llvm-project/issues/53020 / https://reviews.llvm.org/D116692,
SCEV is forbidden from reasoning about 'backedge taken count'
if the branch condition is a poison-safe logical operation,
which is conservatively correct, but is severely limiting.

Instead, we should have a way to express those
poison blocking properties in SCEV expressions.

The proposed semantics is:
```
Sequential/in-order min/max SCEV expressions are non-commutative variants
of commutative min/max SCEV expressions. If none of their operands
are poison, then they are functionally equivalent, otherwise,
if the operand that represents the saturation point* of given expression,
comes before the first poison operand, then the whole expression is not poison,
but is said saturation point.
```
* saturation point - the maximal/minimal possible integer value for the given type

The lowering is straight-forward:
```
compare each operand to the saturation point,
perform sequential in-order logical-or (poison-safe!) ordered reduction
over those checks, and if reduction returned true then return
saturation point else return the naive min/max reduction over the operands
```
https://alive2.llvm.org/ce/z/Q7jxvH (2 ops)
https://alive2.llvm.org/ce/z/QCRrhk (3 ops)
Note that we don't need to check the last operand: https://alive2.llvm.org/ce/z/abvHQS
Note that this is not commutative: https://alive2.llvm.org/ce/z/FK9e97

That allows us to handle the patterns in question.

Reviewed By: nikic, reames

Differential Revision: https://reviews.llvm.org/D116766
2022-01-10 20:51:26 +03:00
Kazu Hirata
b12fd13812 Fix bugprone argument comments.
Identified by bugprone-argument-comment.
2022-01-09 12:21:02 -08:00
Kazu Hirata
fb7cf90071 Use nullptr instead of 0 or NULL (NFC)
Identified with modernize-use-nullptr.
2022-01-07 10:17:29 -08:00
Michael Kruse
937b00ab2c [Polly][SchedOpt] Account for prevectorization of multiple statements.
A prevectorized loop may contain multiple statements, in which case
isl_schedule_node_band_sink will sink the vector band to multiple
leaves. Instead of statically assuming a specific tree structure after
sinking, add a SIMD marker to all inner bands.

Fixes llvm.org/PR52637
2021-12-23 14:06:41 -06:00
Riccardo Mori
44596fe6a9 [Polly][Isl] Use the function unsignedFromIslSize to manage a isl::size object. NFCI
This is part of an effort to reduce the differences between the custom C++ bindings used right now by polly in lib/External/isl/include/isl/isl-noxceptions.h and the official isl C++ interface.
In the official interface the type `isl::size` cannot be casted to an unsigned without previously having checked if it contains a valid value with the function `isl::size::is_error()`.
For this reason two helping functions have been added:
 - `IslAssert`: assert that no errors are present in debug builds and just disables the mandatory error check in non-debug builds
 - `unisgnedFromIslSIze`: cast the `isl::size` object to `unsigned`

Changes made:
 - Add the functions `IslAssert` and `unsignedFromIslSize`
 - Add the utility function `rangeIslSize()`
 - Retype `MaxDisjunctsInDomain` from `int` to `unsigned`
 - Retype `RunTimeChecksMaxAccessDisjuncts` from `int` to `unsigned`
 - Retype `MaxDimensionsInAccessRange` from `int` to `unsigned`
 - Replaced some usages of `isl_size` to `unsigned` since we aim not to use `isl_size` anymore
 - `isl-noexceptions.h` has been generated by e704f73c88

No functional change intended.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D113101
2021-11-05 11:15:22 +01:00
Michael Kruse
19db33c06e [Polly] Remove support for code generated by gfortran+DragonEgg.
DragonEgg is not maintained anymore, hence there is no need for this
functionality.

Fixes llvm.org/PR52173
2021-10-14 14:12:06 -05:00
Michael Kruse
ec2029f986 [Polly] Do not inline dumpIslObj methods. NFC.
Instead of being inline and having a neverCalled() workaround to make it
work in the debugger, define it as a regular exported function.

Also add overloads for the C API types isl_* so it works with managed as
well as unmanaged ISL objects.
2021-10-12 23:52:36 -05:00
Michael Kruse
64489255be [Polly] Add greedy fusion algorithm.
When the option -polly-loopfusion-greedy is set, the ScheduleOptimizer
tries to aggressively fuse any band it can and does not violate any
dependences.

As part if the implementation, the functionalty for copying a band
into an new schedule was extracted out of the ScheduleTreeRewriter.
2021-10-08 20:33:30 -05:00
Michael Kruse
027c036663 [Polly] Reject regions entered by an indirectbr/callbr.
SplitBlockPredecessors is unable to insert an additional BasicBlock
between an indirectbr/callbr terminator and the successor blocks.
This is needed by Polly to normalize the control flow before emitting
its optimzed code.

This patches rejects regions entered by an indirectbr/callbr to not fail
later at code generation.

This fixes llvm.org/PR51964

Recommit with "REQUIRES: asserts" in test that uses statistics.
2021-09-27 18:49:11 -05:00
Haowei Wu
283ed7de32 Revert "[Polly] Reject reject regions entered by an indirectbr/callbr."
This reverts commit 91f46bb77e6d56955c3b96e9e844ae6a251c41e9 which
causes test failures when assertions are off.
2021-09-27 16:05:33 -07:00
Michael Kruse
91f46bb77e [Polly] Reject reject regions entered by an indirectbr/callbr.
SplitBlockPredecessors is unable to insert an additional BasicBlock
between an indirectbr/callbr terminator and the successor blocks.
This is needed by Polly to normalize the control flow before emitting
its optimzed code.

This patches rejects regions entered by an indirectbr/callbr to not fail
later at code generation.

This fixes llvm.org/PR51964
2021-09-26 21:21:50 -05:00
Michael Kruse
d5c87162db [Polly] Use VirtualUse to determine references.
VirtualUse ensures consistency over different source of values with
Polly. In particular, this enables its use of instructions moved between
Statement. Before the patch, the code wrongly assumed that the BB's
instructions are also the ScopStmt's instructions. Reference are
determined for OpenMP outlining and GPGPU kernel extraction.

GPGPU CodeGen had some problems. For one, it generated GPU kernel
parameters for constants. Second, it emitted GPU-side invariant loads
which have already been loaded by the host. This has been partially
fixed, it still generates a store for the invariant load result, but
using the value that the host has already written.

WARNING: I did not test the generated PollyACC code on an actual GPU.

The improved consistency will be made use of in the next patch.
2021-09-26 03:26:43 -05:00
Michael Kruse
1cea25eec9 [Polly] Remove isConstCall.
The function was intended to catch OpenMP functions such as
get_thread_id(). If matched, the call would be considered synthesizable.

There were a few problems with this:

 * get_thread_id() is not 'const' in the sense of have the gcc manual
   defines it: "do not examine any values except their arguments".
   get_thread_id() reads OpenCL runtime libreary global state.
   What was inteded was probably 'speculable'.

 * isConstCall was implemented using mayReadOrWriteMemory(). 'const' is
   stricter than that, mayReadOrWriteMemory is e.g. true for malloc(),
   since it may only read/write addresses that are considered
   inaccessible fro the application. However, malloc is certainly not
   speculable.

 * Values that are isConstCall were not handled consistently throughout
   Polly. In particular, it was not considered for referenced values
   (OpenMP outlining and PollyACC).

Fix by removing special handling for isConstCall entirely.
2021-09-26 03:26:43 -05:00
Michael Kruse
e470f9268a [Polly] Implement user-directed loop distribution/fission.
This is a simple version without the possibility to define distribute
points or followup-transformations. However, it is the first
transformation that has to check whether the transformation is correct.

It interprets the same metadata as the LoopDistribute pass.

Re-apply after revert in c7bcd72a38bcf99e03e4651ed5204d1a1f2bf695 with
fix: Take isBand out of #ifndef NDEBUG since it now is used
unconditionally.
2021-09-23 21:11:01 -05:00
Petr Hosek
c7bcd72a38 Revert "[Polly] Implement user-directed loop distribution/fission."
This reverts commit 52c30adc7dfe6334b71adf256d81f70e7b976143 which
breaks the build when NDEBUG is defined.
2021-09-23 14:04:25 -07:00
Michael Kruse
52c30adc7d [Polly] Implement user-directed loop distribution/fission.
This is a simple version without the possibility to define distribute
points or followup-transformations. However, it is the first
transformation that has to check whether the transformation is correct.

It interprets the same metadata as the LoopDistribute pass.
2021-09-22 17:28:25 -05:00
Michael Kruse
cad9f98a2a [Polly] Don't generate inter-iteration noalias metadata.
This metadata was intended to mark all accesses within an iteration to be pairwise non-aliasing, in this case because every memory of a base pointer is touched (read or write) at most once. This is typical for 'sweeps' over all data. The stated motivation from D30606 is to ensure that unrolled iterations are considered non-aliasing.

Rhe implemention had multiple issues:

 * The structure of the noalias metadata was malformed. D110026 added check in the verifier for this metadata, and the tests were failing since then.

 * This is not true for the outer loops of the BLIS matrix multiplication, where it was being inserted. Each element of A, B, C is accessed multiple times, as often as the loop not used as an index is iterating.

 * Scopes were added to SecondLevelOtherAliasScopeList (used for the !noalias scop list) on-the-fly when another SCEV was seen. This meant that previously visited instructions would not be updated with alias scopes that are only seen later, missing out those SCEVs they should not be aliasing with.

 * Since the !noalias scope list would ideally consists of all other SCEV for this base pointer, we might run quickly into scalability issues. Especially after unrolling there would probably at least once SCEV per instruction and unroll instance.

 * The inter-iteration noalias base pointer was not removed after leaving the loop marked with it, effectively marking everything after it to noalias as well.

A solution I considered was to mark each instruction as non-aliasing with its own scope. The instruction itself would obviously alias itself, but such construction might also be considered invalid. Duplicating the instruction (e.g. due to speculation) would mark the instruction non-aliasing with its clone. I don't want to go into this territory, especially since the original motivation of determining unrolled instances as noalias based on SCEV is the what scev-aa does as well.

This effectively reverts D30606 and D35761.
2021-09-20 22:20:17 -05:00
Michael Kruse
c62d9a5ca0 [Polly] Use subtyped isl::schedule_nodes for ScheduleTreeVisitor. NFC.
Change pass-by-const-ref to pass-by-value as objects are recreated
due to custom up-/down-casting anwyway.
2021-08-31 20:54:12 -05:00