The patch moves the no-wrap flags definition out of SCEV's scope so it can be re-used for SCEVUse.
SCEVUse gets an additional getNoWrapFlags helper that returns the union
of the expression's SCEV flags and the use-specific flags.
SCEVExpander has been updated to use this new helper.
In order to avoid other changes, the original names are exposed via
constexpr in SCEV. Not sure if there's a nicer way. One alternative
would be to define the enum in a struct, and have SCEV inherit from it.
The patch also clarifies that the SCEVUse flags encode NUW/NSW, and
hides getInt, setInt, etc. to avoid potential misuse.
PR: https://github.com/llvm/llvm-project/pull/190199
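To illustrate the union that the new helper computes, here is a minimal standalone sketch. It does not use the LLVM headers: `Expr`, `ExprUse`, `UseFlags` and `unionFlags` are hypothetical stand-ins, and only the idea of OR-ing the expression's flags with use-specific NUW/NSW flags, plus the constexpr aliases for the old names, is taken from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag enum, defined outside the expression class so that both
// the expression and its uses can share it.
enum NoWrapFlags : std::uint8_t {
  FlagAnyWrap = 0,
  FlagNUW = 1 << 0, // no unsigned wrap
  FlagNSW = 1 << 1, // no signed wrap
};

constexpr NoWrapFlags unionFlags(NoWrapFlags A, NoWrapFlags B) {
  return static_cast<NoWrapFlags>(static_cast<unsigned>(A) |
                                  static_cast<unsigned>(B));
}

// Hypothetical expression carrying flags that hold at every use.
struct Expr {
  NoWrapFlags Flags;
  // Old spellings stay reachable through the class via constexpr aliases.
  static constexpr NoWrapFlags NUW = FlagNUW;
  static constexpr NoWrapFlags NSW = FlagNSW;
};

// Hypothetical use wrapper: an expression plus flags valid only at this use.
struct ExprUse {
  const Expr *E;
  NoWrapFlags UseFlags;

  NoWrapFlags getNoWrapFlags() const { return unionFlags(E->Flags, UseFlags); }
};

void noWrapFlagsExample() {
  Expr E{FlagNUW};        // the expression itself is known NUW
  ExprUse U{&E, FlagNSW}; // this particular use is additionally NSW
  assert(U.getNoWrapFlags() == unionFlags(FlagNUW, FlagNSW));
}
```

A consumer that asks the use rather than the expression sees both flags, which is presumably why SCEVExpander was switched to the helper.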
The code that emits the conditions for whether a statement is executed,
by checking whether we are in the statement's domain, may apply
assumptions (such as an integer truncation being reversible). Later code
then assumes that these assumptions are relevant only when the
statement is executed, but they are actually used for determining whether
it is executed.
Break this circular reasoning by introducing an `IsInsideDomain` flag
that can be set when the domain has not been verified yet.
Fixes #190128
Thanks to @thapgua for the bug report
The conversion of SCEVs to isl::pw_aff may only be valid under
conditions that have to be confirmed via RTC. This also happens with
__builtin_assume. These user-added assumptions are then added to
ScopInfo::Context. However, the conclusion in ScopInfo::Context is then
also used to simplify away ("gist") its own RTC preconditions in
ScopInfo::AssumedContext and ScopInfo::InvalidContext.
Avoid this by adding user assumptions with preconditions to
ScopInfo::DefinedBehaviourContext instead, which is not used to simplify
AssumedContext/InvalidContext.
Fixes #187922
Thanks @thapgua for the report
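The circularity can be shown with a toy model that does not use isl. Sets of values of a single parameter are bitmasks, and `gist` is a deliberately simplified stand-in for the isl operation: it only drops a constraint entirely when the context already implies it. All names and the concrete sets are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// A set of values for one parameter n in [0, 8), one bit per value.
using Set = std::uint8_t;
constexpr Set Universe = 0xFF;

// Toy gist: return a simpler set that agrees with S on every point of Ctx.
// If S already holds everywhere in Ctx, the constraint is dropped entirely.
constexpr Set gist(Set S, Set Ctx) {
  return (Ctx & ~S) == 0 ? Universe : S;
}

void gistExample() {
  const Set Precondition = 0x0F;   // n < 4: the conversion is valid
  const Set UserAssumption = 0x0E; // 1 <= n < 4: only derived under Precondition

  // Problem: the assumption sits in the simplifying context, so it erases
  // the very check it depends on.
  assert(gist(Precondition, UserAssumption) == Universe);

  // Fix: keep the assumption out of the simplifying context, so the
  // precondition survives and is still checked at run time.
  assert(gist(Precondition, Universe) == Precondition);
}
```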
#184545 default-enables the IO sandbox in assert-builds. This causes
Clang using Polly to crash (#188568).
The issue is that `PassBuilder` uses `vfs::getRealFileSystem()` by
default, which is considered an IO sandbox violation in the Clang process.
With this PR, store the VFS of the `PassBuilder` from the original
`registerPollyPasses` call and use it for creating other `PassBuilder` instances.
This PR also adds infrastructure for running Polly in `clang` (in
addition to `opt`). `opt` does not enable the sandbox, so we need
separate tests using Clang.
Closes: #188568
Replace the DenseMap from blocks to their innermost loop with a vector
indexed by block numbers, when possible. Supporting number updates is
not trivial as we don't store a list of basic blocks, so this is not
implemented.
NB: I'm generally not happy with the way loops are stored. As I think
that there's room for improvement, I don't want to touch the
representation at this point.
Pull Request: https://github.com/llvm/llvm-project/pull/103400
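As a rough illustration of the data structure change, the sketch below replaces a hash lookup with a direct index into a vector. `Block`, `Loop` and `BlockToLoopMap` are hypothetical types, not the LLVM classes, and the sketch assumes that every block carries a stable number.

```cpp
#include <cassert>
#include <vector>

struct Loop { int Depth; };        // hypothetical loop record
struct Block { unsigned Number; }; // hypothetical block with a stable number

class BlockToLoopMap {
  std::vector<const Loop *> LoopOf; // slot i belongs to block number i

public:
  void set(const Block &B, const Loop *L) {
    if (B.Number >= LoopOf.size())
      LoopOf.resize(B.Number + 1, nullptr);
    LoopOf[B.Number] = L;
  }

  // Returns the innermost loop of B, or nullptr if B is in no loop.
  const Loop *lookup(const Block &B) const {
    return B.Number < LoopOf.size() ? LoopOf[B.Number] : nullptr;
  }
};

void blockToLoopExample() {
  Loop Inner{2};
  Block B0{0}, B3{3};
  BlockToLoopMap Map;
  Map.set(B3, &Inner);
  assert(Map.lookup(B3) == &Inner);
  assert(Map.lookup(B0) == nullptr);
}
```

The index is only meaningful while block numbers stay fixed. Renumbering would require revisiting every slot, which is the update the message says is not implemented.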
There's no point constructing a dominator tree or similar on
known-broken IR. Generally, functions should be able to assume that IR
is valid (i.e., passes the verifier). Users of this "feature" were:
- Verifier, fixed by verifying existence of terminators first.
- FuzzMutate, worked around by temporarily inserting terminators.
- OpenMP to run analyses while building the IR, worked around by
temporarily inserting terminators.
- Polly to work with an empty dominator tree, fixed by temporarily
adding an unreachable inst.
- MergeBlockIntoPredecessor, inadvertently, fixed by adding terminator
before updating MemorySSA.
- Some sloppily written unit tests.
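The "temporarily inserting terminators" workaround mentioned for several users can be pictured as a scope guard. The following is a toy sketch with hypothetical types (`ToyInst`, `ToyBlock`, `TemporaryTerminator`), not the code used in any of the listed components.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyInst {
  std::string Name;
  bool IsTerminator;
};

struct ToyBlock {
  std::vector<ToyInst> Insts;
  bool hasTerminator() const {
    return !Insts.empty() && Insts.back().IsTerminator;
  }
};

// Adds a placeholder terminator for the lifetime of the guard so that
// analyses only ever see a well-formed block, then removes it again.
class TemporaryTerminator {
  ToyBlock &BB;
  bool Inserted;

public:
  explicit TemporaryTerminator(ToyBlock &B)
      : BB(B), Inserted(!B.hasTerminator()) {
    if (Inserted)
      BB.Insts.push_back({"unreachable", true});
  }
  ~TemporaryTerminator() {
    if (Inserted)
      BB.Insts.pop_back();
  }
  TemporaryTerminator(const TemporaryTerminator &) = delete;
  TemporaryTerminator &operator=(const TemporaryTerminator &) = delete;
};

void temporaryTerminatorExample() {
  ToyBlock BB;
  BB.Insts.push_back({"add", false});
  {
    TemporaryTerminator Guard(BB);
    assert(BB.hasTerminator()); // safe to run an analysis here
  }
  assert(!BB.hasTerminator()); // block is back in its under-construction state
}
```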
`DT` is always the analysis for the to-be-optimized function while
`GenDT` is the analysis of the function that we currently generate code
for, which can also be an outlined function. Here, we want to check
dominance in the generated code, hence we must use `GenDT`.
#179433 already fixed the same issue for `BlockGenerator`. The same
pattern is used in `RegionGenerator` which is fixed here. A good
argument to avoid code duplication.
Fixes: #185313
Thanks to @jaschiu for the bug report and reproducer
This patch implements PCH support. PCH is enabled by default, unless
noted below, and can be disabled with
-DCMAKE_DISABLE_PRECOMPILE_HEADERS=ON.
* Libraries can define precompiled headers using a newly added
PRECOMPILE_HEADERS keyword. If specified, the listed headers will be
compiled into a pre-compiled header using standard CMake mechanisms.
* Libraries that don't define their own PRECOMPILE_HEADERS but directly
depend on a library or component that defines its own PCH will reuse
that PCH. This reuse is not transitive to prevent excessive use of
unrelated headers. If multiple dependencies provide a reusable PCH, the
first one with the longest dependency chain (stored in the CMake target
property LLVM_PCH_PRIORITY) is used. However, due to CMake limitations,
only PCH from targets that are already defined can be reused; therefore
libraries that should reuse a PCH must be defined later in the CMake
file (=> add_subdirectory order matters).
* Libraries and executables can prevent PCH reuse with the keyword
DISABLE_PCH_REUSE. This both prevents reuse from dependencies and reuse
by other dependants. This is useful when, e.g., internal headers are
used in the PCH or the used headers are unlikely to provide benefits for
dependants.
* Precompiled headers are only used for C++ sources, not for C.
* With GCC, PCH provides very little benefit (tested with GCC 14 and 15)
due to increased template instantiation costs, but substantially increases
max-rss and build directory size. Therefore, disable PCH with GCC by
default; this can be explicitly overridden on the command line with
-DCMAKE_DISABLE_PRECOMPILE_HEADERS=OFF.
* With ccache and non-Clang compilers, changes in macro definitions are
not always accurately forwarded with ccache's preprocessed mode. To be
on the safe side, when ccache is enabled, disable PCH with all non-Clang
compilers; this can be explicitly overridden.
* With sccache, changes in macro definitions are not identified, which
in some cases can lead to false positive cache hits. Conservatively
disable PCH with sccache by default.
* Add a base PCH to LLVMSupport, which includes widely used standard
library and Support+ADT headers. The pch.h is placed in include so that
later PCH headers can extend that list of headers.
* Flang PCH use is ported to the general mechanism.
Addition of PCH headers for other components (e.g., IR, CodeGen) will be
posted as separate PRs.
RFC:
https://discourse.llvm.org/t/rfc-use-pre-compiled-headers-to-speed-up-llvm-build-by-1-5-2x/89345
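The reuse rule for libraries without their own PCH can be modelled as a small selection function. The real logic lives in CMake, so this C++ sketch is only one reading of the description: `Target`, its fields and `pickReusablePCH` are hypothetical, and `Priority` stands in for the value stored in LLVM_PCH_PRIORITY (a larger value meaning a longer dependency chain).

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Target {
  std::string Name;
  bool DefinesPCH;   // has its own PRECOMPILE_HEADERS
  bool DisableReuse; // models DISABLE_PCH_REUSE
  int Priority;      // models LLVM_PCH_PRIORITY
};

// Returns the direct dependency whose PCH would be reused, or nullptr.
// Only direct dependencies are passed in, since reuse is not transitive.
const Target *pickReusablePCH(const std::vector<const Target *> &DirectDeps) {
  const Target *Best = nullptr;
  for (const Target *Dep : DirectDeps) {
    if (!Dep->DefinesPCH || Dep->DisableReuse)
      continue;
    // Strict comparison: on equal priority the first candidate wins.
    if (!Best || Dep->Priority > Best->Priority)
      Best = Dep;
  }
  return Best;
}

void pchReuseExample() {
  Target Support{"LLVMSupport", true, false, 1};
  Target LibA{"LibA", true, false, 2};
  Target LibB{"LibB", true, false, 2};
  Target Internal{"LibC", true, true, 5}; // opted out of reuse
  std::vector<const Target *> Deps = {&Support, &LibA, &LibB, &Internal};
  assert(pickReusablePCH(Deps) == &LibA);
}
```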
The old NPM was using the ScopInfo pass introduced in
https://reviews.llvm.org/D20962, which in contrast to the LPM was using
ScopInfoRegionPass. ScopInfo was instantiating all Scop objects
immediately. After codegenning, all Scop objects need to be recomputed
anyway, making this approach wasteful. The PhaseManager inherited this
behaviour from the NPM, leading to some concerns.
Replace the instantiate-all behavior of ScopInfo with on-demand
instantiation. SCoPs now must be iterated using ScopDetection instead
of ScopInfo, but only some unused legacy NPM passes (now removed) were
doing that anyway.
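The difference between instantiate-all and on-demand instantiation can be sketched with a small cache. `ToyScop` and `LazyScopInfo` are hypothetical and only model the caching behaviour, not the actual ScopInfo interface.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct ToyScop {
  std::string RegionName;
};

class LazyScopInfo {
  std::map<std::string, std::unique_ptr<ToyScop>> Cache;
  unsigned NumBuilt = 0;

public:
  // Builds the object for a region the first time it is requested.
  const ToyScop &getScop(const std::string &Region) {
    std::unique_ptr<ToyScop> &Slot = Cache[Region];
    if (!Slot) {
      Slot.reset(new ToyScop{Region});
      ++NumBuilt;
    }
    return *Slot;
  }

  // Drops a cached object, e.g. after code generation changed the IR.
  void invalidate(const std::string &Region) { Cache.erase(Region); }

  unsigned numBuilt() const { return NumBuilt; }
};

void lazyScopInfoExample() {
  LazyScopInfo Info;
  assert(Info.numBuilt() == 0); // nothing is built up front
  Info.getScop("region0");
  Info.getScop("region0");
  assert(Info.numBuilt() == 1); // the second request hits the cache
}
```

With this shape, regions that are never requested cost nothing, and a region invalidated after code generation is rebuilt only if someone asks for it again.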
`opt -passes=polly-custom<detect>`, or `stopafter=detect` would still
run the ScopInfo analysis even though it should run only when explicitly
enabled or required by another phase.
`Constant::isZeroValue` currently behaves the same as
`Constant::isNullValue` for all types except floating-point, where it
additionally returns true for negative zero (`-0.0`). However, in
practice, almost all callers operate on integer/pointer types where the
two are equivalent, and the few FP-relevant callers have no meaningful
dependence on the `-0.0` behavior.
This PR removes `isZeroValue` to eliminate the confusing API. All
callers are changed to `isNullValue` with no test failures.
`isZeroValue` will be reintroduced in a future change with clearer
semantics: when null pointers may have non-zero bit patterns,
`isZeroValue` will check for bitwise-all-zeros, while `isNullValue` will
check for the semantic null (which may be non-zero).
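The floating-point distinction described above can be reproduced outside LLVM. The sketch assumes a 64-bit IEEE-754 `double`; `isBitwiseZero` and `isAnyZero` are hypothetical helpers that mirror the current FP behaviour of `isNullValue` and `isZeroValue` respectively, not the LLVM functions themselves.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Bit pattern of a double; assumes a 64-bit IEEE-754 representation.
std::uint64_t bitsOf(double D) {
  static_assert(sizeof(double) == sizeof(std::uint64_t),
                "sketch expects a 64-bit double");
  std::uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof(Bits));
  return Bits;
}

// True only for +0.0: every bit is zero.
bool isBitwiseZero(double D) { return bitsOf(D) == 0; }

// True for +0.0 and -0.0: the value compares equal to zero.
bool isAnyZero(double D) { return D == 0.0; }

void zeroCheckExample() {
  assert(isBitwiseZero(0.0) && isAnyZero(0.0));
  assert(!isBitwiseZero(-0.0)); // the sign bit is set
  assert(isAnyZero(-0.0));      // but the value is still zero
}
```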
Update isl to include
https://repo.or.cz/isl.git/commit/fc484e004200964f8f18249de1f510393ec924a9
which fixes #180000.
The isl update also fixes #34710 which had the same cause but with an
empty access domain (#180000 has an empty statement domain). Thus we
also revert 163cacb46960be4dd0d8562737bbf0ea97cb14ad which now only adds
unnecessary overhead.
A regression test has been added to isl which is why we do not add a
test in Polly.
Fixes: #180000
Thanks @skimo-openhub for the fix and @thapgua for the bug report.
`DT` is always the analysis for the to-be-optimized function while
`GenDT` is the analysis of the function that we currently generate code
for, which can also be an outlined function. Here, we want to check
dominance in the generated code, hence we must use `GenDT`.
Fixes: #179135
Polly currently does not consider types without fixed length, which can
be encountered if an input source uses e.g. ARM SVE builtins. Such
programs have already been optimized manually. Non-fixed type lengths
also add to the difficulty of dependency analysis. Skip such types
entirely for now.
Fixes: #177859
Fixes: #177527
Updated test cases:
* CodeGen/OpenMP/matmul-parallel.ll, ScheduleOptimizer/pattern-matching-based-opts.ll
Before the update, ISL bailed out of the dependency computation due to
hitting the max operation limit. The commit
https://repo.or.cz/isl.git/commit/4bdfe2567715c5d1a8287c07d8685eb3db281e32
seems to have reduced the complexity of the dependency
computation, which is now able to recognize some loops as parallel.
The tests were checking that the outer loop is not parallel, but some
inner loops can be parallelized, particularly the array packing loops.
* DeLICM/reduction_looprotate_hoisted.ll
changes in how isl generates expressions
* ScheduleOptimizer/pattern-matching-based-opts_5.ll
changes in how isl generates expressions, and AST node changes
This patch avoids assertion failures by ensuring a null pointer check is
performed before accessing the object's size.
Note: The corresponding test case remains too large even after
reduction, so it has not been included in this patch.
Fixes #174147
Allow the main llvm-project repository to contain the buildbot builder
instructions, instead of storing them in llvm-zorg. The corresponding
llvm-zorg PR is https://github.com/llvm/llvm-zorg/pull/648.
Using polly-x86_64-linux-test-suite as a proof-of-concept because that
builder is currently offline, I am its maintainer, and it is easier to
build than a configuration supporting offload. Once the design has been
decided, more builders can follow.
Advantages are:
* It is easier to make changes in the llvm-project repository. There are
more reviewers than for the llvm-zorg repository.
* Buildbot changes can be made in the same PR with changes that require
updating the buildbot, e.g. changing the name of a CMake option.
* Configuration changes take effect immediately when landing; no
buildbot master restart needed.
* Some builders store a CMake cache file in the llvm-project repository
for the reasons above. However, the number of changes that can be made
with a CMake cache file alone is limited.
Compared to AnnotatedBuilder, advantages are:
* Reproducing a buildbot configuration locally is made easy: just execute
the script in-place. No llvm-zorg, local buildbot worker, or buildbot
master needed.
* Same for testing a change of a builder before landing it in llvm-zorg.
Doing so with an AnnotatedBuilder requires two llvm-zorg checkouts:
One for making the change of the builder script itself, which then is
pushed to a private llvm-zorg branch on GitHub, and a second that is
modified to fetch that branch instead of
https://github.com/llvm/llvm-zorg/tree/main.
* The AnnotatedBuilder scripts are located in the llvm-zorg repository
and the buildbot-worker's checkout is always the top-of-trunk.
This means that a buildbot configuration is split over three checkouts:
  * The checkout of llvm-project to be tested
  * The checkout of llvm-zorg that the buildbot-worker fetches; always the
  top-of-trunk, i.e. it may not match the revision of llvm-project that is
  executed (such as the CMake cache files located there), especially when
  using the "Force build" feature.
  * The checkout of llvm-zorg that the buildbot-master is running, which
  is updated only when the master is manually restarted.
* The "Force Build" feature also allows for test-building any
llvm-project PR. This is correctly handled by zorg's
`addGetSourcecodeSteps`, but does not work with AnnotatedBuilders that
check out the llvm-project source on their own.
The goal is to move as much as possible into the llvm-project repository
such that there cannot be a mismatch between checkouts of different
repositories. Ideally, the buildbot-master only needs to be
updated+restarted for adding/removing workers, not for build
configuration changes.
---------
Co-authored-by: Jan Patrick Lehr <jp.lehr@gmail.com>
This avoids pulling in the entire Passes library with all passes as
dependencies when just referring to PassPlugin, which is in fact
independent of the Passes themselves.
Pull Request: https://github.com/llvm/llvm-project/pull/173279
This avoids pulling in the entire Passes library with all passes as
dependencies when just referring to PassPlugin, which is in fact
independent of the Passes themselves.
Pull Request: https://github.com/llvm/llvm-project/pull/172478
Clean up the interface of getIndexExpressionsFromGEP to get SCEV expressions
instead of int for the Sizes of the arrays.
This intends to simplify the code in #156342 by avoiding conversions
from SCEV to int and back to SCEV.
PR #125442 replaces the pass-based Polly architecture with a monolithic
pass consisting of phases. Reasons listed in
https://github.com/llvm/llvm-project/pull/125442.
With this change, the SCoP-passes became redundant, problematic versions
of the same functionality and are removed.
Reapply of a22d1c2225543aa9ae7882f6b1a97ee7b2c95574. Using this PR for
pre-merge CI.
Instead of relying on any pass manager to schedule Polly's passes, add
Polly's own pipeline manager which is seen as a monolithic pass in
LLVM's pass manager. Polly's former passes are now phases of the new
PhaseManager component.
Relying on LLVM's pass manager (the legacy as well as the New Pass
Manager) to manage Polly's phases was never a good fit, for reasons that the
PhaseManager resolves:
* Polly passes were modifying analysis results, in particular RegionInfo
and ScopInfo. This means that there was not just one unique and
"definite" analysis result, the actual result depended on which analyses
ran prior, and the pass manager was not allowed to throw away cached
analyses or prior SCoP optimizations would have been forgotten. The LLVM
pass manager's persistence of analysis results is not contractual but
designed for caching.
* Polly depends on a particular execution order of passes and regions
(e.g. regression tests, invalidation of consecutive SCoPs). LLVM's pass
manager does not guarantee any execution order.
* Polly does not completely preserve DominatorTree, RegionInfo,
LoopInfo, or ScalarEvolution, but only as-needed for Polly's own uses.
Because the ScopDetection object stores references to those analyses, it
still had to lie to the pass manager that they would be preserved, or
the pass manager would have released and recomputed the invalidated
analysis objects that ScopDetection/ScopInfo was still referencing. To
ensure that no non-Polly pass would see these not-completely-preserved
analyses, all analyses still had to be thrown away after the
ScopPassManager, respectively with a BarrierNoopPass in case of the LPM.
* The NPM's PassInstrumentation wraps the IR unit into an `llvm::Any`
object, but implementations such as PrintIRInstrumentation call
llvm_unreachable on encountering an unknown IR unit, such as SCoPs, with
no extension points to add support. Hence LLVM crashes when dumping IR
between SCoP passes (such as `-print-before-changed` with Polly being
active).
The new PhaseManager uses some command line options that previously
belonged to Polly's legacy passes, such as `-polly-print-detect` (so the
option will continue to work). Hence the LPM support is incompatible
with the new approach and support for it is removed.
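The core idea of a pipeline that owns its own execution order can be sketched as follows. `ToyPhaseManager`, the phase names and the stop-after parameter are hypothetical and do not reflect Polly's actual classes or options; the sketch only shows phases running in a fixed order behind a single entry point.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

class ToyPhaseManager {
  std::vector<std::pair<std::string, std::function<void()>>> Phases;

public:
  void addPhase(std::string Name, std::function<void()> Run) {
    Phases.emplace_back(std::move(Name), std::move(Run));
  }

  // Runs the phases in insertion order and returns the names of those that
  // ran. If StopAfter names a phase, no later phase is executed.
  std::vector<std::string> run(const std::string &StopAfter = "") {
    std::vector<std::string> Executed;
    for (auto &Phase : Phases) {
      Phase.second();
      Executed.push_back(Phase.first);
      if (Phase.first == StopAfter)
        break;
    }
    return Executed;
  }
};

void phaseManagerExample() {
  int Calls = 0;
  ToyPhaseManager PM;
  PM.addPhase("detect", [&Calls] { ++Calls; });
  PM.addPhase("optimize", [&Calls] { ++Calls; });
  PM.addPhase("codegen", [&Calls] { ++Calls; });
  assert(PM.run("detect").size() == 1); // later phases are skipped
  assert(Calls == 1);
}
```

From the outside this is one unit of work, so the order of phases is decided by the manager itself rather than by whatever scheduler invokes it.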
When Polly generates a false runtime condition (RTC), the associated
Polly-generated loop is never executed and is eventually eliminated. As
a result, the fallback loop becomes the default execution path.
Disabling vectorization for this fallback loop would be
counterproductive. This patch ensures that vectorization is only
disabled when the RTC is not false (no Codegen failure).