Summary:
This is a standalone tool that does the wrapper stage of the
`clang-linker-wrapper`. We want this to be an external tool because
currently there's no easy way to split apart what the
clang-linker-wrapper is doing under the hood. With this tool, users can
manually extract files with `clang-offload-packager`, feed them through
`clang --target=<triple>` and then use this tool to generate a `.bc`
file they can give to the linker. The goal here is to make reproducing
the linker wrapper steps easier.
PredicateInfo needs some no-op to which the predicate can be attached.
Currently this is an ssa.copy intrinsic. This PR replaces it with a
no-op bitcast.
Using a bitcast is more efficient because we don't have the overhead of
an overloaded intrinsic. It also makes things slightly simpler overall.
spirv-sim was supposed to be used to test cross-lane interactions. This
utility was in the end never used for testing, and as we move to proper
end-to-end testing through the llvm/offload-test-suite project, this
becomes obsolete.
Cleaning this up.
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.
An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
This commit adds a new pass gate that allows selective disabling
of one or more passes via the clang command line using the
`-opt-disable` option. Passes to be disabled should be specified as a
comma-separated list of their names.
The implementation resides in the same file as the bisection tool. The
`getGlobalPassGate()` function returns the currently enabled gate.
Example: `-opt-disable="PassA,PassB"`
Pass names are matched using case-insensitive comparisons. However, note
that special characters, including spaces, must be included exactly as
they appear in the pass names.
Additionally, a `-opt-disable-enable-verbosity` flag has been introduced to
enable verbose output when this functionality is in use. When enabled,
it prints the status of all passes (either running or NOT running),
similar to the default behavior of `-opt-bisect-limit`. This flag is
disabled by default, which is the opposite of the `-opt-bisect-verbose`
flag (which defaults to enabled).
To validate this functionality, a test file has also been provided. It reuses
the same infrastructure as the opt-bisect test, but disables three
specific passes and checks the output to ensure the expected behavior.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
This PR exposes the backend pass config to plugins via a callback.
Plugin authors can register a callback that is being triggered before
the target backend adds their passes to the pipeline. In the callback
they then get access to the `TargetMachine`, the `PassManager`, and the
`TargetPassConfig`. This allows plugins to call
`TargetPassConfig::insertPass`, which is honored in the subsequent
`addPass` of the main backend. We implemented this using the legacy pass
manager since backends still use it as the default.
Expose the FatLTO pipeline via `-passes="fatlto-pre-link<Ox>"`, similar
to all the other optimization pipelines. This is to allow reproducing it
outside clang. (Possibly also useful for C API users.)
This pass figures out whether inlining has exposed a constant address to
a lowered type test, and remove the test if so and the address is known
to pass the test. Unfortunately this pass ends up needing to reverse
engineer what LowerTypeTests did; this is currently inherent to the design
of ThinLTO importing where LowerTypeTests needs to run at the start.
Reviewers: teresajohnson
Reviewed By: teresajohnson
Pull Request: https://github.com/llvm/llvm-project/pull/141327
If a hot callsite function is not inlined in the 1st build, inlining the
hot callsite in pre-link stage of SPGO 2nd build may lead to Function
Sample not found in profile file in link stage. It will miss some
profile info.
ThinLTO has already considered and dealed with it by setting
HotCallSiteThreshold to 0 to stop the inline. This patch just adds the
same processing for FullLTO.
After https://github.com/llvm/llvm-project/pull/131217 was submitted,
time-passes.ll fails because `opt` prints `-time-report` when
`ManagedTimerGlobals` is destroyed. `ManagedTimerGlobals` stores
`TimerGroup`s in an unordered map, so the ordering of the output
`TimerGroup`s depends on the underlying iterator.
To fix this, we do what Clang does and use
`llvm::TimerGroup::printAll(...)`, which *is* deterministic. This is
also what Clang does. This does put move analysis section before the
pass section for `-time-report`, but again, this is also what Clang
currently does.
Inspired by PR #127944, this patch adds an option to print profile metadata inline with respect to the instruction (or function) it annotates - this saves one time from having to search up and down large textual modules to find this info.
ThinLTO delays handling of coroutines to ThinLTO backend.
However it's usually possible to use ThinLTO prelink objects for FullLTO.
In this case we have left-over coroutines which crash in codegen.
Issue #104525.
Added an extension point after vectorizer passes in the PassBuilder.
Additionally, added extension points before and after vectorizer passes
in `buildLTODefaultPipeline`. Credit goes to @mshockwave for guiding me
through my first LLVM contribution (and my first open source
contribution in general!) :)
- Implemented `registerVectorizerEndEPCallback`
- Implemented `invokeVectorizerEndEPCallbacks`
- Added `VectorizerEndEPCallbacks` SmallVector
- Added a command line option `passes-ep-vectorizer-end` to
`NewPMDriver.cpp`
- `buildModuleOptimizationPipeline` now calls
`invokeVectorizerEndEPCallbacks`
- `buildO0DefaultPipeline` now calls `invokeVectorizerEndEPCallbacks`
- `buildLTODefaultPipeline` now calls BOTH
`invokeVectorizerStartEPCallbacks` and `invokeVectorizerEndEPCallbacks`
- Added LIT tests to `new-pm-defaults.ll`, `new-pm-lto-defaults.ll`,
`new-pm-O0-ep-callbacks.ll`, and `pass-pipeline-parsing.ll`
- Renamed `CHECK-EP-Peephole` to `CHECK-EP-PEEPHOLE` in
`new-pm-lto-defaults.ll` for consistency.
This code is intended for developers that wish to implement and run
custom passes after the vectorizer passes in the PassBuilder pipeline.
For example, in #91796, a pass was created that changed the induction
variables of vectorized code. This is right after the vectorization
passes.
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.
Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
make it easier to use old IR files and somewhat reduce the test churn in
this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
attribute. The representation in the LLVM IR dialect should be updated
separately.
[PassBuilder] Add RelLookupTableConverterPass to LTO
This patch adds RelLookupTableConverterPass into the LTO
post-link optimization pass pipeline. This optimization
converts lookup tables to relative lookup tables to make
them PIC-friendly, which is already included in the non-LTO
pass pipeline. This patch adds this optimization to the
post-link optimization pipeline to discover more
opportunities in the LTO context.
print-after-all is useful for diffing IR between two passes. When one of
the two is a function pass, and the other is a loop pass, the diff
becomes useless. Add an option which prints the entire function for loop
passes.
Original commit: eb63cd62a4a1907dbd58f12660efd8244e7d81e9
Previously reverted due to non-negligible compile-time impact in
stage1-ReleaseLTO-g scenario. The issue has been addressed by
always reusing previously computed MemorySSA results, and request
new ones only when `isMemorySSAEnabled` is set.
Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>
Follow up to PR118508, to avoid unnecessary compile time for an empty
combind regular LTO module if all modules end up being ThinLTO only.
This required minor changes to a few tests to ensure they weren't empty.
Add support for enabling sample loader pass in O0 mode(under
`-fsample-profile-use`). This can help verify PGO raw profile count
quality or provide a more accurate performance proxy(predictor), as O0
mode has minimal or no compiler optimizations that might otherwise
impact profile count accuracy.
- Explicitly disable the sample loader inlining to ensure it only emits
sampling annotation.
- Use flattened profile for O0 mode.
- Add the pass after `AddDiscriminatorsPass` pass to work with
`-fdebug-info-for-profiling`.
As defined in LangRef, branching on `undef` is undefined behavior.
This PR aims to remove undefined behavior from tests. As UB tests break
Alive2 and may be the root cause of breaking future optimizations.
Here's an Alive2 proof for one of the examples:
https://alive2.llvm.org/ce/z/TncxhP
Run ArgumentPromotion before IPSCCP in the LTO pipeline, to expose more
constants to be propagated. We also run PostOrderFunctionAttrs to
improve the information available to ArgumentPromotion's alias analysis,
and SROA to clean up allocas.
Relands #111163.
This patch is inspired by @Snowy1803 excellent work in swift and the
patch: https://github.com/swiftlang/swift/pull/73334/files
Add an instrumentation pass to llvm to collect dropped debug information
variable statistics for every Function-level and Module-level IR pass.
This patch creates adds the class DroppedVariableStats which iterates
over every DbgRecord in a function or module before and after an
optimization pass and counts the number of variables who's debug
information has been dropped due to that pass, then prints that output
to stdout in a csv format.
I ran this patch on optdriver.cpp can see:
Pass Name, Dropped Variables
'InstCombinePass', 1
'SimplifyCFGPass', 6
'JumpThreadingPass', 25
Run ArgumentPromotion before IPSCCP in the LTO pipeline, to expose more
constants to be propagated. We also run PostOrderFunctionAttrs to
improve the information available to ArgumentPromotion's alias analysis,
and SROA to clean up allocas.
Fixes a crash when using `-filter-print-funcs` with
`-ir-dump-directory`. A quick reproducer on trunk (also included as a
test):
```ll
; opt -passes=no-op-function -print-after=no-op-function -filter-print-funcs=nope -ir-dump-directory=somewhere test.ll
define void @test() {
ret void
}
```
[Compiler Explorer](https://godbolt.org/z/sPErz44h4)