Not sure why the "fold-all" option naming didn't match the
variable "FoldPreOutputs", but I've preserved the difference.
More annoyingly, the pass name "normalize" does not match the pass
name IRNormalizer and should probably be fixed one way or the other.
Also the existing test coverage for the flags is lacking. I've added
a test that shows they parse, but we should have tests that they
do something.
This patch does two things:
1. It implements an ephemeral values cache analysis pass that collects the ephemeral values of a function and caches them for fast lookups. The collection of the ephemeral values is done lazily when the user calls `EphemeralValuesCache::ephValues()`.
2. It adds caching of ephemeral values using the `EphemeralValuesCache` to speed up `CallAnalyzer::analyze()`. Without caching this can take a long time to run in cases where the function contains a large number of `@llvm.assume()` calls and a large number of callsites. The time is spent in `collectEphemeralvalues()`.
For the NewPM, the merge-const option was assigned to an unused
option field. Assign it to the correct one. The merge-const-aggressive
option was not supported -- and invalid options were silently ignored.
Accept it and error on invalid options.
For the LegacyPM, the corresponding cl::opt options were ignored when
called via opt rather than llc.
This is meant as a preparation for PR #130988 "[AMDGPU] Implement IR
expansion for frem instruction" which implements the expansion of
another instruction in this pass. The more general name seems more
appropriate given this change and quite reasonable even without it.
EnableTailMerge is false by default and is handled by the pass builder.
Passes are independent of target pipeline options.
This completes the generic `MachineLateOptimization` passes for the NPM
pipeline.
Use `-passes="regallocgreedy<[all|sgpr|wwm|vgpr]>` to insert the greedy
RA with a filter and `-regalloc-npm=<type>` to control which RA to use
in existing pipeline.
Similar to #117309.
The advisor and logger are accessed through the provider, which is
served by the new PM. Legacy PM forwards calls to the provider.
New PM is a machine function analysis that lazily initializes the
provider.
Legacy pass used to provide the advisor, so this extracts that logic
into a provider class used by both analysis passes.
All three (Default, Release, Development) legacy passes
`*AdvisorAnalysis` are basically renamed to `*AdvisorProvider`, so the
actual legacy wrapper passes are `*AdvisorAnalysisLegacy`.
There is only one NPM analysis `RegAllocEvictionAnalysis` that switches
between the three providers in the `::run` method, to be cached by the
NPM.
Also adds `RequireAnalysis<RegAllocEvictionAnalysis>` to the optimized
target reg alloc codegen builder.
When using FatLTO, it is common to want to enable certain types of whole
program optimizations (WPD) or security transforms (CFI), so that they
can be made available when performing LTO. However, these transforms
should not be used when compiling the non-LTO object code. Since the
frontend must emit different IR, we cannot simply clone the module and
optimize the LTO section and non-LTO section differently to work around
this. Instead, we need to remove any problematic instruction sequences.
This patch adds a new pass whose responsibility is to clean up the IR
in the FatLTO pipeline after creating the bitcode section, which is
after running the pre-link pipeline but before running module
optimization. This allows us to safely drop any conflicting instructions
or IR constructs that are inappropriate for non-LTO compilation.
`RegisterClassInfo` was supposed to be kept alive between pass runs,
which wasn't being done leading to recomputations increasing the compile
time.
Now the Impl class is a member of the legacy and new passes so that it
is not reconstructed on every pass run.
---------
Co-authored-by: Christudasan Devadasan <christudasan.devadasan@amd.com>
This patch implements an LLVM IR pass, named kernel-info, that reports
various statistics for codes compiled for GPUs. The ultimate goal of
these statistics to help identify bad code patterns and ways to mitigate
them. The pass operates at the LLVM IR level so that it can, in theory,
support any LLVM-based compiler for programming languages supporting
GPUs. It has been tested so far with LLVM IR generated by Clang for
OpenMP offload codes targeting NVIDIA GPUs and AMD GPUs.
By default, the pass runs at the end of LTO, and options like
``-Rpass=kernel-info`` enable its remarks. Example `opt` and `clang`
command lines appear in `llvm/docs/KernelInfo.rst`. Remarks include
summary statistics (e.g., total size of static allocas) and individual
occurrences (e.g., source location of each alloca). Examples of its
output appear in tests in `llvm/test/Analysis/KernelInfo`.
Original commit: eb63cd62a4a1907dbd58f12660efd8244e7d81e9
Previously reverted due to non-negligible compile-time impact in
stage1-ReleaseLTO-g scenario. The issue has been addressed by
always reusing previously computed MemorySSA results, and request
new ones only when `isMemorySSAEnabled` is set.
Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>
This is glue code to convert LowerAllowCheckPass from a FUNCTION_PASS to
FUNCTION_PASS_WITH_PARAMS. The parameters are currently unused.
Future work will plumb `-fsanitize-skip-hot-cutoff` (introduced in
https://github.com/llvm/llvm-project/pull/121619) to
LowerAllowCheckOptions.
And use that as an argument for
allow_ubsan_check when needed.
Other ubsan checks use SanitizerKind,
but those are known to the clang only.
So make it a parameter in LLVM.