These are identified by misc-include-cleaner. I've filtered out those
whose removal breaks builds. I'm also staying away from llvm-config.h,
config.h, and Compiler.h, since removing them would likely cause platform-
or compiler-specific build failures.
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.
An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
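For illustration, a consumer honoring the metadata-first, branch-weights-fallback behavior described above might look roughly like this. This is a sketch assuming the value is stored as an integer operand of the loop-ID entry; it is not the patch's actual implementation of `llvm::getLoopEstimatedTripCount`:

```c++
#include "llvm/ADT/STLExtras.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Metadata.h"
#include <optional>
using namespace llvm;

// Sketch: return the value of !llvm.loop.estimated_trip_count if present;
// std::nullopt means the caller should fall back to re-estimating the trip
// count from the loop's current branch_weights.
static std::optional<unsigned> readEstimatedTripCount(const Loop &L) {
  MDNode *LoopID = L.getLoopID();
  if (!LoopID)
    return std::nullopt;
  for (const MDOperand &Op : drop_begin(LoopID->operands())) {
    auto *Entry = dyn_cast<MDNode>(Op);
    if (!Entry || Entry->getNumOperands() == 0)
      continue;
    auto *Name = dyn_cast<MDString>(Entry->getOperand(0));
    if (!Name || Name->getString() != "llvm.loop.estimated_trip_count")
      continue;
    if (Entry->getNumOperands() < 2)
      return std::nullopt; // metadata present, but the value was omitted
    if (auto *CI = mdconst::dyn_extract<ConstantInt>(Entry->getOperand(1)))
      return CI->getZExtValue();
    return std::nullopt;
  }
  return std::nullopt;
}
```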
When compiling in `--hipstdpar` mode, the builtins corresponding to the
standard library might end up in code that is expected to execute on the
accelerator (e.g. by using the `std::` prefixed functions from
`<cmath>`). We do not have uniform handling for this in AMDGPU, and the
resulting errors are quite arcane. Furthermore, the user-space changes
required to work around this tend to be rather intrusive.
This patch adds an additional `--hipstdpar`-specific pass that forwards to
the HIPSTDPAR runtime component the intrinsics / libcalls that result from
using the math builtins and are not properly handled. In the long run we
will want to stop relying on this and handle things in the compiler, but
that is going to be a rather lengthy journey, which makes this medium-term
escape hatch necessary.
The paired change in the runtime component is here:
<https://github.com/ROCm/rocThrust/pull/551>.
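As a rough illustration of the forwarding (the runtime symbol name below, `__hipstdpar_sin_f64`, is invented for this sketch; the real interface is defined by the rocThrust change linked above), an unhandled math intrinsic call could be rewritten into a libcall that the runtime component resolves:

```c++
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Module.h"
using namespace llvm;

// Sketch: replace a call to llvm.sin.f64 with a call to a hypothetical
// runtime-provided forwarding symbol.
static void forwardSinToRuntime(CallInst *CI) {
  Module *M = CI->getModule();
  FunctionCallee RTFn = M->getOrInsertFunction(
      "__hipstdpar_sin_f64", CI->getType(), CI->getArgOperand(0)->getType());
  IRBuilder<> B(CI);
  CallInst *Forwarded = B.CreateCall(RTFn, {CI->getArgOperand(0)});
  CI->replaceAllUsesWith(Forwarded);
  CI->eraseFromParent();
}
```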
This adds two passes: one to inject `MD_prof` and one to check its presence. A subsequent patch will add these (similar to debugify) to `opt` (and, eventually, a variant of this to `llc`).
Tracking issue: #147390
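For context, a minimal sketch of what the checking side could look like (illustrative only; the actual passes added here may differ in what they inject and verify):

```c++
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

// Sketch of a checker: report conditional branches that carry no !prof
// metadata after the injection pass has run.
struct CheckProfMetadataSketchPass
    : PassInfoMixin<CheckProfMetadataSketchPass> {
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &) {
    for (BasicBlock &BB : F)
      if (auto *BI = dyn_cast<BranchInst>(BB.getTerminator()))
        if (BI->isConditional() && !BI->hasMetadata(LLVMContext::MD_prof))
          errs() << F.getName() << ": terminator in " << BB.getName()
                 << " has no !prof metadata\n";
    return PreservedAnalyses::all();
  }
};
```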
Same as https://github.com/llvm/llvm-project/pull/139517.
This replaces the InvalidateAnalysisPass<MachineFunctionAnalysis> pass.
There are no cross-function analysis requirements right now, so clearing
all analyses works for the last pass in the pipeline.
Having InvalidateAnalysisPass<MachineFunctionAnalysis>() in the pipeline
causes a problem with ModuleToCGSCCPassAdaptor: machine functions for
other functions get deleted, and we end up with exactly one correctly
compiled MF while the rest vanish.
This is because ModuleToCGSCCPassAdaptor propagates PassPA (received from
the CGSCCToFunctionPassAdaptor that runs the actual codegen pipeline on
MFs) to the next SCC. That causes MFA invalidation on functions in the
next SCC.
For us, PassPA happens to be returned from
invalidate<machine-function-analysis>, which abandons the
MachineFunctionAnalysis. So while the first function runs through the
pipeline normally, the invalidation also deletes the machine functions in
the next SCC before its pipeline is run. (This seems to be the intended
mechanism of the CGSCC adaptor to allow cross-SCC invalidations.)
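A minimal sketch of the replacement, assuming the tail of the per-function codegen pipeline now clears every cached analysis (the exact pipeline wiring in this patch may differ):

```c++
#include "llvm/IR/PassManager.h"
using namespace llvm;

// Sketch: end the per-function pipeline by clearing all analyses rather than
// invalidating only MachineFunctionAnalysis, so the PreservedAnalyses set
// propagated by the CGSCC adaptor cannot delete machine functions that the
// next SCC has not compiled yet.
static void addPipelineTail(FunctionPassManager &FPM) {
  // Before: FPM.addPass(InvalidateAnalysisPass<MachineFunctionAnalysis>());
  FPM.addPass(InvalidateAllAnalysesPass());
}
```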
Co-authored-by: Oke, Akshat <Akshat.Oke@amd.com>
This commit adds a new pass gate that allows selective disabling
of one or more passes via the clang command line using the
`-opt-disable` option. Passes to be disabled should be specified as a
comma-separated list of their names.
The implementation resides in the same file as the bisection tool. The
`getGlobalPassGate()` function returns the currently enabled gate.
Example: `-opt-disable="PassA,PassB"`
Pass names are matched using case-insensitive comparisons. However, note
that special characters, including spaces, must be included exactly as
they appear in the pass names.
Additionally, a `-opt-disable-enable-verbosity` flag has been introduced to
enable verbose output when this functionality is in use. When enabled,
it prints the status of all passes (either running or NOT running),
similar to the default behavior of `-opt-bisect-limit`. This flag is
disabled by default, which is the opposite of the `-opt-bisect-verbose`
flag (which defaults to enabled).
To validate this functionality, a test file has also been provided. It reuses
the same infrastructure as the opt-bisect test, but disables three
specific passes and checks the output to ensure the expected behavior.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
This will be useful for testing the set of calls for different systems,
and eventually the product of context-specific modifiers applied. In the
future we should also know the type signatures and be able to emit the
correct one.
As mentioned in https://github.com/llvm/llvm-project/pull/145071,
LoopInterchange should be part of the optimization pipeline rather than
the simplification pipeline. This patch moves LoopInterchange into the
optimization pipeline.
More context:
- By default, LoopInterchange attempts to improve data locality;
however, it also takes increasing vectorization opportunities into
account. Given that, it is reasonable to run it as close to
vectorization as possible.
- I looked into previous changes related to the placement of
LoopInterchange, but couldn’t find any strong motivation suggesting that
it benefits other simplifications.
- In the tests I tried (including llvm-test-suite), removing
LoopInterchange from the simplification pipeline does not affect other
simplifications. Therefore, there doesn't seem to be much value in
keeping it there.
- The new position reduces compile-time for ThinLTO, probably because it
only runs once per function in post-link optimization, rather than both
in pre-link and post-link optimization.
I haven't encountered any cases where the positional difference affects
optimization results, so please feel free to revert if you run into any issues.
There are a handful of passes in PassRegistry.def with outdated or
missing pass options. These strings describing pass options are used for
the printPassNames() function only, which is likely why they have gotten
out-of-date without being caught. This MR simply changes the few passes
where the option string is out-of-date, fixing the output of
-print-passes. This does not affect functionality of the pipeline
parser, and is hard to verify in a unit test, so no tests were added.
This change scales opcodes, types, and arguments once in `IR2VecVocabAnalysis` so that we can avoid rescaling each time embeddings are computed. This PR refactors the vocabulary to explicitly define three sections (Opcodes, Types, and Arguments) used for computing embeddings.
(Tracking issue: #141817; partly fixes #141832)
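As a rough sketch of the idea (the type and field names below are invented for illustration and are not the actual `IR2VecVocabAnalysis` data structures), each section is scaled by its weight once when the vocabulary is built, so per-instruction embedding computation only sums pre-scaled vectors:

```c++
#include <vector>

// Illustrative sketch: scale each vocabulary section once at construction.
using EmbeddingVec = std::vector<double>;

struct SectionedVocabularySketch {
  std::vector<EmbeddingVec> Opcodes, Types, Arguments;

  static void scaleSection(std::vector<EmbeddingVec> &Section, double Weight) {
    for (EmbeddingVec &E : Section)
      for (double &V : E)
        V *= Weight;
  }

  // Called once by the analysis; embedding computation afterwards just adds
  // the already-scaled entries.
  void scaleOnce(double OpcWeight, double TypeWeight, double ArgWeight) {
    scaleSection(Opcodes, OpcWeight);
    scaleSection(Types, TypeWeight);
    scaleSection(Arguments, ArgWeight);
  }
};
```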
Change `ModulePass::skipModule` to take a const Module reference.
Additionally, make `OptPassGate::shouldRunPass` const as well, since for
most implementations it's a const query. For `OptBisect`, make
`LastBisectNum` mutable so it can be updated in `shouldRunPass`.
Additional minor cleanup: Change all StringRef arguments to simple
StringRef (no const or reference), change `OptBisect::Disabled` to
constexpr.
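In sketch form (simplified declarations, not verbatim from the headers), the affected interfaces now look roughly like this:

```c++
#include "llvm/ADT/StringRef.h"

namespace llvm { class Module; }

// Sketch only, not copied from Pass.h / OptBisect.h.
class ModulePassSketch {
public:
  // Now takes a const Module reference.
  bool skipModule(const llvm::Module &M) const;
};

class OptPassGateSketch {
public:
  virtual ~OptPassGateSketch() = default;
  // Const query; StringRef arguments are passed by value (no const reference).
  virtual bool shouldRunPass(llvm::StringRef PassName,
                             llvm::StringRef IRDescription) const;
};
```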
Expose the FatLTO pipeline via `-passes="fatlto-pre-link<Ox>"`, similar
to all the other optimization pipelines. This is to allow reproducing it
outside clang. (Possibly also useful for C API users.)
The entry count of a function needs to be updated after a callsite is elided by TRE: before elision, the entry count accounted for the recursive call at that callsite. After TRE, we need to remove that callsite's contribution.
This patch enables this for instrumented profiling cases because, there, we know the function entry count captured entries before TRE. We cannot currently address this for sample-based profiling (because we don't know whether this function was TRE-ed in the binary that donated samples).
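In sketch form, assuming a hypothetical helper and that the elided callsite's execution count is already known (this is not the patch's actual code):

```c++
#include "llvm/IR/Function.h"
using namespace llvm;

// Sketch: after TRE elides a recursive call site, remove that site's
// contribution from the instrumented entry count.
static void adjustEntryCountAfterTRE(Function &F, uint64_t ElidedSiteCount) {
  if (auto EC = F.getEntryCount()) {
    uint64_t Old = EC->getCount();
    F.setEntryCount(Old >= ElidedSiteCount ? Old - ElidedSiteCount : 0,
                    EC->getType());
  }
}
```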
Make HashRecognize a non-PassManager analysis that can be called to get
the result on-demand, creating a new getResult() entry-point. The issue
was discovered when attempting to use the analysis to perform a
transform in LoopIdiomRecognize.
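Sketch of the intended on-demand use; only `getResult()` comes from this change, while the header path, constructor arguments, and result handling below are assumptions:

```c++
#include "llvm/Analysis/HashRecognize.h" // assumed header location
using namespace llvm;

// Sketch: construct the analysis directly and query it on demand from a
// transform such as LoopIdiomRecognize.
static void tryRecognizeHash(Loop &L, ScalarEvolution &SE) {
  HashRecognize HR(L, SE);      // assumed constructor arguments
  auto Result = HR.getResult(); // new on-demand entry point
  (void)Result;                 // ... inspect Result and transform the loop ...
}
```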
This pass figures out whether inlining has exposed a constant address to
a lowered type test, and removes the test if so, provided the address is
known to pass the test. Unfortunately this pass ends up needing to
reverse-engineer what LowerTypeTests did; this is currently inherent to
the design of ThinLTO importing, where LowerTypeTests needs to run at the
start.
Reviewers: teresajohnson
Reviewed By: teresajohnson
Pull Request: https://github.com/llvm/llvm-project/pull/141327
Most of the recent development on the MemProfiler has been on the use part; the instrumentation has been quite stable for a while. As the complexity of the use side grows (with undrifting, diagnostics, etc.), I figured it would be good to separate these two implementations.
Currently, if there is any memory access that AccessAnalysis couldn't
analyze, all of the runtime pointer check results are discarded. This
patch makes that controllable with the AllowPartial option: when it is
set, we generate the runtime check information for the pointers that we
could analyze, as transformations may still be able to make use of the
partial information.
Of the transformations that use LoopAccessAnalysis, only
LoopVersioningLICM changes behaviour as a result of this change. This is
because the others either:
* Check canVectorizeMemory, which will return false when we have partial
pointer information as analyzeLoop() will return false.
* Examine the dependencies returned by getDepChecker(), which will be
empty as we exit analyzeLoop if we have partial pointer information
before calling areDepsSafe(), which is what fills in the dependency
information.
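For illustration, a client that tolerates partial information could still consume whatever checks were generated (sketch only; how AllowPartial is threaded through the analysis is defined by the patch itself):

```c++
#include "llvm/Analysis/LoopAccessAnalysis.h"
using namespace llvm;

// Sketch: read the runtime pointer checks that were generated for the
// analyzable pointers, even when some accesses could not be analyzed.
static void usePartialChecks(const LoopAccessInfo &LAI) {
  const RuntimePointerChecking *RtChecking = LAI.getRuntimePointerChecking();
  for (const RuntimePointerCheck &Check : RtChecking->getChecks()) {
    // ... e.g. version the loop with just these checks ...
    (void)Check;
  }
}
```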
Introduce a fresh analysis for recognizing polynomial hashes, with the
rationale that several targets have specific instructions to optimize
things like CRC and GHASH (e.g. the X86 and RISC-V crypto extensions). We
limit the scope to polynomial hashes computed in a Galois field of
characteristic 2, since this class of operations can also be optimized, in
the absence of target-specific instructions, to use a lookup table.
At the moment, we only recognize the CRC algorithm.
RFC:
https://discourse.llvm.org/t/rfc-new-analysis-for-polynomial-hash-recognition/86268
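For reference, this is the shape of bit-at-a-time CRC loop that such an analysis targets (an illustrative example, not taken from the patch's tests); the inner loop is a per-bit reduction in GF(2) that a lookup table or a dedicated CRC instruction can replace:

```c++
#include <cstddef>

// Bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320): each inner step is a
// conditional XOR with the generator polynomial, i.e. a reduction in GF(2).
unsigned crc32_bitwise(const unsigned char *Data, size_t Len) {
  unsigned CRC = 0xFFFFFFFFu;
  for (size_t I = 0; I < Len; ++I) {
    CRC ^= Data[I];
    for (int B = 0; B < 8; ++B)
      CRC = (CRC >> 1) ^ ((CRC & 1) ? 0xEDB88320u : 0);
  }
  return ~CRC;
}
```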
When we enable EVL-based loop vectorization with predicated tail-folding,
each vectorized loop effectively has two induction variables: one
calculates the step using (VF x vscale) and the other increases the IV by
the values returned from `llvm.experimental.get.vector.length`. The former,
also known as canonical IV, is more favorable for analyses as it's
"countable" in the sense of SCEV; the latter (EVL-based IV), however, is
more favorable to codegen, at least for those that support scalable
vectors like AArch64 SVE and RISC-V.
The idea is to use the canonical IV all the way until the end of all
vectorizers, at which point we replace it with the EVL-based IV using the
EVLIVSimplify pass introduced here, so that we get the best of both worlds.
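A scalar sketch of the two induction variables side by side (conceptual only, not IR produced by the vectorizer):

```c++
#include <algorithm>
#include <cstdint>

// Conceptual sketch: the canonical IV advances by a fixed VF * vscale step
// (SCEV-friendly), while the EVL-based IV advances by the per-iteration
// vector length, mirroring llvm.experimental.get.vector.length
// (codegen-friendly on SVE / RVV).
void evlLoopShape(uint64_t TripCount, uint64_t VF, uint64_t VScale) {
  uint64_t CanonicalIV = 0;
  uint64_t EVLBasedIV = 0;
  while (CanonicalIV < TripCount) {
    uint64_t EVL = std::min(TripCount - EVLBasedIV, VF * VScale);
    // ... vector body operating on EVL lanes starting at index EVLBasedIV ...
    CanonicalIV += VF * VScale;
    EVLBasedIV += EVL;
  }
}
```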
This pass is enabled by default for RISC-V. However, since we don't yet
vectorize loops with predicated tail-folding by default, this pass is
currently a no-op.
This adds a GISelValueTrackingPrinterPass that can print the known bits
and sign bit of each def in a function. It is built on the new pass
manager and so adds a NPM GISelValueTrackingAnalysis, renaming the older
class to GISelValueTrackingAnalysisLegacy.
The first two functions from the AArch64GISelMITest are ported over to a
MIR test to show it working. It also runs successfully on all files in
llvm/test/CodeGen/AArch64/GlobalISel/*.mir that are not invalid. It can
hopefully be used to test the GlobalISel known-bits analysis more directly
in common cases, without jumping through the hoops that the C++ tests
require.
`DXILResourceBindingAnalysis` analyses explicit resource bindings in the
module and puts together lists of used virtual register spaces and
available virtual register slot ranges for each binding type. It also
stores additional information found during the analysis such as whether
the module uses implicit bindings or if any of the bindings overlap.
This information will be used in the `DXILResourceImplicitBindings` pass
(coming soon) to assign register slots to resources with implicit
bindings, and in a post-optimization validation pass that will raise
diagnostics about overlapping bindings.
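Roughly, the per-binding-type information gathered looks like the following (field and type names here are invented for illustration and are not the analysis's actual interface):

```c++
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative sketch of the gathered data, not the analysis's real types.
struct RegisterSpaceSketch {
  uint32_t Space;                                        // virtual register space
  std::vector<std::pair<uint32_t, uint32_t>> FreeRanges; // available slot ranges
};

struct BindingInfoSketch {
  // One list of used spaces / free ranges per binding type.
  std::vector<RegisterSpaceSketch> SRV, UAV, CBuffer, Sampler;
  bool HasImplicitBindings = false;    // module uses implicit bindings
  bool HasOverlappingBindings = false; // any explicit bindings overlap
};
```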
Part 1/2 of #136786
In this change, NVPTX AA is moved before Basic AA to potentially improve
compile time. Additionally, it introduces a flag in the
`ExternalAAWrapper` that allows other backends to run their
target-specific AA passes before Basic AA, if desired.
The change works with both the New Pass Manager and the Legacy Pass Manager.
Original implementation by Princeton Ferro <pferro@nvidia.com>
/llvm-project/llvm/lib/Passes/PassBuilder.cpp:1508:2:
error: extra ';' outside of a function is incompatible with C++98 [-Werror,-Wc++98-compat-extra-semi]
1508 | };
| ^
1 error generated.
- Add a new pass manager version of `MachineUniformityAnalysis`.
- Query `TargetTransformInfo` in the new pass manager version.
- Use `printAsOperand` when printing the machine function name.