In certain cases the context/size info we use for reporting of hinted
bytes in the LTO link was being dropped when we re-constructed context
tries and memprof metadata after inlining. This only affected cases
where we were using the -memprof-min-percent-max-cold-size option to
only keep that information for the largest cold contexts, and where the
pre-LTO compile did *not* specify -memprof-report-hinted-sizes.
The issue is that we don't have a MaxSize, which is only available
during the profile matching step. Use an existing bool indicating that
we are redoing this from existing metadata to always propagate any
context size metadata in that case.
Reapply llvm/llvm-project#157204 with fix and a new test for the issue
it caused (the test change provoked the assert that was converted to an
if condition).
Also, make the application of this new attribute under an (on by
default) flag, so that it can be more easily disabled if needed. Add
test for the new flag.
There's a pattern throughout LLVM of cl::opts being exported. That in
itself is probably a bit unfortunate, but what's especially bad about it
is that a lot of those symbols are in the global namespace. Move them
into the llvm namespace.
While doing this, I noticed some other variables in the global namespace
and moved them as well.
To help track allocations that we matched with memprof profiles but
for which we weren't able to disambiguate the different hotness
contexts, apply an "ambiguous" memprof attribute to all allocations with
matched profiles. These will be replaced if we can identify a single
hotness type, possibly after cloning.
Eventually we plan to translate this to a special hotness hint on the
allocation call.
When we rebuild the call site tries after inlining of an allocation with
MD_memprof metadata, we don't want to reapply the discarding of small
non-cold contexts (under -memprof-callsite-cold-threshold=) because we
have either no context size info (without -memprof-report-hinted-sizes
or another option that causes us to keep that as metadata), and even
with that information in the metadata, we have imperfect information at
that point as we have already discarded some contexts during matching.
The first case was even worse because we didn't guard our check by
whether the number of cold bytes was 0, leading to very aggressive
pruning during post-inline metadata rebuilding without the context size
information.
Reapply PR142507 with fix for test: add in the same x86_64-linux
requirement as other tests as the stack ids are currently computed
differently on big endian systems. This will be investigated separately.
In order to allow selective reporting of context hinting during the LTO
link, and in the future to allow selective more aggressive cloning, add
an option to specify a minimum percent of the max cold size in the
profile summary. Contexts that meet that threshold will get context size
info metadata (and ThinLTO summary information) on the associated
allocations.
Specifying -memprof-report-hinted-sizes during the pre-LTO compile step
will continue to cause all contexts to receive this metadata. But
specifying -memprof-report-hinted-sizes only during the LTO link will
cause only those that meet the new threshold and have the metadata to
get reported.
To support this, because the alloc info summary and associated bitcode
requires the context size information to be in the same order as the
other context information, 0s are inserted for contexts without this
metadata. The bitcode writer uses a more compact format for the context
ids to allow better compression of the 0s.
As part of this change several helper methods are added to query whether
metadata contains context size info on any or all contexts.
In order to allow selective reporting of context hinting during the LTO
link, and in the future to allow selective more aggressive cloning, add
an option to specify a minimum percent of the max cold size in the
profile summary. Contexts that meet that threshold will get context size
info metadata (and ThinLTO summary information) on the associated
allocations.
Specifying -memprof-report-hinted-sizes during the pre-LTO compile step
will continue to cause all contexts to receive this metadata. But
specifying -memprof-report-hinted-sizes only during the LTO link will
cause only those that meet the new threshold and have the metadata to
get reported.
To support this, because the alloc info summary and associated bitcode
requires the context size information to be in the same order as the
other context information, 0s are inserted for contexts without this
metadata. The bitcode writer uses a more compact format for the context
ids to allow better compression of the 0s.
As part of this change several helper methods are added to query whether
metadata contains context size info on any or all contexts.
The context disambiguation code already emits remarks when hinting
allocations (by adding hotness attributes) during cloning. However,
we did not yet emit hints when applying the hotness attributes during
building of the metadata (during matching and again after inlining).
Add remarks when we apply the hint attributes for these
non-context-sensitive allocations.
This patch adds support for a basic MemProf summary section, which is
built along with the indexed MemProf profile (e.g. when reading the raw
or YAML profiles), and serialized through the indexed profile just after
the header.
Currently only 6 fields are written, specifically the number of contexts
(total, cold, hot), and the max context size (cold, warm, hot).
To support forwards and backwards compatibility for added fields in the
indexed profile, the number of fields serialized first. The code is
written to support forwards compatibility (reading newer profiles with
additional summary fields), and comments indicate how to implement
backwards compatibility (reading older profiles with fewer summary
fields) as needed.
Support is added to print the summary as YAML comments when displaying
both the raw and indexed profiles via `llvm-profdata show`. Because they
are YAML comments, the YAML reader ignores these (the summary is always
recomputed when building the indexed profile as described above).
This necessitated moving some options and a couple of interfaces out of
Analysis/MemoryProfileInfo.cpp and into the new
ProfileData/MemProfSummary.cpp file, as we need to classify context
hotness earlier and also compute context ids to build the summary from
older indexed profiles.
## Purpose
This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/Analysis` library.
These annotations currently have no meaningful impact on the LLVM build;
however, they are a prerequisite to support an LLVM Windows DLL (shared
library) build.
## Background
This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
The bulk of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed formatting with `git clang-format`.
The following manual adjustments were also applied after running IDS on
Linux:
- Add `#include "llvm/Support/Compiler.h"` to files where it was not
auto-added by IDS due to no pre-existing block of include statements.
- Add `LLVM_TEMPLATE_ABI` and `LLVM_EXPORT_TEMPLATE` to exported
instantiated templates
- Add `LLVM_ABI` to a subset of private class methods and fields that
require export
- Add `LLVM_ABI` to a small number of symbols that require export but
are not declared in headers
## Validation
Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:
- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
Adds a new option -memprof-callsite-cold-threshold that allows
specifying a percent that will cause non-cold contexts to be discarded
if the percent cold bytes at a callsite including that context exceeds
the given threshold. Default is 100% (no discarding).
This reduces the amount of cloning needed to expose cold allocation
contexts when parts of the context are dominantly cold.
This motivated the change in PR138792, since discarding a context might
require a different decision about which not-cold contexts must be kept
to expose cloning requirements, so we need to determine that on the fly.
Additionally, this required a change to include the context size
information in the alloc trie in more cases, so we now guard the
inclusion of this information in the generated metadata on the option
values.
The restructuring of the context pruning patch in PR138792
(764614e6355e214c6b64c715d105007b1a4b97fd) introduced a bug under the
non-default -memprof-keep-all-not-cold-contexts handling.
Added more testing of this mode which would have caught the issue.
While here, fix the newly added function name to match code style.
This change is mostly NFC, other than the addition of a new message
printed when contexts are pruned when -memprof-report-hinted-sizes is
enabled.
To prepare for a follow on change, adjust the way we determine which
NotCold contexts can be pruned (because they overlap with longer NotCold
contexts), and change the way we perform this pruning.
Instead of determining the points at which we need to keep NotCold
contexts during the building of the trie, we now determine this on the
fly as the MIB metadata nodes are recursively built. This simplifies a
follow on change that performs additional pruning of some NotCold
contexts, and which can affect which others need to be kept as the
longest overlapping NotCold contexts.
We can take advantage of the fact that we subsequently only clone cold
allocation contexts, since not cold behavior is the default, and
significantly reduce the amount of metadata (and later ThinLTO summary
and MemProfContextDisambiguation graph nodes) by pruning unnecessary not
cold contexts when building metadata from the trie.
Specifically, we only need to keep notcold contexts that overlap the
longest with cold allocations, to know how deeply to clone those
contexts to expose the cold allocation behavior.
For a large target this reduced ThinLTO bitcode object sizes by about
35%. It reduced the ThinLTO indexing time by about half and the peak
ThinLTO indexing memory by about 20%.
While we convert hot contexts to notcold contexts during the cloning
step, their existence was greatly limiting the context trimming
performed when we add the MemProf profile to the IR. To address this,
any hot contexts are converted to notcold contexts immediately after
first checking for unambiguous allocation types, and before checking it
again and before adding metadata while performing context trimming.
Note that hot hints are now disabled by default, however, this avoids
adding unnecessary overhead if they are re-enabled.
By default we were marking some contexts as hot, and adding hot hints to
unambiguously hot allocations. However, there is not yet support for
cloning to expose hot allocation contexts, and none is planned for the
forseeable future.
While we convert hot contexts to notcold contexts during the cloning
step, their existence was greatly limiting the context trimming
performed when we add the MemProf profile to the IR. This change simply
disables the generation of hot contexts / hints by default, as few
allocations were unambiguously hot.
A subsequent change will address the issue when hot hints are optionally
enabled. See PR124219 for details.
This change resulted in significant overhead reductions for a large
target:
~48% reduction in the per-module ThinLTO bitcode summary sizes
~72% reduction in the distributed ThinLTO bitcode combined summary sizes
~68% reduction in thin link time
~34% reduction in thin link peak memory
This patch fixes a couple of places where memprof-related metadata
(!memprof and !callsite) were being dropped, and one place where PGO
metadata (!prof) was being dropped.
All were due to instances of combineMetadata() being invoked. That
function drops all metadata not in the list provided by the client, and
also drops any not in its switch statement.
Memprof metadata needed a case in the combineMetadata switch statement.
For now we simply keep the metadata of the instruction being kept, which
doesn't retain all the profile information when two calls with
memprof metadata are being combined, but at least retains some.
For the memprof metadata being dropped during call CSE, add memprof and
callsite metadata to the list of known ids in combineMetadataForCSE.
Neither memprof nor regular prof metadata were in the list of known ids
for the callsite in MemCpyOptimizer, which was added to combine AA
metadata after optimization of byval arguments fed by memcpy
instructions, and similar types of optimizations of memcpy uses.
There is one other callsite of combineMetadata, but it is only invoked
on load instructions, which do not carry these types of metadata.
Emit message when we have aliased contexts that are conservatively
hinted not cold. This is not a change in behavior, just in message when
the -memprof-report-hinted-sizes flag is enabled.
Improve the information printed when -memprof-report-hinted-sizes is
enabled. Now print the full context hash computed from the original
profile, similar to what we do when reporting matching statistics. This
will make it easier to correlate with the profile.
Note that the full context hash must be computed at profile match time
and saved in the metadata and summary, because we may trim the context
during matching when it isn't needed for distinguishing hotness.
Similarly, due to the context trimming, we may have more than one full
context id and total size pair per MIB in the metadata and summary,
which now get a list of these pairs.
Remove the old aggregate size from the metadata and summary support.
One other change from the prior support is that we no longer write the
size information into the combined index for the LTO backends, which
don't use this information, which reduces unnecessary bloat in
distributed index files.
createMIBNode does not modify MIBCallStack, so we can take ArrayRef
instead.
While I am at it, this patch changes the type of MIBPayload to
SmallVector. We put at most three elements, so we can avoid a heap
allocation.
This is the first step in being able to track the total profiled sizes
of allocations successfully marked as cold.
Under a new option -memprof-report-hinted-sizes:
- For unambiguous (non-context-sensitive) allocations, print the
profiled size and the allocation coldness, along with a hash of the
allocation's location (to allow for deduplication across modules or
inline instances).
- For context sensitive allocations, add the size as a 3rd operand on
the MIB metadata. A follow on patch will propagate this through to the
thin link where the sizes will be reported for each context after
cloning.
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
53 under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
Add "Hot" AllocationType (in addition to existing cold, notcold).
Use lifetime access density as metric to identify hot allocations.
Treat hot as notcold for MemProfContextDisambiguation for now
before the disambiguation for "hot" is done.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D149932
Now that the runtime tracks the lifetime access density directly, we can
use that directly in the threshold checks instead of less accurately
computing from other statistics.
Differential Revision: https://reviews.llvm.org/D149684
As suggested in D140908, make the hasSingleAllocType helper non-static
so that it can be used in other files. Add unit testing.
Differential Revision: https://reviews.llvm.org/D144318
This restores commit 98ed423361de2f9dc0113a31be2aa04524489ca9 and
follow on fix 00c22351ba697dbddb4b5bf0ad94e4bcea4b316b, which were
reverted in 5d938eb6f79b16f55266dd23d5df831f552ea082 due to an
MSVC bot failure. I've included a fix for that failure.
Differential Revision: https://reviews.llvm.org/D135714
This reverts commit 00c22351ba697dbddb4b5bf0ad94e4bcea4b316b.
This reverts commit 98ed423361de2f9dc0113a31be2aa04524489ca9.
Seemingly MSVC has some kind of issue with this patch, in terms of linking:
https://lab.llvm.org/buildbot/#/builders/123/builds/14137
I'll post more detail on D135714 momentarily.
This restores 47459455009db4790ffc3765a2ec0f8b4934c2a4, which was
reverted in commit 452a14efc84edf808d1e2953dad2c694972b312f, along with
fixes for a couple of bot failures.
Implements the ThinLTO summary support for memprof related metadata.
This includes support for the assembly format, and for building the
summary from IR during ModuleSummaryAnalysis.
To reduce space in both the bitcode format and the in memory index,
we do 2 things:
1. We keep a single vector of all uniq stack id hashes, and record the
index into this vector in the callsite and allocation memprof
summaries.
2. When building the combined index during the LTO link, the callsite
and allocation memprof summaries are only kept on the FunctionSummary
of the prevailing copy.
Differential Revision: https://reviews.llvm.org/D135714
Adds a number of utilities that are used to help create and update
memprof related metadata. These will be used during profile matching
and annotation, as well as by the inliner when updating the metadata.
Also adds unit tests for the utilities.
See also related RFCs:
RFC: Sanitizer-based Heap Profiler [1]
RFC: A binary serialization format for MemProf [2]
RFC: IR metadata format for MemProf [3]
(Note that the IR metadata format has changed from the RFC during
implementation, as described in the preceeding patch adding the basic
metadata and verification support.)
Depends on D128141.
Differential Revision: https://reviews.llvm.org/D128854