This patch does two things:
- During deserialization, we accept a function name as an alternative
to the usual GUID represented as a hexadecimal number.
- During serialization, we print a GUID as a 16-digit hexadecimal
number prefixed with 0x in the usual way. (Without this patch, we
print a decimal number, which is not customary.)
In YAML, the MemProf profile is a vector of pairs of GUID and
MemProfRecord. This patch accepts a function name for the GUID, but
it does not accept a function name for the GUID used in Frames yet.
That will be addressed in a subsequent patch.
When we call IndexedMemProfRecord::toMemProfRecord, we care about
getting the original (that is, non-indexed) MemProfRecord back, so we
should just verify that, not the hash values, which are
intermediaries.
There is a remote possibility of hash collisions where call stack
{F1, F2} might come back as {F1, F1} if F1.hash() == F2.hash() for
example. However, since FrameId uses BLAKE, the hash values should be
consistent across architectures. That is, if this test case works on
one architecture, it should work on others as well.
IndexedMemProfData eliminates the need for the "using" directives.
Also, we do not need to declare maps for individual components of the
MemProf profile.
This patch replaces FrameIdMap and CallStackIdMap with
IndexedMemProfData, which comes with recently introduced methods like
addFrame and addCallStack.
This patch adds a helper function to replace an idiom like:
CallStackId CSId = hashCallStack(CallStack)
MemProfData.CallStacks.try_emplace(CSId, CallStack);
// Do something with CSId.
For Frames, we prefer the inline notation for the brevity.
For PortableMemInfoBlock, we go through all member fields and print
out those that are populated.
This iteration works around the unavailability of
ScalarTraits<uintptr_t> on macOS.
For Frames, we prefer the inline notation for the brevity.
For PortableMemInfoBlock, we go through all member fields and print
out those that are populated.
This patch adds YAML-based deserialization for MemProf profile.
It's been painful to write tests for MemProf passes because we do not
have a text format for the MemProf profile. We would write a test
case in C++, run it for a binary MemProf profile, and then finally run
a test written in LLVM IR with the binary profile.
This patch paves the way toward YAML-based MemProf profile.
Specifically, it adds new class YAMLMemProfReader derived from
MemProfReader. For now, it only adds a function to parse StringRef
pointing to YAML data. Subseqeunt patches will wire it to
llvm-profdata and read from a file.
The field names are based on various printYAML functions in MemProf.h.
I'm not aiming for compatibility with the format used in printYAML,
but I don't see a point in changing the field names.
This iteration works around the unavailability of
ScalarTraits<uintptr_t> on macOS.
This reverts commit c00e53208db638c35499fc80b555f8e14baa35f0.
It looks like this breaks building LLVM on macOS and some other
platform/compiler combos
https://lab.llvm.org/buildbot/#/builders/23/builds/5252https://green.lab.llvm.org/job/llvm.org/job/clang-san-iossim/5356/console
In file included from /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/lib/ProfileData/MemProfReader.cpp:34:
In file included from /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/ProfileData/MemProfReader.h:24:
In file included from /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/ProfileData/InstrProfReader.h:22:
In file included from /Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/ProfileData/InstrProfCorrelator.h:21:
/Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/Support/YAMLTraits.h:1173:36: error: implicit instantiation of undefined template 'llvm::yaml::MissingTrait<unsigned long>'
char missing_yaml_trait_for_type[sizeof(MissingTrait<T>)];
^
/Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/Support/YAMLTraits.h:961:7: note: in instantiation of function template specialization 'llvm::yaml::yamlize<unsigned long>' requested here
yamlize(*this, Val, Required, Ctx);
^
/Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/Support/YAMLTraits.h:883:11: note: in instantiation of function template specialization 'llvm::yaml::IO::processKey<unsigned long, llvm::yaml::EmptyContext>' requested here
this->processKey(Key, Val, true, Ctx);
^
/Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/ProfileData/MIBEntryDef.inc:55:1: note: in instantiation of function template specialization 'llvm::yaml::IO::mapRequired<unsigned long>' requested here
MIBEntryDef(AccessHistogram = 27, AccessHistogram, uintptr_t)
^
/Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/lib/ProfileData/MemProfReader.cpp:77:8: note: expanded from macro 'MIBEntryDef'
Io.mapRequired(KeyStr.str().c_str(), MIB.Name); \
^
/Users/ec2-user/jenkins/workspace/llvm.org/clang-san-iossim/llvm-project/llvm/include/llvm/Support/YAMLTraits.h:310:8: note: template is declared here
struct MissingTrait;
^
1 error generated.
This patch adds YAML-based deserialization for MemProf profile.
It's been painful to write tests for MemProf passes because we do not
have a text format for the MemProf profile. We would write a test
case in C++, run it for a binary MemProf profile, and then finally run
a test written in LLVM IR with the binary profile.
This patch paves the way toward YAML-based MemProf profile.
Specifically, it adds new class YAMLMemProfReader derived from
MemProfReader. For now, it only adds a function to parse StringRef
pointing to YAML data. Subseqeunt patches will wire it to
llvm-profdata and read from a file.
The field names are based on various printYAML functions in MemProf.h.
I'm not aiming for compatibility with the format used in printYAML,
but I don't see a point in changing the field names.
IndexedMemProfRecord contains a complete package of the MemProf
profile, including frames, call stacks, and records. This patch
replaces the three member variables of MemProfReader with
IndexedMemProfRecord.
This transition significantly simplies both the constructor and the
final "take" method:
MemProfReader(IndexedMemProfData MemProfData)
: MemProfData(std::move(MemProfData)) {}
IndexedMemProfData takeMemProfData() { return std::move(MemProfData); }
CallStackRadixTreeBuilder::build takes the parameter
MemProfFrameIndexes by value, involving copies:
std::optional<const llvm::DenseMap<FrameIdTy, LinearFrameId>>
MemProfFrameIndexes
Then "build" makes another copy of MemProfFrameIndexe and passes it to
encodeCallStack for every call stack, which is painfully slow.
This patch changes the type to a pointer so that we don't have to make
a copy every time we pass the argument.
Without this patch, it takes 553 seconds to run "llvm-profdata merge"
on a large MemProf raw profile. This patch shortenes that down to 67
seconds.
This patch updates a unit test to construct MemProfReader with
IndexedMemProfData, a complete package of MemProf profile.
With this change, nobody in the LLVM codebase is using the
MemProfReader constructor that takes individual components of the
MemProf profile, so this patch deprecates the constructor.
Prepare for usage in the bitcode reader/writer where we already have a
LinearFrameId:
- templatize input frame id type in CallStackRadixTreeBuilder
- templatize input frame id type in computeFrameHistogram
- make the map from FrameId to LinearFrameId optional
We plan to use the same radix format in the ThinLTO summary records,
where we already have a LinearFrameId.
This patch adds a new constructor to MemProfReader that takes
IndexedMemProfData, a complete package of MemProf profile. To
showcase its usage, I'm updating one of the unit tests to use the new
constructor.
Because of type mismatches between DenseMap and MapVector, I'm copying
Frames and CallStacks for now. Once we remove the methods and old
constructors that take or return individual components (frames, call
stacks, and records), we will drop the copying, and the new
constructor will collapse down to:
MemProfReader(IndexedMemProfData MemProfData)
: MemProfData(std::move(MemProfData)) {}
Since nobody in the LLVM codebase uses the constructor that takes the
three indivdual components, I'm deprecating the old constructor.
This patch adds another constructor to IndexedAllocationInfo that is
identical to the existing constructor except that the new one leaves
the CallStack field empty.
I'm planning to remove MemProf format Version 1. Then we will migrate
the users of the existing constructor to the new one as nobody will be
using the CallStack field anymore.
Adding the new constructor now allows us to migrate a few existing
users of the old constructor even before we remove the CallStack
field. In turn, that simplifies the patch to actually remove the
field.
This patch removes MemProf format Version 0 now that version 2 and 3
seem to be working well.
I'm not touching version 1 for now because some tests still rely on
version 1.
Note that Version 0 is identical to Version 1 except that the MemProf
section of the indexed format has a MemProf version field.
raw_string_ostream::flush() is essentially a no-op (also specified in docs).
Don't call it in tests that aren't meant to test 'raw_string_ostream' itself.
p.s. remove a few redundant calls to raw_string_ostream::str()
Adds compile time flag -mllvm -memprof-histogram and runtime flag
histogram=true|false to turn Histogram collection on and off. The
-memprof-histogram flag relies on -memprof-use-callbacks=true to work.
Updates shadow mapping logic in histogram mode from having one 8 byte
counter for 64 bytes, to 1 byte for 8 bytes, capped at 255. Only
supports this granularity as of now.
Updates the RawMemprofReader and serializing MemoryInfoBlocks to binary
format, including changing to a new version of the raw binary format
from version 3 to version 4.
Updates creating MemoryInfoBlocks with and without Histograms. When two
MemoryInfoBlocks are merged, AccessCounts are summed up and the shorter
Histogram is removed.
Adds a memprof_histogram test case.
Initial commit for adding AccessCountHistograms up until RawProfile for
memprof
We call llvm::sort in a couple of places in the V3 encoding:
- We sort Frames by FrameIds for stability of the output.
- We sort call stacks in the dictionary order to maximize the length
of the common prefix between adjacent call stacks.
It turns out that we can improve the deserialization performance by
modifying the comparison functions -- without changing the format at
all. Both places take advantage of the histogram of Frames -- how
many times each Frame occurs in the call stacks.
- Frames: We serialize popular Frames in the descending order of
popularity for improved cache locality. For two equally popular
Frames, we break a tie by serializing one that tends to appear
earlier in call stacks. Here, "earlier" means a smaller index
within llvm::SmallVector<FrameId>.
- Call Stacks: We sort the call stacks to reduce the number of times
we follow pointers to parents during deserialization. Specifically,
instead of comparing two call stacks in the strcmp style -- integer
comparisons of FrameIds, we compare two FrameIds F1 and F2 with
Histogram[F1] < Histogram[F2] at respective indexes. Since we
encode from the end of the sorted list of call stacks, we tend to
encode popular call stacks first.
Since the two places use the same histogram, we compute it once and
share it in the two places.
Sorting the call stacks reduces the number of "jumps" by 74% when we
deserialize all MemProfRecords. The cycle and instruction counts go
down by 10% and 1.5%, respectively.
If we sort the Frames in addition to the call stacks, then the cycle
and instruction counts go down by 14% and 1.6%, respectively, relative
to the same baseline (that is, without this patch).
This patch integrates CallStackRadixTreeBuilder into the V3 format,
reducing the profile size to about 27% of the V2 profile size.
- Serialization: writeMemProfCallStackArray just needs to write out
the radix tree array prepared by CallStackRadixTreeBuilder.
Mappings from CallStackIds to LinearCallStackIds are moved by new
function CallStackRadixTreeBuilder::takeCallStackPos.
- Deserialization: Deserializing a call stack is the same as
deserializing an array encoded in the obvious manner -- the length
followed by the payload, except that we need to follow a pointer to
the parent to take advantage of common prefixes once in a while.
This patch teaches LinearCallStackIdConverter to how to handle those
pointers.
Call stacks are a huge portion of the MemProf profile, taking up 70+%
of the profile file size.
This patch implements a radix tree to compress call stacks, which are
known to have long common prefixes. Specifically,
CallStackRadixTreeBuilder, introduced in this patch, takes call stacks
in the MemProf profile, sorts them in the dictionary order to maximize
the common prefix between adjacent call stacks, and then encodes a
radix tree into a single array that is ready for serialization.
The resulting radix array is essentially a concatenation of call stack
arrays, each encoded with its length followed by the payload, except
that these arrays contain "instructions" like "skip 7 elements
forward" to borrow common prefixes from other call stacks.
This patch does not integrate with the MemProf
serialization/deserialization infrastructure yet. Once integrated,
the radix tree is expected to roughly halve the file size of the
MemProf profile.
CallStackIdConverter sets LastUnmappedId when a mapping failure
occurs. Now, since toMemProfRecord takes an instance of
CallStackIdConverter by value, namely std::function, the caller of
toMemProfRecord never receives the mapping failure that occurs inside
toMemProfRecord. The same problem applies to FrameIdConverter.
The patch fixes the problem by passing FrameIdConverter and
CallStackIdConverter by reference, namely llvm::function_ref.
While I am it, this patch deletes the copy constructor and copy
assignment operator to avoid accidental copies.
commit 4c8ec8f8bc3fb4dda4fd36c3b2ad745bd3451970
Author: Kazu Hirata <kazu@google.com>
Date: Wed Apr 24 16:25:35 2024 -0700
introduced the idea of serializing/deserializing a subset of the
fields in PortableMemInfoBlock. While it reduces the size of the
indexed MemProf profile file, we now could inadvertently access
unavailable fields and go without noticing.
To protect ourselves from the risk, this patch adds access checks to
PortableMemInfoBlock::get* methods by embedding a bit set representing
available fields into PortableMemInfoBlock.
Currently, we convert FrameId to Frame and CallStackId to a call stack
at several places. This patch unifies those into function objects --
FrameIdConverter and CallStackIdConverter.
The existing implementation of CallStackIdConverter, being removed in
this patch, handles both FrameId and CallStackId conversions. This
patch splits it into two phases for flexibility (but make them
composable) because some places only require the FrameId conversion.
This iteration fixes a problem uncovered with ubsan, where we were
dereferencing an uninitialized std::unique_ptr.
Currently, we convert FrameId to Frame and CallStackId to a call stack
at several places. This patch unifies those into function objects --
FrameIdConverter and CallStackIdConverter.
The existing implementation of CallStackIdConverter, being removed in
this patch, handles both FrameId and CallStackId conversions. This
patch splits it into two phases for flexibility (but make them
composable) because some places only require the FrameId conversion.
This patch enables users of MemProfReader to directly supply mappings
from CallStackId to actual call stacks.
Once the users of the current constructor without CSIdMap switch to
the new constructor, we'll have fewer users of:
- IndexedAllocationInfo::CallStack
- IndexedMemProfRecord::CallSites
bringing us one step closer to the removal of these fields in favor
of:
- IndexedAllocationInfo::CSId
- IndexedMemProfRecord::CallSiteIds
We are in the process of referring to call stacks with CallStackId in
IndexedMemProfRecord and IndexedAllocationInfo instead of holding call
stacks inline (both in memory and the serialized format). Doing so
deduplicates call stacks and reduces the MemProf profile file size.
Before we can eliminate the two fields holding call stacks inline:
- IndexedAllocationInfo::CallStack
- IndexedMemProfRecord::CallSites
we need to eliminate all the read operations on them.
This patch is a step toward that direction. Specifically, we
eliminate the read operations in the context of MemProfReader and
RawMemProfReader. A subsequent patch will eliminate the read
operations during the serialization.
This patch renames RawMemProfReader.{cpp,h} to MemProfReader.{cpp,h},
respectively. Also, it re-creates RawMemProfReader.h just to include
MemProfReader.h for compatibility with out-of-tree users.
I'm currently developing a new version of the indexed memprof format
where we deduplicate call stacks in IndexedAllocationInfo::CallStack
and IndexedMemProfRecord::CallSites. We refer to call stacks with
integer IDs, namely CallStackId, just as we refer to Frame with
FrameId. The deduplication will cut down the profile file size by 80%
in a large memprof file of mine.
As a step toward the goal, this patch teaches
IndexedMemProfRecord::{serialize,deserialize} to speak Version2. A
subsequent patch will add Version2 support to llvm-profdata.
The essense of the patch is to replace the serialization of a call
stack, a vector of FrameIDs, with that of a CallStackId. That is:
const IndexedAllocationInfo &N = ...;
...
LE.write<uint64_t>(N.CallStack.size());
for (const FrameId &Id : N.CallStack)
LE.write<FrameId>(Id);
becomes:
LE.write<CallStackId>(N.CSId);
The indexed MemProf file has a huge amount of redundancy. In a large
internal application, 82% of call stacks, stored in
IndexedAllocationInfo::CallStack, are duplicates.
We should work toward deduplicating call stacks by referring to them
with unique IDs with actual call stacks stored in a separate data
structure, much like we refer to memprof::Frame with memprof::FrameId.
At the same time, we need to facilitate a graceful transition from the
current version of the MemProf format to the next. We should be able
to read (but not write) the current version of the MemProf file even
after we move onto the next one.
With those goals in mind, I propose to have an integer ID next to
CallStack in IndexedAllocationInfo to refer to a call stack in a
succinct manner. We'll gradually increase the areas of the compiler
where IDs and call stacks have one-to-one correspondence and
eventually remove the existing CallStack field.
This patch adds call stack ID, named CSId, to IndexedAllocationInfo
and teaches the raw profile reader to compute unique call stack IDs
and store them in the new field. It does not introduce any user of
the call stack IDs yet, except in verifyFunctionProfileData.
GNU addr2line supports lookup by symbol name in addition to the existing
address lookup. llvm-symbolizer starting from
e144ae54dcb96838a6176fd9eef21028935ccd4f supports lookup by symbol name.
This change extends this lookup with possibility to specify optional
offset.
Now the address for which source information is searched for can be
specified with offset:
llvm-symbolize --obj=abc.so "SYMBOL func_22+0x12"
It decreases the gap in features of llvm-symbolizer and GNU addr2line.
This lookup now is supported for code only.
Migrated from: https://reviews.llvm.org/D139859
Pull request: https://github.com/llvm/llvm-project/pull/75067
Recent versions of GNU binutils starting from 2.39 support symbol+offset
lookup in addition to the usual numeric address lookup. This change adds
symbol lookup to llvm-symbolize and llvm-addr2line.
Now llvm-symbolize behaves closer to GNU addr2line, - if the value specified
as address in command line or input stream is not a number, it is treated as
a symbol name. For example:
llvm-symbolize --obj=abc.so func_22
llvm-symbolize --obj=abc.so "CODE func_22"
This lookup is now supported only for functions. Specification with
offset is not supported yet.
This is a recommit of 2b27948783e4bbc1132d3220d8517ef62607b558, reverted
in 39fec5457c0925bd39f67f63fe17391584e08258 because the test
llvm/test/Support/interrupts.test started failing on Windows. The test was
changed in 18f036d0105589c3175bb51a518c5d272dae61e2 and is also updated in
this commit.
Differential Revision: https://reviews.llvm.org/D149759