Note that the version of getValueProfDataFromInst that returns bool
has been "deprecated" since:
commit 1e15371dd8843dfc52b9435afaa133997c1773d8
Author: Mingming Liu <mingmingl@google.com>
Date: Mon Apr 1 15:14:49 2024 -0700
The new API getValueArrayForSite returns ArrayRef<InstrProfValueData>,
packaging the array length and contents together.
This patch sinks the array length checks just before we check the
contents. This way, we check both the array length and contents
immediately after calling getValueArrayForSite.
Without this patch, we call getValueForSite before veryfing that we
have an expected number of value sites with getNumValueSites.
This patch fixes the order by "sinking" the call to getValueForSite.
While I am at it, this patch migrates the use of getValueForSite to
getValueArrayForSite.
This patch migrates uses of getValueForSite to getValueArrayForSite.
Each hunk is self-contained, meaning that each one can be applied
independently of the others.
In the unit test, there are cases where the array length check is
performed a lot earlier than the array content check. For now, I'm
leaving the length checks where they are. I'll consider moving them
when I migrate uses of getNumValueDataForSite to getValueArrayForSite
in a follow-up patch.
Without this patch, a typical traversal over the value data looks
like:
uint32_t NV = Func.getNumValueDataForSite(VK, S);
std::unique_ptr<InstrProfValueData[]> VD = Func.getValueForSite(VK, S);
for (uint32_t V = 0; V < NV; V++)
Do something with VD[V].Value and/or VD[V].Count;
This patch adds getValueArrayForSite, which returns
ArrayRef<InstrProfValueData>, so we can do:
for (const auto &V : Func.getValueArrayForSite(VK, S))
Do something with V.Value and/or V.Count;
I'm planning to migrate the existing uses of getValueForSite to
getValueArrayForSite in follow-up patches and remove getValueForSite
and getNumValueDataForSite.
getValueForSite computes the total count -- the total number of times
a given value site is visited. The problem is that, excluding tests,
annotateValueSite is the only place that needs the total count.
This patch moves the total count computation to annotateValueSite.
commit 4c8ec8f8bc3fb4dda4fd36c3b2ad745bd3451970
Author: Kazu Hirata <kazu@google.com>
Date: Wed Apr 24 16:25:35 2024 -0700
introduced the idea of serializing/deserializing a subset of the
fields in PortableMemInfoBlock. While it reduces the size of the
indexed MemProf profile file, we now could inadvertently access
unavailable fields and go without noticing.
To protect ourselves from the risk, this patch adds access checks to
PortableMemInfoBlock::get* methods by embedding a bit set representing
available fields into PortableMemInfoBlock.
Currently, we convert FrameId to Frame and CallStackId to a call stack
at several places. This patch unifies those into function objects --
FrameIdConverter and CallStackIdConverter.
The existing implementation of CallStackIdConverter, being removed in
this patch, handles both FrameId and CallStackId conversions. This
patch splits it into two phases for flexibility (but make them
composable) because some places only require the FrameId conversion.
This iteration fixes a problem uncovered with ubsan, where we were
dereferencing an uninitialized std::unique_ptr.
Currently, we convert FrameId to Frame and CallStackId to a call stack
at several places. This patch unifies those into function objects --
FrameIdConverter and CallStackIdConverter.
The existing implementation of CallStackIdConverter, being removed in
this patch, handles both FrameId and CallStackId conversions. This
patch splits it into two phases for flexibility (but make them
composable) because some places only require the FrameId conversion.
Curently, the compiler only uses several fields of MemoryInfoBlock.
Serializing all fields into the indexed MemProf file simply wastes
storage.
This patch limits the schema down to four fields for Version2 by
default. It retains the old behavior of serializing all fields via:
llvm-profdata merge --memprof-version=2 --memprof-full-schema
This patch reduces the size of the indexed MemProf profile I have by
40% (1.6GB down to 1.0GB).
This patch adds Version2 of the indexed MemProf format. The new
format comes with a hash table from CallStackId to actual call stacks
llvm::SmallVector<FrameId>. The rest of the format refers to call
stacks with CallStackId. This "values + references" model effectively
deduplicates call stacks. Without this patch, a large indexed memprof
file of mine shrinks from 4.4GB to 1.6GB, a 64% reduction.
This patch does not make Version2 generally available yet as I am
planning to make a few more changes to the format.
(The profile format change is split into a standalone change into https://github.com/llvm/llvm-project/pull/81691)
* For InstrFDO value profiling, implement instrumentation and lowering for virtual table address.
* This is controlled by `-enable-vtable-value-profiling` and off by default.
* When the option is on, raw profiles will carry serialized `VTableProfData` structs and compressed vtables as payloads.
* Implement profile reader and writer support
* Raw profile reader is used by `llvm-profdata` but not compiler. Raw profile reader will construct InstrProfSymtab with symbol names, and map profiled runtime address to vtable symbols.
* Indexed profile reader is used by `llvm-profdata` and compiler. When initialized, the reader stores a pointer to the beginning of in-memory compressed vtable names and the length of string. When used in `llvm-profdata`, reader decompress the string to show symbols of a profiled site. When used in compiler, string decompression doesn't
happen since IR is used to construct InstrProfSymtab.
* Indexed profile writer collects the list of vtable names, and stores that to index profiles.
* Text profile reader and writer support are added but mostly follow the implementation for indirect-call value type.
* `llvm-profdata show -show-vtables <args> <profile>` is implemented.
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600#pick-instrumentation-points-and-instrument-runtime-types-7
The indexed MemProf file has a huge amount of redundancy. In a large
internal application, 82% of call stacks, stored in
IndexedAllocationInfo::CallStack, are duplicates.
We should work toward deduplicating call stacks by referring to them
with unique IDs with actual call stacks stored in a separate data
structure, much like we refer to memprof::Frame with memprof::FrameId.
At the same time, we need to facilitate a graceful transition from the
current version of the MemProf format to the next. We should be able
to read (but not write) the current version of the MemProf file even
after we move onto the next one.
With those goals in mind, I propose to have an integer ID next to
CallStack in IndexedAllocationInfo to refer to a call stack in a
succinct manner. We'll gradually increase the areas of the compiler
where IDs and call stacks have one-to-one correspondence and
eventually remove the existing CallStack field.
This patch adds call stack ID, named CSId, to IndexedAllocationInfo
and teaches the raw profile reader to compute unique call stack IDs
and store them in the new field. It does not introduce any user of
the call stack IDs yet, except in verifyFunctionProfileData.
- Function getParsedIRPGOFuncName splits name by delimiter. The `[filename;]mangled-name` format could be generalized for non-function global values (e.g., vtables for type profiling). So rename the
function.
- Use kGlobalIdentifierDelimiter rather than semicolon directly for defragmentation.
Change the format of IRPGO counter names to
`[<filepath>;]<mangled-name>` which is computed by
`GlobalValue::getGlobalIdentifier()` to fix#74565.
In fe051934cbb0aaf25d960d7d45305135635d650b
(https://reviews.llvm.org/D156569) the format of IRPGO counter names was
changed to be `[<filepath>;]<linkage-name>` where `<linkage-name>` is
basically `F.getName()` with some prefix, e.g., `_` or `l_` on Mach-O
(yes, it is confusing that `<linkage-name>` is computed with
`Mangler().getNameWithPrefix()` while `<mangled-name>` is just
`F.getName()`). We discovered in #74565 that this causes some missed
import issues on some targets and #74008 is a partial fix.
Since `<mangled-name>` may not match the `<linkage-name>` on some
targets like Mach-O, we will need to post-process the output of
`llvm-profdata order` before passing to the linker via `-order_file`.
Profiles generated after fe051934cbb0aaf25d960d7d45305135635d650b will
become stale after this diff, but I think this is acceptable since that
patch landed after the LLVM 18 cut which hasn't been released yet.
Simplify the compiler-rt test to make it more general for different
platforms, and use `*DAG` matchers for lines that may be emitted
out-of-order.
- The compiler-rt test passed on a Windows machine. Previously name
matchers don't work for MSVC mangling
(https://lab.llvm.org/buildbot/#/builders/127/builds/59907)
- `*DAG` matchers fixed the error in
https://lab.llvm.org/buildbot/#/builders/94/builds/17924
This is the second reland and fixed errors caught in first reland
(https://github.com/llvm/llvm-project/pull/75860)
**Original commit message**
Commit fe05193 (phab D156569), IRPGO names uses format
`[<filepath>;]<linkage-name>` while prior format is
`[<filepath>:<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
This patch changes `GlobalValues::getGlobalIdentifer` to use the
semicolon.
To elaborate on the scenario how things break without this PR
1. IRPGO raw profiles stores (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](fc715e4cd9/compiler-rt/include/profile/InstrProfData.inc (L72))
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](fc715e4cd9/llvm/lib/ProfileData/InstrProf.cpp (L876-L885))
to the MD5 hash of the callee.
3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names
will be
[annotated](fc715e4cd9/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1707))
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.
*
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
Fixed build-bot failures caught by post-submit tests
1) Add the list of command line tools needed by new compiler-rt test into dependency.
2) Use `starts_with` to replace deprecated `startswith`.
**Original commit message**
Commit fe05193 (phab D156569), IRPGO names uses format
`[<filepath>;]<linkage-name>` while prior format is
`[<filepath>:<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
This patch changes `GlobalValues::getGlobalIdentifer` to use the
semicolon.
To elaborate on the scenario how things break without this PR
1. IRPGO raw profiles stores (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](fc715e4cd9/compiler-rt/include/profile/InstrProfData.inc (L72))
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](fc715e4cd9/llvm/lib/ProfileData/InstrProf.cpp (L876-L885))
to the MD5 hash of the callee.
3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names
will be
[annotated](fc715e4cd9/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1707))
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.
*
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
Commit fe05193 (phab D156569), IRPGO names uses format
`[<filepath>;]<linkage-name>` while prior format is
`[<filepath>:<mangled-name>`. The format change would break the use case
demonstrated in (updated)
`llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll` and
`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
This patch changes `GlobalValues::getGlobalIdentifer` to use the
semicolon.
To elaborate on the scenario how things break without this PR
1. IRPGO raw profiles stores (compressed) IRPGO names of functions in
one section, and per-function profile data in another section. The
[NameRef](fc715e4cd9/compiler-rt/include/profile/InstrProfData.inc (L72))
field in per-function profile data is the MD5 hash of IRPGO names.
2. When raw profiles are converted to indexed format profiles, the
profiled address is
[mapped](fc715e4cd9/llvm/lib/ProfileData/InstrProf.cpp (L876-L885))
to the MD5 hash of the callee.
3. In `pgo-instr-use` thin-lto prelink pipeline, MD5 hash of IRPGO names
will be
[annotated](fc715e4cd9/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1707))
as value profiles, and used to import indirect-call-prom candidates. If
the annotated MD5 hash is computed from the new format while import uses
the prior format, the callee cannot be imported.
*`compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp`
is added to have an end-to-end test.
* `llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll`
is updated to have better test coverage from another aspect (as runtime
tests are more sensitive to the environment and may be skipped by some
contributors)
There are three test cases to test the merge of value profiles. 'get_icall_data_merge1' tests the basic case;
{get_icall_data_merge1_saturation, get_icall_data_merge_site_trunc} tests the edge case.
This patch parameterizes the edge case test coverage by value kind and adds the coverage of 'IPVK_MemOPSize'. Keep the basic test structure as it is. The main reason is test data construction and test assertions is
clearer for each kind in the basic test.
- Using a loop for different value kinds in one test case doesn't work
very well. The instr-prof-writer is stateful (e.g., keeps track of
per-function profile data in a
[container](a9c149df76/llvm/include/llvm/ProfileData/InstrProfWriter.h (L43)))
This patch factor out the common code among three similar test cases.
The input data and test logic are pretty similar. Parameterize the
differences (prof-weight and endianness) as advised in
https://github.com/llvm/llvm-project/pull/72611.
- Remove duplicated tests
Test fixture `MaybeSparseInstrProfTest` parameterize InstrProfWriter by
whether output is sparse or not. This test fixture has 20 test cases,
and 6 of them doesn't use profile reader and writer. Undo the
parameterization for these test cases will reduce redundant tests. This
is one clean-up PR. (A few more clean-ups to come soon, but they are not
inter-dependent)
Three existing test cases {get_icall_data_read_write, get_icall_data_read_write_with_weight,get_icall_data_read_write_big_endian} have common test data and testing
logic. Extract common code into `testICallDataReadWrite`.
- Add helper function `addValueProfData` and `testValueDataArray`. This two helper functions are used by `testICallDataReadWrite`, and possibly other test cases.
- For instance, `collectPGOFuncNameStrings` is reused a lot in https://github.com/llvm/llvm-project/pull/66825 to get the compressed vtable names; and in some added callsites it's just confusing to see 'func' since context clearly shows it's not. This function currently just takes a list of strings as input so name it to `collectGlobalObjectNameStrings`
- Do the rename in a standalone patch since the method is used in non-llvm codebase. It's easier to rollback this NFC in case rename in that codebase takes longer.
Note that llvm::support::endianness has been renamed to
llvm::endianness while becoming an enum class as opposed to an
enum. This patch replaces support::{big,little,native} with
llvm::endianness::{big,little,native}.
- This function looks up MD5ToNameMap to return a name for a given MD5.
https://github.com/llvm/llvm-project/pull/66825 adds MD5 of global
variable names into this map. So rename methods and update comments
Prior to this diff, names in the `__llvm_prf_names` section had the format `[<filepath>:]<function-name>`, e.g., `main.cpp:foo`, `bar`. `<filepath>` is used to discriminate between possibly identical function names when linkage is local and `<function-name>` simply comes from `F.getName()`. This has two problems:
* `:` is commonly found in Objective-C functions so that names like `main.mm:-[C foo::]` and `-[C bar::]` are difficult to parse
* `<function-name>` might be different from the linkage name, so it cannot be used to pass a function order to the linker via `-symbol-ordering-file` or `-order_file` (see https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068)
Instead, this diff changes the format to `[<filepath>;]<linkage-name>`, e.g., `main.cpp;_foo`, `_bar`. The hope is that `;` won't realistically be found in either `<filepath>` or `<linkage-name>`.
To prevent invalidating all prior IRPGO profiles, we also lookup the prior name format when a record is not found (see `InstrProfSymtab::create()`, `readMemprof()`, and `getInstrProfRecord()`). It seems that Swift and Clang FE-PGO rely on the original `getPGOFuncName()`, so we cannot simply replace it.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D156569
When a user uses a mismatched clang + llvm-profdata, they didn't get a very
informative error message. It would just say "unsupported version".
As a result, users are often confused as to what they are supposed to do and
tend to assume that it's a bug in the profiling runtime.
This patch improves the error message by:
- Adding a new class of error (`raw_profile_version_mismatch`) to make it clear
that, specifically, the *raw profile* version is unsupported because of a
tool mismatch.
- Adding an error message that tells the user which raw profile version was
encountered, which version was expected, and instructs them to align their
tool versions.
To support this, this patch also updates `InstrProfError::take` to also
propagate the optional error message.
Differential Revision: https://reviews.llvm.org/D149361
As discussed in [0], add a `weight` field to temporal profiling traces found in profiles. This allows users to use the `--weighted-input=` flag in the `llvm-profdata merge` command to weight traces from different scenarios differently.
Note that this is a breaking change, but since [1] landed very recently and there is no way to "use" this trace data, there should be no users of this feature. We believe it is acceptable to land this change without bumping the profile format version.
[0] https://reviews.llvm.org/D147812#4259507
[1] https://reviews.llvm.org/D147287
Reviewed By: snehasish
Differential Revision: https://reviews.llvm.org/D148150
As described in [0], this extends IRPGO to support //Temporal Profiling//.
When `-pgo-temporal-instrumentation` is used we add the `llvm.instrprof.timestamp()` intrinsic to the entry of functions which in turn gets lowered to a call to the compiler-rt function `INSTR_PROF_PROFILE_SET_TIMESTAMP()`. A new field in the `llvm_prf_cnts` section stores each function's timestamp. Then in `llvm-profdata merge` we convert these function timestamps into a //trace// and add it to the indexed profile.
Since these traces could significantly increase the profile size, we've added `-max-temporal-profile-trace-length` and `-temporal-profile-trace-reservoir-size` to limit the length of a trace and the number of traces in a profile, respectively.
In a future diff we plan to use these traces to construct an optimized function order to reduce the number of page faults during startup.
Special thanks to Julian Mestre for helping with reservoir sampling.
[0] https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
Reviewed By: snehasish
Differential Revision: https://reviews.llvm.org/D147287
Basically NFC: A TEST/TEST_F/etc that bails out early (usually because
setup failed or some other runtime condition wasn't met) generally
should use GTEST_SKIP() to report its status correctly, unless it
takes steps to report another status (e.g., FAIL()).
I did see a handful of tests show up as SKIPPED after this change,
which is not unexpected. The status seemed appropriate in all the new
cases.
Use deduction guides instead of helper functions.
The only non-automatic changes have been:
1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).
Per reviewers' comment, some useless makeArrayRef have been removed in the process.
This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.
Differential Revision: https://reviews.llvm.org/D140955
Current implementation promotes a non-cold function in the SampleFDO profile
into a hot function in the FDO profile. This is too aggressive. This patch
promotes a hot functions in the SampleFDO profile into a hot function, and a
warm function in SampleFDO into a warm function in FDO.
Differential Revision: https://reviews.llvm.org/D132601
* Refactor compression namespaces across the project, making way for a possible
introduction of alternatives to zlib compression.
Changes are as follows:
* Relocate the `llvm::zlib` namespace to `llvm::compression::zlib`.
Reviewed By: MaskRay, leonardchan, phosek
Differential Revision: https://reviews.llvm.org/D128953
Switch the error type when a function is not found in the memprof
profile to unknown_function. This gives compatibility with normal PGO
function matching, and also prevents issuing large numbers of additional
matching errors since pgo-warn-missing-function is off by default.
Differential Revision: https://reviews.llvm.org/D124953
The current implementation of memprof information in the indexed profile
format stores the representation of each calling context fram inline.
This patch uses an interned representation where the frame contents are
stored in a separate on-disk hash table. The table is indexed via a hash
of the contents of the frame. With this patch, the compressed size of a
large memprof profile reduces by ~22%.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D123094
This reverts commit 0d362c90d335509c57c0fbd01ae1829e2b9c3765.
Reason: Causes the MSan buildbot to fail (see comments on
https://reviews.llvm.org/D121179 for more information