VTable value profiling can create reference edges from `mod1:func_foo`
to `mod2:local-vtable`. Indirect call profiling can create reference
edges from `mod1:func_foo` to `mod2:local_func_bar`.
Given a ref chain `mod1:func_foo -> mod2:local-var`,`local-var` doesn't
get imported by default.
Compiler checks / requires the module of 'local-var' is the same as the
function that referenced it(`mod1:func_foo`). This is to prevent
mis-compilation when both `mod1` and `mod2` has `local-var` of the same
name, and cpp files are compiled without full path.
This patch allows the import when one of the following conditions
happen:
1) Introduce an option `import-assume-local-unique`. When the compiler
user can guarantee that all files are compiled with full paths, they can
set this option.
2) When there is one instance of value summary.
Test:
* A/B testing this option alone gives -0.16% statistically consistent
cpu cycle reduction on one search workload (no throughput increase)
* Testing it together with existing more-efficient ICP bumps the
throughput increase by a margin (0.05%~0.1%)
* No regressions observed.
Before this change, when `file.profdata` have vtable profiles but `--enable-vtable-value-profiling` is not on for optimized build, warnings from this line [1] will show up. They are benign for performance but confusing.
It's better to automatically annotate vtable profiles if `file.profdata` has them. This PR implements it in profile use pass.
* If `-icp-max-num-vtables` is zero (default value is 6), vtable profiles won't be annotated.
[1] 464d321ee8/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (L1762-L1768)
Clang's `-fwhole-program-vtables` is required for this optimization to
take place. If `-fwhole-program-vtables` is not enabled, this change is
no-op.
* Function-comparison (before):
```
%vtable = load ptr, ptr %obj
%vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
%func = load ptr, ptr %vfn
%cond = icmp eq ptr %func, @callee
br i1 %cond, label bb1, label bb2:
bb1:
call @callee
bb2:
call %func
```
* VTable-comparison (after):
```
%vtable = load ptr, ptr %obj
%cond = icmp eq ptr %vtable, @vtable-address-point
br i1 %cond, label bb1, label bb2:
bb1:
call @callee
bb2:
%vfn = getelementptr inbounds ptr, ptr %vtable, i64 1
%func = load ptr, ptr %vfn
call %func
```
Key changes:
1. Find out virtual calls and the vtables they come from.
- The ICP relies on type intrinsic `llvm.type.test` to find out virtual
calls and the
compatible vtables, and relies on type metadata to find the address
point for comparison.
2. ICP pass does cost-benefit analysis and compares vtable only when the
number of vtables for a function candidate is within (option specified)
threshold.
3. Sink the function addressing and vtable load instruction to indirect
fallback.
- The sink helper functions are simplified versions of
`InstCombinerImpl::tryToSinkInstruction`. Currently debug intrinsics are
not handled. Ideally `InstCombinerImpl::tryToSinkInstructionDbgValues`
and `InstCombinerImpl::tryToSinkInstructionDbgVariableRecords` could be
moved into Transforms/Utils/Local.cpp (or another util cpp file) to
handle debug intrinsics when moving instructions across basic blocks.
4. Keep value profiles updated
1) Update vtable value profiles after inline
2) For either function-based comparison or vtable-based comparison,
update both vtable and indirect call value profiles.
- The indexed iFDO profiles contains compressed vtable names for `llvm-profdata show --show-vtables` debugging
usage. An optimized build doesn't need it and doesn't decompress the blob now [1], since optimized binary has the
source code and IR to find vtable symbols.
- The motivation is to avoid increasing profile size when it's not necessary.
- This doesn't change the indexed profile format and thereby doesn't need a version change.
[1] eac925fb81/llvm/include/llvm/ProfileData/InstrProfReader.h (L696-L699)
`Linux/instrprof-vtable-value-prof.cpp` needs to be built for the test
to run. However, cpp compile & link failed with undefined-ABI error [1].
See original failure in
https://lab.llvm.org/buildbot/#/builders/18/builds/16429
[1]
```
FAIL: Profile-powerpc64 :: Linux/instrprof-vtable-value-prof.cpp (2406 of 2414)
******************** TEST 'Profile-powerpc64 :: Linux/instrprof-vtable-value-prof.cpp' FAILED ********************
Exit Code: 1
Command Output (stderr):
--
RUN: at line 3: /home/buildbots/llvm-external-buildbots/workers/ppc64be-sanitizer/sanitizer-ppc64be/build/build_debug/./bin/clang --driver-mode=g++ -m64 -ldl -fprofile-generate -fuse-ld=lld -O2 -g -fprofile-generate=. -mllvm -enable-vtable-value-profiling /home/buildbots/llvm-external-buildbots/workers/ppc64be-sanitizer/sanitizer-ppc64be/build/llvm-project/compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64be-sanitizer/sanitizer-ppc64be/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/profile/Profile-powerpc64/Linux/Output/instrprof-vtable-value-prof.cpp.tmp-test
+ /home/buildbots/llvm-external-buildbots/workers/ppc64be-sanitizer/sanitizer-ppc64be/build/build_debug/./bin/clang --driver-mode=g++ -m64 -ldl -fprofile-generate -fuse-ld=lld -O2 -g -fprofile-generate=. -mllvm -enable-vtable-value-profiling /home/buildbots/llvm-external-buildbots/workers/ppc64be-sanitizer/sanitizer-ppc64be/build/llvm-project/compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64be-sanitizer/sanitizer-ppc64be/build/build_debug/runtimes/runtimes-bins/compiler-rt/test/profile/Profile-powerpc64/Linux/Output/instrprof-vtable-value-prof.cpp.tmp-test
ld.lld: error: /lib/../lib64/Scrt1.o: ABI version 1 is not supported
clang: error: linker command failed with exit code 1 (use -v to see invocation)
```
(The profile format change is split into a standalone change into https://github.com/llvm/llvm-project/pull/81691)
* For InstrFDO value profiling, implement instrumentation and lowering for virtual table address.
* This is controlled by `-enable-vtable-value-profiling` and off by default.
* When the option is on, raw profiles will carry serialized `VTableProfData` structs and compressed vtables as payloads.
* Implement profile reader and writer support
* Raw profile reader is used by `llvm-profdata` but not compiler. Raw profile reader will construct InstrProfSymtab with symbol names, and map profiled runtime address to vtable symbols.
* Indexed profile reader is used by `llvm-profdata` and compiler. When initialized, the reader stores a pointer to the beginning of in-memory compressed vtable names and the length of string. When used in `llvm-profdata`, reader decompress the string to show symbols of a profiled site. When used in compiler, string decompression doesn't
happen since IR is used to construct InstrProfSymtab.
* Indexed profile writer collects the list of vtable names, and stores that to index profiles.
* Text profile reader and writer support are added but mostly follow the implementation for indirect-call value type.
* `llvm-profdata show -show-vtables <args> <profile>` is implemented.
rfc in
https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600#pick-instrumentation-points-and-instrument-runtime-types-7