This patch adds code generation for RISCV64 instrumentation. The work
involves the following three parts:
a) Implements support for instrumenting direct function calls and jumps
on RISC-V. This relies on atomic instructions (used to increment
counters), which are only available on RISC-V when the A extension is
enabled.
b) Implements support for instrumenting indirect function calls by
implementing the createInstrumentedIndCallHandlerEntryBB and
createInstrumentedIndCallHandlerExitBB interfaces. In this process, the
target address and IndCallID must be recorded accurately so that the
indirect call counters are updated correctly.
c) Implements the RISCV64 BOLT runtime library, with some system call
interfaces written in inline assembly. It computes the difference
between the runtime address of the .text section and its static address
in the section header table, which is then used to look up indirect call
descriptions.
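A minimal sketch, assuming GCC/Clang atomic builtins, of the counter update that point a) depends on: a single atomic add. On RV64 with the A extension, compilers typically lower this to an `amoadd.d` instruction; without A there is no single-instruction atomic add and the compiler has to fall back to a library call. The `Counters` array and `incrementCounter` name are hypothetical; BOLT inserts the equivalent instructions directly into the binary rather than calling a helper like this.

```cpp
#include <cstdint>

// Hypothetical counter storage standing in for BOLT's instrumentation counters.
static uint64_t Counters[4];

// Atomically add 1. Relaxed ordering is enough because each counter is
// independent and only its final value matters when the profile is dumped.
inline void incrementCounter(uint64_t *Counter) {
  __atomic_fetch_add(Counter, 1, __ATOMIC_RELAXED);
}
```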
However, the upstream code currently has relocation problems in some
scenarios, which are unrelated to instrumentation. We may submit
further patches to fix the related bugs.
This commit adds AArch64 support to the instrumentation runtime library,
including AArch64 system calls.
This commit also splits the syscalls into target-specific files.
Reviewed By: rafauler, yota9
Differential Revision: https://reviews.llvm.org/D151942
Indirect call tables use static addresses for call sites, but the PC
values recorded by the runtime may be shifted by ASLR in PIE binaries,
so we could not find indirect call descriptions by their runtime address
in PIE. This resulted in [unknown] entries in the profile for all
indirect calls. We need to subtract the base address of .text from
runtime addresses to get the corresponding static addresses. Here we
create a getter for the base address of .text and subtract its return
value from the recorded PC values. This converts them to static
addresses, which can then be used to find the corresponding indirect
call descriptions.
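The translation described above amounts to subtracting the load bias. A minimal sketch, assuming both the runtime address of .text and its static address from the section header table are known; the function and parameter names are hypothetical, and the real runtime obtains the runtime value through a target-specific getter rather than taking it as an argument.

```cpp
#include <cstdint>

// Hypothetical helper: map a PC recorded at run time back to the static
// (link-time) address used as the key in the indirect call description table.
//   TextRuntimeAddr: where .text actually ended up after ASLR.
//   TextStaticAddr:  the .text address from the section header table.
constexpr uint64_t toStaticAddress(uint64_t RuntimePC, uint64_t TextRuntimeAddr,
                                   uint64_t TextStaticAddr) {
  // The load bias is zero for non-PIE binaries, so this is a no-op there.
  const uint64_t LoadBias = TextRuntimeAddr - TextStaticAddr;
  return RuntimePC - LoadBias;
}

// Example: .text linked at 0x1000 and loaded at 0x555555555000.
static_assert(toStaticAddress(0x555555555234, 0x555555555000, 0x1000) == 0x1234,
              "runtime PC maps back to its static address");
```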
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D154121
When a binary is instrumented with the --instrumentation-sleep-time and
--instrumentation-wait-forks options and launched, the profile is
written periodically until all the forks die. The problem is that we
cannot wait for the whole process tree, so we have no way to tell when
it is safe to read the profile. However, if we keep the profile open
throughout the life of the process tree, we can use fuser to determine
when writing has finished.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D154436
Since the only way to debug the instrumentation library is to print to
stdout, a function that prints hash table entries is very useful.
Reviewed By: rafauler, Amir
Differential Revision: https://reviews.llvm.org/D153771
The BOLT runtime is hard-coded for x86_64 Linux or x86_64 Darwin (it
uses x86_64 syscalls and hardcodes syscall numbers).
Make it very clear that it is only for this specific pair of systems.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D148825
This patch adds huge page support (-hugify) for PIE and non-PIE
binaries. It also restores support for kernels older than 5.10, where
the dynamic loader has a problem with the alignment of page addresses.
Differential Revision: https://reviews.llvm.org/D129107
The compiler can generate calls to some functions implicitly, even under
the constraints of a freestanding environment. Make sure these functions
are available in our runtime objects.
Fixes test failures on some systems after https://reviews.llvm.org/D128960.
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D129168
Changed the acquire implementation to __atomic_test_and_set() and
release to __atomic_clear(), which eliminates inline asm usage and makes
the code architecture-independent.
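A minimal sketch of a spin lock built only on the two builtins named above, so no inline assembly is needed on any target. The `Mutex` and `Lock` classes here are illustrative; the runtime's own lock classes may be shaped differently.

```cpp
// Minimal spin lock on top of the GCC/Clang byte-sized atomic builtins.
class Mutex {
  bool InUse = false;  // a single byte, as __atomic_test_and_set expects

public:
  // Spin until the previous value of the flag was "clear".
  void acquire() {
    while (__atomic_test_and_set(&InUse, __ATOMIC_ACQUIRE)) {
    }
  }
  // Reset the flag with release semantics so writes made while holding
  // the lock are visible to the next owner.
  void release() { __atomic_clear(&InUse, __ATOMIC_RELEASE); }

  bool isLocked() const { return __atomic_load_n(&InUse, __ATOMIC_RELAXED); }
};

// RAII guard so the lock is released on every return path.
class Lock {
  Mutex &Mu;

public:
  explicit Lock(Mutex &M) : Mu(M) { Mu.acquire(); }
  ~Lock() { Mu.release(); }
};
```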
Elvina Yakubova,
Advanced Software Technology Lab, Huawei
Reviewers: yota9, maksfb, rafauler
Differential Revision: https://reviews.llvm.org/D129162
Summary:
Refactor remaining bolt sources to follow the braces rule for if/else/loop from
[LLVM Coding Standards](https://llvm.org/docs/CodingStandards.html).
(cherry picked from FBD33345885)
Summary:
Sync the file with the storage device on data dump to stabilize
instrumentation testing.
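A hosted POSIX sketch of the idea: write the profile, then call `fsync` so the data reaches the storage device before a test reads it. The helper name and file path are hypothetical; the runtime itself issues its own system call wrappers instead of going through libc.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: write Data to Path and flush it to storage before
// returning, so a reader that starts afterwards sees the complete profile.
bool writeProfileDurably(const char *Path, const char *Data, size_t Len) {
  int FD = open(Path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
  if (FD < 0)
    return false;
  size_t Written = 0;
  while (Written < Len) {  // write() may accept fewer bytes than requested
    ssize_t N = write(FD, Data + Written, Len - Written);
    if (N < 0) {
      close(FD);
      return false;
    }
    Written += static_cast<size_t>(N);
  }
  const bool Synced = fsync(FD) == 0;  // the step this commit adds
  return close(FD) == 0 && Synced;
}
```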
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD31738021)
Summary:
This commit uses TryLock in the SimpleHashTable getter to avoid
deadlock and to reduce syscall usage, which causes significant runtime
overhead.
The old behavior is kept under the -conservative-instrumentation option
passed to the instrumentation library.
This commit also includes a corresponding test case: instrumentation of
an executable that performs indirect calls from both regular code and a
signal handler.
Note: if TryLock fails to acquire the lock, that indirect call is not
accounted for in the resulting profile.
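The TryLock behavior described above can be sketched as a single non-blocking attempt: if the flag is already set (for example because a signal handler interrupted the lock holder on the same thread), the update is dropped instead of spinning forever. `TryLockMutex` and `IndCallCounter` are hypothetical stand-ins for the runtime's lock and the SimpleHashTable update.

```cpp
#include <cstdint>

// One-shot lock attempt built on the byte-sized atomic builtins.
class TryLockMutex {
  bool InUse = false;

public:
  // True if the flag was clear and is now owned by the caller.
  bool tryAcquire() { return !__atomic_test_and_set(&InUse, __ATOMIC_ACQUIRE); }
  void release() { __atomic_clear(&InUse, __ATOMIC_RELEASE); }
};

// Hypothetical stand-in for the indirect call counter update.
struct IndCallCounter {
  TryLockMutex M;
  uint64_t Count = 0;

  // Returns false when the event was dropped because the lock was busy;
  // such a call is simply missing from the profile.
  bool record() {
    if (!M.tryAcquire())
      return false;
    ++Count;
    M.release();
    return true;
  }
};
```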
Vasily Leonenko,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD30821949)
Summary:
This commit adds support for opening libraries through the links in
/proc/self/map_files. To do this, we take the current virtual address
and search the directory for the entry whose address range contains it.
We then get the full path to the binary using the readlink function.
Reading directly from a link in /proc/self/map_files is not possible
because of a lack of permissions.
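A sketch of the lookup described above, assuming entry names in /proc/self/map_files have the form `start-end` in hexadecimal. For brevity it uses hosted `opendir`/`readdir`/`readlink`; the runtime relies on its own directory-entry and readlink support instead. The function names are hypothetical.

```cpp
#include <cstdint>
#include <cstdlib>
#include <dirent.h>
#include <string>
#include <unistd.h>

// Parse an entry name "start-end" (hex) and check Addr against [start, end).
bool entryContains(const char *Name, uint64_t Addr) {
  char *End = nullptr;
  const uint64_t Start = std::strtoull(Name, &End, 16);
  if (End == Name || *End != '-')
    return false;  // also rejects "." and ".."
  const char *Rest = End + 1;
  const uint64_t Stop = std::strtoull(Rest, &End, 16);
  if (End == Rest || *End != '\0')
    return false;
  return Start <= Addr && Addr < Stop;
}

// Find the map_files entry covering Addr and resolve its target path.
// Returns an empty string if no entry matches or readlink fails.
std::string findMappedFile(uint64_t Addr) {
  DIR *D = opendir("/proc/self/map_files");
  if (!D)
    return "";
  std::string Result;
  while (dirent *E = readdir(D)) {
    if (!entryContains(E->d_name, Addr))
      continue;
    const std::string Link = std::string("/proc/self/map_files/") + E->d_name;
    char Buf[4096];  // readlink does not NUL-terminate; use the length
    const ssize_t N = readlink(Link.c_str(), Buf, sizeof(Buf));
    if (N > 0)
      Result.assign(Buf, static_cast<size_t>(N));
    break;
  }
  closedir(D);
  return Result;
}
```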
Elvina Yakubova,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD30092422)
Summary:
This commit adds support for getting directory entries and reading the
value of a symbolic link in the instrumentation runtime library.
Elvina Yakubova,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD30092362)
Summary:
When an indirect call is instrumented, it locks SimpleHashTable's mutex on the get() call.
If we receive a signal while the mutex is locked and the signal handler also calls an
indirect function, we end up in a deadlock.
PR facebookincubator/BOLT#167
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
(cherry picked from FBD28909921)
Summary:
This PR introduces two new instrumentation options:
1. instrumentation-no-counters-clear: Discussed at https://github.com/facebookincubator/BOLT/issues/121
2. instrumentation-wait-forks: Since the instrumentation counters are mapped as MAP_SHARED, it is useful to be able to wait until all forks of the parent process die, by tracking the process group.
The last patch is just an emitBinary code refactor.
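One way to track a process group, sketched here with hosted POSIX calls: `kill(-pgid, 0)` sends no signal and only reports whether any process in the group still exists, failing with `ESRCH` once the group is empty. The helper names and polling interval are hypothetical, not the runtime's actual code, and the watcher must not itself belong to the group it waits on, or the loop never ends.

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Returns true while at least one process in group PGID still exists.
// Signal 0 performs only the existence and permission checks.
bool processGroupAlive(pid_t PGID) {
  if (PGID <= 1)
    return false;  // 0 and 1 would turn -PGID into kill's special targets
  if (kill(-PGID, 0) == 0)
    return true;
  return errno == EPERM;  // the group exists but we may not signal it
}

// Hypothetical polling loop: block until every member of PGID has exited.
void waitForProcessGroup(pid_t PGID, unsigned PollSeconds) {
  while (processGroupAlive(PGID))
    sleep(PollSeconds);
}
```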
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Pull Request resolved: https://github.com/facebookincubator/BOLT/pull/125
GitHub Author: Vladislav Khmelevskyi <Vladislav.Khmelevskyi@huawei.com>
(cherry picked from FBD26919011)
Summary:
Right now, the SAVE_ALL sequence executed on entry to both
of our runtime libraries (hugify and instrumentation) leaves the stack
misaligned with respect to a 16B boundary because it saves 15 8-byte
registers. Change the code sequence to adjust for that. The compiler may
generate code that assumes the stack is aligned, for example by using
movaps instructions, which will crash.
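The arithmetic behind the fix, assuming the stack is 16-byte aligned immediately before the SAVE_ALL pushes: 15 registers of 8 bytes each take 120 bytes, which leaves the stack 8 bytes off a 16-byte boundary, so one extra 8-byte adjustment restores alignment. `movaps` faults on memory operands that are not 16-byte aligned. The constant names are illustrative only.

```cpp
// Stack arithmetic for the SAVE_ALL sequence (illustrative names).
constexpr unsigned NumSavedRegs = 15;
constexpr unsigned SavedBytes = NumSavedRegs * 8;       // 120 bytes
constexpr unsigned Misalignment = SavedBytes % 16;      // 8 bytes off
constexpr unsigned Padding = (16 - Misalignment) % 16;  // extra adjustment

static_assert(SavedBytes == 120, "15 eight-byte registers");
static_assert((SavedBytes + Padding) % 16 == 0, "aligned after padding");
```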
(cherry picked from FBD22744307)
Summary:
This patch enables automated hugify for Bolt.
When running Bolt against a binary with -hugify specified, Bolt will inject a call to a runtime library function at the entry of the binary. The runtime library calls madvise to map the hot code region into a 2M huge page. We support both new kernels with THP support and old kernels. For kernels with THP support we simply make a madvise call, while for old kernels we first copy the code out, remap the memory with a huge page, and then copy the code back.
With this change, we no longer need to manually call into hugify_self and precompile it with --hot-text. Instead, we can simply combine the --hugify option with existing optimizations, and at runtime it will automatically move hot code into 2M pages.
Some details around the changes made:
1. Add a command line option to support --hugify. --hugify will automatically turn on --hot-text to get the proper hot code symbols. However, running with both --hugify and --hot-text is not allowed, since --hot-text is used on binaries that have a precompiled call to hugify_self, which contradicts the purpose of --hugify.
2. Moved the common utility functions out of instr.cpp into common.h, which is also used by hugify.cpp. Added a few new system call definitions.
3. Added a new class that inherits from RuntimeLibrary, and implemented the necessary emit and link logic for hugify.
4. Added a simple test for hugify.
(cherry picked from FBD21384529)
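A minimal sketch of the THP path, assuming a Linux kernel with transparent huge page support: round the hot text range out to 2 MiB boundaries and call `madvise` with `MADV_HUGEPAGE`. The helper names are hypothetical, the fallback for old kernels (copy out, remap, copy back) is not shown, and rounding outward may cover bytes outside the hot region, which this sketch does not guard against.

```cpp
#include <cstdint>
#include <sys/mman.h>

constexpr uint64_t HugePageSize = 2ULL << 20;  // 2 MiB

// Round V down/up to a multiple of A (A must be a power of two).
constexpr uint64_t alignDown(uint64_t V, uint64_t A) { return V & ~(A - 1); }
constexpr uint64_t alignUp(uint64_t V, uint64_t A) {
  return (V + A - 1) & ~(A - 1);
}

// Hypothetical helper: ask a THP-capable kernel to back the hot text range
// [HotStart, HotEnd) with 2 MiB pages. Returns madvise's result (0 on success).
int adviseHotText(uint64_t HotStart, uint64_t HotEnd) {
  const uint64_t Begin = alignDown(HotStart, HugePageSize);
  const uint64_t End = alignUp(HotEnd, HugePageSize);
  return madvise(reinterpret_cast<void *>(Begin), End - Begin, MADV_HUGEPAGE);
}
```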