It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In the future, the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
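A minimal sketch of the preferred spelling (the callee here is hypothetical; ArrayRef is the real llvm::ArrayRef):
```
#include "llvm/ADT/ArrayRef.h"

// Hypothetical callee, used only to illustrate the call sites being changed.
void takeValues(llvm::ArrayRef<int> Values);

void caller() {
  takeValues({});              // preferred: an empty ArrayRef
  // takeValues(std::nullopt); // previously accepted via ArrayRef(std::nullopt_t)
}
```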
raw_string_ostream::flush() is essentially a no-op (as also stated in the docs).
Don't call it in tests that aren't meant to test 'raw_string_ostream' itself.
P.S. Also remove a few redundant calls to raw_string_ostream::str().
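For illustration, a minimal sketch of this kind of cleanup (the function is hypothetical; the stream API is the real llvm::raw_string_ostream):
```
#include "llvm/Support/raw_ostream.h"
#include <string>

std::string render(int Value) {
  std::string Buffer;
  llvm::raw_string_ostream OS(Buffer);
  OS << "value: " << Value;
  // OS.flush();     // redundant: raw_string_ostream writes straight into Buffer
  return Buffer;     // no need for OS.str(), which just returns Buffer anyway
}
```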
This patch switches most of the uses of intptr_t to uintptr_t within
llvm-exegesis for the subprocess memory support. In the vast majority of
cases we do not want a signed interpretation of the address, which makes
intptr_t undesirable. intptr_t is kept for error handling, for example
when making syscalls, where we need to check whether the syscall returned -1.
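A small self-contained illustration (not llvm-exegesis code) of why a signed address type misbehaves, and where the signed type still helps:
```
#include <cstdint>
#include <cstdio>

int main() {
  // On 64-bit targets, a high canonical address has its top bit set, so
  // interpreting it as intptr_t makes it negative and comparisons go wrong.
  uintptr_t Addr = 0xffff800000000000ULL;
  intptr_t SignedAddr = static_cast<intptr_t>(Addr);
  printf("unsigned compare: %d, signed compare: %d\n",
         Addr > 0x1000u, SignedAddr > 0x1000);  // prints 1, then 0
  // intptr_t is still the right type for syscall-style error checks:
  intptr_t Ret = -1;                            // e.g. a failed raw syscall
  if (Ret == -1)
    printf("syscall failed\n");
  return 0;
}
```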
Three `llvm-exegesis` tests
```
LLVM-Unit :: tools/llvm-exegesis/./LLVMExegesisTests/SubprocessMemoryTest/DefinitionFillsCompletely
LLVM-Unit :: tools/llvm-exegesis/./LLVMExegesisTests/SubprocessMemoryTest/MultipleDefinitions
LLVM-Unit :: tools/llvm-exegesis/./LLVMExegesisTests/SubprocessMemoryTest/OneDefinition
```
`FAIL` on Linux/sparc64 like
```
llvm/unittests/tools/llvm-exegesis/X86/SubprocessMemoryTest.cpp:68: Failure
Expected equality of these values:
SharedMemoryMapping[I]
Which is: '\0'
ExpectedValue[I]
Which is: '\xAA' (170)
```
It seems like this test only works on little-endian hosts: three
sub-tests are already disabled on powerpc and s390x (both big-endian),
and the fourth is additionally guarded against big-endian hosts (making
the other guards unnecessary).
However, since it has not been analyzed whether this is really an endianness
issue, this patch disables the whole test on powerpc and s390x as before,
adding sparc to the mix.
Tested on `sparc64-unknown-linux-gnu` and `x86_64-pc-linux-gnu`.
This patch looks up variant scheduling classes using a hash of the
instruction. Keying by the pointer breaks certain use cases that might
occur out of tree, like decoding an execution trace instruction by
instruction and creating MCA instructions as one goes along, as in the
MCAD case. In that scenario, the MCInst will always have the same address
and thus all instructions with the same variant scheduling class will
end up with the same instruction description, leading to undesired
behavior (assertions, uses after free, invalid results, etc.).
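A minimal hypothetical sketch of the keying change (stand-in types, not the llvm-mca API):
```
#include <cstdint>
#include <unordered_map>

struct Inst { unsigned Opcode; uint64_t Imm; };  // stand-in for MCInst
struct InstrDesc { /* ... */ };                  // stand-in for the variant description

// Content-based key: two different instructions decoded into the same reused
// object now map to different cache entries.
uint64_t hashInst(const Inst &I) {
  return (uint64_t(I.Opcode) << 32) ^ I.Imm;
}

std::unordered_map<uint64_t, InstrDesc> VariantDescs;  // was keyed by const Inst *

const InstrDesc &getOrCreateDesc(const Inst &I) {
  return VariantDescs[hashInst(I)];
}
```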
Update the folder titles for targets in the monorepository that have not
been taken care of for some time. These are the folders in which targets are
organized in Visual Studio and Xcode
(`set_property(TARGET <target> PROPERTY FOLDER "<title>")`)
when using the respective CMake IDE generator.
* Ensure that every target is in a folder
* Use a folder hierarchy with each LLVM subproject as a top-level folder
* Use consistent folder names between subprojects
* When using target-creating functions from AddLLVM.cmake, automatically
deduce the folder. This reduces the number of explicit
`set_property`/`set_target_properties` calls, but they are still necessary
when `add_custom_target`, `add_executable`, `add_library`, etc. are used
directly. An LLVM_SUBPROJECT_TITLE definition is used for that in each
subproject's root CMakeLists.txt.
Currently we assume a constant latency of 100 cycles for call
instructions. This commit adds a command-line argument that lets the user
specify a custom value for it. The default latency remains 100 cycles.
This reverts commit 1fe9c417a0bf143f9bb9f9e1fbf7b20f44196883.
This relands commit 6bbe8a296ee91754d423c59c35727eaa624f7140.
This was causing build failures on one of the ARMv8 builders. Still not
completely sure why, but relanding it to see if the failure pops up
again. If it does, the plan is to fix forward by disabling tests on ARM
temporarily as llvm-exegesis does not currently use SubprocessMemory
on ARM.
This reverts commit 8003f553a01a9a2a7eb09fe07e88f1ba9ee7d3a7.
This (and/or a related commit) was causing build failures on one of the
buildbots that needs more investigation. More information is available
at https://lab.llvm.org/buildbot/#/builders/178/builds/7015.
This reverts commit aefad27096bba513f06162fac2763089578f3de4.
This relands commit 6bbe8a296ee91754d423c59c35727eaa624f7140.
This patch was causing build failures on non-Linux platforms due to the
default implementations for the functions not being updated. This ended
up causing out-of-line definition errors. Fixed for the relanding.
This reverts commit 6bbe8a296ee91754d423c59c35727eaa624f7140.
This breaks building LLVM on macOS, failing with
```
llvm/tools/llvm-exegesis/lib/SubprocessMemory.cpp:146:33: error: out-of-line definition of 'setupAuxiliaryMemoryInSubprocess' does not match any declaration in 'llvm::exegesis::SubprocessMemory'
Expected<int> SubprocessMemory::setupAuxiliaryMemoryInSubprocess(
```
This patch adds the thread ID to the subprocess memory shared memory
names. This avoids conflicts for downstream consumers that might want to
run llvm-exegesis across multiple threads, where multiple instances
sharing the same PID would otherwise collide.
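A rough Linux-only sketch of the idea (the name format here is hypothetical, not the exact llvm-exegesis string):
```
#include <string>
#include <sys/syscall.h>
#include <unistd.h>

// Including the thread ID keeps two threads of the same process (same PID)
// from opening the same shared memory object.
std::string getSharedMemoryName(int Index) {
  long ThreadID = syscall(SYS_gettid);
  return "/exegesis-aux-" + std::to_string(getpid()) + "-" +
         std::to_string(ThreadID) + "-" + std::to_string(Index);
}
```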
This patch adds a LLVM-EXEGESIS-LOOP-REGISTER snippet annotation which
allows a user to specify the register to use for the loop counter in the
loop repetition mode. This allows for executing snippets that don't work
with the default value (currently R8 on X86).
This patch replaces --num-repetitions with --min-instructions to make it
more clear that the value refers to the minimum number of instructions
in the final assembled snippet rather than the number of repetitions of
the snippet. This patch also refactors some llvm-exegesis internal
variable names to reflect the name change.
Fixes #76890.
This patch adds two new repetition modes to llvm-exegesis, specifically
loop and duplicate variants of what I am terming the middle half
repetition mode. The middle half repetition mode essentially runs each
measurement twice, one run with twice the number of iterations of the other.
These two measurements are then aggregated by taking their difference.
This subtracts away any setup/overhead that is unrelated to the code in
the snippet, providing more accurate results.
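A small sketch of the arithmetic (hypothetical helper, not llvm-exegesis code): if a run costs Overhead + CyclesPerIteration * N, the difference of the 2N and N runs cancels the overhead.
```
#include <cstdio>

// Per-iteration cost recovered from the two runs of the middle half mode.
double middleHalfAggregate(double MeasurementAt2N, double MeasurementAtN,
                           double N) {
  return (MeasurementAt2N - MeasurementAtN) / N;
}

int main() {
  const double Overhead = 1000.0, CyclesPerIteration = 2.0, N = 5000.0;
  double RunN = Overhead + CyclesPerIteration * N;        // 11000
  double Run2N = Overhead + CyclesPerIteration * 2 * N;   // 21000
  printf("%.1f\n", middleHalfAggregate(Run2N, RunN, N));  // prints 2.0
  return 0;
}
```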
Using this mode on a couple of toy examples, I am able to get exact
(integer) throughput values on all of them, in contrast to the default
duplicate/loop repetition modes, which show a little bit of noise on the
snippet value.
This patch refactors the instantiation of BenchmarkMeasure within all
the unit tests to use BenchmarkMeasure::Create rather than through
direct struct instantiation. This allows us to change what values
are stored in BenchmarkMeasure without getting compiler warnings on
every instantiation in the unit tests, and is also just a cleanup in
general as the Create function didn't seem to exist at the time the unit
tests were originally written.
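A self-contained sketch of the pattern with a hypothetical struct (the in-tree BenchmarkMeasure layout and Create signature may differ):
```
#include <string>
#include <utility>

struct Measure {
  std::string Key;
  double PerInstructionValue;
  double PerSnippetValue;
  static Measure Create(std::string Key, double Value) {
    return {std::move(Key), Value, Value};
  }
};

int main() {
  // Aggregate initialization bakes the field layout into every test...
  Measure Direct{"latency", 5.0, 5.0};
  // ...while the named constructor keeps tests compiling when fields change.
  Measure Created = Measure::Create("latency", 5.0);
  (void)Direct;
  (void)Created;
  return 0;
}
```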
Currently, the duplicate snippet repetitor will truncate snippets whose
length does not exactly divide the minimum number of instructions. This patch
corrects that behavior by making the duplicate snippet repetitor
duplicate the snippet in its entirety until the minimum number of
instructions has been reached.
This makes the behavior consistent with the loop snippet repetitor,
which will execute at least `--num-repetitions` (soon to be renamed
`--min-instructions`) instructions.
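A small illustration of the change in emitted instruction counts (hypothetical arithmetic, not repetitor code):
```
#include <cstdio>

int main() {
  const unsigned SnippetSize = 3, MinInstructions = 10;
  // Old behavior: stop exactly at the minimum, cutting the last copy short.
  unsigned Truncated = MinInstructions;                                    // 10
  // New behavior: emit whole copies of the snippet until the minimum is met.
  unsigned FullCopies = (MinInstructions + SnippetSize - 1) / SnippetSize; // 4
  unsigned Emitted = FullCopies * SnippetSize;                             // 12
  printf("old: %u, new: %u\n", Truncated, Emitted);
  return 0;
}
```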
This patch adds support for validation counters. Validation counters can
be used to measure events that occur during snippet execution like cache
misses to ensure that certain assumed invariants about the benchmark
actually hold. Validation counters are set up within a perf event group,
so they are turned on and off at exactly the same time as the "group leader"
counter that measures the desired value.
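A minimal Linux sketch of grouping a validation counter under a leader with perf_event_open (error handling omitted; the specific events chosen here are illustrative):
```
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static int perfEventOpen(struct perf_event_attr *Attr, int GroupFd) {
  // pid = 0: this process, cpu = -1: any cpu, flags = 0.
  return syscall(SYS_perf_event_open, Attr, 0, -1, GroupFd, 0);
}

int main() {
  struct perf_event_attr Leader, Validation;

  memset(&Leader, 0, sizeof(Leader));
  Leader.size = sizeof(Leader);
  Leader.type = PERF_TYPE_HARDWARE;
  Leader.config = PERF_COUNT_HW_CPU_CYCLES;  // the value being benchmarked
  Leader.disabled = 1;                       // armed explicitly around the snippet

  memset(&Validation, 0, sizeof(Validation));
  Validation.size = sizeof(Validation);
  Validation.type = PERF_TYPE_HARDWARE;
  Validation.config = PERF_COUNT_HW_CACHE_MISSES;  // e.g. validate cache behavior

  int LeaderFd = perfEventOpen(&Leader, -1);               // group leader
  int ValidationFd = perfEventOpen(&Validation, LeaderFd); // joins the leader's group
  ioctl(LeaderFd, PERF_EVENT_IOC_ENABLE, 0);
  // ... run the snippet: the group is scheduled and descheduled as a unit ...
  ioctl(LeaderFd, PERF_EVENT_IOC_DISABLE, 0);
  close(ValidationFd);
  close(LeaderFd);
  return 0;
}
```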
Currently, the unit tests for the BenchmarkResult struct do not check if
the register initial values can be parsed back in. This patch adds a
matcher and tests that the register initial values can be parsed
correctly. This exercises code already contained within the
benchmark-to-YAML infrastructure.
This patch removes the redundant RegisterInitialValues parameter from
assembleToStream and friends as it is included within the BenchmarkKey
struct that is also passed to all the functions that need this
information.
The llvm-exegesis unit tests currently fail on PPC after
ceb196d9903f4db7250bbc6c8da13eeae1b85886 landed, as the default page size
on most common Linux distributions for PPC is 64KiB rather than 4KiB. This
patch changes the memory mappings to use addresses that are multiples of
64KiB rather than multiples of 4KiB to fix this issue.
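A sketch of the addressing constraint (constants and helper are hypothetical): addresses chosen as multiples of 64KiB remain page-aligned whether the kernel page size is 4KiB or 64KiB.
```
#include <cstdint>

// Largest page size the tests need to accommodate (64KiB on common PPC distros).
constexpr uintptr_t MaxPageSize = 64 * 1024;

constexpr bool isValidMappingAddress(uintptr_t Addr) {
  // A multiple of 64KiB is also a multiple of 4KiB, so it works on both kernels.
  return (Addr & (MaxPageSize - 1)) == 0;
}

static_assert(isValidMappingAddress(0x10000), "64KiB multiple is fine");
static_assert(!isValidMappingAddress(0x3000), "a 4KiB-only multiple is not");
```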
This patch refactors the MMAP platform-specific preprocessor directives
in llvm-exegesis to a single file instead of having duplicate code split
across multiple files. These originally got introduced to get buildbots
green again due to platform specific failures.
This reverts commit 30d700117b772d94d8474ec56bd6f9cc423fc613.
This relands commit 3ab41f912a6c219a93b87c257139822ea07c8863.
When I was updating the patch to use llvm::to_integer, I only ran the
lit tests and didn't run the unit tests, one of which started to fail.
This patch fixes the broken unit test.
This test was an exact duplicate of the one above, providing no value,
and also adding confusion as it referred to a behavior that was
(presumably) moved around and tested differently during the review
process, with this test being forgotten about.
This patch refactors ExecutableFunction to use a named constructor
pattern, namely adding the create function, so that errors occurring
during the creation of an ExecutableFunction can be propagated back up
rather than having to deal with them in report_fatal_error.
This is the patch at https://reviews.llvm.org/D153692, migrated to
GitHub.
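A minimal sketch of the named-constructor-with-Expected pattern using a hypothetical class (the real ExecutableFunction::create signature differs):
```
#include "llvm/Support/Error.h"

class ExecutableFunctionSketch {
  explicit ExecutableFunctionSketch(void *Addr) : FunctionAddress(Addr) {}
  void *FunctionAddress;

public:
  // Errors from setup are returned to the caller instead of aborting via
  // report_fatal_error inside the constructor.
  static llvm::Expected<ExecutableFunctionSketch> create(void *MaybeAddr) {
    if (!MaybeAddr)
      return llvm::createStringError(llvm::inconvertibleErrorCode(),
                                     "failed to materialize the function");
    return ExecutableFunctionSketch(MaybeAddr);
  }
};
```
A caller can then check the Expected and `return EF.takeError()` (or otherwise handle the Error) rather than dying inside the constructor.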
After testing D147740 with multiple industrial projects with ~10 million
FunctionSamples, no MD5 collision has been found. For a uniform hash, the
probability of a collision among N symbols over K possible hash values is
1 - K!/((K-N)! * K^N). When N is 1 million and K is 2^64, the probability
is about 3*10^-8; when N is 10 million, it is about 3*10^-6, so we are
probably not going to find an actual case in a real-world application.
(However, if K is 2^32 the probability of collision is almost 1, which is
indeed a problem if anyone still uses a large profile on a 32-bit machine,
as hash_code is tied to size_t.) Furthermore, when a collision happens we
cannot do anything to recover from it short of using a multimap, but that
is significantly slower, which contradicts the purpose of optimizing the
profile reader. Finally, since we have been using profiles with MD5 names,
and those have to come from non-MD5 sources, any hash collision would
already have happened when converting a non-MD5 profile to an MD5 one, so
there is no point in checking for it in the reader, and this feature can
be removed.
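For the record, a small program reproducing the quoted numbers under the usual birthday-bound approximation 1 - exp(-N(N-1)/(2K)) (illustrative only, not part of the patch):
```
#include <cmath>
#include <cstdio>
#include <initializer_list>

int main() {
  const double K64 = std::pow(2.0, 64), K32 = std::pow(2.0, 32);
  for (double N : {1e6, 1e7}) {
    double P64 = 1 - std::exp(-N * (N - 1) / (2 * K64));
    double P32 = 1 - std::exp(-N * (N - 1) / (2 * K32));
    printf("N=%.0e: 64-bit hash ~ %.1e, 32-bit hash ~ %.2f\n", N, P64, P32);
  }
  // Prints roughly 2.7e-08 / 2.7e-06 for 64-bit hashes and ~1.00 for 32-bit.
  return 0;
}
```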
With this, check-llvm passes on an arm mac if x86 isn't in
LLVM_TARGETS_TO_BUILD.
This pattern to skip the tests if x86 isn't enabled is used
in every other test in this file.
Make sure the TestCount const definition is guarded the same way
as the use of the constant.
This is an attempt to fix buildbot failures related to
-Wunused-const-variable.
This patch makes SubprocessMemoryTest use process PIDs during creation
of the SubprocessMemory objects within the tests so that there isn't
interference between multiple instances of the test running at the same
time which could potentially occur in multi-user environments.
This is a continuation of the review in https://reviews.llvm.org/D154680.
Some 32-bit architectures don't have mmap and define mmap2 instead.
E.g. on riscv32 we may get
```
| /mnt/b/yoe/master/build/tmp/work-shared/llvm-project-source-17.0.0-r0/git/llvm/tools/llvm-exegesis/lib/X86/Target.cpp:1116:19: error: use of undeclared identifier 'SYS_mmap'
| 1116 | generateSyscall(SYS_mmap, MmapCode);
| | ^
| /mnt/b/yoe/master/build/tmp/work-shared/llvm-project-source-17.0.0-r0/git/llvm/tools/llvm-exegesis/lib/X86/Target.cpp:1134:19: error: use of undeclared identifier 'SYS_mmap'
| 1134 | generateSyscall(SYS_mmap, GeneratedCode);
| | ^
| 1 warning and 2 errors generated.
```
Co-Authored-By: Fangrui Song <i@maskray.me>
Differential Revision: https://reviews.llvm.org/D158375
This is phase 1 of multiple planned improvements on the sample profile loader. The major change is to use the MD5 hash code (instead of the function itself) as the key to look up the function offset table and the profiles, which significantly reduces the time it takes to construct the map.
The optimization is based on the fact that many practical sample profiles are using MD5 values for function names to reduce profile size, so we shouldn't need to convert the MD5 to a string and then to a SampleContext and use it as the map's key, because it's extremely slow.
Several changes to note:
(1) For a non-CS SampleContext, if it is already an MD5 string, the hash value will be its integral value, instead of hashing the MD5 again. In phase 2 this is going to be optimized further using a union to represent MD5 functions (without converting them to strings) and regular function names.
(2) The SampleProfileMap is a wrapper around a map from uint64_t to FunctionSamples, while providing an interface that allows using SampleContext as the key, so that existing code still works. It will check for MD5 collisions (unlikely but not too unlikely, since we only take the lower 64 bits) and handle them to at least guarantee compilation correctness (a conflicting old profile is dropped, instead of returning an old profile with an inconsistent context). Other code should not try to use MD5 as the key to access the map directly, because it will not be able to handle MD5 collisions at all (see the exception in (5)).
(3) Any SampleProfileMap::emplace() call followed by a SampleContext assignment on newly inserted entries should be replaced with SampleProfileMap::Create(), which does the same thing.
(4) Previously we ensured the invariant that in SampleProfileMap the key is equal to the Context of the value, for profile maps that are eventually used for output (as in llvm-profdata/llvm-profgen). Since the key became the MD5 hash, only the value keeps the context now. In several places where an intermediate SampleProfileMap is created, each new FunctionSamples' context is set immediately after insertion, which is necessary to "remember" a context that would otherwise be irretrievable.
(5) When reading a profile, we cache the MD5 values of all functions because they are used at least twice (once to index into FuncOffsetTable, once into SampleProfileMap, and more if there are additional sections). In this case the SampleProfileMap is accessed directly with the MD5 value so that we don't recalculate it each time (which is expensive).
Performance impact:
When reading a ~1GB extbinary profile (fixed-length MD5, not compressed) with 10 million function names and 2.5 million top-level functions (non-CS functions, each with a nesting level varying from 0 to 20), this patch improves the function offset table loading time by 20% and the full profile read time by 5%.
Reviewed By: davidxl, snehasish
Differential Revision: https://reviews.llvm.org/D147740
Having the SubprocessMemory class available on Android was causing build
failures in downstream builds, as Android doesn't implement the POSIX
shared memory interface that supplies shm_open and the other functions
the class relies on. This patch simply makes it unavailable on Android using
preprocessor directives.
This patch adds memory annotation parsing to llvm-exegesis. The memory
annotations cannot be used currently, but this allows for using parsed
memory annotations within a FunctionExecutorImpl to set up a specified
execution environment.
I fixed compilation on 32-bit ARM earlier in the -Werror case using
preprocessor directives but forgot to update the unit tests that also
depend upon that value. This patch updates the unit tests as a quick fix
for the builders that were broken by the earlier patch.
llvm-exegesis currently isn't compiling on 32-bit ARM due to SYS_mmap
not being defined, as 32-bit ARM uses SYS_mmap2 instead. This patch
defines SYS_mmap to be SYS_mmap2 in the relevant places to allow for
compilation to succeed.
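The shape of the fix, roughly (exact in-tree placement differs):
```
#include <sys/syscall.h>

// 32-bit ARM (and some other 32-bit Linux targets) only expose mmap2, which
// takes its offset in pages rather than bytes; for a zero offset the two
// calls behave the same.
#if !defined(SYS_mmap) && defined(SYS_mmap2)
#define SYS_mmap SYS_mmap2
#endif
```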
Some builders are currently failing as they don't have
MAP_FIXED_NOREPLACE available. This patch checks whether MAP_FIXED_NOREPLACE
is available and, if it isn't, simply defines it as MAP_FIXED.
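Roughly the shape of the guard (exact in-tree placement differs):
```
#include <sys/mman.h>

// Older kernels/headers lack MAP_FIXED_NOREPLACE (added in Linux 4.17).
// Falling back to MAP_FIXED keeps the code building, at the cost of losing
// the protection against silently clobbering an existing mapping.
#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE MAP_FIXED
#endif
```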
On some platforms such as PPC shm_open is in librt and it isn't
automatically linked against. This patch explicitly links against librt
in the unit tests, which should hopefully fix the symbol resolution
errors.