122 Commits

Author SHA1 Message Date
Aiden Grossman
ceb196d990
[llvm-exegesis] Validate that address annotations are aligned (#75554)
This patch adds in validation at two different levels that address
annotations are page aligned. This is necessary as otherwise the mmap
calls will fail as MAP_FIXED/MAP_FIXED_NOREPLACE require page aligned
addresses. This happens silently in the subprocess. This patch adds
validation at snippet parsing time to give feedback to the user and also
adds asserts at code generation/address usage time to ensure that other
users of the Exegesis APIs conform to the same requirements.
2023-12-15 09:45:30 -08:00
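A minimal sketch of the kind of page-alignment check that ceb196d990 describes, assuming a power-of-two page size. The name `isPageAligned` and the explicit page-size parameter are illustrative and are not taken from the llvm-exegesis sources; real code would typically query the page size at run time.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true when Address is a multiple of PageSize.
// Assumes PageSize is a non-zero power of two, as page sizes are in practice.
bool isPageAligned(std::uintptr_t Address, std::size_t PageSize) {
  return (Address & (static_cast<std::uintptr_t>(PageSize) - 1)) == 0;
}
```

With a 4096-byte page, `isPageAligned(0x20000, 4096)` holds while `isPageAligned(0x20001, 4096)` does not, which is the case a parse-time check would reject before mmap is ever called.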
Aiden Grossman
3194928c3c
[llvm-exegesis] Refactor MMAP platform-specific preprocessor directives (#75422)
This patch refactors the MMAP platform-specific preprocessor directives
in llvm-exegesis to a single file instead of having duplicate code split
across multiple files. These originally got introduced to get buildbots
green again due to platform specific failures.
2023-12-14 12:07:46 -08:00
Abhina Sree
ec41462d7a
[SystemZ][z/OS] Add missing strnlen function for z/OS to fix build failures (#75339)
This patch adds strnlen to the zOSSupport.h file to fix build failures in multiple files.
2023-12-13 13:13:53 -05:00
Aiden Grossman
5830e8e745
[llvm-exegesis] Use explicit error classes for different snippet crashes (#74210)
This patch switches to using explicit snippet crashes that contain more
information about the specific type of error (like the address for a
segmentation fault) that occurred. All these new error classes inherit
from SnippetExecutionFailure to allow for easily grabbing all of them in
addition to filtering for specific types using the standard LLVM error
primitives.
2023-12-11 23:15:56 -08:00
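The idea in 5830e8e745 of a common failure base class with more specific subclasses can be sketched in plain C++. This is only an illustration: the patch itself uses the LLVM error primitives, whereas the sketch below substitutes ordinary virtual classes and `dynamic_cast`, and the names `ExecutionFailure`, `SegfaultFailure`, and `isSegfault` are hypothetical.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical base class standing in for SnippetExecutionFailure.
struct ExecutionFailure {
  virtual ~ExecutionFailure() = default;
  virtual std::string message() const { return "snippet execution failed"; }
};

// Hypothetical subclass carrying the faulting address of a segfault.
struct SegfaultFailure : ExecutionFailure {
  explicit SegfaultFailure(std::uintptr_t Address) : Address(Address) {}
  std::string message() const override {
    std::ostringstream OS;
    OS << "snippet segfaulted at address 0x" << std::hex << Address;
    return OS.str();
  }
  std::uintptr_t Address;
};

// Callers can handle every failure through the base class, or filter for one
// specific kind with dynamic_cast.
bool isSegfault(const ExecutionFailure &Failure) {
  return dynamic_cast<const SegfaultFailure *>(&Failure) != nullptr;
}
```

The base class lets a caller catch all snippet failures in one place, while the subclass keeps the extra detail (here, the address) available to callers that want it.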
Clement Courbet
9017229ecd
[llvm-exegesis] Allow clients to do their own snippet running error handling. (#74711)

Returns an error *and* a benchmark rather than an error *or* a
benchmark. This allows users to have custom error handling while still
being able to inspect the benchmark.

Apart from this small API change, this is an NFC.

This is an alternative to #74211.
2023-12-08 13:01:01 +01:00
Aiden Grossman
5058d738ba [llvm-exegesis] Add MAP_FIXED_NOREPLACE definition
MAP_FIXED_NOREPLACE doesn't exist on older kernels, so we need to define
it to be MAP_FIXED.
2023-12-07 00:47:04 -08:00
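The fallback that 5058d738ba describes can be sketched as a guarded macro definition. The helper `fixedMappingFlags` is hypothetical and exists only to show the macro in use; the sketch assumes a Linux-like system that provides `<sys/mman.h>`. Note that the fallback is not an exact substitute: MAP_FIXED replaces any existing mapping in the range, whereas MAP_FIXED_NOREPLACE makes mmap fail instead.

```cpp
#include <sys/mman.h>

// Older kernel headers do not define MAP_FIXED_NOREPLACE, so fall back to
// MAP_FIXED, as the commit message describes.
#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE MAP_FIXED
#endif

// Hypothetical helper returning mmap flags for a private, anonymous mapping
// at a fixed address.
int fixedMappingFlags() {
  return MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE;
}
```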
Aiden Grossman
f1963fde9f Reland "[llvm-exegesis] Add in snippet address annotation (#74218)"
This reverts commit 30d700117b772d94d8474ec56bd6f9cc423fc613.

This relands commit 3ab41f912a6c219a93b87c257139822ea07c8863.

When I was updating the patch to use llvm::to_integer, I only ran the
lit tests and didn't run the unit tests, one of which started to fail.
This patch fixes the broken unit test.
2023-12-07 00:20:24 -08:00
Aiden Grossman
30d700117b Revert "[llvm-exegesis] Add in snippet address annotation (#74218)"
This reverts commit 3ab41f912a6c219a93b87c257139822ea07c8863.

Unit tests break after recent changes. Will investigate/reland.
2023-12-06 11:25:03 -08:00
Aiden Grossman
3ab41f912a
[llvm-exegesis] Add in snippet address annotation (#74218) 2023-12-06 11:05:33 -08:00
Kazu Hirata
c630f95f33 [llvm-exegesis] Remove unnecessary includes (NFC)
Identified with clangd.
2023-12-05 23:28:09 -08:00
Kazu Hirata
06c5c27e44 [llvm-exegesis] Stop including array (NFC)
Identified with clangd.
2023-12-05 20:58:17 -08:00
Aiden Grossman
077fe97736
[llvm-exegesis] Disable core dumps in subprocess (#74144)
Core dumps are currently enabled within the llvm-exegesis subprocess
executor. This can create a lot of core dumps when going through
different snippets that might segfault when experimenting with memory
annotations. These core dumps are not really needed as the information
about the segfault is reported directly to the user.
2023-12-04 01:47:33 -08:00
Aiden Grossman
8a02b70324
[llvm-exegesis] Refactor ExecutableFunction to use a named constructor (#72837)
This patch refactors ExecutableFunction to use a named constructor
pattern, namely adding the create function, so that errors occurring
during the creation of an ExecutableFunction can be propagated back up
rather than having to deal with them in report_fatal_error.
2023-11-24 02:15:34 -08:00
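A minimal sketch of the named constructor pattern mentioned in 8a02b70324, assuming C++17. The class `Function` and its validation rule are hypothetical, and `std::variant` stands in for whatever error-carrying return type the real code uses.

```cpp
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-in for a class whose construction can fail.
class Function {
public:
  // Named constructor: returns either a Function or an error message, so the
  // caller decides how to report the failure.
  static std::variant<Function, std::string> create(std::string Bytes) {
    if (Bytes.empty())
      return std::string("cannot create a function from empty code");
    return Function(std::move(Bytes));
  }

  const std::string &bytes() const { return Bytes; }

private:
  // The real constructor is private and cannot fail.
  explicit Function(std::string Bytes) : Bytes(std::move(Bytes)) {}
  std::string Bytes;
};
```

Because the constructor is private, every caller has to go through `create` and therefore has to look at the error alternative, instead of the failure ending the process inside the constructor.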
Aiden Grossman
3300bc34f7
[llvm-exegesis] Fix race condition in subprocess mode (#72778)
If there were some scheduler effects where something like the parent
process got interrupted while the child process continued to run, there
would be nothing blocking it from exiting before the parent process
issued a PTRACE_ATTACH call. This would cause transient failures as this
occurred pretty rarely. This patch removes the possibility of a
transient failure by ensuring that the parent process attaches to the
child process before sending the counter file descriptor through the
socket, ensuring that the child process has at most progressed to being
blocked in the read call for the counter file descriptor.
2023-11-20 01:10:42 -08:00
Aiden Grossman
9426416994
[llvm-exegesis] Add error handling for fork failures (#65186)
There are still some transient failures on the clang-avx512 builder on
the new subprocess memory tests. Some of them seem to be related to an
inability to fork, but it's hard to debug currently as there is no
explicit error handling for a failed fork call, and nice error reporting
for a failed fork is something that we should have regardless.
2023-09-08 00:27:15 -07:00
Aiden Grossman
c4a769ba03 [llvm-exegesis] Print errno on failures in subprocess
Some error logging in llvm-exegesis under the subprocess executor just
prints generic failure information rather than any details about the
error as we omit printing the string version of errno. This patch adds
in printing errno at all relevant points in the subprocess executor that
were previously missed.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D157682
2023-09-05 12:41:49 -07:00
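The change in c4a769ba03 amounts to attaching the textual form of errno to each failure message. A minimal sketch follows; the helper name `describeFailure` and the message layout are assumptions, not the format llvm-exegesis prints.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper: builds an error message that includes the textual form
// of an errno value, so the user sees why a call failed, not just that it did.
std::string describeFailure(const std::string &What, int Errno) {
  return What + " failed: " + std::strerror(Errno);
}
```

A caller would capture `errno` immediately after the failing call, for example `describeFailure("recvmsg", errno)`, since later library calls may overwrite it.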
Aiden Grossman
34e3bc0b92 [llvm-exegesis] Replace size_t with ssize_t where relevant
Currently BenchmarkRunner.cpp stores the return code of recvmsg as
size_t. Not only is this incorrect (as recvmsg returns ssize_t), but it
also makes the error code check after the statement completely irrelevant
as it checks if the number of bytes read is greater than zero (which
will always be true for an unsigned type).
2023-08-22 23:44:05 -07:00
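The pitfall fixed in 34e3bc0b92 can be reproduced in isolation. In the sketch below the function names are hypothetical and the return code is passed in directly rather than coming from a real recvmsg call; it assumes a POSIX system that provides `ssize_t`.

```cpp
#include <cstddef>
#include <sys/types.h> // ssize_t (POSIX)

// Buggy pattern: a -1 error return stored in an unsigned size_t wraps around
// to a huge positive value, so the "> 0" check passes even on failure.
bool looksSuccessfulUnsigned(ssize_t ReturnCode) {
  std::size_t Stored = static_cast<std::size_t>(ReturnCode);
  return Stored > 0;
}

// Fixed pattern: keeping the signed type lets the check see the error.
bool looksSuccessfulSigned(ssize_t ReturnCode) {
  return ReturnCode > 0;
}
```

For a return code of -1, `looksSuccessfulUnsigned` reports success while `looksSuccessfulSigned` correctly reports failure.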
Guillaume Chatelet
f70e83af7a [llvm-exegesis] Don't try to use SYS_rseq if it's not defined.
When compiling against recent glibc (>= 2.35) but old kernel headers (< 4.18), `SYS_rseq` is not defined and thus llvm-exegesis fails to build. So also check that `SYS_rseq` is defined before trying to use it.

Fixes https://github.com/llvm/llvm-project/issues/64456

Reviewed By: MaskRay, gchatelet

Differential Revision: https://reviews.llvm.org/D157189
2023-08-07 07:32:44 +00:00
Markus Böck
822c31a0fe [llvm-exegesis] Guard __builtin_thread_pointer behind a configure check
Due to arguably a bug in GCC[0], using `__has_builtin` is not sufficient to check whether `__builtin_thread_pointer` can actually be compiled by GCC. This makes it impossible to compile LLVM with `llvm-exegesis` enabled with e.g. GCC 10 as it does have the builtin, but no implementation for architectures such as x86.

This patch works around this issue by making it a cmake configure check whether the builtin can be compiled and used, rather than relying on the broken preprocessor macro.

[0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96952, demonstration: https://godbolt.org/z/9z5nWM6Ef

Differential Revision: https://reviews.llvm.org/D155828
2023-07-21 08:03:26 +02:00
Aiden Grossman
f3dfcc5053 [llvm-exegesis] Support older kernel versions in subprocess executor
This patch switches to moving the performance counter file descriptor
to the child process with socket calls rather than using the pidfd_getfd
system call, which was introduced in kernel 5.6. This significantly
expands the range of kernel versions that are supported.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D154275
2023-07-18 10:42:45 -07:00
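One common way to move a file descriptor between processes with socket calls is to send it as SCM_RIGHTS ancillary data over a Unix-domain socket. The sketch below shows that technique under the assumption that this is the mechanism meant in f3dfcc5053; the function names are hypothetical and the code is not taken from the llvm-exegesis sources.

```cpp
#include <cstring>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

// Sends the file descriptor FD over the connected Unix-domain socket Socket
// as SCM_RIGHTS ancillary data. Returns true on success.
bool sendFileDescriptor(int Socket, int FD) {
  char Payload = 'F'; // One byte of regular data accompanies the descriptor.
  iovec IOV = {&Payload, sizeof(Payload)};
  alignas(cmsghdr) char Control[CMSG_SPACE(sizeof(int))] = {};

  msghdr Message = {};
  Message.msg_iov = &IOV;
  Message.msg_iovlen = 1;
  Message.msg_control = Control;
  Message.msg_controllen = sizeof(Control);

  cmsghdr *Header = CMSG_FIRSTHDR(&Message);
  Header->cmsg_level = SOL_SOCKET;
  Header->cmsg_type = SCM_RIGHTS;
  Header->cmsg_len = CMSG_LEN(sizeof(int));
  std::memcpy(CMSG_DATA(Header), &FD, sizeof(int));

  return sendmsg(Socket, &Message, 0) >= 0;
}

// Receives a file descriptor sent by sendFileDescriptor. Returns the new
// descriptor, or -1 on failure.
int receiveFileDescriptor(int Socket) {
  char Payload = 0;
  iovec IOV = {&Payload, sizeof(Payload)};
  alignas(cmsghdr) char Control[CMSG_SPACE(sizeof(int))] = {};

  msghdr Message = {};
  Message.msg_iov = &IOV;
  Message.msg_iovlen = 1;
  Message.msg_control = Control;
  Message.msg_controllen = sizeof(Control);

  // Keep the result signed so that a -1 error return is detectable.
  ssize_t Received = recvmsg(Socket, &Message, 0);
  if (Received < 0)
    return -1;

  cmsghdr *Header = CMSG_FIRSTHDR(&Message);
  if (!Header || Header->cmsg_level != SOL_SOCKET ||
      Header->cmsg_type != SCM_RIGHTS)
    return -1;

  int FD = -1;
  std::memcpy(&FD, CMSG_DATA(Header), sizeof(int));
  return FD;
}
```

In a parent/child setup, the two processes would each hold one end of a `socketpair(AF_UNIX, SOCK_STREAM, 0, ...)` created before the fork. The receiver gets a new descriptor number that refers to the same open file.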
Fangrui Song
8f90a5cc45 [llvm-exegesis] Guard __builtin_thread_pointer use with __has_builtin
While Clang targets have supported __builtin_thread_pointer for a very
long time (e.g., 2007 for AArch32, 2015 for AArch64), for some GCC
ports, the support is very new (11.0 for x86[1], while we need to
support GCC 7), and many ports haven't implemented
__builtin_thread_pointer yet (m68k, powerpc, etc).

[1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96955
2023-07-17 16:42:11 -07:00
Aiden Grossman
2dcba67007 [llvm-exegesis] Remove unnecessary includes
Left some includes around from debugging the last patch that I forgot to
take out, so here's the patch taking them out.
2023-07-06 19:58:41 -07:00
Aiden Grossman
75b5541fe5 [llvm-exegesis] Switch to using PTRACE_ATTACH instead of PTRACE_SEIZE
This patch switches the subprocess benchmark runner for llvm-exegesis
from PTRACE_SEIZE to PTRACE_ATTACH, as PTRACE_SEIZE was introduced in
Linux kernel 3.4. Some LLVM users were reporting build failures as they
are using kernel versions older than 3.4 (such as on CentOS/RHEL 6),
hence the patch.
2023-07-06 19:40:19 -07:00
Fangrui Song
46b5b85548 [llvm-exegesis] Adjust GLIBC_INITS_RSEQ condition
Commit 9f80831f3627e800709e2434bbbd5bb179b1576e introduced `#include <sys/rseq.h>`,
but RSEQ_SIG is only defined by some glibc ports (aarch64,arm,mips,powerpc,s390,x86),
causing other hosts (e.g., riscv64, loongarch64) to fail to build.

Reviewed By: aidengrossman, xen0n

Differential Revision: https://reviews.llvm.org/D153938
2023-06-28 00:23:38 -07:00
Aiden Grossman
9b684ecde6 [llvm-exegesis] Fix warning and hoist statement of arch-specific section
My last patch broke most of the builders that aren't currently running
at least kernel 5.6 as there was a variable used later on inside a
region that required that kernel version. Also fixes a minor warning
left over from a bad merge.
2023-06-27 07:01:20 +00:00
Aiden Grossman
9f80831f36 [llvm-exegesis] Add support for using memory annotations
This patch adds in support for using memory annotations in the
subprocess execution mode.
2023-06-27 06:52:33 +00:00
Aiden Grossman
e802dff0f0 [llvm-exegesis] Introduce Subprocess Executor Mode
This patch introduces the subprocess executor mode. Currently, this new
mode doesn't do anything fancy, just executing the same code that the
inprocess executor would do, but within a subprocess. This sets up the
ability to add in many more memory-related features in the future.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D151021
2023-06-26 01:43:19 +00:00
Aiden Grossman
309950515c Revert "[llvm-exegesis] Add ability to assign perf counters to specific PID"
Revert "[llvm-exegesis] Introduce Subprocess Executor Mode"

This reverts commit 5e9173c43a9b97c8614e36d6f754317f731e71e9.
This reverts commit 4d618b52f6e05e41d35f56653cb36bf7d4dc794e.

Reverting the PID commit as it is currently breaking MinGW builds and
the way I'm checking for the presence of pid_t needs to be fixed and I
need to do some testing. The subprocess executor mode patch is a
dependent patch so also needs to be reverted and also needs some work as
it is currently failing tests where libpfm is installed and the kernel
version is less than 5.6.
2023-06-22 18:05:01 +00:00
Aiden Grossman
4d618b52f6 [llvm-exegesis] Introduce Subprocess Executor Mode
This patch introduces the subprocess executor mode. Currently, this new
mode doesn't do anything fancy, just executing the same code that the
inprocess executor would do, but within a subprocess. This sets up the
ability to add in many more memory-related features in the future.
2023-06-21 07:55:28 +00:00
Aiden Grossman
08aeb7c35d Revert "[llvm-exegesis] Introduce Subprocess Executor Mode"
This reverts commit 0d4ef4ff01addbb40b9122a00d6b2f23104cbb3b.

This was causing build failures on certain platforms when built with
-Werror due to unused variable warnings in addition to causing build
failures on Linux systems with older kernel versions as kernels prior to
v5.6 don't support pidfd_getfd. Reverting as I need to set up a
system to properly test the rest of the patches in this series.

Also reverts 8c6668fa42dba59ddc286ba256d71c1b9c5228b8 which fixed the
first issue so that the patch can actually be reverted.
2023-06-21 02:29:48 +00:00
Jie Fu
8c6668fa42 [llvm-exegesis] Fix -Wunused-variable in BenchmarkRunner.cpp (NFC)
/data/llvm-project/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp:275:9: error: unused variable 'ParentPIDFD' [-Werror,-Wunused-variable]
    int ParentPIDFD = syscall(SYS_pidfd_open, ParentPID, 0);
        ^
1 error generated.
2023-06-21 10:20:36 +08:00
Aiden Grossman
0d4ef4ff01 [llvm-exegesis] Introduce Subprocess Executor Mode
This patch introduces the subprocess executor mode. Currently, this new
mode doesn't do anything fancy, just executing the same code that the
inprocess executor would do, but within a subprocess. This sets up the
ability to add in many more memory-related features in the future.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D151021
2023-06-21 02:00:13 +00:00
Aiden Grossman
72df12cce2 [llvm-exegesis] Refactor FunctionExecutorImpl and create factory
In order to better support adding in new implementations of
FunctionExecutor, this patch makes some small changes so that it is
easier to add new ones in. FunctionExecutorImpl is renamed to
InProcessFunctionExecutorImpl to better reflect how it will be placed
relative to the soon-to-be introduced subprocess executor and a new
function is created to create executors so selection can be done more
easily. In addition, a new CLI flag, -execution-mode, is added, which can
be used to select between the different executors.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D151019
2023-06-21 00:04:48 +00:00
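The factory arrangement described in 72df12cce2 can be sketched as follows. All names here (`ExecutionMode`, `Executor`, `createExecutor`, and the strings returned by `name`) are hypothetical; they only mirror the in-process and subprocess executors that the log mentions and do not reproduce the real class signatures or flag values.

```cpp
#include <memory>
#include <string>

// Hypothetical mode enum; the names mirror the two executors the log mentions.
enum class ExecutionMode { InProcess, SubProcess };

struct Executor {
  virtual ~Executor() = default;
  virtual std::string name() const = 0;
};

struct InProcessExecutor : Executor {
  std::string name() const override { return "inprocess"; }
};

struct SubProcessExecutor : Executor {
  std::string name() const override { return "subprocess"; }
};

// Single creation point: callers pick a mode and never name a concrete class.
std::unique_ptr<Executor> createExecutor(ExecutionMode Mode) {
  switch (Mode) {
  case ExecutionMode::InProcess:
    return std::make_unique<InProcessExecutor>();
  case ExecutionMode::SubProcess:
    return std::make_unique<SubProcessExecutor>();
  }
  return nullptr; // Unreachable for valid enum values.
}
```

Adding another executor then means adding one enum value and one case, with the command-line flag parsed into the enum in a single place.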
Pavel Kosov
27f37db76a [llvm-exegesis] Use MCJIT only for execution
Initially, llvm-exegesis was generating the benchmark code for the
host CPU to execute it inside its own process. Thus, MCJIT was reused
for fetching function's bytes to fill the assembled_snippet field in
the benchmark report.

Later, the --mtriple and --benchmark-phase command line options were
introduced that are handy for testing snippet generation even if
snippet execution is not possible. In that setup, MCJIT is asked to
parse an object file for a foreign CPU or operating system that is
probably not guaranteed to succeed and was actually observed to fail
in https://reviews.llvm.org/D145763.

This commit implements a much simpler way of fetching the function's code,
assuming the benchmark function is the only function in the object file
and it spans across the entire text section (note that MCJIT-based code
has more or less the same assumption - see TrackingSectionMemoryManager
class).

~~~

Huawei RRI, OS Lab

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D148921
2023-06-16 10:38:52 +03:00
Pavel Kosov
8e0ee5ab9f [llvm-exegesis] Allow setting dump file name
This will be used for writing test cases.

~~

Huawei RRI, OS Lab

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D147700
2023-04-19 10:59:07 +03:00
Jie Fu
62a0049ae4 [llvm-exegesis] Fix -Wc++98-compat-extra-semi in BenchmarkRunner.cpp (NFC)
/data/llvm-project/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp:66:2: error: extra ';' outside of a function is incompatible with C++98 [-Werror,-Wc++98-compat-extra-semi]
};
 ^
1 error generated.
2023-04-14 15:46:09 +08:00
Aiden Grossman
d22805940a [llvm-exegesis] Refactor common parts out of FunctionExecutorImpl
This patch refactors some code out of FunctionExecutorImpl into the base
class that should be common across all implementations of
FunctionExecutor. Particularly, this patch factors out
accumulateCounterValues, and also factors out runAndSample, moving
implementation specific code into a new runWithCounter function. This
makes adding new implementations of FunctionExecutor easier.

Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D148079
2023-04-14 07:37:28 +00:00
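The refactoring in d22805940a follows the shape of a template method: shared sampling logic in the base class, with one implementation-specific hook. The sketch below borrows the names `runAndSample` and `runWithCounter` from the commit message, but the signatures, the `ExecutorBase` and `FixedValueExecutor` classes, and the sampling loop are assumptions made for illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical base class: the sampling loop lives here once, and subclasses
// only supply the implementation-specific measurement step.
class ExecutorBase {
public:
  virtual ~ExecutorBase() = default;

  // Shared logic: take NumSamples measurements and collect them.
  std::vector<std::int64_t> runAndSample(unsigned NumSamples) {
    std::vector<std::int64_t> Samples;
    Samples.reserve(NumSamples);
    for (unsigned I = 0; I < NumSamples; ++I)
      Samples.push_back(runWithCounter());
    return Samples;
  }

private:
  // Implementation-specific hook, overridden by each executor.
  virtual std::int64_t runWithCounter() = 0;
};

// Toy implementation that returns a fixed value instead of reading a counter.
class FixedValueExecutor : public ExecutorBase {
  std::int64_t runWithCounter() override { return 42; }
};
```

A new executor only has to override the hook; the loop and the accumulation of values are inherited unchanged.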
Aiden Grossman
999a8b8ce9 [llvm-exegesis][NFC] remove runAndMeasure
This completes the FIXME listed in FunctionExecutor in regards to
deprecating this function. It simply makes the appropriate call into
runAndSample and grabs the first counter value. This patch completely
removes the function, moving that logic into the callers (currently only
uopsBenchmarkRunner). This makes creating new FunctionExecutors easier
as an implementation no longer needs to worry about this detail.

Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D147878
2023-04-14 07:16:34 +00:00
Aiden Grossman
389bf5d870 [llvm-exegesis] Refactor InstructionBenchmark to Benchmark
When llvm-exegesis was first introduced, it only supported benchmarking
individual instructions, hence the name for the data structure storing
the data corresponding to a benchmark being called InstructionBenchmark
made sense. However, now that benchmarking arbitrary snippets is
supported, InstructionBenchmark doesn't correspond to a single
instruction. This patch refactors InstructionBenchmark to be called
Benchmark to clean up this little bit of technical debt.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D146884
2023-03-27 08:14:36 +00:00
Guillaume Chatelet
bb37cab8a5 [llvm-exegesis][NFC] Update benchmark phase naming to match documentation 2023-01-06 13:40:46 +00:00
Roman Lebedev
e0ad2af691
[exegesis] "Skip codegen" dry-run mode
While "skip measurements mode" is super useful for test coverage,
I've come to discover its trade-offs. It still calls the back-end
to actually codegen the target assembly, and that is what is taking
80%+ of the time regardless of whether or not we skip the measurements.

On the other hand, just being able to see that exegesis can come up
with a snippet to measure something, is already very useful,
and takes maybe a second for an all-opcode sweep.

Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D140702
2023-01-05 17:47:17 +03:00
Roman Lebedev
dbc76ef791
[NFC][llvm-exegesis] Benchmark: move DumpObjectToDisk handling into runConfiguration()
`getRunnableConfiguration()` may be executed in parallel,
and then the output would become even less useful.
2022-12-18 17:52:04 +03:00
Roman Lebedev
1dd4a6aac6
[NFC][llvm-exegesis] BenchmarkRunner: split runConfiguration() into getRunnableConfiguration() + runConfiguration()
We can run as many `getRunnableConfiguration()` in parallel as we want,
but `runConfiguration()` must be run *completely* standalone from everything.
This is a step towards enabling threading.
2022-12-18 04:23:20 +03:00
Roman Lebedev
118b49a09b
[NFCI][llvm-exegesis] BenchmarkRunner::runConfiguration(): extract assembleSnippet() helper 2022-12-17 23:14:53 +03:00
Roman Lebedev
41dd767fee
[NFC][llvm-exegesis] BenchmarkRunner::runConfiguration(): deduplicate DumpObjectToDisk handling
Always assemble into buffer, that is then optionally dumped into file.
2022-12-17 23:14:53 +03:00
Roman Lebedev
0db620aa30
[NFC][llvm-exegesis] BenchmarkRunner::runConfiguration(): reformat 2022-12-17 23:14:53 +03:00
Roman Lebedev
17e202424c
[NFCI][llvm-exegesis] Extract 'Min' repetition handling from BenchmarkRunner into its caller
If `BenchmarkRunner::runConfiguration()` deals with more than a single
repetitor, tasking will be less straight-forward to implement.
But I think dealing with that in its caller is even more readable.
2022-12-17 23:14:52 +03:00
Roman Lebedev
7a76140220
[llvm-exegesis] Dry run mode
Sometimes we only want to ensure that we can produce snippets (all the way
through `SnippetRepetitor`!), but don't care for the execution.
E.g. all of our tests are this way.

I've built LLVM without PFM and removed my CPU from `X86PfmCounters.td`,
and this produces the expected results in that configuration.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D139448
2022-12-07 20:15:43 +03:00
Roman Lebedev
78eaff2ef8
[llvm-exegesis] Loop unrolling for loop snippet repetitor mode
I really needed this, like, factually, yesterday,
when verifying dependency breaking idioms for AMD Zen 3 scheduler model.

Consider the following example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=duplicate
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-4a7e50.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 0.31025, per_snippet_value: 0.31025 }
error:           ''
info:            ''
assembled_snippet: C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C3
...

```
What does it tell us?
So wait, it can only execute ~3 x86 AVX YMM PXOR zero-idioms per cycle?
That doesn't seem right. That's even less than there are pipes supporting this type of op.

Now, second example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2418b5.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 1.00011, per_snippet_value: 1.00011 }
error:           ''
info:            ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```
Now that's just worse. Due to the looping, the throughput completely plummeted,
and now we can only do a single instruction/cycle!?

That's not great.
And final example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop --loop-body-size=1000
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c402e2.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 0.167087, per_snippet_value: 0.167087 }
error:           ''
info:            ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```

So if we merge the previous two approaches, duplicate this single-instruction snippet 1000x
(loop-body-size/instruction count in snippet), and run a loop with 1000 iterations
over that duplicated/unrolled snippet, the measured throughput goes through the roof,
up to 5.9 instructions/cycle, which finally tells us that this idiom is zero-cycle!

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D102522
2021-05-25 12:08:27 +03:00
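The arithmetic behind the three measurements in 78eaff2ef8 can be written out explicitly. The sketch treats the reported inverse throughput as cycles per instruction, so its reciprocal is instructions per cycle. The helper names are hypothetical, and the plain integer division is an assumption about how the loop body size and repetition count relate, not the tool's exact rounding.

```cpp
// Converts an inverse throughput (cycles per instruction) into a throughput
// (instructions per cycle).
double instructionsPerCycle(double InverseThroughput) {
  return 1.0 / InverseThroughput;
}

// Number of snippet copies placed in one loop body, assuming plain integer
// division of the loop body size by the snippet's instruction count.
unsigned copiesPerLoopBody(unsigned LoopBodySize, unsigned SnippetInstructions) {
  return LoopBodySize / SnippetInstructions;
}

// Number of loop iterations needed to reach the requested repetition count,
// under the same integer-division assumption.
unsigned loopIterations(unsigned NumRepetitions, unsigned LoopBodySize) {
  return NumRepetitions / LoopBodySize;
}
```

Applied to the values above, 0.31025 corresponds to roughly 3.2 instructions per cycle, 1.00011 to roughly 1, and 0.167087 to just under 6, while a loop body size of 1000 with one million repetitions of a single-instruction snippet gives 1000 copies per body and 1000 iterations.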
Kazu Hirata
441650d589 [tools] Use llvm::append_range (NFC) 2021-01-05 21:15:56 -08:00