llvm-project

Author	SHA1	Message	Date
Aiden Grossman	ceb196d990	[llvm-exegesis] Validate that address annotations are aligned (#75554 ) This patch adds in validation at two different levels that address annotations are page aligned. This is necessary as otherwise the mmap calls will fail as MAP_FIXED/MAP_FIXED_NOREPLACE require page aligned addresses. This happens silently in the subprocess. This patch adds validation at snippet parsing time to give feedback to the user and also adds asserts at code generation/address usage time to ensure that other users of the Exegesis APIs conform to the same requirements.	2023-12-15 09:45:30 -08:00
Aiden Grossman	3194928c3c	[llvm-exegesis] Refactor MMAP platform-specific preprocessor directives (#75422 ) This patch refactors the MMAP platform-specific preprocessor directives in llvm-exegesis to a single file instead of having duplicate code split across multiple files. These originally got introduced to get buildbots green again due to platform specific failures.	2023-12-14 12:07:46 -08:00
Abhina Sree	ec41462d7a	[SystemZ][z/OS] Add missing strnlen function for z/OS to fix build failures (#75339 ) This patch adds strnlen to the zOSSupport.h file to fix build failures in multiple files.	2023-12-13 13:13:53 -05:00
Aiden Grossman	5830e8e745	[llvm-exegesis] Use explicit error classes for different snippet crashes (#74210 ) This patch switches to using explicit snippet crashes that contain more information about the specific type of error (like the address for a segmentation fault) that occurred. All these new error classes inherit from SnippetExecutionFailure to allow for easily grabbing all of them in addition to filtering for specific types using the standard LLVM error primitives.	2023-12-11 23:15:56 -08:00
Clement Courbet	9017229ecd	[llvm-exegesis]Allow clients to do their own snippet running error handling. (#74711 ) Returns an error and a benchmark rather than an error or a benchmark. This allows users to have custom error handling while still being able to inspect the benchmark. Apart from this small API change, this is an NFC. This is an alternative to #74211.	2023-12-08 13:01:01 +01:00
Aiden Grossman	5058d738ba	[llvm-exegesis] Add MAP_FIXED_NOREPLACE definition MAP_FIXED_NOREPLACE doesn't exist on older kernels, so we need to define it to be MAP_FIXED.	2023-12-07 00:47:04 -08:00
Aiden Grossman	f1963fde9f	Reland "[llvm-exegesis] Add in snippet address annotation (#74218 )" This reverts commit 30d700117b772d94d8474ec56bd6f9cc423fc613. This relands commit 3ab41f912a6c219a93b87c257139822ea07c8863. When I was updating the patch to use llvm::to_integer, I only ran the lit tests and didn't run the unit tests, one of which started to fail. This patch fixes the broken unit test.	2023-12-07 00:20:24 -08:00
Aiden Grossman	30d700117b	Revert "[llvm-exegesis] Add in snippet address annotation (#74218 )" This reverts commit 3ab41f912a6c219a93b87c257139822ea07c8863. Unit tests break after recent changes. Will investigate/reland.	2023-12-06 11:25:03 -08:00
Aiden Grossman	3ab41f912a	[llvm-exegesis] Add in snippet address annotation (#74218 )	2023-12-06 11:05:33 -08:00
Kazu Hirata	c630f95f33	[llvm-exegesis] Remove unnecessary includes (NFC) Identified with clangd.	2023-12-05 23:28:09 -08:00
Kazu Hirata	06c5c27e44	[llvm-exegesis] Stop including array (NFC) Identified with clangd.	2023-12-05 20:58:17 -08:00
Aiden Grossman	077fe97736	[llvm-exegesis] Disable core dumps in subprocess (#74144 ) Core dumps are currently enabled within the llvm-exegesis subprocess executor. This can create a lot of core dumps when going through different snippets that might segfault when experimenting with memory annotations. These core dumps are not really needed as the information about the segfault is reported directly to the user.	2023-12-04 01:47:33 -08:00
Aiden Grossman	8a02b70324	[llvm-exegesis] Refactor ExecutableFunction to use a named constructor (#72837 ) This patch refactors ExecutableFunction to use a named constructor pattern, namely adding the create function, so that errors occurring during the creation of an ExecutableFunction can be propagated back up rather than having to deal with them in report_fatal_error.	2023-11-24 02:15:34 -08:00
Aiden Grossman	3300bc34f7	[llvm-exegesis] Fix race condition in subprocess mode (#72778 ) If there were some scheduler effects where something like the parent process got interrupted while the child process continued to run, there would be nothing blocking it from exiting before the parent process issued a PTRACE_ATTACH call. This would cause transient failures as this occurred pretty rarely. This patch removes the possibility of a transient failure by ensuring that the parent process attaches to the child process before sending the counter file descriptor through the socket, ensuring that the child process has at most progressed to being blocked in the read call for the counter file descriptor.	2023-11-20 01:10:42 -08:00
Aiden Grossman	9426416994	[llvm-exegesis] Add error handling for fork failures (#65186 ) There are still some transient failures on the clang-avx512 builder on the new subprocess memory tests. Some of them seem to be related to an inability to fork, but it's hard to debug currently as there is no explicit error handling for a failed fork call, and nice error reporting for a failed fork is something that we should have regardless.	2023-09-08 00:27:15 -07:00
Aiden Grossman	c4a769ba03	[llvm-exegesis] Print errno on failures in subprocess Some error logging in llvm-exegesis under the subprocess executor just prints a generic failure information rather than any details about the error as we omit printing the string version of errno. This patch adds in printing errno at all relevant points in the subprocess executor that were previously missed. Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D157682	2023-09-05 12:41:49 -07:00
Aiden Grossman	34e3bc0b92	[llvm-exegesis] Replace size_t with ssize_t where relevant Currently BenchmarkRunner.cpp stores the return code of recvmsg as size_t. Not only is this incorrect (as recvmsg returns ssize_t), but it also makes the error code check after the statement completely irrelevant as it checks if the number of bytes read is greater than zero (which will always be true for an unsigned type).	2023-08-22 23:44:05 -07:00
Guillaume Chatelet	f70e83af7a	[llvm-exegesis] Don't try to use SYS_rseq if it's not defined. When compiling against recent glibc (>= 2.35) but old kernel headers (< 4.18), `SYS_rseq` is not defined and thus llvm-exegesis fails to build. So also check that `SYS_rseq` is defined before trying to use it. Fixes https://github.com/llvm/llvm-project/issues/64456 Reviewed By: MaskRay, gchatelet Differential Revision: https://reviews.llvm.org/D157189	2023-08-07 07:32:44 +00:00
Markus Böck	822c31a0fe	[llvm-exegesis] Guard `__builtin_thread_pointer` behind a configure check Due to arguably a bug in GCC[0], using `__has_builtin` is not sufficient to check whether `__builtin_thread_pointer` can actually be compiled by GCC. This makes it impossible to compile LLVM with `llvm-exegesis` enabled with e.g. GCC 10 as it does have the builtin, but no implementation for architectures such as x86. This patch works around this issue by making it a cmake configure check whether the builtin can be compiled and used, rather than relying on the broken preprocessor macro. [0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96952, demonstration: https://godbolt.org/z/9z5nWM6Ef Differential Revision: https://reviews.llvm.org/D155828	2023-07-21 08:03:26 +02:00
Aiden Grossman	f3dfcc5053	[llvm-exegesis] Support older kernel versions in subprocess executor This patch switches to moving the performance counter file descriptor to the child process using socket calls rather than using the pidfd_getfd system call, which was introduced in kernel 5.6. This significantly expands the range of kernel versions that are supported. Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D154275	2023-07-18 10:42:45 -07:00
Fangrui Song	8f90a5cc45	[llvm-exegesis] Guard __builtin_thread_pointer use with __has_builtin While Clang targets have supported __builtin_thread_pointer for a very long time (e.g., 2007 for AArch32, 2015 for AArch64), for some GCC ports, the support is very new (11.0 for x86[1], while we need to support GCC 7), and many ports haven't implemented __builtin_thread_pointer yet (m68k, powerpc, etc). [1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96955	2023-07-17 16:42:11 -07:00
Aiden Grossman	2dcba67007	[llvm-exegesis] Remove unnecessary includes Left some includes around from debugging the last patch that I forgot to take out, so here's the patch taking them out.	2023-07-06 19:58:41 -07:00
Aiden Grossman	75b5541fe5	[llvm-exegesis] Switch to using PTRACE_ATTACH instead of PTRACE_SEIZE This patch switches from using PTRACE_SEIZE to using PTRACE_ATTACH within the subprocess benchmark runner for llvm-exegesis, as PTRACE_SEIZE was introduced in Linux kernel 3.4. Some LLVM users were reporting build failures as they are using kernel versions older than 3.4 (such as on CentOS/RHEL 6), hence the patch.	2023-07-06 19:40:19 -07:00
Fangrui Song	46b5b85548	[llvm-exegesis] Adjust GLIBC_INITS_RSEQ condition Commit 9f80831f3627e800709e2434bbbd5bb179b1576e introduced `#include <sys/rseq.h>`, but RSEQ_SIG is only defined by some glibc ports (aarch64,arm,mips,powerpc,s390,x86), causing other hosts (e.g., riscv64, loongarch64) to fail to build. Reviewed By: aidengrossman, xen0n Differential Revision: https://reviews.llvm.org/D153938	2023-06-28 00:23:38 -07:00
Aiden Grossman	9b684ecde6	[llvm-exegesis] Fix warning and hoist statement of arch-specific section My last patch broke most of the builders that aren't currently running at least Kernel 5.6 as there was a variable used later on inside a region that required that kernel version. Also fixes a minor warning left over from a bad merge.	2023-06-27 07:01:20 +00:00
Aiden Grossman	9f80831f36	[llvm-exegesis] Add support for using memory annotations This patch adds in support for using memory annotations in the subprocess execution mode.	2023-06-27 06:52:33 +00:00
Aiden Grossman	e802dff0f0	[llvm-exegesis] Introduce Subprocess Executor Mode This patch introduces the subprocess executor mode. Currently, this new mode doesn't do anything fancy, just executing the same code that the inprocess executor would do, but within a subprocess. This sets up the ability to add in many more memory-related features in the future. Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D151021	2023-06-26 01:43:19 +00:00
Aiden Grossman	309950515c	Revert "[llvm-exegesis] Add ability to assign perf counters to specific PID" Revert "[llvm-exegesis] Introduce Subprocess Executor Mode" This reverts commit 5e9173c43a9b97c8614e36d6f754317f731e71e9. This reverts commit 4d618b52f6e05e41d35f56653cb36bf7d4dc794e. Reverting the PID commit as it is currently breaking MinGW builds and the way I'm checking for the presence of pid_t needs to be fixed and I need to do some testing. The subprocess executor mode patch is a dependent patch so also needs to be reverted and also needs some work as it is currently failing tests where libpfm is installed and the kernel version is less than 5.6.	2023-06-22 18:05:01 +00:00
Aiden Grossman	4d618b52f6	[llvm-exegesis] Introduce Subprocess Executor Mode This patch introduces the subprocess executor mode. Currently, this new mode doesn't do anything fancy, just executing the same code that the inprocess executor would do, but within a subprocess. This sets up the ability to add in many more memory-related features in the future.	2023-06-21 07:55:28 +00:00
Aiden Grossman	08aeb7c35d	Revert "[llvm-exegesis] Introduce Subprocess Executor Mode" This reverts commit 0d4ef4ff01addbb40b9122a00d6b2f23104cbb3b. This was causing build failures on certain platforms when built with -Werror due to unused variable warnings in addition to causing build failures on Linux systems with older kernel versions as kernels prior to v5.15 don't support sys_pidfd_getpid. Reverting as I need to setup a system to properly test the rest of the patches in this series. Also reverts 8c6668fa42dba59ddc286ba256d71c1b9c5228b8 which fixed the first issue so that the patch can actually be reverted.	2023-06-21 02:29:48 +00:00
Jie Fu	8c6668fa42	[llvm-exegesis] Fix -Wunused-variable in BenchmarkRunner.cpp (NFC) /data/llvm-project/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp:275:9: error: unused variable 'ParentPIDFD' [-Werror,-Wunused-variable] int ParentPIDFD = syscall(SYS_pidfd_open, ParentPID, 0); ^ 1 error generated.	2023-06-21 10:20:36 +08:00
Aiden Grossman	0d4ef4ff01	[llvm-exegesis] Introduce Subprocess Executor Mode This patch introduces the subprocess executor mode. Currently, this new mode doesn't do anything fancy, just executing the same code that the inprocess executor would do, but within a subprocess. This sets up the ability to add in many more memory-related features in the future. Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D151021	2023-06-21 02:00:13 +00:00
Aiden Grossman	72df12cce2	[llvm-exegesis] Refactor FunctionExecutorImpl and create factory In order to better support adding in new implementations of FunctionExecutor, this patch makes some small changes so that it is easier to add new ones in. FunctionExecutorImpl is renamed to InProcessFunctionExecutorImpl to better reflect how it will be placed relative to the soon-to-be introduced subprocess executor and a new function is created to create executors so selection can be done more easily. In addition, a new CLI flag, -execution-mode, is added, which can be used to select between the different executors. Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D151019	2023-06-21 00:04:48 +00:00
Pavel Kosov	27f37db76a	[llvm-exegesis] Use MCJIT only for execution Initially, llvm-exegesis was generating the benchmark code for the host CPU to execute it inside its own process. Thus, MCJIT was reused for fetching function's bytes to fill the assembled_snippet field in the benchmark report. Later, the --mtriple and --benchmark-phase command line options were introduced that are handy for testing snippet generation even if snippet execution is not possible. In that setup, MCJIT is asked to parse an object file for a foreign CPU or operating system that is probably not guaranteed to succeed and was actually observed to fail in https://reviews.llvm.org/D145763. This commit implements a much simplified function's code fetching, assuming the benchmark function is the only function in the object file and it spans across the entire text section (note that MCJIT-based code has more or less the same assumption - see TrackingSectionMemoryManager class). ~~~ Huawei RRI, OS Lab Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D148921	2023-06-16 10:38:52 +03:00
Pavel Kosov	8e0ee5ab9f	[llvm-exegesis] Allow setting dump file name This will be used for writing test cases. ~~ Huawei RRI, OS Lab Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D147700	2023-04-19 10:59:07 +03:00
Jie Fu	62a0049ae4	[llvm-exegesis] Fix -Wc++98-compat-extra-semi in BenchmarkRunner.cpp (NFC) /data/llvm-project/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp:66:2: error: extra ';' outside of a function is incompatible with C++98 [-Werror,-W c++98-compat-extra-semi] }; ^ 1 error generated.	2023-04-14 15:46:09 +08:00
Aiden Grossman	d22805940a	[llvm-exegesis] Refactor common parts out of FunctionExecutorImpl This patch refactors some code out of FunctionExecutorImpl into the base class that should be common across all implementations of FunctionExecutor. Particularly, this patch factors out accumulateCounterValues, and also factors out runAndSample, moving implementation specific code into a new runWithCounter function. This makes adding new implementations of FunctionExecutor easier. Reviewed By: gchatelet Differential Revision: https://reviews.llvm.org/D148079	2023-04-14 07:37:28 +00:00
Aiden Grossman	999a8b8ce9	[llvm-exegesis][NFC] remove runAndMeasure This completes the FIXME listed in FunctionExecutor in regards to deprecating this function. It simply makes the appropriate call into runAndSample and grabs the first counter value. This patch completely removes the function, moving that logic into the callers (currently only uopsBenchmarkRunner). This makes creating new FunctionExecutors easier as an implementation no longer needs to worry about this detail. Reviewed By: gchatelet Differential Revision: https://reviews.llvm.org/D147878	2023-04-14 07:16:34 +00:00
Aiden Grossman	389bf5d870	[llvm-exegesis] Refactor InstructionBenchmark to Benchmark When llvm-exegesis was first introduced, it only supported benchmarking individual instructions, hence the name for the data structure storing the data corresponding to a benchmark being called InstructionBenchmark made sense. However, now that benchmarking arbitrary snippets is supported, InstructionBenchmark doesn't correspond to a single instruction. This patch refactors InstructionBenchmark to be called Benchmark to clean up this little bit of technical debt. Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D146884	2023-03-27 08:14:36 +00:00
Guillaume Chatelet	bb37cab8a5	[llvm-exegesis][NFC] Update benchmark phase naming to match documentation	2023-01-06 13:40:46 +00:00
Roman Lebedev	e0ad2af691	[exegesis] "Skip codegen" dry-run mode While "skip measurements mode" is super useful for test coverage, i've come to discover its trade-offs. It still calls back-end to actually codegen the target assembly, and that is what is taking 80%+ of the time regardless of whether or not we skip the measurements. On the other hand, just being able to see that exegesis can come up with a snippet to measure something, is already very useful, and takes maybe a second for an all-opcode sweep. Reviewed By: gchatelet Differential Revision: https://reviews.llvm.org/D140702	2023-01-05 17:47:17 +03:00
Roman Lebedev	dbc76ef791	[NFC][llvm-exegesis] Benchmark: move DumpObjectToDisk handling into `runConfiguration()` `getRunnableConfiguration()` may be executed in parallel, and then the output would become even less useful.	2022-12-18 17:52:04 +03:00
Roman Lebedev	1dd4a6aac6	[NFC][llvm-exegesis] `BenchmarkRunner`: split `runConfiguration()` into `getRunnableConfiguration()` + `runConfiguration()` We can run as many `getRunnableConfiguration()` in parallel as we want, but `runConfiguration()` must be run completely standalone from everything. This is a step towards enabling threading.	2022-12-18 04:23:20 +03:00
Roman Lebedev	118b49a09b	[NFCI][llvm-exegesis] `BenchmarkRunner::runConfiguration()`: extract `assembleSnippet()` helper	2022-12-17 23:14:53 +03:00
Roman Lebedev	41dd767fee	[NFC][llvm-exegesis] `BenchmarkRunner::runConfiguration()`: deduplicate `DumpObjectToDisk` handling Always assemble into buffer, that is then optionally dumped into file.	2022-12-17 23:14:53 +03:00
Roman Lebedev	0db620aa30	[NFC][llvm-exegesis] `BenchmarkRunner::runConfiguration()`: reformat	2022-12-17 23:14:53 +03:00
Roman Lebedev	17e202424c	[NFCI][llvm-exegesis] Extract 'Min' repetition handling from `BenchmarkRunner` into its caller If `BenchmarkRunner::runConfiguration()` deals with more than a single repetitor, tasking will be less straight-forward to implement. But i think dealing with that in its callee is even more readable.	2022-12-17 23:14:52 +03:00
Roman Lebedev	7a76140220	[llvm-exegesis] Dry run mode Sometimes we only want to ensure that we can produce snippets (all the way through `SnippetRepetitor`!), but don't care for the execution. E.g. all of our tests are this way. I've built LLVM without PFM and removed my CPU from `X86PfmCounters.td`, and this produces the expected results in that configuration. Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D139448	2022-12-07 20:15:43 +03:00
Roman Lebedev	78eaff2ef8	[llvm-exegesis] Loop unrolling for loop snippet repetitor mode I really needed this, like, factually, yesterday, when verifying dependency breaking idioms for AMD Zen 3 scheduler model. Consider the following example: ``` $ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=duplicate Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-4a7e50.o --- mode: inverse_throughput key: instructions: - 'VPXORYrr YMM0 YMM0 YMM0' config: '' register_initial_values: [] cpu_name: znver3 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 1000000 measurements: - { key: inverse_throughput, value: 0.31025, per_snippet_value: 0.31025 } error: '' info: '' assembled_snippet: C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C3 ... ``` What does it tell us? So wait, it can only execute ~3 x86 AVX YMM PXOR zero-idioms per cycle? That doesn't seem right. That's even less than there are pipes supporting this type of op. Now, second example: ``` $ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2418b5.o --- mode: inverse_throughput key: instructions: - 'VPXORYrr YMM0 YMM0 YMM0' config: '' register_initial_values: [] cpu_name: znver3 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 1000000 measurements: - { key: inverse_throughput, value: 1.00011, per_snippet_value: 1.00011 } error: '' info: '' assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3 ... ``` Now that's just worse. Due to the looping, the throughput completely plummeted, and now we can only do a single instruction/cycle!? That's not great. 
And final example: ``` $ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop --loop-body-size=1000 Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c402e2.o --- mode: inverse_throughput key: instructions: - 'VPXORYrr YMM0 YMM0 YMM0' config: '' register_initial_values: [] cpu_name: znver3 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 1000000 measurements: - { key: inverse_throughput, value: 0.167087, per_snippet_value: 0.167087 } error: '' info: '' assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3 ... ``` So if we merge the previous two approaches, do duplicate this single-instruction snippet 1000x (loop-body-size/instruction count in snippet), and run a loop with 1000 iterations over that duplicated/unrolled snippet, the measured throughput goes through the roof, up to 5.9 instructions/cycle, which finally tells us that this idiom is zero-cycle! Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D102522	2021-05-25 12:08:27 +03:00
Kazu Hirata	441650d589	[tools] Use llvm::append_range (NFC)	2021-01-05 21:15:56 -08:00
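
Commit ceb196d990 in the log above describes validating that address annotations are page aligned, because mmap calls with MAP_FIXED/MAP_FIXED_NOREPLACE require page-aligned addresses. A minimal sketch of such a check follows. The helper name `isPageAligned` and its signature are assumptions made for illustration, not the actual llvm-exegesis code.

```cpp
#include <cstdint>

// Hypothetical helper for illustration; not the actual llvm-exegesis API.
// Returns true when Address is a multiple of PageSize, and false when
// PageSize is zero. On Linux the page size would typically be obtained
// from sysconf(_SC_PAGESIZE).
bool isPageAligned(std::uint64_t Address, std::uint64_t PageSize) {
  return PageSize != 0 && Address % PageSize == 0;
}
```

With a 4096-byte page, an address such as 0x20000 passes this check while 0x20001 does not. Rejecting the latter at snippet parsing time gives the user feedback up front instead of a silent mmap failure in the subprocess.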


122 Commits