This reverts commit 2cd20c255684257b86940bdda6861897f0bf3c00.
This relands commit 9886788a8a500a1b429a6db64397c849b112251c.
This was causing more buildbot failures due to getcpu not being
available with glibc versions older than 2.29. This patch fixes that by
directly making the syscall, assuming the syscall number macro is
available.
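For illustration, a minimal sketch of such a fallback, assuming Linux and that <sys/syscall.h> defines SYS_getcpu. The helper name getCurrentCPU is hypothetical and this is not the code from the patch; the preprocessor guard also shows how unsupported platforms can be special-cased:

```cpp
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical helper: returns the CPU the calling thread is running on,
// or -1 if it cannot be determined. It issues the raw syscall instead of
// calling the getcpu() wrapper, which older glibc versions do not provide.
int getCurrentCPU() {
#if defined(__linux__) && defined(SYS_getcpu)
  unsigned CPU = 0;
  unsigned Node = 0;
  // The third (cache) argument is unused by modern kernels.
  if (syscall(SYS_getcpu, &CPU, &Node, nullptr) != 0)
    return -1;
  return static_cast<int>(CPU);
#else
  // No syscall number available: fail at run time rather than at build time.
  return -1;
#endif
}
```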
This reverts commit 5e3d48a68096a0017a0fa4bb89f2d48767c8a7e4.
This relands commit 9886788a8a500a1b429a6db64397c849b112251c.
This was originally causing build failures on more esoteric platforms
that have different definitions of getcpu. This is only intended to be
supported on x86-64 currently, so just use preprocessor definitions to
special-case the function.
This patch adds support for pinning a benchmarking process to a
specific CPU (in the subprocess benchmarking mode on Linux). This is
intended to be used in environments where a certain set of CPUs is
isolated from the scheduler using something like cgroups and thus should
present less potential for noise than normal. This also opens the
door for doing multithreaded benchmarking as we can now pin benchmarking
processes to specific CPUs that we know won't interfere with each other.
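A minimal sketch of pinning, assuming Linux and glibc's sched_setaffinity(2); the helper names pinToCPU and firstAllowedCPU are hypothetical. Choosing the CPU from the current affinity mask avoids picking one that cgroups or cpusets have already taken away from the process:

```cpp
#include <sched.h>

// Hypothetical helper: restricts the calling thread to exactly one CPU.
// Returns true on success.
bool pinToCPU(int CPU) {
  cpu_set_t Set;
  CPU_ZERO(&Set);
  CPU_SET(CPU, &Set);
  // A pid of 0 means "the calling thread".
  return sched_setaffinity(0, sizeof(Set), &Set) == 0;
}

// Hypothetical helper: returns the lowest-numbered CPU the calling thread
// is currently allowed to run on, or -1 on error.
int firstAllowedCPU() {
  cpu_set_t Set;
  CPU_ZERO(&Set);
  if (sched_getaffinity(0, sizeof(Set), &Set) != 0)
    return -1;
  for (int I = 0; I < CPU_SETSIZE; ++I)
    if (CPU_ISSET(I, &Set))
      return I;
  return -1;
}
```

After a successful pinToCPU(C), sched_getcpu() reports C for the calling thread, since C is the only CPU it may be scheduled on.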
llvm-exegesis currently links and initializes all targets, even though
most of them are not supported by llvm-exegesis. This is particularly
unfortunate because llvm-exegesis does not support the LLVM dylib, so
llvm-exegesis essentially ends up doing a complete relink of all of
LLVM, which is not fun if you use LTO.
Instead, only link and initialize the targets that are part of
LLVM_EXEGESIS_TARGETS.
This patch replaces an explicit use of dbgs() followed by exit() when
logging an error with ExitOnErr, as this is the canonical way to do
this within exegesis.
This patch adds an LLVM-EXEGESIS-LOOP-REGISTER snippet annotation which
allows a user to specify the register to use for the loop counter in the
loop repetition mode. This allows for executing snippets that don't work
with the default value (currently R8 on X86).
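A hypothetical sketch of recognising the annotation, assuming it is written as an assembly comment of the form `# LLVM-EXEGESIS-LOOP-REGISTER <reg>` like the other exegesis snippet annotations. The function name is made up and the real parser is not reproduced here:

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical sketch: extracts the register name from a snippet line such
// as "# LLVM-EXEGESIS-LOOP-REGISTER R11". Returns std::nullopt for any
// other line, including an annotation with no register.
std::optional<std::string> parseLoopRegister(const std::string &Line) {
  static const std::string Marker = "LLVM-EXEGESIS-LOOP-REGISTER";
  std::istringstream Stream(Line);
  std::string Hash, Directive, Reg;
  if (!(Stream >> Hash >> Directive >> Reg))
    return std::nullopt;
  if (Hash != "#" || Directive != Marker)
    return std::nullopt;
  return Reg;
}
```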
This patch removes the exegesis:: prefix within the exegesis namespace
in llvm-exegesis.cpp as it isn't necessary due to the code already being
wrapped in the namespace.
…table.
All data is derived from a single table rather than being spread out
over an enum, a table and the main entry point.
This is intended as a replacement for #82092.
This patch replaces --num-repetitions with --min-instructions to make it
more clear that the value refers to the minimum number of instructions
in the final assembled snippet rather than the number of repetitions of
the snippet. This patch also refactors some llvm-exegesis internal
variable names to reflect the name change.
Fixes #76890.
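To make the distinction concrete, a sketch assuming a duplicate-style repetitor that emits whole copies of the snippet until the minimum is reached (the names are hypothetical). A 3-instruction snippet with --min-instructions=10000 is then copied 3334 times, giving 10002 instructions rather than 10000 repetitions:

```cpp
#include <cstdint>

// Hypothetical result of repeating a snippet.
struct Repetition {
  uint64_t Copies;            // Number of times the snippet is emitted.
  uint64_t TotalInstructions; // Instructions in the final assembled code.
};

// Emits whole copies of the snippet until at least MinInstructions
// instructions exist. SnippetSize must be non-zero.
Repetition repeatSnippet(uint64_t SnippetSize, uint64_t MinInstructions) {
  // Ceiling division: the fewest whole copies that reach the minimum.
  uint64_t Copies = (MinInstructions + SnippetSize - 1) / SnippetSize;
  return {Copies, Copies * SnippetSize};
}
```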
This patch adds two new repetition modes to llvm-exegesis, specifically
loop and duplicate variants of what I am terming the middle half
repetition mode. The middle half repetition mode essentially runs each
measurement twice, with one run using twice as many iterations as the
other. These two measurements are then aggregated by taking their
difference. This subtracts away any setup/overhead that is unrelated to
the code in the snippet, providing more accurate results.
Using this mode on a couple of toy examples, I am able to get exact
(integer) throughput values on all of them, in contrast to the default
duplicate/loop repetition modes, which show a little bit of noise on the
snippet value.
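A sketch of the arithmetic, assuming each measurement can be modelled as a fixed overhead plus a per-iteration cost (the function name is hypothetical). With an overhead of 137 cycles, 2 cycles per iteration and N = 1000, the two runs measure 2137 and 4137 cycles, and their difference divided by N gives exactly 2:

```cpp
#include <cstdint>

// Hypothetical sketch of "middle half" aggregation. Each run is modelled as
//   Measured(N) = Overhead + N * PerIteration,
// so subtracting the N-iteration run from the 2N-iteration run cancels the
// fixed Overhead and leaves the cost of exactly N iterations.
double middleHalfPerIteration(double MeasuredN, double Measured2N,
                              uint64_t N) {
  return (Measured2N - MeasuredN) / static_cast<double>(N);
}
```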
This patch removes the llvm:: prefix within llvm-exegesis where it is
not necessary. This covers most occurrences of the prefix within exegesis, as
exegesis is within the llvm namespace. This patch makes things more
consistent as the vast majority of the code did not use the llvm::
prefix for anything.
This patch adds support for additional types of validation counters and
also adds mappings between these new validation counter types and
physical counters on the hardware for microarchitectures that I have the
ability to test on.
This patch adds support for validation counters. Validation counters can
be used to measure events that occur during snippet execution like cache
misses to ensure that certain assumed invariants about the benchmark
actually hold. Validation counters are set up within a perf event group,
so they are turned on and off at exactly the same time as the "group leader"
counter that measures the desired value.
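A minimal sketch of the grouping idea using the Linux perf_event_open(2) interface directly, not the llvm-exegesis counter classes. It assumes software events are permitted (perf_event_paranoid settings and container seccomp policies may forbid them) and uses task-clock and page faults purely as stand-ins for a measured counter and a validation counter; the function names are hypothetical:

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <cstdint>
#include <cstring>

// Opens a user-space-only software counter on the calling process.
// GroupFd == -1 creates a new group leader; otherwise the counter joins
// the leader's group.
static int openCounter(uint64_t Config, int GroupFd) {
  perf_event_attr Attr;
  std::memset(&Attr, 0, sizeof(Attr));
  Attr.type = PERF_TYPE_SOFTWARE;
  Attr.size = sizeof(Attr);
  Attr.config = Config;
  Attr.disabled = GroupFd == -1 ? 1 : 0; // Only the leader starts disabled.
  Attr.exclude_kernel = 1;
  Attr.exclude_hv = 1;
  Attr.read_format = PERF_FORMAT_GROUP; // One read() returns all members.
  return static_cast<int>(
      syscall(SYS_perf_event_open, &Attr, 0, -1, GroupFd, 0));
}

// Hypothetical sketch: counts task-clock (the group leader, i.e. the value
// being measured) together with page faults as a validation counter.
// Returns false if perf events are unavailable in this environment.
bool measureWithValidation(void (*Workload)(), uint64_t &TaskClockNs,
                           uint64_t &PageFaults) {
  int Leader = openCounter(PERF_COUNT_SW_TASK_CLOCK, -1);
  if (Leader < 0)
    return false;
  int Member = openCounter(PERF_COUNT_SW_PAGE_FAULTS, Leader);
  if (Member < 0) {
    close(Leader);
    return false;
  }
  // PERF_IOC_FLAG_GROUP applies each ioctl to the whole group, so both
  // counters are enabled and disabled together.
  ioctl(Leader, PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP);
  ioctl(Leader, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
  Workload();
  ioctl(Leader, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);

  // With PERF_FORMAT_GROUP the layout is { u64 nr; u64 values[nr]; },
  // with the leader's value first.
  uint64_t Buf[3] = {0, 0, 0};
  ssize_t Bytes = read(Leader, Buf, sizeof(Buf));
  close(Member);
  close(Leader);
  if (Bytes != static_cast<ssize_t>(sizeof(Buf)) || Buf[0] != 2)
    return false;
  TaskClockNs = Buf[1];
  PageFaults = Buf[2];
  return true;
}
```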
This patch refactors InstrBenchmark to BenchmarkResult. Most of the
renaming away from things prefixed with Instr was performed in a
previous commit, but this specific instance was missed.
This patch switches to using explicit snippet crash errors that contain more
information about the specific type of error (like the address for a
segmentation fault) that occurred. All these new error classes inherit
from SnippetExecutionFailure to allow for easily grabbing all of them in
addition to filtering for specific types using the standard LLVM error
primitives.
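The shape of such a hierarchy can be sketched in plain C++. This uses ordinary virtual classes and dynamic_cast rather than LLVM's ErrorInfo/handleErrors machinery, and every name other than SnippetExecutionFailure is hypothetical:

```cpp
#include <cstdint>
#include <string>

// Common base for every kind of snippet failure.
struct SnippetExecutionFailure {
  virtual ~SnippetExecutionFailure() = default;
  virtual std::string message() const = 0;
};

// A crash that carries the faulting address.
struct SnippetSegmentationFault : SnippetExecutionFailure {
  explicit SnippetSegmentationFault(uintptr_t Address) : Address(Address) {}
  std::string message() const override {
    return "segmentation fault at address " + std::to_string(Address);
  }
  uintptr_t Address;
};

// Any other fatal signal.
struct SnippetSignal : SnippetExecutionFailure {
  explicit SnippetSignal(int Signal) : Signal(Signal) {}
  std::string message() const override {
    return "terminated by signal " + std::to_string(Signal);
  }
  int Signal;
};

// Callers can handle every failure through the base class, or test for one
// specific kind when they need its extra information.
bool isSegfaultAt(const SnippetExecutionFailure &F, uintptr_t Address) {
  auto *Segv = dynamic_cast<const SnippetSegmentationFault *>(&F);
  return Segv && Segv->Address == Address;
}
```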
…ndling.
Returns an error *and* a benchmark rather than an error *or* a
benchmark. This allows users to have custom error handling while still
being able to inspect the benchmark.
Apart from this small API change, this patch is NFC.
This is an alternative to #74211.
Currently, the llvm-exegesis LatencyBenchmarkRunner repeats the
benchmark several times (currently 30) and then aggregates the result to
deal with noise in the measurement process. With this patch, the number
of repetitions to perform is made configurable rather than left as a
static number. This allows for significantly faster execution in
situations where someone is performing a task like experimenting with
memory annotations where the exact cycle counts might not be useful, and
also allows for increased precision when desired.
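A sketch of making the repetition count a parameter, assuming the runner keeps the minimum over repetitions (one common choice for latency, since noise tends to only add cycles). The function name is hypothetical and the patch itself may aggregate differently:

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <limits>

// Hypothetical sketch: repeat a measurement NumRepetitions times (which
// must be at least 1) and keep the smallest value observed.
int64_t minOverRepetitions(const std::function<int64_t()> &MeasureOnce,
                           unsigned NumRepetitions) {
  int64_t Best = std::numeric_limits<int64_t>::max();
  for (unsigned I = 0; I < NumRepetitions; ++I)
    Best = std::min(Best, MeasureOnce());
  return Best;
}
```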
This reverts commit 30d700117b772d94d8474ec56bd6f9cc423fc613.
This relands commit 3ab41f912a6c219a93b87c257139822ea07c8863.
When I was updating the patch to use llvm::to_integer, I only ran the
lit tests and didn't run the unit tests, one of which started to fail.
This patch fixes the broken unit test.
Currently, when using the subprocess execution mode together with dummy
perf counters, cryptic errors are printed about being unable to send the
file descriptor through a socket. This patch prints out an explicit
message that this configuration isn't (currently) supported.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D157686
When generating snippets for AArch64 with --opcode-index=-1, the code
generator asserts on opcodes that are not supported according to CPU
features.
The same assertion can be triggered even when generating a serial
snippet for a supported opcode if SERIAL_VIA_NON_MEMORY_INSTR execution
mode is used and an unsupported instruction is chosen as the "other
instruction". Unlike the first case, this one may result in flaky
failures because the other instruction is randomly chosen from the
instructions suitable for serializing execution.
This patch adjusts the TableGen emitter for *GenInstrInfo.inc to make it
possible to query for opcode availability instead of just asserting on
unsupported ones.
~~
Huawei RRI, OS Lab
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D146303
This patch adds memory annotation parsing to llvm-exegesis. The memory
annotations cannot be used currently, but this allows for using parsed
memory annotations within a FunctionExecutorImpl to set up a specified
execution environment.
This patch introduces the subprocess executor mode. Currently, this new
mode doesn't do anything fancy, just executing the same code that the
inprocess executor would do, but within a subprocess. This sets up the
ability to add in many more memory-related features in the future.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D151021
Revert "[llvm-exegesis] Introduce Subprocess Executor Mode"
This reverts commit 5e9173c43a9b97c8614e36d6f754317f731e71e9.
This reverts commit 4d618b52f6e05e41d35f56653cb36bf7d4dc794e.
Reverting the PID commit as it is currently breaking MinGW builds, the
way I'm checking for the presence of pid_t needs to be fixed, and I
need to do some testing. The subprocess executor mode patch is a
dependent patch, so it also needs to be reverted. It also needs some
work, as it is currently failing tests where libpfm is installed and the
kernel version is less than 5.6.
This patch introduces the subprocess executor mode. Currently, this new
mode doesn't do anything fancy, just executing the same code that the
inprocess executor would do, but within a subprocess. This sets up the
ability to add in many more memory-related features in the future.
This reverts commit 0d4ef4ff01addbb40b9122a00d6b2f23104cbb3b.
This was causing build failures on certain platforms when built with
-Werror due to unused variable warnings in addition to causing build
failures on Linux systems with older kernel versions, as kernels prior to
v5.6 don't support the pidfd_getfd syscall. Reverting as I need to set up
a system to properly test the rest of the patches in this series.
Also reverts 8c6668fa42dba59ddc286ba256d71c1b9c5228b8 which fixed the
first issue so that the patch can actually be reverted.
This patch introduces the subprocess executor mode. Currently, this new
mode doesn't do anything fancy, just executing the same code that the
inprocess executor would do, but within a subprocess. This sets up the
ability to add in many more memory-related features in the future.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D151021
In order to better support adding new implementations of
FunctionExecutor, this patch makes some small changes so that it is
easier to add new ones. FunctionExecutorImpl is renamed to
InProcessFunctionExecutorImpl to better reflect how it will be placed
relative to the soon-to-be-introduced subprocess executor, and a new
function is added to create executors so that selection can be done more
easily. In addition, this patch adds a new CLI flag, -execution-mode,
which can be used to select between the different executors.
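A sketch of the factory pattern this describes. The enum, class and function names are illustrative only and do not match the llvm-exegesis sources:

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical values an -execution-mode style flag could map to.
enum class ExecutionMode { InProcess, SubProcess };

struct Executor {
  virtual ~Executor() = default;
  virtual std::string name() const = 0;
};

struct InProcessExecutor : Executor {
  std::string name() const override { return "inprocess"; }
};

struct SubProcessExecutor : Executor {
  std::string name() const override { return "subprocess"; }
};

// The single place where a new executor kind has to be registered.
std::unique_ptr<Executor> createExecutor(ExecutionMode Mode) {
  switch (Mode) {
  case ExecutionMode::InProcess:
    return std::make_unique<InProcessExecutor>();
  case ExecutionMode::SubProcess:
    return std::make_unique<SubProcessExecutor>();
  }
  throw std::invalid_argument("unknown execution mode");
}
```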
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D151019
Performance counters may be unavailable due to various reasons (such as
access restriction via sysctl properties or the CPU model being unknown
to libpfm). On the other hand, for debugging llvm-exegesis itself it is
still useful to be able to run generated code snippets to ensure that
the snippet does not crash at run time.
The --use-dummy-perf-counters command line option makes llvm-exegesis
behave just as usual except for using fake event counts instead of asking
the kernel for actual values.
~~
Huawei RRI, OS Lab
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D146301
When llvm-exegesis was first introduced, it only supported benchmarking
individual instructions, hence naming the data structure that stores the
data corresponding to a benchmark InstructionBenchmark made sense.
However, now that benchmarking arbitrary snippets is supported,
InstructionBenchmark no longer corresponds to a single instruction. This
patch renames InstructionBenchmark to Benchmark to clean up this little
bit of technical debt.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D146884
The forwarding header is left in place because of its use in
`polly/lib/External/isl/interface/extract_interface.cc`, which is part of
`isl` and is included from there by Polly, but I have added a GCC warning
noting that the header is deprecated.
While "skip measurements mode" is super useful for test coverage,
I've come to discover its trade-offs. It still calls the back end
to actually codegen the target assembly, and that is what takes
80%+ of the time regardless of whether or not we skip the measurements.
On the other hand, just being able to see that exegesis can come up
with a snippet to measure something is already very useful,
and takes maybe a second for an all-opcode sweep.
Reviewed By: gchatelet
Differential Revision: https://reviews.llvm.org/D140702
By default, all benchmark results are analysed, but sometimes it may be useful
to only look at those that do not involve memory, or vice versa. This option
allows the user to either keep all benchmarks, filter out (ignore) all the
ones that do involve memory (involve instructions that may read or write to
memory), or do the opposite and only keep such benchmarks.
Personally, so far I have found the benchmarks that do involve memory
to have dubious results, but the ones that do not involve memory
are generally actionable. So I would like to have a toggle to declutter results.
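A sketch of the three-way filter, assuming each result records whether any of its instructions may access memory. The enum and struct names are hypothetical and do not reflect the actual option spelling:

```cpp
#include <string>
#include <vector>

// Hypothetical filter modes mirroring the three choices described above.
enum class MemoryFilter { KeepAll, KeepNonMemory, KeepMemoryOnly };

struct BenchmarkRecord {
  std::string Name;
  bool MayAccessMemory; // True if any instruction may read or write memory.
};

std::vector<BenchmarkRecord>
filterBenchmarks(const std::vector<BenchmarkRecord> &All, MemoryFilter Mode) {
  std::vector<BenchmarkRecord> Kept;
  for (const BenchmarkRecord &B : All) {
    bool Keep = Mode == MemoryFilter::KeepAll ||
                (Mode == MemoryFilter::KeepNonMemory && !B.MayAccessMemory) ||
                (Mode == MemoryFilter::KeepMemoryOnly && B.MayAccessMemory);
    if (Keep)
      Kept.push_back(B);
  }
  return Kept;
}
```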
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D140734
This change is rather more invasive than intended. The main intention
here is to make CommandLine.cpp not rely on llvm/Support/Host.h. Right
now, this reliance is only in 3 superficial places:
- Choosing how to expand response files (in two places)
- Printing the default triple and current CPU in `--version` output.
The built-in version system has a method for adding "extra version
printers", commonly used by several tools (such as llc) to report the
registered targets in the built version of LLVM. It was reasonably easy
to move the logic for printing the default triple and current CPU into
a similar function, and register it with any relevant binaries.
The incompatible change here is that now, even if
LLVM_VERSION_PRINTER_SHOW_HOST_TARGET_INFO is defined, most binaries
will no longer print out the default target triple and cpu when provided
with `--version`, for instance llvm-as and llvm-dis. This breakage is
intended, but the changes in this patch keep printing the default target
and detected CPU in `llc` and `opt`, as these were remarked as important
binaries in the LLVM install.
The change to expanding response files may also be controversial, but I
believe that these macros should correspond exactly to the host triple
introspection used before.
Differential Revision: https://reviews.llvm.org/D137837
This was a regression from 17e202424c021fd903950fec7a8b6cca2d83abce.
Previously we'd gracefully handle missing measurements,
but that handling got accidentally lost during the code move,
and we'd assert.
What we want to do is to discard all measurements (from all repetitors
in a given config) if any of them failed, but still append the snippet
and emit the empty measurement.
We can run as many `getRunnableConfiguration()` in parallel as we want,
but `runConfiguration()` must be run *completely* standalone from everything.
This is a step towards enabling threading.
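A sketch of that scheduling constraint using std::async, with generic callables standing in for getRunnableConfiguration() and runConfiguration(); the helper name is hypothetical. Every preparation finishes before the first run starts, so runs overlap with nothing, including each other:

```cpp
#include <future>
#include <utility>
#include <vector>

// Hypothetical sketch: Prepare runs concurrently for all configurations;
// Run is only called once all preparation has finished, and then strictly
// one configuration at a time.
template <typename Config, typename PrepareFn, typename RunFn>
auto prepareInParallelRunSerially(const std::vector<Config> &Configs,
                                  PrepareFn Prepare, RunFn Run) {
  using Prepared = decltype(Prepare(std::declval<const Config &>()));
  using Result = decltype(Run(std::declval<Prepared &>()));

  // Phase 1: launch every preparation on its own thread.
  std::vector<std::future<Prepared>> Pending;
  for (const Config &C : Configs)
    Pending.push_back(
        std::async(std::launch::async, [&Prepare, &C] { return Prepare(C); }));

  // Wait for all of them before any measurement starts.
  std::vector<Prepared> Ready;
  for (std::future<Prepared> &F : Pending)
    Ready.push_back(F.get());

  // Phase 2: run each configuration alone, in order.
  std::vector<Result> Results;
  for (Prepared &P : Ready)
    Results.push_back(Run(P));
  return Results;
}
```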