114 Commits

Author SHA1 Message Date
Pavel Kosov
b02e2ed7ac [llvm-exegesis] Make possible to execute snippets without perf counters
Performance counters may be unavailable due to various reasons (such as
access restriction via sysctl properties or the CPU model being unknown
to libpfm). On the other hand, for debugging llvm-exegesis itself it is
still useful to be able to run generated code snippets to ensure that
the snippet does not crash at run time.

The --use-dummy-perf-counters command line option makes llvm-exegesis
behave just as usual except for using fake event counts instead of asking
the kernel for actual values.

~~

Huawei RRI, OS Lab

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D146301
2023-04-06 13:08:48 +03:00
Aiden Grossman
389bf5d870 [llvm-exegesis] Refactor InstructionBenchmark to Benchmark
When llvm-exegesis was first introduced, it only supported benchmarking
individual instructions, hence the name for the data structure storing
the data corresponding to a benchmark being called InstructionBenchmark
made sense. However, now that benchmarking arbitrary snippets is
supported, InstructionBenchmark doesn't correspond to a single
instruction. This patch refactors InstructionBenchmark to be called
Benchmark to clean up this little bit of technical debt.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D146884
2023-03-27 08:14:36 +00:00
Archibald Elliott
d768bf994f [NFC][TargetParser] Replace uses of llvm/Support/Host.h
The forwarding header is left in place because of its use in
`polly/lib/External/isl/interface/extract_interface.cc`, but I have
added a GCC warning about the fact it is deprecated, because it is used
in `isl` from where it is included by Polly.
2023-02-10 09:59:46 +00:00
Guillaume Chatelet
bb37cab8a5 [llvm-exegesis][NFC] Update benchmark phase naming to match documentation 2023-01-06 13:40:46 +00:00
Roman Lebedev
e0ad2af691
[exegesis] "Skip codegen" dry-run mode
While "skip measurements mode" is super useful for test coverage,
i've come to discover it's trade-offs. It still calls back-end
to actually codegen the target assembly, and that is what is taking
80%+ of the time regardless of whether or not we skip the measurements.

On the other hand, just being able to see that exegesis can come up
with a snippet to measure something, is already very useful,
and takes maybe a second for a all-opcode sweep.

Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D140702
2023-01-05 17:47:17 +03:00
Roman Lebedev
6a67b633b9
[exegesis] Analysis: filtering for benchmark results
By default, all benchmark results are analysed, but sometimes it may be useful
to only look at those that to not involve memory, or vice versa. This option
allows to either keep all benchmarks, or filter out (ignore) either all the
ones that do involve memory (involve instructions that may read or write to
memory), or the opposite, to only keep such benchmarks.

Personally, so far i have found the benchmarks that do involve memory
to have dubious results. But the ones that do not involve memory,
are generally actionable. So i would like to have a toggle to declutter results.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D140734
2023-01-04 21:16:11 +03:00
Roman Lebedev
498704d511
[NFC][exegesis] By default, don't dump objects to disk
It's a strictly-developer feature, which is useless most of the time.

Fixes https://github.com/llvm/llvm-project/issues/59082

Reviewed By: RKSimon, gchatelet

Differential Revision: https://reviews.llvm.org/D140700
2022-12-28 16:56:54 +03:00
Roman Lebedev
03aa6b9197
[NFC][llvm-exegesis] Ensure that target options show up in --help
Fixes https://github.com/llvm/llvm-project/issues/59377
2022-12-21 03:38:30 +03:00
Archibald Elliott
142aa1bdd1 [Support] Move Target/CPU Printing out of CommandLine
This change is rather more invasive than intended. The main intention
here is to make CommandLine.cpp not rely on llvm/Support/Host.h. Right
now, this reliance is only in 3 superficial places:
- Choosing how to expand response files (in two places)
- Printing the default triple and current CPU in `--version` output.

The built in version system has a method for adding "extra version
printers", commonly used by several tools (such as llc) to report the
registered targets in the built version of LLVM. It was reasonably easy
to move the logic for printing the default triple and current CPU into
a similar function, and register it with any relevant binaries.

The incompatible change here is that now, even if
LLVM_VERSION_PRINTER_SHOW_HOST_TARGET_INFO is defined, most binaries
will no longer print out the default target triple and cpu when provided
with `--version`, for instance llvm-as and llvm-dis. This breakage is
intended, but the changes in this patch keep printing the default target
and detected in `llc` and `opt` as these were remarked as important
binaries in the LLVM install.

The change to expanding response files may also be controversial, but I
believe that these macros should correspond exactly to the host triple
introspection used before.

Differential Revision: https://reviews.llvm.org/D137837
2022-12-20 09:56:14 +00:00
Roman Lebedev
3ebfc88637
[NFC][llvm-exegesis] Improve getOpcodesOrDie()
We already have opcode name -> opcode index map, use it.
Reserve memory where appropriate.
2022-12-18 20:07:02 +03:00
Roman Lebedev
f95ddf0ee7
[llvm-exegesis] Benchmark: gracefully handle lack of configurations to run
Otherwise the progress meter would assert.
2022-12-18 20:07:02 +03:00
Roman Lebedev
18da9a0cb3
[llvm-exegesis] Fix 'min' repetition mode in presence of missing measurements
This was a regression from 17e202424c021fd903950fec7a8b6cca2d83abce.
Previously we'd gracefully handle missing measurements,
but that handling got accidentally lost during the code move,
and we'd assert.

What we want to do, is to discard all measurements (from all repetitors
in a given config) if any of them failed, but do append the snippet,
and do emit the empty measurement.
2022-12-18 17:52:04 +03:00
Roman Lebedev
dbc76ef791
[NFC][llvm-exegesis] Benchmark: move DumpObjectToDisk handling into runConfiguration()
`getRunnableConfiguration()` may be executed in parallel,
and then this the output would become even less useful.
2022-12-18 17:52:04 +03:00
Roman Lebedev
81b35aa69f
[NFC][llvm-exegesis] Extract runBenchmarkConfigurations() out of benchmarkMain()
`benchmarkMain()` is already bigger than it should be,
let's extract the important chunk before making it even larger.
2022-12-18 04:23:20 +03:00
Roman Lebedev
1dd4a6aac6
[NFC][llvm-exegesis] BenchmarkRunner: split runConfiguration() into getRunnableConfiguration() + runConfiguration()
We can run as many `getRunnableConfiguration()` in parallel as we want,
but `runConfiguration()` must be run *completely* standalone from everything.
This is a step towards enabling threading.
2022-12-18 04:23:20 +03:00
Roman Lebedev
a340019113
[llvm-exegesis] Unbreak --benchmarks-file=<f>
I'm absolutely flabbergasted by this.
I was absolutely sure this worked.
But apparently not.

When outputting to the file, we'd write a single `InstructionBenchmark`
to it, and then close the file. And for the next `InstructionBenchmark`,
we'd reopen the file, truncating it in process,
and thus the first `InstructionBenchmark` would be gone...
2022-12-18 03:50:10 +03:00
Roman Lebedev
17e202424c
[NFCI][llvm-exegesis] Extract 'Min' repetition handling from BenchmarkRunner into it's caller
If `BenchmarkRunner::runConfiguration()` deals with more than a single
repetitor, tasking will be less straight-forward to implement.
But i think dealing with that in it's callee is even more readable.
2022-12-17 23:14:52 +03:00
Roman Lebedev
233ca84a25
[exegesis] Benchmark: provide optional progress meter / ETA
Now that `--opcode-index=-1` is mostly stable,
and i can migrate off of my custom tooling that emulated it,
there comes a bit of confusion as to the status of the run.

It is normal for the single all-opcode run to take ~3 minutes,
and it's a bit more than one can be comfortable with,
without having some sort of visual indication of the progress.

Thus, i present:
```
$ ./bin/llvm-exegesis -mode=inverse_throughput --opcode-index=-1 --benchmarks-file=/dev/null --dump-object-to-disk=0 --measurements-print-progress --skip-measurements
<...>
XAM_Fp80: unsupported opcode: pseudo instruction
XBEGIN: Unsupported opcode: isPseudo/usesCustomInserter
XBEGIN_2: Unsupported opcode: isBranch/isIndirectBranch
XBEGIN_4: Unsupported opcode: isBranch/isIndirectBranch
XCH_F: unsupported second-form X87 instruction
Processing...   1%, ETA 02:10
Processing...   2%, ETA 02:03
Processing...   3%, ETA 02:00
Processing...   4%, ETA 01:57
Processing...   5%, ETA 01:54
Processing...   6%, ETA 01:53
Processing...   7%, ETA 01:51
Processing...   8%, ETA 01:50
Processing...   9%, ETA 01:49
Processing...  10%, ETA 01:48
Processing...  11%, ETA 01:46
Processing...  12%, ETA 01:45
Processing...  13%, ETA 01:44
Processing...  14%, ETA 01:43
Processing...  15%, ETA 01:42
Processing...  16%, ETA 01:42
Processing...  17%, ETA 01:41
```

As usual, the ETA estimation is statically-insignificant,
and is a lie/does not converge at least until 50% through.
It would be nice to have an actual progress indicator like in LIT,
but i'm not sure we have such a luxury in C++ form in LLVM codebase already.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D139797
2022-12-13 15:30:34 +03:00
Roman Lebedev
f6b1d88527
[Exegesis] Unbreak running benchmarks on local machine
`LLVMState::Create()` would autodetect if `native` is specified,
but the default was `""`
2022-12-08 03:48:24 +03:00
Roman Lebedev
7a76140220
[llvm-exegesis] Dry run mode
Sometimes we only want to ensure that we can produce snippets (all the way
through `SnippetRepetitor`!), but don't care for the execution.
E.g. all of our tests are this way.

I've built LLVM without PFM and removed my CPU from `X86PfmCounters.td`,
and this produces the expected results in that configuration.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D139448
2022-12-07 20:15:43 +03:00
Philip Reames
47afaf2eb0 [llvm-exegesis] Initialize all supported targets
Enable registration of multiple exegesis targets at once. Use idiomatic approach to defining target select macros, but leave code in the llvm-mca sub-directories for now.

This is a stepping stone towards allowing llvm-exegesis benchmarking via simulator or testing in non-target dependent tests.

Differential Revision: https://reviews.llvm.org/D133605
2022-09-22 09:02:22 -07:00
Clement Courbet
e52f8406e8 Re-land "[llvm-exegesis] Support analyzing results from a different target."
With Mips fixes.

This reverts commit 7daf60e3440b22b79084bb325d823aa3ad8df0f3.
2022-09-22 11:39:52 +02:00
Clement Courbet
7daf60e344 Revert "[llvm-exegesis] Support analyzing results from a different target."
Breaks MIPS compile.

This reverts commit cc61c822e05c51e494c50d1e72f963750116ef08.
2022-09-22 11:19:01 +02:00
Clement Courbet
cc61c822e0 [llvm-exegesis] Support analyzing results from a different target.
We were using the native triple to parse the benchmarks. Use the triple
from the benchmarks file.

Right now this still only allows analyzing files produced by the current
target until D133605 is in.

This also makes the `Analysis` class much less ad-hoc.

Differential Revision: https://reviews.llvm.org/D133697
2022-09-22 11:11:18 +02:00
Clement Courbet
7053e863a1 [llvm-exegesis][NFC] Use factory function for LlvmState.
This allows failing more gracefully.
2022-09-12 14:19:33 +02:00
Philip Reames
4d50a39240 [llvm-exegesis] Cross compile all enabled targets
llvm-exegesis is rather odd in the LLVM ecosystem in code is selectively compiled based on the native machine. LLVM is cross compiler by default, so this stands out as odd. It's also less then helpful when working on code for a target other than your native dev environment.

This change only changes the build setup. A later change will enable -march support to allow actual benchmarking under e.g. simulators in a cross compilation environment.

Differential Revision: https://reviews.llvm.org/D133150
2022-09-09 09:04:55 -07:00
Reid Kleckner
89b57061f7 Move TargetRegistry.(h|cpp) from Support to MC
This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.

This allows us to ensure that Support doesn't have includes from MC/*.

Differential Revision: https://reviews.llvm.org/D111454
2021-10-08 14:51:48 -07:00
Roman Lebedev
e030f808ec
[Exegesis] Native clusterization: sub-partition by sched class id
Currently native clusterization simply groups all benchmarks
by the opcode of key instruction, but that is suboptimal in certain cases,
e.g. where we can already tell that the particular instructions
already resolve into different sched classes.
2021-09-07 17:54:37 +03:00
Roman Lebedev
78eaff2ef8
[llvm-exegesis] Loop unrolling for loop snippet repetitor mode
I really needed this, like, factually, yesterday,
when verifying dependency breaking idioms for AMD Zen 3 scheduler model.

Consider the following example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=duplicate
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-4a7e50.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 0.31025, per_snippet_value: 0.31025 }
error:           ''
info:            ''
assembled_snippet: C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C3
...

```
What does it tell us?
So wait, it can only execute ~3 x86 AVX YMM PXOR zero-idioms per cycle?
That doesn't seem right. That's even less than there are pipes supporting this type of op.

Now, second example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2418b5.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 1.00011, per_snippet_value: 1.00011 }
error:           ''
info:            ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```
Now that's just worse. Due to the looping, the throughput completely plummeted,
and now we can only do a single instruction/cycle!?

That's not great.
And final example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop --loop-body-size=1000
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c402e2.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 0.167087, per_snippet_value: 0.167087 }
error:           ''
info:            ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```

So if we merge the previous two approaches, do duplicate this single-instruction snippet 1000x
(loop-body-size/instruction count in snippet), and run a loop with 1000 iterations
over that duplicated/unrolled snippet, the measured throughput goes through the roof,
up to 5.9 instructions/cycle, which finally tells us that this idiom is zero-cycle!

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D102522
2021-05-25 12:08:27 +03:00
Nico Weber
ba7a92c01e [Support] Don't include VirtualFileSystem.h in CommandLine.h
CommandLine.h is indirectly included in ~50% of TUs when building
clang, and VirtualFileSystem.h is large.

(Already remarked by jhenderson on D70769.)

No behavior change.

Differential Revision: https://reviews.llvm.org/D100957
2021-04-21 10:19:01 -04:00
Clement Courbet
9e9f991ac0 [llvm-exegesis] Honor -mcpu in analysis mode.
This is useful to set the baseline model for an unknown CPU.

Fixes PR50013.

Differential Revision: https://reviews.llvm.org/D100743
2021-04-19 10:44:28 +02:00
Qiu Chaofan
9d2f06445f [llvm-exegesis] Ignore instructions using custom inserter
Some instructions defined in table-gen files sets usesCustomInserter
bit, which means it has to be lowered by target code and isn't actually
valid instruction at MC level. So we should treat them like pseudo
instructions.

Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D94898
2021-02-19 17:04:27 +08:00
Ella Ma
1756d67934 [llvm][clang][mlir] Add checks for the return values from Target::createXXX to prevent protential null deref
All these potential null pointer dereferences are reported by my static analyzer for null smart pointer dereferences, which has a different implementation from `alpha.cplusplus.SmartPtr`.

The checked pointers in this patch are initialized by Target::createXXX functions. When the creator function pointer is not correctly set, a null pointer will be returned, or the creator function may originally return a null pointer.

Some of them may not make sense as they may be checked before entering the function, but I fixed them all in this patch. I submit this fix because 1) similar checks are found in some other places in the LLVM codebase for the same return value of the function; and, 2) some of the pointers are dereferenced before they are checked, which may definitely trigger a null pointer dereference if the return value is nullptr.

Reviewed By: tejohnson, MaskRay, jpienaar

Differential Revision: https://reviews.llvm.org/D91410
2020-11-21 21:04:12 -08:00
Vy Nguyen
cb3fd715f3 Reland rG4fcd1a8e6528:[llvm-exegesis] Add option to check the hardware support for a given feature before benchmarking.
This is mostly for the benefit of the LBR latency mode.
Right now, it performs no checking. If this is run on non-supported hardware, it will produce all zeroes for latency.

      Differential Revision: https://reviews.llvm.org/D85254

New change: Updated lit.local.cfg to use pass the right argument to llvm-exegesis to actually request the LBR mode.

Differential Revision: https://reviews.llvm.org/D88670
2020-10-01 12:21:16 -04:00
Michael Liao
2c9dc7bbbf Revert "[llvm-exegesis] Add option to check the hardware support for a given feature before benchmarking."
This reverts commit 4fcd1a8e6528ca42fe656f2745e15d2b7f5de495 as
`llvm/test/tools/llvm-exegesis/X86/lbr/mov-add.s` failed on hosts
without LBR supported if the build has LIBPFM enabled. On that host,
`perf_event_open` fails with `EOPNOTSUPP` on LBR config. That change's
basic assumption

> If this is run on a non-supported hardware, it will produce all zeroes for latency.

could not stand as `perf_event_open` system call will fail if the
underlying hardware really don't have LBR supported.
2020-09-30 23:15:35 -04:00
Vy Nguyen
4fcd1a8e65 [llvm-exegesis] Add option to check the hardware support for a given feature before benchmarking.
This is mostly for the benefit of the LBR latency mode.
Right now, it performs no checking. If this is run on non-supported hardware, it will produce all zeroes for latency.

Differential Revision: https://reviews.llvm.org/D85254
2020-09-30 12:25:59 -04:00
Jinsong Ji
29ec5901c9 [llvm-exegesis] Add whitespace between words in error message 2020-09-24 18:20:57 +00:00
Vy Nguyen
ee7caa7593 Reland [llvm-exegesis] Add benchmark latency option on X86 that uses LBR for more precise measurements.
Starting with Skylake, the LBR contains the precise number of cycles between the two
        consecutive branches.
        Making use of this will hopefully make the measurements more precise than the
        existing methods of using RDTSC.

                Differential Revision: https://reviews.llvm.org/D77422

New change: check for existence of field `cycles` in perf_branch_entry before enabling this mode.
This should prevent compilation errors when building for older kernel whose headers don't support it.
2020-07-27 12:38:05 -04:00
Clement Courbet
6bddd099ac Revert "[llvm-exegesis] Add benchmark latency option on X86 that uses LBR for more precise measurements."
From @erichkeane:
```
This patch doesn't seem to build for me:
/iusers/ekeane1/workspaces/llvm-project/llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp: In function ‘llvm::Error llvm::exegesis::parseDataBuffer(const char*, size_t, const void*, const void*, llvm::SmallVector<long int, 4>*)’:
/iusers/ekeane1/workspaces/llvm-project/llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp:99:37: error: ‘struct perf_branch_entry’ has no member named ‘cycles’

CycleArray->push_back(Entry.cycles);
I'm on RHEL7, so I have kernel 3.10, so it doesn't have 'cycles'.

According ot this: https://elixir.bootlin.com/linux/v4.3/source/include/uapi/linux/perf_event.h#L963 kernel 4.3 is the first time that 'cycles' appeared in this structure.
```
2020-07-17 16:55:17 +02:00
Vy Nguyen
1360e140cc [llvm-exegesis] Add benchmark latency option on X86 that uses LBR for more precise measurements.
Starting with Skylake, the LBR contains the precise number of cycles between the two
    consecutive branches.
    Making use of this will hopefully make the measurements more precise than the
    existing methods of using RDTSC.

            Differential Revision: https://reviews.llvm.org/D77422
2020-07-16 12:12:46 -04:00
Vy Nguyen
e086a39c11 [llvm-exegesis] Let Counter returns up to 16 entries
LBR contains (up to) 16 entries for last x branches and the X86LBRCounter (from D77422) should be able to return all those.
    Currently, it just returns the latest entry, which could lead to mis-leading measurements.
    This patch aslo changes the LatencyBenchmarkRunner to accommodate multi-value readings.

         https://reviews.llvm.org/D81050
2020-06-26 10:57:20 -04:00
Roman Lebedev
de22d7154b
[llvm-exegesis] 'Min' repetition mode
Summary:
As noted in documentation, different repetition modes have different trade-offs:

> .. option:: -repetition-mode=[duplicate|loop]
>
>  Specify the repetition mode. `duplicate` will create a large, straight line
>  basic block with `num-repetitions` copies of the snippet. `loop` will wrap
>  the snippet in a loop which will be run `num-repetitions` times. The `loop`
>  mode tends to better hide the effects of the CPU frontend on architectures
>  that cache decoded instructions, but consumes a register for counting
>  iterations.

Indeed. Example:

>>! In D74156#1873657, @lebedev.ri wrote:
> At least for `CMOV`, i'm seeing wildly different results
> |           | Latency | RThroughput |
> | duplicate | 1       | 0.8         |
> | loop      | 2       | 0.6         |
> where latency=1 seems correct, and i'd expect the througput to be close to 1/2 (since there are two execution units).

This isn't great for analysis, at least for schedule model development.

As discussed in excruciating detail in

>>! In D74156#1924514, @gchatelet wrote:
>>>! In D74156#1920632, @lebedev.ri wrote:
>> ... did that explanation of the question i'm having made any sense?
>
> Thx for digging in the conversation !
> Ok it makes more sense now.
>
> I discussed it a bit with @courbet:
>  - We want the analysis tool to stay simple so we'd rather not make it knowledgeable of the repetition mode.
>  - We'd like to still be able to select either repetition mode to dig into special cases
>
> So we could add a third `min` repetition mode that would run both and take the minimum. It could be the default option.
> Would you have some time to look what it would take to add this third mode?

there appears to be an agreement that it is indeed sub-par,
and that we should provide an optional, measurement (not analysis!) -time
way to rectify the situation.

However, the solutions isn't entirely straight-forward.

We can just add an actual 'multiplexer' `MinSnippetRepetitor`, because
if we just concatenate snippets produced by `DuplicateSnippetRepetitor`
and `LoopSnippetRepetitor` and run+measure that, the measurement will
naturally be different from what we'd get by running+measuring
them separately and taking the min.
([[ https://www.wolframalpha.com/input/?i=%28x%2By%29%2F2+%21%3D+min%28x%2C+y%29 | `time(D+L)/2 != min(time(D), time(L))` ]])

Also, it seems best to me to have a single snippet instead of generating
a snippet per repetition mode, since the only difference here is that the
loop repetition mode reserves one register for loop counter.

As far as i can tell, we can either teach `BenchmarkRunner::runConfiguration()`
to produce a single report given multiple repetitors (as in the patch),
or do that one layer higher - don't modify `BenchmarkRunner::runConfiguration()`,
produce multiple reports, don't actually print each one, but aggregate them somehow
and only print the final one.

Initially i've gone ahead with the latter approach, but it didn't look like a natural fit;
the former (as in the diff) does seem like a better fit to me.

There's also a question of the test coverage. It sure currently does work here:
```
$ ./bin/llvm-exegesis --opcode-name=CMOV64rr --mode=inverse_throughput --repetition-mode=duplicate
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-8fb949.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'CMOV64rr RAX RAX R11 i_0x0'
    - 'CMOV64rr RBP RBP R15 i_0x0'
    - 'CMOV64rr RBX RBX RBX i_0x0'
    - 'CMOV64rr RCX RCX RBX i_0x0'
    - 'CMOV64rr RDI RDI R10 i_0x0'
    - 'CMOV64rr RDX RDX RAX i_0x0'
    - 'CMOV64rr RSI RSI RAX i_0x0'
    - 'CMOV64rr R8 R8 R8 i_0x0'
    - 'CMOV64rr R9 R9 RDX i_0x0'
    - 'CMOV64rr R10 R10 RBX i_0x0'
    - 'CMOV64rr R11 R11 R14 i_0x0'
    - 'CMOV64rr R12 R12 R9 i_0x0'
    - 'CMOV64rr R13 R13 R12 i_0x0'
    - 'CMOV64rr R14 R14 R15 i_0x0'
    - 'CMOV64rr R15 R15 R13 i_0x0'
  config:          ''
  register_initial_values:
    - 'RAX=0x0'
    - 'R11=0x0'
    - 'EFLAGS=0x0'
    - 'RBP=0x0'
    - 'R15=0x0'
    - 'RBX=0x0'
    - 'RCX=0x0'
    - 'RDI=0x0'
    - 'R10=0x0'
    - 'RDX=0x0'
    - 'RSI=0x0'
    - 'R8=0x0'
    - 'R9=0x0'
    - 'R14=0x0'
    - 'R12=0x0'
    - 'R13=0x0'
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 0.819, per_snippet_value: 12.285 }
error:           ''
info:            instruction has tied variables, using static renaming.
assembled_snippet: 5541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000049BF000000000000000048BB000000000000000048B9000000000000000048BF000000000000000049BA000000000000000048BA000000000000000048BE000000000000000049B8000000000000000049B9000000000000000049BE000000000000000049BC000000000000000049BD0000000000000000490F40C3490F40EF480F40DB480F40CB490F40FA480F40D0480F40F04D0F40C04C0F40CA4C0F40D34D0F40DE4D0F40E14D0F40EC4D0F40F74D0F40FD490F40C35B415C415D415E415F5DC3
...
$ ./bin/llvm-exegesis --opcode-name=CMOV64rr --mode=inverse_throughput --repetition-mode=loop
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-051eb3.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'CMOV64rr RAX RAX R11 i_0x0'
    - 'CMOV64rr RBP RBP RSI i_0x0'
    - 'CMOV64rr RBX RBX R9 i_0x0'
    - 'CMOV64rr RCX RCX RSI i_0x0'
    - 'CMOV64rr RDI RDI RBP i_0x0'
    - 'CMOV64rr RDX RDX R9 i_0x0'
    - 'CMOV64rr RSI RSI RDI i_0x0'
    - 'CMOV64rr R9 R9 R12 i_0x0'
    - 'CMOV64rr R10 R10 R11 i_0x0'
    - 'CMOV64rr R11 R11 R9 i_0x0'
    - 'CMOV64rr R12 R12 RBP i_0x0'
    - 'CMOV64rr R13 R13 RSI i_0x0'
    - 'CMOV64rr R14 R14 R14 i_0x0'
    - 'CMOV64rr R15 R15 R10 i_0x0'
  config:          ''
  register_initial_values:
    - 'RAX=0x0'
    - 'R11=0x0'
    - 'EFLAGS=0x0'
    - 'RBP=0x0'
    - 'RSI=0x0'
    - 'RBX=0x0'
    - 'R9=0x0'
    - 'RCX=0x0'
    - 'RDI=0x0'
    - 'RDX=0x0'
    - 'R12=0x0'
    - 'R10=0x0'
    - 'R13=0x0'
    - 'R14=0x0'
    - 'R15=0x0'
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 0.6083, per_snippet_value: 8.5162 }
error:           ''
info:            instruction has tied variables, using static renaming.
assembled_snippet: 5541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000048BE000000000000000048BB000000000000000049B9000000000000000048B9000000000000000048BF000000000000000048BA000000000000000049BC000000000000000049BA000000000000000049BD000000000000000049BE000000000000000049BF000000000000000049B80200000000000000490F40C3480F40EE490F40D9480F40CE480F40FD490F40D1480F40F74D0F40CC4D0F40D34D0F40D94C0F40E54C0F40EE4D0F40F64D0F40FA4983C0FF75C25B415C415D415E415F5DC3
...
$ ./bin/llvm-exegesis --opcode-name=CMOV64rr --mode=inverse_throughput --repetition-mode=min
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c7a47d.o
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2581f1.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'CMOV64rr RAX RAX R11 i_0x0'
    - 'CMOV64rr RBP RBP R10 i_0x0'
    - 'CMOV64rr RBX RBX R10 i_0x0'
    - 'CMOV64rr RCX RCX RDX i_0x0'
    - 'CMOV64rr RDI RDI RAX i_0x0'
    - 'CMOV64rr RDX RDX R9 i_0x0'
    - 'CMOV64rr RSI RSI RAX i_0x0'
    - 'CMOV64rr R9 R9 RBX i_0x0'
    - 'CMOV64rr R10 R10 R12 i_0x0'
    - 'CMOV64rr R11 R11 RDI i_0x0'
    - 'CMOV64rr R12 R12 RDI i_0x0'
    - 'CMOV64rr R13 R13 RDI i_0x0'
    - 'CMOV64rr R14 R14 R9 i_0x0'
    - 'CMOV64rr R15 R15 RBP i_0x0'
  config:          ''
  register_initial_values:
    - 'RAX=0x0'
    - 'R11=0x0'
    - 'EFLAGS=0x0'
    - 'RBP=0x0'
    - 'R10=0x0'
    - 'RBX=0x0'
    - 'RCX=0x0'
    - 'RDX=0x0'
    - 'RDI=0x0'
    - 'R9=0x0'
    - 'RSI=0x0'
    - 'R12=0x0'
    - 'R13=0x0'
    - 'R14=0x0'
    - 'R15=0x0'
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 0.6073, per_snippet_value: 8.5022 }
error:           ''
info:            instruction has tied variables, using static renaming.
assembled_snippet: 5541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000049BA000000000000000048BB000000000000000048B9000000000000000048BA000000000000000048BF000000000000000049B9000000000000000048BE000000000000000049BC000000000000000049BD000000000000000049BE000000000000000049BF0000000000000000490F40C3490F40EA490F40DA480F40CA480F40F8490F40D1480F40F04C0F40CB4D0F40D44C0F40DF4C0F40E74C0F40EF4D0F40F14C0F40FD490F40C3490F40EA5B415C415D415E415F5DC35541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000049BA000000000000000048BB000000000000000048B9000000000000000048BA000000000000000048BF000000000000000049B9000000000000000048BE000000000000000049BC000000000000000049BD000000000000000049BE000000000000000049BF000000000000000049B80200000000000000490F40C3490F40EA490F40DA480F40CA480F40F8490F40D1480F40F04C0F40CB4D0F40D44C0F40DF4C0F40E74C0F40EF4D0F40F14C0F40FD4983C0FF75C25B415C415D415E415F5DC3
...
```
but i open to suggestions as to how test that.

I also have gone with the suggestion to default to this new mode.
This was irking me for some time, so i'm happy to finally see progress here.
Looking forward to feedback.

Reviewers: courbet, gchatelet

Reviewed By: courbet, gchatelet

Subscribers: mstojanovic, RKSimon, llvm-commits, courbet, gchatelet

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76921
2020-04-02 09:28:35 +03:00
Roman Lebedev
cc5549dbc2
[NFC][llvm-exegesis] Docs/help: opcode-index=-1 means measure everything 2020-02-13 12:46:12 +03:00
Roman Lebedev
6030fe01f4
[llvm-exegesis] Exploring X86::OperandType::OPERAND_COND_CODE
Summary:
Currently, we only have nice exploration for LEA instruction,
while for the rest, we rely on `randomizeUnsetVariables()`
to sometimes generate something interesting.
While that works, it isn't very reliable in coverage :)

Here, i'm making an assumption that while we may want to explore
multi-instruction configs, we are most interested in the
characteristics of the main instruction we were asked about.

Which we can do, by taking the existing `randomizeMCOperand()`,
and turning it on it's head - instead of relying on it to randomly fill
one of the interesting values, let's pregenerate all the possible interesting
values for the variable, and then generate as much `InstructionTemplate`
combinations of these possible values for variables as needed/possible.

Of course, that requires invasive changes to no longer pass just the
naked `Instruction`, but sometimes partially filled `InstructionTemplate`.

As it can be seen from the test, this allows us to explore
`X86::OperandType::OPERAND_COND_CODE` for instructions
that take such an operand.
I'm hoping this will greatly simplify exploration.

Reviewers: courbet, gchatelet

Reviewed By: gchatelet

Subscribers: orodley, mgorny, sdardis, tschuett, jrtc27, atanasyan, mstojanovic, andreadb, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74156
2020-02-12 21:33:52 +03:00
Miloš Stojanović
205292740d [llvm-exegesis] Improve error reporting in BenchmarkRunner.cpp
Followup to D74085.
Replace the use of `report_fatal_error()` with returning the error to
`llvm-exegesis.cpp` and handling it there.
To facilitate this, a new `Error` type has been added which is only used
to log errors to the yaml output.

Differential Revision: https://reviews.llvm.org/D74215
2020-02-07 16:29:52 +01:00
Miloš Stojanović
4bd40f71a7 Recommit: "[llvm-exegesis] Improve error reporting in Target.cpp"
Summary: Commit 141915963b6ab36ee4e577d1b27673fa4d05b409 was reverted in
abe01e17f648a97666d4fbed41f0861686a17972 because it broke builds testing
without libpfm. A preparatory commit <commit_sha1> was added to enable
this recommit.

Original commit message:

Followup to D74085.
Replace the use of `report_fatal_error()` with returning the error to
`llvm-exegesis.cpp` and handling it there.

Differential Revision: https://reviews.llvm.org/D74113
2020-02-07 14:34:58 +01:00
Miloš Stojanović
830af528a5 Recommit: "[llvm-exegesis] Improve error reporting"
Summary: Commit b3576f60ebc8f660afad8120a72473be47517573 was reverted in
abe01e17f648a97666d4fbed41f0861686a17972 because it broke builds testing
without libpfm. A preparatory commit <commit_sha1> was added to enable
this recommit.

Original commit message:

Fix inconsistencies in error reporting created by mixing
`report_fatal_error()` and `ExitOnErr()`, and add additional information
to the error message to make it more user friendly. Minimize the use
`report_fatal_error()` because it's meant for use in very rare cases and
it results in low information density of the error messages.

Summary of the new design:

 * For command line argument errors output `llvm-exegesis: <error_message>`,
   which is consistent with the error output format emitted by the backend
   which checks correctness of the command line arguments.
 * For other errors the format `llvm-exegesis error: <error_message>` is used.
 ** If the error occurred during file access `<error_message>` will have
    of two parts: `'<file_name>': <rest_of_the_error_message>`

Differential Revision: https://reviews.llvm.org/D74085
2020-02-07 14:34:58 +01:00
Miloš Stojanović
446268a223 [llvm-exegesis] Add a custom error for clustering
All errors of type `Failure` are `StringError`s. In order for exit code
mapping to detect that specifically a clustering error has occurred it
needs to have a different type.

This patch also prepares D74085 where termination `report_fatal_error()`
will be replaced with emitting `StringError`s.

Differential Revision: https://reviews.llvm.org/D74124
2020-02-07 14:34:57 +01:00
Hans Wennborg
abe01e17f6 Revert "[llvm-exegesis] Improve error reporting" and follow-up.
It broke e.g. all tests under tools/llvm-exegesis/X86/ when libpfm is
not available, see comment on D74085.

This reverts commit b3576f60ebc8f660afad8120a72473be47517573 and
141915963b6ab36ee4e577d1b27673fa4d05b409.
2020-02-06 12:53:16 +01:00
Miloš Stojanović
141915963b [llvm-exegesis] Improve error reporting in Target.cpp
Followup to D74085.
Replace the use of `report_fatal_error()` with returning the error to
`llvm-exegesis.cpp` and handling it there.

Differential Revision: https://reviews.llvm.org/D74113
2020-02-06 12:26:08 +01:00