6 Commits

Author SHA1 Message Date
Roman Lebedev
a5b56313dd
[llvm-exegesis] AliasingConfigurations: pay attention to forbidden registers
When trying to measure latency of certain opcodes, e.g.
`./bin/llvm-exegesis --opcode-name=BT32ri8 --mode=latency --repetition-mode=loop  --benchmarks-file=- --max-configs-per-opcode=65536`,
we'd pick such an aliasing instruction, and such an aliasing registers,
that would alias with forbidden registers.

And in particular with loop counter in `loop` repetition mode,
which made the measurements never finish.
This does not address all such cases, only the most obvious one.

The added test case fails without the patch.

Fixes https://github.com/llvm/llvm-project/issues/59441
2022-12-21 02:01:14 +03:00
Roman Lebedev
97bdba8171
[exegesis] ParallelSnippetGenerator: SingleStaticRegPerOperand if 2+ use regs
For instrs with tied operands, that strategy will not produce anything
different from `SingleStaticReg` unless there are at least two registers.
2022-12-06 19:26:45 +03:00
Roman Lebedev
2ffe225d11
[llvm-exegesis] parallel snippet generator: avoid Read-After-Write pitfail for instrs w/ tied variables
As it is being discussed in https://github.com/llvm/llvm-project/issues/59325,
at least for the instructions with tied variables,
when trying to parallelize the instructions,
register selection is rather bad, and may either
use a register which we have used for def,
or vice versa.

That introduces serialization, and leads to
overly pessimistic inverse throughput measurement.

The new implementation avoids that,

New result:
```
$ ninja llvm-exegesis && ./bin/llvm-exegesis --mode=inverse_throughput --opcode-name=VFMADD132PDr --max-configs-per-opcode=9182
ninja: no work to do.
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-4af034.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VFMADD132PDr XMM3 XMM3 XMM4 XMM8'
    - 'VFMADD132PDr XMM5 XMM5 XMM14 XMM7'
    - 'VFMADD132PDr XMM10 XMM10 XMM11 XMM15'
    - 'VFMADD132PDr XMM13 XMM13 XMM15 XMM15'
    - 'VFMADD132PDr XMM12 XMM12 XMM11 XMM1'
    - 'VFMADD132PDr XMM0 XMM0 XMM6 XMM9'
    - 'VFMADD132PDr XMM2 XMM2 XMM15 XMM11'
  config:          ''
  register_initial_values:
    - 'XMM3=0x0'
    - 'XMM4=0x0'
    - 'XMM8=0x0'
    - 'MXCSR=0x0'
    - 'XMM5=0x0'
    - 'XMM14=0x0'
    - 'XMM7=0x0'
    - 'XMM10=0x0'
    - 'XMM11=0x0'
    - 'XMM15=0x0'
    - 'XMM13=0x0'
    - 'XMM12=0x0'
    - 'XMM1=0x0'
    - 'XMM0=0x0'
    - 'XMM6=0x0'
    - 'XMM9=0x0'
    - 'XMM2=0x0'
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 0.6403, per_snippet_value: 4.4821 }
error:           ''
info:            instruction has tied variables, avoiding Read-After-Write issue, picking random def and use registers not aliasing each other, randomizing registers for uses
assembled_snippet: 4883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F1C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F24244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F04244883C4104883EC04C70424801F0000C5F8AE14244883C4044883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F2C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F34244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F3C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F14244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F1C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F3C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F2C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F24244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F0C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F04244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F34244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F0C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F14244883C410C4C2D998D8C4E28998EFC442A198D7C4428198EFC462A198E1C4C2C998C1C4C28198D3C4C2D998D8C4E28998EFC442A198D7C4428198EFC462A198E1C4C2C998C1C4C28198D3C4C2D998D8C4E28998EFC442A198D7C4428198EFC462A198E1C4C2C998C1C4C28198D3C4C2D998D8C4E28998EFC442A198D7C4428198EFC462A198E1C4C2C998C1C4C28198D3C3
...
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-f05c2f.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VFMADD132PDr XMM15 XMM15 XMM11 XMM2'
    - 'VFMADD132PDr XMM5 XMM5 XMM11 XMM2'
    - 'VFMADD132PDr XMM14 XMM14 XMM11 XMM2'
    - 'VFMADD132PDr XMM4 XMM4 XMM11 XMM2'
    - 'VFMADD132PDr XMM8 XMM8 XMM11 XMM2'
    - 'VFMADD132PDr XMM3 XMM3 XMM11 XMM2'
    - 'VFMADD132PDr XMM10 XMM10 XMM11 XMM2'
    - 'VFMADD132PDr XMM7 XMM7 XMM11 XMM2'
    - 'VFMADD132PDr XMM13 XMM13 XMM11 XMM2'
    - 'VFMADD132PDr XMM9 XMM9 XMM11 XMM2'
    - 'VFMADD132PDr XMM1 XMM1 XMM11 XMM2'
    - 'VFMADD132PDr XMM6 XMM6 XMM11 XMM2'
    - 'VFMADD132PDr XMM0 XMM0 XMM11 XMM2'
    - 'VFMADD132PDr XMM12 XMM12 XMM11 XMM2'
  config:          ''
  register_initial_values:
    - 'XMM15=0x0'
    - 'XMM11=0x0'
    - 'XMM2=0x0'
    - 'MXCSR=0x0'
    - 'XMM5=0x0'
    - 'XMM14=0x0'
    - 'XMM4=0x0'
    - 'XMM8=0x0'
    - 'XMM3=0x0'
    - 'XMM10=0x0'
    - 'XMM7=0x0'
    - 'XMM13=0x0'
    - 'XMM9=0x0'
    - 'XMM1=0x0'
    - 'XMM6=0x0'
    - 'XMM0=0x0'
    - 'XMM12=0x0'
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 0.5312, per_snippet_value: 7.4368 }
error:           ''
info:            instruction has tied variables, avoiding Read-After-Write issue, picking random def and use registers not aliasing each other, one unique register for each use position
assembled_snippet: 4883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F3C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F1C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F14244883C4104883EC04C70424801F0000C5F8AE14244883C4044883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F2C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F34244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F24244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F04244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F1C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F14244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F3C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F2C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F0C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F0C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F34244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F04244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F24244883C410C462A198FAC4E2A198EAC462A198F2C4E2A198E2C462A198C2C4E2A198DAC462A198D2C4E2A198FAC462A198EAC462A198CAC4E2A198CAC4E2A198F2C4E2A198C2C462A198E2C462A198FAC4E2A198EAC462A198F2C4E2A198E2C462A198C2C4E2A198DAC462A198D2C4E2A198FAC462A198EAC462A198CAC4E2A198CAC4E2A198F2C4E2A198C2C462A198E2C462A198FAC4E2A198EAC462A198F2C4E2A198E2C462A198C2C4E2A198DAC462A198D2C4E2A198FAC462A198EAC462A198CAC4E2A198CAC4E2A198F2C4E2A198C2C462A198E2C462A198FAC4E2A198EAC462A198F2C4E2A198E2C462A198C2C4E2A198DAC462A198D2C4E2A198FAC462A198EAC462A198CAC4E2A198CAC4E2A198F2C4E2A198C2C462A198E2C3
...
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c32060.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VFMADD132PDr XMM10 XMM10 XMM6 XMM6'
    - 'VFMADD132PDr XMM8 XMM8 XMM6 XMM6'
    - 'VFMADD132PDr XMM12 XMM12 XMM6 XMM6'
    - 'VFMADD132PDr XMM9 XMM9 XMM6 XMM6'
    - 'VFMADD132PDr XMM7 XMM7 XMM6 XMM6'
    - 'VFMADD132PDr XMM1 XMM1 XMM6 XMM6'
    - 'VFMADD132PDr XMM0 XMM0 XMM6 XMM6'
    - 'VFMADD132PDr XMM5 XMM5 XMM6 XMM6'
    - 'VFMADD132PDr XMM11 XMM11 XMM6 XMM6'
    - 'VFMADD132PDr XMM2 XMM2 XMM6 XMM6'
    - 'VFMADD132PDr XMM15 XMM15 XMM6 XMM6'
    - 'VFMADD132PDr XMM3 XMM3 XMM6 XMM6'
    - 'VFMADD132PDr XMM14 XMM14 XMM6 XMM6'
    - 'VFMADD132PDr XMM4 XMM4 XMM6 XMM6'
    - 'VFMADD132PDr XMM13 XMM13 XMM6 XMM6'
  config:          ''
  register_initial_values:
    - 'XMM10=0x0'
    - 'XMM6=0x0'
    - 'MXCSR=0x0'
    - 'XMM8=0x0'
    - 'XMM12=0x0'
    - 'XMM9=0x0'
    - 'XMM7=0x0'
    - 'XMM1=0x0'
    - 'XMM0=0x0'
    - 'XMM5=0x0'
    - 'XMM11=0x0'
    - 'XMM2=0x0'
    - 'XMM15=0x0'
    - 'XMM3=0x0'
    - 'XMM14=0x0'
    - 'XMM4=0x0'
    - 'XMM13=0x0'
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 0.5311, per_snippet_value: 7.9665 }
error:           ''
info:            instruction has tied variables, avoiding Read-After-Write issue, picking random def and use registers not aliasing each other, reusing the same register for all uses
assembled_snippet: 4883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F14244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F34244883C4104883EC04C70424801F0000C5F8AE14244883C4044883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F04244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F24244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F0C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F3C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F0C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F04244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F2C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F1C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F14244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F3C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F1C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F34244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F24244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F2C244883C410C462C998D6C462C998C6C462C998E6C462C998CEC4E2C998FEC4E2C998CEC4E2C998C6C4E2C998EEC462C998DEC4E2C998D6C462C998FEC4E2C998DEC462C998F6C4E2C998E6C462C998EEC462C998D6C462C998C6C462C998E6C462C998CEC4E2C998FEC4E2C998CEC4E2C998C6C4E2C998EEC462C998DEC4E2C998D6C462C998FEC4E2C998DEC462C998F6C4E2C998E6C462C998EEC462C998D6C462C998C6C462C998E6C462C998CEC4E2C998FEC4E2C998CEC4E2C998C6C4E2C998EEC462C998DEC4E2C998D6C462C998FEC4E2C998DEC462C998F6C4E2C998E6C462C998EEC462C998D6C462C998C6C462C998E6C462C998CEC4E2C998FEC4E2C998CEC4E2C998C6C4E2C998EEC462C998DEC4E2C998D6C462C998FEC4E2C998DEC462C998F6C4E2C998E6C462C998EEC3
...
```

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D139283
2022-12-06 18:05:22 +03:00
Roman Lebedev
03512ae9bf
[exegesis][X86] ParallelSnippetGenerator: don't accidentally create serialized instructions
In the case of no tied variables, we pick random defs, and then random uses that don't alias with defs we just picked.
Sounds good, except that an X86 instruction may have implicit reg uses,
e.g. for `MULX` it's `EDX`/`RDX`: `Intel SDM, 4-162 Vol. 2B MULX — Unsigned Multiply Without Affecting Flags`
> Performs an unsigned multiplication of the implicit source operand (EDX/RDX) and the specified source operand
> (the third operand) and stores the low half of the result in the second destination (second operand), the high half
> of the result in the first destination operand (first operand), without reading or writing the arithmetic flags.

And indeed, every once in a while `llvm-exegesis` happened to pick EDX as a def while measuring throughput,
and producing garbage output:
```
$ ./bin/llvm-exegesis -num-repetitions=1000000 -mode=inverse_throughput -repetition-mode=min --loop-body-size=4096 -dump-object-to-disk=false -opcode-name=MULX32rr --max-configs-per-opcode=65536
---
mode:            inverse_throughput
key:
  instructions:
    - 'MULX32rr EDX R11D R12D'
  config:          ''
  register_initial_values:
    - 'R12D=0x0'
    - 'EDX=0x0'
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 4.00014, per_snippet_value: 4.00014 }
error:           ''
info:            instruction has no tied variables picking Uses different from defs
assembled_snippet: 415441BC00000000BA00000000C4C223F6D4C4C223F6D4C4C223F6D4C4C223F6D4415CC3415441BC00000000BA0000000049B80200000000000000C4C223F6D4C4C223F6D44983C0FF75F0415CC3
...
```
```
$ ./bin/llvm-exegesis -num-repetitions=1000000 -mode=inverse_throughput -repetition-mode=min --loop-body-size=4096 -dump-object-to-disk=false -opcode-name=MULX32rr --max-configs-per-opcode=65536
---
mode:            inverse_throughput
key:
  instructions:
    - 'MULX32rr R13D EDX ECX'
  config:          ''
  register_initial_values:
    - 'ECX=0x0'
    - 'EDX=0x0'
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 3.00013, per_snippet_value: 3.00013 }
error:           ''
info:            instruction has no tied variables picking Uses different from defs
assembled_snippet: 4155B900000000BA00000000C4626BF6E9C4626BF6E9C4626BF6E9C4626BF6E9415DC34155B900000000BA0000000049B80200000000000000C4626BF6E9C4626BF6E94983C0FF75F0415DC3
...
```
Oops! Not only does that not look fun, i did hit that pitfail during AMD Zen 3 enablement.
While i have since then addressed this in rGd4d459e7475b4bb0d15280f12ed669342fa5edcd,
i suspect there may be other buggy results lying around, so we should at least stop producing them.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D109275
2021-09-07 12:39:23 +03:00
Roman Lebedev
6030fe01f4
[llvm-exegesis] Exploring X86::OperandType::OPERAND_COND_CODE
Summary:
Currently, we only have nice exploration for LEA instruction,
while for the rest, we rely on `randomizeUnsetVariables()`
to sometimes generate something interesting.
While that works, it isn't very reliable in coverage :)

Here, i'm making an assumption that while we may want to explore
multi-instruction configs, we are most interested in the
characteristics of the main instruction we were asked about.

Which we can do, by taking the existing `randomizeMCOperand()`,
and turning it on it's head - instead of relying on it to randomly fill
one of the interesting values, let's pregenerate all the possible interesting
values for the variable, and then generate as much `InstructionTemplate`
combinations of these possible values for variables as needed/possible.

Of course, that requires invasive changes to no longer pass just the
naked `Instruction`, but sometimes partially filled `InstructionTemplate`.

As it can be seen from the test, this allows us to explore
`X86::OperandType::OPERAND_COND_CODE` for instructions
that take such an operand.
I'm hoping this will greatly simplify exploration.

Reviewers: courbet, gchatelet

Reviewed By: gchatelet

Subscribers: orodley, mgorny, sdardis, tschuett, jrtc27, atanasyan, mstojanovic, andreadb, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74156
2020-02-12 21:33:52 +03:00
Miloš Stojanović
24b7b99b7d [llvm-exegesis][NFC] Disassociate snippet generators from benchmark runners
The addition of `inverse_throughput` mode highlighted the disjointedness
of snippet generators and benchmark runners because it used the
`UopsSnippetGenerator` with the  `LatencyBenchmarkRunner`.
To keep the code consistent tie the snippet generators to
parallelization/serialization rather than their benchmark runners.

Renaming `LatencySnippetGenerator` -> `SerialSnippetGenerator`.
Renaming `UopsSnippetGenerator` -> `ParallelSnippetGenerator`.

Differential Revision: https://reviews.llvm.org/D72928
2020-01-20 16:19:13 +01:00