llvm-project

Author	SHA1	Message	Date
Reid Kleckner	bb7242477c	[MC] Use StringTable to reduce dynamic relocations (#144202) Dynamic relocations are expensive on ELF/Linux platforms because they are applied in userspace on process startup. Therefore, it is worth optimizing them to make PIE and PIC dylib builds faster. In +asserts builds (non-NDEBUG), nikic identified these schedule class name string pointers as the leading source of dynamic relocations. [1] This change uses llvm::StringTable and the StringToOffsetTable TableGen helper to turn the string pointers into 32-bit offsets into a separate character array. The number of dynamic relocations is reduced by ~60%: ❯ llvm-readelf --dyn-relocations lib/libLLVM.so | wc -l 381376 # before 155156 # after The test suite time is modestly affected, but I'm running on a shared noisy workstation VM with a ton of cores: https://gist.github.com/rnk/f38882c2fe2e63d0eb58b8fffeab69de Testing Time: 100.88s # before Testing Time: 78.50s # after Testing Time: 96.25s # before again I haven't used any fancy hyperfine/denoising tools, but I think the result is clearly visible and we should ship it. [1] https://gist.github.com/nikic/554f0a544ca15d5219788f1030f78c5a	2025-06-25 05:23:11 -07:00
Min-Yih Hsu	8241106bf3	[Exegesis] Print epsilon value in the sched model inconsistency report (#80080) Since I've formatted the epsilon value, I don't think it's necessary to escape it.	2024-01-31 09:13:38 -08:00
Aiden Grossman	faf675ce34	[llvm-exegesis] Remove llvm prefix where unnecessary (#79802) This patch removes the llvm:: prefix within llvm-exegesis where it is not necessary. This is most occurrences of the prefix within exegesis as exegesis is within the llvm namespace. This patch makes things more consistent as the vast majority of the code did not use the llvm:: prefix for anything.	2024-01-29 11:40:06 -08:00
Kazu Hirata	c630f95f33	[llvm-exegesis] Remove unnecessary includes (NFC) Identified with clangd.	2023-12-05 23:28:09 -08:00
Michael Maitland	85e3875ad7	[TableGen] Rename ResourceCycles and StartAtCycle to clarify semantics D150312 added a TODO: TODO: consider renaming the field `StartAtCycle` and `Cycles` to `AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the fact that resource allocation is now represented as an interval, relatively to the issue cycle of the instruction. This patch implements that TODO. This naming clarifies how to use these fields in the scheduler. In addition it was confusing that `StartAtCycle` was singular but `Cycles` was plural. This renaming fixes this inconsistency. This commit was previously reverted since it missed renaming that came down after rebasing. This version of the commit fixes those problems. Differential Revision: https://reviews.llvm.org/D158568	2023-08-24 19:21:36 -07:00
Michael Maitland	71bfec762b	Revert "[TableGen] Rename ResourceCycles and StartAtCycle to clarify semantics" This reverts commit 5b854f2c23ea1b000cb4cac4c0fea77326c03d43. Build still failing.	2023-08-24 15:37:27 -07:00
Michael Maitland	5b854f2c23	[TableGen] Rename ResourceCycles and StartAtCycle to clarify semantics D150312 added a TODO: TODO: consider renaming the field `StartAtCycle` and `Cycles` to `AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the fact that resource allocation is now represented as an interval, relatively to the issue cycle of the instruction. This patch implements that TODO. This naming clarifies how to use these fields in the scheduler. In addition it was confusing that `StartAtCycle` was singular but `Cycles` was plural. This renaming fixes this inconsistency. This commit was previously reverted since it missed renaming that came down after rebasing. This version of the commit fixes those problems. Differential Revision: https://reviews.llvm.org/D158568	2023-08-24 15:25:42 -07:00
Michael Maitland	4d27dffb43	Revert "[TableGen] Rename ResourceCycles and StartAtCycle to clarify semantics" This reverts commit 030d33409568b2f0ea61116e83fd40ca27ba33ac. This commit is causing build failures	2023-08-24 11:58:53 -07:00
Michael Maitland	030d334095	[TableGen] Rename ResourceCycles and StartAtCycle to clarify semantics D150312 added a TODO: TODO: consider renaming the field `StartAtCycle` and `Cycles` to `AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the fact that resource allocation is now represented as an interval, relatively to the issue cycle of the instruction. This patch implements that TODO. This naming clarifies how to use these fields in the scheduler. In addition it was confusing that `StartAtCycle` was singular but `Cycles` was plural. This renaming fixes this inconsistency. Differential Revision: https://reviews.llvm.org/D158568	2023-08-24 11:20:37 -07:00
Pavel Kosov	39fc67b8af	[llvm-exegesis] Factor out DisassemblerHelper from the Analysis class As part of preparing the reports, the Analysis class needs to print machine instructions in a disassembled form. For this purpose, the class has four fields (namely Context_, AsmInfo_, InstPrinter_ and Disasm_). All the constructor of the Analysis class does is conditionally initializing these four fields. This commit factors out the logic for decoding machine code and printing it in an assembler form into a separate DisassemblerHelper class. ~~ Huawei RRI, OS Lab Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D147156	2023-04-04 09:17:50 +03:00
Aiden Grossman	389bf5d870	[llvm-exegesis] Refactor InstructionBenchmark to Benchmark When llvm-exegesis was first introduced, it only supported benchmarking individual instructions, hence the name for the data structure storing the data corresponding to a benchmark being called InstructionBenchmark made sense. However, now that benchmarking arbitrary snippets is supported, InstructionBenchmark doesn't correspond to a single instruction. This patch refactors InstructionBenchmark to be called Benchmark to clean up this little bit of technical debt. Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D146884	2023-03-27 08:14:36 +00:00
Kazu Hirata	8bdf387858	Use *{Map,Set}::contains (NFC) Differential Revision: https://reviews.llvm.org/D146104	2023-03-15 08:46:32 -07:00
Clement Courbet	e52f8406e8	Re-land "[llvm-exegesis] Support analyzing results from a different target." With Mips fixes. This reverts commit 7daf60e3440b22b79084bb325d823aa3ad8df0f3.	2022-09-22 11:39:52 +02:00
Clement Courbet	7daf60e344	Revert "[llvm-exegesis] Support analyzing results from a different target." Breaks MIPS compile. This reverts commit cc61c822e05c51e494c50d1e72f963750116ef08.	2022-09-22 11:19:01 +02:00
Clement Courbet	cc61c822e0	[llvm-exegesis] Support analyzing results from a different target. We were using the native triple to parse the benchmarks. Use the triple from the benchmarks file. Right now this still only allows analyzing files produced by the current target until D133605 is in. This also makes the `Analysis` class much less ad-hoc. Differential Revision: https://reviews.llvm.org/D133697	2022-09-22 11:11:18 +02:00
Roman Lebedev	e030f808ec	[Exegesis] Native clusterization: sub-partition by sched class id Currently native clusterization simply groups all benchmarks by the opcode of key instruction, but that is suboptimal in certain cases, e.g. where we can already tell that the particular instructions already resolve into different sched classes.	2021-09-07 17:54:37 +03:00
David Blaikie	1def2579e1	PR51018: Remove explicit conversions from SmallString to StringRef to future-proof against C++23 C++23 will make these conversions ambiguous - so fix them to make the codebase forward-compatible with C++23 (& a follow-up change I've made will make this ambiguous/invalid even in <C++23 so we don't regress this & it generally improves the code anyway)	2021-07-08 13:37:57 -07:00
Philipp Krones	c2f819af73	[MC] Refactor MCObjectFileInfo initialization and allow targets to create MCObjectFileInfo This makes it possible for targets to define their own MCObjectFileInfo. This MCObjectFileInfo is then used to determine things like section alignment. This is a follow up to D101462 and prepares for the RISCV backend defining the text section alignment depending on the enabled extensions. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D101921	2021-05-23 14:15:23 -07:00
Philipp Krones	632ebc4ab4	[MC] Untangle MCContext and MCObjectFileInfo This untangles the MCContext and the MCObjectFileInfo. There is a circular dependency between MCContext and MCObjectFileInfo. Currently this dependency also exists during construction: You can't construct a MOFI without a MCContext without constructing the MCContext with a dummy version of that MOFI first. This removes this dependency during construction. In a perfect world, MCObjectFileInfo wouldn't depend on MCContext at all, but only be stored in the MCContext, like other MC information. This is future work. This also shifts/adds more information to the MCContext making it more available to the different targets. Namely: - TargetTriple - ObjectFileType - SubtargetInfo Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D101462	2021-05-05 10:03:02 -07:00
Simon Pilgrim	907d4e754e	[llvm-exegesis] Analysis.cpp - use for-range loop. NFCI.	2021-04-19 12:56:10 +01:00
Clement Courbet	9e9f991ac0	[llvm-exegesis] Honor -mcpu in analysis mode. This is useful to set the baseline model for an unknown CPU. Fixes PR50013. Differential Revision: https://reviews.llvm.org/D100743	2021-04-19 10:44:28 +02:00
Kazu Hirata	89e8eb946d	[llvm] Use llvm::find_if (NFC)	2021-01-11 18:48:06 -08:00
Kazu Hirata	902cbcd59e	Use llvm::is_contained where appropriate (NFC) Summary: This patch replaces std::find with llvm::is_contained where appropriate. Reviewers: efriedma, nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, jvesely, nhaehnle, hiraditya, rogfer01, kerbowa, llvm-commits, vkmr Tags: #llvm Differential Revision: https://reviews.llvm.org/D84489	2020-07-27 10:20:44 -07:00
Miloš Stojanović	24b7b99b7d	[llvm-exegesis][NFC] Disassociate snippet generators from benchmark runners The addition of `inverse_throughput` mode highlighted the disjointedness of snippet generators and benchmark runners because it used the `UopsSnippetGenerator` with the `LatencyBenchmarkRunner`. To keep the code consistent tie the snippet generators to parallelization/serialization rather than their benchmark runners. Renaming `LatencySnippetGenerator` -> `SerialSnippetGenerator`. Renaming `UopsSnippetGenerator` -> `ParallelSnippetGenerator`. Differential Revision: https://reviews.llvm.org/D72928	2020-01-20 16:19:13 +01:00
Fangrui Song	6fdd6a7b3f	[Disassembler] Delete the VStream parameter of MCDisassembler::getInstruction() The argument is llvm::null() everywhere except llvm::errs() in llvm-objdump in -DLLVM_ENABLE_ASSERTIONS=On builds. It is used by no target but X86 in -DLLVM_ENABLE_ASSERTIONS=On builds. If we ever have the needs to add verbose log to disassemblers, we can record log with a member function, instead of passing it around as an argument.	2020-01-11 13:34:52 -08:00
Fangrui Song	aa708763d3	[MC] Add parameter `Address` to MCInstPrinter::printInst printInst prints a branch/call instruction as `b offset` (there are many variants on various targets) instead of `b address`. It is a convention to use address instead of offset in most external symbolizers/disassemblers. This difference makes `llvm-objdump -d` output unsatisfactory. Add `uint64_t Address` to printInst(), so that it can pass the argument to printInstruction(). `raw_ostream &OS` is moved to the last to be consistent with other print* methods. The next step is to pass `Address` to printInstruction() (generated by tablegen from the instruction set description). We can gradually migrate targets to print addresses instead of offsets. In any case, downstream projects which don't know `Address` can pass 0 as the argument. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D72172	2020-01-06 20:42:22 -08:00
Mirko Brkusanin	4b63ca1379	[Mips] Use appropriate private label prefix based on Mips ABI MipsMCAsmInfo was using '$' prefix for Mips32 and '.L' for Mips64 regardless of -target-abi option. By passing MCTargetOptions to MCAsmInfo we can find out Mips ABI and pick appropriate prefix. Tags: #llvm, #clang, #lldb Differential Revision: https://reviews.llvm.org/D66795	2019-10-23 12:24:35 +02:00
Clement Courbet	c8eb0547ef	[llvm-exegesis] Show noise cluster in analysis output. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68780 llvm-svn: 374533	2019-10-11 11:33:18 +00:00
Clement Courbet	50cdd56beb	[llvm-exegesis][NFC] Remove extra `llvm::` qualifications. Summary: Second patch: in the lib. Reviewers: gchatelet Subscribers: nemanjai, tschuett, MaskRay, mgrang, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68692 llvm-svn: 374158	2019-10-09 11:58:42 +00:00
Jonas Devlieghere	0eaee545ee	[llvm] Migrate llvm::make_unique to std::make_unique Now that we've moved to C++14, we no longer need the llvm::make_unique implementation from STLExtras.h. This patch is a mechanical replacement of (hopefully) all the llvm::make_unique instances across the monorepo. llvm-svn: 369013	2019-08-15 15:54:37 +00:00
Roman Lebedev	4d81e87765	[NFC][llvm-exegesis] Also promote getSchedClassPoint() into ResolvedSchedClass. Summary: It doesn't need anything from Analysis::SchedClassCluster class, and takes ResolvedSchedClass as param, so this seems rather fitting. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59994 llvm-svn: 357263	2019-03-29 14:58:01 +00:00
Roman Lebedev	1d1330c546	[NFC][llvm-exegesis] Refactor ResolvedSchedClass & friends Summary: `ResolvedSchedClass` will need to be used outside of `Analysis` (before `InstructionBenchmarkClustering` even), therefore promote it into a non-private top-level class, and while there also move all of the functions that are only called by `ResolvedSchedClass` into that same new file. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: mgorny, tschuett, mgrang, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59993 llvm-svn: 357259	2019-03-29 14:24:27 +00:00
Roman Lebedev	b8fb15d412	[NFC][llvm-exegesis] Refactor Analysis::SchedClassCluster::measurementsMatch() Summary: The diff looks scary but it really isn't: 1. I moved the check for the number of measurements into `SchedClassClusterCentroid::validate()` 2. While there, added a check that we can only have a single inverse throughput measurement. I missed that when adding it initially. 3. In `Analysis::SchedClassCluster::measurementsMatch()` is called with the current LLVM values from schedule class and the values from Centroid. 3.1. The values from centroid we can already get from `SchedClassClusterCentroid::getAsPoint()`. This isn't 100% a NFC, because previously for inverse throughput we used `min()`. I have asked whether i have done that correctly in https://reviews.llvm.org/D57647?id=184939#inline-510384 but did not hear back. I think `avg()` should be used too, thus it is a fix. 3.2. Finally, refactor the computation of the LLVM-specified values into `Analysis::SchedClassCluster::getSchedClassPoint()` I will need that function for [[ https://bugs.llvm.org/show_bug.cgi?id=41275 \| PR41275 ]] Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59951 llvm-svn: 357245	2019-03-29 11:36:08 +00:00
Roman Lebedev	c2423fe689	[llvm-exegesis] Introduce a 'naive' clustering algorithm (PR40880) Summary: This is an alternative to D59539. Let's suppose we have measured 4 different opcodes, and got: `0.5`, `1.0`, `1.5`, `2.0`. Let's suppose we are using `-analysis-clustering-epsilon=0.5`. By default now we will start processing the `0.5` point, find that `1.0` is its neighbor, add them to a new cluster. Then we will notice that `1.5` is a neighbor of `1.0` and add it to that same cluster. Then we will notice that `2.0` is a neighbor of `1.5` and add it to that same cluster. So all these points ended up in the same cluster. This may or may not be a correct implementation of dbscan clustering algorithm. But this is rather horribly broken for the reasons of comparing the clusters with the LLVM sched data. Let's suppose all those opcodes are currently in the same sched cluster. If i specify `-analysis-inconsistency-epsilon=0.5`, then no matter the LLVM values this cluster will never match the LLVM values, and thus this cluster will always be displayed as inconsistent. The solution is obviously to split off some of these opcodes into different sched cluster. But how do i do that? Out of 4 opcodes displayed in the inconsistency report, which ones are the "bad ones"? Which ones are the most different from the checked-in data? I'd need to go in to the `.yaml` and look it up manually. The trivial solution is to, when creating clusters, don't use the full dbscan algorithm, but instead "pick some unclustered point, pick all unclustered points that are its neighbor, put them all into a new cluster, repeat". And just so as it happens, we can arrive at that algorithm by not performing the "add neighbors of a neighbor to the cluster" step. But that won't work well once we teach analyze mode to operate in non-1D mode (i.e. on more than a single measurement type at a time), because the clustering would depend on the order of the measurements. 
Instead, let's just create a single cluster per opcode, and put all the points of that opcode into said cluster. And simultaneously check that every point in that cluster is a neighbor of every other point in the cluster, and if they are not, the cluster (==opcode) is unstable. This is //yet another// step to bring me closer to being able to continue cleanup of bdver2 sched model.. Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40880 \| PR40880 ]]. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, jdoerfert, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59820 llvm-svn: 357152	2019-03-28 08:55:01 +00:00
Roman Lebedev	542e5d7bb5	[llvm-exegesis] Split Epsilon param into two (PR40787) Summary: This eps param is used for two distinct things: * initial point clusterization * checking clusters against the llvm values What if one wants to only look at highly different clusters, without changing the clustering itself? In particular, this helps to weed out noisy measurements (since the clusterization epsilon is still small, so there is a better chance that noisy measurements from the same opcode will go into different clusters) By splitting it into two params it is now possible. This is nearly-free performance-wise: Old: ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 10099 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs): 390.01 msec task-clock # 0.998 CPUs utilized ( +- 0.25% ) 12 context-switches # 31.735 M/sec ( +- 27.38% ) 0 cpu-migrations # 0.000 K/sec 4745 page-faults # 12183.732 M/sec ( +- 0.54% ) 1562711900 cycles # 4012303.327 GHz ( +- 0.24% ) (82.90%) 185567822 stalled-cycles-frontend # 11.87% frontend cycles idle ( +- 0.52% ) (83.30%) 392106234 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.31% ) (33.79%) 1839236666 instructions # 1.18 insn per cycle # 0.21 stalled cycles per insn ( +- 0.15% ) (50.37%) 407035764 branches # 1045074878.710 M/sec ( +- 0.12% ) (66.80%) 10896459 branch-misses # 2.68% of all branches ( +- 0.17% ) (83.20%) 0.390629 +- 0.000972 seconds time elapsed ( +- 0.25% ) ``` ``` $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis 
-benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 50572 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs): 6803.36 msec task-clock # 0.999 CPUs utilized ( +- 0.96% ) 262 context-switches # 38.546 M/sec ( +- 23.06% ) 0 cpu-migrations # 0.065 M/sec ( +- 76.03% ) 13287 page-faults # 1953.206 M/sec ( +- 0.32% ) 27252537904 cycles # 4006024.257 GHz ( +- 0.95% ) (83.31%) 1496314935 stalled-cycles-frontend # 5.49% frontend cycles idle ( +- 0.97% ) (83.32%) 16128404524 stalled-cycles-backend # 59.18% backend cycles idle ( +- 0.30% ) (33.37%) 17611143370 instructions # 0.65 insn per cycle # 0.92 stalled cycles per insn ( +- 0.05% ) (50.04%) 3894906599 branches # 572537147.437 M/sec ( +- 0.03% ) (66.69%) 116314514 branch-misses # 2.99% of all branches ( +- 0.20% ) (83.35%) 6.8118 +- 0.0689 seconds time elapsed ( +- 1.01%) ``` New: ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 10099 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new.html' ... 
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs): 400.14 msec task-clock # 0.998 CPUs utilized ( +- 0.66% ) 12 context-switches # 29.429 M/sec ( +- 25.95% ) 0 cpu-migrations # 0.100 M/sec ( +-100.00% ) 4714 page-faults # 11796.496 M/sec ( +- 0.55% ) 1603131306 cycles # 4011840.105 GHz ( +- 0.66% ) (82.85%) 199538509 stalled-cycles-frontend # 12.45% frontend cycles idle ( +- 2.40% ) (83.10%) 402249109 stalled-cycles-backend # 25.09% backend cycles idle ( +- 1.19% ) (34.05%) 1847783963 instructions # 1.15 insn per cycle # 0.22 stalled cycles per insn ( +- 0.18% ) (50.64%) 407162722 branches # 1018925730.631 M/sec ( +- 0.12% ) (67.02%) 10932779 branch-misses # 2.69% of all branches ( +- 0.51% ) (83.28%) 0.40077 +- 0.00267 seconds time elapsed ( +- 0.67% ) lebedevri@pini-pini:/build/llvm-build-Clang-release$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 50572 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new.html' ... 
Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs): 6947.79 msec task-clock # 1.000 CPUs utilized ( +- 0.90% ) 217 context-switches # 31.236 M/sec ( +- 36.16% ) 1 cpu-migrations # 0.096 M/sec ( +- 50.00% ) 13258 page-faults # 1908.389 M/sec ( +- 0.34% ) 27830796523 cycles # 4006032.286 GHz ( +- 0.89% ) (83.30%) 1504554006 stalled-cycles-frontend # 5.41% frontend cycles idle ( +- 2.10% ) (83.32%) 16716574843 stalled-cycles-backend # 60.07% backend cycles idle ( +- 0.65% ) (33.38%) 17755545931 instructions # 0.64 insn per cycle # 0.94 stalled cycles per insn ( +- 0.09% ) (50.04%) 3897255686 branches # 560980426.597 M/sec ( +- 0.06% ) (66.70%) 117045395 branch-misses # 3.00% of all branches ( +- 0.47% ) (83.34%) 6.9507 +- 0.0627 seconds time elapsed ( +- 0.90% ) ``` I.e. it's +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps. Within noise i'd say. Should help with [[ https://bugs.llvm.org/show_bug.cgi?id=40787 \| PR40787 ]]. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58476 llvm-svn: 354767	2019-02-25 09:36:12 +00:00
Roman Lebedev	69716394f3	[llvm-exegesis] Opcode stabilization / reclusterization (PR40715) Summary: Given an instruction `Opcode`, we can make benchmarks (measurements) of the instruction characteristics/performance. Then, to facilitate further analysis we group the benchmarks with similar characteristics into clusters. Now, this is all not entirely deterministic. Some instructions have variable characteristics, depending on their arguments. And thus, if we do several benchmarks of the same instruction `Opcode`, we may end up with different performance characteristics measurements. And when we then do clustering, these several benchmarks of the same instruction `Opcode` may end up being clustered into different clusters. This is not great for further analysis. We shall find every `Opcode` with benchmarks not in just one cluster, and move all the benchmarks of said `Opcode` into one new unstable cluster per `Opcode`. I have solved this by making `ClusterId` a bit field, adding a `IsUnstable` bit, and introducing `-analysis-display-unstable-clusters` switch to toggle between displaying stable-only clusters and unstable-only clusters. The reclusterization is deterministically stable, produces identical reports between runs. (Or at least that is what i'm seeing, maybe it isn't) Timings/comparisons: old (current trunk/head) {F8303582} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' ... 
no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-old.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs): 6624.73 msec task-clock # 0.999 CPUs utilized ( +- 0.53% ) 172 context-switches # 25.965 M/sec ( +- 29.89% ) 0 cpu-migrations # 0.042 M/sec ( +- 56.54% ) 31073 page-faults # 4690.754 M/sec ( +- 0.08% ) 26538711696 cycles # 4006230.292 GHz ( +- 0.53% ) (83.31%) 2017496807 stalled-cycles-frontend # 7.60% frontend cycles idle ( +- 0.93% ) (83.32%) 13403650062 stalled-cycles-backend # 50.51% backend cycles idle ( +- 0.33% ) (33.37%) 19770706799 instructions # 0.74 insn per cycle # 0.68 stalled cycles per insn ( +- 0.04% ) (50.04%) 4419821812 branches # 667207369.714 M/sec ( +- 0.03% ) (66.69%) 121741669 branch-misses # 2.75% of all branches ( +- 0.28% ) (83.34%) 6.6283 +- 0.0358 seconds time elapsed ( +- 0.54% ) ``` patch, with reclustering but without filtering (i.e. outputting all the stable and unstable clusters) {F8303586} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html' ... 
no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs): 6475.29 msec task-clock # 0.999 CPUs utilized ( +- 0.31% ) 213 context-switches # 32.952 M/sec ( +- 23.81% ) 1 cpu-migrations # 0.130 M/sec ( +- 43.84% ) 31287 page-faults # 4832.057 M/sec ( +- 0.08% ) 25939086577 cycles # 4006160.279 GHz ( +- 0.31% ) (83.31%) 1958812858 stalled-cycles-frontend # 7.55% frontend cycles idle ( +- 0.68% ) (83.32%) 13218961512 stalled-cycles-backend # 50.96% backend cycles idle ( +- 0.29% ) (33.37%) 19752995402 instructions # 0.76 insn per cycle # 0.67 stalled cycles per insn ( +- 0.04% ) (50.04%) 4417079244 branches # 682195472.305 M/sec ( +- 0.03% ) (66.70%) 121510065 branch-misses # 2.75% of all branches ( +- 0.19% ) (83.34%) 6.4832 +- 0.0229 seconds time elapsed ( +- 0.35% ) ``` Funnily, this measurement shows that said reclustering actually improved performance. patch, with reclustering, only the stable clusters {F8303594} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html' ... 
no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs): 6387.71 msec task-clock # 0.999 CPUs utilized ( +- 0.13% ) 133 context-switches # 20.792 M/sec ( +- 23.39% ) 0 cpu-migrations # 0.063 M/sec ( +- 61.24% ) 31318 page-faults # 4903.256 M/sec ( +- 0.08% ) 25591984967 cycles # 4006786.266 GHz ( +- 0.13% ) (83.31%) 1881234904 stalled-cycles-frontend # 7.35% frontend cycles idle ( +- 0.25% ) (83.33%) 13209749965 stalled-cycles-backend # 51.62% backend cycles idle ( +- 0.16% ) (33.36%) 19767554347 instructions # 0.77 insn per cycle # 0.67 stalled cycles per insn ( +- 0.04% ) (50.03%) 4417480305 branches # 691618858.046 M/sec ( +- 0.03% ) (66.68%) 118676358 branch-misses # 2.69% of all branches ( +- 0.07% ) (83.33%) 6.3954 +- 0.0118 seconds time elapsed ( +- 0.18% ) ``` Performance improved even further?! Makes sense i guess, less clusters to print. patch, with reclustering, only the unstable clusters {F8303601} ``` $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html' ... 
no exegesis target for x86_64-unknown-linux-gnu, using default Parsed 43970 benchmark points Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html' Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs): 6124.96 msec task-clock # 1.000 CPUs utilized ( +- 0.20% ) 194 context-switches # 31.709 M/sec ( +- 20.46% ) 0 cpu-migrations # 0.039 M/sec ( +- 49.77% ) 31413 page-faults # 5129.261 M/sec ( +- 0.06% ) 24536794267 cycles # 4006425.858 GHz ( +- 0.19% ) (83.31%) 1676085087 stalled-cycles-frontend # 6.83% frontend cycles idle ( +- 0.46% ) (83.32%) 13035595603 stalled-cycles-backend # 53.13% backend cycles idle ( +- 0.16% ) (33.36%) 18260877653 instructions # 0.74 insn per cycle # 0.71 stalled cycles per insn ( +- 0.05% ) (50.03%) 4112411983 branches # 671484364.603 M/sec ( +- 0.03% ) (66.68%) 114066929 branch-misses # 2.77% of all branches ( +- 0.11% ) (83.32%) 6.1278 +- 0.0121 seconds time elapsed ( +- 0.20% ) ``` This tells us that the actual `-analysis-inconsistencies-output-file=` outputting only takes ~0.4 sec for 43970 benchmark points (3 whole sweeps) (Also, wow this is fast, it used to take several minutes originally) Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40715 \| PR40715 ]]. Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, jdoerfert, llvm-commits, RKSimon Tags: #llvm Differential Revision: https://reviews.llvm.org/D58355 llvm-svn: 354441	2019-02-20 09:14:04 +00:00
Roman Lebedev	1a0d595f15	[llvm-exegesis] Throughput support in analysis mode Summary: D57000 / [[ https://bugs.llvm.org/show_bug.cgi?id=37698 \| PR37698 ]] added support for measuring of the inverse throughput. But the support for the analysis was not added. This attempts to fix that. (analysis done on bdver2 / piledriver) First, small-scale experiment: ``` $ ./bin/llvm-exegesis -num-repetitions=10000 -mode=inverse_throughput -opcode-name=BSF64rr Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-d0acdd.o --- mode: inverse_throughput key: instructions: - 'BSF64rr RAX RDX' config: '' register_initial_values: - 'RDX=0x0' cpu_name: bdver2 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: inverse_throughput, value: 3.0278, per_snippet_value: 3.0278 } error: '' info: instruction has no tied variables picking Uses different from defs assembled_snippet: 48BA0000000000000000480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2C3 ... ``` If we plug `bsfq %r12, %r10` into llvm-mca: https://godbolt.org/z/ZtOyhJ ``` Dispatch Width: 4 uOps Per Cycle: 3.00 IPC: 0.50 Block RThroughput: 2.0 ``` So RThroughput mismatch exists. Now, let's upscale and analyse: {F8207148} `$ ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html`: {F8207172} {F8207197} And if we now look at https://www.agner.org/optimize/instruction_tables.pdf, `Reciprocal throughput` for `BSF r,r` is listed as `3`. Yay? Reviewers: courbet, gchatelet Reviewed By: courbet Subscribers: tschuett, RKSimon, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57647 llvm-svn: 353023	2019-02-04 09:12:17 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Roman Lebedev	b311c1d6b8	[llvm-exegesis] Analysis: writeMeasurementValue(): don't alloc string for double each time. Summary: Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!) Old time: (D54382) ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 9024.354355 task-clock (msec) # 1.000 CPUs utilized ( +- 0.18% ) ... 9.0262 +- 0.0161 seconds time elapsed ( +- 0.18% ) ``` New time: ``` Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs): 8996.541057 task-clock (msec) # 0.999 CPUs utilized ( +- 0.19% ) ... 9.0045 +- 0.0172 seconds time elapsed ( +- 0.19% ) ``` -~0.3%, not that much. But this isn't the important part. Old: * calls to allocation functions: 2109712 * temporary allocations: 33112 * bytes allocated in total (ignoring deallocations): 4.43 GB New: * calls to allocation functions: 2095345 (-0.68%) * temporary allocations: 18745 (-43.39% !!!) * bytes allocated in total (ignoring deallocations): 4.31 GB (-2.71%) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54383 llvm-svn: 347199	2018-11-19 13:28:17 +00:00
Roman Lebedev	f8b28e9bf4	[llvm-exegesis] Analysis::writeSnippet(): be smarter about memory allocations. Summary: Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!) Old time: (D54381) ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m10.487s user 0m9.745s sys 0m0.740s ``` New time: ``` $ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null real 0m9.599s user 0m8.824s sys 0m0.772s ``` Not that much, around -9%. But that is not the good part yet, again. Old: * calls to allocation functions: 3347676 * temporary allocations: 277818 * bytes allocated in total (ignoring deallocations): 10.52 GB New: * calls to allocation functions: 2109712 (-36%) * temporary allocations: 33112 (-88%) * bytes allocated in total (ignoring deallocations): 4.43 GB (-58% sic) Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn Reviewed By: courbet, MaskRay Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D54382 llvm-svn: 347198	2018-11-19 13:28:14 +00:00
Fangrui Song	32401afd8c	[llvm-exegesis] Move namespace exegesis inside llvm:: Summary: This allows simplifying references of llvm::foo with foo when the needs come in the future. Reviewers: courbet, gchatelet Reviewed By: gchatelet Subscribers: javed.absar, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D53455 llvm-svn: 344922	2018-10-22 17:10:47 +00:00
Fangrui Song	2e83b2e9ee	Use llvm::{all,any,none}_of instead std::{all,any,none}_of. NFC llvm-svn: 344774	2018-10-19 06:12:02 +00:00
Clement Courbet	5a768ddd44	[llvm-exegesis][NFC] Revert rL343682 "Fix unused variable warning". That was not the proper fix: the variable is used in debug mode. llvm-svn: 343685	2018-10-03 12:48:50 +00:00
Clement Courbet	8a5a6be47a	[llvm-exegesis] Fix rL343680 in release mode. llvm-svn: 343684	2018-10-03 12:35:35 +00:00
Clement Courbet	af50a5b85f	[llvm-exegesis][NFC] Fix unused variable warning. llvm-svn: 343682	2018-10-03 12:27:43 +00:00
Clement Courbet	d5a39553ff	[llvm-exegesis] Resolve variant classes in analysis. Summary: See PR38884. Reviewers: gchatelet Subscribers: tschuett, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D52825 llvm-svn: 343680	2018-10-03 11:50:25 +00:00
Clement Courbet	30183093ab	[llvm-exegesis] Fix PR39096. Summary: The key is now the resource name, not the resource id. Reviewers: gchatelet Subscribers: tschuett, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D52607 llvm-svn: 343208	2018-09-27 13:26:37 +00:00
Fangrui Song	0cac726a00	llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...) Summary: The convenience wrapper in STLExtras is available since rL342102. Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D52573 llvm-svn: 343163	2018-09-27 02:13:45 +00:00
Clement Courbet	28d4f85824	[llvm-exegesis] Get rid of debug_string. Summary: This is a backwards-compatible change (existing files will work as expected). See PR39082. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52546 llvm-svn: 343108	2018-09-26 13:35:10 +00:00
Clement Courbet	684a5f6753	[llvm-exegesis] Output the unscaled value as well as the scaled one. Summary: See PR38936 for context. Reviewers: gchatelet Subscribers: tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D52500 llvm-svn: 343081	2018-09-26 08:37:21 +00:00


71 Commits