900 Commits

Author SHA1 Message Date
Franklin
e45f9aa7fa
[AArch64] Initial sched model for Neoverse N3 (#106371)
References:

* Arm Neoverse N3 Software Optimization Guide
* Arm A64 Instruction Set for A-profile architecture
2024-09-19 19:22:24 +01:00
Franklin
ef34cba1c3
[AArch64] Fix sched model of Neoverse N2 (#106376)
* fix write order of "Load vector reg, immed post-index"
* fix a typo
2024-09-18 09:33:57 +01:00
Aiden Grossman
6b78ea8b75
[X86] Complete AMD znver4 AVX512 zeroing idioms (#108740)
This patch completes scheduling information for the AVX512 zeroing
idioms according to the znver4 software optimization guide.
2024-09-17 11:07:24 -07:00
Anton Sidorenko
09fc178180
[RISCV] Add scheduling model for Syntacore SCR7 (#108814)
Syntacore SCR7 is rv64imafdcv_zba_zbb_zbc_zbs_zkn.
Scheduling model for RVV will be added later.
Overview: https://syntacore.com/products/scr7

---------

Co-authored-by: Dmitrii Petrov <dmitrii.petrov@syntacore.com>
Co-authored-by: Anton Afanasyev <anton.afanasyev@syntacore.com>
Co-authored-by: Elena Lepilkina <elena.lepilkina@syntacore.com>
2024-09-17 18:52:55 +03:00
Andrea Di Biagio
6784202b6b
[MCA][ResourceManager] Fix a bug in the instruction issue logic. (#108386)
Before this patch, the pipeline selection logic in
ResourceManager::issueInstruction() didn't know how to correctly handle
instructions which consume multiple partially overlapping resource
groups. In some cases (like the test case from #108157), the inability
to correctly allocate resources on instruction issue was leading to
crashes.

The presence of multiple partially overlapping groups complicates the
selection process by introducing extra constraints. For those cases, the
issue logic now prioritizes groups which are more constrained than
others.

Fixes #108157
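
The idea of prioritizing the more constrained groups can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the logic in ResourceManager::issueInstruction(): the function `allocate`, the group names and the unit names are all hypothetical, and each group is assumed to need exactly one unit.

```python
# Toy model only: hypothetical names, not llvm-mca's ResourceManager.
def allocate(groups, free_units):
    """Pick one free unit per group, handling the most constrained group first.

    groups: dict mapping a group name to the set of units it may use.
    free_units: set of units that are currently free.
    Returns a dict group -> unit, or None if this greedy pass fails.
    """
    free = set(free_units)
    pending = dict(groups)
    assignment = {}
    while pending:
        # The most constrained group has the fewest free candidate units.
        name = min(pending, key=lambda g: (len(pending[g] & free), g))
        candidates = pending.pop(name) & free
        if not candidates:
            return None
        # Prefer the unit wanted by the fewest remaining groups.
        unit = min(
            candidates,
            key=lambda u: (sum(u in s for s in pending.values()), u),
        )
        assignment[name] = unit
        free.discard(unit)
    return assignment


# Three partially overlapping groups competing for three units.
groups = {"P01": {"P0", "P1"}, "P12": {"P1", "P2"}, "P1": {"P1"}}
print(allocate(groups, {"P0", "P1", "P2"}))
```

In this example, a pass that served "P12" before "P1" and happened to hand it unit P1 would leave "P1" with nothing to use, even though a valid assignment exists. Serving the single-unit group first avoids that.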
2024-09-16 09:48:42 +01:00
Aiden Grossman
ee40ffd1ee
[X86] Recognize VPXORDZrr as a zero-idiom on Znver4 (#108314)
This patch adds information about VPXORDZrr to the znver4 scheduling
model, particularly that it is a zero-idiom.

This fixes a proximal cause of #108157.
2024-09-12 09:27:33 -07:00
Simon Pilgrim
aa95b5c121 [X86] Fix Skylake/Icelake port usage for MMX PACK instructions
Matches uops.info + Agner
2024-08-27 16:32:29 +01:00
Simon Pilgrim
83de8c2369 [X86] Fix SkylakeClient ports for int-to-double conversions
These are performed on SKLPort01 (+ SKLPort5/SKLPort23 for rr/rm shuffles/loads)

Also, cleanup some MMX CVT overrides that match the SSE equivalents.

Matches uops.info + Agner
2024-08-27 16:32:29 +01:00
Simon Pilgrim
12e0e312c6 [X86] Fix Skylake/Icelake uops for masked stores
Matches uops.info + Agner
2024-08-27 16:32:29 +01:00
Simon Pilgrim
cf6cd1fd67 [MCA][X86] Add missing 512-bit vpscatterqd/vscatterqps schedule data (REAPPLIED)
This doesn't match uops.info yet - but it matches the existing vpscatterdq/vscatterqpd entries like uops.info says it should

Reapplied with codegen fix for scatter-schedule.ll

Fixes #105675
2024-08-23 10:32:19 +01:00
Chris Apple
e738c816f2
Revert "[MCA][X86] Add missing 512-bit vpscatterqd/vscatterqps schedule data" (#105716)

This reverts commit 2c1f0642a2647883f35463aebf4f90a6b1f158c1.

Many build failures in: CodeGen/X86/scatter-schedule.ll

Example of a build failure:
https://lab.llvm.org/buildbot/#/builders/155/builds/1675
2024-08-22 11:55:31 -07:00
Simon Pilgrim
2c1f0642a2 [MCA][X86] Add missing 512-bit vpscatterqd/vscatterqps schedule data
This doesn't match uops.info yet - but it matches the existing vpscatterdq/vscatterqpd entries like uops.info says it should

Fixes #105675
2024-08-22 18:29:43 +01:00
Simon Pilgrim
7faf2c95a4 [MCA][X86] Add scatter instruction test coverage for #105675
Missed IceLakeServer when I updated the other CPUs in 6ec4c9c3eb4a556f848dac37a2d6f0d46ecc6f02
2024-08-22 18:29:43 +01:00
Simon Pilgrim
6ec4c9c3eb [MCA][X86] Add scatter instruction test coverage for #105675 2024-08-22 17:22:49 +01:00
Phil Camp
42386dc46d
[llvm-mca] Add bottle-neck analysis to JSON output. (#90056)
This patch implements the bottle-neck analysis data in the JSON dump
mode.
2024-08-19 17:16:19 +01:00
Michael Maitland
7efa068f7a
[RISCV] Add vector and vector crypto to SiFiveP400 scheduler model (#102155)
The SiFiveP400 scheduler model did not support vector or vector crypto.
With the addition of the sifive-p470 processor, this model needs to support
these extensions.

Processors that use this model but do not have vector or vector
crypto will never produce these instructions, so there is no impact on these
processors.

Co-authored-by: Min Hsu <min.hsu@sifive.com>
2024-08-19 09:41:42 -04:00
Simon Pilgrim
9e3e8b5715 [X86] VPERM2*128 instructions aren't microcoded on znver1
AMD refer to them as microcoded, but not in the same way as LLVM - the uop count and pipe usage is high but predictable

Confirmed with Agner + uops.info.
2024-08-19 10:05:15 +01:00
Simon Pilgrim
5ab65a6c1c [X86] VPERM2*128 instructions aren't microcoded on znver2
This appears to be a copy+paste error from znver1 (which isn't really microcoded either - but it is rather complex!).

Confirmed with Agner + uops.info.
2024-08-19 10:05:15 +01:00
Anton Sidorenko
5ab99bf1a7
[RISCV] Add scheduling model for Syntacore SCR4 and SCR5 (#102909)
Syntacore SCR4 is a microcontroller-class processor core that has much
in common with SCR3, but also supports F and D extensions.
Overview: https://syntacore.com/products/scr4

Syntacore SCR5 is an entry-level Linux-capable 32/64-bit RISC-V
processor core whose scheduling model almost matches SCR4.
Overview: https://syntacore.com/products/scr5

Co-authored-by: Dmitrii Petrov <dmitrii.petrov@syntacore.com>
Co-authored-by: Anton Afanasyev <anton.afanasyev@syntacore.com>
2024-08-14 11:42:31 +03:00
Simon Pilgrim
888ef0f0fc [X86] Fix pipe resources for INT (V)PEXTR* instructions
IceLakeServer can use ICXPort15 for (V)PEXTR* (but only ICXPort5 for (V)EXTRACTPS)

Confirmed with uops.info + Agner
2024-08-05 10:36:28 +01:00
Simon Pilgrim
3276ee3022 [llvm-mca][x86] Add test coverage for evex variant of vextractps 2024-08-05 10:32:17 +01:00
Simon Pilgrim
a48222451d [X86] Add missing PSUBQ handling to SandyBridge model
Matches PADDQ modelling, confirmed with numbers from uops.info and Agner
2024-07-26 15:54:59 +01:00
Simon Pilgrim
ed8c561170 [X86] haswell/broadwell only uses port5 for mmx pack reg-reg instructions
Matches numbers from uops.info, Agner and instlatx64.
2024-07-26 15:54:58 +01:00
Simon Pilgrim
94e966255f [X86] skylake only uses Port0 for (v)phminposuw instructions
Now matches skylake-server - and matches reports from uops.info, Agner and instlatx64.
2024-07-26 11:21:59 +01:00
Rin Dobrescu
7c18195563
[AArch64] Add flag setting instructions to scheduling model. (#96880)
Some flag setting instructions (such as ANDS, ADDS, CCMN) were missing
from the V2 scheduling model. This patch adds them in.
2024-06-28 10:11:49 +01:00
Anton Sidorenko
2d84e0ffef
[RISCV] Add scheduling model for Syntacore SCR3 (#95427)
Syntacore SCR3 is a microcontroller-class processor core. Overview:
https://syntacore.com/products/scr3

Co-authored-by: Dmitrii Petrov <dmitrii.petrov@syntacore.com>
2024-06-25 11:34:59 +03:00
Simon Pilgrim
630a6dd687 [X86] Fix throughput of AVX2/AVX512VL vector extension/truncations
These should only consume 1cy on either of the 2 pipes (only zmm ops should double pump) - matches AMD SoG + uops.info

Noticed while updating costs for #90748
2024-06-16 10:03:21 +01:00
Simon Pilgrim
62e2eb2154 [X86] Fix pipe resources for FP HADD/SUB instructions
IceLakeServer/SkylakeServer can only use Port01 for the FADD/FSUB stage

Confirmed with uops.info + Agner
2024-06-06 14:16:57 +01:00
Simon Pilgrim
24a39f364d [X86] Fix pipe resources for HADD/SUB instructions
IceLakeServer was copying these from SkylakeServer, but integer HADD/SUB can now run on an extra port
2024-06-06 14:11:00 +01:00
Min-Yih Hsu
6147a7b5f9
[RISCV] Adjust FP load latencies from 6 to 5 in SiFiveP400/P600 scheduling models (#93735)
According to our performance measurements, FLH/W/D have load latencies
closer to 5 rather than 6 in these two models.
2024-05-30 08:47:27 -07:00
Chinmay Deshpande
848bef5d85
[llvm-mca] Add command line option -call-latency (#92958)
Currently we assume a constant latency of 100 cycles for call
instructions. This commit allows the user to specify a custom value for
the same as a command line argument. Default latency is set to 100.
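
The behaviour described above can be sketched as a toy lookup. This is an illustration only, not llvm-mca code: `effective_latency` and its parameters are hypothetical names, and the only value taken from the commit message is the default of 100 cycles.

```python
# Toy model only: hypothetical names, not llvm-mca internals.
DEFAULT_CALL_LATENCY = 100  # default stated in the commit message


def effective_latency(is_call, sched_latency, call_latency=DEFAULT_CALL_LATENCY):
    """Return the latency assumed for one instruction.

    Calls use the configurable call latency; every other instruction
    keeps the latency from its scheduling information.
    """
    return call_latency if is_call else sched_latency


print(effective_latency(True, 1))                    # call, default value
print(effective_latency(True, 1, call_latency=20))   # call, user override
print(effective_latency(False, 3, call_latency=20))  # non-call is unaffected
```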
2024-05-22 13:51:55 -07:00
Rin Dobrescu
267de8543c
[llvm-mca][AArch64] Add AArch64 version of clearsSuperRegisters. (#92548)
This patch overrides the clearsSuperRegisters method defined in
MCInstrAnalysis to identify register writes that clear the upper portion
of all super-registers on AArch64 architecture.

On AArch64, a write to a general-purpose register of 32-bit data size is
defined to use the lower 32-bits of the register and zero extend the
upper 32-bits.
Similarly, SIMD and FP instructions operating on scalar data only access
the lower bits of the SIMD&FP register. The unused upper bits are
cleared to zero on a write.
This also applies to SIMD vector registers when the element size in bits
multiplied by the number of lanes is lower than 128. The upper 64 bits
of the vector register are cleared to zero on a write.
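
The register behaviour described above can be modelled with plain integers. This is a minimal sketch, not the MCInstrAnalysis override: `write_w` and `write_vector` are hypothetical helpers, and integers stand in for register contents.

```python
# Toy model only: plain integers stand in for register contents.
MASK32 = (1 << 32) - 1


def write_w(old_x, value):
    """Write a 32-bit result to a W register; return the new 64-bit X value.

    The upper 32 bits are zeroed, so the result never depends on old_x.
    """
    return value & MASK32


def write_vector(old_q, value, element_bits, lanes):
    """Write a SIMD vector result; return the new 128-bit register value.

    Bits above element_bits * lanes are cleared, so old_q is ignored.
    """
    width = element_bits * lanes
    return value & ((1 << width) - 1)


all_ones_64 = (1 << 64) - 1
all_ones_128 = (1 << 128) - 1
print(hex(write_w(all_ones_64, 0x12345678)))
print(hex(write_vector(all_ones_128, all_ones_128, 8, 8)))
```

Because neither function reads the old register value, a write of this kind does not have to wait for earlier writes to the wider register, which is presumably why the analysis needs to recognize it.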
2024-05-22 15:31:35 +01:00
Peter Waller
458d706741 [llvm-mca] Make bad-input.s even more CPU specific
Note: This patch is distinct from the previous one titled
  "[llvm-mca] Move bad-input.s test to be target specific"

This is a followup to #90474 and commit
afc10fc9b7ce3d23d9012f5a1496e849fe873ba2

Context: Builders failing because they're unable to run the failure
test.

This still doesn't work in various circumstances, it seems MCA doesn't
want to run on a wide variety of hosts in various configurations, so
stick to the tried and tested method and pass -mtriple and -mcpu.
2024-05-07 13:10:40 +01:00
Peter Waller
afc10fc9b7 [llvm-mca] Move bad-input.s test to be target specific
... for now.

This is a follow up to #90474 in response to build bot failures.

This test is intended to check a case where invalid assembly is passed
to llvm-mca.

Unfortunately it appears that a cross-toolchain built with
-DTOOLCHAIN_TARGET_TRIPLE does not have an llvm-mca which works out of
the box if the host target is not enabled.

As a quick fix to make the build bots green, move the test into AArch64
and X86 so that there is reasonable coverage for this test; later I hope
mca can be fixed to work out of the box in this configuration.
2024-05-07 12:50:20 +01:00
Peter Waller
1de0535e84
[llvm-mca] Abort on parse error without -skip-unsupported-instructions (#90474)

Prior to this patch, llvm-mca would continue executing after parse
errors. These errors can lead to some confusion since some analysis
results are printed on the standard output, and they're printed after
the errors, which could otherwise be easy to miss.

However it is still useful to be able to continue analysis after errors;
so extend the recently added -skip-unsupported-instructions to support
this.

Two tests which have parse errors for some of the 'RUN' branches are
updated to use -skip-unsupported-instructions so they can remain as-is.

Add a description of -skip-unsupported-instructions to the llvm-mca
command guide, and add it to the llvm-mca --help output:

```
  --skip-unsupported-instructions=<value> - Force analysis to continue in the presence of unsupported instructions
    =none                                 -   Exit with an error when an instruction is unsupported for any reason (default)
    =lack-sched                           -   Skip instructions on input which lack scheduling information
    =parse-failure                        -   Skip lines on the input which fail to parse for any reason
    =any                                  -   Skip instructions or lines on input which are unsupported for any reason
```

Tests within this patch are intended to cover each of the cases.

Reason        | Flag | Comment
--------------|------|-------
none          | none | Usual case, existing test suite
lack-sched    | none | Advises user to use -skip-unsupported-instructions=lack-sched, tested in llvm/test/tools/llvm-mca/X86/BtVer2/unsupported-instruction.s
parse-failure | none | Advises user to use -skip-unsupported-instructions=parse-failure, tested in llvm/test/tools/llvm-mca/bad-input.s
any           | none | (N/A, covered above)
lack-sched    | any  | Continues, prints warnings, tested in llvm/test/tools/llvm-mca/X86/BtVer2/unsupported-instruction.s
parse-failure | any  | Continues, prints errors, tested in llvm/test/tools/llvm-mca/bad-input.s
lack-sched    | parse-failure | Advises user to use -skip-unsupported-instructions=lack-sched, tested in llvm/test/tools/llvm-mca/X86/BtVer2/unsupported-instruction.s
parse-failure | lack-sched    | Advises user to use -skip-unsupported-instructions=parse-failure, tested in llvm/test/tools/llvm-mca/bad-input.s
none          | * | This would be any test case with skip-unsupported-instructions, coverage added in llvm/test/tools/llvm-mca/X86/BtVer2/simple-test.s
any           | * | (Logically covered by the other cases)
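
The table above amounts to a simple rule: a problem is skipped only when the flag value is `any` or names that exact reason. The sketch below restates that rule and is not the llvm-mca implementation; `should_skip` is a hypothetical helper, while the string values are the ones listed in the --help output.

```python
# Toy restatement of the table; should_skip is a hypothetical helper.
FLAG_VALUES = {"none", "lack-sched", "parse-failure", "any"}
REASONS = {"lack-sched", "parse-failure"}


def should_skip(reason, flag="none"):
    """Return True if analysis continues past a problem of the given reason.

    reason: why the instruction or line is unsupported.
    flag: value passed to -skip-unsupported-instructions (default "none").
    """
    if reason not in REASONS:
        raise ValueError(f"unknown reason: {reason}")
    if flag not in FLAG_VALUES:
        raise ValueError(f"unknown flag value: {flag}")
    return flag == "any" or flag == reason


for reason in sorted(REASONS):
    for flag in ("none", "lack-sched", "parse-failure", "any"):
        action = "continue" if should_skip(reason, flag) else "abort"
        print(f"{reason:13} | {flag:13} | {action}")
```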
2024-05-07 09:13:44 +01:00
Michael Maitland
56b8bd7744
[RISCV] Add Sched classes for vector crypto instructions (#90068)
The vector crypto instructions may have different scheduling behavior
compared to VALU operations. Instead of using scheduling resources that
describe VALU operations, we give these instructions their own
scheduling resources. This is similar to what we did for Zb* instructions.

The sifive-p670 has vector crypto, so we model behavior for these instructions
in the P600SchedModel. The numbers are based off of measurements collected
internally. These numbers are a bit old and new measurements show that they may
not be fully accurate. It is likely that we will refine these numbers in
follow-up patches based on new measurements.

This PR is stacked on #89256.
2024-05-03 11:11:29 -04:00
Michael Maitland
4821882cdf [RISCV][llvm-mca] Add vector crypto llvm-mca tests for P600 2024-05-03 08:03:39 -07:00
Rin Dobrescu
385f59f9f5
[llvm-mca] Teach MCA constant registers do not create dependencies (#89387)
Constant registers like the zero registers XZR and WZR are treated like
any other register by LLVM-MCA. This can create nonexistent dependency
chains.
Currently there is no method in MCA to query if a register is constant.
This patch fixes the issue by adding a bool Constant
variable to MCRegisterDesc that is true for constant registers. Since
constant registers do not create dependencies, it makes sense to add
this check to MCA.
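
A toy dependency tracker shows the effect. This is only an illustration, not MCA's register file: `find_dependencies`, the tuple layout and the register spelling are assumptions made for the example.

```python
# Toy model only: hypothetical helper, not MCA's register file logic.
def find_dependencies(instructions, constant_regs=frozenset()):
    """Return read-after-write dependencies as (producer, consumer, reg).

    instructions: list of (text, reads, writes) tuples in program order.
    constant_regs: registers whose reads and writes never create dependencies.
    """
    last_writer = {}
    deps = []
    for index, (_text, reads, writes) in enumerate(instructions):
        for reg in reads:
            if reg not in constant_regs and reg in last_writer:
                deps.append((last_writer[reg], index, reg))
        for reg in writes:
            if reg not in constant_regs:
                last_writer[reg] = index
    return deps


program = [
    ("subs xzr, x0, x1", ["X0", "X1"], ["XZR"]),  # result is discarded
    ("orr x2, xzr, x3", ["XZR", "X3"], ["X2"]),   # reads the zero register
]
print(find_dependencies(program))                  # false dependency on XZR
print(find_dependencies(program, {"XZR", "WZR"}))  # no dependency
```

Without the constant-register set, the second instruction appears to wait on the first through XZR. With it, the two are independent.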
2024-05-03 10:30:22 +01:00
Peter Waller
a19a4113df [llvm-mca] Fix -skip-unsupported-instruction tests on Windows
Builder alerted me to the failing test, attempt #1 in the blind.
2024-04-29 09:04:41 +01:00
Peter Waller
5f79f7506a
[llvm-mca] Add -skip-unsupported-instructions option (#89733)
Prior to this patch, if llvm-mca encountered an instruction which parses
but has no scheduler info, the instruction is always reported as
unsupported, and llvm-mca halts with an error.

However, it would still be useful to allow MCA to continue even in the
case of instructions lacking scheduling information. Obviously if
scheduling information is lacking, it's not possible to give an accurate
analysis for those instructions, and therefore a warning is emitted.

A user could previously have worked around such unsupported instructions
manually by deleting such instructions from the input, but this provides
them a way of doing this for bulk inputs where they may not have a list
of such unsupported instructions to drop up front.

Note that this behaviour of instructions with no scheduling information
under -skip-unsupported-instructions is analogous to current
instructions which fail to parse: those are currently dropped from the
input with a message printed, after which the analysis continues.

~Testing the feature is a little awkward currently, it relies on an
instruction which is currently marked as unsupported, which may not remain
so; should the situation change it would be necessary to find an
alternative unsupported instruction or drop the test.~

A test is added to check that analysis still reports an error if all
instructions are removed from the input, to mirror the current behaviour
of giving an error if no instructions are supplied.
2024-04-29 08:39:15 +01:00
Usman Nadeem
cc82f1290a
[AArch64] Update latencies for Cortex-A510 scheduling model (#87293)
Updated according to the Software Optimization Guide for Arm®
Cortex®‑A510 Core Revision: r1p3 Issue 6.0.
2024-04-17 11:42:52 -07:00
Simon Pilgrim
5fd9babbfc [X86] Rename Zn3FPP# ports -> Zn3FP#. NFC
Matches Zn4FP# (which is mostly a copy) and avoids an issue in llvm-exegesis which is terrible at choosing the right portname when they have aliases.
2024-04-04 16:54:33 +01:00
Simon Pilgrim
a69673615b [X86] Haswell/Broadwell - fix (V)ROUND*ri sched behaviours to use 2*Port1
We were only using the Port23 memory ports and were missing the 2*Port1 uops entirely.

Confirmed by Agner + uops.info/uica
2024-04-04 15:19:07 +01:00
Simon Pilgrim
51107be7dd [X86] Haswell/Broadwell/Skylake DPPS folded instructions use an extra port06 resource
This is an extension to 07151f0241d3f893cb36eb2dbc395d4098f74a87 which handled SandyBridge so we at least model the regression identified in #14640

Confirmed by Agner + uops.info/uica (SkylakeServer also had an incorrect use of Port015 instead of just Port01)

I raised #86669 as a proposal for a 'x86 unfold' pass that can unfold these (if we have the free registers) driven by the scheduler model.
2024-04-03 12:28:46 +01:00
Rin Dobrescu
46246683a6
[AArch64] Update Neoverse V2 FSQRT execution units in schedule model. (#86803)
This patch updates the SVE FSQRT instruction execution units to be able to run on VX0 and VX2.
2024-04-02 10:47:51 +01:00
Simon Pilgrim
5d7e7abc82 [X86] ICX - vector XMM splat use Port 1 or 5 when broadcasting the shift amount
Noticed while trying to compare splat vs per-element shift perf stats for #39424

Confirmed with uops.info
2024-03-26 10:07:07 +00:00
Simon Pilgrim
3dcf62b5ee [X86] HSW/BDW - vector splat shifts don't use Port5 when loading the shift amount
Noticed while trying to compare splat vs per-element shift perf stats for #39424

Confirmed with uops.info
2024-03-25 18:22:29 +00:00
David Green
4e29c6acd3
[AArch64] Correct Neoverse V1 SVE 16-bit sdot/udot schedule pipelines. (#86142)
Fixes #86102
2024-03-25 11:20:56 +00:00
Alfie Richards
295cdd5c3d
[ARM][TableGen][MC] Change the ARM mnemonic operands to be optional for ASM parsing (#83436)
This changes the way the assembly matcher works for Aarch32 parsing.
Previously there was a pile of hacks which dictated whether the CC,
CCOut, and VCC operands should be present, which de facto determined whether
the wide/narrow (or thumb1/thumb2/arm) instruction version was chosen.

This meant much of the TableGen machinery present for the assembly
matching was effectively being bypassed and worked around.

This patch makes the CC and CCOut operands optional, which allows the ASM
matcher to operate as it was designed and means we can avoid some of
the hacks done previously. This also adds the option for the target to
prioritize the smaller instruction encodings, as is required
for Aarch32.
2024-03-18 11:25:13 +00:00
Michael Maitland
818e0272f5 [RISCV] Model integer min max instructions from Zbb execute in late-B ALU
We don't model the early vs late ALU so we just need to remove usage of
SiFivePipeA for these instructions.
2024-03-14 06:02:53 -07:00