774 Commits

Author SHA1 Message Date
Ricardo Jesus
4d96a6b7be [AArch64] Fix N2 SchedModel for arithmetic and logic ops with cheap LSL
According to the N2 Software Optimization Guide, arithmetic ops with LSL
≤ 4, no flagset logical ops, and flagset logical ops with LSL = 0 have a
latency of 1 and use pipeline group I. However, most of these ops were
being modelled as having a latency of 2 and using pipeline M. The
affected instructions include the "unshifted" versions of ADD/SUB, among
others.

Differential Revision: https://reviews.llvm.org/D145370
2023-03-10 12:37:59 +00:00
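The rule described in 4d96a6b7be can be written as a small predicate. This is a minimal sketch of the commit message's wording only, not the TableGen change itself; the function name, the `kind` strings, and the parameters are hypothetical and exist only for illustration.

```python
def is_cheap_on_n2(kind, lsl_amount, sets_flags=False):
    """Return True if the op is in the latency-1, pipeline-group-I category.

    Encodes the three cases listed in the commit message:
      * arithmetic ops with LSL <= 4
      * logical ops that do not set flags
      * flag-setting logical ops with LSL = 0
    The names used here are illustrative assumptions, not LLVM identifiers.
    """
    if kind == "arith":
        return lsl_amount <= 4
    if kind == "logical":
        return (not sets_flags) or lsl_amount == 0
    raise ValueError(f"unknown op kind: {kind}")


# An "unshifted" ADD/SUB is simply the LSL = 0 case of an arithmetic op.
unshifted_add_is_cheap = is_cheap_on_n2("arith", 0)
```

The commit message does not say how the remaining cases should be modelled, so the sketch only answers whether an op belongs to the cheap category.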
Ganesh Gopalasubramanian
ffdd5a330c [X86] AMD Genoa (znver4) Scheduler model update 2023-03-09 01:03:23 +05:30
Haohai Wen
5c3c176ccb Revert "[X86] Revise Alderlake P-Core schedule model"
This reverts commit 3083b65c3494b912e622a006a1b563a7e9f1d508,
since the latencies in the Intel documentation do not reflect the worst case.
2023-03-07 11:01:41 +08:00
Haohai Wen
3083b65c34 [X86] Revise Alderlake P-Core schedule model
The previous Alderlake P-Core model preferred data from uops.info over the Intel
documentation. Some latencies measured by uops.info are larger than the real
latency, e.g. the addpd latency is 3 in uops.info but 2 in the Intel documentation.
This patch adjusts the priority of the two data sources so that the Intel
documentation is preferred.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D144388
2023-03-01 07:39:30 +08:00
Sjoerd Meijer
314e431406 [AArch64] Fix N2 SchedModel element-to-element INS latencies
The instruction regexp "^INSv" for the insert gen-reg-to-element was also
matching the element-to-element instruction, which only has a latency of 2 and
not 5, so we were getting that wrong.

Differential Revision: https://reviews.llvm.org/D144508
2023-02-22 10:55:55 +00:00
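The over-matching described in 314e431406 is easy to reproduce with any prefix-style regex match. The sketch below uses Python's `re.match`, which also matches from the start of the string. The instruction names are illustrative assumptions for the two INS forms, and the narrower pattern is only one possible fix, not necessarily the one the patch used.

```python
import re

# Hypothetical instruction names, chosen only to illustrate the two INS forms.
GPR_TO_ELEMENT = ["INSvi8gpr", "INSvi16gpr", "INSvi32gpr", "INSvi64gpr"]
ELEMENT_TO_ELEMENT = ["INSvi8lane", "INSvi16lane", "INSvi32lane", "INSvi64lane"]


def prefix_matches(pattern, names):
    """Return the names that the pattern matches from the start of the string."""
    regex = re.compile(pattern)
    return [name for name in names if regex.match(name)]


all_names = GPR_TO_ELEMENT + ELEMENT_TO_ELEMENT

# The broad pattern picks up both forms, so both would get the same latency.
broad = prefix_matches(r"^INSv", all_names)

# A narrower pattern (an assumption) selects only the gen-reg-to-element form.
narrow = prefix_matches(r"^INSvi(8|16|32|64)gpr$", all_names)
```

Here `broad` contains all eight names while `narrow` contains only the four gen-reg-to-element names, which is the distinction the commit needed in order to give the two forms different latencies.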
Paul Walker
9394088ca0 [SVE][InstrFormats] Explicitly set hasSideEffects for all SVE instructions.
The instruction property hasSideEffects relies on the presence of
tablegen isel patterns when constructing its value, unless
specifically overridden. Since adding SVE scheduling information
we've noticed this property flip-flop as isel patterns have been
updated. To make things consistent (and correct) this patch
explicitly sets the property for all SVE instructions.

This has resulted in the following notable changes:
* Normal load and store instructions no longer report having side
  effects.
* All prefetch instructions correctly report having side effects.
* FFR related instructions continue to report having side effects.
  This is likely overkill but I've chosen to remain cautious here.
* Almost all integer instructions no longer report having side effects.
* Almost all floating point instructions no longer report having side
  effects, but do now report their potential for raising FP
  exceptions. I do not know how to test the latter so I've again
  taken the cautious route of tagging all floating point instructions
  except for DUPs.
* The conflict detection intrinsics now report they don't touch
  memory.

NOTE: SVE isel makes significant use of pseudo instructions but
this patch makes no effort to update them.

NOTE: We'll need a similar patch for SME but without a scheduling
model it'll be harder to verify the results.

Differential Revision: https://reviews.llvm.org/D142122
2023-01-25 12:30:46 +00:00
David Green
d8ba9e505a [ARM] Cortex-M55 Scheduling Model
This adds an Arm Cortex-M55 scheduling model, using the information from
https://developer.arm.com/documentation/102692/latest/

Differential Revision: https://reviews.llvm.org/D141523
2023-01-21 18:03:24 +00:00
Simon Pilgrim
10cdad4065 [X86] Fix SLM uops/resources counts for XADD/XCHG reg-reg instructions
The RMW instructions still need addressing, probably with a new 'WriteXCHGRMW' scheduler class.

Based off llvm-exegesis captures, confirmed with Agner + uops.info
2023-01-14 18:34:18 +00:00
zhongyunde
c69d83908a [AArch64][MachineScheduler] Set no side effect for movprfx
The movprfx is a vector copy, so it doesn't access memory. Set
hasSideEffects to 0 so that hasUnmodeledSideEffects() no longer returns true,
which blocks the machine scheduler from reordering it with load/store instructions.

Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D140680
2022-12-28 01:18:14 +08:00
Simon Pilgrim
e16b4f5b16 [X86] Fix SLM uops/resources counts for CMPXCHG instructions
LOCK + CMPXCHG8/CMPXCHG16 variants still need overriding as they are not completely correct - already much better though

Based off llvm-exegesis captures, confirmed with Agner + uops.info
2022-12-20 13:07:03 +00:00
Simon Pilgrim
e5abaf8dec [X86] Fix SLM uops counts for WriteBitTestSetRegRMW instructions
The set/reset/complement RMW variants use +1uop compared to the BT read-only instructions

Based off llvm-exegesis captures, confirmed with Agner + uops.info
2022-12-19 18:21:31 +00:00
Simon Pilgrim
c39c2cc954 [X86] Fix SLM uops counts for AES instructions
Based off llvm-exegesis captures, confirmed with uops.info
2022-12-19 11:03:41 +00:00
Simon Pilgrim
bbf84fcf18 [X86] SandyBridge - fix ADC RMW uop count
These should consistently use the fused domain count, not the unfused domain

Confirmed with Agner + uops.info
2022-12-17 21:52:44 +00:00
Simon Pilgrim
ed37234f9b [X86] Fix BMI uop/throughputs on znver1/znver2
Most BMI ops are 2uop and 0.5 throughput - interestingly TZCNTrm doesn't take an extra uop but the other instructions do

Confirmed by AMD SoG + Agner
2022-12-17 20:38:40 +00:00
Simon Pilgrim
2bc2bcb246 [X86] All the WriteBLS instructions take 2uops, not 1uop
Confirmed by AMD SoG + Agner + uops.info
2022-12-17 15:40:41 +00:00
Peter Waller
15406d2cd6 [AArch64][SVE][ISel] Combine dup of load to replicating load
(dup (load) z_or_x_passthrough) => (replicating load)

Differential Revision: https://reviews.llvm.org/D139637
2022-12-14 10:34:26 +00:00
Roman Lebedev
680b33b66e [X86] AMD Zen 3 sched model: FMA ops have inverse throughput of 0.5
Now that exegesis produces meaningful snippets to measure throughput
for instructions with tied operands:
2ffe225d11
the measurements clearly show these instructions to have
more optimistic throughput.

There's still some noise in the reports, especially around instructions
with memory operands. I'm not sure if we measure those correctly.

Fixes https://github.com/llvm/llvm-project/issues/59325
2022-12-11 21:12:55 +03:00
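As a reading aid for 680b33b66e: an inverse throughput of 0.5 means one instruction completes every half cycle in steady state, i.e. two independent instructions per cycle. The arithmetic can be sketched as follows; the helper names are hypothetical, and the sketch ignores latency and dependency chains.

```python
from fractions import Fraction


def instructions_per_cycle(inverse_throughput):
    """Steady-state instructions issued per cycle for independent instructions."""
    return 1 / Fraction(inverse_throughput)


def steady_state_cycles(count, inverse_throughput):
    """Cycles needed to issue `count` independent instructions back to back."""
    return count * Fraction(inverse_throughput)


fma_inverse_throughput = Fraction(1, 2)  # the 0.5 value from the commit title

per_cycle = instructions_per_cycle(fma_inverse_throughput)
cycles_for_100 = steady_state_cycles(100, fma_inverse_throughput)
```

With these definitions `per_cycle` is 2 and `cycles_for_100` is 50. A dependent chain of FMAs would instead be bound by latency, which this sketch does not model.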
Simon Pilgrim
cff55e1980 [MCA][X86] Add test coverage for PFI instructions 2022-12-11 15:57:47 +00:00
Simon Pilgrim
e42abef9bf [MCA][X86] Add test coverage for ERI instructions 2022-12-11 15:52:38 +00:00
Simon Pilgrim
794649f317 [MCA][X86] Add missing knotw test 2022-12-11 15:36:06 +00:00
Simon Pilgrim
95880122c0 [MCA][X86] Add missing test coverage for DQ instructions 2022-12-11 15:36:06 +00:00
Phoebe Wang
b1221d10dd [X86][ConstraintFP] Model MXCSR when load/store it
This patch partially fixes #59305.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D139246
2022-12-08 09:48:30 +08:00
Simon Pilgrim
f51170bffd [X86] Fix SLM ldmxcsr/stmxcsr schedule classes
Fix a long-standing FIXME comment using a mixture of llvm-exegesis and Agner numbers
2022-11-28 17:43:17 +00:00
Simon Pilgrim
c65d5d4aec [X86] Remove unnecessary (V)?PBLENDW(Y)?rm overrides
The znver1/znver2 overrides shouldn't need 2uops for the xmm case (but znver1 should double-pump for the ymm case).

Found with the help of D138359
2022-11-28 16:32:55 +00:00
Simon Pilgrim
026df9514e [X86] Remove unnecessary VBLENDWYrr overrides
The znver2 override already matched the WriteBlendY class exactly, and the znver1 override wasn't accounting for ymm double-pumping.

Found with the help of D138359
2022-11-27 16:54:47 +00:00
Simon Pilgrim
2285ba9acc [X86] Fix uops counts for SLM extract/extract-store instructions
Matches Intel AoM + Agner
2022-11-27 16:16:36 +00:00
Simon Pilgrim
746cf4f13f [X86] Synchronise scheduler classes of VPERM2F128/VBROADCASTF128/VEXTRACTF128/VINSERTF128 with I128 equivalents
znver1/znver2 has barely any difference in behaviour between the AVX1/2 variants of these instructions - it looks like it was a copy+paste mistake to miss the AVX2 integer domain instructions in the overrides.

Having said that the override numbers don't appear to match the numbers in the AMD 17h SoGs very well - for instance vperm2f128/vperm2i128 might be microcoded from the AMD sense of >3 uops, but it doesn't have a 100cy latency..... These will need to be further addressed.
2022-11-21 17:15:47 +00:00
Simon Pilgrim
89365b159e [X86] IceLakeServer - PACKS instructions take latency 3cy
This appears to be a slowdown vs Skylake (which the model was copied from) - confirmed with uops.info / instlatx64

Noticed as D138359 was reporting that many of the PACKS overrides were redundant, but were in fact incorrect
2022-11-20 19:28:35 +00:00
Simon Pilgrim
7de156d1cc [MCA][X86] Add missing test coverage for BWI instructions 2022-11-20 17:19:58 +00:00
Simon Pilgrim
421bdc119a [MCA][X86] Add test coverage for IFMA instructions 2022-11-20 17:19:58 +00:00
Simon Pilgrim
6a8fabf5c3 [MCA][X86] Add test coverage for XSAVE instructions 2022-11-20 13:56:04 +00:00
Simon Pilgrim
9148aeac00 [X86] Remove unnecessary string instruction overrides from znver1/znver2 models
Reported by D138359 - they were being overridden as WriteMicrocoded despite already being declared WriteMicrocoded

It also fixes a rather funny instregex mismatch that was matching the movsldup shuffle by mistake
2022-11-20 12:57:44 +00:00
Simon Pilgrim
357f1c4ef1 [X86] Improve LOOP/LOOPE/LOOPNE schedule on SandyBridge model
D138359 was reporting that this override was superfluous, but it had never been set up - I took the numbers from uops.info (I couldn't find an estimate in Intel docs).
2022-11-20 12:13:02 +00:00
Simon Pilgrim
420d02bb55 [MCA][X86] Add test coverage for LOOP/LOOPE/LOOPNE instructions
These were missed for some reason - only noticed this while investigating a FIXME in the SandyBridge model

Also sync the znver2/znver3 tests which had been missed when LOCK test coverage was added
2022-11-20 11:35:21 +00:00
Simon Pilgrim
13fd7373b6 [X86] znver2 - (V)EXTRACTPSrr takes 2 uops
D138359 was reporting that the EXTRACTPSrr override was unnecessary, however the AMD SoG and Agner both confirm that both the rr and rm versions take 2uops (matching znver1)
2022-11-20 09:24:55 +00:00
Simon Pilgrim
474e41f1b9 [MCA][X86] Add test coverage for BF16 instructions 2022-11-19 21:46:23 +00:00
Simon Pilgrim
ba5714d773 [MCA][X86] Add test coverage for VP2INTERSECT instructions
NOTE: For IceLakeServer we actually test TigerLake as that's the only target that supports it (we do something similar for F16C on IvyBridge in the SandyBridge tests).
2022-11-19 21:46:23 +00:00
Simon Pilgrim
420d0d3aa6 [MCA][X86] Add test coverage for VAES instructions 2022-11-19 21:02:19 +00:00
Simon Pilgrim
aae08b1d37 [MCA][X86] Add test coverage for BITALG instructions 2022-11-19 12:04:45 +00:00
Simon Pilgrim
91deae999a [MCA][X86] Add test coverage for VPCLMULQDQ instructions 2022-11-18 21:22:10 +00:00
Simon Pilgrim
ffe05b8f57 [MCA][X86] Add missing IceLake test coverage for VPOPCNTDQ instructions 2022-11-18 20:58:29 +00:00
Simon Pilgrim
4c854120c2 [MCA][X86] Add test coverage for AVX512CD instructions 2022-11-18 20:58:29 +00:00
Simon Pilgrim
c6a838e9c8 [MCA][X86] Add test coverage for VBMI instructions 2022-11-16 16:58:26 +00:00
Simon Pilgrim
896271dbea [MCA][X86] Ensure the avx512 gfni tests use the upper xmm/ymm registers
Ensure we're testing the avx512vl gfni instructions and not the avx gfni instructions
2022-11-15 11:06:59 +00:00
Simon Pilgrim
7e78685752 [MCA][X86] Ensure the avx512 vnni tests use the upper xmm/ymm registers
Ensure we're testing the avx512vl vnni instructions and not the avx vnni instructions
2022-11-14 16:29:31 +00:00
Simon Pilgrim
d7208b0404 [MCA][X86] Add test coverage for VBMI2 instructions 2022-11-14 16:29:31 +00:00
Simon Pilgrim
e5120a43d5 [X86] Update WriteMPSAD class and remove VMPSADBWrri override
AMD 15h SoG + Agner both indicate there's no difference between MPSADBWrri + VMPSADBWrri - I can't find any data on the folded variant so I've kept the existing numbers

Removes the last X86 override for WriteMPSAD/WritePSADBW classes - removing a further 3 entries from every sched class table
2022-11-13 15:19:37 +00:00
Simon Pilgrim
6a99f23845 [MCA][X86] Add test coverage for VDBPSADBW instructions 2022-11-13 15:19:36 +00:00
Simon Pilgrim
313a4aef7f [X86] Fix scheduler tag for GFNI YMM instructions
These were hardcoded to XMM width
2022-11-13 14:10:09 +00:00
Simon Pilgrim
e19cb9c57f [X86] Cleanup CVTPD2PS schedule values
The znver1/znver2 schedules for CVTPD2PS were incorrectly double pumping the xmm-load variant instead of the ymm variants (znver1 only)

Also, the xmm-load variant was incorrectly using FP03 instead of just FP3

Confirmed by the AMD SoG 17h tables, Agner + uops.info

Another step towards removing a lot of unnecessary overrides from all the x86 scheduler models - these should hopefully be convertible into regular WriteCvtPD2I classes soon.
2022-11-13 11:13:30 +00:00