According to the N2 Software Optimization Guide, arithmetic ops with LSL
≤ 4, non-flag-setting logical ops, and flag-setting logical ops with LSL = 0 have a
latency of 1 and use pipeline group I. However, most of these ops were
being modelled as having a latency of 2 and using pipeline M. The
affected instructions include the "unshifted" versions of ADD/SUB, among
others.
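As a sketch of the intended mapping (the resource, write and instruction
names below are placeholders rather than the actual defs touched by the
patch):

  // Placeholder write resources: 1 cycle on the I pipeline group versus
  // 2 cycles on the M pipeline group.
  def N2Write_1cyc_1I : SchedWriteRes<[N2UnitI]> { let Latency = 1; }
  def N2Write_2cyc_1M : SchedWriteRes<[N2UnitM]> { let Latency = 2; }

  // The affected ADD/SUB forms should map to the 1-cycle I write rather
  // than the 2-cycle M write (regex illustrative only).
  def : InstRW<[N2Write_1cyc_1I], (instregex "^(ADD|SUB)[WX]r[rs]$")>;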
Differential Revision: https://reviews.llvm.org/D145370
The previous Alderlake P-Core model preferred data from uops.info over the Intel doc.
Some latencies measured by uops.info are larger than the real latency, e.g. addpd
latency is 3 in uops.info but 2 in the Intel doc. This patch adjusts the priority
of those two data sources so that the Intel doc is preferred.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D144388
The instruction regex "^INSv" for the general-register-to-element insert was also
matching the element-to-element instruction, which has a latency of 2 rather than
5, so we were getting that wrong.
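Roughly, the fix is to split the regex so the two forms get separate write
resources - a sketch (the write names are placeholders, though AArch64 does
distinguish the two INS forms along the lines of INSvi32gpr vs INSvi32lane):

  // gpr-to-element insert: latency 5.
  def : InstRW<[Write_5cyc], (instregex "^INSvi(8|16|32|64)gpr$")>;
  // element-to-element insert: latency 2.
  def : InstRW<[Write_2cyc], (instregex "^INSvi(8|16|32|64)lane$")>;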
Differential Revision: https://reviews.llvm.org/D144508
The instruction property hasSideEffects relies on the presence of
tablegen isel patterns when constructing its value, unless
specifically overridden. Since adding SVE scheduling information
we've noticed this property flip-flop as isel patterns have been
updated. To make things consistent (and correct) this patch
explicitly sets the property for all SVE instructions.
This has resulted in the following notable changes:
* Normal load and store instructions no longer report having side
effects.
* All prefetch instructions correctly report having side effects.
* FFR related instructions continue to report having side effects.
This is likely overkill but I've chosen to remain cautious here.
* Almost all integer instructions no longer report having side effects.
* Almost all floating point instructions no longer report having side
effects, but do now report their potential for raising FP
exceptions. I do not know how to test the latter, so I've again
taken the cautious route of tagging all floating point instructions
except for DUPs.
* The conflict detection intrinsics now report they don't touch
memory.
NOTE: SVE isel makes significant use of pseudo instructions but
this patch makes no effort to update them.
NOTE: We'll need a similar patch for SME but without a scheduling
model it'll be harder to verify the results.
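In TableGen terms the change amounts to wrapping the defs in explicit
property blocks instead of relying on inference from isel patterns - a
sketch only; the grouping below is illustrative rather than the literal
structure of the patch:

  // Integer, load and store instructions: no unmodeled side effects.
  let hasSideEffects = 0 in {
    // ... integer / load / store defs ...
  }

  // Floating point instructions: no side effects, but they may raise FP
  // exceptions (DUPs excepted).
  let hasSideEffects = 0, mayRaiseFPException = 1 in {
    // ... floating point defs ...
  }

  // Prefetches and FFR related instructions keep reporting side effects.
  let hasSideEffects = 1 in {
    // ... PRF* / FFR defs ...
  }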
Differential Revision: https://reviews.llvm.org/D142122
The RMW instructions still need addressing, probably with a new 'WriteXCHGRMW' scheduler class.
Based off llvm-exegesis captures, confirmed with Agner + uops.info
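If/when that class gets added, the shape would presumably be the usual
one - declare the SchedWrite once and let each model map it onto its own
ports. A sketch with placeholder ports, latency and uop counts (only the
WriteXCHGRMW name comes from the note above):

  // X86Schedule.td: declare the class.
  def WriteXCHGRMW : SchedWrite;

  // Per-model mapping, e.g. (resources/latency/uops are placeholders,
  // not measurements):
  defm : X86WriteRes<WriteXCHGRMW, [SBPort015, SBPort23, SBPort4], 5, [2, 1, 1], 4>;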
The movprfx is a vector copy, so it doesn't access memory. Set
hasSideEffects to 0 so that hasUnmodeledSideEffects() doesn't return true,
which would otherwise block the machine scheduler from moving load/store
instructions across it.
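The TableGen change is essentially just an explicit property on the
MOVPRFX defs - sketched below with the def bodies elided:

  // MOVPRFX is a plain register-to-register copy: no memory access and
  // no unmodeled side effects, so loads/stores can be scheduled across it.
  let hasSideEffects = 0 in {
    // ... MOVPRFX_ZZ / MOVPRFX_ZPmZ_* / MOVPRFX_ZPzZ_* defs ...
  }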
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D140680
LOCK + CMPXCHG8/CMPXCHG16 variants still need overriding as they are not completely correct - already much better though
Based off llvm-exegesis captures, confirmed with Agner + uops.info
The set/reset/complement RMW variants use +1 uop compared to the BT read-only instructions
Based off llvm-exegesis captures, confirmed with Agner + uops.info
Now that exegesis produces meaningful snippets to measure throughput
for instructions with tied operands:
2ffe225d11
the measurements clearly show these instructions to have
better (more optimistic) throughput than currently modelled.
There's still some noise in the reports, especially around instructions
with memory operands. I'm not sure if we measure those correctly.
Fixes https://github.com/llvm/llvm-project/issues/59325
The znver2 override already matched the WriteBlendY class exactly, and the znver1 override wasn't accounting for ymm double-pumping.
Found with the help of D138359
znver1/znver2 show barely any difference in behaviour between the AVX1/AVX2 variants of these instructions - it looks like it was a copy+paste mistake that the AVX2 integer-domain instructions were missed from the overrides.
Having said that, the override numbers don't appear to match the numbers in the AMD 17h SoGs very well - for instance vperm2f128/vperm2i128 might be microcoded in the AMD sense of >3 uops, but it doesn't have a 100cy latency. These will need to be further addressed.
This appears to be a slowdown vs Skylake (which the model was copied from) - confirmed with uops.info / instlatx64
Noticed because D138359 was reporting that many of the PACKS overrides were redundant, when they were in fact incorrect
Reported by D138359 - they were being overridden as WriteMicrocoded despite already being declared WriteMicrocoded
It also fixes a rather funny instregex mismatch that was matching the movsldup shuffle by mistake
D138359 was reporting that this override was superfluous, but it had never been set up - I took the numbers from uops.info (I couldn't find an estimate in the Intel docs).
These were missed for some reason - only noticed this while investigating a FIXME in the SandyBridge model
Also sync the znver2/znver3 tests which had been missed when LOCK test coverage was added
D138359 was reporting that the EXTRACTPSrr override was unnecessary; however, the AMD SoG and Agner both confirm that both the rr and rm versions take 2 uops (matching znver1)
NOTE: For IceLakeServer we actually test TigerLake as that's the only target that supports it (we do something similar for F16C on IvyBridge in the SandyBridge tests).
AMD 15h SoG + Agner both indicate there's no difference between MPSADBWrri + VMPSADBWrri - I can't find any data on the folded variant so I've kept the existing numbers
Removes the last X86 override for WriteMPSAD/WritePSADBW classes - removing a further 3 entries from every sched class table
The znver1/znver2 schedules for CVTPD2PS were incorrectly double-pumping the xmm-load variant instead of the ymm variants (znver1 only)
Also, the xmm-load variant was incorrectly using FP03 instead of just FP3
Confirmed by the AMD SoG 17h tables, Agner + uops.info
Another step towards removing a lot of unnecessary overrides from all the x86 scheduler models - these should hopefully be convertible into regular WriteCvtPD2I classes soon.