llvm-project

Author	SHA1	Message	Date
Franklin	e45f9aa7fa	[AArch64] Initial sched model for Neoverse N3 (#106371) References: * Arm Neoverse N3 Software Optimization Guide * Arm A64 Instruction Set for A-profile architecture	2024-09-19 19:22:24 +01:00
Franklin	ef34cba1c3	[AArch64] Fix sched model of Neoverse N2 (#106376) * fix write order of "Load vector reg, immed post-index" * fix a typo	2024-09-18 09:33:57 +01:00
Aiden Grossman	6b78ea8b75	[X86] Complete AMD znver4 AVX512 zeroing idioms (#108740) This patch completes scheduling information for the AVX512 zeroing idioms according to the znver4 software optimization guide.	2024-09-17 11:07:24 -07:00
Anton Sidorenko	09fc178180	[RISCV] Add scheduling model for Syntacore SCR7 (#108814) Syntacore SCR7 is rv64imafdcv_zba_zbb_zbc_zbs_zkn. Scheduling model for RVV will be added later. Overview: https://syntacore.com/products/scr7 --------- Co-authored-by: Dmitrii Petrov <dmitrii.petrov@syntacore.com> Co-authored-by: Anton Afanasyev <anton.afanasyev@syntacore.com> Co-authored-by: Elena Lepilkina <elena.lepilkina@syntacore.com>	2024-09-17 18:52:55 +03:00
Andrea Di Biagio	6784202b6b	[MCA][ResourceManager] Fix a bug in the instruction issue logic. (#108386) Before this patch, the pipeline selection logic in ResourceManager::issueInstruction() didn't know how to correctly handle instructions which consume multiple partially overlapping resource groups. In some cases (like the test case from #108157), the inability to correctly allocate resources on instruction issue was leading to crashes. The presence of multiple partially overlapping groups complicates the selection process by introducing extra constraints. For those cases, the issue logic now prioritizes groups which are more constrained than others. Fixes #108157	2024-09-16 09:48:42 +01:00
Aiden Grossman	ee40ffd1ee	[X86] Recognize VPXORDZrr as a zero-idiom on Znver4 (#108314) This patch adds information about VPXORDZrr to the znver4 scheduling model, particularly that it is a zero-idiom. This fixes a proximal cause of #108157.	2024-09-12 09:27:33 -07:00
Simon Pilgrim	aa95b5c121	[X86] Fix Skylake/Icelake port usage for MMX PACK instructions Matches uops.info + Agner	2024-08-27 16:32:29 +01:00
Simon Pilgrim	83de8c2369	[X86] Fix SkylakeClient ports for int-to-double conversions These are performed on SKLPort01 (+ SKLPort5/SKLPort23 for rr/rm shuffles/loads) Also, cleanup some MMX CVT overrides that match the SSE equivalents. Matches uops.info + Agner	2024-08-27 16:32:29 +01:00
Simon Pilgrim	12e0e312c6	[X86] Fix Skylake/Icelake uops for masked stores Matches uops.info + Agner	2024-08-27 16:32:29 +01:00
Simon Pilgrim	cf6cd1fd67	[MCA][X86] Add missing 512-bit vpscatterqd/vscatterqps schedule data (REAPPLIED) This doesn't match uops.info yet - but it matches the existing vpscatterdq/vscatterqpd entries like uops.info says it should Reapplied with codegen fix for scatter-schedule.ll Fixes #105675	2024-08-23 10:32:19 +01:00
Chris Apple	e738c816f2	Revert "[MCA][X86] Add missing 512-bit vpscatterqd/vscatterqps schedule data" (#105716) This reverts commit 2c1f0642a2647883f35463aebf4f90a6b1f158c1. Many build failures in: CodeGen/X86/scatter-schedule.ll Example of a build failure: https://lab.llvm.org/buildbot/#/builders/155/builds/1675	2024-08-22 11:55:31 -07:00
Simon Pilgrim	2c1f0642a2	[MCA][X86] Add missing 512-bit vpscatterqd/vscatterqps schedule data This doesn't match uops.info yet - but it matches the existing vpscatterdq/vscatterqpd entries like uops.info says it should Fixes #105675	2024-08-22 18:29:43 +01:00
Simon Pilgrim	7faf2c95a4	[MCA][X86] Add scatter instruction test coverage for #105675 Missed IceLakeServer when I updated the other CPUs in 6ec4c9c3eb4a556f848dac37a2d6f0d46ecc6f02	2024-08-22 18:29:43 +01:00
Simon Pilgrim	6ec4c9c3eb	[MCA][X86] Add scatter instruction test coverage for #105675	2024-08-22 17:22:49 +01:00
Phil Camp	42386dc46d	[llvm-mca] Add bottle-neck analysis to JSON output. (#90056) This patch implements the bottle-neck analysis data in the JSON dump mode.	2024-08-19 17:16:19 +01:00
Michael Maitland	7efa068f7a	[RISCV] Add vector and vector crypto to SiFiveP400 scheduler model (#102155) The SiFiveP400 scheduler model did not support vector or vector crypto. With the addition of the sifive-p470 processor, this model needs to support these extensions. The processors that use this model but do not have vector or vector crypto will never produce these instructions, so there is no impact to these processors. Co-authored-by: Min Hsu <min.hsu@sifive.com>	2024-08-19 09:41:42 -04:00
Simon Pilgrim	9e3e8b5715	[X86] VPERM2*128 instructions aren't microcoded on znver1 AMD refer to them as microcoded, but not in the same way as LLVM - the uop count and pipe usage is high but predictable Confirmed with Agner + uops.info.	2024-08-19 10:05:15 +01:00
Simon Pilgrim	5ab65a6c1c	[X86] VPERM2*128 instructions aren't microcoded on znver2 This appears to be a copy+paste error from znver1 (which isn't really microcoded either - but it is rather complex!). Confirmed with Agner + uops.info.	2024-08-19 10:05:15 +01:00
Anton Sidorenko	5ab99bf1a7	[RISCV] Add scheduling model for Syntacore SCR4 and SCR5 (#102909) Syntacore SCR4 is a microcontroller-class processor core that has much in common with SCR3, but also supports F and D extensions. Overview: https://syntacore.com/products/scr4 Syntacore SCR5 is an entry-level Linux-capable 32/64-bit RISC-V processor core whose scheduling model almost matches that of SCR4. Overview: https://syntacore.com/products/scr5 Co-authored-by: Dmitrii Petrov <dmitrii.petrov@syntacore.com> Co-authored-by: Anton Afanasyev <anton.afanasyev@syntacore.com>	2024-08-14 11:42:31 +03:00
Simon Pilgrim	888ef0f0fc	[X86] Fix pipe resources for INT (V)PEXTR* instructions IceLakeServer can use ICXPort15 for (V)PEXTR* (but only ICXPort5 for (V)EXTRACTPS) Confirmed with uops.info + Agner	2024-08-05 10:36:28 +01:00
Simon Pilgrim	3276ee3022	[llvm-mca][x86] Add test coverage for evex variant of vextractps	2024-08-05 10:32:17 +01:00
Simon Pilgrim	a48222451d	[X86] Add missing PSUBQ handling to SandyBridge model Matches PADDQ modelling, confirmed with numbers from uops.info and Agner	2024-07-26 15:54:59 +01:00
Simon Pilgrim	ed8c561170	[X86] haswell/broadwell only uses port5 for mmx pack reg-reg instructions Matches numbers from uops.info, Agner and instlatx64.	2024-07-26 15:54:58 +01:00
Simon Pilgrim	94e966255f	[X86] skylake only uses Port0 for (v)phminposuw instructions Now matches skylake-server - and matches reports from uops.info, Agner and instlatx64.	2024-07-26 11:21:59 +01:00
Rin Dobrescu	7c18195563	[AArch64] Add flag setting instructions to scheduling model. (#96880) Some flag setting instructions (such as ANDS, ADDS, CCMN) were missing from the V2 scheduling model. This patch adds them in.	2024-06-28 10:11:49 +01:00
Anton Sidorenko	2d84e0ffef	[RISCV] Add scheduling model for Syntacore SCR3 (#95427) Syntacore SCR3 is a microcontroller-class processor core. Overview: https://syntacore.com/products/scr3 Co-authored-by: Dmitrii Petrov <dmitrii.petrov@syntacore.com>	2024-06-25 11:34:59 +03:00
Simon Pilgrim	630a6dd687	[X86] Fix throughput of AVX2/AVX512VL vector extension/truncations These should only consume 1cy on either of the 2 pipes (only zmm ops should double pump) - matches AMD SoG + uops.info Noticed while updating costs for #90748	2024-06-16 10:03:21 +01:00
Simon Pilgrim	62e2eb2154	[X86] Fix pipe resources for FP HADD/SUB instructions IceLakeServer/SkylakeServer can only use Port01 for the FADD/FSUB stage Confirmed with uops.info + Agner	2024-06-06 14:16:57 +01:00
Simon Pilgrim	24a39f364d	[X86] Fix pipe resources for HADD/SUB instructions IceLakeServer was copying these from SkylakeServer, but integer HADD/SUB can now run on an extra port	2024-06-06 14:11:00 +01:00
Min-Yih Hsu	6147a7b5f9	[RISCV] Adjust FP load latencies from 6 to 5 in SiFiveP400/P600 scheduling models (#93735) According to our performance measurements, FLH/W/D have load latencies closer to 5 rather than 6 in these two models.	2024-05-30 08:47:27 -07:00
Chinmay Deshpande	848bef5d85	[llvm-mca] Add command line option -call-latency (#92958) Currently we assume a constant latency of 100 cycles for call instructions. This commit allows the user to specify a custom value for the same as a command line argument. Default latency is set to 100.	2024-05-22 13:51:55 -07:00
Rin Dobrescu	267de8543c	[llvm-mca][AArch64] Add AArch64 version of clearsSuperRegisters. (#92548) This patch overrides the clearsSuperRegisters method defined in MCInstrAnalysis to identify register writes that clear the upper portion of all super-registers on AArch64 architecture. On AArch64, a write to a general-purpose register of 32-bit data size is defined to use the lower 32-bits of the register and zero extend the upper 32-bits. Similarly, SIMD and FP instructions operating on scalar data only access the lower bits of the SIMD&FP register. The unused upper bits are cleared to zero on a write. This also applies to SIMD vector registers when the element size in bits multiplied by the number of lanes is lower than 128. The upper 64 bits of the vector register are cleared to zero on a write.	2024-05-22 15:31:35 +01:00
Peter Waller	458d706741	[llvm-mca] Make bad-input.s even more CPU specific Note: This patch is distinct from the previous one titled "[llvm-mca] Move bad-input.s test to be target specific" This is a followup to #90474 and commit afc10fc9b7ce3d23d9012f5a1496e849fe873ba2 Context: Builders failing because they're unable to run the failure test. This still doesn't work in various circumstances, it seems MCA doesn't want to run on a wide variety of hosts in various configurations, so stick to the tried and tested method and pass -mtriple and -mcpu.	2024-05-07 13:10:40 +01:00
Peter Waller	afc10fc9b7	[llvm-mca] Move bad-input.s test to be target specific ... for now. This is a follow up to #90474 in response to build bot failures. This test is intended to check a case where invalid assembly is passed to llvm-mca. Unfortunately it appears that a cross-toolchain built with -DTOOLCHAIN_TARGET_TRIPLE does not have an llvm-mca which works out of the box if the host target is not enabled. As a quick fix to make the build bots green, move the test into AArch64 and X86 so that there is reasonable coverage for this test; later I hope mca can be fixed to work out of the box in this configuration.	2024-05-07 12:50:20 +01:00
Peter Waller	1de0535e84	[llvm-mca] Abort on parse error without -skip-unsupported-instructions (#90474) Prior to this patch, llvm-mca would continue executing after parse errors. These errors can lead to some confusion since some analysis results are printed on the standard output, and they're printed after the errors, which could otherwise be easy to miss. However it is still useful to be able to continue analysis after errors; so extend the recently added -skip-unsupported-instructions to support this. Two tests which have parse errors for some of the 'RUN' branches are updated to use -skip-unsupported-instructions so they can remain as-is. Add a description of -skip-unsupported-instructions to the llvm-mca command guide, and add it to the llvm-mca --help output:
```
  --skip-unsupported-instructions=<value> - Force analysis to continue in the presence of unsupported instructions
    =none                                 - Exit with an error when an instruction is unsupported for any reason (default)
    =lack-sched                           - Skip instructions on input which lack scheduling information
    =parse-failure                        - Skip lines on the input which fail to parse for any reason
    =any                                  - Skip instructions or lines on input which are unsupported for any reason
```
Tests within this patch are intended to cover each of the cases.

Reason | Flag | Comment
--------------|------|-------
none | none | Usual case, existing test suite
lack-sched | none | Advises user to use -skip-unsupported-instructions=lack-sched, tested in llvm/test/tools/llvm-mca/X86/BtVer2/unsupported-instruction.s
parse-failure | none | Advises user to use -skip-unsupported-instructions=parse-failure, tested in llvm/test/tools/llvm-mca/bad-input.s
any | none | (N/A, covered above)
lack-sched | any | Continues, prints warnings, tested in llvm/test/tools/llvm-mca/X86/BtVer2/unsupported-instruction.s
parse-failure | any | Continues, prints errors, tested in llvm/test/tools/llvm-mca/bad-input.s
lack-sched | parse-failure | Advises user to use -skip-unsupported-instructions=lack-sched, tested in llvm/test/tools/llvm-mca/X86/BtVer2/unsupported-instruction.s
parse-failure | lack-sched | Advises user to use -skip-unsupported-instructions=parse-failure, tested in llvm/test/tools/llvm-mca/bad-input.s
none | * | This would be any test case with skip-unsupported-instructions, coverage added in llvm/test/tools/llvm-mca/X86/BtVer2/simple-test.s
any | * | (Logically covered by the other cases)	2024-05-07 09:13:44 +01:00
Michael Maitland	56b8bd7744	[RISCV] Add Sched classes for vector crypto instructions (#90068) The vector crypto instructions may have different scheduling behavior compared to VALU operations. Instead of using scheduling resources that describe VALU operations, we give these instructions their own scheduling resources. This is similar to what we did for Zb* instructions. The sifive-p670 has vector crypto, so we model behavior for these instructions in the P600SchedModel. The numbers are based off of measurements collected internally. These numbers are a bit old and new measurements show that they may not be fully accurate. It is likely that we will refine these numbers in a follow up patch(s) based on new measurements. This PR is stacked on #89256.	2024-05-03 11:11:29 -04:00
Michael Maitland	4821882cdf	[RISCV][llvm-mca] Add vector crypto llvm-mca tests for P600	2024-05-03 08:03:39 -07:00
Rin Dobrescu	385f59f9f5	[llvm-mca] Teach MCA constant registers do not create dependencies (#89387) Constant registers like the zero registers XZR and WZR are treated as any other register by LLVM-MCA. This can create nonexistent dependency chains. Currently there is no method in MCA to query if a register is constant. This patch fixes the issue by adding a bool Constant variable to MCRegisterDesc that is true for constant registers. Since constant registers do not create dependencies, it makes sense to add this check to MCA.	2024-05-03 10:30:22 +01:00
Peter Waller	a19a4113df	[llvm-mca] Fix -skip-unsupported-instruction tests on Windows Builder alerted me to the failing test, attempt #1 in the blind.	2024-04-29 09:04:41 +01:00
Peter Waller	5f79f7506a	[llvm-mca] Add -skip-unsupported-instructions option (#89733) Prior to this patch, if llvm-mca encountered an instruction which parses but has no scheduler info, the instruction is always reported as unsupported, and llvm-mca halts with an error. However, it would still be useful to allow MCA to continue even in the case of instructions lacking scheduling information. Obviously if scheduling information is lacking, it's not possible to give an accurate analysis for those instructions, and therefore a warning is emitted. A user could previously have worked around such unsupported instructions manually by deleting such instructions from the input, but this provides them a way of doing this for bulk inputs where they may not have a list of such unsupported instructions to drop up front. Note that this behaviour of instructions with no scheduling information under -skip-unsupported-instructions is analogous to current instructions which fail to parse: those are currently dropped from the input with a message printed, after which the analysis continues. ~Testing the feature is a little awkward currently, it relies on an instruction which is currently marked as unsupported, which may not remain so; should the situation change it would be necessary to find an alternative unsupported instruction or drop the test.~ A test is added to check that analysis still reports an error if all instructions are removed from the input, to mirror the current behaviour of giving an error if no instructions are supplied.	2024-04-29 08:39:15 +01:00
Usman Nadeem	cc82f1290a	[AArch64] Update latencies for Cortex-A510 scheduling model (#87293) Updated according to the Software Optimization Guide for Arm® Cortex®‑A510 Core Revision: r1p3 Issue 6.0.	2024-04-17 11:42:52 -07:00
Simon Pilgrim	5fd9babbfc	[X86] Rename Zn3FPP# ports -> Zn3FP#. NFC Matches Zn4FP# (which is mostly a copy) and avoids an issue in llvm-exegesis which is terrible at choosing the right portname when they have aliases.	2024-04-04 16:54:33 +01:00
Simon Pilgrim	a69673615b	[X86] Haswell/Broadwell - fix (V)ROUNDri sched behaviours to use 2Port1 We were only using the Port23 memory ports and were missing the 2*Port1 uops entirely. Confirmed by Agner + uops.info/uica	2024-04-04 15:19:07 +01:00
Simon Pilgrim	51107be7dd	[X86] Haswell/Broadwell/Skylake DPPS folded instructions use an extra port06 resource This is an extension to 07151f0241d3f893cb36eb2dbc395d4098f74a87 which handled SandyBridge so we at least model the regression identified in #14640 Confirmed by Agner + uops.info/uica (SkylakeServer also had an incorrect use of Port015 instead of just Port01) I raised #86669 as a proposal for a 'x86 unfold' pass that can unfold these (if we have the free registers) driven by the scheduler model.	2024-04-03 12:28:46 +01:00
Rin Dobrescu	46246683a6	[AArch64] Update Neoverse V2 FSQRT execution units in schedule model. (#86803) This patch updates the SVE FSQRT instruction execution units to be able to run on VX0 and VX2.	2024-04-02 10:47:51 +01:00
Simon Pilgrim	5d7e7abc82	[X86] ICX - vector XMM splat use Port 1 or 5 when broadcasting the shift amount Noticed while trying to compare splat vs per-element shift perf stats for #39424 Confirmed with uops.info	2024-03-26 10:07:07 +00:00
Simon Pilgrim	3dcf62b5ee	[X86] HSW/BDW - vector splat shifts don't use Port5 when loading the shift amount Noticed while trying to compare splat vs per-element shift perf stats for #39424 Confirmed with uops.info	2024-03-25 18:22:29 +00:00
David Green	4e29c6acd3	[AArch64] Correct Neoverse V1 SVE 16-bit sdot/udot schedule pipelines. (#86142) Fixes #86102	2024-03-25 11:20:56 +00:00
Alfie Richards	295cdd5c3d	[ARM][TableGen][MC] Change the ARM mnemonic operands to be optional for ASM parsing (#83436) This changes the way the assembly matcher works for AArch32 parsing. Previously there was a pile of hacks which dictated whether the CC, CCOut, and VCC operands should be present, which de facto determined whether the wide/narrow (or thumb1/thumb2/arm) instruction version was chosen. This meant much of the TableGen machinery present for the assembly matching was effectively being bypassed and worked around. This patch makes the CC and CCOut operands optional, which allows the ASM matcher to operate as it was designed and means we can avoid doing some of the hacks done previously. This also adds the option for the target to allow prioritizing the smaller instruction encodings, as is required for AArch32.	2024-03-18 11:25:13 +00:00
Michael Maitland	818e0272f5	[RISCV] Model integer min max instructions from Zbb execute in late-B ALU We don't model the early vs late ALU so we just need to remove usage of SiFivePipeA for these instructions.	2024-03-14 06:02:53 -07:00
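
The Reason/Flag table in commit 1de0535e84 above can be read as a small decision rule. The sketch below models that table in Python for illustration only; it is not llvm-mca's implementation, and the function names `analysis_continues` and `suggested_flag` are hypothetical. The case where the flag equals the reason (for example, `lack-sched` input with `=lack-sched`) is not a row in the table and is assumed here from the `--help` text.

```python
# Hypothetical model of the -skip-unsupported-instructions test matrix from
# commit 1de0535e84. Names and structure are illustrative assumptions, not
# llvm-mca source code.

REASONS = {"none", "lack-sched", "parse-failure"}
FLAGS = {"none", "lack-sched", "parse-failure", "any"}


def analysis_continues(reason: str, flag: str) -> bool:
    """Return True if analysis proceeds, False if it stops with an error."""
    if reason not in REASONS:
        raise ValueError(f"unknown reason: {reason}")
    if flag not in FLAGS:
        raise ValueError(f"unknown flag value: {flag}")
    if reason == "none":
        # Nothing is unsupported, so the flag value is irrelevant.
        return True
    # Assumed from the --help text: a flag skips only its own kind of
    # problem, while "any" skips both kinds.
    return flag == "any" or flag == reason


def suggested_flag(reason: str, flag: str):
    """Return the option the user is advised to pass when analysis stops."""
    if analysis_continues(reason, flag):
        return None
    return f"-skip-unsupported-instructions={reason}"


if __name__ == "__main__":
    for reason in sorted(REASONS):
        for flag in sorted(FLAGS):
            outcome = "continues" if analysis_continues(reason, flag) else "stops"
            print(f"{reason:14} {flag:14} {outcome}")
```

Running the script prints one line per reason/flag pair, which can be compared row by row against the table in the commit message.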


900 Commits