Syntacore SCR7 is rv64imafdcv_zba_zbb_zbc_zbs_zkn.
Scheduling model for RVV will be added later.
Overview: https://syntacore.com/products/scr7
---------
Co-authored-by: Dmitrii Petrov <dmitrii.petrov@syntacore.com>
Co-authored-by: Anton Afanasyev <anton.afanasyev@syntacore.com>
Co-authored-by: Elena Lepilkina <elena.lepilkina@syntacore.com>
Before this patch, the pipeline selection logic in
ResourceManager::issueInstruction() didn't know how to correctly handle
instructions which consume multiple partially overlapping resource
groups. In some cases (like the test case from #108157), the inability
to correctly allocate resources on instruction issue was leading to
crashes.
The presence of multiple partially overlapping groups complicates the
selection process by introducing extra constraints. For those cases, the
issue logic now prioritizes groups which are more constrained than
others.
Fixes#108157
This patch adds information about VPXORDZrr to the znver4 scheduling
model, particularly that it is a zero-idiom.
This fixes a proximal cause of #108157.
These are performed on SKLPort01 (+ SKLPort5/SKLPort23 for rr/rm shuffles/loads)
Also, cleanup some MMX CVT overrides that match the SSE equivalents.
Matches uops.info + Agner
This doesn't match uops.info yet - but it matches the existing vpscatterdq/vscatterqpd entries like uops.info says it should
Reapplied with codegen fix for scatter-schedule.ll
Fixes#105675
The SiFiveP400 scheduler model did not support vector or vector crypto.
With the addition of the sifive-p470 processor, this model needs to support
these extensions.
The processors who use this model but do not have vector or vector
crypto will never produce these instructions, so there is no impact to these
processors.
Co-authored-by: Min Hsu <min.hsu@sifive.com>
AMD refer to them as microcoded, but not in the same way as LLVM - the uop count and pipe usage is high but predictable
Confirmed with Agner + uops.info.
This appears to be a copy+paste error from znver1 (which isn't really microcoded either - but it is rather complex!).
Confirmed with Agner + uops.info.
Syntacore SCR4 is a microcontroller-class processor core that has much
in common with SCR3, but also supports F and D extensions.
Overview: https://syntacore.com/products/scr4
Syntacore SCR5 is an entry-level Linux-capable 32/64-bit RISC-V
processor core which scheduling model almost match SCR4.
Overview: https://syntacore.com/products/scr5
Co-authored-by: Dmitrii Petrov <dmitrii.petrov@syntacore.com>
Co-authored-by: Anton Afanasyev <anton.afanasyev@syntacore.com>
These should only consume 1cy on either of the 2 pipes (only zmm ops should double pump) - matches AMD SoG + uops.info
Noticed while updating costs for #90748
Currently we assume a constant latency of 100 cycles for call
instructions. This commit allows the user to specify a custom value for
the same as a command line argument. Default latency is set to 100.
This patch overrides the clearsSuperRegisters method defined in
MCInstrAnalysis to identify register writes that clear the upper portion
of all super-registers on AArch64 architecture.
On AArch64, a write to a general-purpose register of 32-bit data size is
defined to use the lower 32-bits of the register and zero extend the
upper 32-bits.
Similarly, SIMD and FP instructions operating on scalar data only access
the lower bits of the SIMD&FP register. The unused upper bits are
cleared to zero on a write.
This also applies to SIMD vector registers when the element size in bits
multiplied by the number of lanes is lower than 128. The upper 64 bits
of the vector register are cleared to zero on a write.
Note: This patch is distinct from the previous one titled
"[llvm-mca] Move bad-input.s test to be target specific"
This is a followup to #90474 and commit
afc10fc9b7ce3d23d9012f5a1496e849fe873ba2
Context: Builders failing because they're unable to run the failure
test.
This still doesn't work in various circumstances, it seems MCA doesn't
want to run on a wide variety of hosts in various configurations, so
stick to the tried and tested method and pass -mtriple and -mcpu.
... for now.
This is a follow up to #90474 in response to build bot failures.
This test is intended to check a case where invalid assembly is passed
to llvm-mca.
Unfortunately it appears that a cross-toolchain built with
-DTOOLCHAIN_TARGET_TRIPLE does not have an llvm-mca which works out of
the box if the host target is not enabled.
As a quick fix to make the build bots green, move the test into AArch64
and X86 so that there is reasonable coverage for this test; later I hope
mca can be fixed to work out of the box in this configuration.
[llvm-mca] Abort on parse error without -skip-unsupported-instructions
Prior to this patch, llvm-mca would continue executing after parse
errors. These errors can lead to some confusion since some analysis
results are printed on the standard output, and they're printed after
the errors, which could otherwise be easy to miss.
However it is still useful to be able to continue analysis after errors;
so extend the recently added -skip-unsupported-instructions to support
this.
Two tests which have parse errors for some of the 'RUN' branches are
updated to use -skip-unsupported-instructions so they can remain as-is.
Add a description of -skip-unsupported-instructions to the llvm-mca
command guide, and add it to the llvm-mca --help output:
```
--skip-unsupported-instructions=<value> - Force analysis to continue in the presence of unsupported instructions
=none - Exit with an error when an instruction is unsupported for any reason (default)
=lack-sched - Skip instructions on input which lack scheduling information
=parse-failure - Skip lines on the input which fail to parse for any reason
=any - Skip instructions or lines on input which are unsupported for any reason
```
Tests within this patch are intended to cover each of the cases.
Reason | Flag | Comment
--------------|------|-------
none | none | Usual case, existing test suite
lack-sched | none | Advises user to use -skip-unsupported-instructions=lack-sched, tested in llvm/test/tools/llvm-mca/X86/BtVer2/unsupported-instruction.s
parse-failure | none | Advises user to use -skip-unsupported-instructions=parse-failure, tested in llvm/test/tools/llvm-mca/bad-input.s
any | none | (N/A, covered above)
lack-sched | any | Continues, prints warnings, tested in llvm/test/tools/llvm-mca/X86/BtVer2/unsupported-instruction.s
parse-failure | any | Continues, prints errors, tested in llvm/test/tools/llvm-mca/bad-input.s
lack-sched | parse-failure | Advises user to use -skip-unsupported-instructions=lack-sched, tested in llvm/test/tools/llvm-mca/X86/BtVer2/unsupported-instruction.s
parse-failure | lack-sched | Advises user to use -skip-unsupported-instructions=parse-failure, tested in llvm/test/tools/llvm-mca/bad-input.s
none | * | This would be any test case with skip-unsupported-instructions, coverage added in llvm/test/tools/llvm-mca/X86/BtVer2/simple-test.s
any | * | (Logically covered by the other cases)
The vector crypto instructions may have different scheduling behavior
compared to VALU operations. Instead of using scheduling resources that
describe VALU operations, we give these instructions their own
scheduling resources. This is similar to what we did for Zb* instructions.
The sifive-p670 has vector crypto, so we model behavior for these instructions
in the P600SchedModel. The numbers are based off of measurements collected
internally. These numbers are a bit old and new measurements show that they may
not be fully accurate. It is likely that we will refine these numbers in a
follow up patch(s) based on new measurements.
This PR is stacked on #89256.
Constant registers like the zero registers XZR and WZR are treated as
any other register by LLVM-MCA. This can create non existent dependency
chains.
Currently there is no method in MCA to query if a register is constant.
This patch fixes the issue by adding a bool Constant
variable to MCRegisterDesc that is true for constant registers. Since
constant registers do not create dependencies, it makes sense to add
this check to MCA.
Prior to this patch, if llvm-mca encountered an instruction which parses
but has no scheduler info, the instruction is always reported as
unsupported, and llvm-mca halts with an error.
However, it would still be useful to allow MCA to continue even in the
case of instructions lacking scheduling information. Obviously if
scheduling information is lacking, it's not possible to give an accurate
analysis for those instructions, and therefore a warning is emitted.
A user could previously have worked around such unsupported instructions
manually by deleting such instructions from the input, but this provides
them a way of doing this for bulk inputs where they may not have a list
of such unsupported instructions to drop up front.
Note that this behaviour of instructions with no scheduling information
under -skip-unsupported-instructions is analagous to current
instructions which fail to parse: those are currently dropped from the
input with a message printed, after which the analysis continues.
~Testing the feature is a little awkward currently, it relies on an
instruction
which is currently marked as unsupported, which may not remain so;
should the
situation change it would be necessary to find an alternative
unsupported
instruction or drop the test.~
A test is added to check that analysis still reports an error if all
instructions are removed from the input, to mirror the current behaviour
of giving an error if no instructions are supplied.
This is an extension to 07151f0241d3f893cb36eb2dbc395d4098f74a87 which handled SandyBridge so we at least model the regression identified in #14640
Confirmed by Agner + uops.info/uica (SkylakeServer also had an incorrect use of Port015 instead of just Port01)
I raised #86669 as a proposal for a 'x86 unfold' pass that can unfold these (if we have the free registers) driven by the scheduler model.
This changs the way the assembly matcher works for Aarch32 parsing.
Previously there was a pile of hacks which dictated whether the CC,
CCOut, and VCC operands should be present which de-facto chose if the
wide/narrow (or thumb1/thumb2/arm) instruction version were chosen.
This meant much of the TableGen machinery present for the assembly
matching was effectively being bypassed and worked around.
This patch makes the CC and CCOut operands optional which allows the ASM
matcher operate as it was designed and means we can avoid doing some of
the hacks done previously. This also adds the option for the target to
allow the prioritizing the smaller instruction encodings as is required
for Aarch32.