The ldr instructions implicitly zero any upper lanes, so we can use them
for insert(zerovec, load, 0) patterns. Likewise insert(undef, load, 0)
or scalar_to_reg can reuse the scalar loads as the top bits are undef.
This patch makes sure there are patterns for each type and for each of
the normal, unaligned, roW and roX addressing modes.
Refresh of the generic scheduling model to use A510 instead of A55.
Main benefits are to the little core, and introducing SVE scheduling information.
Changes tested on various OoO cores, no performance degradation is seen.
Differential Revision: https://reviews.llvm.org/D156799
Using "eabi" for aarch64 targets is a common mistake and warned by Clang Driver.
We want to avoid it elsewhere as well. Just use the common "aarch64" without
other triple components.
This extends the patterns for loading into the zeroth lane of a zero vector
from D144086 to SVE, which work in the same way as the existing patterns. Only
full length vectors are added here, not the narrower floating point vector
types.
A LDR will implicitly zero the rest of the vector, so vector_insert(zeros,
load, 0) can use a single load. This adds tablegen patterns for both scaled and
unscaled loads, detecting where we are inserting a load into the lower element
of a zero vector.
Differential Revision: https://reviews.llvm.org/D144086