229 Commits

Author SHA1 Message Date
Sjoerd Meijer
9bccf61f5f
[AArch64][LV] Set MaxInterleaving to 4 for Neoverse V2 and V3 (#100385)
Set the maximum interleaving factor to 4, aligning with the number of available
SIMD pipelines. This increases the number of vector instructions in the vectorised
loop body, enhancing performance during its execution. However, for very low
iteration counts, the vectorised body might not execute at all, leaving only the
epilogue loop to run. This issue affects e.g. cam4_r from SPEC FP, which
experienced a performance regression. To address this, the patch reduces the
minimum epilogue vectorisation factor from 16 to 8, enabling the epilogue to be
vectorised and largely mitigating the regression.
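As a rough sketch (not the actual diff; the field names below are
assumptions based on this message), the per-CPU tuning might look like:

    // Hypothetical sketch of the tuning in
    // AArch64Subtarget::initializeProperties():
    case NeoverseV2:
    case NeoverseV3:
      MaxInterleaveFactor = 4;        // one interleaved copy per SIMD pipeline
      // Let low-trip-count loops still run vector code by allowing the
      // epilogue loop to be vectorised from VF=8 instead of 16.
      EpilogueVectorizationMinVF = 8;
      break;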
2024-11-20 09:33:39 +00:00
Anatoly Trosinenko
44076c9822
[AArch64][PAC] Move emission of LR checks in tail calls to AsmPrinter (#110705)
Move the emission of the checks performed on the authenticated LR value
during tail calls to the AArch64AsmPrinter class, so that different checker
sequences can be reused by pseudo instructions expanded there.
This adds one more option to the AuthCheckMethod enumeration: a generic
XPAC variant that is not restricted to checking the LR register.
2024-11-12 18:27:19 +03:00
David Green
1d6d073fbb [AArch64] Remove FeatureUseScalarIncVL
FeatureUseScalarIncVL is a tuning feature, used to control whether addvl or
add+cnt is used. It was previously added as a dependency of FeatureSVE2, an
architecture feature, but this can be seen as a layering violation. The main
disadvantage is that -use-scalar-inc-vl cannot be used without disabling sve2
and all dependent features.

This patch replaces that with an option that, if unset, defaults to hasSVE ||
hasSME, but is otherwise overridden by the option. The hope is that no CPUs
will rely on the tuning feature (or we can re-add it if needed).
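A minimal sketch of that query, assuming a cl::opt spelled as in the
message (the exact wiring may differ):

    static cl::opt<bool> UseScalarIncVL(
        "use-scalar-inc-vl", cl::Hidden,
        cl::desc("Prefer addvl/inc/dec over add+cnt"));

    bool AArch64Subtarget::useScalarIncVL() const {
      // An explicit command-line setting wins; otherwise default to the
      // addvl/inc/dec forms whenever SVE or SME is available.
      if (UseScalarIncVL.getNumOccurrences())
        return UseScalarIncVL;
      return hasSVE() || hasSME();
    }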
2024-11-10 14:51:55 +00:00
Sander de Smalen
5192cb772a [AArch64] Add hidden option to enable subreg liveness tracking.
Subreg liveness tracking is disabled by default for now until all issues
are ironed out. This option allows the feature to be used in tests.
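Presumably something along these lines (the option name is an assumption):

    static cl::opt<bool> EnableSubregLivenessTracking(
        "aarch64-enable-subreg-liveness-tracking",
        cl::init(false), cl::Hidden,
        cl::desc("Enable subreg liveness tracking for AArch64"));

    bool AArch64Subtarget::enableSubRegLiveness() const {
      return EnableSubregLivenessTracking;
    }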
2024-10-30 17:09:56 +00:00
Benjamin Maxwell
ddd463be7e
[AArch64] Add getStreamingHazardSize() to AArch64Subtarget (#113679)
This is defined by the `-aarch64-streaming-hazard-size` option or its
alias `-aarch64-stack-hazard-size` (the original name). It has been
renamed to be more general, as this option will (for the time being) be
used to detect whether the current target has streaming-mode memory hazards.
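A hedged sketch of the accessor and its option (the option names come from
the message; the alias wiring is an assumption):

    static cl::opt<unsigned> StreamingHazardSize(
        "aarch64-streaming-hazard-size",
        cl::desc("Hazard size for streaming-mode memory accesses"),
        cl::init(0), cl::Hidden);

    static cl::alias StreamingHazardSizeAlias(
        "aarch64-stack-hazard-size", cl::desc("Alias for the original name"),
        cl::aliasopt(StreamingHazardSize));

    unsigned AArch64Subtarget::getStreamingHazardSize() const {
      // A nonzero value indicates streaming-mode memory hazards.
      return StreamingHazardSize;
    }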

---------

Co-authored-by: Hari Limaye <hari.limaye@arm.com>
2024-10-28 13:01:22 +00:00
Madhur Amilkanthwar
b73771cf0f
[AArch64] Increase scatter overhead on Neoverse-V2 (#101296)
This patch increases the scatter overhead on Neoverse-V2 to 13, which
benefits the s128 kernel from the TSVC_2 test suite.
SPEC 17, RAJAPerf, and Spatter are unaffected by this patch.

Raising the overhead boosts the s128 kernel's performance by about 40%
as this enables vectorization. The patch also does some minor refactoring
of the gather-related code.
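Roughly, as a sketch (assuming a per-CPU ScatterOverhead tuning field that
the TTI cost model consults):

    // Hypothetical sketch in AArch64Subtarget::initializeProperties():
    case NeoverseV2:
      // Make scatters expensive enough that the loop vectorizer picks a
      // profitable plan for kernels like TSVC s128.
      ScatterOverhead = 13;
      break;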
2024-08-14 10:12:40 +05:30
Daniil Kovalev
56fd2472d8
[PAC] Sign LR with B key for non-leaf functions with ptrauth-returns attr (#100552)
For the pauthtest ABI, there is a bunch of ptrauth-* options, including
ptrauth-returns. Use the "ptrauth-returns" function attribute to indicate
the need for LR signing with the B key for non-leaf functions, avoiding
"sign-return-address" and "sign-return-address-key", which were originally
designed for pac-ret.
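The decision point presumably looks something like this (a sketch; the
helper name is hypothetical and the real logic may live elsewhere):

    static bool shouldSignWithBKey(const Function &F) {
      // pauthtest-style return-address signing requested directly.
      if (F.hasFnAttribute("ptrauth-returns"))
        return true;
      // Legacy pac-ret route.
      return F.getFnAttribute("sign-return-address-key")
                 .getValueAsString() == "b_key";
    }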

Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org>
Co-authored-by: Anatoly Trosinenko <atrosinenko@accesssoftek.com>
2024-07-25 22:21:03 +03:00
Ahmed Bougacha
b8721fa0af
[AArch64][PAC] Sign block addresses used in indirectbr. (#97647)
Enabled in clang using:
    -fptrauth-indirect-gotos

and at the IR level using function attribute:
    "ptrauth-indirect-gotos"

Signing uses IA and a per-function integer discriminator. The
discriminator isn't ABI-visible, and is currently:
    ptrauth_string_discriminator("<function_name> blockaddress")

A sufficiently sophisticated frontend could benefit from per-indirectbr
discrimination, which would need additional machinery, such as allowing
"ptrauth" bundles on indirectbr. For our purposes, the simple scheme
above is sufficient.
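For illustration, a computed-goto user of this feature might look like the
following (a hedged example; compile with -fptrauth-indirect-gotos on a
ptrauth-enabled target):

    int dispatch(int i) {
      // Each block address stored here is signed with IA and the
      // per-function "<function_name> blockaddress" discriminator.
      static void *targets[] = { &&handle_a, &&handle_b };
      goto *targets[i & 1]; // authenticated before the indirect branch
    handle_a:
      return 1;
    handle_b:
      return 2;
    }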

This approach doesn't support subtracting label addresses and using
the result as offsets, because each label address is signed.
Pointer arithmetic on signed pointers corrupts the signature bits,
and because label address expressions aren't typed beyond void*,
we can't do anything reliably intelligent on the arithmetic exprs.
Not signing addresses when used to form offsets would allow
easily hijacking control flow by overwriting the offset.

This diagnoses the basic cases (`&&lbl2 - &&lbl1`) in the frontend,
while we evaluate either alternative implementations (e.g., lowering
blockaddress to a bb number, and indirectbr to a checked jump-table),
or better diagnostics (both at the frontend level and on unencodable
IR constants).
2024-07-22 21:24:39 -07:00
Jon Roelofs
2b33591386
[llvm][AArch64] Support -mcpu=apple-m4 (#95478) 2024-06-14 17:24:45 -07:00
Jonathan Thackray
e80c59556d
[AArch64] Add support for Cortex-A725 and Cortex-X925 (#95214)
Cortex-A725 and Cortex-X925 are Armv9.2 AArch64 CPUs.

Technical Reference Manual for Cortex-A725:
   https://developer.arm.com/documentation/107652/latest

Technical Reference Manual for Cortex-X925:
   https://developer.arm.com/documentation/102807/latest
2024-06-13 00:00:57 +01:00
Wei Zhao
6b9753a0ec
[AArch64] Add support for Qualcomm Oryon processor (#91022)
Oryon is an ARM V8 AArch64 CPU from Qualcomm.

---------

Co-authored-by: Wei Zhao <wezhao@qti.qualcomm.com>
2024-06-06 07:27:50 -07:00
Sander de Smalen
1015f51dd9
[AArch64] NFC: Rename -force-streaming-compatible-sve to -force-streaming-compatible (#92774)
The behaviour of the flag should be equivalent to
__arm_streaming_compatible.

At the moment, the name suggests that '-force-streaming-compatible-sve'
on its own (i.e. without specifying `+sve`) enables the compiler to use
the streaming-compatible subset of SVE instructions, but the semantics
are merely that the function can be called with either PSTATE.SM=0 or
PSTATE.SM=1.
2024-05-22 07:58:54 +01:00
Jonathan Thackray
e50a857fb1
[AArch64] Add support for Cortex-R82AE and improve Cortex-R82 (#90440) 2024-04-30 14:15:01 +01:00
Jonathan Thackray
a670cdadca
[AArch64] Add support for Neoverse-N3, Neoverse-V3 and Neoverse-V3AE (#90143)
Neoverse-N3, Neoverse-V3 and Neoverse-V3AE are Armv9.2 AArch64 CPUs.

Technical Reference Manual for Neoverse-N3:
   https://developer.arm.com/documentation/107997/latest/

Technical Reference Manual for Neoverse-V3:
   https://developer.arm.com/documentation/107734/latest/

Technical Reference Manual for Neoverse-V3AE:
   https://developer.arm.com/documentation/101595/latest/
2024-04-26 13:04:35 +01:00
Tomas Matheson
71c5964f5c
[ARM][AArch64] autogenerate header file for TargetParser from Target tablegen files (#88378)
Introduce a mechanism to share data between the ARM and AArch64 backends and
TargetParser, to reduce duplication of code. This is similar to the current
RISC-V implementation.

The target tablegen file (in this case `ARM.td` or `AArch64.td`) is
processed during building of `TargetParser` to generate the following
files in the build tree:
 - `build/include/llvm/TargetParser/ARMTargetParserDef.inc`
 - `build/include/llvm/TargetParser/AArch64TargetParserDef.inc`

For now, the use of these generated files is limited to files _outside_
of `TargetParser`. The main reason for this is that the modifications to
`TargetParser` will require additional data added to the tablegen files,
which I want to split into separate PRs.
2024-04-24 09:18:36 +01:00
David Green
b24af43fdf
[AArch64] Improve scheduling latency into Bundles (#86310)
By default, the scheduling info of instructions in a BUNDLE is given a
latency of 0, as they operate on the implicit register of the bundle.
This modifies that for AArch64 so that the latency is adjusted to use
the latency of the instruction in the bundle instead. This essentially
assumes that the bundled instructions are executed in a single cycle,
which for AArch64 is probably OK considering they are mostly used for
MOVPRFX bundles, where this can help create slightly better scheduling,
especially for in-order cores.
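Conceptually (a sketch; the hook signature is abbreviated from memory and
latencyOfBundledDef is a hypothetical helper):

    void AArch64Subtarget::adjustSchedDependency(
        SUnit *Def, int DefOpIdx, SUnit *Use, int UseOpIdx, SDep &Dep,
        const TargetSchedModel *SchedModel) const {
      // Defs coming out of a BUNDLE default to latency 0; look through to
      // the bundled instruction that actually produces the value instead.
      if (Def->getInstr()->getOpcode() == TargetOpcode::BUNDLE)
        Dep.setLatency(latencyOfBundledDef(Def, Dep));
    }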
2024-04-12 10:57:01 +01:00
Arthur Eubanks
94c988bcfd [NFC] Remove unused parameter from shouldAssumeDSOLocal() 2024-03-11 19:48:17 +00:00
Jonathan Thackray
8160139136
Add support for Arm Cortex A78AE CPU (#84485)

Technical Reference Manual for Arm Cortex A78AE:
   https://developer.arm.com/documentation/101779/0003

Fixes #84450
2024-03-08 16:11:36 +00:00
Fangrui Song
201572e34b
[AArch64] Implement -fno-plt for SelectionDAG/GlobalISel
Clang sets the nonlazybind attribute for certain ObjC features. The
AArch64 SelectionDAG implementation for non-intrinsic calls
(46e36f0953aabb5e5cd00ed8d296d60f9f71b424) is behind a cl option.

GCC implements -fno-plt for a few ELF targets. In Clang, -fno-plt also
sets the nonlazybind attribute. For SelectionDAG, make the cl option not
affect ELF so that non-intrinsic calls to a dso_preemptable function use
the GOT. Adjust AArch64TargetLowering::LowerCall to handle intrinsic calls.

For FastISel, change `fastLowerCall` to bail out when a call is due to
-fno-plt.

For GlobalISel, handle non-intrinsic calls in CallLowering::lowerCall
and intrinsic calls in AArch64CallLowering::lowerCall (where the
target-independent CallLowering::lowerCall is not called).
The GlobalISel test in `call-rv-marker.ll` is therefore updated.

Note: the current -fno-plt -fpic implementation does not use the GOT for a
preemptable function.
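As a small illustration (hedged; exact codegen depends on the relocation
model and target), -fno-plt turns an external call into a GOT load plus an
indirect call rather than a direct call routed through a PLT stub:

    extern void callee(void);

    void caller(void) {
      // With -fno-plt -fPIC, roughly:
      //   adrp x8, :got:callee
      //   ldr  x8, [x8, :got_lo12:callee]
      //   blr  x8
      // instead of a direct `bl callee` resolved via a PLT stub.
      callee();
    }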

Link: #78275

Pull Request: https://github.com/llvm/llvm-project/pull/78890
2024-03-05 13:55:29 -08:00
Philipp Tomsich
fbba818a78
[AArch64] Add the Ampere1B core (#81297)
The Ampere1B is Ampere's third-generation core implementing a
superscalar, out-of-order microarchitecture with nested virtualization,
speculative side-channel mitigation and architectural support for
defense against ROP/JOP style software attacks.

Ampere1B is an ARMv8.7+ implementation, adding support for the FEAT_WFxT,
FEAT_CSSC, FEAT_PAN3 and FEAT_AFP extensions. It also includes all
features of the second-generation Ampere1A, such as the Memory Tagging
Extension and SM3/SM4 cryptography instructions.
2024-02-09 15:22:09 -08:00
Yuta Mukai
70eab122bc
[AArch64][MachinePipeliner] Add pipeliner support for AArch64 (#79589)
Add AArch64 implementations for the interfaces of MachinePipeliner pass.
The pass is disabled by default for AArch64. It is enabled by specifying
--aarch64-enable-pipeliner.

Five tests in llvm-test-suite show performance improvements of more than 5%
on a Neoverse V1 processor.

| test                                                             | improvement |
| ---------------------------------------------------------------- | -----------: |
| MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl.test | 16% |
| MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-flt.test | 16% |
| SingleSource/Benchmarks/Adobe-C++/loop_unroll.test | 14% |
| SingleSource/Benchmarks/Misc/flops-5.test | 13% |
| SingleSource/Benchmarks/BenchmarkGame/spectral-norm.test | 6% |

(base flags: -mcpu=neoverse-v1 -O3 -mrecip, flags for pipelining: -mllvm
-aarch64-enable-pipeliner -mllvm
-pipeliner-max-stages=100 -mllvm -pipeliner-max-mii=100 -mllvm
-pipeliner-enable-copytophi=0)

On the other hand, there are cases of significant performance
degradation. Algorithm improvements and adding the option/pragma will be
needed in the future.
2024-02-02 10:33:44 +09:00
Eli Friedman
a6065f0fa5
Arm64EC entry/exit thunks, consolidated. (#79067)
This combines the previously posted patches with some additional work
I've done to more closely match MSVC output.

Most of the important logic here is implemented in
AArch64Arm64ECCallLowering, whose purpose is to take "normal" IR we'd
generate for other targets and produce most of the Arm64EC-specific bits:
generating thunks, mangling symbols, generating aliases, and generating
the .hybmp$x table. This is all done late for a few reasons: to
consolidate the logic as much as possible, and to ensure the IR exposed
to optimization passes doesn't contain complex arm64ec-specific
constructs.

The other changes are supporting changes, to handle the new constructs
generated by that pass.

There's a global llvm.arm64ec.symbolmap representing the .hybmp$x
entries for the thunks. This gets handled directly by the AsmPrinter
because it needs symbol indexes that aren't available before that.

There are two new calling conventions used to represent calls to and
from thunks: ARM64EC_Thunk_X64 and ARM64EC_Thunk_Native. There are a few
changes to handle the associated exception-handling info,
SEH_SaveAnyRegQP and SEH_SaveAnyRegQPX.

I've intentionally left out handling for structs with small
non-power-of-two sizes, because that's easily separated out. The rest of
my current work is here. I squashed my current patches because they were
split in ways that didn't really make sense. Maybe I could split out
some bits, but it's hard to meaningfully test most of the parts
independently.

Thanks to @dpaoliello for extensive testing and suggestions.

(Originally posted as https://reviews.llvm.org/D157547 .)
2024-01-22 21:28:07 -08:00
Tim Northover
10d6d5f224
AArch64: add support for currently released Apple CPUs. (#73499)
These are still v8.6a and have no real changes as far as LLVM cares, so
it's mostly just a copy/paste job.
2023-11-29 09:51:42 +00:00
Matthew Devereau
cdf6693f07
[AArch64][SME] Add support for sme-fa64 (#70809) 2023-11-20 08:37:52 +00:00
Jonathan Thackray
066c4524bc
[AArch64] Add support for Cortex-A520, Cortex-A720 and Cortex-X4 CPUs (#72395)
Cortex-A520, Cortex-A720 and Cortex-X4 are Armv9.2 AArch64 CPUs.

Technical Reference Manual for Cortex-A520:
   https://developer.arm.com/documentation/102517/latest/

Technical Reference Manual for Cortex-A720:
   https://developer.arm.com/documentation/102530/latest/

Technical Reference Manual for Cortex-X4:
   https://developer.arm.com/documentation/102484/latest/

Patch co-authored by: Sivan Shani <sivan.shani@arm.com>
2023-11-16 22:08:58 +00:00
David Sherwood
bdc0afc871
[CodeGen][AArch64] Set min jump table entries to 13 for AArch64 targets (#71166)
There are some workloads that are negatively impacted by using jump
tables when the number of entries is small. The SPEC2017 perlbench
benchmark is one example of this, where increasing the threshold to
around 13 gives a ~1.5% improvement on neoverse-v1. I chose the minimum
threshold based on empirical evidence rather than science, and just
manually increased the threshold until I got the best performance
without impacting other workloads. For neoverse-v1 I saw around ~0.2%
improvement in the SPEC2017 integer geomean, and no overall change for
neoverse-n1. If we find issues with this threshold later on we can
always revisit this.

The most significant SPEC2017 score changes on neoverse-v1 were:

500.perlbench_r: +1.6%
520.omnetpp_r: +0.6%

and the rest saw changes < 0.5%.

I updated CodeGen/AArch64/min-jump-table.ll to reflect the new
threshold. For most of the affected tests I manually set the min number
of entries back to 4 on the RUN line because the tests seem to rely upon
this behaviour.
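A hedged sketch of where such a threshold might be applied (the real change
may wire it up differently):

    // Hypothetical sketch in the AArch64 target lowering setup: require
    // at least 13 case clusters before emitting a jump table.
    setMinimumJumpTableEntries(13);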
2023-11-14 13:00:28 +00:00
Anatoly Trosinenko
1d2b558265 [AArch64][PAC] Check authenticated LR value during tail call
When performing a tail call, check the value of the LR register after
authentication to prevent the callee from signing and spilling an
untrusted value. This commit implements a few variants of the check;
more can be added later.

If it is safe to assume that executable pages are always readable,
LR can be checked just by dereferencing the LR value via LDR.

As an alternative, LR can be checked as follows:

    ; lowered AUT* instruction
    ; <some variant of check that LR contains a valid address>
    b.cond break_block
  ret_block:
    ; lowered TCRETURN
  break_block:
    brk 0xc471

As the existing methods either break compatibility with execute-only
memory mappings or can degrade performance, they are disabled by
default and can be explicitly enabled with a command line option.

Individual subtargets can opt-in to use one of the available methods
by updating AArch64FrameLowering::getAuthenticatedLRCheckMethod().

Reviewed By: kristof.beyls

Differential Revision: https://reviews.llvm.org/D156716
2023-10-11 17:38:17 +03:00
Sander de Smalen
7e815dd76d [AArch64][SME] Create new interface for isSVEAvailable.
When a function is compiled to be in Streaming(-compatible) mode, the full
set of SVE instructions may not be available. This patch adds an interface
to query that, and changes the codegen for FADDA (not legal in Streaming-SVE
mode) so that it is instead expanded for fixed-length vectors and not
code-generated at all for scalable vectors.
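The new query is presumably along these lines (a sketch; the member names
are assumptions):

    bool AArch64Subtarget::isSVEAvailable() const {
      // The full SVE instruction set can only be used when SVE is
      // implemented and the function cannot be in streaming mode.
      return hasSVE() && !isStreaming() && !isStreamingCompatible();
    }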

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D156109
2023-09-01 12:00:36 +00:00
Sander de Smalen
08fd44b300 [AArch64] Force streaming-compatible codegen when attributes are set.
Before this patch, the only way to generate streaming-compatible code
was to use the `-force-streaming-compatible-sve` flag, but the compiler
should also avoid the use of instructions invalid in streaming mode
when a function has the aarch64_pstate_sm_enabled/compatible attribute.

Reviewed By: paulwalker-arm, david-arm

Differential Revision: https://reviews.llvm.org/D155428
2023-07-18 10:26:00 +00:00
Sander de Smalen
ec6af93d02 [AArch64] NFC: Replace 'forceStreamingCompatibleSVE' with 'isNeonAvailable'.
The AArch64Subtarget interface 'isNeonAvailable' is more appropriate going
forward, as we may also want to generate 'streaming SVE' code (not just
'streaming-compatible SVE' code), but here we must still make sure not to
use NEON instructions which are invalid in streaming SVE mode.
2023-07-17 08:24:10 +00:00
David Sherwood
c7dbe326df [AArch64][LoopVectorize] Enable tail-folding of simple loops on neoverse-v1
This patch enables the tail-folding of simple loops by default
when targeting the neoverse-v1 CPU. Simple loops exclude those
with recurrences or reductions, as well as reversed loops.

New tests have been added here:

Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll

In terms of SPEC2017 only one benchmark is really affected when
building with "-Ofast -mcpu=neoverse-v1 -flto", which is
(+ faster, - slower):

525.x264: +7.0%

Differential Revision: https://reviews.llvm.org/D130618
2023-05-18 10:35:57 +00:00
Archibald Elliott
4679d7a26a [NFC][ARM][AArch64] Cleanup TargetParser includes
llvm/TargetParser/TargetParser.h now only includes AMDGPU-specific
functionality, the ARM- and AArch64-specific functionality is in other
headers.
2023-03-03 16:24:55 +00:00
Archibald Elliott
8e3d7cf5de [NFC][TargetParser] Remove llvm/Support/TargetParser.h 2023-02-07 11:08:21 +00:00
Archibald Elliott
8c712296fb [NFC][TargetParser] Remove llvm/Support/AArch64TargetParser.h
Removes the forwarding header `llvm/Support/AArch64TargetParser.h`.

I am proposing to do this for all the forwarding headers left after
rGf09cf34d00625e57dea5317a3ac0412c07292148 - for each header:
- Update all relevant in-tree includes
- Remove the forwarding Header

Differential Revision: https://reviews.llvm.org/D140999
2023-02-03 17:34:01 +00:00
Guillaume Chatelet
d6e0ff6074 [NFC] Migrate aarch64 alignment to Align 2023-02-03 16:29:11 +00:00
Mitch Phillips
486729ce06 Re-land: [MTE] Add AArch64GlobalsTagging Pass
Adds an IR pass for -fsanitize=memtag-globals. This pass goes over the
tag-capable global variables, and replaces them with a tagged global
variable of the same contents. This new global variable will have its
size and alignment adjusted if necessary so that they're both a multiple
of the tag granule size (16 bytes).

Global merge must also be suppressed for tagged globals, as each global
variable must have a unique tag. This can possibly be relaxed in future;
globals that are identical in size, alignment, and content can possibly
be merged. The major problem comes from tail- or head-merging, which if
left unchecked, could have partially-overlapping global variables with
different memory tags, leading to crashes at runtime.
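The size adjustment itself is simple rounding; a sketch using LLVM's
alignTo (the surrounding pass details are omitted, and the helper name is
hypothetical):

    #include "llvm/Support/MathExtras.h"

    // Round a global's size up to the 16-byte MTE tag granule; the
    // alignment is raised to at least 16 in the same spirit.
    static uint64_t paddedSizeForTagging(uint64_t Size) {
      return llvm::alignTo(Size, 16); // e.g. 17 -> 32
    }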

Reviewed By: fmayer, eugenis

Differential Revision: https://reviews.llvm.org/D133392
2023-01-31 13:03:37 -08:00
Mitch Phillips
15e33c699c Revert "[MTE] Add AArch64GlobalsTagging Pass"
This reverts commit 4edfcff71e150770675a19576f698c7bbe788ee2.

Broke the non-aarch64-containing target builds.
https://reviews.llvm.org/D133392 has more context.
2023-01-31 12:25:58 -08:00
Mitch Phillips
4edfcff71e [MTE] Add AArch64GlobalsTagging Pass
Adds an IR pass for -fsanitize=memtag-globals. This pass goes over the
tag-capable global variables, and replaces them with a tagged global
variable of the same contents. This new global variable will have its
size and alignment adjusted if necessary so that they're both a multiple
of the tag granule size (16 bytes).

Global merge must also be suppressed for tagged globals, as each global
variable must have a unique tag. This can possibly be relaxed in future;
globals that are identical in size, alignment, and content can possibly
be merged. The major problem comes from tail- or head-merging, which if
left unchecked, could have partially-overlapping global variables with
different memory tags, leading to crashes at runtime.

Reviewed By: fmayer, eugenis

Differential Revision: https://reviews.llvm.org/D133392
2023-01-31 09:24:18 -08:00
Philipp Tomsich
fb0af89193 [AArch64] Add the Ampere1A core
The Ampere1A core improves on the Ampere1 with key differences being:
 * memory tagging is supported
 * SM3/SM4 are supported
 * adds a new fusion pair for (A+B+1 and A-B-1)
   (added in a later commit)

Depends on D142395

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D142396
2023-01-24 22:36:39 +01:00
Florian Hahn
830d0bc56b
[AArch64] Set MaxInterleaveFactor for Apple A14, A15, A16.
Those CPUs can benefit from additional interleaving.

Reviewed By: jroelofs

Differential Revision: https://reviews.llvm.org/D141499
2023-01-11 18:52:51 +00:00
Benjamin Kramer
07e7168048 [AArch64] Stringref'ize AArch64Subtarget constructor. NFCI 2022-12-30 18:02:53 +01:00
Benjamin Maxwell
5eec8dfc2b [AArch64] Add hasSVEorSME() helper and fix some incorrect checks
This adds a little hasSVEorSME() helper and, as an NFC, updates existing
code to use it. The assertions in get[Min|Max]SVEVectorSizeInBits() are
also now corrected to use hasSVEorSME() rather than just hasSVE().
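The helper is presumably just:

    bool hasSVEorSME() const { return hasSVE() || hasSME(); }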

Differential Revision: https://reviews.llvm.org/D138575
2022-11-24 17:54:37 +00:00
Guozhi Wei
835da13ae0 [AArch64] Correctly recognize -reserve-regs-for-regalloc=X30,X29
In the AArch64 backend X30 is named LR and X29 is named FP, so the code in AArch64Subtarget::AArch64Subtarget can't recognize these two registers.

  for (unsigned i = 0; i < 31; ++i) {
    if (ReservedRegNames.count(TRI->getName(AArch64::X0 + i)))
      ReserveXRegisterForRA.set(i);
  }

This patch adds code to explicitly handle these two registers.
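Presumably something along these lines (a sketch; the exact form of the
fix may differ):

  // X29/X30 print as FP/LR via TRI->getName(), so the loop above never
  // matches the "X29"/"X30" spellings; handle them explicitly.
  if (ReservedRegNames.count("X29") || ReservedRegNames.count("FP"))
    ReserveXRegisterForRA.set(29);
  if (ReservedRegNames.count("X30") || ReservedRegNames.count("LR"))
    ReserveXRegisterForRA.set(30);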

Differential Revision: https://reviews.llvm.org/D137810
2022-11-22 17:18:29 +00:00
Victor Campos
9d1ff787e5 [AArch64] Add support for the Cortex-X3 CPU
Cortex-X3 is an Armv9-A AArch64 CPU.

This patch introduces support for Cortex-X3.

Technical Reference Manual: https://developer.arm.com/documentation/101593/latest

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D136589
2022-11-09 11:33:48 +00:00
Simi Pallipurath
fa8aeab606 [AArch64] Add support for the Cortex-A715 CPU
Cortex-A715 is an Armv9-A AArch64 CPU.

This patch introduces support for Cortex-A715.

Technical Reference Manual: https://developer.arm.com/documentation/101590/latest.

Reviewed By: vhscampos

Differential Revision: https://reviews.llvm.org/D136957
2022-11-03 09:28:46 +00:00
Eli Friedman
a6ac968360 [Arm64EC] Refer to dllimport'ed functions correctly.
Arm64EC has two different ways to refer to dllimport'ed functions in an
object file. One is using the usual __imp_ prefix, the other is using an
Arm64EC-specific prefix __imp_aux_. As far as I can tell, if a function
is in an x64 DLL, __imp_aux_ refers to the actual x64 address, while
__imp_ points to some linker-generated code that calls the exit thunk.
So __imp_aux_ is used to refer to the address in non-call contexts,
while __imp_ is used for calls to avoid the indirect call checker.

There's one twist to this, though: if an object refers to a symbol using
the __imp_aux_ prefix, the object file's symbol table must also contain
the symbol with the usual __imp_ prefix. The symbol doesn't actually
have to be used anywhere, it just has to exist; otherwise, the linker's
symbol lookup in x64 import libraries doesn't work correctly. Currently,
this is handled by emitting a .globl __imp_foo directive; we could try
to design some better way to handle this.

One minor quirk I haven't figured out: apparently, in Arm64EC mode, MSVC
prefers to use a linker-synthesized stub to call dllimport'ed functions,
instead of branching directly. The linker stub appears to do the same
thing that inline code would do, so not sure if it's just a code-size
optimization, or if the synthesized stub can actually do something other
than just load from the import table in some circumstances.

Differential Revision: https://reviews.llvm.org/D136202
2022-10-20 15:08:56 -07:00
Sander de Smalen
137459aff6 [AArch64][SME] Disable (SLP|Loop)Vectorizer when function may be executed in streaming mode.
When the SME attributes indicate that a function is or may be executed in Streaming
SVE mode, we currently need to be conservative and disable _any_ vectorization
(fixed or scalable) because the code-generator does not yet support generating
streaming-compatible code.

Scalable auto-vec will be gradually enabled in the future when we have
confidence that the loop-vectorizer won't use any SVE or NEON instructions
that are illegal in Streaming SVE mode.
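Conceptually the gate looks something like this (a sketch; the real check
goes through the SME attribute helpers rather than raw string attributes):

    #include "llvm/IR/Function.h"

    // Treat a function as (possibly) streaming if any SME state
    // attribute is present, and skip vectorization for it.
    static bool mayRunInStreamingMode(const llvm::Function &F) {
      return F.hasFnAttribute("aarch64_pstate_sm_enabled") ||
             F.hasFnAttribute("aarch64_pstate_sm_compatible") ||
             F.hasFnAttribute("aarch64_pstate_sm_body");
    }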

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D135950
2022-10-19 16:42:20 +00:00
Hassnaa Hamdi
2c72d90ecc [AArch64-SVE]: Force generating code compatible to streaming mode.
Add a compile-time flag for enabling streaming mode.
When streaming mode is enabled, lower basic loads and stores of fixed-width
vectors to generate code that is compatible with streaming mode.

Differential Revision: https://reviews.llvm.org/D133433
2022-10-14 17:46:56 +00:00
David Sherwood
fbb119412f [AArch64] Add Neoverse V2 CPU support
Adds support for the Neoverse V2 CPU to the AArch64 backend.

Differential Revision: https://reviews.llvm.org/D134352
2022-09-27 07:56:08 +00:00
Tim Northover
677da09d02 AArch64: add support for newer Apple CPUs
They're roughly ARMv8.6. This works in the .td file, but in
AArch64TargetParser.def, marking them v8.6 brings in support for the SM4
cryptographic extension, which we don't actually have. So on the
TargetParser side they're marked as v8.5, with the extra features (BF16
and I8MM) added manually.

Finally, A16 supports the HCX extension in addition to v8.6. This has no
TargetParser implications.
2022-09-22 11:58:51 +01:00