Set the maximum interleaving factor to 4, aligning with the number of available
SIMD pipelines. This increases the number of vector instructions in the vectorised
loop body, enhancing performance during its execution. However, for very low
iteration counts, the vectorised body might not execute at all, leaving only the
epilogue loop to run. This affects, for example, cam4_r from SPEC FP, which
experienced a performance regression. To address this, the patch reduces the
minimum epilogue vectorisation factor from 16 to 8, enabling the epilogue to be
vectorised and largely mitigating the regression.
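As an illustrative sketch (not taken from the patch), a loop with a low trip count can end up executing only the epilogue when the main vectorised body needs VF × interleave-count iterations per step:

```cpp
// Illustrative only: with a vectorisation factor of 4 and an interleave count
// of 4, the main vectorised body consumes 16 iterations per step, so for
// n < 16 it never runs and only the (possibly vectorised) epilogue executes.
void scale(float *a, const float *b, int n) {
  for (int i = 0; i < n; ++i)  // e.g. n == 10 at run time
    a[i] = b[i] * 2.0f;
}
```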
Move the emission of the checks performed on the authenticated LR value
during tail calls into the AArch64AsmPrinter class, so that different checker
sequences can be reused by pseudo instructions expanded there.
This adds one more option to the AuthCheckMethod enumeration: a generic
XPAC-based variant that is not restricted to checking the LR register.
FeatureUseScalarIncVL is a tuning feature used to control whether addvl or
add+cnt is used. It was previously added as a dependency of FeatureSVE2, an
architecture feature, but this can be seen as a layering violation. The main
disadvantage is that -use-scalar-inc-vl cannot be used without disabling sve2
and all dependent features.
This patch replaces that with an option that, if unset, defaults to hasSVE ||
hasSME, but is otherwise overridden by the option. The hope is that no CPUs
will rely on the tuning feature (or we can re-add it if needed).
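A rough illustration (ACLE intrinsics, not taken from the patch) of the vector-length-dependent increment this tuning choice affects:

```cpp
#include <arm_sve.h>

// Illustrative only: advance a pointer by one SVE vector's worth of floats.
// The backend can materialise the increment either with addvl/inc-style
// instructions or with a cnt followed by an add; that is the choice this
// tuning feature controls.
void advance(float *&p) {
  p += svcntw();  // number of 32-bit lanes in an SVE vector
}
```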
This is defined by the `-aarch64-streaming-hazard-size` option or its
alias `-aarch64-stack-hazard-size` (the original name). It has been
renamed to be more general as this option will (for the time being) be
used to detect if the current target has streaming mode memory hazards.
---------
Co-authored-by: Hari Limaye <hari.limaye@arm.com>
This patch increases the scatter overhead on Neoverse-V2 to 13. This
benefits the s128 kernel from the TSVC_2 test suite.
SPEC 2017, RAJAPerf, and Spatter are unaffected by this patch.
This patch improves the performance of the s128 kernel from the TSVC test
suite by about 40% by enabling its vectorization. It also includes minor
refactoring of the gather-related code.
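As a generic illustration (not necessarily the exact s128 kernel), this is the kind of strided-access loop whose vectorisation hinges on the modelled gather/scatter cost:

```cpp
// Illustrative only: when vectorised, the strided loads and stores become
// gathers and scatters, so the gather/scatter overhead in the cost model
// decides whether vectorisation is considered profitable.
void strided_update(float *a, const float *b, int n) {
  for (int i = 0; i < n; ++i)
    a[2 * i] = b[2 * i] + 1.0f;
}
```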
For the pauthtest ABI, there are a number of ptrauth-* options, including
ptrauth-returns. Use the "ptrauth-returns" function attribute to indicate the
need for LR signing with the B key for non-leaf functions, to avoid using
"sign-return-address" and "sign-return-address-key", which were
originally designed for pac-ret.
Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org>
Co-authored-by: Anatoly Trosinenko <atrosinenko@accesssoftek.com>
Enabled in clang using:
-fptrauth-indirect-gotos
and at the IR level using function attribute:
"ptrauth-indirect-gotos"
Signing uses IA and a per-function integer discriminator. The
discriminator isn't ABI-visible, and is currently:
ptrauth_string_discriminator("<function_name> blockaddress")
A sufficiently sophisticated frontend could benefit from per-indirectbr
discrimination, which would need additional machinery, such as allowing
"ptrauth" bundles on indirectbr. For our purposes, the simple scheme
above is sufficient.
This approach doesn't support subtracting label addresses and using
the result as offsets, because each label address is signed.
Pointer arithmetic on signed pointers corrupts the signature bits,
and because label address expressions aren't typed beyond void*,
we can't do anything reliably intelligent on the arithmetic exprs.
Not signing addresses when used to form offsets would allow
easily hijacking control flow by overwriting the offset.
This diagnoses the basic cases (`&&lbl2 - &&lbl1`) in the frontend,
while we evaluate either alternative implementations (e.g., lowering
blockaddress to a bb number, and indirectbr to a checked jump-table),
or better diagnostics (both at the frontend level and on unencodable
IR constants).
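For illustration (GNU labels-as-values, not code from the patch), the supported pattern and the diagnosed subtraction look roughly like this:

```cpp
// Illustrative only (GNU labels-as-values extension).
void dispatch(int which) {
  // With -fptrauth-indirect-gotos, these label addresses are signed.
  void *targets[] = {&&handle_a, &&handle_b};
  goto *targets[which];  // the signed address is authenticated at the indirect goto
handle_a:
  return;
handle_b:
  // long off = &&handle_b - &&handle_a;  // diagnosed: arithmetic on signed label addresses
  return;
}
```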
The behaviour of the flag should be equivalent to
__arm_streaming_compatible.
At the moment, the name suggests that '-force-streaming-compatible-sve'
on its own (i.e. without specifying `+sve`) enables the compiler to use
the streaming-compatible subset of SVE instructions, but the semantics
are merely that the function can be called with either PSTATE.SM=0 or
PSTATE.SM=1.
Introduce a mechanism to share data between the ARM and AArch64 backends and
TargetParser, to reduce duplication of code. This is similar to the current
RISC-V implementation.
The target tablegen file (in this case `ARM.td` or `AArch64.td`) is
processed during building of `TargetParser` to generate the following
files in the build tree:
- `build/include/llvm/TargetParser/ARMTargetParserDef.inc`
- `build/include/llvm/TargetParser/AArch64TargetParserDef.inc`
For now, the use of these generated files is limited to files _outside_
of `TargetParser`. The main reason for this is that the modifications to
`TargetParser` will require additional data added to the tablegen files,
which I want to split into separate PRs.
By default, the scheduling info of instructions inside a BUNDLE gives them a
latency of 0, as they operate on the implicit register of the bundle.
This patch adjusts that for AArch64 so that the latency is taken from the
instruction inside the bundle instead. This essentially assumes that the
bundled instructions are executed in a single cycle, which for AArch64 is
probably OK considering they are mostly used for MOVPRFX bundles, where this
can help create slightly better scheduling, especially for in-order cores.
Clang sets the nonlazybind attribute for certain ObjC features. The
AArch64 SelectionDAG implementation for non-intrinsic calls
(46e36f0953aabb5e5cd00ed8d296d60f9f71b424) is behind a cl option.
GCC implements -fno-plt for a few ELF targets. In Clang, -fno-plt also
sets the nonlazybind attribute. For SelectionDAG, make the cl option not
affect ELF so that non-intrinsic calls to a dso_preemptable function use
the GOT. Adjust AArch64TargetLowering::LowerCall to handle intrinsic calls.
For FastISel, change `fastLowerCall` to bail out when a call is due to
-fno-plt.
For GlobalISel, handle non-intrinsic calls in CallLowering::lowerCall
and intrinsic calls in AArch64CallLowering::lowerCall (where the
target-independent CallLowering::lowerCall is not called).
The GlobalISel test in `call-rv-marker.ll` is therefore updated.
Note: the current -fno-plt -fpic implementation does not use GOT for a
preemptable function.
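A rough illustration (not from the patch) of the intended behaviour on ELF:

```cpp
// Illustrative only. Compiled with:
//   clang++ --target=aarch64-linux-gnu -fPIC -fno-plt -O2 -c caller.cpp
// -fno-plt makes Clang add the nonlazybind attribute to the declaration, and
// the call to the dso_preemptable function is then lowered through the GOT
// (typically adrp/ldr followed by blr) instead of a direct bl via the PLT.
extern "C" void external_fn();  // hypothetical function defined in a shared library

void caller() {
  external_fn();
}
```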
Link: #78275
Pull Request: https://github.com/llvm/llvm-project/pull/78890
The Ampere1B is Ampere's third-generation core implementing a
superscalar, out-of-order microarchitecture with nested virtualization,
speculative side-channel mitigation and architectural support for
defense against ROP/JOP style software attacks.
Ampere1B is an ARMv8.7+ implementation, adding support for the FEAT_WFxT,
FEAT_CSSC, FEAT_PAN3 and FEAT_AFP extensions. It also includes all
features of the second-generation Ampere1A, such as the Memory Tagging
Extension and SM3/SM4 cryptography instructions.
Add AArch64 implementations for the interfaces of the MachinePipeliner pass.
The pass is disabled by default for AArch64. It is enabled by specifying
--aarch64-enable-pipeliner.
Five tests in llvm-test-suite show a performance improvement of more than 5%
on a Neoverse V1 processor:
| test | improvement |
| --- | ---: |
| MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl.test | 16% |
| MultiSource/Benchmarks/TSVC/Recurrences-flt/Recurrences-flt.test | 16% |
| SingleSource/Benchmarks/Adobe-C++/loop_unroll.test | 14% |
| SingleSource/Benchmarks/Misc/flops-5.test | 13% |
| SingleSource/Benchmarks/BenchmarkGame/spectral-norm.test | 6% |

(base flags: -mcpu=neoverse-v1 -O3 -mrecip; flags for pipelining: -mllvm
-aarch64-enable-pipeliner -mllvm -pipeliner-max-stages=100 -mllvm
-pipeliner-max-mii=100 -mllvm -pipeliner-enable-copytophi=0)
On the other hand, there are cases of significant performance degradation.
Algorithm improvements, and the addition of an option/pragma to control
pipelining, will be needed in the future.
This combines the previously posted patches with some additional work
I've done to more closely match MSVC output.
Most of the important logic here is implemented in
AArch64Arm64ECCallLowering. The purpose of that pass is to take the "normal"
IR we'd generate for other targets and generate most of the Arm64EC-specific
bits:
generating thunks, mangling symbols, generating aliases, and generating
the .hybmp$x table. This is all done late for a few reasons: to
consolidate the logic as much as possible, and to ensure the IR exposed
to optimization passes doesn't contain complex arm64ec-specific
constructs.
The other changes are supporting changes, to handle the new constructs
generated by that pass.
There's a global llvm.arm64ec.symbolmap representing the .hybmp$x
entries for the thunks. This gets handled directly by the AsmPrinter
because it needs symbol indexes that aren't available before that.
There are two new calling conventions used to represent calls to and
from thunks: ARM64EC_Thunk_X64 and ARM64EC_Thunk_Native. There are a few
changes to handle the associated exception-handling info,
SEH_SaveAnyRegQP and SEH_SaveAnyRegQPX.
I've intentionally left out handling for structs with small
non-power-of-two sizes, because that's easily separated out. The rest of
my current work is here. I squashed my current patches because they were
split in ways that didn't really make sense. Maybe I could split out
some bits, but it's hard to meaningfully test most of the parts
independently.
Thanks to @dpaoliello for extensive testing and suggestions.
(Originally posted as https://reviews.llvm.org/D157547 .)
There are some workloads that are negatively impacted by using jump
tables when the number of entries is small. The SPEC2017 perlbench
benchmark is one example of this, where increasing the threshold to
around 13 gives a ~1.5% improvement on neoverse-v1. I chose the minimum
threshold based on empirical evidence rather than science, and just
manually increased the threshold until I got the best performance
without impacting other workloads. For neoverse-v1 I saw around ~0.2%
improvement in the SPEC2017 integer geomean, and no overall change for
neoverse-n1. If we find issues with this threshold later on we can
always revisit this.
The most significant SPEC2017 score changes on neoverse-v1 were:
500.perlbench_r: +1.6%
520.omnetpp_r: +0.6%
and the rest saw changes < 0.5%.
I updated CodeGen/AArch64/min-jump-table.ll to reflect the new
threshold. For most of the affected tests I manually set the min number
of entries back to 4 on the RUN line because the tests seem to rely upon
this behaviour.
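For illustration (not part of the patch), a switch with fewer cases than the new threshold is now lowered as compare/branch sequences rather than a jump table:

```cpp
// Illustrative only: with the minimum jump-table entry count raised to around
// 13, this eight-case switch no longer uses a table load plus indirect branch.
void handle0(); void handle1(); void handle2(); void handle3();
void handle4(); void handle5(); void handle6(); void handle7();

void dispatch(int v) {
  switch (v) {
  case 0: handle0(); break;
  case 1: handle1(); break;
  case 2: handle2(); break;
  case 3: handle3(); break;
  case 4: handle4(); break;
  case 5: handle5(); break;
  case 6: handle6(); break;
  case 7: handle7(); break;
  }
}
```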
When performing a tail call, check the value of the LR register after
authentication to prevent the callee from signing and spilling an
untrusted value. This commit implements a few variants of the check;
more can be added later.
If it is safe to assume that executable pages are always readable,
LR can be checked just by dereferencing the LR value via LDR.
As an alternative, LR can be checked as follows:
; lowered AUT* instruction
; <some variant of check that LR contains a valid address>
b.cond break_block
ret_block:
; lowered TCRETURN
break_block:
brk 0xc471
As the existing methods either break compatibility with execute-only
memory mappings or can degrade performance, they are disabled by
default and can be explicitly enabled with a command line option.
Individual subtargets can opt-in to use one of the available methods
by updating AArch64FrameLowering::getAuthenticatedLRCheckMethod().
Reviewed By: kristof.beyls
Differential Revision: https://reviews.llvm.org/D156716
When a function is compiled to be in Streaming(-compatible) mode, the full
set of SVE instructions may not be available. This patch adds an interface
to query that and changes the codegen for FADDA (not legal in Streaming-SVE
mode) so that it is instead expanded for fixed-length vectors, or otherwise
not code-generated for scalable vectors.
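For context, an illustrative example (not from the patch) of the strictly ordered reduction that FADDA implements when fast-math reassociation is not allowed:

```cpp
// Illustrative only: without fast-math the additions must stay in source
// order; SVE can vectorise this with FADDA, which is not legal in streaming
// mode, so there the fixed-length form is expanded and the scalable form is
// not code-generated.
double ordered_sum(const double *a, int n) {
  double s = 0.0;
  for (int i = 0; i < n; ++i)
    s += a[i];
  return s;
}
```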
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D156109
Before this patch, the only way to generate streaming-compatible code
was to use the `-force-streaming-compatible-sve` flag, but the compiler
should also avoid the use of instructions invalid in streaming mode
when a function has the aarch64_pstate_sm_enabled/compatible attribute.
Reviewed By: paulwalker-arm, david-arm
Differential Revision: https://reviews.llvm.org/D155428
The AArch64Subtarget interface 'isNeonAvailable' is more appropriate going
forward, as we may also want to generate 'streaming SVE' code (not just
'streaming-compatible SVE' code), but here we must still make sure not to
use NEON instructions which are invalid in streaming SVE mode.
This patch enables the tail-folding of simple loops by default
when targeting the neoverse-v1 CPU. Simple loops exclude those with
recurrences, reductions, or reversed iteration.
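An illustrative example (not from the patch) of a loop that counts as simple and is now tail-folded by default on neoverse-v1:

```cpp
// Illustrative only: no reductions, recurrences or reversed iteration, so the
// vectoriser can fold the remainder into a predicated vector body instead of
// emitting a separate scalar tail loop.
void axpy(float *a, const float *b, float c, int n) {
  for (int i = 0; i < n; ++i)
    a[i] += c * b[i];
}
```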
New tests have been added here:
Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll
In terms of SPEC2017 only one benchmark is really affected when
building with "-Ofast -mcpu=neoverse-v1 -flto", which is
(+ faster, - slower):
525.x264: +7.0%
Differential Revision: https://reviews.llvm.org/D130618
Removes the forwarding header `llvm/Support/AArch64TargetParser.h`.
I am proposing to do this for all the forwarding headers left after
rGf09cf34d00625e57dea5317a3ac0412c07292148 - for each header:
- Update all relevant in-tree includes
- Remove the forwarding Header
Differential Revision: https://reviews.llvm.org/D140999
Adds an IR pass for -fsanitize=memtag-globals. This pass goes over the
tag-capable global variables, and replaces them with a tagged global
variable of the same contents. This new global variable will have its
size and alignment adjusted if necessary so that they're both a multiple
of the tag granule size (16 bytes).
Global merge must also be suppressed for tagged globals, as each global
variable must have a unique tag. This can possibly be relaxed in future;
globals that are identical in size, alignment, and content can possibly
be merged. The major problem comes from tail- or head-merging, which if
left unchecked, could have partially-overlapping global variables with
different memory tags, leading to crashes at runtime.
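As a rough sketch (a hypothetical helper, not the pass's actual code), the size adjustment amounts to rounding up to the MTE granule:

```cpp
#include <cstdint>

// Hypothetical illustration of the adjustment the pass performs: the size
// (and alignment) of a tag-capable global is rounded up to the 16-byte tag
// granule so the whole object can be covered by its own tag.
constexpr uint64_t kTagGranule = 16;

constexpr uint64_t roundUpToGranule(uint64_t Bytes) {
  return (Bytes + kTagGranule - 1) & ~(kTagGranule - 1);
}

static_assert(roundUpToGranule(20) == 32, "a 20-byte global is padded to 32 bytes");
```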
Reviewed By: fmayer, eugenis
Differential Revision: https://reviews.llvm.org/D133392
This reverts commit 4edfcff71e150770675a19576f698c7bbe788ee2.
Broke the non-aarch64-containing target builds.
https://reviews.llvm.org/D133392 has more context.
Adds an IR pass for -fsanitize=memtag-globals. This pass goes over the
tag-capable global variables, and replaces them with a tagged global
variable of the same contents. This new global variable will have its
size and alignment adjusted if necessary so that they're both a multiple
of the tag granule size (16 bytes).
Global merge must also be suppressed for tagged globals, as each global
variable must have a unique tag. This can possibly be relaxed in future;
globals that are identical in size, alignment, and content can possibly
be merged. The major problem comes from tail- or head-merging, which if
left unchecked, could have partially-overlapping global variables with
different memory tags, leading to crashes at runtime.
Reviewed By: fmayer, eugenis
Differential Revision: https://reviews.llvm.org/D133392
The Ampere1A core improves on the Ampere1 with key differences being:
* memory tagging is supported
* SM3/SM4 are supported
* adds a new fusion pair for A+B+1 and A-B-1 (added in a later commit)
Depends on D142395
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D142396
This adds a little hasSVEorSME() helper and, as an NFC, updates existing
code to use it. The assertions in get[Min|Max]SVEVectorSizeInBits() are
also now corrected to use hasSVEorSME() rather than just hasSVE().
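A minimal sketch of the helper's shape (illustrative, not the exact in-tree code):

```cpp
// Hypothetical stand-in for the subtarget: hasSVEorSME() merely unifies the
// two existing feature checks so call sites and assertions need only one query.
struct SubtargetSketch {
  bool HasSVE = false;
  bool HasSME = false;
  bool hasSVE() const { return HasSVE; }
  bool hasSME() const { return HasSME; }
  bool hasSVEorSME() const { return hasSVE() || hasSME(); }
};
```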
Differential Revision: https://reviews.llvm.org/D138575
In the AArch64 backend, X30 is named LR and X29 is named FP, so the following
code in AArch64Subtarget::AArch64Subtarget cannot recognize these two
registers:
  for (unsigned i = 0; i < 31; ++i) {
    // getName() yields "LR"/"FP" for X30/X29 rather than "X30"/"X29", so a
    // reserved-register list containing those names never matches here.
    if (ReservedRegNames.count(TRI->getName(AArch64::X0 + i)))
      ReserveXRegisterForRA.set(i);
  }
This patch adds code to explicitly handle these two registers.
Differential Revision: https://reviews.llvm.org/D137810
Arm64EC has two different ways to refer to dllimport'ed functions in an
object file. One is using the usual __imp_ prefix, the other is using an
Arm64EC-specific prefix __imp_aux_. As far as I can tell, if a function
is in an x64 DLL, __imp_aux_ refers to the actual x64 address, while
__imp_ points to some linker-generated code that calls the exit thunk.
So __imp_aux_ is used to refer to the address in non-call contexts,
while __imp_ is used for calls to avoid the indirect call checker.
There's one twist to this, though: if an object refers to a symbol using
the __imp_aux_ prefix, the object file's symbol table must also contain
the symbol with the usual __imp_ prefix. The symbol doesn't actually
have to be used anywhere, it just has to exist; otherwise, the linker's
symbol lookup in x64 import libraries doesn't work correctly. Currently,
this is handled by emitting a .globl __imp_foo directive; we could try
to design some better way to handle this.
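For illustration (hypothetical source, reflecting the description above), the two kinds of reference:

```cpp
// Illustrative only, for an Arm64EC target.
__declspec(dllimport) void foo();

// Non-call use: refers to the address via __imp_aux_foo; the object must then
// also contain the plain __imp_foo symbol for the linker's import lookup.
void (*addr)() = &foo;

// Call: goes through __imp_foo, avoiding the indirect call checker.
void call_it() { foo(); }
```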
One minor quirk I haven't figured out: apparently, in Arm64EC mode, MSVC
prefers to use a linker-synthesized stub to call dllimport'ed functions,
instead of branching directly. The linker stub appears to do the same
thing that inline code would do, so not sure if it's just a code-size
optimization, or if the synthesized stub can actually do something other
than just load from the import table in some circumstances.
Differential Revision: https://reviews.llvm.org/D136202
When the SME attributes tell that a function is or may be executed in Streaming
SVE mode, we currently need to be conservative and disable _any_ vectorization
(fixed or scalable) because the code-generator does not yet support generating
streaming-compatible code.
Scalable auto-vec will be gradually enabled in the future when we have
confidence that the loop-vectorizer won't use any SVE or NEON instructions
that are illegal in Streaming SVE mode.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D135950
Add a compile-time flag for enabling streaming mode.
When streaming mode is enabled, lower basic loads and stores of fixed-width
vectors to generate code that is compatible with streaming mode.
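An illustrative example (not from the patch) of the fixed-width loads and stores this affects:

```cpp
#include <cstdint>

// Illustrative only: a plain 128-bit fixed-width vector copy. When compiled
// for streaming mode, the load and store must be lowered with instructions
// that remain legal in streaming mode rather than the usual NEON lowering.
typedef int32_t v4i32 __attribute__((vector_size(16)));

void copy_vec(v4i32 *dst, const v4i32 *src) {
  *dst = *src;
}
```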
Differential Revision: https://reviews.llvm.org/D133433
They're roughly ARMv8.6. This works in the .td file, but in
AArch64TargetParser.def, marking them v8.6 brings in support for the SM4
cryptographic instructions, which we don't actually have. So on the
TargetParser side they're marked as v8.5, with the extra features (BF16 and
I8MM) added manually.
Finally, A16 supports the HCX extension in addition to v8.6. This has no
TargetParser implications.