llvm-project

Author	SHA1	Message	Date
Harrison Hao	8c14d3f44f	[MISched] Use SchedRegion in overrideSchedPolicy and overridePostRASchedPolicy (#149297 ) This patch updates `overrideSchedPolicy` and `overridePostRASchedPolicy` to take a `SchedRegion` parameter instead of just `NumRegionInstrs`. This provides access to both the instruction range and the parent `MachineBasicBlock`, which enables looking up function-level attributes. With this change, targets can select post-RA scheduling direction per function using a function attribute. For example: ```cpp void overridePostRASchedPolicy(MachineSchedPolicy &Policy, const SchedRegion &Region) const { const Function &F = Region.RegionBegin->getMF()->getFunction(); Attribute Attr = F.getFnAttribute("amdgpu-post-ra-direction"); ... }	2025-07-22 15:55:12 +08:00
Sjoerd Meijer	d26106dbf0	[AArch64] Set the cache line size to 64 for the V2 and V3. (#148213 ) This sets the cache line size to 64 for the Neoverse V2 and V3. I've tested this with loop-interchange: it doesn't result in extra compile-times, but it does enable a lot more interchange.	2025-07-15 14:59:18 +01:00
Peter Collingbourne	2197671109	AArch64: Relax x16/x17 constraint on AUT in certain cases. On most operating systems, the x16 and x17 registers are not special, so there is no benefit, and only a code size cost, to constraining AUT to only using them. Therefore, adjust the backend to only use the AUT pseudo (renamed AUTx16x17 for clarity) on Darwin platforms. All other platforms use an unconstrained variant of the pseudo, AUTxMxN, for selection. Reviewers: ahmedbougacha, kovdan01, atrosinenko Reviewed By: atrosinenko Pull Request: https://github.com/llvm/llvm-project/pull/132857	2025-07-09 13:46:44 -07:00
Guy David	bb1f5c3189	[AArch64] Lower jump table cases threshold to 10 (#143632 ) Previous stabs at this setting (https://github.com/llvm/llvm-project/pull/71166) hypertuned it for SPEC2017, but Clang's own compilation can benefit from a slightly lower threshold, yielding a 0.3% improvement in compile time, while still not regressing SPEC. Most notable beneficiaries of this change are: - `llvm::Instruction::getNumSuccessors` (11 cases) - `llvm::Instruction::getSuccessor` (11 cases) Test Suite with a bootstrapped build: ``` Tests: 4316 Metric: compile_time Program compile_time lhs rhs diff SingleSour...ce/UnitTests/SignlessTypes/div 0.02 0.02 3.0% SingleSour.../UnitTests/SignlessTypes/cast2 0.02 0.02 2.8% SingleSource/Benchmarks/Misc/flops-4 0.02 0.02 1.9% SingleSour...ebra/solvers/cholesky/cholesky 0.05 0.05 1.8% SingleSour...tTests/2020-01-06-coverage-006 0.02 0.02 1.7% SingleSour...ce/Benchmarks/Stanford/FloatMM 0.03 0.03 1.7% SingleSour...9-04-16-BitfieldInitialization 0.02 0.02 1.7% SingleSour...nitTests/2003-07-08-BitOpsTest 0.02 0.02 1.7% MultiSourc...marks/Prolangs-C++/vcirc/vcirc 0.02 0.02 1.6% MultiSourc...Prolangs-C/fixoutput/fixoutput 0.05 0.05 1.5% SingleSour...h/stencils/jacobi-1d/jacobi-1d 0.04 0.04 1.4% MultiSourc...rks/Prolangs-C++/office/office 0.28 0.28 1.4% SingleSour...arks/Adobe-C++/functionobjects 0.39 0.40 1.3% SingleSour...Tests/2003-10-29-ScalarReplBug 0.02 0.02 1.2% SingleSour...arks/Adobe-C++/stepanov_vector 0.41 0.42 1.2% Geomean difference -0.3% compile_time l/r lhs rhs diff count 4316.000000 4316.000000 469.000000 mean 0.057747 0.057595 -0.003034 std 0.544528 0.543139 0.007625 min 0.000000 0.000000 -0.035294 25% 0.000000 0.000000 -0.007006 50% 0.000000 0.000000 -0.003257 75% 0.000000 0.000000 0.000000 max 18.295300 18.252500 0.030151 ```	2025-06-19 01:53:36 +03:00
Ties Stuij	269f5fe91e	[AARCH64] Add support for Cortex-A320 (#139055 ) This patch adds initial support for the recently announced Armv9 Cortex-A320 processor. For more information, including the Technical Reference Manual, see: https://developer.arm.com/Processors/Cortex-A320 --------- Co-authored-by: Oliver Stannard <oliver.stannard@arm.com>	2025-05-09 16:24:48 +01:00
Ricardo Jesus	847e46ca01	[AArch64] Add initial support for -mcpu=olympus. (#132368 ) This patch adds support for the NVIDIA Olympus core. This does not add any special tuning decisions, and those may come later.	2025-03-25 08:09:04 +00:00
Kazu Hirata	41b76119ec	[llvm] Use range constructors for *Set (NFC) (#132636 )	2025-03-23 15:50:34 -07:00
Kazu Hirata	71935281e0	[Target] Use *Set::insert_range (NFC) (#132140 ) DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently gained C++23-style insert_range. This patch replaces: Dest.insert(Src.begin(), Src.end()); with: Dest.insert_range(Src); This patch does not touch custom begin like succ_begin for now.	2025-03-20 09:09:30 -07:00
David Green	6424abcd6c	[AArch64] Enable AvoidLDAPUR for cpu=generic between armv8.4 and armv9.3. (#125261 ) As added in #124274, CPUs in this range can suffer from performance issues with ldapur. As the gain from ldar->ldapr is expected to be greater than the minor gain from ldapr->ldapur, this opts to avoid the instruction under the default -mcpu=generic when the -march is less that armv8.8 / armv9.3. I renamed AArch64Subtarget::Others to AArch64Subtarget::Generic to be clearer what it means.	2025-02-07 10:16:57 +00:00
Benjamin Maxwell	82c6b8f7bb	[AArch64][SME] Spill p-regs as z-regs when streaming hazards are possible (#123752 ) This patch adds a new option `-aarch64-enable-zpr-predicate-spills` (which is disabled by default), this option replaces predicate spills with vector spills in streaming[-compatible] functions. For example: ``` str p8, [sp, #7, mul vl] // 2-byte Folded Spill // ... ldr p8, [sp, #7, mul vl] // 2-byte Folded Reload ``` Becomes: ``` mov z0.b, p8/z, #1 str z0, [sp] // 16-byte Folded Spill // ... ldr z0, [sp] // 16-byte Folded Reload ptrue p4.b cmpne p8.b, p4/z, z0.b, #0 ``` This is done to avoid streaming memory hazards between FPR/vector and predicate spills, which currently occupy the same stack area even when the `-aarch64-stack-hazard-size` flag is set. This is implemented with two new pseudos SPILL_PPR_TO_ZPR_SLOT_PSEUDO and FILL_PPR_FROM_ZPR_SLOT_PSEUDO. The expansion of these pseudos handles scavenging the required registers (z0 in the above example) and, in the worst case spilling a register to an emergency stack slot in the expansion. The condition flags are also preserved around the `cmpne` in case they are live at the expansion point.	2025-02-03 12:04:46 +00:00
Benjamin Maxwell	6e1ea7e5a7	[AArch64] Set the default streaming hazard size to 1024 for +sme,+sve (#123753 ) The default for all other feature combinations remains at zero (i.e. no streaming hazards). This value may be adjusted in the future (e.g. based on the processor family), for now, it is set conservatively.	2025-01-22 09:08:17 +00:00
Sander de Smalen	61510b51c3	Revert "[AArch64] Enable subreg liveness tracking by default." This reverts commit 9c319d5bb40785c969d2af76535ca62448dfafa7. Some issues were discovered with the bootstrap builds, which seem like they were caused by this commit. I'm reverting to investigate.	2024-12-12 17:22:15 +00:00
Sander de Smalen	9c319d5bb4	[AArch64] Enable subreg liveness tracking by default. Internal testing didn't flag up any functional- or performance regressions.	2024-12-12 16:05:49 +00:00
Kinoshita Kotaro	a1197a2ca8	[AArch64] Add initial support for FUJITSU-MONAKA (#118432 ) This patch adds initial support for FUJITSU-MONAKA CPU (-mcpu=fujitsu-monaka). The scheduling model will be corrected in the future.	2024-12-09 09:56:02 +09:00
Sjoerd Meijer	9bccf61f5f	[AArch64][LV] Set MaxInterleaving to 4 for Neoverse V2 and V3 (#100385 ) Set the maximum interleaving factor to 4, aligning with the number of available SIMD pipelines. This increases the number of vector instructions in the vectorised loop body, enhancing performance during its execution. However, for very low iteration counts, the vectorised body might not execute at all, leaving only the epilogue loop to run. This issue affects e.g. cam4_r from SPEC FP, which experienced a performance regression. To address this, the patch reduces the minimum epilogue vectorisation factor from 16 to 8, enabling the epilogue to be vectorised and largely mitigating the regression.	2024-11-20 09:33:39 +00:00
Anatoly Trosinenko	44076c9822	[AArch64][PAC] Move emission of LR checks in tail calls to AsmPrinter (#110705 ) Move the emission of the checks performed on the authenticated LR value during tail calls to AArch64AsmPrinter class, so that different checker sequences can be reused by pseudo instructions expanded there. This adds one more option to AuthCheckMethod enumeration, the generic XPAC variant which is not restricted to checking the LR register.	2024-11-12 18:27:19 +03:00
David Green	1d6d073fbb	[AArch64] Remove FeatureUseScalarIncVL FeatureUseScalarIncVL is a tuning feature, used to control whether addvl or add+cnt is used. It was previously added as a dependency for FeatureSVE2, an architecture feature but this can be seen as a layering violation. The main disadvantage is that -use-scalar-inc-vl cannot be used without disabling sve2 and all dependant features. This patch now replaces that with an option that if unset defaults to hasSVE \|\| hasSME, but is otherwise overriden by the option. The hope is that no cpus will rely on the tuning feature (or we can readdit if needed.	2024-11-10 14:51:55 +00:00
Sander de Smalen	5192cb772a	[AArch64] Add hidden option to enable subreg liveness tracking. Subreg liveness tracking is disabled by default for now until all issues are ironed out. This option allows the feature to be used in tests.	2024-10-30 17:09:56 +00:00
Benjamin Maxwell	ddd463be7e	[AArch64] Add getStreamingHazardSize() to AArch64Subtarget (#113679 ) This is defined by the `-aarch64-streaming-hazard-size` option or its alias `-aarch64-stack-hazard-size` (the original name). It has been renamed to be more general as this option will (for the time being) be used to detect if the current target has streaming mode memory hazards. --------- Co-authored-by: Hari Limaye <hari.limaye@arm.com>	2024-10-28 13:01:22 +00:00
Madhur Amilkanthwar	b73771cf0f	[AArch64] Increase scatter overhead on Neoverse-V2 (#101296 ) This patch increases scatter overhead on Neoverse-V2 to 13. This benefits s128 kernel from TSVC_2 test suite. SPEC 17, RAJAPerf, and Sptter are unaffected by this patch. This patch boosts s128 kernel's performance from TSVC test suite by about 40% as this enables vectorization. Also, handle minor code refactoring for gather related part.	2024-08-14 10:12:40 +05:30
Daniil Kovalev	56fd2472d8	[PAC] Sign LR with B key for non-leaf functions with ptrauth-returns attr (#100552 ) For pauthtest ABI, there is a bunch of ptrauth-* options, including ptrauth-returns. Use "ptrauth-returns" function attribute to indicate need for LR signing with B key for non-leaf function to avoid using "sign-return-address" and "sign-return-address-key" which were originally designed for pac-ret. Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org> Co-authored-by: Anatoly Trosinenko <atrosinenko@accesssoftek.com>	2024-07-25 22:21:03 +03:00
Ahmed Bougacha	b8721fa0af	[AArch64][PAC] Sign block addresses used in indirectbr. (#97647 ) Enabled in clang using: -fptrauth-indirect-gotos and at the IR level using function attribute: "ptrauth-indirect-gotos" Signing uses IA and a per-function integer discriminator. The discriminator isn't ABI-visible, and is currently: ptrauth_string_discriminator("<function_name> blockaddress") A sufficiently sophisticated frontend could benefit from per-indirectbr discrimination, which would need additional machinery, such as allowing "ptrauth" bundles on indirectbr. For our purposes, the simple scheme above is sufficient. This approach doesn't support subtracting label addresses and using the result as offsets, because each label address is signed. Pointer arithmetic on signed pointers corrupts the signature bits, and because label address expressions aren't typed beyond void*, we can't do anything reliably intelligent on the arithmetic exprs. Not signing addresses when used to form offsets would allow easily hijacking control flow by overwriting the offset. This diagnoses the basic cases (`&&lbl2 - &&lbl1`) in the frontend, while we evaluate either alternative implementations (e.g., lowering blockaddress to a bb number, and indirectbr to a checked jump-table), or better diagnostics (both at the frontend level and on unencodable IR constants).	2024-07-22 21:24:39 -07:00
Jon Roelofs	2b33591386	[llvm][AArch64] Support -mcpu=apple-m4 (#95478 )	2024-06-14 17:24:45 -07:00
Jonathan Thackray	e80c59556d	[AArch64] Add support for Cortex-A725 and Cortex-X925 (#95214 ) Cortex-A725 and Cortex-X925 are Armv9.2 AArch64 CPUs. Technical Reference Manual for Cortex-A725: https://developer.arm.com/documentation/107652/latest Technical Reference Manual for Cortex-X925: https://developer.arm.com/documentation/102807/latest	2024-06-13 00:00:57 +01:00
Wei Zhao	6b9753a0ec	[AArch64] Add support for Qualcomm Oryon processor (#91022 ) Oryon is an ARM V8 AArch64 CPU from Qualcomm. --------- Co-authored-by: Wei Zhao <wezhao@qti.qualcomm.com>	2024-06-06 07:27:50 -07:00
Sander de Smalen	1015f51dd9	[AArch64] NFC: Rename -force-streaming-compatible-sve to -force-streaming-compatible (#92774 ) The behaviour of the flag should be equivalent to __arm_streaming_compatible. At the moment, the name suggests that '-force-streaming-compatible-sve' on its own (i.e. without specifying `+sve`) enables the compiler to use the streaming-compatible subset of SVE instructions, but the semantics merely are that the function can be called with either PSTATE.SM=0 or PSTATE.SM=1.	2024-05-22 07:58:54 +01:00
Jonathan Thackray	e50a857fb1	[AArch64] Add support for Cortex-R82AE and improve Cortex-R82 (#90440 )	2024-04-30 14:15:01 +01:00
Jonathan Thackray	a670cdadca	[AArch64] Add support for Neoverse-N3, Neoverse-V3 and Neoverse-V3AE (#90143 ) Neoverse-N3, Neoverse-V3 and Neoverse-V3AE are Armv9.2 AArch64 CPUs. Technical Reference Manual for Neoverse-N3: https://developer.arm.com/documentation/107997/latest/ Technical Reference Manual for Neoverse-V3: https://developer.arm.com/documentation/107734/latest/ Technical Reference Manual for Neoverse-V3AE: https://developer.arm.com/documentation/101595/latest/	2024-04-26 13:04:35 +01:00
Tomas Matheson	71c5964f5c	[ARM][AArch64] autogenerate header file for TargetParser from Target tablegen files (#88378 ) Introduce a mechanism to share data between the ARM and AArch64 backends and TargetParser, to reduce duplication of code. This is similar to the current RISC-V implementation. The target tablegen file (in this case `ARM.td` or `AArch64.td`) is processed during building of `TargetParser` to generate the following files in the build tree: - `build/include/llvm/TargetParser/ARMTargetParserDef.inc` - `build/include/llvm/TargetParser/AArch64TargetParserDef.inc` For now, the use of these generated files is limited to files _outside_ of `TargetParser`. The main reason for this is that the modifications to `TargetParser` will require additional data added to the tablegen files, which I want to split into separate PRs.	2024-04-24 09:18:36 +01:00
David Green	b24af43fdf	[AArch64] Improve scheduling latency into Bundles (#86310 ) By default the scheduling info of instructions into a BUNDLE are given a latency of 0 as they operate on the implicit register of the bundle. This modifies that for AArch64 so that the latency is adjusted to use the latency from the instruction in the bundle instead. This essentially assumes that the bundled instructions are executed in a single cycle, which for AArch64 is probably OK considering they are mostly used for MOVPFX bundles, where this can help create slightly better scheduling especially for in-order cores.	2024-04-12 10:57:01 +01:00
Arthur Eubanks	94c988bcfd	[NFC] Remove unused parameter from shouldAssumeDSOLocal()	2024-03-11 19:48:17 +00:00
Jonathan Thackray	8160139136	Add support for Arm Cortex A78AE CPU (#84485 ) Add support for Arm Cortex A78AE CPU Technical Reference Manual for Arm Cortex A78AE: https://developer.arm.com/documentation/101779/0003 Fixes #84450	2024-03-08 16:11:36 +00:00
Fangrui Song	201572e34b	[AArch64] Implement -fno-plt for SelectionDAG/GlobalISel Clang sets the nonlazybind attribute for certain ObjC features. The AArch64 SelectionDAG implementation for non-intrinsic calls (46e36f0953aabb5e5cd00ed8d296d60f9f71b424) is behind a cl option. GCC implements -fno-plt for a few ELF targets. In Clang, -fno-plt also sets the nonlazybind attribute. For SelectionDAG, make the cl option not affect ELF so that non-intrinsic calls to a dso_preemptable function use GOT. Adjust AArch64TargetLowering::LowerCall to handle intrinsic calls. For FastISel, change `fastLowerCall` to bail out when a call is due to -fno-plt. For GlobalISel, handle non-intrinsic calls in CallLowering::lowerCall and intrinsic calls in AArch64CallLowering::lowerCall (where the target-independent CallLowering::lowerCall is not called). The GlobalISel test in `call-rv-marker.ll` is therefore updated. Note: the current -fno-plt -fpic implementation does not use GOT for a preemptable function. Link: #78275 Pull Request: https://github.com/llvm/llvm-project/pull/78890	2024-03-05 13:55:29 -08:00
Philipp Tomsich	fbba818a78	[AArch64] Add the Ampere1B core (#81297 ) The Ampere1B is Ampere's third-generation core implementing a superscalar, out-of-order microarchitecture with nested virtualization, speculative side-channel mitigation and architectural support for defense against ROP/JOP style software attacks. Ampere1B is an ARMv8.7+ implementation, adding support for the FEAT WFxT, FEAT CSSC, FEAT PAN3 and FEAT AFP extensions. It also includes all features of the second-generation Ampere1A, such as the Memory Tagging Extension and SM3/SM4 cryptography instructions.	2024-02-09 15:22:09 -08:00
Yuta Mukai	70eab122bc	[AArch64][MachinePipeliner] Add pipeliner support for AArch64 (#79589 ) Add AArch64 implementations for the interfaces of MachinePipeliner pass. The pass is disabled by default for AArch64. It is enabled by specifying --aarch64-enable-pipeliner. 5 tests in llvm-test-suites show performance improvement by more than 5% on a Neoverse V1 processor. \| test \| improvement \| \| ---------------------------------------------------------------- \| -----------:\| \| MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl.test \| 16% \| \| MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-flt.test \| 16% \| \| SingleSource/Benchmarks/Adobe-C++/loop_unroll.test \| 14% \| \| SingleSource/Benchmarks/Misc/flops-5.test \| 13% \| \| SingleSource/Benchmarks/BenchmarkGame/spectral-norm.test \| 6% \| (base flags: -mcpu=neoverse-v1 -O3 -mrecip, flags for pipelining: -mllvm -aarch64-enable-pipeliner -mllvm -pipeliner-max-stages=100 -mllvm -pipeliner-max-mii=100 -mllvm -pipeliner-enable-copytophi=0) On the other hand, there are cases of significant performance degradation. Algorithm improvements and adding the option/pragma will be needed in the future.	2024-02-02 10:33:44 +09:00
Eli Friedman	a6065f0fa5	Arm64EC entry/exit thunks, consolidated. (#79067 ) This combines the previously posted patches with some additional work I've done to more closely match MSVC output. Most of the important logic here is implemented in AArch64Arm64ECCallLowering. The purpose of the AArch64Arm64ECCallLowering is to take "normal" IR we'd generate for other targets, and generate most of the Arm64EC-specific bits: generating thunks, mangling symbols, generating aliases, and generating the .hybmp$x table. This is all done late for a few reasons: to consolidate the logic as much as possible, and to ensure the IR exposed to optimization passes doesn't contain complex arm64ec-specific constructs. The other changes are supporting changes, to handle the new constructs generated by that pass. There's a global llvm.arm64ec.symbolmap representing the .hybmp$x entries for the thunks. This gets handled directly by the AsmPrinter because it needs symbol indexes that aren't available before that. There are two new calling conventions used to represent calls to and from thunks: ARM64EC_Thunk_X64 and ARM64EC_Thunk_Native. There are a few changes to handle the associated exception-handling info, SEH_SaveAnyRegQP and SEH_SaveAnyRegQPX. I've intentionally left out handling for structs with small non-power-of-two sizes, because that's easily separated out. The rest of my current work is here. I squashed my current patches because they were split in ways that didn't really make sense. Maybe I could split out some bits, but it's hard to meaningfully test most of the parts independently. Thanks to @dpaoliello for extensive testing and suggestions. (Originally posted as https://reviews.llvm.org/D157547 .)	2024-01-22 21:28:07 -08:00
Tim Northover	10d6d5f224	AArch64: add support for currently released Apple CPUs. (#73499 ) These are still v8.6a and have no real changes as far as LLVM cares, so it's mostly just a copy/paste job.	2023-11-29 09:51:42 +00:00
Matthew Devereau	cdf6693f07	[AArch64][SME] Add support for sme-fa64 (#70809 )	2023-11-20 08:37:52 +00:00
Jonathan Thackray	066c4524bc	[AArch64] Add support for Cortex-A520, Cortex-A720 and Cortex-X4 CPUs (#72395 ) Cortex-A520, Cortex-A720 and Cortex-X4 are Armv9.2 AArch64 CPUs. Technical Reference Manual for Cortex-A520: https://developer.arm.com/documentation/102517/latest/ Technical Reference Manual for Cortex-A720: https://developer.arm.com/documentation/102530/latest/ Technical Reference Manual for Cortex-X4: https://developer.arm.com/documentation/102484/latest/ Patch co-authored by: Sivan Shani <sivan.shani@arm.com>	2023-11-16 22:08:58 +00:00
David Sherwood	bdc0afc871	[CodeGen][AArch64] Set min jump table entries to 13 for AArch64 targets (#71166 ) There are some workloads that are negatively impacted by using jump tables when the number of entries is small. The SPEC2017 perlbench benchmark is one example of this, where increasing the threshold to around 13 gives a ~1.5% improvement on neoverse-v1. I chose the minimum threshold based on empirical evidence rather than science, and just manually increased the threshold until I got the best performance without impacting other workloads. For neoverse-v1 I saw around ~0.2% improvement in the SPEC2017 integer geomean, and no overall change for neoverse-n1. If we find issues with this threshold later on we can always revisit this. The most significant SPEC2017 score changes on neoverse-v1 were: 500.perlbench_r: +1.6% 520.omnetpp_r: +0.6% and the rest saw changes < 0.5%. I updated CodeGen/AArch64/min-jump-table.ll to reflect the new threshold. For most of the affected tests I manually set the min number of entries back to 4 on the RUN line because the tests seem to rely upon this behaviour.	2023-11-14 13:00:28 +00:00
Anatoly Trosinenko	1d2b558265	[AArch64][PAC] Check authenticated LR value during tail call When performing a tail call, check the value of LR register after authentication to prevent the callee from signing and spilling an untrusted value. This commit implements a few variants of check, more can be added later. If it is safe to assume that executable pages are always readable, LR can be checked just by dereferencing the LR value via LDR. As an alternative, LR can be checked as follows: ; lowered AUT* instruction ; <some variant of check that LR contains a valid address> b.cond break_block ret_block: ; lowered TCRETURN break_block: brk 0xc471 As the existing methods either break the compatibility with execute-only memory mappings or can degrade the performance, they are disabled by default and can be explicitly enabled with a command line option. Individual subtargets can opt-in to use one of the available methods by updating AArch64FrameLowering::getAuthenticatedLRCheckMethod(). Reviewed By: kristof.beyls Differential Revision: https://reviews.llvm.org/D156716	2023-10-11 17:38:17 +03:00
Sander de Smalen	7e815dd76d	[AArch64][SME] Create new interface for isSVEAvailable. When a function is compiled to be in Streaming(-compatible) mode, the full set of SVE instructions may not be available. This patch adds an interface to query that and changes the codegen for FADDA (not legal in Streaming-SVE mode) to instead be expanded for fixed-length vectors, or otherwise not to code-generate for scalable vectors. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D156109	2023-09-01 12:00:36 +00:00
Sander de Smalen	08fd44b300	[AArch64] Force streaming-compatible codegen when attributes are set. Before this patch, the only way to generate streaming-compatible code was to use the `-force-streaming-compatible-sve` flag, but the compiler should also avoid the use of instructions invalid in streaming mode when a function has the aarch64_pstate_sm_enabled/compatible attribute. Reviewed By: paulwalker-arm, david-arm Differential Revision: https://reviews.llvm.org/D155428	2023-07-18 10:26:00 +00:00
Sander de Smalen	ec6af93d02	[AArch64] NFC: Replace 'forceStreamingCompatibleSVE' with 'isNeonAvailable'. The AArch64Subtarget interface 'isNeonAvailable' is more appropriate going forward, as we may also want to generate 'streaming SVE' code (not just 'streaming-compatible SVE' code), but here we must still make sure not to use NEON instructions which are invalid in streaming SVE mode.	2023-07-17 08:24:10 +00:00
David Sherwood	c7dbe326df	[AArch64][LoopVectorize] Enable tail-folding of simple loops on neoverse-v1 This patch enables the tail-folding of simple loops by default when targeting the neoverse-v1 CPU. Simple loops exclude those with recurrences or reductions or loops that are reversed. New tests have been added here: Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll In terms of SPEC2017 only one benchmark is really affected when building with "-Ofast -mcpu=neoverse-v1 -flto", which is (+ faster, - slower): 525.x264: +7.0% Differential Revision: https://reviews.llvm.org/D130618	2023-05-18 10:35:57 +00:00
Archibald Elliott	4679d7a26a	[NFC][ARM][AArch64] Cleanup TargetParser includes llvm/TargetParser/TargetParser.h now only includes AMDGPU-specific functionality, the ARM- and AArch64-specific functionality is in other headers.	2023-03-03 16:24:55 +00:00
Archibald Elliott	8e3d7cf5de	[NFC][TargetParser] Remove llvm/Support/TargetParser.h	2023-02-07 11:08:21 +00:00
Archibald Elliott	8c712296fb	[NFC][TargetParser] Remove llvm/Support/AArch64TargetParser.h Removes the forwarding header `llvm/Support/AArch64TargetParser.h`. I am proposing to do this for all the forwarding headers left after rGf09cf34d00625e57dea5317a3ac0412c07292148 - for each header: - Update all relevant in-tree includes - Remove the forwarding Header Differential Revision: https://reviews.llvm.org/D140999	2023-02-03 17:34:01 +00:00
Guillaume Chatelet	d6e0ff6074	[NFC] Migrate aarch64 alignment to Align	2023-02-03 16:29:11 +00:00
Mitch Phillips	486729ce06	Re-land: [MTE] Add AArch64GlobalsTagging Pass Adds an IR pass for -fsanitize=memtag-globals. This pass goes over the tag-capable global variables, and replaces them with a tagged global variable of the same contents. This new global variable will have its size and alignment adjusted if neccesary so that they're both a multiple of the tag granule size (16 bytes). Global merge must also be suppressed for tagged globals, as each global variable must have a unique tag. This can possibly be relaxed in future; globals that are identical in size, alignment, and content can possibly be merged. The major problem comes from tail- or head-merging, which if left unchecked, could have partially-overlapping global variables with different memory tags, leading to crashes at runtime. Reviewed By: fmayer, eugenis Differential Revision: https://reviews.llvm.org/D133392	2023-01-31 13:03:37 -08:00

1 2 3 4 5

243 Commits