llvm-project

Author	SHA1	Message	Date
Benjamin Maxwell	eb764040bc	[AArch64][SME] Implement the SME ABI (ZA state management) in Machine IR (#149062 ) ## Short Summary This patch adds a new pass `aarch64-machine-sme-abi` to handle the ABI for ZA state (e.g., lazy saves and agnostic ZA functions). This is currently not enabled by default (but aims to be by LLVM 22). The goal is for this new pass to more optimally place ZA saves/restores and to work with exception handling. ## Long Description This patch reimplements management of ZA state for functions with private and shared ZA state. Agnostic ZA functions will be handled in a later patch. For now, this is under the flag `-aarch64-new-sme-abi`, however, we intend for this to replace the current SelectionDAG implementation once complete. The approach taken here is to mark instructions as needing ZA to be in a specific ("ACTIVE" or "LOCAL_SAVED"). Machine instructions implicitly defining or using ZA registers (such as $zt0 or $zab0) require the "ACTIVE" state. Function calls may need the "LOCAL_SAVED" or "ACTIVE" state depending on the callee (having shared or private ZA). We already add ZA register uses/definitions to machine instructions, so no extra work is needed to mark these. Calls need to be marked by glueing Arch64ISD::INOUT_ZA_USE or Arch64ISD::REQUIRES_ZA_SAVE to the CALLSEQ_START. These markers are then used by the MachineSMEABIPass to find instructions where there is a transition between required ZA states. These are the points we need to insert code to set up or restore a ZA save (or initialize ZA). To handle control flow between blocks (which may have different ZA state requirements), we bundle the incoming and outgoing edges of blocks. Bundles are formed by assigning each block an incoming and outgoing bundle (initially, all blocks have their own two bundles). Bundles are then combined by joining the outgoing bundle of a block with the incoming bundle of all successors. These bundles are then assigned a ZA state based on the blocks that participate in the bundle. Blocks whose incoming edges are in a bundle "vote" for a ZA state that matches the state required at the first instruction in the block, and likewise, blocks whose outgoing edges are in a bundle vote for the ZA state that matches the last instruction in the block. The ZA state with the most votes is used, which aims to minimize the number of state transitions.	2025-08-19 10:00:28 +01:00
Ellis Hoag	0d1392e979	[MachineOutliner] Remove LOHs from outlined candidates (#143617 ) Remove Linker Optimization Hints (LOHs) from outlining candidates instead of simply preventing outlining if LOH labels are found in the candidate. This will improve the effectiveness of the machine outliner when LOHs are enabled (which is the default). In https://discourse.llvm.org/t/loh-conflicting-with-machineoutliner/83279/1 it was observed that the machine outliner is much more effective when LOHs are disabled. Rather than completely disabling LOH, this PR aims to keep LOH in most places and removing them from outlined functions where it could be illegal. Note that we are conservatively removing all LOHs from outlined functions for simplicity, but I believe we could retain LOHs that are in the intersection of all candidates. It should be ok to remove these LOHs since these blocks are being outlined anyway, which will harm performance much more than the gain from keeping the LOHs.	2025-06-30 14:29:06 -07:00
Benjamin Maxwell	f37d944152	[AArch64][SME] Use reportFatalUsageError rather than assert (NFC) (#145491 ) Fixes #144351	2025-06-25 00:08:36 +01:00
Andrew Rogers	19658d1474	[llvm] annotate interfaces in llvm/Target for DLL export (#143615 ) ## Purpose This patch is one in a series of code-mods that annotate LLVM’s public interface for export. This patch annotates the `llvm/Target` library. These annotations currently have no meaningful impact on the LLVM build; however, they are a prerequisite to support an LLVM Windows DLL (shared library) build. ## Background This effort is tracked in #109483. Additional context is provided in [this discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307), and documentation for `LLVM_ABI` and related annotations is found in the LLVM repo [here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst). A sub-set of these changes were generated automatically using the [Interface Definition Scanner (IDS)](https://github.com/compnerd/ids) tool, followed formatting with `git clang-format`. The bulk of this change is manual additions of `LLVM_ABI` to `LLVMInitializeX` functions defined in .cpp files under llvm/lib/Target. Adding `LLVM_ABI` to the function implementation is required here because they do not `#include "llvm/Support/TargetSelect.h"`, which contains the declarations for this functions and was already updated with `LLVM_ABI` in a previous patch. I considered patching these files with `#include "llvm/Support/TargetSelect.h"` instead, but since TargetSelect.h is a large file with a bunch of preprocessor x-macro stuff in it I was concerned it would unnecessarily impact compile times. In addition, a number of unit tests under llvm/unittests/Target required additional dependencies to make them build correctly against the LLVM DLL on Windows using MSVC. ## Validation Local builds and tests to validate cross-platform compatibility. This included llvm, clang, and lldb on the following configurations: - Windows with MSVC - Windows with Clang - Linux with GCC - Linux with Clang - Darwin with Clang	2025-06-17 13:28:45 -07:00
Pengcheng Wang	f393986b53	[MISched] Add templates for creating custom schedulers (#141935 ) We rename `createGenericSchedLive` and `createGenericSchedPostRA` to `createSchedLive` and `createSchedPostRA`, and add a template parameter `Strategy` which is the generic implementation by default. This can simplify some code for targets that have custom scheduler strategy.	2025-06-03 11:37:40 +08:00
Matthias Braun	675cb70641	Register assembly printer passes (#138348 ) Register assembly printer passes in the pass registry. This makes it possible to use `llc -start-before=<target>-asm-printer ...` in tests. Adds a `char &ID` parameter to the AssemblyPrinter constructor to allow targets to use the `INITIALIZE_PASS` macros and register the pass in the pass registry. This currently has a default parameter so it won't break any targets that have not been updated.	2025-05-06 18:01:17 -07:00
Sergei Barannikov	bb1765179e	[TTI] Simplify implementation (NFCI) (#136674 ) Replace "concept based polymorphism" with simpler PImpl idiom. This pursues two goals: * Enforce static type checking. Previously, target implementations hid base class methods and type checking was impossible. Now that they override the methods, the compiler will complain on mismatched signatures. * Make the code easier to navigate. Previously, if you asked your favorite LSP server to show a method (e.g. `getInstructionCost()`), it would show you methods from `TTI`, `TTI::Concept`, `TTI::Model`, `TTIImplBase`, and target overrides. Now it is two less :) There are three commits to hopefully simplify the review. The first commit removes `TTI::Model`. This is done by deriving `TargetTransformInfoImplBase` from `TTI::Concept`. This is possible because they implement the same set of interfaces with identical signatures. The first commit makes `TargetTransformImplBase` polymorphic, which means all derived classes should `override` its methods. This is done in second commit to make the first one smaller. It appeared infeasible to extract this into a separate PR because the first commit landed separately would result in tons of `-Woverloaded-virtual` warnings (and break `-Werror` builds). The third commit eliminates `TTI::Concept` by merging it with the only derived class `TargetTransformImplBase`. This commit could be extracted into a separate PR, but it touches the same lines in `TargetTransformInfoImpl.h` (removes `override` added by the second commit and adds `virtual`), so I thought it may make sense to land these two commits together. Pull Request: https://github.com/llvm/llvm-project/pull/136674	2025-04-26 15:25:40 +03:00
Rahul Joshi	23c27f3efc	[NFC][LLVM][AArch64] Cleanup pass initialization for AArch64 (#134315 ) - Remove calls to pass initialization from pass constructors. - https://github.com/llvm/llvm-project/issues/111767	2025-04-07 10:24:27 -07:00
Shubham Sandeep Rastogi	cc86d7cb19	Initialize aarch64-cond-br-tuning pass (#132087 ) The call to the initializeAArch64CondBrTuningPass function is missing in the AArch64TargetMachine LLVMInitializeAArch64Target function. This means that the pass is not in the pass registry and options such as -run-pass=aarch64-cond-br-tuning and -stop-after=aarch64-cond-br-tuning cannot be used. This patch fixes that issue.	2025-03-20 14:48:53 -07:00
Daniel Paoliello	16e051f0b9	[win] NFC: Rename `EHCatchret` to `EHCont` to allow for EH Continuation targets that aren't `catchret` instructions (#129953 ) This change splits out the renaming and comment updates from #129612 as a non-functional change.	2025-03-06 09:28:44 -08:00
Akshat Oke	e3ece07593	Revert "Reland "[AArch64][NPM] Chalk out the CodeGenPassBuilder for NPM (#128…" (#128819 ) Reverts llvm/llvm-project#128662 Still a link error.	2025-02-26 10:54:39 +05:30
Akshat Oke	e927cf6653	Reland "[AArch64][NPM] Chalk out the CodeGenPassBuilder for NPM (#128… (#128662 ) …471)" Reland https://github.com/llvm/llvm-project/pull/128471 The Passes library was not linked in earlier.	2025-02-26 10:48:29 +05:30
Kazu Hirata	83ddb43cad	Revert "[AArch64][NPM] Chalk out the CodeGenPassBuilder for NPM (#128471 )" This reverts commit d85685eb863641dce62a9f858ebcd6bab56c605b. Multiple buildbot failures have been reported: https://github.com/llvm/llvm-project/pull/128471	2025-02-24 22:34:52 -08:00
Akshat Oke	d85685eb86	[AArch64][NPM] Chalk out the CodeGenPassBuilder for NPM (#128471 ) This allows for testing AArch64 passes with the new pass manager.	2025-02-25 11:33:57 +05:30
Christudasan Devadasan	a47c35a699	[CodeGen] Move MISched target hooks into TargetMachine (#125700 ) The createSIMachineScheduler & createPostMachineScheduler target hooks are currently placed in the PassConfig interface. Moving it out to TargetMachine so that both legacy and the new pass manager can effectively use them.	2025-02-05 11:41:37 +05:30
Florian Mayer	514580b438	[MTE] Apply alignment / size in AsmPrinter rather than IR (#111918 ) This makes sure no optimizations are applied that assume the bigger alignment or size, which could be incorrect if we link together with non-instrumented code.	2024-12-17 00:47:02 -08:00
Akshat Oke	3f9d02aae8	[CodeGen][NewPM] Port PeepholeOptimizer to NPM (#116326 ) With this, all machine SSA optimization passes are available in the new codegen pipeline.	2024-11-18 11:02:01 +05:30
Matin Raayai	bb3f5e1fed	Overhaul the TargetMachine and LLVMTargetMachine Classes (#111234 ) Following discussions in #110443, and the following earlier discussions in https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html, https://reviews.llvm.org/D38482, https://reviews.llvm.org/D38489, this PR attempts to overhaul the `TargetMachine` and `LLVMTargetMachine` interface classes. More specifically: 1. Makes `TargetMachine` the only class implemented under `TargetMachine.h` in the `Target` library. 2. `TargetMachine` contains target-specific interface functions that relate to IR/CodeGen/MC constructs, whereas before (at least on paper) it was supposed to have only IR/MC constructs. Any Target that doesn't want to use the independent code generator simply does not implement them, and returns either `false` or `nullptr`. 3. Renames `LLVMTargetMachine` to `CodeGenCommonTMImpl`. This renaming aims to make the purpose of `LLVMTargetMachine` clearer. Its interface was moved under the CodeGen library, to further emphasis its usage in Targets that use CodeGen directly. 4. Makes `TargetMachine` the only interface used across LLVM and its projects. With these changes, `CodeGenCommonTMImpl` is simply a set of shared function implementations of `TargetMachine`, and CodeGen users don't need to static cast to `LLVMTargetMachine` every time they need a CodeGen-specific feature of the `TargetMachine`. 5. More importantly, does not change any requirements regarding library linking. cc @arsenm @aeubanks	2024-11-14 13:30:05 -08:00
Daniel Kiss	3f40ad7ba8	Add ifunc support for Windows on AArch64. (#111962 ) On Windows there is no platform support for ifunc but we could lower them to global function pointers. This also enables FMV for Windows with Clang and Compiler-rt. Depends on #111961	2024-11-14 14:45:15 +01:00
Kazu Hirata	a41922ad75	[AArch64] Remove unused includes (NFC) (#115685 ) Identified with misc-include-cleaner.	2024-11-11 07:35:08 -08:00
weiwei chen	c8a7f14b27	[Backend] Add clearSubtargetMap API for TargetMachine. (#112383 ) - [x] Add `clearSubtargetInfo` API to TargetMachine and each backend to make it possible to release memory used in each backend's `SubtargetInfo` map if needed. Keep this API as `protected` so that it will be used with precautions.	2024-11-07 10:05:19 -05:00
Christudasan Devadasan	488d3924dd	[CodeGen][NewPM] Port EarlyIfConversion pass to NPM. (#108508 )	2024-10-16 13:22:57 +05:30
Daniel Paoliello	c9f27275c1	[clang][aarch64] Add support for the MSVC qualifiers __ptr32, __ptr64, __sptr, __uptr for AArch64 (#111879 ) MSVC has a set of qualifiers to allow using 32-bit signed/unsigned pointers when building 64-bit targets. This is useful for WoW code (i.e., the part of Windows that handles running 32-bit application on a 64-bit OS). Currently this is supported on x64 using the 270, 271 and 272 address spaces, but does not work for AArch64 at all. This change adds the same 270, 271 and 272 address spaces to AArch64 and adjusts the data layout string accordingly. Clang will generate the correct address space casts, but these will currently be ignored until the AArch64 backend is updated to handle them. Partially fixes #62536 This is a resurrected version of <https://reviews.llvm.org/D158857> (originally created by @a_vorobev) - I've cleaned it up a little, fixed the rest of the tests and added to auto-upgrade for the data layout.	2024-10-15 10:37:36 -07:00
Paul Walker	665457815f	[LLVM][AArch64] Enable SVEIntrinsicOpts at all optimisation levels.	2024-10-09 14:17:39 +00:00
Paul Walker	1b3fc75451	Revert "[LLVM][AArch64] Enable SVEIntrinsicOpts at all optimisation levels." This reverts commit 886d98e149843f3890ef4dd556a5dee45ff97fe9.	2024-10-09 10:44:00 +00:00
Paul Walker	886d98e149	[LLVM][AArch64] Enable SVEIntrinsicOpts at all optimisation levels.	2024-10-09 10:22:52 +00:00
rjmansfield	0717898124	Fix cl::desc typos in aarch64-enable-dead-defs and arm-implicit-it. (#106712 )	2024-08-30 19:15:05 +01:00
Sander de Smalen	6c189eaea9	[AArch64] Add SME peephole optimizer pass (#104612 ) This pass removes back-to-back smstart/smstop instructions to reduce the number of streaming mode changes in a function. The implementation as proposed doesn't aim to solve all problems yet and suggests a number of cases that can be optimized in the future.	2024-08-21 09:44:01 +01:00
Anatoly Trosinenko	cc53b953ac	[AArch64] When hardening against SLS, only create called thunks (#97472 ) In preparation for implementing hardening of BLRA* instructions, restrict thunk function generation to only the thunks being actually called from any function. As described in the existing comments, emitting all possible thunks for BLRAA and BLRAB instructions would mean adding about 1800 functions in total, most of which are likely not to be called. This commit merges AArch64SLSHardening class into SLSBLRThunkInserter, so thunks can be created as needed while rewriting a machine function. The usages of TII, TRI and ST fields of AArch64SLSHardening class are replaced with requesting them in-place, as ThunkInserter assumes multiple "entry points" in contrast to the only runOnMachineFunction method of AArch64SLSHardening. The runOnMachineFunction method essentially replaces pre-existing insertThunks implementation as there is no more need to insert all possible thunks unconditionally. Instead, thunks are created on first use from inside of insertThunks method.	2024-07-05 13:12:09 +03:00
Nikita Popov	5cd0ba30f5	Reapply [IR] Lazily initialize the class to pass name mapping (NFC) (#96321 ) (#96462 ) On MSVC the `this` uses inside `decltype` require a lambda capture. On clang they result in an unused capture warning instead. Add the capture and suppress the warning with `(void)this`. ----- Initializing this map is somewhat expensive (especially for O0), so we currently only do it if certain flags are used. I would like to make use of it for crash dumps (#96078), where we don't know in advance whether it will be needed or not. This patch changes the initialization to a lazy approach, where a callback is registered that does the actual initialization. The callbacks will be run the first time the pass name is requested. This way there is no compile-time impact if the mapping is not used.	2024-06-24 15:00:11 +02:00
Nikita Popov	e5a41f0afc	Revert "[IR] Lazily initialize the class to pass name mapping (NFC) (#96321 )" My attempt to fix the Windows build made things worse, revert entirely for now. This reverts commit e7137f2fed5cfee822ae3c4c6d39188adb59a16c. This reverts commit 6eaf204dbb0a6a81cddfd02f625c130f7bb1aae5. This reverts commit 957dc4366dd2ce9d5d2991c3ad76bbf438e9954e.	2024-06-24 10:32:03 +02:00
Nikita Popov	957dc4366d	[IR] Lazily initialize the class to pass name mapping (NFC) (#96321 ) Initializing this map is somewhat expensive (especially for O0), so we currently only do it if certain flags are used. I would like to make use of it for crash dumps (#96078), where we don't know in advance whether it will be needed or not. This patch changes the initialization to a lazy approach, where a callback is registered that does the actual initialization. The callbacks will be run the first time the pass name is requested. This way there is no compile-time impact if the mapping is not used.	2024-06-24 09:40:09 +02:00
Sander de Smalen	8e0cd7382a	[AArch64] Consider runtime mode when deciding to use SVE for fixed-length vectors. (#96081 ) This also fixes the case where an SVE div is incorrectly to be assumed available in non-streaming mode with SME.	2024-06-20 17:08:14 +01:00
Min-Yih Hsu	37e309f163	[AArch64][LoopIdiom] Generalize AArch64LoopIdiomTransform into LoopIdiomVectorize (#94081 ) To facilitate sharing LoopIdiomTransform between AArch64 and RISC-V, this first patch moves AArch64LoopIdiomTransform from lib/Target/AArch64 to lib/Transforms/Vectorize and renames it to LoopIdiomVectorize. The following patch (#94082) will teach LoopIdiomVectorize how to generate VP intrinsics (in addition to the current masked vector style) in favor of RVV.	2024-06-07 14:06:11 -07:00
paperchalice	7652a59407	Reland "[NewPM][CodeGen] Port selection dag isel to new pass manager" (#94149 ) - Fix build with `EXPENSIVE_CHECKS` - Remove unused `PassName::ID` to resolve warning - Mark `~SelectionDAGISel` virtual so AArch64 backend can work properly	2024-06-04 08:10:58 +08:00
paperchalice	8917afaf0e	Revert "[NewPM][CodeGen] Port selection dag isel to new pass manager" (#94146 ) This reverts commit de37c06f01772e02465ccc9f538894c76d89a7a1 to de37c06f01772e02465ccc9f538894c76d89a7a1 It still breaks EXPENSIVE_CHECKS build. Sorry.	2024-06-02 14:31:52 +08:00
paperchalice	d2cdc8ab45	[NewPM][CodeGen] Port selection dag isel to new pass manager (#83567 ) Port selection dag isel to new pass manager. Only `AMDGPU` and `X86` support new pass version. `-verify-machineinstrs` in new pass manager belongs to verify instrumentation, it is enabled by default.	2024-06-02 09:12:33 +08:00
Sander de Smalen	1015f51dd9	[AArch64] NFC: Rename -force-streaming-compatible-sve to -force-streaming-compatible (#92774 ) The behaviour of the flag should be equivalent to __arm_streaming_compatible. At the moment, the name suggests that '-force-streaming-compatible-sve' on its own (i.e. without specifying `+sve`) enables the compiler to use the streaming-compatible subset of SVE instructions, but the semantics merely are that the function can be called with either PSTATE.SM=0 or PSTATE.SM=1.	2024-05-22 07:58:54 +01:00
Doug Wyatt	ddecadabeb	[clang backend] In AArch64's DataLayout, specify a minimum function alignment of 4. (#90702 ) This addresses an issue where the explicit alignment of 2 (for C++ ABI reasons) was being propagated to the back end and causing under-aligned functions (in special sections). This is an alternate approach suggested by @efriedma-quic in PR #90415. Fixes #90358	2024-05-05 19:05:15 -07:00
Sander de Smalen	c3d58867bd	[AArch64][SME] Create new pass to remove COALESCER_BARRIER early. (#85386 ) The purpose of the COALESCER_BARRIER pseudo node is to prevent the register coalescer from coalescing certain COPY instructions around smstart/smstop instructions, so that we spill only the (required) FPR register rather than the encompassing ZPR register. The pseudos are removed in the AArch64ExpandPseudo pass. However, because the node itself is a _use_ of a register, this occassionally leads to redundant spills/fills, because the register allocator thinks the virtual register is actually used before an smstart/smstop instruction, causing it to be filled, at which points it requires immediate spilling again to ensure it stays live over the smstart/smstop instruction. We can avoid that by removing the pseudo nodes right after coalescing, but before register allocation.	2024-04-15 15:07:20 +01:00
Nathan Lanza	da385e8251	[aarch64] Unguard GEPOpt from O3 This chunk of code currently runs only if the optimization mode is O3 AND the EnableGEPOpt flag is set. Given that this is the only use case for the EnableGEPOpt flag, the guarding against O3 is kinda pointless. IF the user wants to enable it then the flag should be sufficient. Reviewers: TNorthover, aeubanks Reviewed By: aeubanks Pull Request: https://github.com/llvm/llvm-project/pull/86588	2024-03-25 18:08:36 -04:00
paperchalice	29bf32efbb	[NewPM][AArch64] Add AArch64PassRegistry.def (#85215 ) PR #83567 ports `SelectionDAGISel` to the new pass manager, then each backend should provide `<Target>DagToDagISel()` in new pass manager style. Then each target should provide `<Target>PassRegistry.def` to register backend passes in `registerPassBuilderCallbacks` to reduce duplicate code. This PR adds `AArch64PassRegistry.def` to AArch64 backend and boilerplate code in `registerPassBuilderCallbacks`.	2024-03-21 10:57:51 +08:00
ostannard	503c55e170	[AArch64] Move SLS later in pass pipeline (#84210 ) Currently, the SLS hardening pass is run before the machine outliner, which means that the outliner creates new functions and calls which do not have the SLS hardening applied. The fix for this is to move the SLS passes to after the outliner, as has recently been done for the return address signing pass. This also avoids a bug where the SLS outliner emits code with instructions after a return, which the outliner doesn't correctly handle.	2024-03-07 09:28:49 +00:00
Rishabh Bali	fe42e72db2	[CodeGen] Port AtomicExpand to new Pass Manager (#71220 ) Port the `atomicexpand` pass to the new Pass Manager. Fixes #64559	2024-02-25 18:42:22 +05:30
Yuta Mukai	70eab122bc	[AArch64][MachinePipeliner] Add pipeliner support for AArch64 (#79589 ) Add AArch64 implementations for the interfaces of MachinePipeliner pass. The pass is disabled by default for AArch64. It is enabled by specifying --aarch64-enable-pipeliner. 5 tests in llvm-test-suites show performance improvement by more than 5% on a Neoverse V1 processor. \| test \| improvement \| \| ---------------------------------------------------------------- \| -----------:\| \| MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl.test \| 16% \| \| MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-flt.test \| 16% \| \| SingleSource/Benchmarks/Adobe-C++/loop_unroll.test \| 14% \| \| SingleSource/Benchmarks/Misc/flops-5.test \| 13% \| \| SingleSource/Benchmarks/BenchmarkGame/spectral-norm.test \| 6% \| (base flags: -mcpu=neoverse-v1 -O3 -mrecip, flags for pipelining: -mllvm -aarch64-enable-pipeliner -mllvm -pipeliner-max-stages=100 -mllvm -pipeliner-max-mii=100 -mllvm -pipeliner-enable-copytophi=0) On the other hand, there are cases of significant performance degradation. Algorithm improvements and adding the option/pragma will be needed in the future.	2024-02-02 10:33:44 +09:00
Eli Friedman	a6065f0fa5	Arm64EC entry/exit thunks, consolidated. (#79067 ) This combines the previously posted patches with some additional work I've done to more closely match MSVC output. Most of the important logic here is implemented in AArch64Arm64ECCallLowering. The purpose of the AArch64Arm64ECCallLowering is to take "normal" IR we'd generate for other targets, and generate most of the Arm64EC-specific bits: generating thunks, mangling symbols, generating aliases, and generating the .hybmp$x table. This is all done late for a few reasons: to consolidate the logic as much as possible, and to ensure the IR exposed to optimization passes doesn't contain complex arm64ec-specific constructs. The other changes are supporting changes, to handle the new constructs generated by that pass. There's a global llvm.arm64ec.symbolmap representing the .hybmp$x entries for the thunks. This gets handled directly by the AsmPrinter because it needs symbol indexes that aren't available before that. There are two new calling conventions used to represent calls to and from thunks: ARM64EC_Thunk_X64 and ARM64EC_Thunk_Native. There are a few changes to handle the associated exception-handling info, SEH_SaveAnyRegQP and SEH_SaveAnyRegQPX. I've intentionally left out handling for structs with small non-power-of-two sizes, because that's easily separated out. The rest of my current work is here. I squashed my current patches because they were split in ways that didn't really make sense. Maybe I could split out some bits, but it's hard to meaningfully test most of the parts independently. Thanks to @dpaoliello for extensive testing and suggestions. (Originally posted as https://reviews.llvm.org/D157547 .)	2024-01-22 21:28:07 -08:00
David Sherwood	c7148467fc	[AArch64] Add an AArch64 pass for loop idiom transformations (#72273 ) We have added a new pass that looks for loops such as the following: ``` while (i != max_len) if (a[i] != b[i]) break; ... use index i ... ``` Although similar to a memcmp, this is slightly different because instead of returning the difference between the values of the first non-matching pair of bytes, it returns the index of the first mismatch. As such, we are not able to lower this to a memcmp call. The new pass can now spot such idioms and transform them into a specialised predicated loop that gives a significant performance improvement for AArch64. It is intended as a stop-gap solution until this can be handled by the vectoriser, which doesn't currently deal with early exits. This specialised loop makes use of a generic intrinsic that counts the trailing zero elements in a predicate vector. This was added in https://reviews.llvm.org/D159283 and for SVE we end up with brkb & incp instructions. Although we have added this pass only for AArch64, it was written in a generic way so that in theory it could be used by other targets. Currently the pass requires scalable vector support and needs to know the minimum page size for the target, however it's possible to make it work for fixed-width vectors too. Also, the llvm.experimental.cttz.elts intrinsic used by the pass has generic lowering, but can be made efficient for targets with instructions similar to SVE's brkb, cntp and incp. Original version of patch was posted on Phabricator: https://reviews.llvm.org/D158291 Patch co-authored by Kerry McLaughlin (@kmclaughlin-arm) and David Sherwood (@david-arm) See the original discussion on Discourse: https://discourse.llvm.org/t/aarch64-target-specific-loop-idiom-recognition/72383	2024-01-09 11:29:28 +00:00
Momchil Velikov	ac06d4e4cb	Re-commit "[MachineSink][AArch64] Enable sink-and-fold by default (#72132 )" This re-commits 13fe0386454d after fixing a couple of issues in the LLDB testsuite in ef9bcace834e and 6b87d84ff45d	2023-11-27 11:28:22 +00:00
Momchil Velikov	4ac5b0da8d	Revert "[MachineSink][AArch64] Enable sink-and-fold by default (#72132 )" This reverts commit 13fe0386454d2f4c9bad4e20fc59699d1a49b8cf. May have broken an LLDB test https://lab.llvm.org/buildbot/#/builders/96/builds/48609	2023-11-16 17:07:39 +00:00
Momchil Velikov	13fe038645	[MachineSink][AArch64] Enable sink-and-fold by default (#72132 ) Enable the optimisation by default for AArch64 after a compile time regressoin fix in e8209b2486d8	2023-11-16 12:12:56 +00:00

1 2 3 4 5 ...

362 Commits