llvm-project

Author	SHA1	Message	Date
Momchil Velikov	ac06d4e4cb	Re-commit "[MachineSink][AArch64] Enable sink-and-fold by default (#72132 )" This re-commits 13fe0386454d after fixing a couple of issues in the LLDB testsuite in ef9bcace834e and 6b87d84ff45d	2023-11-27 11:28:22 +00:00
Momchil Velikov	4ac5b0da8d	Revert "[MachineSink][AArch64] Enable sink-and-fold by default (#72132 )" This reverts commit 13fe0386454d2f4c9bad4e20fc59699d1a49b8cf. May have broken an LLDB test https://lab.llvm.org/buildbot/#/builders/96/builds/48609	2023-11-16 17:07:39 +00:00
Momchil Velikov	13fe038645	[MachineSink][AArch64] Enable sink-and-fold by default (#72132 ) Enable the optimisation by default for AArch64 after a compile time regression fix in e8209b2486d8	2023-11-16 12:12:56 +00:00
David Sherwood	bdc0afc871	[CodeGen][AArch64] Set min jump table entries to 13 for AArch64 targets (#71166 ) There are some workloads that are negatively impacted by using jump tables when the number of entries is small. The SPEC2017 perlbench benchmark is one example of this, where increasing the threshold to around 13 gives a ~1.5% improvement on neoverse-v1. I chose the minimum threshold based on empirical evidence rather than science, and just manually increased the threshold until I got the best performance without impacting other workloads. For neoverse-v1 I saw around ~0.2% improvement in the SPEC2017 integer geomean, and no overall change for neoverse-n1. If we find issues with this threshold later on we can always revisit this. The most significant SPEC2017 score changes on neoverse-v1 were: 500.perlbench_r: +1.6% 520.omnetpp_r: +0.6% and the rest saw changes < 0.5%. I updated CodeGen/AArch64/min-jump-table.ll to reflect the new threshold. For most of the affected tests I manually set the min number of entries back to 4 on the RUN line because the tests seem to rely upon this behaviour.	2023-11-14 13:00:28 +00:00
Oliver Stannard	339faffd05	Revert "[AArch64] Move SLS later in pass pipeline" The (MF.size() == 0) assertion is triggering when building at -O0. Reverting this while I work out what is going wrong. This reverts commit 7e8eccd990d37d2771ca5ad7a84f54c3cfc4a5e1.	2023-10-26 09:50:13 +01:00
Momchil Velikov	9d35387811	[AArch64] Disable by default MachineSink sink-and-fold (#70101 ) There is a report about a large compile time regression in V8 when generating debug info.	2023-10-25 10:58:31 +01:00
Oliver Stannard	7e8eccd990	[AArch64] Move SLS later in pass pipeline Currently, the SLS hardening pass is run before the machine outliner, which means that the outliner creates new functions and calls which do not have the SLS hardening applied. The fix for this is to move the SLS passes to after the outliner, as has recently been done for the return address signing pass. This also avoids a bug where the SLS outliner emits code with instructions after a return, which the outliner doesn't correctly handle. Reviewed By: kristof.beyls Differential Revision: https://reviews.llvm.org/D158511	2023-10-25 10:45:12 +01:00
Momchil Velikov	d15fff6c69	Re-apply '[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432 )' This reverts revert 19505072123e43eccf528b660973067b5c9b4a26. An issue was fixed in bea3684944c0d7962cd53ab77aad756cfee76b7c and some newly appeared tests updated.	2023-10-19 13:18:25 +01:00
Amara Emerson	1950507212	Revert "Re-apply '[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432 )'" This reverts commit dbb9faedec5e28ab3f584f5e14d31e475ac268ac. This seems to cause miscompiles on CTMark/sqlite3 and others with GISel.	2023-10-15 14:16:37 -07:00
Momchil Velikov	dbb9faedec	Re-apply '[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432 )' This re-applies commit a9d0ab2ee572f179f80483f3ebbbcdd03c3b4481, which was reverted by 8abb2ace888bdd04a1bdb4ac2f2fc25d57a5760a. The issue was fixed by 7510f32f906ab4e583542eae2611b020f88629af	2023-10-13 12:14:22 +01:00
Caroline Tice	8abb2ace88	Revert "Re-apply "[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432 )"" This reverts commit a9d0ab2ee572f179f80483f3ebbbcdd03c3b4481. That commit is causing clang crashes.	2023-10-06 20:51:48 -07:00
Momchil Velikov	a9d0ab2ee5	Re-apply "[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432 )" This re-applies commit ace20e24287b, which was reverted in eff4ef25b3dc. The issues were fixed in: * b30765caf874 [AArch64] Fix an incorrect handling of debug values in MachineSink (#68107) * b454b04d6869 [AArch64] Fix a compiler crash in MachineSink (#67705)	2023-10-06 09:34:42 +01:00
Momchil Velikov	eff4ef25b3	Revert "[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432 )" This reverts commit ace20e24287bf531bb1185e213642c3b49eb293c. This might be causing a buildbot failure at https://green.lab.llvm.org/green/job/clang-stage1-RA/35786/	2023-09-27 14:24:59 +01:00
Momchil Velikov	ace20e2428	[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432 )	2023-09-27 10:05:32 +01:00
Momchil Velikov	c649fd34e9	[MachineSink][AArch64] Sink instruction copies when they can replace copy into hard register or folded into addressing mode This patch adds a new code transformation to the `MachineSink` pass, that tries to sink copies of an instruction, when the copies can be folded into the addressing modes of load/store instructions, or replace another instruction (currently, copies into a hard register). The criteria for performing the transformation are that: * the register pressure at the sink destination block must not exceed the register pressure limits * the latency and throughput of the load/store or the copy must not deteriorate * the original instruction must be deleted Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D152828	2023-09-25 10:49:44 +01:00
Anatoly Trosinenko	eb02ee44d3	[AArch64] Move PAuth codegen down the machine pipeline To simplify handling PAuth in the machine outliner, introduce a separate AArch64PointerAuth pass that is executed after both Prologue/Epilogue Inserter and Machine Outliner passes. After moving to AArch64PointerAuth, signLR and authenticateLR are not used outside of their class anymore, so make them private and simplify accordingly. The new pass is added via AArch64PassConfig::addPostBBSections(), so that it can change the code size before branch relaxation occurs. AArch64BranchTargets is placed there too, so it can take into account any PACI(A\|B)SP instructions and not excessively add BTIs at the start of functions. Reviewed By: tmatheson Differential Revision: https://reviews.llvm.org/D159357	2023-09-22 14:49:14 +03:00
Arthur Eubanks	0a1aa6cda2	[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295 ) This will make it easy for callers to see issues with and fix up calls to createTargetMachine after a future change to the params of TargetMachine. This matches other nearby enums. For downstream users, this should be a fairly straightforward replacement, e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive or s/CGFT_/CodeGenFileType::	2023-09-14 14:10:14 -07:00
Sander de Smalen	ecb7b9c5c5	[Clang][AArch64] Diagnostics for SME attributes when target doesn't have 'sme' This patch adds error diagnostics to Clang when code uses the AArch64 SME attributes without specifying 'sme' as available target attribute. * Function definitions marked as '__arm_streaming', '__arm_locally_streaming', '__arm_shared_za' or '__arm_new_za' will by definition use or require SME instructions. * Calls from non-streaming functions to streaming-functions require the compiler to enable/disable streaming-SVE mode around the call-site. In some cases we can accept the SME attributes without having 'sme' enabled: * Function declaration can have the SME attributes. * Definitions can be __arm_streaming_compatible since the generated code should execute on processing elements without SME. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D157269	2023-08-09 12:31:02 +00:00
Zhongyunde	05aae0839f	Reland [AArch64][NFC] Call the API getVScaleRange directly Use the maximum 64 for BitWidth of getVScaleRange to avoid returning an empty range. The previous change caused a Buildbot failure because of the self-assignment MinSVEVectorSize = MinSVEVectorSize: error: explicitly assigning value of variable of type 'unsigned int' to itself [-Werror,-Wself-assign] Reviewed By: sdesmalen, nikic, dmgreen Differential Revision: https://reviews.llvm.org/D155708	2023-07-26 18:55:31 +08:00
Zhongyunde	ebaac2b2d6	Revert "[AArch64][NFC] Call the API getVScaleRange directly" This reverts commit 67005c8e6fa9464f8bc436305a422071013ae499.	2023-07-26 16:44:14 +08:00
Zhongyunde	67005c8e6f	[AArch64][NFC] Call the API getVScaleRange directly Use the maximum 64 for BitWidth of getVScaleRange to avoid returning an empty range. Reviewed By: sdesmalen, nikic, dmgreen Differential Revision: https://reviews.llvm.org/D155708	2023-07-26 15:54:04 +08:00
Daniel Hoekwater	0315fca912	[AArch64] Move branch relaxation after bbsection assignment Because branch relaxation needs to factor in if branches target a block in the same section or a different one, it needs to run after the Basic Block Sections / Machine Function Splitting passes. Because Jump table compression relies on block offsets remaining fixed after the table is compressed, we must also move the JT compression pass. The only tests affected are ones enforcing just the ordering and a few that have basic block ids changed because RenumberBlocks hasn't run yet. Differential Revision: https://reviews.llvm.org/D153829	2023-07-21 20:24:52 +00:00
Sander de Smalen	08fd44b300	[AArch64] Force streaming-compatible codegen when attributes are set. Before this patch, the only way to generate streaming-compatible code was to use the `-force-streaming-compatible-sve` flag, but the compiler should also avoid the use of instructions invalid in streaming mode when a function has the aarch64_pstate_sm_enabled/compatible attribute. Reviewed By: paulwalker-arm, david-arm Differential Revision: https://reviews.llvm.org/D155428	2023-07-18 10:26:00 +00:00
Sami Tolvanen	e9569748de	[CodeGen][KCFI] Move cfi-type lowering to TargetLowering KCFI machine function passes transform indirect calls with a cfi-type attribute into architecture-specific type checks bundled together with the calls. Instead of having a separate pass for each architecture, add a generic machine function pass for KCFI and move the architecture-specific code that emits the actual check to TargetLowering. This avoids unnecessary duplication and makes it easier to add KCFI support to other architectures. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D149915	2023-05-09 18:38:54 +00:00
Amara Emerson	1e2f87868f	[AArch64][GlobalISel] Move the localizer to run before the legalizer, and always localize globals. Our strategy for localizing globals in the entry block breaks down when we have large functions with high register pressure, using lots of globals. When this happens, our heuristics say that globals with many uses should not be localized, leading us to cause excessive spills and stack usage. These situations are also exacerbated by LTO which tends to generate large functions. For now, moving to a strategy that's simpler and more akin to SelectionDAG fixes these issues and makes our codegen more similar. This has an overall neutral effect on size on CTMark, while showing slight improvements with -Os -flto on benchmarks. For low level firmware software though we see big improvements. The reason this is neutral, and not an improvement, is because we give up the gains from CSE'ing globals in cases where we have low register pressure. I think this can be addressed in future with some better heuristics. Differential Revision: https://reviews.llvm.org/D147484	2023-04-03 20:41:54 -07:00
Archibald Elliott	62c7f035b4	[NFC][TargetParser] Remove llvm/ADT/Triple.h I also ran `git clang-format` to get the headers in the right order for the new location, which has changed the order of other headers in two files.	2023-02-07 12:39:46 +00:00
Mitch Phillips	486729ce06	Re-land: [MTE] Add AArch64GlobalsTagging Pass Adds an IR pass for -fsanitize=memtag-globals. This pass goes over the tag-capable global variables, and replaces them with a tagged global variable of the same contents. This new global variable will have its size and alignment adjusted if necessary so that they're both a multiple of the tag granule size (16 bytes). Global merge must also be suppressed for tagged globals, as each global variable must have a unique tag. This can possibly be relaxed in future; globals that are identical in size, alignment, and content can possibly be merged. The major problem comes from tail- or head-merging, which if left unchecked, could have partially-overlapping global variables with different memory tags, leading to crashes at runtime. Reviewed By: fmayer, eugenis Differential Revision: https://reviews.llvm.org/D133392	2023-01-31 13:03:37 -08:00
Mitch Phillips	15e33c699c	Revert "[MTE] Add AArch64GlobalsTagging Pass" This reverts commit 4edfcff71e150770675a19576f698c7bbe788ee2. Broke the non-aarch64-containing target builds. https://reviews.llvm.org/D133392 has more context.	2023-01-31 12:25:58 -08:00
Mitch Phillips	4edfcff71e	[MTE] Add AArch64GlobalsTagging Pass Adds an IR pass for -fsanitize=memtag-globals. This pass goes over the tag-capable global variables, and replaces them with a tagged global variable of the same contents. This new global variable will have its size and alignment adjusted if necessary so that they're both a multiple of the tag granule size (16 bytes). Global merge must also be suppressed for tagged globals, as each global variable must have a unique tag. This can possibly be relaxed in future; globals that are identical in size, alignment, and content can possibly be merged. The major problem comes from tail- or head-merging, which if left unchecked, could have partially-overlapping global variables with different memory tags, leading to crashes at runtime. Reviewed By: fmayer, eugenis Differential Revision: https://reviews.llvm.org/D133392	2023-01-31 09:24:18 -08:00
Samuel Parker	615333bc09	[TypePromotion] NewPM support. Differential Revision: https://reviews.llvm.org/D140893	2023-01-03 15:09:29 +00:00
Benjamin Kramer	07e7168048	[AArch64] Stringref'ize AArch64Subtarget constructor. NFCI	2022-12-30 18:02:53 +01:00
Nick Desaulniers	f35d482ffd	[llvm][AArch64ISelDAGToDAG] support -{start\|stop}-{before\|after}=aarch64-isel Follow a similar pattern as AMDGPUDAGToDAGISel's constructor so that we can use INITIALIZE_PASS to register a pass. This allows for more fine grain testability of SelectionDAGISel via: llc -stop-{before,after}=aarch64-isel Link: https://github.com/llvm/llvm-project/issues/59538 See also: https://reviews.llvm.org/D140323 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D140331	2022-12-21 13:06:12 -08:00
Matt Arsenault	69e75ae695	CodeGen: Don't lazily construct MachineFunctionInfo This fixes what I consider to be an API flaw I've tripped over multiple times. The point this is constructed isn't well defined, so depending on where this is first called, you can conclude different information based on the MachineFunction. For example, the AMDGPU implementation inspected the MachineFrameInfo on construction for the stack objects and if the frame has calls. This kind of worked in SelectionDAG which visited all allocas up front, but broke in GlobalISel which hasn't visited any of the IR when arguments are lowered. I've run into similar problems before with the MIR parser and trying to make use of other MachineFunction fields, so I think it's best to just categorically disallow dependency on the MachineFunction state in the constructor and to always construct this at the same time as the MachineFunction itself. A missing feature I still could use is a way to access a custom analysis pass on the IR here.	2022-12-21 10:49:32 -05:00
Fangrui Song	bac974278c	CodeGen/CommandFlags: Convert Optional to std::optional	2022-12-03 18:38:12 +00:00
Krzysztof Parzyszek	8c7c20f033	Convert Optional<CodeModel> to std::optional<CodeModel>	2022-12-03 12:08:47 -06:00
David Green	16a72a0f87	[AArch64] Enable the select optimize pass for AArch64 This enables the select optimize pass for Arm out-of-order AArch64 cores. It is trying to solve a problem that is difficult for the compiler to fix. The criteria for when a csel is better or worse than a branch depend heavily on whether the branch is well predicted and the amount of ILP in the loop (as well as other criteria like the core in question and the relative performance of the branch predictor). The pass seems to do a decent job though, with the inner loop heuristics being well implemented and doing a better job than I had expected in general, even without PGO information. I've been doing quite a bit of benchmarking. The headline numbers are these for SPEC2017 on a Neoverse N1: 500.perlbench_r -0.12% 502.gcc_r 0.02% 505.mcf_r 6.02% 520.omnetpp_r 0.32% 523.xalancbmk_r 0.20% 525.x264_r 0.02% 531.deepsjeng_r 0.00% 541.leela_r -0.09% 548.exchange2_r 0.00% 557.xz_r -0.20% Running benchmarks with a combination of the llvm-test-suite plus several versions of SPEC gave between a 0.2% and 0.4% geomean improvement depending on the core/run. The instruction count went down by 0.1% too, which is a good sign, but the results can be a little noisy. Some issues from other benchmarks I had run were improved in rGca78b5601466f8515f5f958ef8e63d787d9d812e. In summary well predicted branches will see an improvement, badly predicted branches may get worse, and on average performance seems to be a little better overall. This patch enables the pass for AArch64 under -O3 for cores that will benefit from it, i.e. not in-order cores that do not fit into the "Assume infinite resources that allow to fully exploit the available instruction-level parallelism" cost model. It uses a subtarget feature for specifying when the pass will be enabled, which I have enabled under cpu=generic as the performance increases for out-of-order cores seem larger than any decreases for in-order, which were minor. 
Differential Revision: https://reviews.llvm.org/D138990	2022-12-03 16:08:58 +00:00
Krzysztof Parzyszek	26424c96c0	Attributes: convert Optional to std::optional	2022-12-02 08:15:45 -06:00
David Green	201b7858f6	[AArch64] Disable aarch64-enable-gep-opt This option was enabled in D128582, and whilst it seems to be a net improvement in many cases, at least a couple of issues have been reported from D135957 and from the CSE added to the backend causing more instructions in executed blocks. Revert for the time being, until we can improve the precision.	2022-11-19 21:25:18 +00:00
Nicholas Guy	41a3f92596	[AArch64][CodeGen] Add AArch64 support for complex deinterleaving Differential Revision: https://reviews.llvm.org/D129066	2022-11-16 14:00:54 +00:00
Sander de Smalen	36864d47d6	[AArch64] Fix minor issue introduced in D135950. The Key for the SubtargetMap had the StreamingSVEModeDisabled in the wrong place. This change is non-functional, since the string (key) is still unique.	2022-10-19 17:01:41 +00:00
Sander de Smalen	137459aff6	[AArch64][SME] Disable (SLP\|Loop)Vectorizer when function may be executed in streaming mode. When the SME attributes tell that a function is or may be executed in Streaming SVE mode, we currently need to be conservative and disable _any_ vectorization (fixed or scalable) because the code-generator does not yet support generating streaming-compatible code. Scalable auto-vec will be gradually enabled in the future when we have confidence that the loop-vectorizer won't use any SVE or NEON instructions that are illegal in Streaming SVE mode. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D135950	2022-10-19 16:42:20 +00:00
David Sherwood	f0f474dfd0	[AArch64][SME] Add codegen pass to handle ZA state in arm_new_za functions. The new pass implements the following: * Inserts code at the start of an arm_new_za function to commit a lazy-save when the lazy-save mechanism is active. * Adds a smstart intrinsic at the start of the function. * Adds a smstop intrinsic at the end of the function. Patch co-authored by kmclaughlin. Differential Revision: https://reviews.llvm.org/D133896	2022-10-05 09:43:57 +00:00
Sami Tolvanen	cff5bef948	KCFI sanitizer The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a forward-edge control flow integrity scheme for indirect calls. It uses a !kcfi_type metadata node to attach a type identifier for each function and injects verification code before indirect calls. Unlike the current CFI schemes implemented in LLVM, KCFI does not require LTO, does not alter function references to point to a jump table, and never breaks function address equality. KCFI is intended to be used in low-level code, such as operating system kernels, where the existing schemes can cause undue complications because of the aforementioned properties. However, unlike the existing schemes, KCFI is limited to validating only function pointers and is not compatible with executable-only memory. KCFI does not provide runtime support, but always traps when a type mismatch is encountered. Users of the scheme are expected to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi` operand bundle to indirect calls, and LLVM lowers this to a known architecture-specific sequence of instructions for each callsite to make runtime patching easier for users who require this functionality. A KCFI type identifier is a 32-bit constant produced by taking the lower half of xxHash64 from a C++ mangled typename. If a program contains indirect calls to assembly functions, they must be manually annotated with the expected type identifiers to prevent errors. To make this easier, Clang generates a weak SHN_ABS `__kcfi_typeid_<function>` symbol for each address-taken function declaration, which can be used to annotate functions in assembly as long as at least one C translation unit linked into the program takes the function address. For example on AArch64, we might have the following code: ``` .c: int f(void); int (*p)(void) = f; p(); .s: .4byte __kcfi_typeid_f .global f f: ... ``` Note that X86 uses a different preamble format for compatibility with Linux kernel tooling. 
See the comments in `X86AsmPrinter::emitKCFITypeId` for details. As users of KCFI may need to locate trap locations for binary validation and error handling, LLVM can additionally emit the locations of traps to a `.kcfi_traps` section. Similarly to other sanitizers, KCFI checking can be disabled for a function with a `no_sanitize("kcfi")` function attribute. Relands 67504c95494ff05be2a613129110c9bcf17f6c13 with a fix for 32-bit builds. Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay Differential Revision: https://reviews.llvm.org/D119296	2022-08-24 22:41:38 +00:00
Sami Tolvanen	a79060e275	Revert "KCFI sanitizer" This reverts commit 67504c95494ff05be2a613129110c9bcf17f6c13 as using PointerEmbeddedInt to store 32 bits breaks 32-bit arm builds.	2022-08-24 19:30:13 +00:00
Sami Tolvanen	67504c9549	KCFI sanitizer The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a forward-edge control flow integrity scheme for indirect calls. It uses a !kcfi_type metadata node to attach a type identifier for each function and injects verification code before indirect calls. Unlike the current CFI schemes implemented in LLVM, KCFI does not require LTO, does not alter function references to point to a jump table, and never breaks function address equality. KCFI is intended to be used in low-level code, such as operating system kernels, where the existing schemes can cause undue complications because of the aforementioned properties. However, unlike the existing schemes, KCFI is limited to validating only function pointers and is not compatible with executable-only memory. KCFI does not provide runtime support, but always traps when a type mismatch is encountered. Users of the scheme are expected to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi` operand bundle to indirect calls, and LLVM lowers this to a known architecture-specific sequence of instructions for each callsite to make runtime patching easier for users who require this functionality. A KCFI type identifier is a 32-bit constant produced by taking the lower half of xxHash64 from a C++ mangled typename. If a program contains indirect calls to assembly functions, they must be manually annotated with the expected type identifiers to prevent errors. To make this easier, Clang generates a weak SHN_ABS `__kcfi_typeid_<function>` symbol for each address-taken function declaration, which can be used to annotate functions in assembly as long as at least one C translation unit linked into the program takes the function address. For example on AArch64, we might have the following code: ``` .c: int f(void); int (*p)(void) = f; p(); .s: .4byte __kcfi_typeid_f .global f f: ... ``` Note that X86 uses a different preamble format for compatibility with Linux kernel tooling. 
See the comments in `X86AsmPrinter::emitKCFITypeId` for details. As users of KCFI may need to locate trap locations for binary validation and error handling, LLVM can additionally emit the locations of traps to a `.kcfi_traps` section. Similarly to other sanitizers, KCFI checking can be disabled for a function with a `no_sanitize("kcfi")` function attribute. Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay Differential Revision: https://reviews.llvm.org/D119296	2022-08-24 18:52:42 +00:00
Shubham Narlawar	f55dbfbd9d	[AArch64] Move SeparateConstOffsetFromGEPPass before LSR and enable EnableGEPOpt by default. GEPs across basic blocks were not being split because EnableGEPOpt was turned off by default. Hence, EarlyCSE missed the opportunity to eliminate the common part of GEPs. This can be achieved by simply turning the GEP pass on. - This patch moves SeparateConstOffsetFromGEPPass() just before LSR. - It enables EnableGEPOpt by default. Resolves - https://github.com/llvm/llvm-project/issues/50528 Added a unit test. Differential Revision: https://reviews.llvm.org/D128582	2022-07-22 15:20:53 +01:00
Kazu Hirata	7a47ee51a1	[llvm] Don't use Optional::getValue (NFC)	2022-06-20 22:45:45 -07:00
Kazu Hirata	e0e687a615	[llvm] Don't use Optional::hasValue (NFC)	2022-06-20 10:38:12 -07:00
Adrian Tong	7c13ae6490	Give option to use isCopyInstr to determine which MI is treated as Copy instruction in MCP. This is then used in AArch64 to remove copy instructions after taildup ran in machine block placement Differential Revision: https://reviews.llvm.org/D125335	2022-05-26 18:43:16 +00:00
Andre Vieira	572fc7d2fd	[AArch64] Order STP Q's by ascending address This patch adds an AArch64 specific PostRA MachineScheduler to try to schedule STP Q's to the same base-address in ascending order of offsets. We have found this to improve performance on Neoverse N1 and should not hurt other AArch64 cores. Differential Revision: https://reviews.llvm.org/D125377	2022-05-23 09:50:44 +01:00


315 Commits