llvm-project

Author	SHA1	Message	Date
Anton Sidorenko	f8ed709345	[MachineCombiner] Extend reassociation logic to handle inverse instructions Machine combiner supports generic reassociation only of associative and commutative instructions, for example (A + X) + Y => (X + Y) + A. However, we can extend this generic support to handle patterns like (X + A) - Y => (X - Y) + A, where `-` is the inverse of `+`. This patch adds interface functions to process reassociation patterns of associative/commutative instructions and their inverse variants with minimal changes in backends. Differential Revision: https://reviews.llvm.org/D136754	2022-12-07 13:50:28 +03:00
Kazu Hirata	3c09ed006a	[llvm] Use std::nullopt instead of None in comments (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-04 17:12:44 -08:00
Fangrui Song	b0df70403d	[Target] llvm::Optional => std::optional The updated functions are mostly internal with a few exceptions (virtual functions in TargetInstrInfo.h, TargetRegisterInfo.h). To minimize changes to LLVMCodeGen, GlobalISel files are skipped. https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-04 22:43:14 +00:00
Kazu Hirata	20cde15415	[Target] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-02 20:36:06 -08:00
David Green	6a353c7756	[AArch64] Add GPR rr instructions to isAssociativeAndCommutative This adds some more scalar instructions that are both associative and commutative to isAssociativeAndCommutative, allowing the machine combiner to reassociate them to reduce critical path length. Differential Revision: https://reviews.llvm.org/D134260	2022-11-27 12:53:10 +00:00
Benjamin Maxwell	5eec8dfc2b	[AArch64] Add hasSVEorSME() helper and fix some incorrect checks This adds a little hasSVEorSME() helper, and as a NFC updates existing code to use it. The assertions get[Min|Max]SVEVectorSizeInBits() are also now corrected to use hasSVEorSME() rather than just hasSVE(). Differential Revision: https://reviews.llvm.org/D138575	2022-11-24 17:54:37 +00:00
Vitaly Buka	8f104b806a	Revert "[AArch64] Add GPR rr instructions to isAssociativeAndCommutative" Breaks msan on aarch64. This reverts commit 5f7f484ee54ebbf702ee4c5fe9852502dc237121.	2022-11-22 11:03:13 -08:00
Hassnaa Hamdi	d8306b8885	[AArch64][SME]: Use SVE mov instruction for FPR128 registers in streaming-compatible mode. 1- in streaming mode, use SVE OR/mov instruction instead of NEON OR, during copying phyReg -AArch64InstrInfo::copyPhysReg-. 2- add testing file: register-mov.ll Differential Revision: https://reviews.llvm.org/D138211	2022-11-18 11:18:30 +00:00
Bradley Smith	ac82907a1c	[AArch64][SVE] Ensure redundant PTEST are removed with an 'invalid' PTRUE When a PTRUE of non-element size is encountered, the PTEST optimization logic bails out since it cannot handle that type of PTRUE. Instead, it should be treated as a generic predicate to allow later optimizations to trigger. Differential Revision: https://reviews.llvm.org/D138116	2022-11-17 15:42:17 +00:00
David Green	71609871dd	[AArch64][MachineCombiner] Use MIMetadata to copy pcsections metadata to reassociated instructions. D134260/D138107 exposed that the MachineCombiner was not copying pcsections metadata where it should. This patch switches the MIBuild methods to use MIMetadata that can copy the debug loc and pcsections at the same time. Differential Revision: https://reviews.llvm.org/D138112	2022-11-16 13:22:48 +00:00
David Green	5f7f484ee5	[AArch64] Add GPR rr instructions to isAssociativeAndCommutative This adds some more scalar instructions that are both associative and commutative to isAssociativeAndCommutative, allowing the machine combiner to reassociate them to reduce critical path length. Differential Revision: https://reviews.llvm.org/D134260	2022-11-16 12:39:13 +00:00
Bradley Smith	2fb3e3c46d	[AArch64][SVE] Add PTEST_ANY pseudo instruction This allows recognition of when a ptest was emitted as an any condition and allows for extra optimization to be done later. This addresses missing optimizations from D137716 and D137718, and partially D137717. Depends on D137716, D137717, D137718 Differential Revision: https://reviews.llvm.org/D137930	2022-11-15 15:46:28 +00:00
Cullen Rhodes	8699efba6d	[AArch64][SVE] Fix bad PTEST(PTRUE_ALL, PTEST_LIKE) optimization AArch64InstrInfo::optimizePTestInstr attempts to remove a PTEST of a predicate generating operation that identically sets flags (implicitly). When the mask is an all active of matching element size the PTEST is currently removed. For while instructions this is correct since they perform an implicit PTEST with an all active mask. However, for other instructions such as compares the mask could be different. This patch fixes this bug by only removing the PTEST if the same all active mask is used by the predicate-generating instruction. Reviewed By: bsmith Differential Revision: https://reviews.llvm.org/D137718	2022-11-15 12:43:21 +00:00
Cullen Rhodes	a290668ec5	[AArch64][SVE] Fix bad PTEST(X, X) optimization AArch64InstrInfo::optimizePTestInstr attempts to remove a PTEST of a predicate generating operation that identically sets flags (implicitly). When the mask is the same as the input predicate the PTEST is currently removed. This is incorrect since the mask for the implicit PTEST performed by the flag-setting instruction differs from the mask specified to the explicit PTEST and could set different flags. For example, consider PG=<1, 1, x, x> Z0=<1, 2, x, x> Z1=<2, 1, x, x> X=CMPLE(PG, Z0, Z1) =<0, 1, x, x> NZCV=0xxx PTEST(X, X), NZCV=1xxx where the first active flag (bit 'N' in NZCV) is set by the explicit PTEST, but not by the implicit PTEST as part of the compare. Given the PTEST mask and source are the same however, first is equivalent to any, so the PTEST could be removed if the condition is changed. The same applies to last active. It is safe to remove the PTEST for any active, but this information isn't available in the current optimization. This patch fixes the bad optimization, a later patch will implement the optimization proposed above and fix the any active case. Reviewed By: bsmith Differential Revision: https://reviews.llvm.org/D137717	2022-11-15 11:59:07 +00:00
Cullen Rhodes	ce3e7eb968	[AArch64][SVE] Fix bad PTEST(PG, OP(PG, ...)) optimization AArch64InstrInfo::optimizePTestInstr attempts to remove a PTEST of a predicate generating operation that identically sets flags (implicitly). When the PTEST and the predicate-generating operation use the same mask the PTEST is currently removed. This is incorrect since it doesn't consider element size. PTEST operates on 8-bit predicates, but for instructions like compare that also support 16/32/64-bit predicates, the implicit PTEST performed by the instruction will consider fewer lanes for these element sizes and could set different first or last active flags. For example, consider the following instruction sequence ptrue p0.b ; P0=1111-1111-1111-1111 index z0.s, #0, #1 ; Z0=<0,1,2,3> index z1.s, #1, #1 ; Z1=<1,2,3,4> cmphi p1.s, p0/z, z1.s, z0.s ; P1=0001-0001-0001-0001 ; ^ last active ptest p0, p1.b ; P1=0001-0001-0001-0001 ; ^ last active where the compare generates a canonical all active 32-bit predicate (equivalent to 'ptrue p1.s, all'). The implicit PTEST sets the last active flag, whereas the PTEST instruction with the same mask doesn't. This patch restricts the optimization to instructions operating on 8-bit predicates. One caveat is the optimization is safe regardless of element size for any active, this will be addressed in a later patch. Reviewed By: bsmith Differential Revision: https://reviews.llvm.org/D137716	2022-11-15 10:34:23 +00:00
Cullen Rhodes	1e02a29e47	[AArch64][SVE] Use more flag-setting instructions If OP in PTEST(PG, OP(PG, ...)) has a flag-setting variant change the opcode so the PTEST becomes redundant. This patch extends this existing optimization in AArch64::optimizePTestInstr to cover all flag-setting opcodes. Reviewed By: peterwaller-arm Differential Revision: https://reviews.llvm.org/D136083	2022-10-25 09:02:21 +00:00
Eli Friedman	a6ac968360	[Arm64EC] Refer to dllimport'ed functions correctly. Arm64EC has two different ways to refer to dllimport'ed functions in an object file. One is using the usual __imp_ prefix, the other is using an Arm64EC-specific prefix __imp_aux_. As far as I can tell, if a function is in an x64 DLL, __imp_aux_ refers to the actual x64 address, while __imp_ points to some linker-generated code that calls the exit thunk. So __imp_aux_ is used to refer to the address in non-call contexts, while __imp_ is used for calls to avoid the indirect call checker. There's one twist to this, though: if an object refers to a symbol using the __imp_aux_ prefix, the object file's symbol table must also contain the symbol with the usual __imp_ prefix. The symbol doesn't actually have to be used anywhere, it just has to exist; otherwise, the linker's symbol lookup in x64 import libraries doesn't work correctly. Currently, this is handled by emitting a .globl __imp_foo directive; we could try to design some better way to handle this. One minor quirk I haven't figured out: apparently, in Arm64EC mode, MSVC prefers to use a linker-synthesized stub to call dllimport'ed functions, instead of branching directly. The linker stub appears to do the same thing that inline code would do, so not sure if it's just a code-size optimization, or if the synthesized stub can actually do something other than just load from the import table in some circumstances. Differential Revision: https://reviews.llvm.org/D136202	2022-10-20 15:08:56 -07:00
Sander de Smalen	02df03c5b7	[AArch64][SME] Add support for arm_locally_streaming functions. Functions with `aarch64_sme_pstatesm_body` will emit a SMSTART at the start of the function, and a SMSTOP at the end of the function, such that all operations use the right value for vscale. Because the placement of these nodes is critically important (i.e. no vscale-dependent operations should be done before SMSTART has been issued), we require glueing the CopyFromReg to the Entry node such that we can insert the SMSTART as part of that glued chain. More details about the SME attributes and design can be found in D131562. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D131582	2022-10-14 13:47:53 +00:00
Anton Sidorenko	4431e705cc	[NFC] Use forward decl of MachineCombinerPattern enum to reduce dependencies Differential Revision: https://reviews.llvm.org/D135776	2022-10-13 14:56:14 +01:00
Martin Storsjö	bd3fa31887	[AArch64] Generate SEH info for PAC instructions Without this, unwinding through functions that do use PAC would fail, if PAC actually was active. Differential Revision: https://reviews.llvm.org/D135103	2022-10-12 22:21:03 +03:00
Martin Storsjö	a07787c9a5	[AArch64] Exclude instructions after setting the FP from SEH prologues After setting up the FP, the rest of the prologue doesn't need to be replayed for unwinding the stack frame. This allows reverting the functional parts of 2f7fbf837625267193351cc334e506a3a9161958 (but fixing inconsistent duplicate setting of HasWinCFI). Differential Revision: https://reviews.llvm.org/D135686	2022-10-12 12:36:21 +03:00
Cullen Rhodes	a17fcb2230	[AArch64][SVE] Fix BRKNS bug in optimizePTestInstr The BRKNS instruction is unlike the other instructions that set flags since it has an all active implicit predicate, so the existing PTEST(PG, BRKN(PG, A, B)) -> BRKNS(PG, A, B) in AArch64InstrInfo::optimizePTestInstr is incorrect, however PTEST(PTRUE_B(31), BRKN(PG, A, B)) -> BRKNS(PG, A, B) is correct. Spotted by @paulwalker-arm in D134946. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D135655	2022-10-12 08:34:41 +00:00
zhongyunde	75358f060c	[AArch64] Lower multiplication by a constant int to madd Lower a = b * C -1 into madd a) instcombine change b * C -1 --> b * C + (-1) b) machine-combine change b * C + (-1) --> madd Assembler will transform the neg immediate of sub to add, see https://gcc.godbolt.org/z/cTcxePPf4 Fixes AArch64 part of https://github.com/llvm/llvm-project/issues/57255. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D134336	2022-10-07 19:33:47 +08:00
Martin Storsjö	2f7fbf8376	[AArch64] Add missing SEH_Nop when aligning the stack This makes sure that the instructions of the prologue matches the SEH opcodes. Also remove a couple redundant cases of setting HasWinCFI; it was already set unconditionally after the conditional cases. Differential Revision: https://reviews.llvm.org/D135101	2022-10-05 11:00:36 +03:00
David Green	908b3b6ccb	[AArch64] Use fast-math-flags in isAssociativeAndCommutative Previously only using the UnsafeFPMath option, this now looks for the fast-math flags on the instructions, using the same flags as other backends.	2022-09-19 11:34:00 +01:00
Sander de Smalen	b00c36c295	[AArch64][SME] Implement ABI for calls to/from streaming functions. This patch implements the ABI for calls from: Normal -> Streaming Normal -> Streaming-compatible Streaming -> Normal Streaming -> Streaming-compatible Streaming -> Streaming The compiler inserts SMSTART/SMSTOP instructions before and after the call, depending on the required transition. More details about the SME attributes and design can be found in D131562. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D131576	2022-09-16 14:07:47 +00:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Kazu Hirata	a2d4501718	[llvm] Fix comment typos (NFC)	2022-08-07 00:16:14 -07:00
Guozhi Wei	ddc9e8861c	[MachineCombiner, AArch64] Add a new pattern A-(B+C) => (A-B)-C to reduce latency Add a new pattern A - (B + C) ==> (A - B) - C to give machine combiner a chance to evaluate which instruction sequence has lower latency. Differential Revision: https://reviews.llvm.org/D124564	2022-06-28 21:42:51 +00:00
Serguei Katkov	163c77b2e0	[AARCH64 folding] Do not fold any copy with NZCV There is no instruction to fold NZCV, so, just do not do it. Without the fix the added test case crashes with an assert "Mismatched register size in non subreg COPY" Reviewed By: danilaml Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D127294	2022-06-21 10:38:49 +07:00
Kazu Hirata	129b531c9c	[llvm] Use value_or instead of getValueOr (NFC)	2022-06-18 23:07:11 -07:00
zhongyunde	c42a225545	[MachineScheduler] Order more stores by ascending address According to D125377, we order STP Q's by ascending address. While on some targets, paired 128 bit loads and stores are slow, so the STP will split into STRQ and STUR, so I hope these stores will also be ordered. Also add subtarget feature ascend-store-address to control the aggressive order. Reviewed By: dmgreen, fhahn Differential Revision: https://reviews.llvm.org/D126700	2022-06-13 17:33:50 +08:00
Eli Friedman	0ff51d5dde	Fix interaction of CFI instructions with MachineOutliner. 1. When checking if a candidate contains a CFI instruction, actually iterate over all of the instructions, instead of stopping halfway through. 2. Make sure copied CFI directives refer to the correct instruction. Fixes https://github.com/llvm/llvm-project/issues/55842 Differential Revision: https://reviews.llvm.org/D126930	2022-06-10 13:37:49 -07:00
Kazu Hirata	3b9707dbc0	[llvm] Convert for_each to range-based for loops (NFC)	2022-06-05 12:07:14 -07:00
Sander de Smalen	9c38fc111b	[AArch64] Remove references to Streaming SVE from target features. Following discussion on D120261 and D121208 it seems better to remove the concept of Streaming SVE from the subtarget/assembler predicates and instead reason about 'SVE' and 'SME' as its higher level features, rather than trying to model this runtime mode through explicit feature flags. This patch is largely NFC. Reviewed By: paulwalker-arm, david-arm Differential Revision: https://reviews.llvm.org/D125977	2022-05-31 16:25:01 +02:00
David Green	5cb14dc5a3	[AArch64] Look through copy in MachineCombiner FMUL patterns. This is a small addition to D99662, which added machine combiner patterns for FMUL(DUP(..)). Due to the way these are generated from ISel, they may also be FMUL(COPY(DUP(..))), which this patch now ignores the no-op COPY in. Differential Revision: https://reviews.llvm.org/D126632	2022-05-31 09:28:00 +01:00
Daniel Kiss	de07cde67b	[AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions. autiasp, autibsp instructions are the counterpart of paciasp/pacibsp instructions therefore let's emit .cfi_negate_ra_state for these too. In case of Armv8.3 instruction set the retaa/retbb will do the return and authentication in one step here we can't emit the .cfi_negate_ra_state because that would be point after the ret* instruction. Reviewed By: nickdesaulniers, MaskRay Differential Revision: https://reviews.llvm.org/D111780	2022-04-22 13:25:57 +02:00
Momchil Velikov	d0ea42a7c1	[AArch64] Async unwind - function epilogues Reviewed By: MaskRay, chill Differential Revision: https://reviews.llvm.org/D112330	2022-04-12 16:50:50 +01:00
Momchil Velikov	50a97aacac	[AArch64] Async unwind - function prologues Re-commit of 32e8b550e5439c7e4aafa73894faffd5f25d0d05 This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction. The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill mov x29, sp .cfi_def_cfa w29, 16 ... ``` after the `stp` is executed the (initial) rule for the CFA still says the CFA is in the `sp`, even though it's already offset by 16 bytes A correct unwind info could look like: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill .cfi_def_cfa_offset 16 mov x29, sp .cfi_def_cfa w29, 16 ... ``` Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (towards this patch is just a step) is to have fully working `-fasynchronous-unwind-tables`. Reviewed By: danielkiss, MaskRay Differential Revision: https://reviews.llvm.org/D111411	2022-03-24 16:16:44 +00:00
Shengchen Kan	37b378386e	[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments	2022-03-16 20:25:42 +08:00
Hans Wennborg	85c53c7092	Revert "[AArch64] Async unwind - function prologues" It caused builds to assert with: (StackSize == 0 && "We already have the CFA offset!"), function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624. when targeting iOS. See comment on the code review for reproducer. > This patch rearranges emission of CFI instructions, so the resulting > DWARF and `.eh_frame` information is precise at every instruction. > > The current state is that the unwind info is emitted only after the > function prologue. This is fine for synchronous (e.g. C++) exceptions, > but the information is generally incorrect when the program counter is > at an instruction in the prologue or the epilogue, for example: > > ``` > stp x29, x30, [sp, #-16]! // 16-byte Folded Spill > mov x29, sp > .cfi_def_cfa w29, 16 > ... > ``` > > after the `stp` is executed the (initial) rule for the CFA still says > the CFA is in the `sp`, even though it's already offset by 16 bytes > > A correct unwind info could look like: > ``` > stp x29, x30, [sp, #-16]! // 16-byte Folded Spill > .cfi_def_cfa_offset 16 > mov x29, sp > .cfi_def_cfa w29, 16 > ... > ``` > > Having this information precise up to an instruction is useful for > sampling profilers that would like to get a stack backtrace. The end > goal (towards this patch is just a step) is to have fully working > `-fasynchronous-unwind-tables`. > > Reviewed By: danielkiss, MaskRay > > Differential Revision: https://reviews.llvm.org/D111411 This reverts commit 32e8b550e5439c7e4aafa73894faffd5f25d0d05.	2022-03-04 17:36:26 +01:00
Cullen Rhodes	e4fa8291a2	[AArch64] Allow copying of SVE registers in Streaming SVE Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D118562	2022-03-03 09:51:14 +00:00
Momchil Velikov	63c9aca12a	Revert "[AArch64] Async unwind - function epilogues" This reverts commit 74319d67943a4fbef36e81f54273549ce4962f84. It causes test failures that look like infinite loop in asan/hwasan unwinding.	2022-03-02 15:01:57 +00:00
Momchil Velikov	74319d6794	[AArch64] Async unwind - function epilogues Counterpart of https://reviews.llvm.org/D111411 this change makes the unwind information instruction precise in function epilogues. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D112330	2022-03-02 13:15:11 +00:00
Momchil Velikov	32e8b550e5	[AArch64] Async unwind - function prologues This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction. The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill mov x29, sp .cfi_def_cfa w29, 16 ... ``` after the `stp` is executed the (initial) rule for the CFA still says the CFA is in the `sp`, even though it's already offset by 16 bytes A correct unwind info could look like: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill .cfi_def_cfa_offset 16 mov x29, sp .cfi_def_cfa w29, 16 ... ``` Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (towards this patch is just a step) is to have fully working `-fasynchronous-unwind-tables`. Reviewed By: danielkiss, MaskRay Differential Revision: https://reviews.llvm.org/D111411	2022-02-28 13:37:57 +00:00
Momchil Velikov	25e92920c9	[AArch64] Async unwind - helper functions to decide on CFI emission Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D112327	2022-02-24 18:16:50 +00:00
Momchil Velikov	fd7e59f0e7	[AArch64] Async unwind - do not schedule frame setup/destroy The PostRA scheduler can reorder non-CFI instructions in a way that makes the unwind info not instruction precise. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D112326	2022-02-24 17:24:04 +00:00
Jessica Paquette	68c718c8f4	Revert "[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges"" This reverts commit d97f997eb79d91b2872ac13619f49cb3a7120781. This commit was not NFC. (See: https://reviews.llvm.org/rGd97f997eb79d91b2872ac13619f49cb3a7120781)	2022-02-23 10:35:52 -08:00
Jessica Paquette	d97f997eb7	[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges" We found a case in the Swift benchmarks where the MachineOutliner introduces about a 20% compile time overhead in comparison to building without the MachineOutliner. The origin of this slowdown is that the benchmark has long blocks which incur lots of LRU checks for lots of candidates. Imagine a case like this: ``` bb: i1 i2 i3 ... i123456 ``` Now imagine that all of the outlining candidates appear early in the block, and that something like, say, NZCV is defined at the end of the block. The outliner has to check liveness for certain registers across all candidates, because outlining from areas where those registers are used is unsafe at call boundaries. This is fairly wasteful because in the previously-described case, the outlining candidates will never appear in an area where those registers are live. To avoid this, precalculate areas where we will consider outlining from. Anything outside of these areas is mapped to illegal and not included in the outlining search space. This allows us to reduce the size of the outliner's suffix tree as well, giving us a potential memory win. By precalculating areas, we can also optimize other checks too, like whether or not LR is live across an outlining candidate. Doing all of this is about a 16% compile time improvement on the case. This is likely useful for other targets (e.g. ARM + RISCV) as well, but for now, this only implements the AArch64 path. The original "is the MBB safe" method still works as before.	2022-02-21 15:29:16 -08:00
Micah Weston	c69af70f02	[AArch64] Adds SUBS and ADDS instructions to the MIPeepholeOpt. Implements ADDS/SUBS 24-bit immediate optimization using the MIPeepholeOpt pass. This follows the pattern: Optimize ([adds|subs] r, imm) -> ([ADDS|SUBS] ([ADD|SUB] r, #imm0, lsl #12), #imm1), if imm == (imm0<<12)+imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. Optimize ([adds|subs] r, imm) -> ([SUBS|ADDS] ([SUB|ADD] r, #imm0, lsl #12), #imm1), if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. The SplitAndOpcFunc type had to change the return type to an Opcode pair so that the first add/sub is the regular instruction and the second is the flag setting instruction. This required updating the code in the AND case. Testing: I ran a two stage bootstrap with this code. Using the second stage compiler, I verified that the negation of an ADDS to SUBS or vice versa is a valid optimization. Example V == -0x111111. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D118663	2022-02-19 15:35:53 +00:00
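
The immediate-splitting condition quoted in c69af70f02 can be illustrated with a short sketch. This is not the MIPeepholeOpt code; it only restates the arithmetic from the commit message in Python, and the names `split_imm24` and `split_adds_subs` are hypothetical, made up for this illustration.

```python
def split_imm24(imm):
    """Split imm into (imm0, imm1) such that imm == (imm0 << 12) + imm1.

    Returns None unless both halves are non-zero 12-bit unsigned values,
    which is the condition stated in the commit message.
    """
    if imm <= 0 or imm >= (1 << 24):
        return None
    imm0 = (imm >> 12) & 0xFFF  # upper 12 bits, applied with "lsl #12"
    imm1 = imm & 0xFFF          # lower 12 bits
    if imm0 == 0 or imm1 == 0:
        return None
    return imm0, imm1


def split_adds_subs(imm):
    """Return (keep_opcode, imm0, imm1), or None if neither pattern applies.

    keep_opcode is True when imm == (imm0 << 12) + imm1 (opcode unchanged)
    and False when imm == -(imm0 << 12) - imm1, the case in which the
    commit message swaps ADD/SUB and ADDS/SUBS.
    """
    parts = split_imm24(imm)
    if parts is not None:
        return (True, *parts)
    parts = split_imm24(-imm)
    if parts is not None:
        return (False, *parts)
    return None
```

With the example value from the commit message, `split_adds_subs(-0x111111)` falls into the negated case with `imm0 == imm1 == 0x111`. An immediate such as `0x1000`, whose low 12 bits are zero, is rejected because a single shifted add or subtract already covers it.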

494 Commits