494 Commits

Author SHA1 Message Date
Anton Sidorenko
f8ed709345 [MachineCombiner] Extend reassociation logic to handle inverse instructions
Machine combiner supports generic reassociation only of associative and
commutative instructions, for example (A + X) + Y => (X + Y) + A. However, we
can extend this generic support to handle patterns like
(X + A) - Y => (X - Y) + A), where `-` is the inverse of `+`.
This patch adds interface functions to process reassociation patterns of
associative/commutative instructions and their inverse variants with minimal
changes in backends.

Differential Revision: https://reviews.llvm.org/D136754
2022-12-07 13:50:28 +03:00
Kazu Hirata
3c09ed006a [llvm] Use std::nullopt instead of None in comments (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 17:12:44 -08:00
Fangrui Song
b0df70403d [Target] llvm::Optional => std::optional
The updated functions are mostly internal with a few exceptions (virtual functions in
TargetInstrInfo.h, TargetRegisterInfo.h).
To minimize changes to LLVMCodeGen, GlobalISel files are skipped.

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 22:43:14 +00:00
Kazu Hirata
20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:06 -08:00
David Green
6a353c7756 [AArch64] Add GPR rr instructions to isAssociativeAndCommutative
This adds some more scalar instructions that are both associative and
commutative to isAssociativeAndCommutative, allowing the machine
combiner to reassociate them to reduce critical path length.

Differential Revision: https://reviews.llvm.org/D134260
2022-11-27 12:53:10 +00:00
Benjamin Maxwell
5eec8dfc2b [AArch64] Add hasSVEorSME() helper and fix some incorrect checks
This adds a little hasSVEorSME() helper, and as a NFC updates existing
code to use it. The assertions get[Min|Max]SVEVectorSizeInBits() are
also now corrected to use hasSVEorSME() rather than just hasSVE().

Differential Revision: https://reviews.llvm.org/D138575
2022-11-24 17:54:37 +00:00
Vitaly Buka
8f104b806a Revert "[AArch64] Add GPR rr instructions to isAssociativeAndCommutative"
Breaks msan on aarch64.

This reverts commit 5f7f484ee54ebbf702ee4c5fe9852502dc237121.
2022-11-22 11:03:13 -08:00
Hassnaa Hamdi
d8306b8885 [AArch64][SME]: Use SVE mov instruction for FPR128 registers in streaming-compatible mode.
1- in streaming mode, use SVE OR/mov instruction instead of NEON OR,
   during copying phyReg -AArch64InstrInfo::copyPhysReg-.
2- add testing file:
   register-mov.ll

Differential Revision: https://reviews.llvm.org/D138211
2022-11-18 11:18:30 +00:00
Bradley Smith
ac82907a1c [AArch64][SVE] Ensure redundant PTEST are removed with an 'invalid' PTRUE
When a PTRUE of non-element size is encountered, the PTEST optimization
logic bails out since it cannot handle that type of PTRUE. Instead, it
should be treated as a generic predicate to allow later optimizations trigger.

Differential Revision: https://reviews.llvm.org/D138116
2022-11-17 15:42:17 +00:00
David Green
71609871dd [AArch64][MachineCombiner] Use MIMetadata to copy pcsections metadata to reassociated instructions.
D134260/D138107 exposed that the MachineCombiner was not copying
pcsections metadata where it should. This patch switches the MIBuild
methods to use MIMetadata that can copy the debug loc and pcsections at
the same time.

Differential Revision: https://reviews.llvm.org/D138112
2022-11-16 13:22:48 +00:00
David Green
5f7f484ee5 [AArch64] Add GPR rr instructions to isAssociativeAndCommutative
This adds some more scalar instructions that are both associative and
commutative to isAssociativeAndCommutative, allowing the machine
combiner to reassociate them to reduce critical path length.

Differential Revision: https://reviews.llvm.org/D134260
2022-11-16 12:39:13 +00:00
Bradley Smith
2fb3e3c46d [AArch64][SVE] Add PTEST_ANY pseudo instruction
This allow recognition of when a ptest was emitted as an any condition
and allows for extra optimization to be done later.

This addresses missing optimizations from D137716 and D137718, and
partially D137717.

Depends on D137716, D137717, D137718

Differential Revision: https://reviews.llvm.org/D137930
2022-11-15 15:46:28 +00:00
Cullen Rhodes
8699efba6d [AArch64][SVE] Fix bad PTEST(PTRUE_ALL, PTEST_LIKE) optimization
AArch64InstrInfo::optimizePTestInstr attempts to remove a PTEST of a
predicate generating operation that identically sets flags (implictly).

When the mask is an all active of matching element size the PTEST is
currently removed. For while instructions this is correct since they
perform an implicit PTEST with an all active mask. However, for other
instructions such as compares the mask could be different.

This patch fixes this bug by only removing the PTEST if the same all
active mask is used by the predicating-generating instruction.

Reviewed By: bsmith

Differential Revision: https://reviews.llvm.org/D137718
2022-11-15 12:43:21 +00:00
Cullen Rhodes
a290668ec5 [AArch64][SVE] Fix bad PTEST(X, X) optimization
AArch64InstrInfo::optimizePTestInstr attempts to remove a PTEST of a
predicate generating operation that identically sets flags (implictly).

When the mask is the same as the input predicate the PTEST is currently
removed. This is incorrect since the mask for the implicit PTEST
performed by the flag-setting instruction differs from the mask
specified to the explicit PTEST and could set different flags.

For example, consider

  PG=<1, 1, x, x>
  Z0=<1, 2, x, x>
  Z1=<2, 1, x, x>

  X=CMPLE(PG, Z0, Z1)
   =<0, 1, x, x>       NZCV=0xxx
  PTEST(X, X),         NZCV=1xxx

where the first active flag (bit 'N' in NZCV) is set by the explicit
PTEST, but not by the implicit PTEST as part of the compare. Given the
PTEST mask and source are the same however, first is equivalent to any,
so the PTEST could be removed if the condition is changed. The same
applies to last active. It is safe to remove the PTEST for any active,
but this information isn't available in the current optimization.

This patch fixes the bad optimization, a later patch will implement the
optimization proposed above and fix the any active case.

Reviewed By: bsmith

Differential Revision: https://reviews.llvm.org/D137717
2022-11-15 11:59:07 +00:00
Cullen Rhodes
ce3e7eb968 [AArch64][SVE] Fix bad PTEST(PG, OP(PG, ...)) optimization
AArch64InstrInfo::optimizePTestInstr attempts to remove a PTEST of a
predicate generating operation that identically sets flags (implictly).

When the PTEST and the predicate-generating operation use the same mask
the PTEST is currently removed. This is incorrect since it doesn't
consider element size. PTEST operates on 8-bit predicates, but for
instructions like compare that also support 16/32/64-bit predicates, the
implicit PTEST performed by the instruction will consider fewer lanes
for these element sizes and could set different first or last active
flags.

For example, consider the following instruction sequence

  ptrue p0.b			; P0=1111-1111-1111-1111
  index z0.s, #0, #1		; Z0=<0,1,2,3>
  index z1.s, #1, #1		; Z1=<1,2,3,4>
  cmphi p1.s, p0/z, z1.s, z0.s  ; P1=0001-0001-0001-0001
				;       ^ last active
  ptest p0, p1.b		; P1=0001-0001-0001-0001
				;     ^ last active

where the compare generates a canonical all active 32-bit predicate (equivalent
to 'ptrue p1.s, all'). The implicit PTEST sets the last active flag, whereas
the PTEST instruction with the same mask doesn't.

This patch restricts the optimization to instructions operating on 8-bit
predicates. One caveat is the optimization is safe regardless of element
size for any active, this will be addressed in a later patch.

Reviewed By: bsmith

Differential Revision: https://reviews.llvm.org/D137716
2022-11-15 10:34:23 +00:00
Cullen Rhodes
1e02a29e47 [AArch64][SVE] Use more flag-setting instructions
If OP in PTEST(PG, OP(PG, ...)) has a flag-setting variant change the
opcode so the PTEST becomes redundant. This patch extends this existing
optimization in AArch64::optimizePTestInstr to cover all flag-setting
opcodes.

Reviewed By: peterwaller-arm

Differential Revision: https://reviews.llvm.org/D136083
2022-10-25 09:02:21 +00:00
Eli Friedman
a6ac968360 [Arm64EC] Refer to dllimport'ed functions correctly.
Arm64EC has two different ways to refer to dllimport'ed functions in an
object file. One is using the usual __imp_ prefix, the other is using an
Arm64EC-specific prefix __imp_aux_. As far as I can tell, if a function
is in an x64 DLL, __imp_aux_ refers to the actual x64 address, while
__imp_ points to some linker-generated code that calls the exit thunk.
So __imp_aux_ is used to refer to the address in non-call contexts,
while __imp_ is used for calls to avoid the indirect call checker.

There's one twist to this, though: if an object refers to a symbol using
the __imp_aux_ prefix, the object file's symbol table must also contain
the symbol with the usual __imp_ prefix. The symbol doesn't actually
have to be used anywhere, it just has to exist; otherwise, the linker's
symbol lookup in x64 import libraries doesn't work correctly. Currently,
this is handled by emitting a .globl __imp_foo directive; we could try
to design some better way to handle this.

One minor quirk I haven't figured out: apparently, in Arm64EC mode, MSVC
prefers to use a linker-synthesized stub to call dllimport'ed functions,
instead of branching directly. The linker stub appears to do the same
thing that inline code would do, so not sure if it's just a code-size
optimization, or if the synthesized stub can actually do something other
than just load from the import table in some circumstances.

Differential Revision: https://reviews.llvm.org/D136202
2022-10-20 15:08:56 -07:00
Sander de Smalen
02df03c5b7 [AArch64][SME] Add support for arm_locally_streaming functions.
Functions with `aarch64_sme_pstatesm_body` will emit a SMSTART at the start
of the function, and a SMSTOP at the end of the function, such that all
operations use the right value for vscale.

Because the placement of these nodes is critically important (i.e. no
vscale-dependent operations should be done before SMSTART has been issued),
we require glueing the CopyFromReg to the Entry node such that we can
insert the SMSTART as part of that glued chain.

More details about the SME attributes and design can be found
in D131562.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D131582
2022-10-14 13:47:53 +00:00
Anton Sidorenko
4431e705cc [NFC] Use forward decl of MachineCombinerPattern enum to reduce dependencies
Differential Revision: https://reviews.llvm.org/D135776
2022-10-13 14:56:14 +01:00
Martin Storsjö
bd3fa31887 [AArch64] Generate SEH info for PAC instructions
Without this, unwinding through functions that does use PAC
would fail, if PAC actually was active.

Differential Revision: https://reviews.llvm.org/D135103
2022-10-12 22:21:03 +03:00
Martin Storsjö
a07787c9a5 [AArch64] Exclude instructions after setting the FP from SEH prologues
After setting up the FP, the rest of the prologue doesn't need to
be replayed for unwinding the stack frame.

This allows reverting the functional parts of
2f7fbf837625267193351cc334e506a3a9161958 (but fixing inconsistent
duplicate setting of HasWinCFI).

Differential Revision: https://reviews.llvm.org/D135686
2022-10-12 12:36:21 +03:00
Cullen Rhodes
a17fcb2230 [AArch64][SVE] Fix BRKNS bug in optimizePTestInstr
The BRKNS instruction is unlike the other instructions that set flags
since it has an all active implicit predicate, so the existing

  PTEST(PG, BRKN(PG, A, B)) -> BRKNS(PG, A, B)

in AArch64InstrInfo::optimizePTestInstr is incorrect, however

  PTEST(PTRUE_B(31), BRKN(PG, A, B)) -> BRKNS(PG, A, B)

is correct.

Spotted by @paulwalker-arm in D134946.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D135655
2022-10-12 08:34:41 +00:00
zhongyunde
75358f060c [AArch64] Lower multiplication by a constant int to madd
Lower a = b * C -1 into madd
  a) instcombine change b * C -1 --> b * C + (-1)
  b) machine-combine change b * C + (-1) --> madd

Assembler will transform the neg immedate of sub to add, see https://gcc.godbolt.org/z/cTcxePPf4
Fixes AArch64 part of https://github.com/llvm/llvm-project/issues/57255.

Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D134336
2022-10-07 19:33:47 +08:00
Martin Storsjö
2f7fbf8376 [AArch64] Add missing SEH_Nop when aligning the stack
This makes sure that the instructions of the prologue matches the
SEH opcodes.

Also remove a couple redundant cases of setting HasWinCFI; it was
already set unconditionally after the conditional cases.

Differential Revision: https://reviews.llvm.org/D135101
2022-10-05 11:00:36 +03:00
David Green
908b3b6ccb [AArch64] Use fast-math-flags in isAssociativeAndCommutative
Previously only using the UnsafeFPMath option, this now looks for the
fast moth flags on the instructions, using the same flag flags as other
backends.
2022-09-19 11:34:00 +01:00
Sander de Smalen
b00c36c295 [AArch64][SME] Implement ABI for calls to/from streaming functions.
This patch implements the ABI for calls from:

  Normal -> Streaming
  Normal -> Streaming-compatible
  Streaming -> Normal
  Streaming -> Streaming-compatible
  Streaming -> Streaming

The compiler inserts SMSTART/SMSTOP instructions before and after the call,
depending on the required transition.

More details about the SME attributes and design can be found
in D131562.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D131576
2022-09-16 14:07:47 +00:00
Fangrui Song
de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Kazu Hirata
a2d4501718 [llvm] Fix comment typos (NFC) 2022-08-07 00:16:14 -07:00
Guozhi Wei
ddc9e8861c [MachineCombiner, AArch64] Add a new pattern A-(B+C) => (A-B)-C to reduce latency
Add a new pattern A - (B + C) ==> (A - B) - C to give machine combiner a chance
to evaluate which instruction sequence has lower latency.

Differential Revision: https://reviews.llvm.org/D124564
2022-06-28 21:42:51 +00:00
Serguei Katkov
163c77b2e0 [AARCH64 folding] Do not fold any copy with NZCV
There is no instruction to fold NZCV, so, just do not do it.

Without the fix the added test case crashes with an assert
"Mismatched register size in non subreg COPY"

Reviewed By: danilaml
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D127294
2022-06-21 10:38:49 +07:00
Kazu Hirata
129b531c9c [llvm] Use value_or instead of getValueOr (NFC) 2022-06-18 23:07:11 -07:00
zhongyunde
c42a225545 [MachineScheduler] Order more stores by ascending address
According D125377, we order STP Q's by ascending address. While on some
targets, paired 128 bit loads and stores are slow, so the STP will split
into STRQ and STUR, so I hope these stores will also be ordered.
Also add subtarget feature ascend-store-address to control the aggressive order.

Reviewed By: dmgreen, fhahn

Differential Revision: https://reviews.llvm.org/D126700
2022-06-13 17:33:50 +08:00
Eli Friedman
0ff51d5dde Fix interaction of CFI instructions with MachineOutliner.
1. When checking if a candidate contains a CFI instruction, actually
iterate over all of the instructions, instead of stopping halfway
through.
2. Make sure copied CFI directives refer to the correct instruction.

Fixes https://github.com/llvm/llvm-project/issues/55842

Differential Revision: https://reviews.llvm.org/D126930
2022-06-10 13:37:49 -07:00
Kazu Hirata
3b9707dbc0 [llvm] Convert for_each to range-based for loops (NFC) 2022-06-05 12:07:14 -07:00
Sander de Smalen
9c38fc111b [AArch64] Remove references to Streaming SVE from target features.
Following discussion on D120261 and D121208 it seems better to remove the
concept of Streaming SVE from the subtarget/assembler predicates and
instead reason about 'SVE' and 'SME' as its higher level features, rather
than trying to model this runtime mode through explicit feature flags.

This patch is largely NFC.

Reviewed By: paulwalker-arm, david-arm

Differential Revision: https://reviews.llvm.org/D125977
2022-05-31 16:25:01 +02:00
David Green
5cb14dc5a3 [AArch64] Look through copy in MachineCombiner FMUL patterns.
This is a small addition to D99662, which added machine combiner
patterns for FMUL(DUP(..)). Due to the way these are generated from
ISel, they may also be FMUL(COPY(DUP(..))), which this patch now
ignores the no-op COPY in.

Differential Revision: https://reviews.llvm.org/D126632
2022-05-31 09:28:00 +01:00
Daniel Kiss
de07cde67b [AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions.
autiasp, autibsp instructions are the counterpart of paciasp/pacibsp instructions
therefore let's emit .cfi_negate_ra_state for these too.
In case of Armv8.3 instruction set the retaa/retbb will do the return and authentication
in one step here we can't emit the . cfi_negate_ra_state because that would be point after
the ret* instruction.

Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D111780
2022-04-22 13:25:57 +02:00
Momchil Velikov
d0ea42a7c1 [AArch64] Async unwind - function epilogues
Reviewed By: MaskRay, chill

Differential Revision: https://reviews.llvm.org/D112330
2022-04-12 16:50:50 +01:00
Momchil Velikov
50a97aacac [AArch64] Async unwind - function prologues
Re-commit of 32e8b550e5439c7e4aafa73894faffd5f25d0d05

This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.

The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:

```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

after the `stp` is executed the (initial) rule for the CFA still says
the CFA is in the `sp`, even though it's already offset by 16 bytes

A correct unwind info could look like:
```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

Having this information precise up to an instruction is useful for
sampling profilers that would like to get a stack backtrace. The end
goal (towards this patch is just a step) is to have fully working
`-fasynchronous-unwind-tables`.

Reviewed By: danielkiss, MaskRay

Differential Revision: https://reviews.llvm.org/D111411
2022-03-24 16:16:44 +00:00
Shengchen Kan
37b378386e [NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments 2022-03-16 20:25:42 +08:00
Hans Wennborg
85c53c7092 Revert "[AArch64] Async unwind - function prologues"
It caused builds to assert with:

  (StackSize == 0 && "We already have the CFA offset!"),
  function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624.

when targeting iOS. See comment on the code review for reproducer.

> This patch rearranges emission of CFI instructions, so the resulting
> DWARF and `.eh_frame` information is precise at every instruction.
>
> The current state is that the unwind info is emitted only after the
> function prologue. This is fine for synchronous (e.g. C++) exceptions,
> but the information is generally incorrect when the program counter is
> at an instruction in the prologue or the epilogue, for example:
>
> ```
> stp     x29, x30, [sp, #-16]!           // 16-byte Folded Spill
> mov     x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> after the `stp` is executed the (initial) rule for the CFA still says
> the CFA is in the `sp`, even though it's already offset by 16 bytes
>
> A correct unwind info could look like:
> ```
> stp     x29, x30, [sp, #-16]!           // 16-byte Folded Spill
> .cfi_def_cfa_offset 16
> mov     x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> Having this information precise up to an instruction is useful for
> sampling profilers that would like to get a stack backtrace. The end
> goal (towards this patch is just a step) is to have fully working
> `-fasynchronous-unwind-tables`.
>
> Reviewed By: danielkiss, MaskRay
>
> Differential Revision: https://reviews.llvm.org/D111411

This reverts commit 32e8b550e5439c7e4aafa73894faffd5f25d0d05.
2022-03-04 17:36:26 +01:00
Cullen Rhodes
e4fa8291a2 [AArch64] Allow copying of SVE registers in Streaming SVE
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D118562
2022-03-03 09:51:14 +00:00
Momchil Velikov
63c9aca12a Revert "[AArch64] Async unwind - function epilogues"
This reverts commit 74319d67943a4fbef36e81f54273549ce4962f84.

It causes test failures that look like infinite loop in asan/hwasan
unwinding.
2022-03-02 15:01:57 +00:00
Momchil Velikov
74319d6794 [AArch64] Async unwind - function epilogues
Counterpart of https://reviews.llvm.org/D111411 this change makes the
unwind information instruction precise in function epilogues.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D112330
2022-03-02 13:15:11 +00:00
Momchil Velikov
32e8b550e5 [AArch64] Async unwind - function prologues
This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.

The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:

```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

after the `stp` is executed the (initial) rule for the CFA still says
the CFA is in the `sp`, even though it's already offset by 16 bytes

A correct unwind info could look like:
```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

Having this information precise up to an instruction is useful for
sampling profilers that would like to get a stack backtrace. The end
goal (towards this patch is just a step) is to have fully working
`-fasynchronous-unwind-tables`.

Reviewed By: danielkiss, MaskRay

Differential Revision: https://reviews.llvm.org/D111411
2022-02-28 13:37:57 +00:00
Momchil Velikov
25e92920c9 [AArch64] Async unwind - helper functions to decide on CFI emission
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D112327
2022-02-24 18:16:50 +00:00
Momchil Velikov
fd7e59f0e7 [AArch64] Async unwind - do not schedule frame setup/destroy
The PostRA scheduler can reorder non-CFI instructions in a way that
makes the unwind info not instruction precise.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D112326
2022-02-24 17:24:04 +00:00
Jessica Paquette
68c718c8f4 Revert "[MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges""
This reverts commit d97f997eb79d91b2872ac13619f49cb3a7120781.

This commit was not NFC.

(See: https://reviews.llvm.org/rGd97f997eb79d91b2872ac13619f49cb3a7120781)
2022-02-23 10:35:52 -08:00
Jessica Paquette
d97f997eb7 [MachineOutliner][AArch64] NFC: Split MBBs into "outlinable ranges"
We found a case in the Swift benchmarks where the MachineOutliner introduces
about a 20% compile time overhead in comparison to building without the
MachineOutliner.

The origin of this slowdown is that the benchmark has long blocks which incur
lots of LRU checks for lots of candidates.

Imagine a case like this:

```
bb:
  i1
  i2
  i3
  ...
  i123456
```

Now imagine that all of the outlining candidates appear early in the block, and
that something like, say, NZCV is defined at the end of the block.

The outliner has to check liveness for certain registers across all candidates,
because outlining from areas where those registers are used is unsafe at call
boundaries.

This is fairly wasteful because in the previously-described case, the outlining
candidates will never appear in an area where those registers are live.

To avoid this, precalculate areas where we will consider outlining from.
Anything outside of these areas is mapped to illegal and not included in the
outlining search space. This allows us to reduce the size of the outliner's
suffix tree as well, giving us a potential memory win.

By precalculating areas, we can also optimize other checks too, like whether
or not LR is live across an outlining candidate.

Doing all of this is about a 16% compile time improvement on the case.

This is likely useful for other targets (e.g. ARM + RISCV) as well, but for now,
this only implements the AArch64 path. The original "is the MBB safe" method
still works as before.
2022-02-21 15:29:16 -08:00
Micah Weston
c69af70f02 [AArch64] Adds SUBS and ADDS instructions to the MIPeepholeOpt.
Implements ADDS/SUBS 24-bit immediate optimization using the
MIPeepholeOpt pass. This follows the pattern:

Optimize ([adds|subs] r, imm) -> ([ADDS|SUBS] ([ADD|SUB] r, #imm0, lsl #12), #imm1),
if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Optimize ([adds|subs] r, imm) -> ([SUBS|ADDS] ([SUB|ADD] r, #imm0, lsl #12), #imm1),
if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

The SplitAndOpcFunc type had to change the return type to an Opcode pair so that
the first add/sub is the regular instruction and the second is the flag setting
instruction. This required updating the code in the AND case.

Testing:

I ran a two stage bootstrap with this code.
Using the second stage compiler, I verified that the negation of an ADDS to SUBS
or vice versa is a valid optimization. Example V == -0x111111.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D118663
2022-02-19 15:35:53 +00:00