25 Commits

Author SHA1 Message Date
Nikita Popov
5ddce70ef0 [AArch64] Convert some tests to opaque pointers (NFC) 2022-12-19 12:36:19 +01:00
David Green
6a353c7756 [AArch64] Add GPR rr instructions to isAssociativeAndCommutative
This adds some more scalar instructions that are both associative and
commutative to isAssociativeAndCommutative, allowing the machine
combiner to reassociate them to reduce critical path length.

Differential Revision: https://reviews.llvm.org/D134260
2022-11-27 12:53:10 +00:00
Vitaly Buka
8f104b806a Revert "[AArch64] Add GPR rr instructions to isAssociativeAndCommutative"
Breaks msan on aarch64.

This reverts commit 5f7f484ee54ebbf702ee4c5fe9852502dc237121.
2022-11-22 11:03:13 -08:00
David Green
5f7f484ee5 [AArch64] Add GPR rr instructions to isAssociativeAndCommutative
This adds some more scalar instructions that are both associative and
commutative to isAssociativeAndCommutative, allowing the machine
combiner to reassociate them to reduce critical path length.

Differential Revision: https://reviews.llvm.org/D134260
2022-11-16 12:39:13 +00:00
zhongyunde
7a81782585 [AArch64][CodeGen]Fold the mov and lsl into ubfiz
Fix the issue exposed by D132322, depand on D132939
Reviewed By: efriedma, paulwalker-arm
Differential Revision: https://reviews.llvm.org/D132325
2022-09-09 23:50:29 +08:00
Momchil Velikov
50a97aacac [AArch64] Async unwind - function prologues
Re-commit of 32e8b550e5439c7e4aafa73894faffd5f25d0d05

This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.

The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:

```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

after the `stp` is executed the (initial) rule for the CFA still says
the CFA is in the `sp`, even though it's already offset by 16 bytes

A correct unwind info could look like:
```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

Having this information precise up to an instruction is useful for
sampling profilers that would like to get a stack backtrace. The end
goal (towards this patch is just a step) is to have fully working
`-fasynchronous-unwind-tables`.

Reviewed By: danielkiss, MaskRay

Differential Revision: https://reviews.llvm.org/D111411
2022-03-24 16:16:44 +00:00
Hans Wennborg
85c53c7092 Revert "[AArch64] Async unwind - function prologues"
It caused builds to assert with:

  (StackSize == 0 && "We already have the CFA offset!"),
  function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624.

when targeting iOS. See comment on the code review for reproducer.

> This patch rearranges emission of CFI instructions, so the resulting
> DWARF and `.eh_frame` information is precise at every instruction.
>
> The current state is that the unwind info is emitted only after the
> function prologue. This is fine for synchronous (e.g. C++) exceptions,
> but the information is generally incorrect when the program counter is
> at an instruction in the prologue or the epilogue, for example:
>
> ```
> stp     x29, x30, [sp, #-16]!           // 16-byte Folded Spill
> mov     x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> after the `stp` is executed the (initial) rule for the CFA still says
> the CFA is in the `sp`, even though it's already offset by 16 bytes
>
> A correct unwind info could look like:
> ```
> stp     x29, x30, [sp, #-16]!           // 16-byte Folded Spill
> .cfi_def_cfa_offset 16
> mov     x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> Having this information precise up to an instruction is useful for
> sampling profilers that would like to get a stack backtrace. The end
> goal (towards this patch is just a step) is to have fully working
> `-fasynchronous-unwind-tables`.
>
> Reviewed By: danielkiss, MaskRay
>
> Differential Revision: https://reviews.llvm.org/D111411

This reverts commit 32e8b550e5439c7e4aafa73894faffd5f25d0d05.
2022-03-04 17:36:26 +01:00
Momchil Velikov
32e8b550e5 [AArch64] Async unwind - function prologues
This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.

The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:

```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

after the `stp` is executed the (initial) rule for the CFA still says
the CFA is in the `sp`, even though it's already offset by 16 bytes

A correct unwind info could look like:
```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

Having this information precise up to an instruction is useful for
sampling profilers that would like to get a stack backtrace. The end
goal (towards this patch is just a step) is to have fully working
`-fasynchronous-unwind-tables`.

Reviewed By: danielkiss, MaskRay

Differential Revision: https://reviews.llvm.org/D111411
2022-02-28 13:37:57 +00:00
Jason Molenda
0d8cd4e2d5 [AArch64InstPrinter] Change printAddSubImm to comment imm value when shifted
Add a comment when there is a shifted value,
    add x9, x0, #291, lsl #12 ; =1191936
but not when the immediate value is unshifted,
    subs x9, x0, #256 ; =256
when the comment adds nothing additional to the reader.

Differential Revision: https://reviews.llvm.org/D107196
2021-08-03 02:28:46 -07:00
serge-sans-paille
4ab3041acb Revert "[NFC] remove explicit default value for strboolattr attribute in tests"
This reverts commit bda6e5bee04c75b1f1332b4fd1ac4e8ef6c3c247.

See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance
2021-05-24 19:43:40 +02:00
serge-sans-paille
bda6e5bee0 [NFC] remove explicit default value for strboolattr attribute in tests
Since d6de1e1a71406c75a4ea4d5a2fe84289f07ea3a1, no attributes is quivalent to
setting attribute to false.

This is a preliminary commit for https://reviews.llvm.org/D99080
2021-05-24 19:31:04 +02:00
Sander de Smalen
f28e1128d9 Relanding r368987 [AArch64] Change location of frame-record within callee-save area.
Changes:
There was a condition for `!NeedsFrameRecord` missing in the assert. The
assert in question has changed to:

+    assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
+            RPI.Reg1 == AArch64::LR) &&
+           "FrameRecord must be allocated together with LR");

This addresses PR43016.

llvm-svn: 369122
2019-08-16 15:42:28 +00:00
Nico Weber
ee96499a42 Revert r368987, it caused PR43016.
llvm-svn: 369080
2019-08-16 02:21:21 +00:00
Sander de Smalen
643adb5576 [AArch64] Change location of frame-record within callee-save area.
This patch changes the location of the frame-record (FP, LR) to the 
bottom of the callee-saved area. According to the AAPCS the location of
the frame-record within the stackframe is unspecified (section 5.2.3 The 
Frame Pointer), so the compiler should be free to choose a different
location.

The reason for changing the location of the frame-record is to prepare
the frame for allocating an SVE area below the callee-saves. This way the 
compiler can use the VL-scaled addressing modes to directly access SVE 
objects from the frame-pointer.

            :                :   
        | stack |        | stack |
        |  args |        |  args |
        +-------+        +-------+
        |  x30  |        |  x19  |
        |  x29  |        |  x20  |
  FP -> |- - - -|        |  x21  |
        |  x19  |   ==>  |  x22  |
        |  x20  |        |- - - -|
        |  x21  |        |  x30  |
        |  x22  |        |  x29  |
        +-------+        +-------+ <- FP
        |///////|        |///////|         // realignment gap 
        |- - - -|        |- - - -|
        |spills/|        |spills/|
        | locals|        | locals|
  SP -> +-------+        +-------+ <- SP

Things to point out:
- The algorithm to find a paired register should be prevented from
  accidentally pairing some callee-saved register with LR that is not 
  FP, since they should always be paired together when the frame
  has a frame-record.
- For Darwin platforms the location of the frame-record is unchanged,
  since the unwind encoding does not allow for encoding this position
  dynamically and other tools currently depend on the former layout. 

Reviewers: efriedma, rovka, rengolin, thegameg, greened, t.p.northover

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D65653

llvm-svn: 368987
2019-08-15 10:34:16 +00:00
Francis Visoiu Mistrih
b7cef81fd3 Replace "no-frame-pointer-*" function attributes with "frame-pointer"
Part of the effort to refactoring frame pointer code generation. We used
to use two function attributes "no-frame-pointer-elim" and
"no-frame-pointer-elim-non-leaf" to represent three kinds of frame
pointer usage: (all) frames use frame pointer, (non-leaf) frames use
frame pointer, (none) frame use frame pointer. This CL makes the idea
explicit by using only one enum function attribute "frame-pointer"

Option "-frame-pointer=" replaces "-disable-fp-elim" for tools such as
llc.

"no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" are still
supported for easy migration to "frame-pointer".

tests are mostly updated with

// replace command line args ‘-disable-fp-elim=false’ with ‘-frame-pointer=none’
grep -iIrnl '\-disable-fp-elim=false' * | xargs sed -i '' -e "s/-disable-fp-elim=false/-frame-pointer=none/g"

// replace command line args ‘-disable-fp-elim’ with ‘-frame-pointer=all’
grep -iIrnl '\-disable-fp-elim' * | xargs sed -i '' -e "s/-disable-fp-elim/-frame-pointer=all/g"

Patch by Yuanfang Chen (tabloid.adroit)!

Differential Revision: https://reviews.llvm.org/D56351

llvm-svn: 351049
2019-01-14 10:55:55 +00:00
Geoff Berry
66f6b65fed [PEI, AArch64] Use empty spaces in stack area for local stack slot allocation.
Summary:
If the target requests it, use emptry spaces in the fixed and
callee-save stack area to allocate local stack objects.

AArch64: Change last callee-save reg stack object alignment instead of
size to leave a gap to take advantage of above change.

Reviewers: t.p.northover, qcolombet, MatzeB

Subscribers: rengolin, mcrosier, llvm-commits, aemerson

Differential Revision: http://reviews.llvm.org/D20220

llvm-svn: 271527
2016-06-02 16:22:07 +00:00
Geoff Berry
a5335647d5 [AArch64] Combine callee-save and local stack SP adjustment instructions.
Summary:
If a function needs to allocate both callee-save stack memory and local
stack memory, we currently decrement/increment the SP in two steps:
first for the callee-save area, and then for the local stack area.  This
changes the code to allocate them both at once at the very beginning/end
of the function.  This has two benefits:

1) there is one fewer sub/add micro-op in the prologue/epilogue

2) the stack adjustment instructions act as a scheduling barrier, so
moving them to the very beginning/end of the function increases post-RA
scheduler's ability to move instructions (that only depend on argument
registers) before any of the callee-save stores

This change can cause an increase in instructions if the original local
stack SP decrement could be folded into the first store to the stack.
This occurs when the first local stack store is to stack offset 0.  In
this case we are trading off one more sub instruction for one fewer sub
micro-op (along with benefits (2) and (3) above).

Reviewers: t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18619

llvm-svn: 268746
2016-05-06 16:34:59 +00:00
Geoff Berry
62c1a1e7c7 [AArch64] Enable non-leaf frame pointer elimination.
Summary:
This change enables frame pointer elimination in non-leaf functions.
The -fomit-frame-pointer option still needs to be used when compiling
via clang (or an equivalent method of not setting the
'no-frame-pointer-elim*' function attributes if generating llvm IR via
some other method) to take advantage of this optimization.

This change should be NFC when compiling via clang without
-fomit-frame-pointer.

Reviewers: t.p.northover

Subscribers: aemerson, rengolin, tberghammer, qcolombet, llvm-commits, danalbert, mcrosier, srhines

Differential Revision: http://reviews.llvm.org/D17730

llvm-svn: 262495
2016-03-02 17:58:31 +00:00
Geoff Berry
c25d3bd238 [AArch64] Reduce number of callee-save save/restores.
Summary:
Before this change, callee-save registers would be rounded up to even
pairs of GPRs and FPRs.  This change eliminates these extra padding
load/stores, though it does keep the stack allocation the same size
unless both the GPR and FPR sets have an odd size, in which case one
full pair stack slot (16 bytes) is saved.

This optimization cannot currently be done for MachO targets since they
rely on a fast-path .debug_frame equivalent that can only encode
callee-save registers as pairs.

Reviewers: t.p.northover, rengolin, mcrosier, jmolloy

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D17000

llvm-svn: 260689
2016-02-12 16:31:41 +00:00
Chad Rosier
d016574df8 [AArch64] Enable PostRAScheduler for AArch64 generic build.
Disable post-ra scheduler for perturbed tests to appease the bots and to
preserve the history of the tests.

http://reviews.llvm.org/D15652

llvm-svn: 256158
2015-12-21 14:43:45 +00:00
Quentin Colombet
f6645cce91 [AArch64] Enable shrink-wrapping by default.
Differential Revision: http://reviews.llvm.org/D14360

rdar://problem/20820748

llvm-svn: 253520
2015-11-18 23:12:20 +00:00
Tim Northover
2a9d801fd5 AArch64: use 32-bit MOV rather than UBFX to truncate registers.
It's potentially more efficient on Cyclone, and from the optimization guides &
schedulers looks like it has no effect on Cortex-A53 or A57. In general you'd
expect a MOV to be about the most efficient instruction with its semantics,
even though the official "UXTW" alias is really a UBFX.

llvm-svn: 243576
2015-07-29 21:34:32 +00:00
Evgeniy Stepanov
00b3020453 Fix AArch64 prologue for empty frame with dynamic allocas.
Fixes PR23804: assertion failure in emitPrologue in the case of a
function with an empty frame and a dynamic alloca that needs stack
realignment. This is a typical case for AddressSanitizer.

llvm-svn: 241943
2015-07-10 21:24:07 +00:00
Fiona Glaser
b08ae7affb ComputeKnownBits: be a bit smarter about ADDs
If our two inputs have known top-zero bit counts M and N, we trivially
know that the output cannot have any bits set in the top (min(M, N)-1)
bits, since nothing could carry past that point.

llvm-svn: 241927
2015-07-10 18:29:02 +00:00
Kristof Beyls
17cb8982f4 [AArch64] Add support for dynamic stack alignment
Differential Revision: http://reviews.llvm.org/D8876

llvm-svn: 234471
2015-04-09 08:49:47 +00:00