445 Commits

Author SHA1 Message Date
Nikita Popov
e6bf3fa05b [Thumb] Convert tests to opaque pointers (NFC) 2022-12-19 13:05:05 +01:00
Roman Lebedev
6b3c2aed49
[NFC] Port codegen Thumb tests that invoke opt to -passes= syntax 2022-12-09 01:04:47 +03:00
Roman Lebedev
b1a9584818
[opt] Disincentivize new tests from using old pass syntax
Over the past day or so, i've took a large swing at our tests,
and reduced the number of tests that were still using the old syntax
from ~1800 to just 200.

Left to handle: (as it is seen in this patch)
* Transforms/LSR
* Transforms/CGP
* Transforms/TypePromotion
* Transforms/HardwareLoops
* Analysis/*
* some misc.

I think this is the right point to start actively refusing
to honor the old syntax, except for the old tests,
to prevent the old syntax from creeping back in.

Thus, let's add temporary default-off flag,
and if it is not passed refuse to accept old syntax.
The tests that still need porting are annotated with this flag.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D139647
2022-12-08 23:54:03 +03:00
Jonas Paulsson
5ecd363295 Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."
This reverts commit 122efef8ee9be57055d204d52c38700fe933c033.

- Patch fixed to not reuse definitions from predecessors in EH landing pads.
- Late review suggestions (by MaskRay) have been addressed.
- M68k/pipeline.ll test updated.
- Init captures added in processBlock() to avoid capturing structured bindings.
- RISCV has this disabled for now.

Original commit message:

A new pass MachineLateInstrsCleanup is added to be run after PEI.

This is a simple pass that removes redundant and identical instructions
whenever found by scanning the MF once while keeping track of register
definitions in a map. These instructions are typically immediate loads
resulting from rematerialization, and address loads emitted by target in
eliminateFrameInde().

This is enabled by default, but a target could easily disable it by means of
'disablePass(&MachineLateInstrsCleanupID);'.

This late cleanup is naturally not "optimal" in removing instructions as it
is done by looking at phys-regs, but still quite effective. It would be
desirable to improve other parts of CodeGen and avoid these redundant
instructions in the first place, but there are no ideas for this yet.

Differential Revision: https://reviews.llvm.org/D123394

Reviewed By: RKSimon, foad, craig.topper, arsenm, asb
2022-12-05 12:53:50 -06:00
Jonas Paulsson
122efef8ee Revert "Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions.""
This reverts commit 17db0de330f943833296ae72e26fa988bba39cb3.

Some more bots got broken - need to investigate.
2022-12-05 00:52:00 +01:00
Jonas Paulsson
17db0de330 Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."
Init captures added in processBlock() to avoid capturing structured bindings,
which caused the build problems (with clang).

RISCV has this disabled for now until problems relating to post RA pseudo
expansions are resolved.
2022-12-03 14:15:15 -06:00
Jonas Paulsson
8ef4632681 Revert "[CodeGen] Add new pass for late cleanup of redundant definitions."
Temporarily revert and fix buildbot failure.

This reverts commit 6d12599fd4134c1da63198c74a25490d28c733f6.
2022-12-01 13:29:24 -05:00
Jonas Paulsson
6d12599fd4 [CodeGen] Add new pass for late cleanup of redundant definitions.
A new pass MachineLateInstrsCleanup is added to be run after PEI.

This is a simple pass that removes redundant and identical instructions
whenever found by scanning the MF once while keeping track of register
definitions in a map. These instructions are typically immediate loads
resulting from rematerialization, and address loads emitted by target in
eliminateFrameInde().

This is enabled by default, but a target could easily disable it by means of
'disablePass(&MachineLateInstrsCleanupID);'.

This late cleanup is naturally not "optimal" in removing instructions as it
is done by looking at phys-regs, but still quite effective. It would be
desirable to improve other parts of CodeGen and avoid these redundant
instructions in the first place, but there are no ideas for this yet.

Differential Revision: https://reviews.llvm.org/D123394

Reviewed By: RKSimon, foad, craig.topper, arsenm, asb
2022-12-01 13:21:35 -05:00
Daniel Thornburgh
75cdab6dc2 [llvm-objdump] Add --no-print-imm-hex to tests depending on it.
This prepares for an upcoming change to make --print-imm-hex the default
behavior of llvm-objdump. These tests were updated in a semi-automatic
fashion.

See D136972 for details.
2022-10-29 15:40:26 -07:00
Simon Pilgrim
78739fdb4d [DAG] Enable combineShiftOfShiftedLogic folds after type legalization
This was disabled to prevent regressions, which appear to be just occurring on AMDGPU (at least in our current lit tests), which I've addressed by adding AMDGPUTargetLowering::isDesirableToCommuteWithShift overrides.

Fixes #57872

Differential Revision: https://reviews.llvm.org/D136042
2022-10-29 12:30:04 +01:00
Lucas Prates
70a5c52534 [ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records
Currently the a AAPCS compliant frame record is not always created for
functions when it should. Although a consistent frame record might not
be required in some cases, there are still scenarios where applications
may want to make use of the call hierarchy made available trough it.

In order to enable the use of AAPCS compliant frame records whilst keep
backwards compatibility, this patch introduces a new command-line option
(`-mframe-chain=[none|aapcs|aapcs+leaf]`) for Aarch32 and Thumb backends.
The option allows users to explicitly select when to use it, and is also
useful to ensure the extra overhead introduced by the frame records is
only introduced when necessary, in particular for Thumb targets.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D125094
2022-06-27 14:08:48 +01:00
Krasimir Georgiev
8f2ba36336 Revert "[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records AND [NFC][Thumb] Update frame-chain codegen test to use thumbv6m"
This reverts commit 7625e01d661644a560884057755d48a0da8b77b4 and
dependent cbcce82ef6b512d97e92a319a75a03e997c844e1.

Commit 7625e01d661644a560884057755d48a0da8b77b4 causes some new codegen test
failures under asan, e.g., CodeGen/ARM/execute-only.ll:
https://lab.llvm.org/buildbot/#/builders/5/builds/24659/steps/15/logs/stdio.
2022-06-15 16:10:02 +02:00
Lucas Prates
cbcce82ef6 [NFC][Thumb] Update frame-chain codegen test to use thumbv6m
The `llvm/test/CodeGen/Thumb/frame-chain.ll`, recently added by D125094,
currently fails when expensive checks are enabled due to a tMOVr
instruction that is only valid from V6 onwards.

The use of the invalid instruction is unrelated to the contents of the
original patch, and continues to be triggered by this test if its
CodeGen changes are reverted, so this patch updates the test to use V6-M
while the issue is not resolved.
2022-06-14 14:36:57 +01:00
Lucas Prates
7625e01d66 [ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records
Currently the a AAPCS compliant frame record is not always created for
functions when it should. Although a consistent frame record might not
be required in some cases, there are still scenarios where applications
may want to make use of the call hierarchy made available trough it.

In order to enable the use of AAPCS compliant frame records whilst keep
backwards compatibility, this patch introduces a new command-line option
(`-mframe-chain=[none|aapcs|aapcs+leaf]`) for Aarch32 and Thumb backends.
The option allows users to explicitly select when to use it, and is also
useful to ensure the extra overhead introduced by the frame records is
only introduced when necessary, in particular for Thumb targets.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D125094
2022-06-14 13:37:51 +01:00
Lucas Prates
33b9ad647e Revert "[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records"
Reverting change due to test failure.

This reverts commit 6119053dab67129eb1700dbf36db3524dd3e421f.
2022-06-13 11:00:49 +01:00
Lucas Prates
6119053dab [ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records
Currently the a AAPCS compliant frame record is not always created for
functions when it should. Although a consistent frame record might not
be required in some cases, there are still scenarios where applications
may want to make use of the call hierarchy made available trough it.

In order to enable the use of AAPCS compliant frame records whilst keep
backwards compatibility, this patch introduces a new command-line option
(`-mframe-chain=[none|aapcs|aapcs+leaf]`) for Aarch32 and Thumb backends.
The option allows users to explicitly select when to use it, and is also
useful to ensure the extra overhead introduced by the frame records is
only introduced when necessary, in particular for Thumb targets.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D125094
2022-06-13 10:21:06 +01:00
Hendrik Greving
a92ed167f2 [ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4.
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.

Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be
removed once targets set this explicitly.

Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-02 00:49:11 +00:00
Hendrik Greving
e9d05cc7d8 Revert "[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4."
This reverts commit 430ac5c3029c52e391e584c6d4447e6e361fae99.

Due to failures in Clang tests.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-01 13:27:49 -07:00
Hendrik Greving
430ac5c302 [ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4.
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.

Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be
removed once targets set this explicitly.

Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-01 12:48:01 -07:00
Craig Topper
46eef76876 [DAGCombiner] Fix bug in MatchBSwapHWordLow.
This function tries to match (a >> 8) | (a << 8) as (bswap a) >> 16.

If the SRL isn't masked and the high bits aren't demanded, we still
need to ensure that bits 23:16 are zero. After the right shift they
will be in bits 15:8 which is where the important bits from the SHL
end up. It's only a bswap if the OR on bits 15:8 only takes the bits
from the SHL.

Fixes PR55484.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D125641
2022-05-18 09:23:18 -07:00
Craig Topper
74f6ded49d [AArch64][ARM][RISCV][X86] Add test cases for PR55484. NFC
This bug is in generic DAG combine and easily reproducible on many
targets.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D125640
2022-05-16 09:28:11 -07:00
Zhiyao Ma
bd606afe26 [ARM] Only update the successor edges for immediate predecessors of PrologueMBB
When adjusting the function prologue for segmented stacks, only update
the successor edges of the immediate predecessors of the original
prologue.

Differential Revision: https://reviews.llvm.org/D122959
2022-05-03 12:36:35 +01:00
Zhiyao Ma
adc26b4eae [ARM] Fix 8-bit immediate overflow in the instruction of segmented stack prologue.
It fixes the overflow of 8-bit immediate field in the emitted
instruction that allocates large stacklet.

For thumb2 targets, load large immediate by a pair of movw and movt
instruction. For thumb1 and ARM targets, load large immediate by reading
from literal pool.

Differential Revision: https://reviews.llvm.org/D118545
2022-03-10 15:15:24 -08:00
Craig Topper
440c4b705a [SelectionDAG][RISCV][ARM][PowerPC][X86][WebAssembly] Change default abs expansion to use sra (X, size(X)-1); sub (xor (X, Y), Y).
Previous we used sra (X, size(X)-1); xor (add (X, Y), Y).

By placing sub at the end, we allow RISCV to combine sign_extend_inreg
with it to form subw.

Some X86 tests for Z - abs(X) seem to have improved as well.

Other targets look to be a wash.

I had to modify ARM's abs matching code to match from sub instead of
xor. Maybe instead ISD::ABS should be made legal. I'll try that in
parallel to this patch.

This is an alternative to D119099 which was focused on RISCV only.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D119171
2022-02-20 21:11:23 -08:00
Jay Foad
f510045d82 [CodeGen] Remove unneeded regex escaping in FileCheck patterns. NFC.
Take advantage of D117117 to simplify all {{\[}} to [ and {{\]}} to ].

Differential Revision: https://reviews.llvm.org/D117298
2022-02-18 16:10:56 +00:00
Ties Stuij
5cff77c23f [clang][ARM] PACBTI-M assembly support
Introduce assembly support for Armv8.1-M PACBTI extension. This is an optional
extension in v8.1-M.

There are 10 new system registers and 5 new instructions, all predicated on the
feature.

The attribute for llvm-mc is called "pacbti". For armclang, an architecture
extension also called "pacbti" was created.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Victor Campos
- Ties Stuij

Reviewed By: labrinea

Differential Revision: https://reviews.llvm.org/D112420
2021-11-30 09:28:18 +00:00
Guozhi Wei
f1d8345a2a [TwoAddressInstructionPass] Create register mapping for registers with multiple uses in the current MBB
Currently we create register mappings for registers used only once in current
MBB. For registers with multiple uses, when all the uses are in the current MBB,
we can also create mappings for them similarly according to the last use.
For example

    %reg101 = ...
            = ... reg101
    %reg103 = ADD %reg101, %reg102

We can create mapping between %reg101 and %reg103.

Differential Revision: https://reviews.llvm.org/D113193
2021-11-29 19:01:59 -08:00
Jay Foad
bdaa181007 [TwoAddressInstructionPass] Update existing physreg live intervals
In TwoAddressInstructionPass::processTiedPairs with
-early-live-intervals, update any preexisting physreg live intervals,
as well as virtreg live intervals. By default (without
-precompute-phys-liveness) physreg live intervals only exist for
registers that are live-in to some basic block.

Differential Revision: https://reviews.llvm.org/D113191
2021-11-05 21:20:30 +00:00
Jay Foad
0321bd64e6 Revert "[TwoAddressInstructionPass] Update existing physreg live intervals"
This reverts commit ec0e1e88d24fadb2cb22f431d66b22ee1b01cd43.

It was pushed by mistake.
2021-11-05 09:54:26 +00:00
Jay Foad
ec0e1e88d2 [TwoAddressInstructionPass] Update existing physreg live intervals
In TwoAddressInstructionPass::processTiedPairs with
-early-live-intervals, update any preexisting physreg live intervals,
as well as virtreg live intervals. By default (without
-precompute-phys-liveness) physreg live intervals only exist for
registers that are live-in to some basic block.

Differential Revision: https://reviews.llvm.org/D113191
2021-11-05 09:10:24 +00:00
Guozhi Wei
6599961c17 [TwoAddressInstructionPass] Improve the SrcRegMap and DstRegMap computation
This patch contains following enhancements to SrcRegMap and DstRegMap:

  1 In findOnlyInterestingUse not only check if the Reg is two address usage,
    but also check after commutation can it be two address usage.

  2 If a physical register is clobbered, remove SrcRegMap entries that are
    mapped to it.

  3 In processTiedPairs, when create a new COPY instruction, add a SrcRegMap
    entry only when the COPY instruction is coalescable. (The COPY src is
    killed)

With these enhancements isProfitableToCommute can do better commute decision,
and finally more register copies are removed.

Differential Revision: https://reviews.llvm.org/D108731
2021-10-11 15:28:31 -07:00
Stanislav Mekhanoshin
08d7eec06e Revert "Allow rematerialization of virtual reg uses"
Reverted due to two distcint performance regression reports.

This reverts commit 92c1fd19abb15bc68b1127a26137a69e033cdb39.
2021-09-24 10:26:11 -07:00
Ben Shi
63ca9371c7 [ARM] Implement target hook function to decide folding (mul (add x, c1), c2)
Prevent the folding in DAGCombine if it leads to worse code.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D109124
2021-09-07 15:42:43 +08:00
Stanislav Mekhanoshin
92c1fd19ab Allow rematerialization of virtual reg uses
Currently isReallyTriviallyReMaterializableGeneric() implementation
prevents rematerialization on any virtual register use on the grounds
that is not a trivial rematerialization and that we do not want to
extend liveranges.

It appears that LRE logic does not attempt to extend a liverange of
a source register for rematerialization so that is not an issue.
That is checked in the LiveRangeEdit::allUsesAvailableAt().

The only non-trivial aspect of it is accounting for tied-defs which
normally represent a read-modify-write operation and not rematerializable.

The test for a tied-def situation already exists in the
/CodeGen/AMDGPU/remat-vop.mir,
test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve.

The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets
where I more or less understand the asm it seems to reduce spilling
(as expected) or be neutral. However, it needs a review by all targets'
specialists.

Differential Revision: https://reviews.llvm.org/D106408
2021-08-24 11:09:02 -07:00
Petr Hosek
2d4470ab89 Revert "Allow rematerialization of virtual reg uses"
This reverts commit 877572cc193a470f310eec46a7ce793a6cc97c2f which
introduced PR51516.
2021-08-18 00:12:41 -07:00
Stanislav Mekhanoshin
877572cc19 Allow rematerialization of virtual reg uses
Currently isReallyTriviallyReMaterializableGeneric() implementation
prevents rematerialization on any virtual register use on the grounds
that is not a trivial rematerialization and that we do not want to
extend liveranges.

It appears that LRE logic does not attempt to extend a liverange of
a source register for rematerialization so that is not an issue.
That is checked in the LiveRangeEdit::allUsesAvailableAt().

The only non-trivial aspect of it is accounting for tied-defs which
normally represent a read-modify-write operation and not rematerializable.

The test for a tied-def situation already exists in the
/CodeGen/AMDGPU/remat-vop.mir,
test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve.

The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets
where I more or less understand the asm it seems to reduce spilling
(as expected) or be neutral. However, it needs a review by all targets'
specialists.

Differential Revision: https://reviews.llvm.org/D106408
2021-08-16 12:42:42 -07:00
David Green
c140ff493e [ARM] Change a couple of instances of LiveRegs.contains to !LiveRegs.available
This changes a couple of calls to LiveRegs.contains to
!LiveRegs.available, one in Thumb1FrameLoweringInfo (which modifies a
test to look more correct to me, given r7 should be the frame pointer so
is not available), and another in the ARMLoadStoreOptimizer, that I
don't have a test for, it was just found by inspection.

Differential Revision: https://reviews.llvm.org/D107454
2021-08-10 09:53:26 +01:00
David Green
85d6045b88 [ARM] Regenerate Thumb PR35481.ll test. NFC 2021-07-31 16:21:23 +01:00
Matt Arsenault
fae05692a3 CodeGen: Print/parse LLTs in MachineMemOperands
This will currently accept the old number of bytes syntax, and convert
it to a scalar. This should be removed in the near future (I think I
converted all of the tests already, but likely missed a few).

Not sure what the exact syntax and policy should be. We can continue
printing the number of bytes for non-generic instructions to avoid
test churn and only allow non-scalar types for generic instructions.

This will currently print the LLT in parentheses, but accept parsing
the existing integers and implicitly converting to scalar. The
parentheses are a bit ugly, but the parser logic seems unable to deal
without either parentheses or some keyword to indicate the start of a
type.
2021-06-30 16:54:13 -04:00
Lucas Prates
7749b19e9c [NFC] Adding test for clobbering of high registers in Thumb
Prior to the changes from D52010, clobbering Thumb's high registers in
inline asm would cause incorrect code to be generated - or an assertion
failure for debug builds. Now that the issue is no longer reproducible,
this patch adds a MIR test to cover that scenario.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96335
2021-06-28 13:44:44 +01:00
Eli Friedman
bf0d0671a1 [ARM] Make sure we don't transform unaligned store to stm on Thumb1.
This isn't likely to come up in practice; the combination of compiler
flags required to hit this issue should be rare. Found by inspection.
2021-06-21 14:32:42 -07:00
Koutheir Attouchi
789708617d Do not generate calls to the 128-bit function __multi3() on 32-bit ARM
Re-applying this patch after bots failures. Should be fine now.

The function __multi3() is undefined on 32-bit ARM, so a call to it should
never be emitted. Instead, plain instructions need to be generated to
perform 128-bit multiplications.

Differential Revision: https://reviews.llvm.org/D103906
2021-06-11 11:45:21 +01:00
serge-sans-paille
4ab3041acb Revert "[NFC] remove explicit default value for strboolattr attribute in tests"
This reverts commit bda6e5bee04c75b1f1332b4fd1ac4e8ef6c3c247.

See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance
2021-05-24 19:43:40 +02:00
serge-sans-paille
bda6e5bee0 [NFC] remove explicit default value for strboolattr attribute in tests
Since d6de1e1a71406c75a4ea4d5a2fe84289f07ea3a1, no attributes is quivalent to
setting attribute to false.

This is a preliminary commit for https://reviews.llvm.org/D99080
2021-05-24 19:31:04 +02:00
Simonas Kazlauskas
777a58e05b Support {S,U}REMEqFold before legalization
This allows these optimisations to apply to e.g. `urem i16` directly
before `urem` is promoted to i32 on architectures where i16 operations
are not intrinsically legal (such as on Aarch64). The legalization then
later can happen more directly and generated code gets a chance to avoid
wasting time on computing results in types wider than necessary, in the end.

Seems like mostly an improvement in terms of results at least as far as x86_64 and aarch64 are concerned, with a few regressions here and there. It also helps in preventing regressions in changes like {D87976}.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D88785
2021-04-01 01:35:41 +03:00
David Green
dc206be77b [ARM] Regenerate some test checks. NFC 2021-03-24 15:34:34 +00:00
David Green
402f2cae7d [ARM] Use lrdsb for more thumb1 loads.
Given a sextload i16, we can usually generate "ldrsh [rn. rm]". If we
don't naturally have a rn, rm addressing mode, we can either generate
"ldrh [rn, #0]; sxth" or "mov rm, #0; ldrsh [rn. rm]".

We currently generate the first, always creating a sxth. They are both
the same number of instructions, but if we generate the second then the
mov #0 will likely be CSE'd or pulled out of a loop, etc.

This adjusts the ISel patterns to do that, creating a mov instead of a
sxth.

Differential Revision: https://reviews.llvm.org/D98693
2021-03-17 15:29:02 +00:00
Simonas Kazlauskas
a2eca31da2 Test cases for rem-seteq fold with illegal types
This also briefly tests a larger set of architectures than the more
exhaustive functionality tests for AArch64 and x86.

As requested in D88785

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D98339
2021-03-12 16:28:04 +02:00
Roger Ferrer Ibanez
d4ce062340 [RISCV][PrologEpilogInserter] "Float" emergency spill slots to avoid making them immediately unreachable from the stack pointer
In RISC-V there is a single addressing mode of the form imm(reg) where
imm is a signed integer of 12-bit with a range of [-2048..2047] bytes
from reg.

The test MultiSource/UnitTests/C++11/frame_layout of the LLVM test-suite
exercises several scenarios with the stack, including function calls
where the stack will need to be realigned to to a local variable having
a large alignment of 4096 bytes.

In situations of large stacks, the RISC-V backend (in
RISCVFrameLowering) reserves an extra emergency spill slot which can be
used (if no free register is found) by the register scavenger after the
frame indexes have been eliminated. PrologEpilogInserter already takes
care of keeping the emergency spill slots as close as possible to the
stack pointer or frame pointer (depending on what the function will
use). However there is a final alignment step to honour the maximum
alignment of the stack that, when using the stack pointer to access the
emergency spill slots, has the side effect of setting them farther from
the stack pointer.

In the case of the frame_layout testcase, the net result is that we do
have an emergency spill slot but it is so far from the stack pointer
(more than 2048 bytes due to the extra alignment of a variable to 4096
bytes) that it becomes unreachable via any immediate offset.

During elimination of the frame index, many (regular) offsets of the
stack may be immediately unreachable already. Their address needs to be
computed using a register. A virtual register is created and later
RegisterScavenger should be able to find an unused (physical) register.
However if no register is available, RegisterScavenger will pick a
physical register and spill it onto an emergency stack slot, while we
compute the offset (restoring the chosen register after all this). This
assumes that the emergency stack slot is easily reachable (this is,
without requiring another register!).

This is the assumption we seem to break when we perform the extra
alignment in PrologEpilogInserter.

We can "float" the emergency spill slots by increasing (in absolute
value) their offsets from the incoming stack pointer. This way the
emergency spill slots will remain close to the stack pointer (once the
function has allocated storage for the stack, including the needed
realignment). The new size computed in PrologEpilogInserter is padding
so it should be OK to move the emergency spill slots there. Also because
we're increasing the alignment, the new location should stay aligned for
the purpose of the emergency spill slots.

Note that this change also impacts other backends as shown by the tests.
Changes are minor adjustments to the emergency stack slot offset.

Differential Revision: https://reviews.llvm.org/D89239
2021-01-23 09:10:03 +00:00
Matt Arsenault
1d1234b2a4 OpaquePtr: Update more tests to use typed sret 2020-11-20 20:08:43 -05:00