A 'CMOV 1, 0, CC, %cpsr, Cmp' is the same as a 'CSINC 0, 0, CC, Cmp',
and can be treated the same in IsCMPZCSINC added in D114013. This allows
us to remove the unnecessary CMOV in the same way that we could remove a
CSINC.
Differential Revision: https://reviews.llvm.org/D115188
Try to revert D113741 once again.
This also reverts 0ac75e82fff93a80ca401d3db3541e8d1d9098f9 (D114705)
as it causes LLDB's lldb-api.lang/cpp/nsimport.TestCppNsImport.py test
failure w/o D113741.
This reverts commit f9607d45f399e2afc39ec16222ea68b4e0831564.
Differential Revision: https://reviews.llvm.org/D116225
As reported from a failing firefox build, we can sometimes get frame
indices with negative offsets from a t2LDRi8. This adds support for
them, to prevent the crash.
This patch implements PAC return address signing for armv8-m. This patch roughly
accomplishes the following things:
- PAC and AUT instructions are generated.
- They're part of the stack frame setup, so that shrink-wrapping can move them
inwards to cover only part of a function
- The auth code generated by PAC is saved across subroutine calls so that AUT
can find it again to check
- PAC is emitted before stacking registers (so that the SP it signs is the one
on function entry).
- The new pseudo-register ra_auth_code is mentioned in the DWARF frame data
- With CMSE also in use: PAC is emitted before stacking FPCXTNS, and AUT
validates the corresponding value of SP
- Emit correct unwind information when PAC is replaced by PACBTI
- Handle tail calls correctly
Some notes:
We make the assembler accept the `.save {ra_auth_code}` directive that is
emitted by the compiler when it saves a register that contains a
return address authentication code.
For EHABI we need to have the `FrameSetup` flag on the instruction and
handle the `t2PACBTI` opcode (identically to `t2PAC`), so we can emit
`.save {ra_auth_code}`, instead of `.save {r12}`.
For PACBTI-M, the instruction which computes return address PAC should use SP
value before adjustment for the argument registers save are (used for variadic
functions and when a parameter is is split between stack and register), but at
the same it should be after the instruction that saves FPCXT when compiling a
CMSE entry function.
This patch moves the varargs SP adjustment after the FPCXT save (they are never
enabled at the same time), so in a following patch handling of the `PAC`
instruction can be placed between them.
Epilogue emission code adjusted in a similar manner.
PACBTI-M code generation should not emit any instructions for architectures
v6-m, v8-m.base, and for A- and R-class cores. Diagnostic message for such cases
is handled separately by a future ticket.
note on tail calls:
If the called function has four arguments that occupy registers `r0`-`r3`, the
only option for holding the function pointer itself is `r12`, but this register
is used to keep the PAC during function/prologue epilogue and clobbers the
function pointer.
When we do the tail call we need the five registers (`r0`-`r3` and `r12`) to
keep six values - the four function arguments, the function pointer and the PAC,
which is obviously impossible.
One option would be to authenticate the return address before all callee-saved
registers are restored, so we have a scratch register to temporarily keep the
value of `r12`. The issue with this approach is that it violates a fundamental
invariant that PAC is computed using CFA as a modifier. It would also mean using
separate instructions to pop `lr` and the rest of the callee-saved registers,
which would offset the advantages of doing a tail call.
Instead, this patch disables indirect tail calls when the called function take
four or more arguments and the return address sign and authentication is enabled
for the caller function, conservatively assuming the caller function would spill
LR.
This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension
The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:
https://developer.arm.com/documentation/ddi0553/latest
The following people contributed to this patch:
- Momchil Velikov
- Ties Stuij
Reviewed By: danielkiss
Differential Revision: https://reviews.llvm.org/D112429
This patch proposes to move emission of global variables, types,
imported entities, etc from DwarfDebug::beginModule() to DwarfDebug::endModule().
Effectively, this changes nothing but the order of debug entities which
will be as follows:
* subprograms (including related context, local variables/labels,
local imported entities; related types can be created as a part of
the emission of local entities of an abstract subprogram);
* global variables (including related context and types);
* retained types and enums;
* non-local-scoped imported entities;
* basic types;
* other types left (as a part of local variables attributes emission).
Note that the order of emitted compile units may also be changed as now we emit
units that contain subprograms first and then all other non-empty units.
The motivation behind this change is the following:
(1) DwarfDebug::beginModule() is run at the very beginning of backend's pipeline,
from this time IR can be significantly changed by target-specific passes.
If it happens for debug metadata of global entities, those changes will not
be reflected in the emitted DWARF.
(2) imported subprogram names should refer to an abstract subprogram if it exists,
but it isn't known in DwarfDebug::beginModule() (it's possible to make some
guesses based on location info, but it's not quite reliable);
(3) aforementioned entities if they are scoped within a bracketed block
(subject of D113741) couldn't be emitted in DwarfDebug::beginModule()
(they need parent emitted first). Another problem is if to try to gather
some information about local entities and defer their emission
(till subprogram's processing or DwarfDebug::endModule()) all the gathered
details might be irrelevant / invalid by the time the entities are being
emitted (because of (1)).
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D114705
As an extension to D111976, this converts clamp fptosi, clamped between
0 and (2^n)-1 to a fptoui.sat. This can greatly help on targets with
conversions that naturally saturate, such as Arm.
X86 disables the transform as some of the test cases increases in size.
A fptoui.sat necessitates a fp clamp without native support, so there is
little use in converting if the instruction is just going to be
expanded.
Differential Revision: https://reviews.llvm.org/D112428
This reverts commits
* ee691970a9a85470948ada623c31f0ab8773617c (D113741),
* 79d3132998b2828be8f7d2ec411f91fb11b3e01f (D114705)
due to lldb and dexter test failures.
This patch proposes to move emission of global variables, types,
imported entities, etc from DwarfDebug::beginModule() to DwarfDebug::endModule().
Effectively, this changes nothing but the order of debug entities which
will be as follows:
* subprograms (including related context, local variables/labels,
local imported entities; related types can be created as a part of
the emission of local entities of an abstract subprogram);
* global variables (including related context and types);
* retained types and enums;
* non-local-scoped imported entities;
* basic types;
* other types left (as a part of local variables attributes emission).
Note that the order of emitted compile units may also be changed as now we emit
units that contain subprograms first and then all other non-empty units.
The motivation behind this change is the following:
(1) DwarfDebug::beginModule() is run at the very beginning of backend's pipeline,
from this time IR can be significantly changed by target-specific passes.
If it happens for debug metadata of global entities, those changes will not
be reflected in the emitted DWARF.
(2) imported subprogram names should refer to an abstract subprogram if it exists,
but it isn't known in DwarfDebug::beginModule() (it's possible to make some
guesses based on location info, but it's not quite reliable);
(3) aforementioned entities if they are scoped within a bracketed block
(subject of D113741) couldn't be emitted in DwarfDebug::beginModule()
(they need parent emitted first). Another problem is if to try to gather
some information about local entities and defer their emission
(till subprogram's processing or DwarfDebug::endModule()) all the gathered
details might be irrelevant / invalid by the time the entities are being
emitted (because of (1)).
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D114705
Some instructions with i8 immediate ranges can only hold negative values
(like t2LDRHi8), only hold positive values (like t2STRT) or hold +/-
depending on the U bit (like the pre/post inc instructions. e.g
t2LDRH_POST). This patch splits the AddrModeT2_i8 into AddrModeT2_i8,
AddrModeT2_i8pos and AddrModeT2_i8neg to make this clear.
This allows us to get the offset ranges of t2LDRHi8 correct in the
load/store optimizer, fixing issues where we could end up creating
instructions with positive offsets (which may then be encoded as ldrht).
Differential Revision: https://reviews.llvm.org/D114638
Given a min(max(fptosi, INT_MIN), INT_MAX) with the correct constants,
we can now generate a fptosi.sat. But in the arm backend, the constant
can be treated as high cost, pulling it out of the basic block in a way
that the DAG combine can no longer see it. This teaches it again that it
is a low cost constant, not worth hoisting out.
Recommitted from 0e98659ea1193c with a fix for APInt comparison.
Differential Revision: https://reviews.llvm.org/D114380
This reverts commit 6d41de380f223c8da02fd4d6a7f7dd1e7a404a24 as the
windows bots are not happy, in a way I do not understand. Revert whilst
we figure out what is wrong.
This patch implements a new MachineFunction in the ARM backend for
placing BTI instructions. It is similar to the existing AArch64
aarch64-branch-targets pass.
BTI instructions are inserted into basic blocks that:
- Have their address taken
- Are the entry block of a function, if the function has external
linkage or has its address taken
- Are mentioned in jump tables
- Are exception/cleanup landing pads
Each BTI instructions is placed in the beginning of a BB after the
so-called meta instructions (e.g. exception handler labels).
Each outlining candidate and the outlined function need to be in agreement about
whether BTI placement is enabled or not. If branch target enforcement is
disabled for a function, the outliner should not covertly enable it by emitting
a call to an outlined function, which begins with BTI.
The cost mode of the outliner is adjusted to account for the extra BTI
instructions in the outlined function.
The ARM Constant Islands pass will maintain the count of the jump tables, which
reference a block. A `BTI` instruction is removed from a block only if the
reference count reaches zero.
PAC instructions in entry blocks are replaced with PACBTI instructions (tests
for this case will be added in a later patch because the compiler currently does
not generate PAC instructions).
The ARM Constant Island pass is adjusted to handle BTI
instructions correctly.
Functions with static linkage that don't have their address taken can
still be called indirectly by linker-generated veneers and thus their
entry points need be marked with BTI or PACBTI.
The changes are tested using "LLVM IR -> assembly" tests, jump tables
also have a MIR test. Unfortunately it is not possible add MIR tests
for exception handling and computed gotos because of MIR parser
limitations.
This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension
The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:
https://developer.arm.com/documentation/ddi0553/latest
The following people contributed to this patch:
- Mikhail Maltsev
- Momchil Velikov
- Ties Stuij
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D112426
Given a min(max(fptosi, INT_MIN), INT_MAX) with the correct constants,
we can now generate a fptosi.sat. But in the arm backend, the constant
can be treated as high cost, pulling it out of the basic block in a way
that the DAG combine can no longer see it. This teaches it again that it
is a low cost constant, not worth hoisting out.
Differential Revision: https://reviews.llvm.org/D114380
This adds a fold in DAGCombine to create fptosi_sat from sequences for
smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
to be handled similarly.
A shouldConvertFpToSat method was added to control when converting may
be profitable. The original fptosi will have a less strict semantics
than the fptosisat, with less values that need to produce defined
behaviour.
This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.
Differential Revision: https://reviews.llvm.org/D111976
It causes builds to fail with this assert:
llvm/include/llvm/ADT/APInt.h:990:
bool llvm::APInt::operator==(const llvm::APInt &) const:
Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed.
See comment on the code review.
> This adds a fold in DAGCombine to create fptosi_sat from sequences for
> smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
> the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
> it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
> ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
> to be handled similarly.
>
> A shouldConvertFpToSat method was added to control when converting may
> be profitable. The original fptosi will have a less strict semantics
> than the fptosisat, with less values that need to produce defined
> behaviour.
>
> This especially helps on ARM/AArch64 where the vcvt instructions
> naturally saturate the result.
>
> Differential Revision: https://reviews.llvm.org/D111976
This reverts commit 52ff3b009388f1bef4854f1b6470b4ec19d10b0e.
This adds a fold in DAGCombine to create fptosi_sat from sequences for
smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
to be handled similarly.
A shouldConvertFpToSat method was added to control when converting may
be profitable. The original fptosi will have a less strict semantics
than the fptosisat, with less values that need to produce defined
behaviour.
This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.
Differential Revision: https://reviews.llvm.org/D111976
We can't use the existing pseudo ARM::tLDRLIT_ga_pcrel for loading the
stack guard for PIC code that references the GOT, since arm-pseudo may
expand this to the narrow tLDRpci rather than the wider t2LDRpci.
Create a new pseudo, t2LDRLIT_ga_pcrel, and expand it to t2LDRpci.
Fixes: https://bugs.chromium.org/p/chromium/issues/detail?id=1270361
Reviewed By: ardb
Differential Revision: https://reviews.llvm.org/D114762
Currently we create register mappings for registers used only once in current
MBB. For registers with multiple uses, when all the uses are in the current MBB,
we can also create mappings for them similarly according to the last use.
For example
%reg101 = ...
= ... reg101
%reg103 = ADD %reg101, %reg102
We can create mapping between %reg101 and %reg103.
Differential Revision: https://reviews.llvm.org/D113193
[NFC] As part of using inclusive language within the llvm project, this patch
replaces master with main in `2007-04-02-RegScavengerAssert.ll`.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D114276
Patch to fix some of the regressions in D77804.
By folding to rotate/funnel-shift by constant amounts for illegal types, we prevent SimplifyDemandedBits from destroying the patterns prematurely, allowing us to use the rotate/funnel-shift legalization that was added in D112443.
Differential Revision: https://reviews.llvm.org/D113192
Implement support for loading the stack canary from a memory location held in
the TLS register, with an optional offset applied. This is used by the Linux
kernel to implement per-task stack canaries, which is impossible on SMP systems
when using a global variable for the stack canary.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D112768
Currently, LOAD_STACK_GUARD on ARM is only implemented for Mach-O targets, and
other targets rely on the generic support which may result in spilling of the
stack canary value or address, or may cause it to be kept in a callee save
register across function calls, which means they essentially get spilled as
well, only by the callee when it wants to free up this register.
So let's implement LOAD_STACK GUARD for other targets as well. This ensures
that the load of the stack canary is rematerialized fully in the epilogue.
This code was split off from
D112768: [ARM] implement support for TLS register based stack protector
for which it is a prerequisite.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D112811
In TwoAddressInstructionPass::processTiedPairs with
-early-live-intervals, update any preexisting physreg live intervals,
as well as virtreg live intervals. By default (without
-precompute-phys-liveness) physreg live intervals only exist for
registers that are live-in to some basic block.
Differential Revision: https://reviews.llvm.org/D113191
In TwoAddressInstructionPass::processTiedPairs with
-early-live-intervals, update any preexisting physreg live intervals,
as well as virtreg live intervals. By default (without
-precompute-phys-liveness) physreg live intervals only exist for
registers that are live-in to some basic block.
Differential Revision: https://reviews.llvm.org/D113191
Currently when tail predicating loops, vpt blocks need to be created
with the vctp predicate in case we need to revert to non-tail predicated
form. This has the unfortunate side effect of severely hampering post-ra
scheduling at times as the instructions are already stuck in vpt blocks,
not allowed to be independently ordered.
This patch addresses that by just moving the creation of VPT blocks
later in the pipeline, after post-ra scheduling has been performed. This
allows more optimal scheduling post-ra before the vpt blocks are
created, leading to more optimal tail predicated loops.
Differential Revision: https://reviews.llvm.org/D113094
If the type of a funnel shift needs to be expanded, expand it to two funnel shifts instead of regular shifts. For constant shifts, this doesn't make much difference, but for variable shifts it allows a more optimal lowering.
Also use the optimized funnel shift lowering for rotates.
Alive2: https://alive2.llvm.org/ce/z/TvHDB- / https://alive2.llvm.org/ce/z/yzPept
(Branched from D108058 as getting this completed should help unlock some other WIP patches).
Original Patch: @efriedma (Eli Friedman)
Differential Revision: https://reviews.llvm.org/D112443
This is relanding commit da1d1a08694bbfe0ea7a23ea094612436e8a2dd0 .
This patch additionally addresses failures found in buildbots & post review comments.
ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup.
It will call the UnwindResume.
__cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called.
This will trigger a termination when a foreign exception is processed while UnwindResume is called
because the global state will be wrong due to the missing __cxa_end_cleanup call.
Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions
Reviewed By: logan
Differential Revision: https://reviews.llvm.org/D111703
This is relanding commit da1d1a08694bbfe0ea7a23ea094612436e8a2dd0 .
This patch additionally addresses failures found in buildbots & post review comments.
ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup.
It will call the UnwindResume.
__cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called.
This will trigger a termination when a foreign exception is processed while UnwindResume is called
because the global state will be wrong due to the missing __cxa_end_cleanup call.
Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions
Reviewed By: logan
Differential Revision: https://reviews.llvm.org/D111703
In ARM mode, passing -mtp=cp15 forces the use of an inline MRC system register read to move the thread pointer value into a register.
Currently, in Thumb2 mode, -mtp=cp15 is ignored, and a call to the __aeabi_read_tp helper is emitted instead.
This is inconsistent, and breaks the Linux/ARM build for Thumb2 targets, as the Linux kernel does not provide an implementation of __aeabi_read_tp,.
Reviewed By: nickdesaulniers, peter.smith
Differential Revision: https://reviews.llvm.org/D112600
ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup.
It will call the UnwindResume.
__cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called.
This will trigger a termination when a foreign exception is processed while UnwindResume is called
because the global state will be wrong due to the missing __cxa_end_cleanup call.
Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions
Reviewed By: logan
Differential Revision: https://reviews.llvm.org/D111703