4493 Commits

Author SHA1 Message Date
Julian Lettner
22570bac69 Lower @llvm.global_dtors using __cxa_atexit on MachO
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.

Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.

Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`).  This escape hatch will be
removed in the future.

Differential Revision: https://reviews.llvm.org/D121736
2022-03-17 10:47:13 -07:00
Arthur Eubanks
2371c5a0e0 [OpaquePtr][ARM] Use elementtype on ldrex/ldaex/stlex/strex
Includes verifier changes checking the elementtype, clang codegen
changes to emit the elementtype, and ISel changes using the elementtype.

Basically the same as D120527.

Reviewed By: #opaque-pointers, nikic

Differential Revision: https://reviews.llvm.org/D121847
2022-03-16 14:11:53 -07:00
Simon Pilgrim
7262eacd41 Revert rG9c542a5a4e1ba36c24e48185712779df52b7f7a6 "Lower @llvm.global_dtors using __cxa_atexit on MachO"
Mane of the build bots are complaining: Unknown command line argument '-lower-global-dtors'
2022-03-15 13:01:35 +00:00
Julian Lettner
9c542a5a4e Lower @llvm.global_dtors using __cxa_atexit on MachO
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.

Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.

Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`).  This escape hatch will be
removed in the future.

Differential Revision: https://reviews.llvm.org/D121327
2022-03-14 17:51:18 -07:00
Zhiyao Ma
adc26b4eae [ARM] Fix 8-bit immediate overflow in the instruction of segmented stack prologue.
It fixes the overflow of 8-bit immediate field in the emitted
instruction that allocates large stacklet.

For thumb2 targets, load large immediate by a pair of movw and movt
instruction. For thumb1 and ARM targets, load large immediate by reading
from literal pool.

Differential Revision: https://reviews.llvm.org/D118545
2022-03-10 15:15:24 -08:00
Xiang1 Zhang
c31014322c TLS loads opimization (hoist)
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D120000
2022-03-10 09:29:06 +08:00
Sanjay Patel
341623653d [SDAG] match rotate pattern with extra 'or' operation
This is another fold generalized from D111530.
We can find a common source for a rotate operation hidden inside an 'or':
https://alive2.llvm.org/ce/z/9pV8hn

Deciding when this is profitable vs. a funnel-shift is tricky, but this
does not show any regressions: if a target has a rotate but it does not
have a funnel-shift, then try to form the rotate here. That is why we
don't have x86 test diffs for the scalar tests that are duplicated from
AArch64 ( 74a65e3834d9487 ) - shld/shrd are available. That also makes it
difficult to show vector diffs - the only case where I found a diff was
on x86 AVX512 or XOP with i64 elements.

There's an additional check for a legal type to avoid a problem seen
with x86-32 where we form a 64-bit rotate but then it gets split
inefficiently. We might avoid that by adding more rotate folds, but
I didn't check to see what is missing on that path.

This gets most of the motivating patterns for AArch64 / ARM that are in
D111530.

We still need a couple of enhancements to setcc pattern matching with
rotate/funnel-shift to get the rest.

Differential Revision: https://reviews.llvm.org/D120933
2022-03-09 13:19:00 -05:00
Craig Topper
8e132c5c1d [LegalizeTypes][ARM][X86] Change ExpandIntRes_ABS to use sra+xor+sub.
Previously we used sra+add+xor if ADDCARRY is supported. This changes
to sra+xor+sub is SUBCARRY is available.

This is consistent with the recent change to the default expansion
in LegalizeDAG.

Differential Revision: https://reviews.llvm.org/D121039
2022-03-07 11:28:32 -08:00
Xiang1 Zhang
65588a0776 Revert "TLS loads opimization (hoist)"
Revert for more reviews

This reverts commit 30e612ebdfb0f243eb63d93487790a53c26ae873.
2022-03-02 14:10:11 +08:00
Xiang1 Zhang
30e612ebdf TLS loads opimization (hoist)
Reviewed By: Wang Pheobe, Topper Craig

Differential Revision: https://reviews.llvm.org/D120000
2022-03-02 10:37:24 +08:00
Sanjay Patel
acb96ffd14 [SDAG] fold bitwise logic with shifted operands
LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y) --> LOGIC (SH (LOGIC X0, X1), Y), Z

https://alive2.llvm.org/ce/z/QmR9rR

This is a reassociation + factoring fold. The common shift operation is moved
after a bitwise logic op on 2 input operands.
We get simpler cases of these patterns in IR, but I suspect we would miss all
of these exact tests in IR too. We also handle the simpler form of this plus
several other folds in DAGCombiner::hoistLogicOpWithSameOpcodeHands().

This is a partial implementation of a transform suggested in D111530
(only handles 'or' bitwise logic as a first step - need to stamp out more
tests for other opcodes).
Several of the same tests added for D111530 are altered here (but not
fully optimized). I'm not sure yet if this would help/hinder that patch,
but this should be an improvement for all tests added with ecf606cb4329ae
since it removes a shift operation in those examples.

Differential Revision: https://reviews.llvm.org/D120516
2022-02-27 09:54:12 -05:00
David Green
a10789d6cd [ARM] Recognize SSAT and USAT from SMIN/SMAX
We have some recognition of SSAT and USAT from SELECT_CC at the moment.
This extends the matching to SMIN/SMAX which can help catch more cases,
either from min/max being the canonical form in instcombine or from some
expanded nodes like fp_to_si_sat.

Differential Revision: https://reviews.llvm.org/D119819
2022-02-23 08:55:54 +00:00
David Green
4d5b020d6e [ARM] Addition SSAT/USAT tests for min/max patterns. NFC 2022-02-21 16:24:58 +00:00
Craig Topper
a6fb1bb306 [ARM] Remove unused lowerABS function. NFC
This function was added in D49837, but no setOperationAction call
was added with it. The code is equivalent to what is done by the
default ExpandIntRes_ABS implementation when ADDCARRY is supported.
Test case added to verify this. There was some existing coverage
from Thumb2 MVE tests, but they started from vectors.
2022-02-20 22:43:23 -08:00
Jay Foad
f510045d82 [CodeGen] Remove unneeded regex escaping in FileCheck patterns. NFC.
Take advantage of D117117 to simplify all {{\[}} to [ and {{\]}} to ].

Differential Revision: https://reviews.llvm.org/D117298
2022-02-18 16:10:56 +00:00
Nikita Popov
ff040eca93 [FastISel] Reuse register for bitcast that does not change MVT
The current FastISel code reuses the register for a bitcast that
doesn't change the IR type, but uses a reg-to-reg copy if it
changes the IR type without changing the MVT. However, we can
simply reuse the register in that case as well.

In particular, this avoids unnecessary reg-to-reg copies for pointer
bitcasts. This was found while inspecting O0 codegen differences
between typed and opaque pointers.

Differential Revision: https://reviews.llvm.org/D119432
2022-02-14 09:13:17 +01:00
Nikita Popov
ff5a9c3c65 [CodeGen] Regenerate test checks (NFC) 2022-02-10 14:10:00 +01:00
Oliver Stannard
a76620143c [ARM] Patterns for vector conversion between half and float
These patterns were omitted because clang only allows converting between
these types using intrinsics, but other front-ends or optimisation
passes may want to use them.

Differential revision: https://reviews.llvm.org/D119354
2022-02-10 09:51:55 +00:00
Mark Murray
3d7662142d [ARM] Undeprecate complex IT blocks
AArch32/Armv8A  introduced the performance deprecation of certain patterns
of IT instructions.  After some debate internal to ARM, this is now being
reverted; i.e. no IT instruction patterns are performance deprecated
anymore, as the perfomance degredation is not significant enough.

This reverts the following:

"ARMv8-A deprecates some uses of the T32 IT instruction. All uses of
IT that apply to instructions other than a single subsequent 16-bit
instruction from a restricted set are deprecated, as are explicit
references to the PC within that single 16-bit instruction. This permits
the non-deprecated forms of IT and subsequent instructions to be treated
as a single 32-bit conditional instruction."

The deprecation no longer applies, but the behaviour may be controlled
by the -arm-restrict-it and -arm-no-restrict-it command-line options,
with the latter being the default. No warnings about complex IT blocks
will be generated.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D118044
2022-02-07 15:47:53 +00:00
Roman Lebedev
ee4ba9f3a1
Revert "[SimplifyCFG] Start redesigning FoldTwoEntryPHINode()."
Unfortunately, it seems we really do need to take the long route;
start from the "merge" block, find (all the) "dispatch" blocks,
and deal with each "dispatch" block separately, instead of simply
starting from each "dispatch" block like it would logically make sense,
otherwise we run into a number of other missing folds around
`switch` formation, missing sinking/hoisting and phase ordering.

This reverts commit 85628ce75b3084dc0f185a320152baf85b59aba7.
This reverts commit c5fff9095342a792bf4b9a077fe3c3a83c4e566c.
This reverts commit 34a98e1046e3aa55e5f26ab20a15e96b4034d25a.
This reverts commit 1e353f092288309d74d380367aa50bbd383780ed.
2022-02-03 12:32:50 +03:00
Roman Lebedev
1e353f0922
[SimplifyCFG] Start redesigning FoldTwoEntryPHINode().
The current `FoldTwoEntryPHINode()` is not quite designed correctly.
It starts from the merge point, and then tries to detect
the 'divergence' point.

Because of that, it is limited to the simple two-predecessor case,
where the PHI completely goes away. but that is rather pessimistic,
and it doesn't make much sense from the costmodel side of things.

For example if there is some other unrelated predecessor of
the merge point,  we could split the merge point so that
the then/else blocks first branch to an empty block
and then to the merge point, and then we'd be able to speculate
the then/else code.

But if we'd instead simply start at the divergence point,
and look for the merge point, then we'll just natively support this case.

There's also the fact that `SpeculativelyExecuteBB()` already does
just that, but only if there is a single block to speculate,
and with a much more restrictive cost model.
But that also means we have code duplication.

Now, sadly, while this is as much NFCI as possible,
there is just no way to cleanly migrate to
the proper implementation. The results *are* going to be different
somewhat because of various phase ordering effects and SimplifyCFG
block iteration strategy.
2022-02-02 17:53:56 +03:00
David Green
82973edfb7 [ARM][AArch64] Introduce qrdmlah and qrdmlsh intrinsics
Since it's introduction, the qrdmlah has been represented as a qrdmulh
and a sadd_sat. This doesn't produce the same result for all input
values though. This patch fixes that by introducing a qrdmlah (and
qrdmlsh) intrinsic specifically for the vqrdmlah and sqrdmlah
instructions. The old test cases will now produce a qrdmulh and sqadd,
as expected.

Fixes #53120 and #50905 and #51761.

Differential Revision: https://reviews.llvm.org/D117592
2022-01-27 19:19:46 +00:00
David Green
1fec2154b2 [ARM][AArch64] Cleanup and autogenerate v8.1a vqdrmlah tests. NFC 2022-01-27 18:43:06 +00:00
Simon Pilgrim
fdd3e2c943 [DAG] SelectionDAG::getNode(N1,N2) - detect N2 constant vector splats as well as scalars
We already perform some basic folds (add/sub with zero etc.) on scalar types, this patch adds some basic support for constant splats as well in a few cases (we can add more with future test coverage).

In the cases I've enabled, we can handle buildvector implicit truncation as we're not creating new constant nodes from the vector types - we're just returning existing nodes. This allows us to get a number of extra cases in the aarch64 tests.

I haven't enabled support for undefs in buildvector splats, as we're often checking for zero/allones patterns that return the original constant and we shouldn't be returning undef elements in some of these cases - we can enable this later if we're OK with creating new constants.

Differential Revision: https://reviews.llvm.org/D118264
2022-01-27 10:59:08 +00:00
David Green
57356d6bb7 [DAG] Create fptoui.sat from clamped fptoui
This is the unsigned variant of D111976, where we convert a clamped
fptoui to a fptoui.sat. Because we are unsigned, the condition this time
is only UMIN of UINT_MAX. Similarly to D111976 it handles ISD::UMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes.

This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.

Differential Revision: https://reviews.llvm.org/D114964
2022-01-26 08:37:44 +00:00
Bjorn Pettersson
46cacdbb21 [DAGCombiner] Adjust some checks in DAGCombiner::reduceLoadWidth
In code review for D117104 two slightly weird checks were found
in DAGCombiner::reduceLoadWidth. They were typically checking
if BitsA was a mulitple of BitsB by looking at (BitsA & (BitsB - 1)),
but such a comparison actually only make sense if BitsB is a power
of two.

The checks were related to the code that attempted to shrink a load
based on the fact that the loaded value would be right shifted.

Afaict the legality of the value types is checked later (typically in
isLegalNarrowLdSt), so the existing checks were both overly
conservative as well as being wrong whenever ExtVTBits wasn't a
power of two. The latter was a situation triggered by a number of
lit tests so we could not just assert on ExtVTBIts being a power of
two).

When attempting to simply remove the checks I found some problems,
that seems to have been guarded by the checks (maybe just out of
luck). A typical example would be a pattern like this:

  t1 = load i96* ptr
  t2 = srl t1, 64
  t3 = truncate t2 to i64

When DAGCombine is visiting the truncate reduceLoadWidth is called
attempting to narrow the load to 64 bits (ExtVT := MVT::i64). Then
the SRL is detected and we set ShAmt to 64.

In the past we've bailed out due to i96 not being a multiple of 64.
If we simply remove that check then we would end up replacing the
load with a new load that would read 64 bits but with a base pointer
adjusted by 64 bits. So we would read 32 bits the wasn't accessed by
the original load.
This patch will instead utilize the fact that the logical left shift
can be folded away by using a zextload. Thus, the pattern above will
now be combined into

  t3 = load i32* ptr+offset, zext to i64


Another case is shown in the X86/shift-folding.ll test case:

  t1 = load i32* ptr
  t2 = srl i32 t1, 8
  t3 = truncate t2 to i16

In the past we bailed out due to the shift count (8) not being a
multiple of 16. Now the narrowing kicks in and we get

  t3 = load i16* ptr+offset

Differential Revision: https://reviews.llvm.org/D117406
2022-01-24 12:22:04 +01:00
Fangrui Song
e6cdef187e [XRay][test] Clean up llc RUN lines 2022-01-21 17:00:03 -08:00
Mircea Trofin
e67430cca4 [MLGO] ML Regalloc Eviction Advisor
The bulk of the implementation is common between 'release' mode (==AOT-ed
model) and 'development' mode (for training), the main difference is
that in development mode, we may also log features (for training logs),
inject scoring information (currently after the Virtual Register
Rewriter) and then produce the log file.

This patch also introduces the score injection pass, 'Register
Allocation Pass Scoring', which is trivially just logging the score in
development mode.

Differential Revision: https://reviews.llvm.org/D117147
2022-01-19 11:00:32 -08:00
David Green
351edf1c47 [ARM] Remove FeaturePerfMon from armv7-m
FeaturePerfMon relates to the PMU extensions available in armv7-a, and
should not be available in v7-m (it requires loading from a system
register with a mrc). Sink it down a level in the dependency map so that
it isn't present in ARMv7m or HasV8MMainlineOps.

It is also removed from the Neoverse-N2, as it will already be
transitively included.

Differential Revision: https://reviews.llvm.org/D117022
2022-01-12 09:44:53 +00:00
Nick Desaulniers
79ebc3b0dd [llvm][test] rewrite callbr to use i rather than X constraint NFC
In D115311, we're looking to modify clang to emit i constraints rather
than X constraints for callbr's indirect destinations. Prior to doing
so, update all of the existing tests in llvm/ to match.

Reviewed By: void, jyknight

Differential Revision: https://reviews.llvm.org/D115410
2022-01-11 11:31:08 -08:00
Tim Northover
0b5b35fdbd ARM: make FastISel & GISel pass -1 to ADJCALLSTACKUP to signal no callee pop.
The interface for these instructions changed with support for mandatory tail
calls, and now -1 indicates the CalleePopAmount argument is not valid.
Unfortunately I didn't realise FastISel or GISel did calls at the time so
didn't update them.
2022-01-11 11:31:13 +00:00
Nikita Popov
f430c1eb64 [Tests] Add elementtype attribute to indirect inline asm operands (NFC)
This updates LLVM tests for D116531 by adding elementtype attributes
to operands that correspond to indirect asm constraints.
2022-01-06 14:23:51 +01:00
David Green
2ec3ca7477 [ARM] Extend IsCMPZCSINC to handle CMOV
A 'CMOV 1, 0, CC, %cpsr, Cmp' is the same as a 'CSINC 0, 0, CC, Cmp',
and can be treated the same in IsCMPZCSINC added in D114013. This allows
us to remove the unnecessary CMOV in the same way that we could remove a
CSINC.

Differential Revision: https://reviews.llvm.org/D115188
2021-12-27 14:15:03 +00:00
Kristina Bessonova
81378f7e56 Revert "[DwarfDebug] Support emitting function-local declaration for a lexical block" & dependent patches
Try to revert D113741 once again.

This also reverts 0ac75e82fff93a80ca401d3db3541e8d1d9098f9 (D114705)
as it causes LLDB's lldb-api.lang/cpp/nsimport.TestCppNsImport.py test
failure w/o D113741.

This reverts commit f9607d45f399e2afc39ec16222ea68b4e0831564.

Differential Revision: https://reviews.llvm.org/D116225
2021-12-24 00:47:04 +02:00
David Green
26f6fbe2be [ARM] Add AddrModeT2_i8neg addressing mode support for frame lowering.
As reported from a failing firefox build, we can sometimes get frame
indices with negative offsets from a t2LDRi8. This adds support for
them, to prevent the crash.
2021-12-14 12:49:27 +00:00
Roman Lebedev
c1a36ba002
[DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask')
In most test changes this allows us to drop some broadcasts/shuffles.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D104156
2021-12-13 20:03:44 +03:00
Ties Stuij
63eb7ff47d [ARM] Implement PAC return address signing mechanism for PACBTI-M
This patch implements PAC return address signing for armv8-m. This patch roughly
accomplishes the following things:

- PAC and AUT instructions are generated.
- They're part of the stack frame setup, so that shrink-wrapping can move them
inwards to cover only part of a function
- The auth code generated by PAC is saved across subroutine calls so that AUT
can find it again to check
- PAC is emitted before stacking registers (so that the SP it signs is the one
on function entry).
- The new pseudo-register ra_auth_code is mentioned in the DWARF frame data
- With CMSE also in use: PAC is emitted before stacking FPCXTNS, and AUT
validates the corresponding value of SP
- Emit correct unwind information when PAC is replaced by PACBTI
- Handle tail calls correctly

Some notes:

We make the assembler accept the `.save {ra_auth_code}` directive that is
emitted by the compiler when it saves a register that contains a
return address authentication code.

For EHABI we need to have the `FrameSetup` flag on the instruction and
handle the `t2PACBTI` opcode (identically to `t2PAC`), so we can emit
`.save {ra_auth_code}`, instead of `.save {r12}`.

For PACBTI-M, the instruction which computes return address PAC should use SP
value before adjustment for the argument registers save are (used for variadic
functions and when a parameter is is split between stack and register), but at
the same it should be after the instruction that saves FPCXT when compiling a
CMSE entry function.

This patch moves the varargs SP adjustment after the FPCXT save (they are never
enabled at the same time), so in a following patch handling of the `PAC`
instruction can be placed between them.

Epilogue emission code adjusted in a similar manner.

PACBTI-M code generation should not emit any instructions for architectures
v6-m, v8-m.base, and for A- and R-class cores. Diagnostic message for such cases
is handled separately by a future ticket.

note on tail calls:

If the called function has four arguments that occupy registers `r0`-`r3`, the
only option for holding the function pointer itself is `r12`, but this register
is used to keep the PAC during function/prologue epilogue and clobbers the
function pointer.

When we do the tail call we need the five registers (`r0`-`r3` and `r12`) to
keep six values - the four function arguments, the function pointer and the PAC,
which is obviously impossible.

One option would be to authenticate the return address before all callee-saved
registers are restored, so we have a scratch register to temporarily keep the
value of `r12`. The issue with this approach is that it violates a fundamental
invariant that PAC is computed using CFA as a modifier. It would also mean using
separate instructions to pop `lr` and the rest of the callee-saved registers,
which would offset the advantages of doing a tail call.

Instead, this patch disables indirect tail calls when the called function take
four or more arguments and the return address sign and authentication is enabled
for the caller function, conservatively assuming the caller function would spill
LR.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Momchil Velikov
- Ties Stuij

Reviewed By: danielkiss

Differential Revision: https://reviews.llvm.org/D112429
2021-12-07 10:15:19 +00:00
Ties Stuij
0fbb17458a [ARM] Implement setjmp BTI placement for PACBTI-M
This patch intends to guard indirect branches performed by longjmp
by inserting BTI instructions after calls to setjmp.

Calls with 'returns-twice' are lowered to a new pseudo-instruction
named t2CALL_BTI that is later expanded to a bundle of {tBL,t2BTI}.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Alexandros Lamprineas
- Ties Stuij

Reviewed By: labrinea

Differential Revision: https://reviews.llvm.org/D112427
2021-12-06 11:07:10 +00:00
Kristina Bessonova
0ac75e82ff Reland [DwarfDebug] Move emission of global vars, types and imports to endModule()
This patch proposes to move emission of global variables, types,
imported entities, etc from DwarfDebug::beginModule() to DwarfDebug::endModule().
Effectively, this changes nothing but the order of debug entities which
will be as follows:
* subprograms (including related context, local variables/labels,
  local imported entities; related types can be created as a part of
  the emission of local entities of an abstract subprogram);
* global variables (including related context and types);
* retained types and enums;
* non-local-scoped imported entities;
* basic types;
* other types left (as a part of local variables attributes emission).

Note that the order of emitted compile units may also be changed as now we emit
units that contain subprograms first and then all other non-empty units.

The motivation behind this change is the following:
(1) DwarfDebug::beginModule() is run at the very beginning of backend's pipeline,
    from this time IR can be significantly changed by target-specific passes.
    If it happens for debug metadata of global entities, those changes will not
    be reflected in the emitted DWARF.
(2) imported subprogram names should refer to an abstract subprogram if it exists,
    but it isn't known in DwarfDebug::beginModule() (it's possible to make some
    guesses based on location info, but it's not quite reliable);
(3) aforementioned entities if they are scoped within a bracketed block
    (subject of D113741) couldn't be emitted in DwarfDebug::beginModule()
    (they need parent emitted first). Another problem is if to try to gather
    some information about local entities and defer their emission
    (till subprogram's processing or DwarfDebug::endModule()) all the gathered
    details might be irrelevant / invalid by the time the entities are being
    emitted (because of (1)).

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D114705
2021-12-05 13:56:45 +02:00
David Green
57ff805a6d [DAG] Create fptoui.sat from clamped fptosi
As an extension to D111976, this converts clamp fptosi, clamped between
0 and (2^n)-1 to a fptoui.sat. This can greatly help on targets with
conversions that naturally saturate, such as Arm.

X86 disables the transform as some of the test cases increases in size.
A fptoui.sat necessitates a fp clamp without native support, so there is
little use in converting if the instruction is just going to be
expanded.

Differential Revision: https://reviews.llvm.org/D112428
2021-12-05 09:25:52 +00:00
Kristina Bessonova
a961604819 Revert "[DwarfDebug] Support emitting function-local declaration for a lexical block"
This reverts commits
* ee691970a9a85470948ada623c31f0ab8773617c (D113741),
* 79d3132998b2828be8f7d2ec411f91fb11b3e01f (D114705)

due to lldb and dexter test failures.
2021-12-04 18:06:57 +02:00
Kristina Bessonova
79d3132998 [DwarfDebug] Move emission of global vars, types and imports to endModule()
This patch proposes to move emission of global variables, types,
imported entities, etc from DwarfDebug::beginModule() to DwarfDebug::endModule().
Effectively, this changes nothing but the order of debug entities which
will be as follows:
* subprograms (including related context, local variables/labels,
  local imported entities; related types can be created as a part of
  the emission of local entities of an abstract subprogram);
* global variables (including related context and types);
* retained types and enums;
* non-local-scoped imported entities;
* basic types;
* other types left (as a part of local variables attributes emission).

Note that the order of emitted compile units may also be changed as now we emit
units that contain subprograms first and then all other non-empty units.

The motivation behind this change is the following:
(1) DwarfDebug::beginModule() is run at the very beginning of backend's pipeline,
    from this time IR can be significantly changed by target-specific passes.
    If it happens for debug metadata of global entities, those changes will not
    be reflected in the emitted DWARF.
(2) imported subprogram names should refer to an abstract subprogram if it exists,
    but it isn't known in DwarfDebug::beginModule() (it's possible to make some
    guesses based on location info, but it's not quite reliable);
(3) aforementioned entities if they are scoped within a bracketed block
    (subject of D113741) couldn't be emitted in DwarfDebug::beginModule()
    (they need parent emitted first). Another problem is if to try to gather
    some information about local entities and defer their emission
    (till subprogram's processing or DwarfDebug::endModule()) all the gathered
    details might be irrelevant / invalid by the time the entities are being
    emitted (because of (1)).

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D114705
2021-12-04 14:10:01 +02:00
David Green
b8f1ccb0ac [ARM] Introduce i8neg and i8pos addressing modes
Some instructions with i8 immediate ranges can only hold negative values
(like t2LDRHi8), only hold positive values (like t2STRT) or hold +/-
depending on the U bit (like the pre/post inc instructions. e.g
t2LDRH_POST). This patch splits the AddrModeT2_i8 into AddrModeT2_i8,
AddrModeT2_i8pos and AddrModeT2_i8neg to make this clear.

This allows us to get the offset ranges of t2LDRHi8 correct in the
load/store optimizer, fixing issues where we could end up creating
instructions with positive offsets (which may then be encoded as ldrht).

Differential Revision: https://reviews.llvm.org/D114638
2021-12-02 17:10:26 +00:00
David Green
646c872f9d [ARM] Teach getIntImmCostInst about the cost of saturating fp converts
Given a min(max(fptosi, INT_MIN), INT_MAX) with the correct constants,
we can now generate a fptosi.sat. But in the arm backend, the constant
can be treated as high cost, pulling it out of the basic block in a way
that the DAG combine can no longer see it. This teaches it again that it
is a low cost constant, not worth hoisting out.

Recommitted from 0e98659ea1193c with a fix for APInt comparison.

Differential Revision: https://reviews.llvm.org/D114380
2021-12-02 07:56:27 +00:00
David Green
13e66c070b Revert "[ARM] Teach getIntImmCostInst about the cost of saturating fp converts"
This reverts commit 6d41de380f223c8da02fd4d6a7f7dd1e7a404a24 as the
windows bots are not happy, in a way I do not understand. Revert whilst
we figure out what is wrong.
2021-12-01 15:25:19 +00:00
Ties Stuij
f5f28d5b0c [ARM] Implement BTI placement pass for PACBTI-M
This patch implements a new MachineFunction in the ARM backend for
placing BTI instructions. It is similar to the existing AArch64
aarch64-branch-targets pass.

BTI instructions are inserted into basic blocks that:
- Have their address taken
- Are the entry block of a function, if the function has external
  linkage or has its address taken
- Are mentioned in jump tables
- Are exception/cleanup landing pads

Each BTI instructions is placed in the beginning of a BB after the
so-called meta instructions (e.g. exception handler labels).

Each outlining candidate and the outlined function need to be in agreement about
whether BTI placement is enabled or not. If branch target enforcement is
disabled for a function, the outliner should not covertly enable it by emitting
a call to an outlined function, which begins with BTI.

The cost mode of the outliner is adjusted to account for the extra BTI
instructions in the outlined function.

The ARM Constant Islands pass will maintain the count of the jump tables, which
reference a block. A `BTI` instruction is removed from a block only if the
reference count reaches zero.

PAC instructions in entry blocks are replaced with PACBTI instructions (tests
for this case will be added in a later patch because the compiler currently does
not generate PAC instructions).

The ARM Constant Island pass is adjusted to handle BTI
instructions correctly.

Functions with static linkage that don't have their address taken can
still be called indirectly by linker-generated veneers and thus their
entry points need be marked with BTI or PACBTI.

The changes are tested using "LLVM IR -> assembly" tests, jump tables
also have a MIR test. Unfortunately it is not possible add MIR tests
for exception handling and computed gotos because of MIR parser
limitations.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Mikhail Maltsev
- Momchil Velikov
- Ties Stuij

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D112426
2021-12-01 12:54:05 +00:00
David Green
0e98659ea1 [ARM] Strengthen fpclamptosat.ll triple to attempt to fix buildbot errors. NFC 2021-12-01 11:11:09 +00:00
Ties Stuij
b430782be3 [ARM] emit PACBTI-M build attributes
This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Victor Campos
- Ties Stuij

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D112425
2021-12-01 11:05:29 +00:00
David Green
6d41de380f [ARM] Teach getIntImmCostInst about the cost of saturating fp converts
Given a min(max(fptosi, INT_MIN), INT_MAX) with the correct constants,
we can now generate a fptosi.sat. But in the arm backend, the constant
can be treated as high cost, pulling it out of the basic block in a way
that the DAG combine can no longer see it. This teaches it again that it
is a low cost constant, not worth hoisting out.

Differential Revision: https://reviews.llvm.org/D114380
2021-12-01 10:25:52 +00:00
Mircea Trofin
520f641877 [test] Avoid dumping .o in source tree (expand-pseudos.ll)
Piping the input to llc avoids that (i.e. llc .... < %s vs llc ... %s)
2021-11-30 16:56:53 -08:00