4575 Commits

Author SHA1 Message Date
David Green
4704da1374 [ARM] Fix Thumb2 compare being emitted ExpandCMP_SWAP
Given a patch like D129506, using instructions not valid for the current
target feature set becomes an error. This fixes an issue in
ARMExpandPseudo::ExpandCMP_SWAP where Thumb2 compares were used in
Thumb1Only code, such as thumbv8m.baseline targets.

Differential Revision: https://reviews.llvm.org/D129695
2022-07-20 12:04:22 +01:00
Simon Pilgrim
9fc347aa4e [DAG] PromoteIntRes_BUILD_VECTOR - extend constant boolean vectors according to target BooleanContents
PromoteIntRes_BUILD_VECTOR currently always ANY_EXTENDs build vector operands, but if this is a constant boolean vector we're losing the useful ability to keep the vector matching the BooleanContents mode used by the target.

This patch extends constant boolean vectors according to target BooleanContents, allowing a number of additional all-bits folds (notable XOR -> NOT conversions) to occur.

Differential Revision: https://reviews.llvm.org/D129641
2022-07-20 10:49:31 +01:00
David Green
e22576455f [ARM] Update atomic tests for D129695. NFC 2022-07-19 19:36:08 +01:00
Simon Pilgrim
d8888e14a0 Revert rG14364200821f7b2d97edf6e78160c514800d3ec6 "[ARM] Regenerate reg_sequence.ll test checks"
Breaks on some apple machines
2022-07-16 17:32:58 +01:00
Simon Pilgrim
1436420082 [ARM] Regenerate reg_sequence.ll test checks 2022-07-16 17:10:35 +01:00
Simon Pilgrim
a5d0122f75 [DAG] Canonicalize non-inlane shuffle -> AND if all non-inlane referenced elements are known zero
As mentioned on D127115, this patch that attempts to recognise shuffle masks that could be simplified to a AND mask - we already have a similar transform that will fold AND -> 'clear mask' shuffle, but this patch handles cases where the referenced elements are not from the same lane indices but are known to be zero.

Differential Revision: https://reviews.llvm.org/D129150
2022-07-16 11:38:24 +01:00
Simon Pilgrim
8dbf9e8d02 [ARM] Regenerate pr36577.ll test checks 2022-07-15 13:54:17 +01:00
Simon Pilgrim
0a9752d937 [ARM] Regenerate hoist-and-by-const-from-shl-in-eqcmp-zero.ll
cleanup some common prefixes
2022-07-15 11:28:41 +01:00
Nikita Popov
2a721374ae [IR] Don't use blockaddresses as callbr arguments
Following some recent discussions, this changes the representation
of callbrs in IR. The current blockaddress arguments are replaced
with `!` label constraints that refer directly to callbr indirect
destinations:

    ; Before:
    %res = callbr i8* asm "", "=r,r,i"(i8* %x, i8* blockaddress(@test8, %foo))
    to label %asm.fallthrough [label %foo]
    ; After:
    %res = callbr i8* asm "", "=r,r,!i"(i8* %x)
    to label %asm.fallthrough [label %foo]

The benefit of this is that we can easily update the successors of
a callbr, without having to worry about also updating blockaddress
references. This should allow us to remove some limitations:

* Allow unrolling/peeling/rotation of callbr, or any other
  clone-based optimizations
  (https://github.com/llvm/llvm-project/issues/41834)
* Allow duplicate successors
  (https://github.com/llvm/llvm-project/issues/45248)

This is just the IR representation change though, I will follow up
with patches to remove limtations in various transformation passes
that are no longer needed.

Differential Revision: https://reviews.llvm.org/D129288
2022-07-15 10:18:17 +02:00
Simon Pilgrim
5271fcd8fc [ARM] Regenerate select_xform.ll test checks 2022-07-13 13:52:15 +01:00
Sanjay Patel
8b75671314 [SDAG] try to replace subtract-from-constant with xor
This is almost the same as the abandoned D48529, but it
allows splat vector constants too.

This replaces the x86-specific code that was added with
the alternate patch D48557 with the original generic
combine.

This transform is a less restricted form of an existing
InstCombine and the proposed SDAG equivalent for that
in D128080:
https://alive2.llvm.org/ce/z/OUm6N_

Differential Revision: https://reviews.llvm.org/D128123
2022-07-08 08:14:24 -04:00
Archibald Elliott
1666f09933 [ARM] Add Support for Cortex-M85
This patch adds support for Arm's Cortex-M85 CPU. The Cortex-M85 CPU is
an Arm v8.1m Mainline CPU, with optional support for MVE and PACBTI,
both of which are enabled by default.

Parts have been coauthored by by Mark Murray, Alexandros Lamprineas and
David Green.

Differential Revision: https://reviews.llvm.org/D128415
2022-07-05 10:43:31 +01:00
David Green
dee59f7a9e [ARM] Add Thumb-1 CTTZ codegen tests. NFC 2022-06-30 16:45:00 +01:00
Tim Northover
4aafebce52 SelectionDAG: allow FP extensions when folding extract/insert.
Before, we were trying to sign extend half -> float, and asserted in getNode.
2022-06-28 12:08:35 +01:00
Lucas Prates
70a5c52534 [ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records
Currently the a AAPCS compliant frame record is not always created for
functions when it should. Although a consistent frame record might not
be required in some cases, there are still scenarios where applications
may want to make use of the call hierarchy made available trough it.

In order to enable the use of AAPCS compliant frame records whilst keep
backwards compatibility, this patch introduces a new command-line option
(`-mframe-chain=[none|aapcs|aapcs+leaf]`) for Aarch32 and Thumb backends.
The option allows users to explicitly select when to use it, and is also
useful to ensure the extra overhead introduced by the frame records is
only introduced when necessary, in particular for Thumb targets.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D125094
2022-06-27 14:08:48 +01:00
Tim Northover
69ae441e4c ARM: don't try to load function pointer before long call.
Deciding to load an arbitrary global based on whether the entire module is
being built for long calls is pretty clearly spurious, and in fact the existing
indirect logic is sufficient.
2022-06-27 13:59:35 +01:00
Martin Sebor
b19194c032 [InstCombine] handle subobjects of constant aggregates
Remove the known limitation of the library function call folders to only
work with top-level arrays of characters (as per the TODO comment in
the code) and allows them to also fold calls involving subobjects of
constant aggregates such as member arrays.
2022-06-21 11:55:14 -06:00
Simon Pilgrim
e4a124dda5 [DAG] Fold (srl (shl x, c1), c2) -> and(shl/srl(x, c3), m)
Similar to the existing (shl (srl x, c1), c2) fold

Part of the work to fix the regressions in D77804

Differential Revision: https://reviews.llvm.org/D125836
2022-06-20 08:37:38 +01:00
Craig Topper
314dbde12c [DAGCombiner][ARM][RISCV] Teach ShrinkLoadReplaceStoreWithStore to use truncstore.
The VT we want to shrink to may not be legal especially after type
legalization.

Fixes PR56110.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D128135
2022-06-19 15:50:15 -07:00
Krasimir Georgiev
8f2ba36336 Revert "[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records AND [NFC][Thumb] Update frame-chain codegen test to use thumbv6m"
This reverts commit 7625e01d661644a560884057755d48a0da8b77b4 and
dependent cbcce82ef6b512d97e92a319a75a03e997c844e1.

Commit 7625e01d661644a560884057755d48a0da8b77b4 causes some new codegen test
failures under asan, e.g., CodeGen/ARM/execute-only.ll:
https://lab.llvm.org/buildbot/#/builders/5/builds/24659/steps/15/logs/stdio.
2022-06-15 16:10:02 +02:00
Lucas Prates
7625e01d66 [ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records
Currently the a AAPCS compliant frame record is not always created for
functions when it should. Although a consistent frame record might not
be required in some cases, there are still scenarios where applications
may want to make use of the call hierarchy made available trough it.

In order to enable the use of AAPCS compliant frame records whilst keep
backwards compatibility, this patch introduces a new command-line option
(`-mframe-chain=[none|aapcs|aapcs+leaf]`) for Aarch32 and Thumb backends.
The option allows users to explicitly select when to use it, and is also
useful to ensure the extra overhead introduced by the frame records is
only introduced when necessary, in particular for Thumb targets.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D125094
2022-06-14 13:37:51 +01:00
Simon Pilgrim
7d8fd4f5db [DAG] visitINSERT_VECTOR_ELT - attempt to reconstruct BUILD_VECTOR before other fold interfere
Another issue unearthed by D127115

We take a long time to canonicalize an insert_vector_elt chain before being able to convert it into a build_vector - even if they are already in ascending insertion order, we fold the nodes one at a time into the build_vector 'seed', leaving plenty of time for other folds to alter it (in particular recognising when they come from extract_vector_elt resulting in a shuffle_vector that is much harder to fold with).

D127115 makes this particularly difficult as we're almost guaranteed to have the lost the sequence before all possible insertions have been folded.

This patch proposes to begin at the last insertion and attempt to collect all the (oneuse) insertions right away and create the build_vector before its too late.

Differential Revision: https://reviews.llvm.org/D127595
2022-06-13 11:48:18 +01:00
Lucas Prates
33b9ad647e Revert "[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records"
Reverting change due to test failure.

This reverts commit 6119053dab67129eb1700dbf36db3524dd3e421f.
2022-06-13 11:00:49 +01:00
Lucas Prates
6119053dab [ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records
Currently the a AAPCS compliant frame record is not always created for
functions when it should. Although a consistent frame record might not
be required in some cases, there are still scenarios where applications
may want to make use of the call hierarchy made available trough it.

In order to enable the use of AAPCS compliant frame records whilst keep
backwards compatibility, this patch introduces a new command-line option
(`-mframe-chain=[none|aapcs|aapcs+leaf]`) for Aarch32 and Thumb backends.
The option allows users to explicitly select when to use it, and is also
useful to ensure the extra overhead introduced by the frame records is
only introduced when necessary, in particular for Thumb targets.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D125094
2022-06-13 10:21:06 +01:00
Simon Pilgrim
44a0cd25df [DAG] visitINSERT_VECTOR_ELT - add <1 x ???> insert_vector_elt(v0,extract_vector_elt(v1,0),0) special case handling
Check if we're just replacing one v1x?? vector with another
2022-06-11 19:30:00 +01:00
Sam Parker
447c411fef [ARM][ParallelDSP] Fix self reference bug
Ensure we don't generate a smlad intrinsic that takes itself as an
argument.

Differential Revision: https://reviews.llvm.org/D127213
2022-06-09 09:10:57 +00:00
Martin Storsjö
98dc3e86fd [ARM] [MinGW] Default to WinEH exception handling instead of Dwarf
Switching this target to WinEH also seems to affect the `-windows-itanium`
target.

Differential Revision: https://reviews.llvm.org/D126870
2022-06-06 23:27:19 +03:00
Martin Storsjö
485432f3c8 [ARM] Make a narrow tMOVi8 where possible in SEH prologues
We intentionally disable Thumb2SizeReduction for SEH
prologues/epilogues, to avoid needing to guess what will happen with
the instructions in a potential future pass in frame lowering.

But for this specific case, where we know we can express the
intent with a narrow instruction, change to that instruction form
directly in frame lowering.

Differential Revision: https://reviews.llvm.org/D126949
2022-06-03 22:33:55 +03:00
Martin Storsjö
bd52506d24 [ARM] Make narrow push/pop in SEH prologues/epilogues where applicable
We intentionally disable Thumb2SizeReduction for SEH
prologues/epilogues, to avoid needing to guess what will happen with
the instructions in a potential future pass in frame lowering.

But for this specific case, where we know we can express the
intent with a narrow instruction, change to that instruction form
directly in frame lowering.

Differential Revision: https://reviews.llvm.org/D126948
2022-06-03 22:33:55 +03:00
Martin Storsjö
40c937cba2 [ARM] Fix restoring stack for varargs with SEH split frame pointer push
Previously, the "add sp, #12" ended up inserted after "bx lr".

Differential Revision: https://reviews.llvm.org/D126872
2022-06-03 09:32:00 +03:00
Martin Storsjö
9245c4930f [ARM] Fix a test case typo. NFC.
The test looked for the wrong string, but it happened to match as
it was a substring of the actual output.

This fixes a typo from d8e67c1cccd8fcb62230166caea744592288da17.
2022-06-02 13:04:50 +03:00
Martin Storsjö
668bb96379 [ARM] Implement lowering of the sponentry intrinsic
This is needed for SEH based setjmp on Windows.

Differential Revision: https://reviews.llvm.org/D126763
2022-06-02 12:29:59 +03:00
Martin Storsjö
2ab19bfa41 [ARM] Adjust the frame pointer when it's needed for SEH unwinding
For functions that require restoring SP from FP (e.g. that need to
align the stack, or that have variable sized allocations), the prologue
and epilogue previously used to look like this:

    push {r4-r5, r11, lr}
    add r11, sp, #8
    ...
    sub r4, r11, #8
    mov sp, r4
    pop {r4-r5, r11, pc}

This is problematic, because this unwinding operation (restoring sp
from r11 - offset) can't be expressed with the SEH unwind opcodes
(probably because this unwind procedure doesn't map exactly to
individual instructions; note the detour via r4 in the epilogue too).

To make unwinding work, the GPR push is split into two; the first one
pushing all other registers, and the second one pushing r11+lr, so that
r11 can be set pointing at this spot on the stack:

    push {r4-r5}
    push {r11, lr}
    mov r11, sp
    ...
    mov sp, r11
    pop {r11, lr}
    pop {r4-r5}
    bx lr

For the same setup, MSVC generates code that uses two registers;
r11 still pointing at the {r11,lr} pair, but a separate register
used for restoring the stack at the end:

    push {r4-r5, r7, r11, lr}
    add r11, sp, #12
    mov r7, sp
    ...
    mov sp, r7
    pop {r4-r5, r7, r11, pc}

For cases with clobbered float/vector registers, they are pushed
after the GPRs, before the {r11,lr} pair.

Differential Revision: https://reviews.llvm.org/D125649
2022-06-02 12:28:46 +03:00
Martin Storsjö
d8e67c1ccc [ARM] Add SEH opcodes in frame lowering
Skip inserting regular CFI instructions if using WinCFI.

This is based a fair amount on the corresponding ARM64 implementation,
but instead of trying to insert the SEH opcodes one by one where
we generate other prolog/epilog instructions, we try to walk over the
whole prolog/epilog range and insert them. This is done because in
many cases, the exact number of instructions inserted is abstracted
away deeper.

For some cases, we manually insert specific SEH opcodes directly where
instructions are generated, where the automatic mapping of instructions
to SEH opcodes doesn't hold up (e.g. for __chkstk stack probes).

Skip Thumb2SizeReduction for SEH prologs/epilogs, and force
tail calls to wide instructions (just like on MachO), to make sure
that the unwind info actually matches the width of the final
instructions, without heuristics about what later passes will do.

Mark SEH instructions as scheduling boundaries, to make sure that they
aren't reordered away from the instruction they describe by
PostRAScheduler.

Mark the SEH instructions with the NoMerge flag, to avoid doing
tail merging of functions that have multiple epilogs that all end
with the same sequence of "b <other>; .seh_nop_w, .seh_endepilogue".

Differential Revision: https://reviews.llvm.org/D125648
2022-06-02 12:28:46 +03:00
Nikita Popov
41d5033eb1 [IR] Enable opaque pointers by default
This enabled opaque pointers by default in LLVM. The effect of this
is twofold:

* If IR that contains *neither* explicit ptr nor %T* types is passed
  to tools, we will now use opaque pointer mode, unless
  -opaque-pointers=0 has been explicitly passed.
* Users of LLVM as a library will now default to opaque pointers.
  It is possible to opt-out by calling setOpaquePointers(false) on
  LLVMContext.

A cmake option to toggle this default will not be provided. Frontends
or other tools that want to (temporarily) keep using typed pointers
should disable opaque pointers via LLVMContext.

Differential Revision: https://reviews.llvm.org/D126689
2022-06-02 09:40:56 +02:00
Hendrik Greving
a92ed167f2 [ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4.
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.

Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be
removed once targets set this explicitly.

Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-02 00:49:11 +00:00
Hendrik Greving
e9d05cc7d8 Revert "[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4."
This reverts commit 430ac5c3029c52e391e584c6d4447e6e361fae99.

Due to failures in Clang tests.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-01 13:27:49 -07:00
Hendrik Greving
430ac5c302 [ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4.
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.

Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be
removed once targets set this explicitly.

Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-01 12:48:01 -07:00
Simon Pilgrim
0a96885940 [ARM] uxtb.ll - adjust armv6 triple so the update_llc_test_checks.py script can be used to regenerate the tests
No need to specify armv6-apple-darwin in these UXTB codegen tests
2022-06-01 15:28:19 +01:00
Simon Pilgrim
e1d02f6c37 [ARM][Thumb2] Refresh UXTB16 tests to match optimized IR from instcombine
As discussed on D77804, instcombine will have already performed a similar SimplifyMultipleUseDemandedBits call which will break the UXTB16 pattern that was being match in these DAG tests

I've updated the existing tests so that it match the instcombine IR (with a suitable FIXME) and added an equivalent test pattern suggested by @dmgreen
2022-06-01 15:28:19 +01:00
Nuno Lopes
80b3dcc045 [Support] Make report_fatal_error respect its GenCrashDiag argument so it doesn't generate a backtrace
There are a few places where we use report_fatal_error when the input is broken.
Currently, this function always crashes LLVM with an abort signal, which
then triggers the backtrace printing code.
I think this is excessive, as wrong input shouldn't give a link to
LLVM's github issue URL and tell users to file a bug report.
We shouldn't print a stack trace either.

This patch changes report_fatal_error so it uses exit() rather than
abort() when its argument GenCrashDiag=false.

Reviewed by: nikic, MaskRay, RKSimon

Differential Revision: https://reviews.llvm.org/D126550
2022-05-30 19:19:23 +01:00
Chen Zheng
d79275238f [MachineSink] replace MachineLoop with MachineCycle
reapply 62a9b36fcf728b104ea87e6eb84c0be69b779df7 and fix module build
failue:
1: remove MachineCycleInfoWrapperPass in MachinePassRegistry.def
   MachineCycleInfoWrapperPass is a anylysis pass, should not be there.
2: move the definition for MachineCycleInfoPrinterPass to cpp file.

Otherwise, there are module conflicit for MachineCycleInfoWrapperPass
in MachinePassRegistry.def and MachineCycleAnalysis.h after
62a9b36fcf728b104ea87e6eb84c0be69b779df7.

MachineCycle can handle irreducible loop. Natural loop
analysis (MachineLoop) can not return correct loop depth if
the loop is irreducible loop. And MachineSink is sensitive
to the loop depth, see MachineSinking::isProfitableToSinkTo().

This patch tries to use MachineCycle so that we can handle
irreducible loop better.

Reviewed By: sameerds, MatzeB

Differential Revision: https://reviews.llvm.org/D123995
2022-05-26 06:45:23 -04:00
Ivan Kosarev
ad1d60c3be [FileCheck] Catch missspelled directives.
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D125604
2022-05-26 11:37:19 +01:00
David Green
18cb3b3506 [ARM] Fix vcvtb/t.f16 input liveness
The `vcvtb.f16.f32 Sd, Sn` (and vcvtt.f16.f32) instruction convert a f32
into a f16, writing either the top or bottom halves of the register.
That means that half of the input register Sd is used in the output.
This wasn't being modelled in the instructions, leading later analyses
to believe that the registers were dead where they were not, generating
invalid scheduling

Fix that be specifying the input Sda register for the instructions too,
allowing them to be set for cases like vector inserts. Most of the
changes are plumbing through the constraint string, cstr.

Differential Revision: https://reviews.llvm.org/D126118
2022-05-25 12:16:26 +01:00
Chen Zheng
80c4910f3d Revert "[MachineSink] replace MachineLoop with MachineCycle"
This reverts commit 62a9b36fcf728b104ea87e6eb84c0be69b779df7.
Cause build failure on lldb incremental buildbot:
https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/43994/changes
2022-05-24 22:43:37 -04:00
Chen Zheng
62a9b36fcf [MachineSink] replace MachineLoop with MachineCycle
MachineCycle can handle irreducible loop. Natural loop
analysis (MachineLoop) can not return correct loop depth if
the loop is irreducible loop. And MachineSink is sensitive
to the loop depth, see MachineSinking::isProfitableToSinkTo().

This patch tries to use MachineCycle so that we can handle
irreducible loop better.

Reviewed By: sameerds, MatzeB

Differential Revision: https://reviews.llvm.org/D123995
2022-05-24 01:16:19 -04:00
David Green
a86cfaea54 [ARM] Add register-mask for tail returns
The TC_RETURN/TCRETURNdi under Arm does not currently add the
register-mask operand when tail folding, which leads to the register
(like LR) not being 'used' by the return. This changes the code to
unconditionally set the register mask on the call, as opposed to
skipping it for tail calls.

I don't believe this will currently alter any codegen, but should glue
things together better post-frame lowering. It matches the AArch64 code
better.

Differential Revision: https://reviews.llvm.org/D125906
2022-05-21 15:28:24 +01:00
Craig Topper
46eef76876 [DAGCombiner] Fix bug in MatchBSwapHWordLow.
This function tries to match (a >> 8) | (a << 8) as (bswap a) >> 16.

If the SRL isn't masked and the high bits aren't demanded, we still
need to ensure that bits 23:16 are zero. After the right shift they
will be in bits 15:8 which is where the important bits from the SHL
end up. It's only a bswap if the OR on bits 15:8 only takes the bits
from the SHL.

Fixes PR55484.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D125641
2022-05-18 09:23:18 -07:00
Simon Pilgrim
d40b7f0d5a [DAG] Fold (shl (srl x, c), c) -> and(x, m) even if srl has other uses
If we're using shift pairs to mask, then relax the one use limit if the shift amounts are equal - we'll only be generating a single AND node.

AArch64 has a couple of regressions due to this, so I've enforced the existing one use limit inside a AArch64TargetLowering::shouldFoldConstantShiftPairToMask callback.

Part of the work to fix the regressions in D77804

Differential Revision: https://reviews.llvm.org/D125607
2022-05-17 13:40:11 +01:00
Craig Topper
74f6ded49d [AArch64][ARM][RISCV][X86] Add test cases for PR55484. NFC
This bug is in generic DAG combine and easily reproducible on many
targets.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D125640
2022-05-16 09:28:11 -07:00