11816 Commits

Author SHA1 Message Date
John Brawn
ddd9485129 [MVE] Don't distribute add of vecreduce if it has more than one use
If the add has more than one use then applying the transformation
won't cause it to be removed, so we can end up applying it again
causing an infinite loop.

Differential Revision: https://reviews.llvm.org/D129361
2022-07-11 14:13:29 +01:00
David Sherwood
03fee6712a [LoopVectorize] Add option to use active lane mask for loop control flow
Currently, for vectorised loops that use the get.active.lane.mask
intrinsic we only use the mask for predicated vector operations,
such as masked loads and stores, etc. The loop itself is still
controlled by comparing the canonical induction variable with the
trip count. However, for some targets this is inefficient when it's
cheap to use the mask itself to control the loop.

This patch adds support for using the active lane mask for control
flow by:

1. Generating the active lane mask for the next iteration of the
vector loop, rather than the current one. If there are still any
remaining iterations then at least the first bit of the mask will
be set.
2. Extract the first bit of this mask and use this bit for the
conditional branch.

I did this by creating a new VPActiveLaneMaskPHIRecipe that sets
up the initial PHI values in the vector loop pre-header. I've also
made use of the new BranchOnCond VPInstruction for the final
instruction in the loop region.

Differential Revision: https://reviews.llvm.org/D125301
2022-07-11 13:46:55 +01:00
David Green
0a11ad2aa8 [ARM] Expand MVE i1 fptoint and inttofp if mve.fp is not present.
If MVE.fp is not present then we cannot select the vector i1 fp
operations to VCMP instructions, so need to expand.
2022-07-11 13:03:30 +01:00
David Green
438ffdb821 [ARM] Switch the costs of mve1beat and mve4beat
These three subtarget features are meant to control where MVE
instructions take 1 vs 2 vs 4 architectural beats. The mve1beat feature
is described as "Model MVE instructions as a 1 beat per tick
architecture", meaning MVE instruction will execute over 4 cycles.
mve4beat is the opposite where the entire 4 beats of the MVE instruction
execute in a single cycle. The costs for the two were backwards though,
not matching the cycle counts like they should. This patch switches the
costs on the two to bring them in-line with expectations.

Differential Revision: https://reviews.llvm.org/D129141
2022-07-07 16:10:00 +01:00
Archibald Elliott
1666f09933 [ARM] Add Support for Cortex-M85
This patch adds support for Arm's Cortex-M85 CPU. The Cortex-M85 CPU is
an Arm v8.1m Mainline CPU, with optional support for MVE and PACBTI,
both of which are enabled by default.

Parts have been coauthored by by Mark Murray, Alexandros Lamprineas and
David Green.

Differential Revision: https://reviews.llvm.org/D128415
2022-07-05 10:43:31 +01:00
Lucas Prates
70a5c52534 [ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records
Currently the a AAPCS compliant frame record is not always created for
functions when it should. Although a consistent frame record might not
be required in some cases, there are still scenarios where applications
may want to make use of the call hierarchy made available trough it.

In order to enable the use of AAPCS compliant frame records whilst keep
backwards compatibility, this patch introduces a new command-line option
(`-mframe-chain=[none|aapcs|aapcs+leaf]`) for Aarch32 and Thumb backends.
The option allows users to explicitly select when to use it, and is also
useful to ensure the extra overhead introduced by the frame records is
only introduced when necessary, in particular for Thumb targets.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D125094
2022-06-27 14:08:48 +01:00
Tim Northover
69ae441e4c ARM: don't try to load function pointer before long call.
Deciding to load an arbitrary global based on whether the entire module is
being built for long calls is pretty clearly spurious, and in fact the existing
indirect logic is sufficient.
2022-06-27 13:59:35 +01:00
Kazu Hirata
3b7c3a654c Revert "Don't use Optional::hasValue (NFC)"
This reverts commit aa8feeefd3ac6c78ee8f67bf033976fc7d68bc6d.
2022-06-25 11:56:50 -07:00
Kazu Hirata
aa8feeefd3 Don't use Optional::hasValue (NFC) 2022-06-25 11:55:57 -07:00
serge-sans-paille
27fd01d3f8 [iwyu] Handle regressions in libLLVM header include
Running iwyu-diff on LLVM codebase since fb67d683db46dfd88da09d99 detected a few
regressions, fixing them.

The impact on preprocessed output is negligible: -4k lines.
2022-06-22 18:50:39 +02:00
David Green
979400be78 [ARM] Fix MVE gather/scatter merged gep offsets
This fixes the combining of constant vector GEP operands in the
optimization of MVE gather/scatter addresses, when opaque pointers are
enabled. As opaque pointers reduce the number of bitcasts between geps,
more can be folded than before. This can cause problems if the index
types are now different between the two geps.

This fixes that by making sure each constant is scaled appropriately,
which has the effect of transforming the geps to have a scale of 1,
changing [r0, q0, uxtw #1] gathers to [r0, q0] with a larger q0. This
helps use a simpler instruction that doesn't need the extra uxtw.

Differential Revision: https://reviews.llvm.org/D127733
2022-06-22 11:04:22 +01:00
Kazu Hirata
7a47ee51a1 [llvm] Don't use Optional::getValue (NFC) 2022-06-20 22:45:45 -07:00
Kazu Hirata
0916d96d12 Don't use Optional::hasValue (NFC) 2022-06-20 20:17:57 -07:00
Kazu Hirata
e0e687a615 [llvm] Don't use Optional::hasValue (NFC) 2022-06-20 10:38:12 -07:00
David Green
76f60931e2 [ARM] Allow distributing postinc with PHI uses
Although this doesn't usually come up, we can have uses of the
BaseAccess of a distributed postinc being a PHI. This doesn't need the
usual dominance check as we will dominate along the phi edge, allowing
us to still create a postinc load/store.

Differential Revision: https://reviews.llvm.org/D127676
2022-06-20 10:08:21 +01:00
Kazu Hirata
437f960062 [llvm] Call *set::insert without checking membership first (NFC) 2022-06-18 10:22:05 -07:00
Kazu Hirata
b254d67160 [llvm] Call *set::insert without checking membership first (NFC) 2022-06-18 08:32:54 -07:00
Kazu Hirata
621f58e716 [Target, CodeGen] Use isImm(), isReg(), etc (NFC) 2022-06-18 07:41:04 -07:00
Corentin Jabot
b62e3a73e1 Replace to_hexString by touhexstr [NFC]
LLVM had 2 methods to convert a number to an hexa string,
this remove one of them.

Differential Revision: https://reviews.llvm.org/D127958
2022-06-16 17:29:50 +02:00
Krasimir Georgiev
8f2ba36336 Revert "[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records AND [NFC][Thumb] Update frame-chain codegen test to use thumbv6m"
This reverts commit 7625e01d661644a560884057755d48a0da8b77b4 and
dependent cbcce82ef6b512d97e92a319a75a03e997c844e1.

Commit 7625e01d661644a560884057755d48a0da8b77b4 causes some new codegen test
failures under asan, e.g., CodeGen/ARM/execute-only.ll:
https://lab.llvm.org/buildbot/#/builders/5/builds/24659/steps/15/logs/stdio.
2022-06-15 16:10:02 +02:00
David Green
1da6940275 [ARM] Add more opaque pointer gather/scatter tests. NFC
Some of the newly added tests are incorrect, fixed in D127733.
2022-06-14 14:08:43 +01:00
Lucas Prates
7625e01d66 [ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records
Currently the a AAPCS compliant frame record is not always created for
functions when it should. Although a consistent frame record might not
be required in some cases, there are still scenarios where applications
may want to make use of the call hierarchy made available trough it.

In order to enable the use of AAPCS compliant frame records whilst keep
backwards compatibility, this patch introduces a new command-line option
(`-mframe-chain=[none|aapcs|aapcs+leaf]`) for Aarch32 and Thumb backends.
The option allows users to explicitly select when to use it, and is also
useful to ensure the extra overhead introduced by the frame records is
only introduced when necessary, in particular for Thumb targets.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D125094
2022-06-14 13:37:51 +01:00
Guillaume Chatelet
6725d80640 [NFC][Alignment] Use Align in shouldAlignPointerArgs 2022-06-14 10:56:36 +00:00
Guillaume Chatelet
01a8b89edb [NFC][Alignment] Use getAlign in ARMFastISel 2022-06-13 13:36:36 +00:00
Simon Pilgrim
f97e15ef45 [ARM] Fix "local variable is initialized but not referenced" MSVX warning. NFC 2022-06-13 11:48:06 +01:00
Lucas Prates
33b9ad647e Revert "[ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records"
Reverting change due to test failure.

This reverts commit 6119053dab67129eb1700dbf36db3524dd3e421f.
2022-06-13 11:00:49 +01:00
Lucas Prates
6119053dab [ARM][Thumb] Command-line option to ensure AAPCS compliant Frame Records
Currently the a AAPCS compliant frame record is not always created for
functions when it should. Although a consistent frame record might not
be required in some cases, there are still scenarios where applications
may want to make use of the call hierarchy made available trough it.

In order to enable the use of AAPCS compliant frame records whilst keep
backwards compatibility, this patch introduces a new command-line option
(`-mframe-chain=[none|aapcs|aapcs+leaf]`) for Aarch32 and Thumb backends.
The option allows users to explicitly select when to use it, and is also
useful to ensure the extra overhead introduced by the frame records is
only introduced when necessary, in particular for Thumb targets.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D125094
2022-06-13 10:21:06 +01:00
Lucas Prates
7775124b5c [NFC][Thumb1] Use FrameDestroy flag to identify epilog instructions
Simiarly to what's done on both ARM's and AArch64's frame lowering code,
this updates Thumb1FrameLowering to use the FrameDestroy Machine
Instruction flag to identify instructions inserted as part of the epilog
instead of relying on assumptions about specific machine instructions.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D126285
2022-06-13 10:19:10 +01:00
Fangrui Song
adf4142f76 [MC] De-capitalize SwitchSection. NFC
Add SwitchSection to return switchSection. The API will be removed soon.
2022-06-10 22:50:55 -07:00
Guillaume Chatelet
38637ee477 [clang] Add support for __builtin_memset_inline
In the same spirit as D73543 and in reply to https://reviews.llvm.org/D126768#3549920 this patch is adding support for `__builtin_memset_inline`.

The idea is to get support from the compiler to easily write efficient memory function implementations.

This patch could be split in two:
 - one for the LLVM part adding the `llvm.memset.inline.*` intrinsics.
 - and another one for the Clang part providing the instrinsic as a builtin.

Differential Revision: https://reviews.llvm.org/D126903
2022-06-10 13:13:59 +00:00
David Sherwood
007917b95c [MVE] Fold fadd(select(..., +0.0)) into a predicated fadd
We already have patterns for matching fadd(select(..., -0.0)),
but an upcoming patch will lead to patterns using +0.0 as the
identity instead of -0.0. I'm adding support for these patterns
now to avoid any regressions for MVE.

Differential Revision: https://reviews.llvm.org/D127275
2022-06-10 11:09:55 +01:00
Sam Parker
447c411fef [ARM][ParallelDSP] Fix self reference bug
Ensure we don't generate a smlad intrinsic that takes itself as an
argument.

Differential Revision: https://reviews.llvm.org/D127213
2022-06-09 09:10:57 +00:00
Matt Arsenault
cc5a1b3dd9 llvm-reduce: Add cloning of target MachineFunctionInfo
MIR support is totally unusable for AMDGPU without this, since the set
of reserved registers is set from fields here.

Add a clone method to MachineFunctionInfo. This is a subtle variant of
the copy constructor that is required if there are any MIR constructs
that use pointers. Specifically, at minimum fields that reference
MachineBasicBlocks or the MachineFunction need to be adjusted to the
values in the new function.
2022-06-07 10:14:48 -04:00
Guillaume Chatelet
0788186182 [Alignment][NFC] Remove usage of MemSDNode::getAlignment
I can't remove the function just yet as it is used in the generated .inc files.
I would also like to provide a way to compare alignment with TypeSize since it came up a few times.

Differential Revision: https://reviews.llvm.org/D126910
2022-06-07 13:52:20 +00:00
David Green
53be6ab25c [ARM] Fix MVE getShuffleCost legalized type check
The MVE shuffle costing for VREV instructions was making incorrect
assumptions as to legalized vector types remaining as vectors. Add a
quick check to ensure they are indeed vectors before attempting to get
the number of elements.
2022-06-07 14:36:04 +01:00
Fangrui Song
15d82c62dc [MC] De-capitalize MCStreamer functions
Follow-up to c031378ce01b8485ba0ef486654bc9393c4ac024 .
The class is mostly consistent now.
2022-06-07 00:31:02 -07:00
ksyx
3204272f0f [ARM] Use llvm::dbgs() to print debug info (NFC)
For consistency with other parts of code.

Approved by efriedma in differential revision
https://reviews.llvm.org/D127055
2022-06-06 16:43:16 -04:00
Martin Storsjö
98dc3e86fd [ARM] [MinGW] Default to WinEH exception handling instead of Dwarf
Switching this target to WinEH also seems to affect the `-windows-itanium`
target.

Differential Revision: https://reviews.llvm.org/D126870
2022-06-06 23:27:19 +03:00
Fangrui Song
77e300ffdf [MC] Change EndOfStatement "unexpected tokens in .xxx directive " to "expected newline" 2022-06-05 15:11:01 -07:00
Fangrui Song
8c911f8e9a [ARM][MC] Change EndOfStatement "unexpected tokens in .xxx directive " to "expected newline"
The directive name is not useful because the next line replicates the error line
which includes the directive. The prevailing style uses "expected newline".
2022-06-05 14:53:59 -07:00
Kazu Hirata
3b9707dbc0 [llvm] Convert for_each to range-based for loops (NFC) 2022-06-05 12:07:14 -07:00
Fangrui Song
d86a206f06 Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options 2022-06-05 00:31:44 -07:00
Martin Storsjö
485432f3c8 [ARM] Make a narrow tMOVi8 where possible in SEH prologues
We intentionally disable Thumb2SizeReduction for SEH
prologues/epilogues, to avoid needing to guess what will happen with
the instructions in a potential future pass in frame lowering.

But for this specific case, where we know we can express the
intent with a narrow instruction, change to that instruction form
directly in frame lowering.

Differential Revision: https://reviews.llvm.org/D126949
2022-06-03 22:33:55 +03:00
Martin Storsjö
bd52506d24 [ARM] Make narrow push/pop in SEH prologues/epilogues where applicable
We intentionally disable Thumb2SizeReduction for SEH
prologues/epilogues, to avoid needing to guess what will happen with
the instructions in a potential future pass in frame lowering.

But for this specific case, where we know we can express the
intent with a narrow instruction, change to that instruction form
directly in frame lowering.

Differential Revision: https://reviews.llvm.org/D126948
2022-06-03 22:33:55 +03:00
Martin Storsjö
40c937cba2 [ARM] Fix restoring stack for varargs with SEH split frame pointer push
Previously, the "add sp, #12" ended up inserted after "bx lr".

Differential Revision: https://reviews.llvm.org/D126872
2022-06-03 09:32:00 +03:00
Martin Storsjö
668bb96379 [ARM] Implement lowering of the sponentry intrinsic
This is needed for SEH based setjmp on Windows.

Differential Revision: https://reviews.llvm.org/D126763
2022-06-02 12:29:59 +03:00
Martin Storsjö
2ab19bfa41 [ARM] Adjust the frame pointer when it's needed for SEH unwinding
For functions that require restoring SP from FP (e.g. that need to
align the stack, or that have variable sized allocations), the prologue
and epilogue previously used to look like this:

    push {r4-r5, r11, lr}
    add r11, sp, #8
    ...
    sub r4, r11, #8
    mov sp, r4
    pop {r4-r5, r11, pc}

This is problematic, because this unwinding operation (restoring sp
from r11 - offset) can't be expressed with the SEH unwind opcodes
(probably because this unwind procedure doesn't map exactly to
individual instructions; note the detour via r4 in the epilogue too).

To make unwinding work, the GPR push is split into two; the first one
pushing all other registers, and the second one pushing r11+lr, so that
r11 can be set pointing at this spot on the stack:

    push {r4-r5}
    push {r11, lr}
    mov r11, sp
    ...
    mov sp, r11
    pop {r11, lr}
    pop {r4-r5}
    bx lr

For the same setup, MSVC generates code that uses two registers;
r11 still pointing at the {r11,lr} pair, but a separate register
used for restoring the stack at the end:

    push {r4-r5, r7, r11, lr}
    add r11, sp, #12
    mov r7, sp
    ...
    mov sp, r7
    pop {r4-r5, r7, r11, pc}

For cases with clobbered float/vector registers, they are pushed
after the GPRs, before the {r11,lr} pair.

Differential Revision: https://reviews.llvm.org/D125649
2022-06-02 12:28:46 +03:00
Martin Storsjö
d8e67c1ccc [ARM] Add SEH opcodes in frame lowering
Skip inserting regular CFI instructions if using WinCFI.

This is based a fair amount on the corresponding ARM64 implementation,
but instead of trying to insert the SEH opcodes one by one where
we generate other prolog/epilog instructions, we try to walk over the
whole prolog/epilog range and insert them. This is done because in
many cases, the exact number of instructions inserted is abstracted
away deeper.

For some cases, we manually insert specific SEH opcodes directly where
instructions are generated, where the automatic mapping of instructions
to SEH opcodes doesn't hold up (e.g. for __chkstk stack probes).

Skip Thumb2SizeReduction for SEH prologs/epilogs, and force
tail calls to wide instructions (just like on MachO), to make sure
that the unwind info actually matches the width of the final
instructions, without heuristics about what later passes will do.

Mark SEH instructions as scheduling boundaries, to make sure that they
aren't reordered away from the instruction they describe by
PostRAScheduler.

Mark the SEH instructions with the NoMerge flag, to avoid doing
tail merging of functions that have multiple epilogs that all end
with the same sequence of "b <other>; .seh_nop_w, .seh_endepilogue".

Differential Revision: https://reviews.llvm.org/D125648
2022-06-02 12:28:46 +03:00
Martin Storsjö
298e9cac92 [MC] [Win64EH] Check that the SEH unwind opcodes match the actual instructions
It's a fairly common issue that the generating code incorrectly marks
instructions as narrow or wide; check that the instruction lengths
add up to the expected value, and error out if it doesn't. This allows
catching code generation bugs.

Also check that prologs and epilogs are properly terminated, to
catch other code generation issues.

Differential Revision: https://reviews.llvm.org/D125647
2022-06-01 11:25:49 +03:00
Martin Storsjö
6b75a3523f [ARM] [MC] Add support for writing ARM WinEH unwind info
This includes .seh_* directives for generating it from assembly.
It is designed fairly similarly to the ARM64 handling.

For .seh_handler directives, such as
".seh_handler __C_specific_handler, @except" (which is supported
on x86_64 and aarch64 so far), the "@except" bit doesn't work in
ARM assembly, as '@' is used as a comment character (on all current
platforms).

Allow using '%' instead of '@' for this purpose. This convention
is used by GAS in similar contexts already,
e.g. [1]:

    Note on targets where the @ character is the start of a comment
    (eg ARM) then another character is used instead. For example the
    ARM port uses the % character.

In practice, this unfortunately means that all such .seh_handler
directives will need ifdefs for ARM.

Contrary to ARM64, on ARM, it's quite common that we can't evaluate
e.g. the function length at this point, due to instructions whose
length is finalized later. (Also, inline jump tables end with
a ".p2align 1".)

If unable to to evaluate the function length immediately, emit
it as an MCExpr instead. If we'd implement splitting the unwind
info for a function (which isn't implemented for ARM64 yet either),
we wouldn't know whether we need to split it though.

Avoid calling getFrameIndexOffset() on an unset
FuncInfo.UnwindHelpFrameIdx, to avoid triggering asserts in the
preexisting testcase CodeGen/ARM/Windows/wineh-basic.ll. (Once
MSVC exception handling is fully implemented, those changes
can be reverted.)

[1] https://sourceware.org/binutils/docs/as/Section.html#Section

Differential Revision: https://reviews.llvm.org/D125645
2022-06-01 11:25:48 +03:00