11786 Commits

Author SHA1 Message Date
David Sherwood
007917b95c [MVE] Fold fadd(select(..., +0.0)) into a predicated fadd
We already have patterns for matching fadd(select(..., -0.0)),
but an upcoming patch will lead to patterns using +0.0 as the
identity instead of -0.0. I'm adding support for these patterns
now to avoid any regressions for MVE.
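
As an illustration of why the choice of identity matters, here is a minimal one-lane sketch in plain C++. The function names are hypothetical and this is not the MVE pattern itself. It only shows that -0.0 is an exact identity for floating-point addition, while +0.0 gives a different result for a -0.0 input, so the +0.0 form presumably relies on the sign of zero not being significant.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical one-lane model of fadd(acc, select(pred, x, identity)).
float selectThenAdd(float acc, float x, bool pred, float identity) {
  return acc + (pred ? x : identity);
}

// Hypothetical one-lane model of a predicated fadd: an inactive lane
// passes acc through untouched.
float predicatedAdd(float acc, float x, bool pred) {
  return pred ? acc + x : acc;
}

// True if both values compare equal and carry the same sign bit, so that
// +0.0 and -0.0 are told apart.
bool sameValueAndSign(float a, float b) {
  return a == b && std::signbit(a) == std::signbit(b);
}
```

With identity -0.0 the two models agree on every input, including acc = -0.0. With identity +0.0 they differ only in the sign of zero for an inactive lane whose accumulator is -0.0.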

Differential Revision: https://reviews.llvm.org/D127275
2022-06-10 11:09:55 +01:00
Sam Parker
447c411fef [ARM][ParallelDSP] Fix self reference bug
Ensure we don't generate a smlad intrinsic that takes itself as an
argument.

Differential Revision: https://reviews.llvm.org/D127213
2022-06-09 09:10:57 +00:00
Matt Arsenault
cc5a1b3dd9 llvm-reduce: Add cloning of target MachineFunctionInfo
MIR support is totally unusable for AMDGPU without this, since the set
of reserved registers is set from fields here.

Add a clone method to MachineFunctionInfo. This is a subtle variant of
the copy constructor that is required if there are any MIR constructs
that use pointers. Specifically, at minimum fields that reference
MachineBasicBlocks or the MachineFunction need to be adjusted to the
values in the new function.
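
A minimal sketch of the difference between a plain copy and such a clone, using hypothetical stand-in types (Block, FunctionInfo, cloneInto) rather than the LLVM classes. Plain fields are copied as they are, while a field pointing into the source function is redirected through a source-to-destination block map.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-in for a basic block; not the LLVM class.
struct Block {
  int Number;
};

// Hypothetical stand-in for per-function target info.
struct FunctionInfo {
  unsigned ReservedRegMask = 0;     // plain data: safe to copy as-is
  const Block *EntryHint = nullptr; // points into the owning function

  // Copy the plain fields, then redirect the block pointer to the
  // corresponding block of the destination function.
  FunctionInfo cloneInto(
      const std::unordered_map<const Block *, const Block *> &SrcToDst) const {
    FunctionInfo Copy = *this;
    if (EntryHint) {
      auto It = SrcToDst.find(EntryHint);
      assert(It != SrcToDst.end() && "every source block needs a mapping");
      Copy.EntryHint = It->second;
    }
    return Copy;
  }
};
```

A copy constructor alone would leave EntryHint pointing at a block of the old function, which is the situation the clone method is meant to avoid.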
2022-06-07 10:14:48 -04:00
Guillaume Chatelet
0788186182 [Alignment][NFC] Remove usage of MemSDNode::getAlignment
I can't remove the function just yet as it is used in the generated .inc files.
I would also like to provide a way to compare alignment with TypeSize since it came up a few times.

Differential Revision: https://reviews.llvm.org/D126910
2022-06-07 13:52:20 +00:00
David Green
53be6ab25c [ARM] Fix MVE getShuffleCost legalized type check
The MVE shuffle costing for VREV instructions was making incorrect
assumptions as to legalized vector types remaining as vectors. Add a
quick check to ensure they are indeed vectors before attempting to get
the number of elements.
2022-06-07 14:36:04 +01:00
Fangrui Song
15d82c62dc [MC] De-capitalize MCStreamer functions
Follow-up to c031378ce01b8485ba0ef486654bc9393c4ac024 .
The class is mostly consistent now.
2022-06-07 00:31:02 -07:00
ksyx
3204272f0f [ARM] Use llvm::dbgs() to print debug info (NFC)
For consistency with other parts of code.

Approved by efriedma in differential revision
https://reviews.llvm.org/D127055
2022-06-06 16:43:16 -04:00
Martin Storsjö
98dc3e86fd [ARM] [MinGW] Default to WinEH exception handling instead of Dwarf
Switching this target to WinEH also seems to affect the `-windows-itanium`
target.

Differential Revision: https://reviews.llvm.org/D126870
2022-06-06 23:27:19 +03:00
Fangrui Song
77e300ffdf [MC] Change EndOfStatement "unexpected tokens in .xxx directive " to "expected newline"
2022-06-05 15:11:01 -07:00
Fangrui Song
8c911f8e9a [ARM][MC] Change EndOfStatement "unexpected tokens in .xxx directive " to "expected newline"
The directive name is not useful because the next line replicates the error line
which includes the directive. The prevailing style uses "expected newline".
2022-06-05 14:53:59 -07:00
Kazu Hirata
3b9707dbc0 [llvm] Convert for_each to range-based for loops (NFC)
2022-06-05 12:07:14 -07:00
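
For readers unfamiliar with this kind of cleanup, here is a small before/after sketch. It uses std::for_each on a std::vector as an assumption to stay self-contained; the function names are hypothetical and not taken from the commit.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Before: the loop body is a lambda passed to std::for_each.
int sumWithForEach(const std::vector<int> &Values) {
  int Sum = 0;
  std::for_each(Values.begin(), Values.end(), [&Sum](int V) { Sum += V; });
  return Sum;
}

// After: the same traversal written as a range-based for loop.
int sumWithRangeFor(const std::vector<int> &Values) {
  int Sum = 0;
  for (int V : Values)
    Sum += V;
  return Sum;
}
```

Both functions visit the elements in the same order and compute the same result, which is what makes such a conversion NFC.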
Fangrui Song
d86a206f06 Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options
2022-06-05 00:31:44 -07:00
Martin Storsjö
485432f3c8 [ARM] Make a narrow tMOVi8 where possible in SEH prologues
We intentionally disable Thumb2SizeReduction for SEH
prologues/epilogues, to avoid needing to guess what will happen with
the instructions in a potential future pass in frame lowering.

But for this specific case, where we know we can express the
intent with a narrow instruction, change to that instruction form
directly in frame lowering.

Differential Revision: https://reviews.llvm.org/D126949
2022-06-03 22:33:55 +03:00
Martin Storsjö
bd52506d24 [ARM] Make narrow push/pop in SEH prologues/epilogues where applicable
We intentionally disable Thumb2SizeReduction for SEH
prologues/epilogues, to avoid needing to guess what will happen with
the instructions in a potential future pass in frame lowering.

But for this specific case, where we know we can express the
intent with a narrow instruction, change to that instruction form
directly in frame lowering.

Differential Revision: https://reviews.llvm.org/D126948
2022-06-03 22:33:55 +03:00
Martin Storsjö
40c937cba2 [ARM] Fix restoring stack for varargs with SEH split frame pointer push
Previously, the "add sp, #12" ended up inserted after "bx lr".

Differential Revision: https://reviews.llvm.org/D126872
2022-06-03 09:32:00 +03:00
Martin Storsjö
668bb96379 [ARM] Implement lowering of the sponentry intrinsic
This is needed for SEH based setjmp on Windows.

Differential Revision: https://reviews.llvm.org/D126763
2022-06-02 12:29:59 +03:00
Martin Storsjö
2ab19bfa41 [ARM] Adjust the frame pointer when it's needed for SEH unwinding
For functions that require restoring SP from FP (e.g. that need to
align the stack, or that have variable sized allocations), the prologue
and epilogue previously used to look like this:

    push {r4-r5, r11, lr}
    add r11, sp, #8
    ...
    sub r4, r11, #8
    mov sp, r4
    pop {r4-r5, r11, pc}

This is problematic, because this unwinding operation (restoring sp
from r11 - offset) can't be expressed with the SEH unwind opcodes
(probably because this unwind procedure doesn't map exactly to
individual instructions; note the detour via r4 in the epilogue too).

To make unwinding work, the GPR push is split into two; the first one
pushing all other registers, and the second one pushing r11+lr, so that
r11 can be set pointing at this spot on the stack:

    push {r4-r5}
    push {r11, lr}
    mov r11, sp
    ...
    mov sp, r11
    pop {r11, lr}
    pop {r4-r5}
    bx lr

For the same setup, MSVC generates code that uses two registers;
r11 still pointing at the {r11,lr} pair, but a separate register
used for restoring the stack at the end:

    push {r4-r5, r7, r11, lr}
    add r11, sp, #12
    mov r7, sp
    ...
    mov sp, r7
    pop {r4-r5, r7, r11, pc}

For cases with clobbered float/vector registers, they are pushed
after the GPRs, before the {r11,lr} pair.

Differential Revision: https://reviews.llvm.org/D125649
2022-06-02 12:28:46 +03:00
Martin Storsjö
d8e67c1ccc [ARM] Add SEH opcodes in frame lowering
Skip inserting regular CFI instructions if using WinCFI.

This is based a fair amount on the corresponding ARM64 implementation,
but instead of trying to insert the SEH opcodes one by one where
we generate other prolog/epilog instructions, we try to walk over the
whole prolog/epilog range and insert them. This is done because in
many cases, the exact number of instructions inserted is abstracted
away deeper.

For some cases, we manually insert specific SEH opcodes directly where
instructions are generated, where the automatic mapping of instructions
to SEH opcodes doesn't hold up (e.g. for __chkstk stack probes).

Skip Thumb2SizeReduction for SEH prologs/epilogs, and force
tail calls to wide instructions (just like on MachO), to make sure
that the unwind info actually matches the width of the final
instructions, without heuristics about what later passes will do.

Mark SEH instructions as scheduling boundaries, to make sure that they
aren't reordered away from the instruction they describe by
PostRAScheduler.

Mark the SEH instructions with the NoMerge flag, to avoid doing
tail merging of functions that have multiple epilogs that all end
with the same sequence of "b <other>; .seh_nop_w, .seh_endepilogue".

Differential Revision: https://reviews.llvm.org/D125648
2022-06-02 12:28:46 +03:00
Martin Storsjö
298e9cac92 [MC] [Win64EH] Check that the SEH unwind opcodes match the actual instructions
It's a fairly common issue that the generating code incorrectly marks
instructions as narrow or wide; check that the instruction lengths
add up to the expected value, and error out if they don't. This allows
catching code generation bugs.

Also check that prologs and epilogs are properly terminated, to
catch other code generation issues.
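
The length check can be sketched roughly as follows. This is a simplified model with hypothetical names (UnwindOp, expectedPrologBytes, unwindInfoMatches), assuming each unwind opcode describes exactly one instruction that is either 2 bytes (narrow) or 4 bytes (wide); the real opcode set is richer than this.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified unwind opcode: all it records here is whether
// the instruction it describes is a narrow or a wide encoding.
struct UnwindOp {
  bool IsWide;
};

// Number of bytes the described instructions should occupy.
std::size_t expectedPrologBytes(const std::vector<UnwindOp> &Ops) {
  std::size_t Bytes = 0;
  for (const UnwindOp &Op : Ops)
    Bytes += Op.IsWide ? 4 : 2;
  return Bytes;
}

// The consistency check: the opcodes and the emitted code must agree.
bool unwindInfoMatches(const std::vector<UnwindOp> &Ops,
                       std::size_t ActualPrologBytes) {
  return expectedPrologBytes(Ops) == ActualPrologBytes;
}
```

If code generation marks a wide instruction as narrow, the expected size comes out 2 bytes short of the actual size and the mismatch is reported instead of silently producing wrong unwind info.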

Differential Revision: https://reviews.llvm.org/D125647
2022-06-01 11:25:49 +03:00
Martin Storsjö
6b75a3523f [ARM] [MC] Add support for writing ARM WinEH unwind info
This includes .seh_* directives for generating it from assembly.
It is designed fairly similarly to the ARM64 handling.

For .seh_handler directives, such as
".seh_handler __C_specific_handler, @except" (which is supported
on x86_64 and aarch64 so far), the "@except" bit doesn't work in
ARM assembly, as '@' is used as a comment character (on all current
platforms).

Allow using '%' instead of '@' for this purpose. This convention
is used by GAS in similar contexts already,
e.g. [1]:

    Note on targets where the @ character is the start of a comment
    (eg ARM) then another character is used instead. For example the
    ARM port uses the % character.

In practice, this unfortunately means that all such .seh_handler
directives will need ifdefs for ARM.

Contrary to ARM64, on ARM, it's quite common that we can't evaluate
e.g. the function length at this point, due to instructions whose
length is finalized later. (Also, inline jump tables end with
a ".p2align 1".)

If unable to evaluate the function length immediately, emit
it as an MCExpr instead. If we'd implement splitting the unwind
info for a function (which isn't implemented for ARM64 yet either),
we wouldn't know whether we need to split it though.

Avoid calling getFrameIndexOffset() on an unset
FuncInfo.UnwindHelpFrameIdx, to avoid triggering asserts in the
preexisting testcase CodeGen/ARM/Windows/wineh-basic.ll. (Once
MSVC exception handling is fully implemented, those changes
can be reverted.)

[1] https://sourceware.org/binutils/docs/as/Section.html#Section

Differential Revision: https://reviews.llvm.org/D125645
2022-06-01 11:25:48 +03:00
Zongwei Lan
ad73ce318e [Target] use getSubtarget<> instead of static_cast<>(getSubtarget())
Differential Revision: https://reviews.llvm.org/D125391
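
A minimal sketch of the two spellings, using hypothetical stand-in classes (SubtargetBase, MySubtarget, Function) rather than the LLVM ones. The templated accessor keeps the cast in one place so callers do not repeat it.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are not the LLVM classes.
struct SubtargetBase {
  virtual ~SubtargetBase() = default;
};

struct MySubtarget : SubtargetBase {
  bool hasFeatureX() const { return true; }
};

class Function {
  const SubtargetBase &STI;

public:
  explicit Function(const SubtargetBase &S) : STI(S) {}

  // Untyped accessor: callers have to cast the result themselves.
  const SubtargetBase &getSubtarget() const { return STI; }

  // Typed accessor: the cast is written once, here.
  template <typename STC> const STC &getSubtarget() const {
    return static_cast<const STC &>(STI);
  }
};
```

With these definitions, `static_cast<const MySubtarget &>(F.getSubtarget())` and `F.getSubtarget<MySubtarget>()` refer to the same object; the second form is simply shorter.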
2022-05-26 11:22:41 -07:00
David Penry
917dc0749b [ARM] Recognize t2LoopEnd for software pipelining
- Add t2LoopEnd to TargetInstrInfo::analyzeBranch and
  related functions. As there are many side effects of
  analyzing a branch, only do so if software pipelining
  is enabled, to maintain previous behavior when pipelining
  is not desired.
- Make sure that t2LoopEndDec is immediately followed by
  a t2B when it is synthesized from a t2LoopEnd. This is
  done because the t2LoopEnd might have acquired a
  fall-through path, but IfConversion assumes that
  fall-throughs are only possible on analyzable branches.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D126322
2022-05-26 09:55:42 -07:00
Fangrui Song
9ee15bba47 [MC] Lower case the first letter of EmitCOFF* EmitWin* EmitCV*. NFC
2022-05-26 00:14:08 -07:00
Maksim Panchenko
bed9efed71 [MCDisassembler] Disambiguate Size parameter in tryAddingSymbolicOperand()
MCSymbolizer::tryAddingSymbolicOperand() overloaded the Size parameter
to specify either the instruction size or the operand size depending on
the architecture. However, for proper symbolic disassembly on X86, we
need to know both sizes, as an instruction can have two operands, and
the instruction size cannot be reliably calculated based on the operand
offset and its size. Hence, split Size into OpSize and InstSize.

For X86, the new interface makes it possible to fix a couple of issues:
  * Correctly adjust the value of PC-relative operands.
  * Set operand size to zero when the operand is specified implicitly.
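
To see why both sizes are needed, consider this small sketch. The struct and function names are hypothetical, and it assumes (as on x86) that a PC-relative displacement is relative to the address of the next instruction.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one operand inside an encoded instruction.
struct OperandInfo {
  uint64_t Offset; // byte offset of the operand within the instruction
  uint64_t OpSize; // size of the operand field in bytes
};

// Target of a PC-relative operand, assuming the displacement is relative
// to the end of the instruction.
uint64_t pcRelativeTarget(uint64_t InstAddress, uint64_t InstSize,
                          int64_t Displacement) {
  return InstAddress + InstSize + static_cast<uint64_t>(Displacement);
}

// What one would get by guessing the instruction end from the operand alone.
uint64_t guessedInstSize(const OperandInfo &Op) {
  return Op.Offset + Op.OpSize;
}
```

For an instruction with a 4-byte displacement at offset 2 followed by a 1-byte immediate, the guess yields 6 while the instruction is 7 bytes long, so a target computed from the guess would be off by one byte.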

Differential Revision: https://reviews.llvm.org/D126101
2022-05-25 13:44:32 -07:00
David Green
18cb3b3506 [ARM] Fix vcvtb/t.f16 input liveness
The `vcvtb.f16.f32 Sd, Sn` (and vcvtt.f16.f32) instructions convert an f32
into an f16, writing either the top or bottom half of the register.
That means that half of the input register Sd is used in the output.
This wasn't being modelled in the instructions, leading later analyses
to believe that the registers were dead where they were not, generating
invalid scheduling.

Fix that by specifying the input Sda register for the instructions too,
allowing them to be set for cases like vector inserts. Most of the
changes are plumbing through the constraint string, cstr.
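
The liveness issue can be pictured with a small bit-level model. This is only a sketch with hypothetical function names; it takes the already converted 16-bit value as a parameter and does not perform the f32 to f16 conversion itself.

```cpp
#include <cassert>
#include <cstdint>

// Models writing a 16-bit half-precision value into the bottom half of a
// 32-bit register. The top half comes from the old register contents, so
// the old value is an input of the operation.
uint32_t writeBottomHalf(uint32_t OldSd, uint16_t Half) {
  return (OldSd & 0xFFFF0000u) | Half;
}

// Same for the top half: the bottom half of the old value is preserved.
uint32_t writeTopHalf(uint32_t OldSd, uint16_t Half) {
  return (OldSd & 0x0000FFFFu) | (static_cast<uint32_t>(Half) << 16);
}
```

Because the result depends on OldSd, treating the destination as a pure output would let a scheduler or register allocator clobber the half that is supposed to survive.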

Differential Revision: https://reviews.llvm.org/D126118
2022-05-25 12:16:26 +01:00
David Green
a86cfaea54 [ARM] Add register-mask for tail returns
The TC_RETURN/TCRETURNdi under Arm does not currently add the
register-mask operand when tail folding, which leads to the register
(like LR) not being 'used' by the return. This changes the code to
unconditionally set the register mask on the call, as opposed to
skipping it for tail calls.

I don't believe this will currently alter any codegen, but it should glue
things together better post-frame lowering. It matches the AArch64 code
better.

Differential Revision: https://reviews.llvm.org/D125906
2022-05-21 15:28:24 +01:00
David Green
b4dd9fc370 [ARM] Cost modelling for MVE vector fptoi_sat
Building on top of D125665, this adds MVE costs for fptosi.sat and
fptoui.sat, providing MVE is available and the types are legal.

Differential Revision: https://reviews.llvm.org/D125666
2022-05-20 11:00:34 +01:00
David Green
80aab0312a [ARM] Cost modelling for scalar fptoi_sat
Similar to D124357, this adds some cost modelling for fptoi_sat for Arm
targets. Where VFP2 is available (and FP64/FP16 for the relevant types),
the operations are legal as the Arm instructions naturally saturate.
Otherwise they will need an extra smin/smax clamp, similar to AArch64.
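
For reference, the saturating behaviour being costed can be sketched in portable C++ for the f32 to i32 case. The function name is hypothetical. The sketch assumes the usual saturating-conversion semantics: NaN maps to 0 and out-of-range inputs clamp to the nearest representable integer.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Saturating float -> int32 conversion: NaN gives 0, values outside the
// int32 range clamp to the minimum or maximum.
int32_t fptosiSat32(float X) {
  if (std::isnan(X))
    return 0;
  if (X <= -2147483648.0f) // -2^31 is exactly representable as a float
    return std::numeric_limits<int32_t>::min();
  if (X >= 2147483648.0f) // 2^31 is exactly representable as a float
    return std::numeric_limits<int32_t>::max();
  return static_cast<int32_t>(X); // in range: truncate toward zero
}
```

Where the hardware conversion already behaves like this, no extra clamping code is needed, which is why the operation can be costed as legal in that case.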

Differential Revision: https://reviews.llvm.org/D125665
2022-05-19 19:53:21 +01:00
Archibald Elliott
2321c36fbf [ARM] Don't Enable AES Pass for Generic Cores
This brings clang/llvm into line with GCC. The Pass is still enabled for
the affected cores, but is now opt-in when using `-march=`.

I also took the opportunity to add release notes for this change.

Reviewed By: john.brawn

Differential Revision: https://reviews.llvm.org/D125775
2022-05-18 13:10:31 +01:00
Martin Storsjö
68f37e7991 [ARM] Rename the isARMAreaXRegister parameter isIOS to SplitFramePushPop. NFC.
In f8b0a7af52f8c4ec6b4ddcfe3a6fa75098c9507c in 2016, this parameter
was generalized on the caller side (previously passing
STI.isTargetMachO(), now passing STI.splitFramePushPop()). Rename
the parameter on the receiver side to match the generalization.

Differential Revision: https://reviews.llvm.org/D125681
2022-05-17 00:41:38 +03:00
NAKAMURA Takumi
bdab5c4b3d ARMFixCortexA57AES1742098Pass.cpp: Suppress a warning. [-Wunused-but-set-variable]
2022-05-15 18:01:42 +09:00
Sheng
c644488a8b Rename MCFixedLenDisassembler.h as MCDecoderOps.h
The name `MCFixedLenDisassembler.h` is out of date after D120958.

Rename it as `MCDecoderOps.h` to reflect the change.

Reviewed By: myhsu

Differential Revision: https://reviews.llvm.org/D124987
2022-05-15 08:44:58 +08:00
Archibald Elliott
3a24df992c [ARM] Pass for Cortex-A57 and Cortex-A72 Fused AES Erratum
This adds a late Machine Pass to work around a Cortex CPU Erratum
affecting Cortex-A57 and Cortex-A72:
- Cortex-A57 Erratum 1742098
- Cortex-A72 Erratum 1655431

The pass inserts instructions to make the inputs to the fused AES
instruction pairs no longer trigger the erratum. Here the pass errs on
the side of caution, inserting the instructions wherever we cannot prove
that the inputs came from a safe instruction.

The pass is used:
- for Cortex-A57 and Cortex-A72,
- for "generic" cores (which are used when using `-march=`),
- when the user specifies `-mfix-cortex-a57-aes-1742098` or
  `-mfix-cortex-a72-aes-1655431` in the command-line arguments to clang.

Reviewed By: dmgreen, simon_tatham

Differential Revision: https://reviews.llvm.org/D119720
2022-05-13 10:47:33 +01:00
David Green
f848798b7d [ARM] Delay creation of MVE Imm shifts to legalization
The reasoning for creating VSHLIMM/VSHRsIMM/VSHRuIMM nodes in a combine
(because matching i64 constants is difficult) does not apply for MVE,
as there are no v2i64 shifts. Delaying the creation of the nodes can
allow extra transforms on target-independent shl/shr.
2022-05-04 22:12:09 +01:00
serge-sans-paille
7030654296 [iwyu] Handle regressions in libLLVM header include
Running iwyu-diff on LLVM codebase since fa5a4e1b95c8f37796 detected a few
regressions, fixing them.

Differential Revision: https://reviews.llvm.org/D124847
2022-05-04 08:32:38 +02:00
Zhiyao Ma
bd606afe26 [ARM] Only update the successor edges for immediate predecessors of PrologueMBB
When adjusting the function prologue for segmented stacks, only update
the successor edges of the immediate predecessors of the original
prologue.

Differential Revision: https://reviews.llvm.org/D122959
2022-05-03 12:36:35 +01:00
David Penry
dcb77643e3 Reapply [CodeGen][ARM] Enable Swing Module Scheduling for ARM
Fixed "private field is not used" warning when compiled
with clang.

original commit: 28d09bbbc3d09c912b54a4d5edb32cab7de32a6f
reverted in: fa49021c68ef7a7adcdf7b8a44b9006506523191

------

This patch permits Swing Modulo Scheduling for ARM targets and
turns it on by default for the Cortex-M7. The t2Bcc
instruction is recognized as a loop-ending branch.

MachinePipeliner is extended by adding support for
"unpipelineable" instructions.  These instructions are
those which contribute to the loop exit test; in the SMS
papers they are removed before creating the dependence graph
and then inserted into the final schedule of the kernel and
prologues. Support for these instructions was not previously
necessary because current targets supporting SMS have only
supported it for hardware loop branches, which have no
loop-exit-contributing instructions in the loop body.

The current structure of the MachinePipeliner makes it difficult
to remove/exclude these instructions from the dependence graph.
Therefore, this patch leaves them in the graph, but adds a
"normalization" method which moves them in the schedule to
stage 0, which causes them to appear properly in kernel and
prologues.

It was also necessary to be more careful about boundary nodes
when iterating across successors in the dependence graph because
the loop exit branch is now a non-artificial successor to
instructions in the graph. In addition, schedules with physical
use/def pairs in the same cycle should be treated as creating an
invalid schedule because the scheduling logic doesn't respect
physical register dependence once scheduled to the same cycle.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D122672
2022-04-29 10:54:39 -07:00
David Penry
fa49021c68 Revert "[CodeGen][ARM] Enable Swing Module Scheduling for ARM"
This reverts commit 28d09bbbc3d09c912b54a4d5edb32cab7de32a6f
while I investigate a buildbot failure.
2022-04-28 13:29:27 -07:00
David Penry
28d09bbbc3 [CodeGen][ARM] Enable Swing Module Scheduling for ARM
This patch permits Swing Modulo Scheduling for ARM targets and
turns it on by default for the Cortex-M7. The t2Bcc
instruction is recognized as a loop-ending branch.

MachinePipeliner is extended by adding support for
"unpipelineable" instructions.  These instructions are
those which contribute to the loop exit test; in the SMS
papers they are removed before creating the dependence graph
and then inserted into the final schedule of the kernel and
prologues. Support for these instructions was not previously
necessary because current targets supporting SMS have only
supported it for hardware loop branches, which have no
loop-exit-contributing instructions in the loop body.

The current structure of the MachinePipeliner makes it difficult
to remove/exclude these instructions from the dependence graph.
Therefore, this patch leaves them in the graph, but adds a
"normalization" method which moves them in the schedule to
stage 0, which causes them to appear properly in kernel and
prologues.

It was also necessary to be more careful about boundary nodes
when iterating across successors in the dependence graph because
the loop exit branch is now a non-artificial successor to
instructions in the graph. In addition, schedules with physical
use/def pairs in the same cycle should be treated as creating an
invalid schedule because the scheduling logic doesn't respect
physical register dependence once scheduled to the same cycle.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D122672
2022-04-28 13:01:18 -07:00
Ties Stuij
051deb2d9d [ARM] add Armv9 build attribute
The build attribute number can be found in the Arm ABI addenda32 document:
https://github.com/ARM-software/abi-aa/blob/main/addenda32/addenda32.rst#335target-related-attributes

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D124090
2022-04-28 10:48:26 +01:00
Vasileios Porpodas
fa8a9fea47 Recommit "[SLP][TTI] Refactoring of getShuffleCost Args to work like getArithmeticInstrCost"
This reverts commit 6a9bbd9f20dcd700e28738788bb63a160c6c088c.

Code review: https://reviews.llvm.org/D124202
2022-04-26 14:02:40 -07:00
Vasileios Porpodas
6a9bbd9f20 Revert "[SLP][TTI] Refactoring of getShuffleCost Args to work like getArithmeticInstrCost"
This reverts commit 55ce296d6f217fd0defed2592ff7b74b79b2c1f0.
2022-04-26 11:25:26 -07:00
Vasileios Porpodas
55ce296d6f [SLP][TTI] Refactoring of getShuffleCost Args to work like getArithmeticInstrCost
Before this patch `Args` was used to pass a broadcast's arguments by SLP.
This patch changes this. `Args` is now used for passing the operands of
the shuffle.

Differential Revision: https://reviews.llvm.org/D124202
2022-04-26 11:11:29 -07:00
Matt Arsenault
d7938b1a81 MachineModuleInfo: Move HasSplitStack handling to AsmPrinter
This is used to emit one field in doFinalization for the module. We
can accumulate this when emitting all individual functions directly in
the AsmPrinter, rather than accumulating additional state in
MachineModuleInfo.

Move the special case behavior predicate into MachineFrameInfo to
share it. This now promotes it to generic behavior. I'm assuming this
is fine because no other target implements adjustForSegmentedStacks,
or has tests using the split-stack attribute.
2022-04-20 10:54:29 -04:00
Ilia Diachkov
6c69427e88 [SPIR-V](3/6) Add MC layer, object file support, and InstPrinter
The patch adds SPIRV-specific MC layer implementation, SPIRV object
file support and SPIRVInstPrinter.

Differential Revision: https://reviews.llvm.org/D116462

Authors: Aleksandr Bezzubikov, Lewis Crawford, Ilia Diachkov,
Michal Paszkowski, Andrey Tretyakov, Konrad Trifunovic

Co-authored-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
Co-authored-by: Ilia Diachkov <iliya.diyachkov@intel.com>
Co-authored-by: Michal Paszkowski <michal.paszkowski@outlook.com>
Co-authored-by: Andrey Tretyakov <andrey1.tretyakov@intel.com>
Co-authored-by: Konrad Trifunovic <konrad.trifunovic@intel.com>
2022-04-20 01:10:25 +02:00
Craig Topper
65942554e2 [ARM] Add missing return to ARMTTIImpl::isLoweredToCall.
I assume we meant to return the result of the call to
BaseT::isLoweredToCall(F).

This might not be a functional change in practice because it would
still hit the default case in the switch and call
BaseT::isLoweredToCall(F) at the end.
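
The shape of the bug, and why it may be harmless in practice, can be sketched with stand-in functions. All names and the integer "Id" parameter are hypothetical; this is not the ARMTTIImpl code.

```cpp
#include <cassert>

// Hypothetical stand-in for the base-class query.
bool baseIsLoweredToCall(int Id) { return Id >= 100; }

// Before: the early branch calls the base implementation but drops the
// result, so control continues into the switch.
bool isLoweredToCallBefore(int Id) {
  if (Id < 0)
    baseIsLoweredToCall(Id); // missing "return"
  switch (Id) {
  case 1:
  case 2:
    return false;
  default:
    break;
  }
  return baseIsLoweredToCall(Id);
}

// After: the early branch returns the base result directly.
bool isLoweredToCallAfter(int Id) {
  if (Id < 0)
    return baseIsLoweredToCall(Id);
  switch (Id) {
  case 1:
  case 2:
    return false;
  default:
    break;
  }
  return baseIsLoweredToCall(Id);
}
```

In this model the inputs that take the early branch never match a switch case, so they reach the default case and the final base call either way. Both versions therefore return the same value for every input.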

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D123333
2022-04-07 12:52:54 -07:00
Ties Stuij
edeceb8647 remove dead code in parseRegisterList checking for ARM::RA_AUTH_CODE
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122577
2022-04-07 14:53:46 +01:00
Matt Arsenault
c4ea925f50 AtomicExpand: Change return type for shouldExpandAtomicStoreInIR
Use the same enum as the other atomic instructions for consistency, in
preparation for addition of another strategy.

Introduce a new "Expand" option, since the store expansion does not
use cmpxchg. Alternatively, the existing CmpXChg strategy could be
renamed to Expand.
2022-04-06 22:34:04 -04:00
Matt Arsenault
0fb6856aff ARM/GlobalISel: Get pointer type from value instead of getPointerSize
Avoid using getPointerSize and pass through the original value type.
2022-03-31 16:46:23 -04:00
Chris Bieneman
9130e471fe Add DXContainer
DXIL is wrapped in a container format defined by the DirectX 11
specification. Codebases differ in calling this format either DXBC or
DXILContainer.

Since eventually we want to add support for DXBC as a target
architecture and the format is used by DXBC and DXIL, I've termed it
DXContainer here.

Most of the changes in this patch are just adding cases to switch
statements to address warnings.

Reviewed By: pete

Differential Revision: https://reviews.llvm.org/D122062
2022-03-29 14:34:23 -05:00