12289 Commits

Author SHA1 Message Date
David Green
4ac2721e51
[AArch64] Add costs for ST3 and ST4 instructions, modelled as store(shuffle). (#87934)
This tries to add some costs for the shuffle in a ST3/ST4 instruction,
which are represented in LLVM IR as store(interleaving shuffle). In
order to detect the store, it needs to add a CxtI context instruction to
check the users of the shuffle. LD3 and LD4 are added, LD2 should be a
zip1 shuffle, which will be added in another patch.

It should help fix some of the regressions from #87510.
2024-04-09 16:36:08 +01:00
AtariDreams
c5d000b1a8
[Thumb] Resolve FIXME: Use 'mov hi, $src; mov $dst, hi' (#81908)
Consider the following:

        ldr     r0, [r4]
        ldr     r7, [r0, #4]
        cmp     r7, r3
        bhi     .LBB0_6
        cmp     r0, r2
        push    {r0}
        pop     {r4}
        bne     .LBB0_3
        movs    r0, r6
        pop     {r4, r5, r6, r7}
        pop     {r1}
        bx      r1

Here is a snippet of the generated THUMB1 code of the K&R malloc
function that clang currently compiles to.

push    {r0} ends up being popped to pop {r4}.

movs r4, r0 would destroy the flags set by cmp right above.

The compiler has no alternative in this case, except one:
the only alternative is to transfer through a high register.

However, it seems like LLVM does not consider that this is a valid
approach, even though it is a free clobbering a high register.

This patch addresses the FIXME so the compiler can do that when it can
in r10 or r11, or r12.
2024-04-05 10:18:22 +01:00
Victor Campos
74373c1bef
Revert "[ARM][Thumb2] Mark BTI-clearing instructions as scheduling region boundaries" (#87699)
Reverts llvm/llvm-project#79173

The testcase fails in non-asserts builds.
2024-04-04 21:29:21 +01:00
Victor Campos
5ad320abe3
[ARM][Thumb2] Mark BTI-clearing instructions as scheduling region boundaries (#79173)
Following https://github.com/llvm/llvm-project/pull/68313 this patch
extends the idea to M-profile PACBTI.

The Machine Scheduler can reorder instructions within a scheduling
region depending on the scheduling policy set. If a BTI-clearing
instruction happens to partake in one such region, it might be moved
around, therefore ending up where it shouldn't.

The solution is to mark all BTI-clearing instructions as scheduling
region boundaries. This essentially means that they must not be part of
any scheduling region, and as consequence never get moved:

 - PAC
 - PACBTI
 - BTI
 - SG

Note that PAC isn't BTI-clearing, but it's replaced by PACBTI late in
the compilation pipeline.

As far as I know, currently it isn't possible to organically obtain code
that's susceptible to the bug:

- Instructions that write to SP are region boundaries. PAC seems to
always be followed by the pushing of r12 to the stack, so essentially
PAC is always by itself in a scheduling region.
- CALL_BTI is expanded into a machine instruction bundle. Bundles are
unpacked only after the last machine scheduler run. Thus setjmp and BTI
can be separated only if someone deliberately run the scheduler once
more.
- The BTI insertion pass is run late in the pipeline, only after the
last machine scheduling has run. So once again it can be reordered only
if someone deliberately runs the scheduler again.

Nevertheless, one can reasonably argue that we should prevent the bug in
spite of the compiler not being able to produce the required conditions
for it. If things change, the compiler will be robust against this
issue.

The tests written for this are contrived: bogus MIR instructions have
been added adjacent to the BTI-clearing instructions in order to have
them inside non-trivial scheduling regions.
2024-04-04 12:44:32 +01:00
Prabhuk
212b1a84a6
[CallSiteInfo][NFC] CallSiteInfo -> CallSiteInfo.ArgRegPairs (#86842)
CallSiteInfo is originally used only for argument - register pairs. Make
it struct, in which we can store additional data for call sites.

Also, the variables/methods used for CallSiteInfo are named for its
original use case, e.g., CallFwdRegsInfo. Refactor these for the
upcoming
use, e.g. addCallArgsForwardingRegs() -> addCallSiteInfo().

An upcoming patch will add type ids for indirect calls to propogate them
from
middle-end to the back-end. The type ids will be then used to emit the
call
graph section.

Original RFC:
https://lists.llvm.org/pipermail/llvm-dev/2021-June/151044.html
Updated RFC:
https://lists.llvm.org/pipermail/llvm-dev/2021-July/151739.html

Differential Revision: https://reviews.llvm.org/D107109?id=362888

Co-authored-by: Necip Fazil Yildiran <necip@google.com>
2024-04-02 13:05:16 -07:00
Alfie Richards
ff870aeeb7
[ARM] Add reference to ARMAsmParser in ARMOperand (#86110) 2024-03-28 14:06:40 +00:00
AtariDreams
9669aba132
[Thumb1] LivePhysRegs to LiveRegUnits (#84474)
This removes the r7 exception because otherwise, LiveRegUnits will try
to use that register.
2024-03-27 09:33:56 -07:00
Simon Pilgrim
78f0871bee Revert rG58de1e2c5eee548a9b365e3b1554d87317072ad9 "Fix stack layout for frames larger than 2gb (#84114)"
This is failing on some EXPENSIVE_CHECKS buildbots
2024-03-27 16:16:15 +00:00
David Green
313bf28f98
[ARM][MVE] Remove kill flags when reusing VPR register. (#86300)
The vpr register may no longer be killed where it was, so we should be
removing the kill flags.
2024-03-27 16:04:48 +00:00
Wesley Wiser
58de1e2c5e
Fix stack layout for frames larger than 2gb (#84114)
For very large stack frames, the offset from the stack pointer to a local can be more than 2^31 which overflows various `int` offsets in the frame lowering code.

This patch updates the frame lowering code to calculate the offsets as 64-bit values and resolves the overflows, resulting in the correct codegen for very large frames.

Fixes #48911
2024-03-27 15:05:58 +00:00
Alfie Richards
375ddd677c
[ARM][MC] Add GNU Alias for ldrexd, ldaexd, stlexd, and strexd instructions (#86507)
These aliases were supported previously there was a regression at some point.

This adds back the alternate forms and tidies up this section of code a little.

See https://github.com/llvm/llvm-project/pull/83436#issuecomment-2010213714 for the initial report regarding this change.
2024-03-26 16:13:41 +00:00
Sergei Barannikov
5e5b656102
[MC] Make MCParsedAsmOperand::getReg() return MCRegister (#86444) 2024-03-25 05:13:48 +03:00
Alfie Richards
e3030f1e19
[ARM] FIX: Fix parsing pkhtb with a condition code
This was broken by https://github.com/llvm/llvm-project/pull/83436 as in
optional operands meant when the CC operand is provided the
`parsePKHImm` parser is applied to register operands, which previously
erroneously produced an error.
2024-03-19 23:11:48 +02:00
Jeremy Morse
b9d83eff25
[NFC][RemoveDIs] Use iterators for insertion at various call-sites (#84736)
These are the last remaining "trivial" changes to passes that use
Instruction pointers for insertion. All of this should be NFC, it's just
changing the spelling of how we identify a position.

In one or two locations, I'm also switching uses of getNextNode etc to
using std::next with iterators. This too should be NFC.

---------

Merged by: Stephen Tozer <stephen.tozer@sony.com>
2024-03-19 16:36:29 +00:00
Benjamin Kramer
6598f631bd Remove another layering violation by unused include. NFC 2024-03-18 12:56:51 +01:00
Alfie Richards
295cdd5c3d
[ARM][TableGen][MC] Change the ARM mnemonic operands to be optional for ASM parsing (#83436)
This changs the way the assembly matcher works for Aarch32 parsing.
Previously there was a pile of hacks which dictated whether the CC,
CCOut, and VCC operands should be present which de-facto chose if the
wide/narrow (or thumb1/thumb2/arm) instruction version were chosen.

This meant much of the TableGen machinery present for the assembly
matching was effectively being bypassed and worked around.

This patch makes the CC and CCOut operands optional which allows the ASM
matcher operate as it was designed and means we can avoid doing some of
the hacks done previously. This also adds the option for the target to
allow the prioritizing the smaller instruction encodings as is required
for Aarch32.
2024-03-18 11:25:13 +00:00
David Green
601e102bdb
[CodeGen] Use LocationSize for MMO getSize (#84751)
This is part of #70452 that changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.

This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support but none of that is included in this
patch. It should mostly be an NFC.

Global ISel is still expected to use the underlying LLT as it needs, and
are not expected to see unknown sizes for generic operations. Most of
the changes are hopefully fairly mechanical, adding a lot of getValue()
calls and protecting them with hasValue() where needed.
2024-03-17 18:15:56 +00:00
Arthur Eubanks
94c988bcfd [NFC] Remove unused parameter from shouldAssumeDSOLocal() 2024-03-11 19:48:17 +00:00
Sivan Shani
5e688f0dbd [llvm][arm] add T1 and T2 assembly options for vlldm and vlstm
Re-land 634b0243b8f7acc85af4f16b70e91d86ded4dc83.

T1 allow for an optional registers list,
the register list must be {d0-d15}.
T2 define a mandatory register list,
the register list must be {d0-d31}.

The requirements for T1/T2 are as follows:
                T1              T2
Require:        v8-M.Main,      v8.1-M.Main,
                secure state    secure state
16 D Regs       valid           valid
32 D Regs       UNDEFINED       valid
No D Regs       NOP             NOP
2024-03-11 14:27:28 +00:00
Jonathan Thackray
8160139136
Add support for Arm Cortex A78AE CPU (#84485)
Add support for Arm Cortex A78AE CPU

Technical Reference Manual for Arm Cortex A78AE:
   https://developer.arm.com/documentation/101779/0003

Fixes #84450
2024-03-08 16:11:36 +00:00
AtariDreams
7c21495fee
Reapply "Convert many LivePhysRegs uses to LiveRegUnits" (#84338)
This only converts the instances where all that is needed is to change
the variable type name.

Basically, anything that involves a function that LiveRegUnits does not
directly have was skipped to play it safe.

Reverts
7a0e222a17
2024-03-08 19:05:00 +05:30
Jay Foad
7a0e222a17 Revert "Convert many LivePhysRegs uses to LiveRegUnits (#83905)"
This reverts commit 2a13422b8bcee449405e3ebff957b4020805f91c.

It was causing test failures on the expensive check builders.
2024-03-07 08:20:26 +00:00
AtariDreams
2a13422b8b
Convert many LivePhysRegs uses to LiveRegUnits (#83905) 2024-03-06 10:38:14 +05:30
Noah Goldstein
61c06775c9 [KnownBits] Add API for nuw flag in computeForAddSub; NFC 2024-03-05 12:59:58 -06:00
Jeremy Morse
f33f66be7d [NFC][RemoveDIs] Always use iterators for inserting PHIs
It's becoming potentially unsafe to insert a PHI instruction using a plain
Instruction pointer. Switch all the remaining sites that create and insert
PHIs to use iterators instead. For example, the code in
ComplexDeinterleavingPass.cpp is definitely at-risk of mixing PHIs and
debug-info.
2024-03-05 17:00:12 +00:00
James Westwood
b2c16e7ff4
Revert "[ARM] R11 not pushed adjacent to link register with PAC-M and… (#84019)
… AAPCS frame chain fix (#82801)"

This reverts commit 00e4a4197137410129d4725ffb82bae9ce44bdde. This patch
was found to cause miscompilations and compilation failures.
2024-03-05 14:34:43 +00:00
James Westwood
00e4a41971
[ARM] R11 not pushed adjacent to link register with PAC-M and AAPCS frame chain fix (#82801)
When code for M class architecture was compiled with AAPCS and PAC
enabled, the frame pointer, r11, was not pushed to the stack adjacent to
the link register. Due to PAC being enabled, r12 was placed between r11
and lr. This patch fixes this by adding an extra case to the already
existing code that splits the GPR push in two when R11 is the frame
pointer and certain paremeters are met. The differential revision for
this previous change can be found here:
https://reviews.llvm.org/D125649. This now ensures that r11 and lr are
pushed in a separate push instruction to the other GPRs when PAC and
AAPCS are enabled, meaning the frame pointer and link register are now
pushed onto the stack adjacent to each other.
2024-03-04 12:00:36 +00:00
David Green
5f058398ab [ARM] Mark AESD and AESE instructions as commutative.
Similar to #83390, this marks AESD and AESE as commutative, as the logic of the
instructions starts as a XOR between the two operands.
2024-03-03 16:56:21 +00:00
MagentaTreehouse
f505a92fc2
[NFC] Use fold expressions to replace discarded initializer_lists (#83693) 2024-03-02 23:44:17 +01:00
Fangrui Song
21c83feca5 [ARM] Simplify shouldAssumeDSOLocal for ELF. NFC 2024-03-01 16:14:48 -08:00
Alfie Richards
b8e0f3e81e
[ARM] Change the type of CC and VCC code in splitMnemonic. (#83413)
This changes the type of `PredicationCode` and `VPTPredicationCode` from
`unsigned` to `ARMCC::CondCodes` and `ARMVCC::VPTCodes` resp' for
clarity and correctness.
2024-03-01 13:12:06 +00:00
Jonathan Thackray
147dc81c1d
[ARM][AArch64] Enable FEAT_FHM for Arm Neoverse N2 (#82613)
Correct an issue with Arm Neoverse N2 after it was
changed to a v9a core in change
f576cbe44eabb8a5ac0af817424a0d1e7c8fbf85:

  * FEAT_FHM should be enabled for this core.
2024-02-29 15:57:50 +00:00
Tomas Matheson
03420f570e Revert "[llvm][arm] add T1 and T2 assembly options for vlldm and vlstm (#83116)"
This reverts commit 634b0243b8f7acc85af4f16b70e91d86ded4dc83.

Failing EXPENSIVE_CHECKS builds with "undefined physical register".
2024-02-29 09:48:29 +00:00
SivanShani-Arm
634b0243b8
[llvm][arm] add T1 and T2 assembly options for vlldm and vlstm (#83116)
T1 allows for an optional registers list, the register list must be {d0-d15}.
T2 defines a mandatory register list, the register list must be {d0-d31}.

The requirements for T1/T2 are as follows:
                T1              T2
Require:        v8-M.Main,      v8.1-M.Main,
                secure state    secure state
16 D Regs       valid           valid
32 D Regs       UNDEFINED       valid
No D Regs       NOP             NOP
2024-02-28 17:02:51 +00:00
ostannard
749384c08e
[ARM] Update IsRestored for LR based on all returns (#82745)
PR #75527 fixed ARMFrameLowering to set the IsRestored flag for LR based
on all of the return instructions in the function, not just one.
However, there is also code in ARMLoadStoreOptimizer which changes
return instructions, but it set IsRestored based on the one instruction
it changed, not the whole function.

The fix is to factor out the code added in #75527, and also call it from
ARMLoadStoreOptimizer if it made a change to return instructions.

Fixes #80287.
2024-02-26 12:23:25 +00:00
Jack Styles
28233408a2
[CodeGen] [ARM] Make RISC-V Init Undef Pass Target Independent and add support for the ARM Architecture. (#77770)
When using Greedy Register Allocation, there are times where
early-clobber values are ignored, and assigned the same register. This
is illeagal behaviour for these intructions. To get around this, using
Pseudo instructions for early-clobber registers gives them a definition
and allows Greedy to assign them to a different register. This then
meets the ARM Architecture Reference Manual and matches the defined
behaviour.

This patch takes the existing RISC-V patch and makes it target
independent, then adds support for the ARM Architecture. Doing this will
ensure early-clobber restraints are followed when using the ARM
Architecture. Making the pass target independent will also open up
possibility that support other architectures can be added in the future.
2024-02-26 12:12:31 +00:00
Rishabh Bali
fe42e72db2
[CodeGen] Port AtomicExpand to new Pass Manager (#71220)
Port the `atomicexpand` pass to the new Pass Manager. 
Fixes #64559
2024-02-25 18:42:22 +05:30
Pierre van Houtryve
2ae8bee8f1
[ARM][GlobalISel] Remove legacy legalizer rules (#82619)
I've been looking at LegacyLegalizerInfo and what its place in GISel is.
It seems like it's very close to being deleted so I'm checking if we can
remove the last remaining uses of it.

Looks like we can do a drop-in replacement with the new legalizer for
ARM.
2024-02-23 10:28:58 +01:00
Fangrui Song
2167881f51 [ARM,MC] Support FDPIC relocations
Linux kernel fs/binfmt_elf_fdpic.c supports FDPIC for MMU-less systems.
GCC/binutils/qemu support FDPIC ABI for ARM
(https://github.com/mickael-guene/fdpic_doc).
_ARM FDPIC Toolchain and ABI_ provides a summary.

This patch implements FDPIC relocations to the integrated assembler.
There are 6 static relocations and 2 dynamic relocations, with
R_ARM_FUNCDESC as both static and dynamic.

gas requires `--fdpic` to assemble data relocations like `.word f(FUNCDESC)`.
This patch adds `MCTargetOptions::FDPIC` and reports an error if FDPIC
is not set.

Pull Request: https://github.com/llvm/llvm-project/pull/82187
2024-02-21 10:13:26 -08:00
Sergei Barannikov
1e4c76cdc9
[MC][AsmParser] Make MatchRegisterName return MCRegister (NFC) (#81408)
`MCRegister` is preferred over `unsigned` nowadays.
2024-02-18 13:59:49 +03:00
Shoaib Meenai
ac452204aa
[llvm] Support indirect symbol replacement with R_ARM_GOT_PREL (#81916)
R_ARM_GOT_PREL is equivalent to GOTPCREL on other architectures, so we
can use it for indirect symbol replacement the same way. This is the
equivalent of https://github.com/llvm/llvm-project/pull/67754 for x86_64
and https://github.com/llvm/llvm-project/pull/78003 for AArch64 and
64-bit RISC-V.
2024-02-15 17:05:15 -08:00
Alexey Bataev
7bc079c852
[TTI]Fallback to SingleSrcPermute shuffle kind, if no direct estimation for
extract subvector.

Many targets do not have cost for extractsubvector shuffle kind, but
have the costs for single source permute. If there are no costs
estimation for extractsubvector, better to switchto single source
permute for better cost estimation.

Reviewers: RKSimon, davemgreen, arsenm

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/79837
2024-02-12 07:09:49 -05:00
Serge Pavlov
213b0ae497
[GlobalISel][ARM] legalize G_FPENV_RESET for soft-float mode (#81456) 2024-02-12 17:46:59 +07:00
Simon Pilgrim
b45de48be2
[MVE] Expand64BitShift - handle all constant shift amounts less than 32 (#81261)
Expand64BitShift was always dropping to generic shift legalization if the shift amount type was larger than i64, even if the constant shift amount was actually very small. I've adjusted the constant bounds checks to work with APInt types so we can always perform the comparison.

This results in the MVE long shift instructions being used more often, and it looks like this is preventing some additional combines from happening. This could be addressed in the future.

This came about while I was trying to extend the DAGTypeLegalizer::ExpandShift* helpers and need to move to consistently using the legal shift amount types instead of reusing the shift amount type from the original wider shift.
2024-02-11 15:02:27 +00:00
Serge Pavlov
b0785cd1cb
[GlobalISel][ARM] Support missing case for G_CONSTANT (#80555)
Global Instruction Selector could not select the code:

    %0:gprb(s32) = G_CONSTANT i32 -1

In DAG selector the similar code is selected to the instruction MVNi
using custom operand `mod_imm_not`. Changing its definition from
`PatLeaf` to `ImmLeaf` and providing counterpart for `imm_not_XFORM`
make the relevant rule available for GlobalISel too.
2024-02-07 12:53:20 +07:00
Serge Pavlov
b4eb7a10c0
[GlobalISel][ARM] Legalze set_fpenv and get_fpenv (#79852)
Implement handling of get/set floating point environment for ARM in
Global Instruction Selector. Lowering of these intrinsics to operations
on FPSCR was previously inplemented in DAG selector, in GlobalISel it is
reused.
2024-02-04 12:30:33 +07:00
Harald van Dijk
52864d9c7b
[ARM] Switch to soft promoting half types. (#80440)
The traditional promotion is known to generate wrong code.

Fixes #73805.
2024-02-02 21:40:40 +00:00
Philip Reames
3ff7caea33
[TTI] Use Register in isLoadFromStackSlot and isStoreToStackSlot [nfc] (#80339) 2024-02-01 17:52:35 -08:00
Alfie Richards
de75e5079a
[ARM][NEON] Add constraint to vld2 Odd/Even Pseudo instructions. (#79287)
This ensures the odd/even pseudo instructions are allocated to the same
register range.

This fixes #71763
2024-01-31 14:08:02 +00:00
Oskar Wirga
ff4636a4ab
Refactor recomputeLiveIns to converge on added MachineBasicBlocks (#79940)
This is a fix for the regression seen in
https://github.com/llvm/llvm-project/pull/79498

> Currently, the way that recomputeLiveIns works is that it will
recompute the livein registers for that MachineBasicBlock but it matters
what order you call recomputeLiveIn which can result in incorrect
register allocations down the line.

Now we do not recompute the entire CFG but we do ensure that the newly
added MBB do reach convergence.
2024-01-30 19:33:04 -08:00