4862 Commits

Author SHA1 Message Date
Jonas Paulsson
9ebd329ad8 Revert "Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698)"
This reverts commit 05bde30585710a51592eee0a6cf6df8184d09c92.

Reverting due to verifier complaints with expensive checks on build-bot.
2024-03-20 11:48:30 -04:00
Jonas Paulsson
05bde30585
Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698)
Have the verifier report a missing AdjustsStack flag rather than waiting until
PEI asserts.
2024-03-20 10:29:12 -04:00
Sivan Shani
5e688f0dbd [llvm][arm] add T1 and T2 assembly options for vlldm and vlstm
Re-land 634b0243b8f7acc85af4f16b70e91d86ded4dc83.

T1 allow for an optional registers list,
the register list must be {d0-d15}.
T2 define a mandatory register list,
the register list must be {d0-d31}.

The requirements for T1/T2 are as follows:
                T1              T2
Require:        v8-M.Main,      v8.1-M.Main,
                secure state    secure state
16 D Regs       valid           valid
32 D Regs       UNDEFINED       valid
No D Regs       NOP             NOP
2024-03-11 14:27:28 +00:00
Jay Foad
fd3eaf76ba
[GISel] Enforce G_PTR_ADD RHS type matching index size for addr space (#84352) 2024-03-09 09:07:22 +00:00
David Green
5f058398ab [ARM] Mark AESD and AESE instructions as commutative.
Similar to #83390, this marks AESD and AESE as commutative, as the logic of the
instructions starts as a XOR between the two operands.
2024-03-03 16:56:21 +00:00
Fangrui Song
d89b771ef5 [ARM] Add alias tests for ROPI/RWPI
https://reviews.llvm.org/D23195 does not test aliases.
2024-03-02 10:33:57 -08:00
Tomas Matheson
03420f570e Revert "[llvm][arm] add T1 and T2 assembly options for vlldm and vlstm (#83116)"
This reverts commit 634b0243b8f7acc85af4f16b70e91d86ded4dc83.

Failing EXPENSIVE_CHECKS builds with "undefined physical register".
2024-02-29 09:48:29 +00:00
SivanShani-Arm
634b0243b8
[llvm][arm] add T1 and T2 assembly options for vlldm and vlstm (#83116)
T1 allows for an optional registers list, the register list must be {d0-d15}.
T2 defines a mandatory register list, the register list must be {d0-d31}.

The requirements for T1/T2 are as follows:
                T1              T2
Require:        v8-M.Main,      v8.1-M.Main,
                secure state    secure state
16 D Regs       valid           valid
32 D Regs       UNDEFINED       valid
No D Regs       NOP             NOP
2024-02-28 17:02:51 +00:00
ostannard
749384c08e
[ARM] Update IsRestored for LR based on all returns (#82745)
PR #75527 fixed ARMFrameLowering to set the IsRestored flag for LR based
on all of the return instructions in the function, not just one.
However, there is also code in ARMLoadStoreOptimizer which changes
return instructions, but it set IsRestored based on the one instruction
it changed, not the whole function.

The fix is to factor out the code added in #75527, and also call it from
ARMLoadStoreOptimizer if it made a change to return instructions.

Fixes #80287.
2024-02-26 12:23:25 +00:00
Oliver Stannard
8779cf68e8 Pre-commit test showing bug #80287
This test shows the bug where LR is used as a general-purpose register
on a code path where it is not spilled to the stack.
2024-02-26 12:21:13 +00:00
Jack Styles
28233408a2
[CodeGen] [ARM] Make RISC-V Init Undef Pass Target Independent and add support for the ARM Architecture. (#77770)
When using Greedy Register Allocation, there are times where
early-clobber values are ignored, and assigned the same register. This
is illeagal behaviour for these intructions. To get around this, using
Pseudo instructions for early-clobber registers gives them a definition
and allows Greedy to assign them to a different register. This then
meets the ARM Architecture Reference Manual and matches the defined
behaviour.

This patch takes the existing RISC-V patch and makes it target
independent, then adds support for the ARM Architecture. Doing this will
ensure early-clobber restraints are followed when using the ARM
Architecture. Making the pass target independent will also open up
possibility that support other architectures can be added in the future.
2024-02-26 12:12:31 +00:00
Serge Pavlov
213b0ae497
[GlobalISel][ARM] legalize G_FPENV_RESET for soft-float mode (#81456) 2024-02-12 17:46:59 +07:00
Serge Pavlov
b0785cd1cb
[GlobalISel][ARM] Support missing case for G_CONSTANT (#80555)
Global Instruction Selector could not select the code:

    %0:gprb(s32) = G_CONSTANT i32 -1

In DAG selector the similar code is selected to the instruction MVNi
using custom operand `mod_imm_not`. Changing its definition from
`PatLeaf` to `ImmLeaf` and providing counterpart for `imm_not_XFORM`
make the relevant rule available for GlobalISel too.
2024-02-07 12:53:20 +07:00
Nikita Popov
b31fffbc7f [ARM] Convert tests to opaque pointers (NFC) 2024-02-05 13:56:59 +01:00
Craig Topper
6590d0fed5
[DAGCombiner][ARM] Teach reduceLoadWidth to handle (and (srl (load), C, ShiftedMask)) (#80342)
If we have a shifted mask, we may be able to reduce the load width
to the width of the non-zero part of the mask and use an offset
to the base address to remove the srl. The offset is given by
C+trailingzeros(ShiftedMask).
    
Then we add a final shl to restore the trailing zero bits.
    
I've use the ARM test because that's where the existing (and (srl
(load))) tests were.
    
The X86 test was modified to keep the H register.
2024-02-04 16:05:51 -08:00
Serge Pavlov
b4eb7a10c0
[GlobalISel][ARM] Legalze set_fpenv and get_fpenv (#79852)
Implement handling of get/set floating point environment for ARM in
Global Instruction Selector. Lowering of these intrinsics to operations
on FPSCR was previously inplemented in DAG selector, in GlobalISel it is
reused.
2024-02-04 12:30:33 +07:00
Harald van Dijk
52864d9c7b
[ARM] Switch to soft promoting half types. (#80440)
The traditional promotion is known to generate wrong code.

Fixes #73805.
2024-02-02 21:40:40 +00:00
Quentin Dian
112fba974c
[MIRPrinter] Don't print line break when there is no instructions (NFC) (#80147)
Per #80143, we can remove the extra line break when there is no
instruction.
2024-02-01 22:10:52 +08:00
Simon Pilgrim
ea2984287d [ARM] Add ctpop codegen tests 2024-02-01 11:42:18 +00:00
Quentin Dian
b7738e275d
[MIRPrinter] Don't print space when there is no successor (#80143)
Extra space causes the checks generated by update_mir_test_checks to be
unavailable.

```
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
# RUN: llc -mtriple=x86_64-- -o - %s -run-pass=none -verify-machineinstrs -simplify-mir | FileCheck %s
---
name: foo
body: |
  ; CHECK-LABEL: name: foo
  ; CHECK: bb.0:
  ; CHECK-NEXT:   successors:
  ; CHECK-NEXT: {{  $}}
  ; CHECK-NEXT: {{  $}}
  ; CHECK-NEXT: bb.1:
  ; CHECK-NEXT:   RET 0, $eax
  bb.0:
    successors:

  bb.1:
    RET 0, $eax
...
```

The failure log is as follows:

```
llvm/test/CodeGen/MIR/X86/unreachable-block-print.mir:9:16: error: CHECK-NEXT: is on the same line as previous match
 ; CHECK-NEXT: {{ $}}
               ^
<stdin>:21:13: note: 'next' match was here
 successors:
            ^
<stdin>:21:13: note: previous match ended here
 successors:
```
2024-01-31 22:35:41 +08:00
Alfie Richards
de75e5079a
[ARM][NEON] Add constraint to vld2 Odd/Even Pseudo instructions. (#79287)
This ensures the odd/even pseudo instructions are allocated to the same
register range.

This fixes #71763
2024-01-31 14:08:02 +00:00
Fangrui Song
4cb90ca8f8
[Thumb,ELF] Fix access to dso_preemptable __stack_chk_guard with static relocation model (#78950)
PR #70014 fixes A32 to use GOT for dso_preemptable `__stack_chk_guard`
with static relocation model (e.g. -fPIE/-fPIC LTO compiles with -no-pie
linking).

This patch fixes such `__stack_chk_guard` access for Thumb1 and Thumb2.
Note: `t2LDRLIT_ga_pcrel` is only for ELF. mingw needs
`.refptr.__stack_chk_guard` (https://reviews.llvm.org/D92738).

Fix #64999
2024-01-22 13:16:31 -08:00
Fangrui Song
3b943c0203 [Thumb,test] Improve __stack_chk_guard test 2024-01-22 00:20:40 -08:00
Simon Pilgrim
d92ce344bf Revert faecc736e2ac3cd8c77 #74443 [DAG] isSplatValue - node is a splat if all demanded elts have the same whole constant value (#74443)
Relying on ComputeKnownBits to find a splat is causing miscompilations where a shift of zero is being assumed to give zero, but further simplification leads to a shift of zero by undef, resulting in an unexpected undef value.

Fixes #78109
2024-01-17 15:59:33 +00:00
Alfie Richards
60c775769b
[ARM] Add missing earlyclobber to sqrshr and uqrshl instructions. (#77782)
This avoids possible undefined behavior using the same register for Rm
and Rda.

Additionally adds a check in MC to produce an error upon parsing this
case.
2024-01-16 10:30:16 +00:00
Nick Anderson
f1ec0d12bb
Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#77182)
Port CodeGenPrepare to new pass manager and dependency
BasicBlockSectionsProfileReader
Fixes: #75380

Co-authored-by: Krishna-13-cyber <84722531+Krishna-13-cyber@users.noreply.github.com>
2024-01-09 13:32:59 +07:00
Orlando Cazalet-Hyams
10b03e6662
[RemoveDIs] Handle DPValues in FastISel (#76952)
The change is fairly mechanical:
1. Factor code from `FastISel::selectIntrinsicCall`, which converts
debug intrinsics into debug instructions, into functions (NFC).
2. Call those functions for DPValues attached to instructions too.

The test updates look the same as other RemoveDIs changes: re-run the
tests with `--try-experimental-debuginfo-iterators`, which checks the
output is identical using the new debug info format (if it has been
enabled in the cmake configuration).

Depends on #76941 (otherwise some modified tests spuriously fail).
2024-01-05 15:11:47 +00:00
Simon Pilgrim
7648371c25 Revert 4d7c5ad58467502fcbc433591edff40d8a4d697d "[NewPM] Update CodeGenPreparePass reference in CodeGenPassBuilder (#77054)"
Revert e0c554ad87d18dcbfcb9b6485d0da800ae1338d1 "Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#75380)"

Revert #75380 and #77054 as they were breaking EXPENSIVE_CHECKS buildbots: https://lab.llvm.org/buildbot/#/builders/104
2024-01-05 12:28:10 +00:00
Nick Anderson
e0c554ad87
Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#75380)
Port CodeGenPrepare to new pass manager and dependency
BasicBlockSectionsProfileReader
Fixes: #64560

Co-authored-by: Krishna-13-cyber <84722531+Krishna-13-cyber@users.noreply.github.com>
2024-01-05 13:47:56 +07:00
Serge Pavlov
2f81788067
[ARM][FPEnv] Lowering of fpmode intrinsics (#74054)
LLVM intrinsics `get_fpmode`, `set_fpmode` and `reset_fpmode` operate
control modes, the bits of FP environment that affect FP operations. On
ARM these bits are in FPSCR together with the status bits. The
implementation of these intrinsics produces code close to that of
functions `fegetmode` and `fesetmode` from GLIBC.

Pull request: https://github.com/llvm/llvm-project/pull/74054
2023-12-18 18:57:36 +07:00
ostannard
4888218d03
[ARM] Do not emit unwind tables when saving LR around outlined call (#69611)
In some cases, the machine outliner needs to preserve LR across an
outlined call by pushing it onto the stack. Previously, this also
generated unwind table instructions, which is incorrect because EHABI
unwind tables cannot represent different stack frames a different points
in the function, so the extra unwind info applied to the entire
function.

The outliner code already avoided generating CFI instructions, but EHABI
unwind data is generated later from the actual instructions, so we need
to avoid using the FrameSetup and FrameDestroy flags to prevent unwind
data being generated.
2023-12-14 14:46:13 +00:00
Shih-Po Hung
b97c5a9554
[VPlan] Add a test for testing unused interleave recipes (#75026)
- Precommit of tests from #71360.
- Replace `undef` pointer operands and add stores to avoid the loads
   being optmized away.
2023-12-14 21:16:11 +08:00
Simon Pilgrim
b7fc78255e Revert rG2047ab00eaf0a17e71ce5e8a5b27a8c90f034c3d "[VPlan] Add a test for testing unused interleave recipes (#75026)"
vplan-unused-interleave-group.ll is causing buildbot failures
2023-12-14 10:25:41 +00:00
Shih-Po Hung
2047ab00ea
[VPlan] Add a test for testing unused interleave recipes (#75026)
- Precommit of tests from #71360.
   - Replace `undef` pointer operands and add stores to avoid the loads
      being optmized away.
2023-12-14 17:36:58 +08:00
paperchalice
b0cc42ae0f
[CodeGen] Port SjLjEHPrepare to new pass manager (#75023)
`doInitialization` in `SjLjEHPrepare` is trivial.

This is the last pass suffix with `ehprepare`.
2023-12-12 16:07:26 +08:00
Simon Pilgrim
faecc736e2
[DAG] isSplatValue - node is a splat if all demanded elts have the same whole constant value (#74443) 2023-12-08 10:53:51 +00:00
Simon Pilgrim
22df0886a1
[DAG] Don't split f64 constant stores if the fp imm is legal (#74622)
If the target can generate a specific fp immediate constant, then don't split the store into 2 x i32 stores

Another cleanup step for #74304
2023-12-07 10:33:03 +00:00
Simon Pilgrim
609d980b3f [ARM] Regenerate aapcs-hfa-code.ll 2023-12-06 12:09:30 +00:00
Nikita Popov
eecb99c5f6 [Tests] Add disjoint flag to some tests (NFC)
These tests rely on SCEV looking recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
2023-12-05 14:09:36 +01:00
simpal01
74cdb8e6f8
[llvm][ARM] Emit MVE .arch_extension after .fpu directive if it does not include MVE features (#71545)
The floating-point and MVE features together specify the MVE
functionality that is supported on the Cortex-M85 processor. But the FPU
extension for the underlying architecture(armv8.1-m.main) is FPV5 which
does not include MVE-F. So Compiler's -S output and `-save-temps=obj`
loses MVE feature which leads to assembler error. What happening here is
.fpu directive overrides any previously set features by .cpu directive.
Since the the corresponding .fpu generated (.fpu fpv5-d16) does not
include MVE-F, it overrides those features even though it is supported
and set by the .cpu directive. Looks like .fpu is supposed to do this.

In this case, there should be an .arch_extension directive re-enabling
the relevant extensions after .fpu if the goal is to keep these
extensions enabled. GCC also does the same.

So this patch enables the MVE features by emitting the below arch
extension:
  .fpu fpv5-d16
  .arch_extension mve.fp

---------

Co-authored-by: Simi Pallipurath <simi.pallipurath.com>
2023-11-22 09:16:58 +00:00
Serge Pavlov
a2e1de1934 [ARM][FPEnv] Lowering of fpenv intrinsics
The change implements lowering of `get_fpenv`, `set_fpenv` and
`reset_fpenv`.

Differential Revision: https://reviews.llvm.org/D81843
2023-11-20 15:08:25 +07:00
Tavian Barnes
75cf672b12
[SDAG] Simplify is-power-of-2 codegen (#72275)
When x is not known to be nonzero, ctpop(x) == 1 is expanded to

    x != 0 && (x & (x - 1)) == 0

resulting in codegen like

    leal    -1(%rdi), %eax
    testl   %eax, %edi
    sete    %cl
    testl   %edi, %edi
    setne   %al
    andb    %cl, %al

But another expression that works is

    (x ^ (x - 1)) > x - 1

which has nicer codegen:

    leal    -1(%rdi), %eax
    xorl    %eax, %edi
    cmpl    %eax, %edi
    seta    %al
2023-11-15 22:26:34 +09:00
Serge Pavlov
5b0f703918 Revert "[ARM][FPEnv] Lowering of fpenv intrinsics"
This reverts commit d62f040418bd167d1ddd2b79c640a90c0c2ea353.
Some cuda buildbots start failing.
2023-11-10 16:24:51 +07:00
Serge Pavlov
d62f040418 [ARM][FPEnv] Lowering of fpenv intrinsics
The change implements lowering of `get_fpenv`, `set_fpenv` and
`reset_fpenv`.

Differential Revision: https://reviews.llvm.org/D81843
2023-11-10 16:06:33 +07:00
Nikita Popov
e4a4122eb6
[IR] Remove zext and sext constant expressions (#71040)
Remove support for zext and sext constant expressions. All places
creating them have been removed beforehand, so this just removes the
APIs and uses of these constant expressions in tests.

There is some additional cleanup that can be done on top of this, e.g.
we can remove the ZExtInst vs ZExtOperator footgun.

This is part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
2023-11-03 10:46:07 +01:00
Tobias Stadler
373c343a77 Reland: [GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND
Reland 3686a0b after fixing an exposed miscompile in #68840

Differential Revision: https://reviews.llvm.org/D159140
2023-11-02 00:18:19 +01:00
Fangrui Song
5888dee7d0
[ARM,ELF] Fix access to dso_preemptable __stack_chk_guard with static relocation model (#70014)
The ELF code from https://reviews.llvm.org/D112811 emits LDRLIT_ga_pcrel
when `TM.isPositionIndependent()` but uses a different condition
`Subtarget.isGVIndirectSymbol(GV)` (aka dso_preemptable on ELF targets).
This would cause incorrect access for dso_preemptable
`__stack_chk_guard` with the static relocation model.

Regarding whether `__stack_chk_guard` gets the dso_local specifier,
https://reviews.llvm.org/D150841 switched to
`M.getDirectAccessExternalData()` (implied by "PIC Level") instead of
`TM.getRelocationModel() == Reloc::Static`.

The result is that when non-zero "PIC Level" is used with static
relocation model (e.g. -fPIE/-fPIC LTO compiles with -no-pie linking),
`__stack_chk_guard` accesses are incorrect.
```
        ldr     r0, .LCPI0_0
        ldr     r0, [r0]
        ldr     r0, [r0]   // incorrectly dereferences __stack_chk_guard
...
.LCPI0_0:
        .long   __stack_chk_guard
```

To fix this, for dso_preemptable `__stack_chk_guard`, emit a GOT PIC
code sequence like for -fpic using `LDRLIT_ga_pcrel`:
```
        ldr     r0, .LCPI0_0
.LPC0_0:
        add     r0, pc, r0
        ldr     r0, [r0]
        ldr     r0, [r0]
...
LCPI0_0:
.Ltmp0:
        .long   __stack_chk_guard(GOT_PREL)-((.LPC0_0+8)-.Ltmp0)
```

Technically, `LDRLIT_ga_abs` with `R_ARM_GOT_ABS` could be used, but
`R_ARM_GOT_ABS` does not have GNU or integrated assembler support.
(Note, `.LCPI0_0: .long __stack_chk_guard@GOT` produces an
`R_ARM_GOT_BREL`, which is not desired).

This patch fixes #6499 while not changing behavior for the following
configurations:
```
run arm.linux.nopic --target=arm-linux-gnueabi -fno-pic
run arm.linux.pie --target=arm-linux-gnueabi -fpie
run arm.linux.pic --target=arm-linux-gnueabi -fpic
run armv6.darwin.nopic --target=armv6-apple-darwin -fno-pic
run armv6.darwin.dynamicnopic --target=armv6-apple-darwin -mdynamic-no-pic
run armv6.darwin.pic --target=armv6-apple-darwin -fpic
run armv7.darwin.nopic --target=armv7-apple-darwin -mcpu=cortex-a8 -fno-pic
run armv7.darwin.dynamicnopic --target=armv7-apple-darwin -mcpu=cortex-a8 -mdynamic-no-pic
run armv7.darwin.pic --target=armv7-apple-darwin -mcpu=cortex-a8 -fpic
run arm64.darwin.pic --target=arm64-apple-darwin
```
2023-10-31 15:37:26 -07:00
Fangrui Song
6ae7b735db [ARM][test] Improve stack-protector tests
llvm/test/LTO/ARM/ssp-static-reloc.ll is more about using the static
relocation model with "PIC Level" and unrelated to the LTO infrastructure.
Move the test. Update stack_guard_remat.ll to clearly test "PIC Level"
with the relevant relocation models.
2023-10-31 15:30:08 -07:00
XChy
fc6bdb8549
[SimplifyCFG] Reland transform for redirecting phis between unmergeable BB and SuccBB (#68473)
Reland #67275 with #68953 resolved.
2023-10-28 17:10:20 +08:00
Matthias Braun
e3cf80c5c1
BlockFrequencyInfoImpl: Avoid big numbers, increase precision for small spreads
BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that:

* Avoid big numbers close to UINT64_MAX to avoid users overflowing/saturating when adding multiply frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room.
* Spread the difference between hottest/coldest block as much as possible to increase precision.
* If the hot/cold spread cannot be represented loose precision at the lower end, but keep the frequencies at the upper end for hot blocks differentiable.
2023-10-24 20:27:39 -07:00