4878 Commits

Author SHA1 Message Date
Simon Pilgrim
b2c5e9b9bf [ARM] iabs.ll - regenerate test checks 2024-05-14 16:36:39 +01:00
David Green
8fc9e3d577
[DAG] Lower frem of power-2 using div/trunc/mul+sub (#91148)
If we are lowering a frem and the divisor is known to be an integer power-2, we
can use the formula 'frem = x - trunc(x / d) * d'. This avoids the more
expensive call to fmod. The results are identical as fmod so long as d is a
power-2 (so the mul does not round incorrectly), and the sign of the return is
either always positive or not important for zeroes (nsz).

Unfortunately Alive2 does not handle this well at the moment. I was using
exhaustive checking to test this:
(https://gist.github.com/davemgreen/6078015f30d3bacd1e9572f8db5d4b64).

I found this in cpythons implementation of float_pow. I currently added it as a
DAG combine for frem with power-2 fp constants.
2024-05-10 14:58:48 +01:00
David Green
9e1a49cba7 [AArch64][ARM] Add tests for frem power2 lowering. NFC 2024-05-09 17:15:35 +01:00
Chris Copeland
651bdb96b1
[ARM] Armv8-R does not require fp64 or neon. (#88287)
This was [addressed for AArch64
here](https://github.com/llvm/llvm-project/pull/79004), but the same
applies to ARM.

Move the enablement of neon+fp64 to `-mcpu=cortex-r52`, which optionally
supports these features.
2024-05-07 11:48:30 +01:00
Eleanor Bonnici
c12bc57e23
Do not use R12 for indirect tail calls with PACBTI (#82661)
When compiling for thumbv8.1m with +pacbti and making an indirect tail
call, the compiler was free to put the function pointer into R12.

This is incorrect because R12 is restored to contain authentication code
for the caller's return address.

This patch excludes R12 from the set of registers the compiler can put
the function pointer in.

Fixes https://github.com/llvm/llvm-project/issues/75998
2024-04-30 15:29:07 +01:00
Qiu Chaofan
4a8f2f2e1a
[Legalizer] Expand fmaximum and fminimum (#67301)
According to langref, llvm.maximum/minimum has -0.0 < +0.0 semantics and
propagates NaN.

Expand the nodes on targets not supporting the operation, by adding
extra check for NaN and using is_fpclass to check zero signs.
2024-04-29 15:09:54 +08:00
Yingwei Zheng
4d28d3f93b
[SDAG] Turn umin into smin if the saturation pattern is broken (#88505)
As we canonicalizes smin with non-negative operands into umin in the
middle-end, the saturation pattern will be broken.
This patch reverts the transform in DAGCombine to fix the regression on
ARM.

Fixes https://github.com/llvm/llvm-project/issues/85706.
2024-04-16 01:28:28 +08:00
Victor Campos
45137766ca
[ARM][Thumb2] Mark BTI-clearing instructions as scheduling region boundaries (#87982)
Following https://github.com/llvm/llvm-project/pull/68313 this patch
extends the idea to M-profile PACBTI.

The Machine Scheduler can reorder instructions within a scheduling
region depending on the scheduling policy set. If a BTI-clearing
instruction happens to partake in one such region, it might be moved
around, therefore ending up where it shouldn't.

The solution is to mark all BTI-clearing instructions as scheduling
region boundaries. This essentially means that they must not be part of
any scheduling region, and as consequence never get moved:

 - PAC
 - PACBTI
 - BTI
 - SG

Note that PAC isn't BTI-clearing, but it's replaced by PACBTI late in
the compilation pipeline.

As far as I know, currently it isn't possible to organically obtain code
that's susceptible to the bug:

- Instructions that write to SP are region boundaries. PAC seems to
always be followed by the pushing of r12 to the stack, so essentially
PAC is always by itself in a scheduling region.
- CALL_BTI is expanded into a machine instruction bundle. Bundles are
unpacked only after the last machine scheduler run. Thus setjmp and BTI
can be separated only if someone deliberately run the scheduler once
more.
- The BTI insertion pass is run late in the pipeline, only after the
last machine scheduling has run. So once again it can be reordered only
if someone deliberately runs the scheduler again.

Nevertheless, one can reasonably argue that we should prevent the bug in
spite of the compiler not being able to produce the required conditions
for it. If things change, the compiler will be robust against this
issue.

The tests written for this are contrived: bogus MIR instructions have
been added adjacent to the BTI-clearing instructions in order to have
them inside non-trivial scheduling regions.
2024-04-15 10:58:30 +01:00
Björn Pettersson
33e6b488be
[SelectionDAG] Fix and improve TargetLowering::SimplifySetCC (#87646)
The load narrowing part of TargetLowering::SimplifySetCC is updated
according to this:

1) The offset calculation (for big endian) did not work properly for
   non byte-sized types. This is basically solved by an early exit
   if the memory type isn't byte-sized. But the code is also corrected
   to use the store size when calculating the offset.
2) To still allow some optimizations for non-byte-sized types the
   TargetLowering::isPaddedAtMostSignificantBitsWhenStored hook is
   added. By default it assumes that scalar integer types are padded
   starting at the most significant bits, if the type needs padding
   when being stored to memory.
3) Allow optimizing when isPaddedAtMostSignificantBitsWhenStored is
   true, as that hook makes it possible for TargetLowering to know
   how the non byte-sized value is aligned in memory.
4) Update the algorithm to always search for a narrowed load with
   a power-of-2 byte-sized type. In the past the algorithm started
   with the the width of the original load, and then divided it by
   two for each iteration. But for a type such as i48 that would
   just end up trying to narrow the load into a i24 or i12 load,
   and then we would fail sooner or later due to not finding a
   newVT that fulfilled newVT.isRound().
   With this new approach we can narrow the i48 load into either
   an i8, i16 or i32 load. By checking if such a load is allowed
(e.g. alignment wise) for any "multiple of 8 offset", then we can find
   more opportunities for the optimization to trigger. So even for a
   byte-sized type such as i32 we may now end up narrowing the load
   into loading the 16 bits starting at offset 8 (if that is allowed
   by the target). The old algorithm did not even consider that case.
5) Also start using getObjectPtrOffset instead of getMemBasePlusOffset
   when creating the new ptr. This way we get "nsw" on the add.
2024-04-12 16:18:12 +02:00
Bjorn Pettersson
bcf047a4ed [ARM][PowerPC] Add regression tests for narrowing load in TargetLowering::SimplifySetCC
These test cases show some miscomplies for big-endian when dealing
with non byte-sized loads. One part of the problem is that LLVM IR
isn't really telling where the padding goes for non byte-sized
loads/stores. So currently TargetLowering::SimplifySetCC can't assume
anything about it. But the implementation also do not consider that
the TypeStoreSize could be larger than the TypeSize, resulting in
the offset calculation being wrong for big-endian.

Pre-commit for https://github.com/llvm/llvm-project/pull/87646
2024-04-12 16:14:39 +02:00
Arthur Eubanks
5d6d8dcd29
[clang][llvm] Remove "implicit-section-name" attribute (#87906)
D33412/D33413 introduced this to support a clang pragma to set section
names for a symbol depending on if it would be placed in
bss/data/rodata/text, which may not be known until the backend. However,
for text we know that only functions will go there, so just directly set
the section in clang instead of going through a completely separate
attribute.

Autoupgrade the "implicit-section-name" attribute to directly setting
the section on a Fuction.
2024-04-11 12:29:29 -07:00
AtariDreams
c5d000b1a8
[Thumb] Resolve FIXME: Use 'mov hi, $src; mov $dst, hi' (#81908)
Consider the following:

        ldr     r0, [r4]
        ldr     r7, [r0, #4]
        cmp     r7, r3
        bhi     .LBB0_6
        cmp     r0, r2
        push    {r0}
        pop     {r4}
        bne     .LBB0_3
        movs    r0, r6
        pop     {r4, r5, r6, r7}
        pop     {r1}
        bx      r1

Here is a snippet of the generated THUMB1 code of the K&R malloc
function that clang currently compiles to.

push    {r0} ends up being popped to pop {r4}.

movs r4, r0 would destroy the flags set by cmp right above.

The compiler has no alternative in this case, except one:
the only alternative is to transfer through a high register.

However, it seems like LLVM does not consider that this is a valid
approach, even though it is a free clobbering a high register.

This patch addresses the FIXME so the compiler can do that when it can
in r10 or r11, or r12.
2024-04-05 10:18:22 +01:00
Victor Campos
74373c1bef
Revert "[ARM][Thumb2] Mark BTI-clearing instructions as scheduling region boundaries" (#87699)
Reverts llvm/llvm-project#79173

The testcase fails in non-asserts builds.
2024-04-04 21:29:21 +01:00
Victor Campos
5ad320abe3
[ARM][Thumb2] Mark BTI-clearing instructions as scheduling region boundaries (#79173)
Following https://github.com/llvm/llvm-project/pull/68313 this patch
extends the idea to M-profile PACBTI.

The Machine Scheduler can reorder instructions within a scheduling
region depending on the scheduling policy set. If a BTI-clearing
instruction happens to partake in one such region, it might be moved
around, therefore ending up where it shouldn't.

The solution is to mark all BTI-clearing instructions as scheduling
region boundaries. This essentially means that they must not be part of
any scheduling region, and as consequence never get moved:

 - PAC
 - PACBTI
 - BTI
 - SG

Note that PAC isn't BTI-clearing, but it's replaced by PACBTI late in
the compilation pipeline.

As far as I know, currently it isn't possible to organically obtain code
that's susceptible to the bug:

- Instructions that write to SP are region boundaries. PAC seems to
always be followed by the pushing of r12 to the stack, so essentially
PAC is always by itself in a scheduling region.
- CALL_BTI is expanded into a machine instruction bundle. Bundles are
unpacked only after the last machine scheduler run. Thus setjmp and BTI
can be separated only if someone deliberately run the scheduler once
more.
- The BTI insertion pass is run late in the pipeline, only after the
last machine scheduling has run. So once again it can be reordered only
if someone deliberately runs the scheduler again.

Nevertheless, one can reasonably argue that we should prevent the bug in
spite of the compiler not being able to produce the required conditions
for it. If things change, the compiler will be robust against this
issue.

The tests written for this are contrived: bogus MIR instructions have
been added adjacent to the BTI-clearing instructions in order to have
them inside non-trivial scheduling regions.
2024-04-04 12:44:32 +01:00
Jonas Paulsson
7564566779 Reapply "Move assertion for AdjustsStack from PEI to MachineVerifier (#85698)"
- The check is now actually done in both PEI and the MachineVerifier.
- More .mir tests trivially updated with "adjustsStack: true" as needed.
2024-03-21 20:24:57 -04:00
David Green
686f4599cf [ARM] Regenerate some check lines. NFC 2024-03-21 13:45:44 +00:00
Jonas Paulsson
9ebd329ad8 Revert "Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698)"
This reverts commit 05bde30585710a51592eee0a6cf6df8184d09c92.

Reverting due to verifier complaints with expensive checks on build-bot.
2024-03-20 11:48:30 -04:00
Jonas Paulsson
05bde30585
Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698)
Have the verifier report a missing AdjustsStack flag rather than waiting until
PEI asserts.
2024-03-20 10:29:12 -04:00
Sivan Shani
5e688f0dbd [llvm][arm] add T1 and T2 assembly options for vlldm and vlstm
Re-land 634b0243b8f7acc85af4f16b70e91d86ded4dc83.

T1 allow for an optional registers list,
the register list must be {d0-d15}.
T2 define a mandatory register list,
the register list must be {d0-d31}.

The requirements for T1/T2 are as follows:
                T1              T2
Require:        v8-M.Main,      v8.1-M.Main,
                secure state    secure state
16 D Regs       valid           valid
32 D Regs       UNDEFINED       valid
No D Regs       NOP             NOP
2024-03-11 14:27:28 +00:00
Jay Foad
fd3eaf76ba
[GISel] Enforce G_PTR_ADD RHS type matching index size for addr space (#84352) 2024-03-09 09:07:22 +00:00
David Green
5f058398ab [ARM] Mark AESD and AESE instructions as commutative.
Similar to #83390, this marks AESD and AESE as commutative, as the logic of the
instructions starts as a XOR between the two operands.
2024-03-03 16:56:21 +00:00
Fangrui Song
d89b771ef5 [ARM] Add alias tests for ROPI/RWPI
https://reviews.llvm.org/D23195 does not test aliases.
2024-03-02 10:33:57 -08:00
Tomas Matheson
03420f570e Revert "[llvm][arm] add T1 and T2 assembly options for vlldm and vlstm (#83116)"
This reverts commit 634b0243b8f7acc85af4f16b70e91d86ded4dc83.

Failing EXPENSIVE_CHECKS builds with "undefined physical register".
2024-02-29 09:48:29 +00:00
SivanShani-Arm
634b0243b8
[llvm][arm] add T1 and T2 assembly options for vlldm and vlstm (#83116)
T1 allows for an optional registers list, the register list must be {d0-d15}.
T2 defines a mandatory register list, the register list must be {d0-d31}.

The requirements for T1/T2 are as follows:
                T1              T2
Require:        v8-M.Main,      v8.1-M.Main,
                secure state    secure state
16 D Regs       valid           valid
32 D Regs       UNDEFINED       valid
No D Regs       NOP             NOP
2024-02-28 17:02:51 +00:00
ostannard
749384c08e
[ARM] Update IsRestored for LR based on all returns (#82745)
PR #75527 fixed ARMFrameLowering to set the IsRestored flag for LR based
on all of the return instructions in the function, not just one.
However, there is also code in ARMLoadStoreOptimizer which changes
return instructions, but it set IsRestored based on the one instruction
it changed, not the whole function.

The fix is to factor out the code added in #75527, and also call it from
ARMLoadStoreOptimizer if it made a change to return instructions.

Fixes #80287.
2024-02-26 12:23:25 +00:00
Oliver Stannard
8779cf68e8 Pre-commit test showing bug #80287
This test shows the bug where LR is used as a general-purpose register
on a code path where it is not spilled to the stack.
2024-02-26 12:21:13 +00:00
Jack Styles
28233408a2
[CodeGen] [ARM] Make RISC-V Init Undef Pass Target Independent and add support for the ARM Architecture. (#77770)
When using Greedy Register Allocation, there are times where
early-clobber values are ignored, and assigned the same register. This
is illeagal behaviour for these intructions. To get around this, using
Pseudo instructions for early-clobber registers gives them a definition
and allows Greedy to assign them to a different register. This then
meets the ARM Architecture Reference Manual and matches the defined
behaviour.

This patch takes the existing RISC-V patch and makes it target
independent, then adds support for the ARM Architecture. Doing this will
ensure early-clobber restraints are followed when using the ARM
Architecture. Making the pass target independent will also open up
possibility that support other architectures can be added in the future.
2024-02-26 12:12:31 +00:00
Serge Pavlov
213b0ae497
[GlobalISel][ARM] legalize G_FPENV_RESET for soft-float mode (#81456) 2024-02-12 17:46:59 +07:00
Serge Pavlov
b0785cd1cb
[GlobalISel][ARM] Support missing case for G_CONSTANT (#80555)
Global Instruction Selector could not select the code:

    %0:gprb(s32) = G_CONSTANT i32 -1

In DAG selector the similar code is selected to the instruction MVNi
using custom operand `mod_imm_not`. Changing its definition from
`PatLeaf` to `ImmLeaf` and providing counterpart for `imm_not_XFORM`
make the relevant rule available for GlobalISel too.
2024-02-07 12:53:20 +07:00
Nikita Popov
b31fffbc7f [ARM] Convert tests to opaque pointers (NFC) 2024-02-05 13:56:59 +01:00
Craig Topper
6590d0fed5
[DAGCombiner][ARM] Teach reduceLoadWidth to handle (and (srl (load), C, ShiftedMask)) (#80342)
If we have a shifted mask, we may be able to reduce the load width
to the width of the non-zero part of the mask and use an offset
to the base address to remove the srl. The offset is given by
C+trailingzeros(ShiftedMask).
    
Then we add a final shl to restore the trailing zero bits.
    
I've use the ARM test because that's where the existing (and (srl
(load))) tests were.
    
The X86 test was modified to keep the H register.
2024-02-04 16:05:51 -08:00
Serge Pavlov
b4eb7a10c0
[GlobalISel][ARM] Legalze set_fpenv and get_fpenv (#79852)
Implement handling of get/set floating point environment for ARM in
Global Instruction Selector. Lowering of these intrinsics to operations
on FPSCR was previously inplemented in DAG selector, in GlobalISel it is
reused.
2024-02-04 12:30:33 +07:00
Harald van Dijk
52864d9c7b
[ARM] Switch to soft promoting half types. (#80440)
The traditional promotion is known to generate wrong code.

Fixes #73805.
2024-02-02 21:40:40 +00:00
Quentin Dian
112fba974c
[MIRPrinter] Don't print line break when there is no instructions (NFC) (#80147)
Per #80143, we can remove the extra line break when there is no
instruction.
2024-02-01 22:10:52 +08:00
Simon Pilgrim
ea2984287d [ARM] Add ctpop codegen tests 2024-02-01 11:42:18 +00:00
Quentin Dian
b7738e275d
[MIRPrinter] Don't print space when there is no successor (#80143)
Extra space causes the checks generated by update_mir_test_checks to be
unavailable.

```
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
# RUN: llc -mtriple=x86_64-- -o - %s -run-pass=none -verify-machineinstrs -simplify-mir | FileCheck %s
---
name: foo
body: |
  ; CHECK-LABEL: name: foo
  ; CHECK: bb.0:
  ; CHECK-NEXT:   successors:
  ; CHECK-NEXT: {{  $}}
  ; CHECK-NEXT: {{  $}}
  ; CHECK-NEXT: bb.1:
  ; CHECK-NEXT:   RET 0, $eax
  bb.0:
    successors:

  bb.1:
    RET 0, $eax
...
```

The failure log is as follows:

```
llvm/test/CodeGen/MIR/X86/unreachable-block-print.mir:9:16: error: CHECK-NEXT: is on the same line as previous match
 ; CHECK-NEXT: {{ $}}
               ^
<stdin>:21:13: note: 'next' match was here
 successors:
            ^
<stdin>:21:13: note: previous match ended here
 successors:
```
2024-01-31 22:35:41 +08:00
Alfie Richards
de75e5079a
[ARM][NEON] Add constraint to vld2 Odd/Even Pseudo instructions. (#79287)
This ensures the odd/even pseudo instructions are allocated to the same
register range.

This fixes #71763
2024-01-31 14:08:02 +00:00
Fangrui Song
4cb90ca8f8
[Thumb,ELF] Fix access to dso_preemptable __stack_chk_guard with static relocation model (#78950)
PR #70014 fixes A32 to use GOT for dso_preemptable `__stack_chk_guard`
with static relocation model (e.g. -fPIE/-fPIC LTO compiles with -no-pie
linking).

This patch fixes such `__stack_chk_guard` access for Thumb1 and Thumb2.
Note: `t2LDRLIT_ga_pcrel` is only for ELF. mingw needs
`.refptr.__stack_chk_guard` (https://reviews.llvm.org/D92738).

Fix #64999
2024-01-22 13:16:31 -08:00
Fangrui Song
3b943c0203 [Thumb,test] Improve __stack_chk_guard test 2024-01-22 00:20:40 -08:00
Simon Pilgrim
d92ce344bf Revert faecc736e2ac3cd8c77 #74443 [DAG] isSplatValue - node is a splat if all demanded elts have the same whole constant value (#74443)
Relying on ComputeKnownBits to find a splat is causing miscompilations where a shift of zero is being assumed to give zero, but further simplification leads to a shift of zero by undef, resulting in an unexpected undef value.

Fixes #78109
2024-01-17 15:59:33 +00:00
Alfie Richards
60c775769b
[ARM] Add missing earlyclobber to sqrshr and uqrshl instructions. (#77782)
This avoids possible undefined behavior using the same register for Rm
and Rda.

Additionally adds a check in MC to produce an error upon parsing this
case.
2024-01-16 10:30:16 +00:00
Nick Anderson
f1ec0d12bb
Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#77182)
Port CodeGenPrepare to new pass manager and dependency
BasicBlockSectionsProfileReader
Fixes: #75380

Co-authored-by: Krishna-13-cyber <84722531+Krishna-13-cyber@users.noreply.github.com>
2024-01-09 13:32:59 +07:00
Orlando Cazalet-Hyams
10b03e6662
[RemoveDIs] Handle DPValues in FastISel (#76952)
The change is fairly mechanical:
1. Factor code from `FastISel::selectIntrinsicCall`, which converts
debug intrinsics into debug instructions, into functions (NFC).
2. Call those functions for DPValues attached to instructions too.

The test updates look the same as other RemoveDIs changes: re-run the
tests with `--try-experimental-debuginfo-iterators`, which checks the
output is identical using the new debug info format (if it has been
enabled in the cmake configuration).

Depends on #76941 (otherwise some modified tests spuriously fail).
2024-01-05 15:11:47 +00:00
Simon Pilgrim
7648371c25 Revert 4d7c5ad58467502fcbc433591edff40d8a4d697d "[NewPM] Update CodeGenPreparePass reference in CodeGenPassBuilder (#77054)"
Revert e0c554ad87d18dcbfcb9b6485d0da800ae1338d1 "Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#75380)"

Revert #75380 and #77054 as they were breaking EXPENSIVE_CHECKS buildbots: https://lab.llvm.org/buildbot/#/builders/104
2024-01-05 12:28:10 +00:00
Nick Anderson
e0c554ad87
Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#75380)
Port CodeGenPrepare to new pass manager and dependency
BasicBlockSectionsProfileReader
Fixes: #64560

Co-authored-by: Krishna-13-cyber <84722531+Krishna-13-cyber@users.noreply.github.com>
2024-01-05 13:47:56 +07:00
Serge Pavlov
2f81788067
[ARM][FPEnv] Lowering of fpmode intrinsics (#74054)
LLVM intrinsics `get_fpmode`, `set_fpmode` and `reset_fpmode` operate
control modes, the bits of FP environment that affect FP operations. On
ARM these bits are in FPSCR together with the status bits. The
implementation of these intrinsics produces code close to that of
functions `fegetmode` and `fesetmode` from GLIBC.

Pull request: https://github.com/llvm/llvm-project/pull/74054
2023-12-18 18:57:36 +07:00
ostannard
4888218d03
[ARM] Do not emit unwind tables when saving LR around outlined call (#69611)
In some cases, the machine outliner needs to preserve LR across an
outlined call by pushing it onto the stack. Previously, this also
generated unwind table instructions, which is incorrect because EHABI
unwind tables cannot represent different stack frames a different points
in the function, so the extra unwind info applied to the entire
function.

The outliner code already avoided generating CFI instructions, but EHABI
unwind data is generated later from the actual instructions, so we need
to avoid using the FrameSetup and FrameDestroy flags to prevent unwind
data being generated.
2023-12-14 14:46:13 +00:00
Shih-Po Hung
b97c5a9554
[VPlan] Add a test for testing unused interleave recipes (#75026)
- Precommit of tests from #71360.
- Replace `undef` pointer operands and add stores to avoid the loads
   being optmized away.
2023-12-14 21:16:11 +08:00
Simon Pilgrim
b7fc78255e Revert rG2047ab00eaf0a17e71ce5e8a5b27a8c90f034c3d "[VPlan] Add a test for testing unused interleave recipes (#75026)"
vplan-unused-interleave-group.ll is causing buildbot failures
2023-12-14 10:25:41 +00:00
Shih-Po Hung
2047ab00ea
[VPlan] Add a test for testing unused interleave recipes (#75026)
- Precommit of tests from #71360.
   - Replace `undef` pointer operands and add stores to avoid the loads
      being optmized away.
2023-12-14 17:36:58 +08:00