5175 Commits

Author SHA1 Message Date
David Green
22968f5b4a
[DAG] Add strictfp implicit def reg after metadata. (#168282)
This prevents a machine verifier error, where it "Expected implicit
register after groups".

Fixes #158661
2025-11-17 10:57:21 +00:00
hstk30-hw
51c8180515
[GlobalMerge]Prefer use global-merge-max-offset instead of the target-specific constant offset. (#165591)
In the Dhrystone benchmark, some adjacent globals are not merged,
whereas GCC's anchor optimization does handle them. Using
global-merge-max-offset to set the maximum offset yields similar results
(still slightly different, but at least the offset can be controlled).
2025-11-17 15:37:51 +08:00
Austin
700aa5e376
[revert][CodeGen] add a command to force global merge (#168230)
sorry, this was my mistake
2025-11-16 03:40:07 +08:00
Austin
3705921f60 [CodeGen] add a command to force global merge
I found that in some performance scenarios, such as under O2, this PR can help with a series of global variable loads.
2025-11-16 03:20:27 +08:00
Amara Emerson
18f29a5810
[ARM] Fix not saving FP when required to in frame-pointer=non-leaf. (#163699)
When the stars align to conspire against stack alignment, when we have
frame-pointer=non-leaf we can incorrectly skip preserving fp/r7 in the
prolog.

The fix here first makes sure we're using the right frame pointer
register in the context of preserving the incoming FP, and then makes
sure that we save the FP when re-alignment is known to be necessary.

rdar://162462271
2025-11-12 16:31:25 -08:00
David Tellenbach
a01a921004
[ARM] Prevent stack argument overwrite during tail calls (#166492)
For tail-calls we want to re-use the caller stack-frame and potentially
need to copy stack arguments.

For large stack arguments, such as by-val structs, this can lead to
overwriting incoming stack arguments when preparing outgoing ones by
copying them. E.g., in cases like

        %"struct.s1" = type { [19 x i32] }

        define void @f0(ptr byval(%"struct.s1") %0, ptr %1) {
        tail call  void @f1(ptr %1, ptr byval(%"struct.s1") %0)
        ret void
        }

        declare  void @f1(ptr, ptr)

that swap arguments, the last bytes of %0 are on the stack, followed by
%1. To prepare the outgoing arguments, %0 needs to be copied and %1
needs to be loaded into r0. However, currently the copy of %0
overwrites the location of %1, resulting in loading garbage into r0.

We fix that by forcing the load of the pointer stack argument to happen
before the copy.
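
The ordering problem described above can be illustrated with a toy model. This is only a sketch: the slot layout, the word counts, and the name `prepare_outgoing` are assumptions made for illustration and are not taken from the ARM backend.

```python
def prepare_outgoing(stack, outgoing_struct_words, ptr_slot, load_pointer_first):
    """Toy model of tail-call argument setup; returns the value that ends up in r0."""
    stack = list(stack)  # work on a copy of the caller's stack-argument area
    r0 = None
    if load_pointer_first:
        r0 = stack[ptr_slot]      # fixed order: read %1 before its slot is reused
    for offset, word in enumerate(outgoing_struct_words):
        stack[offset] = word      # copy the by-val struct into the outgoing area
    if not load_pointer_first:
        r0 = stack[ptr_slot]      # buggy order: the slot may already be overwritten
    return r0


# Hypothetical layout: three struct words, then the pointer argument %1 in slot 3.
incoming = ["s1", "s2", "s3", "PTR"]
# The outgoing struct copy is one word longer here, so it reaches slot 3.
outgoing = ["s0", "s1", "s2", "s3"]

print(prepare_outgoing(incoming, outgoing, 3, load_pointer_first=False))
print(prepare_outgoing(incoming, outgoing, 3, load_pointer_first=True))
```

In this model, loading after the copy returns a struct word instead of the pointer, while loading first returns the pointer.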
2025-11-12 23:38:48 +00:00
Matt Arsenault
782759b757
DAG: Use poison when widening build_vector (#167631)
Test changes are mostly noise. There are a few improvements and a few
regressions.
2025-11-12 20:17:41 +00:00
David Green
4d1f2492d2
[ARM] Use TargetMachine over Subtarget in ARMAsmPrinter (#166329)
The subtarget may not be set if no functions are present in the module.
Attempt to use the TargetMachine directly in more cases.

Fixes #165422
Fixes #167577
2025-11-12 16:26:21 +00:00
Matt Arsenault
821d2825a4
RuntimeLibcalls: Remove incorrect sincospi from most targets (#166982)
sincospi/sincospif/sincospil do not appear to exist on common
targets. Darwin targets have __sincospi and __sincospif, so define
and use those implementations. I have no idea what version added
those calls, so I'm just guessing it's the same conditions as
__sincos_stret.

Most of this patch is working to preserve codegen when a vector
library is explicitly enabled. This only covers sleef and armpl,
as those are the only cases tested.

The multiple result libcalls have an aberrant process where the
legalizer looks for the scalar type's libcall in RuntimeLibcalls,
and then cross references TargetLibraryInfo to find a matching
vector call. This was unworkable in the sincospi case, since the
common case is there is no scalar call available. To preserve
codegen if the call is available, first try to match a libcall
with the vector type before falling back on the old scalar search.

Eventually all of this logic should be contained in RuntimeLibcalls,
without the link to TargetLibraryInfo. In principle we should perform
the same legalization logic as for an ordinary operation, trying
to find a matching subvector type with a libcall.
2025-11-10 11:05:08 -08:00
Matt Arsenault
5e7f7a496c
ARM: Add fp128 ldexp tests (#166619) 2025-11-05 22:42:59 -08:00
Prabhu Rajasekaran
f60e69315e
[llvm] Emit canonical linkage correct function symbol (#166487)
In the call graph section, we were emitting the temporary label
pointing to the start of the function instead of the canonical linkage
correct function symbol. This patch fixes it and updates the
corresponding tests.
2025-11-05 09:22:08 -08:00
Matt Arsenault
4d98ee2a22
ARM: Add watchos run line to llvm.sincos test (#166271) 2025-11-03 18:20:24 -08:00
Matt Arsenault
c77b614564
ARM: Add more ABIs to llvm.sincos test (#166264)
Make sure the iOS with/without sincos_stret are tested
2025-11-03 16:00:54 -08:00
Erik Enikeev
1523332fbd
[ARM] Mark function calls as possibly changing FPSCR (#160699)
This patch makes the same changes as D143001 did for AArch64.

This PR is part of the work on adding strict FP support in ARM, which
was previously discussed in #137101.
2025-10-30 16:36:55 +00:00
Erik Enikeev
242ebcf13e
[ARM] Add instruction selection for strict FP (#160696)
This consists of marking the various strict opcodes as legal, and
adjusting instruction selection patterns so that 'op' is 'any_op'. The
changes are similar to those in D114946 for AArch64.

Custom lowering and promotion are set for some FP16 strict ops to work
correctly.

This PR is part of the work on adding strict FP support in ARM, which
was previously discussed in #137101.
2025-10-29 21:43:43 +00:00
AZero13
5d0f1591f8
[DAGCombine] Improve bswap lowering for machines that support bit rotates (#164848)
Source: Hacker's delight.
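
For context, one well-known rotate-based formulation of a 32-bit byte swap is sketched below. Whether the patch emits exactly this sequence is an assumption; the helper names are hypothetical and only the arithmetic identity itself is being shown.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def rotr32(x, n):
    """Rotate a 32-bit value right by n bits."""
    return rotl32(x, 32 - n)


def bswap32_rot(x):
    """Byte-swap using two masks and two rotates instead of four shifts."""
    x &= MASK32
    # Bytes 3 and 1 move down/around by rotating left 8;
    # bytes 2 and 0 move up/around by rotating right 8.
    return rotl32(x & 0xFF00FF00, 8) | rotr32(x & 0x00FF00FF, 8)


print(hex(bswap32_rot(0xAABBCCDD)))
```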
2025-10-25 10:17:15 -07:00
David Green
a1e59bdc17
[GlobalISel] Make scalar G_SHUFFLE_VECTOR illegal. (#140508)
I'm not sure if this is the best way forward or not, but we repeatedly
run into issues from forgetting that shuffle_vectors can be scalar
(there is another example in the recently added known-bits code). As a
scalar-dst shuffle vector is just an extract, and a scalar-source
shuffle vector is just a build vector, this patch makes scalar shuffle
vector illegal and adjusts the irbuilder to create the correct node as
required.
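
The equivalence can be sketched with a small model in which Python lists stand for vectors and plain values stand for scalars. The function name, the tuple-based "nodes", and the use of a one-element mask to represent a scalar destination are all assumptions made for illustration; this is not the IRBuilder code.

```python
def lower_shuffle(src1, src2, mask):
    """Pick the node a shuffle should become. Returns (opcode, operands...)."""
    if not isinstance(src1, list):
        # Scalar sources: each mask index picks one of the two scalars.
        elements = [src1, src2]
        if len(mask) == 1:
            return ("copy", elements[mask[0]])
        return ("build_vector", [elements[i] for i in mask])
    if len(mask) == 1:
        # Scalar destination: a single index into the concatenated sources.
        index = mask[0]
        if index < len(src1):
            return ("extract", src1, index)
        return ("extract", src2, index - len(src1))
    # Vector sources and a vector destination: a genuine shuffle.
    return ("shuffle_vector", src1, src2, mask)


print(lower_shuffle([10, 20], [30, 40], [2]))
print(lower_shuffle(7, 9, [0, 1, 0]))
```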

Most targets do this already through lowering or combines. Making scalar
shuffles illegal simplifies gisel as a whole; it just requires that
transforms that create shuffles of new sizes account for the scalar
shuffle being illegal (mostly IRBuilder and LessElements).
2025-10-24 08:21:35 +01:00
Kees Cook
d130f40264
[ARM][KCFI] Add backend support for Kernel Control-Flow Integrity (#163698)
Implement KCFI (Kernel Control Flow Integrity) backend support for
ARM32, Thumb2, and Thumb1. The Linux kernel has supported ARM KCFI via
Clang's generic KCFI implementation, but this has finally started to
[cause problems](https://github.com/ClangBuiltLinux/linux/issues/2124)
so it's time to get the KCFI operand bundle lowering working on ARM.

Supports patchable-function-prefix with adjusted load offsets. Provides
a worst-case instruction size estimate for the KCFI bundle so that
range-limited instructions (e.g. cbz) know how big the indirect calls
can become.

ARM implementation notes:
- Four-instruction EOR sequence builds the 32-bit type ID byte-by-byte
  to work within ARM's modified immediate encoding constraints.
- Scratch register selection: r12 (IP) is preferred, r3 used as fallback
  when r12 holds the call target. r3 gets spilled/reloaded if it is
  being used as a call argument.
- UDF trap encoding: 0x8000 | (0x1F << 5) | target_reg_index, similar
  to aarch64's trap encoding.

Thumb2 implementation notes:
- Logically the same as ARM
- UDF trap encoding: 0x80 | target_reg_index

Thumb1 implementation notes:
- Due to register pressure, 2 scratch registers are needed: r3 and r2,
  which get spilled/reloaded if they are being used as call args.
- Instead of EOR, add/lsl sequence to load immediate, followed by
  a compare.
- No trap encoding.
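
The ARM notes above can be sketched as plain arithmetic. This is a model under stated assumptions, not the backend code: the function names are hypothetical, and the check is modelled as XOR-ing the loaded hash with four byte-sized chunks and comparing the result with zero.

```python
MASK32 = 0xFFFFFFFF


def is_arm_modified_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= MASK32
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a rotate-right by rot.
        rotated = ((value << rot) | (value >> (32 - rot))) & MASK32
        if rotated < 0x100:
            return True
    return False


def split_type_id(type_id):
    """Split a 32-bit type ID into four byte-aligned chunks."""
    return [type_id & (0xFF << shift) for shift in (0, 8, 16, 24)]


def kcfi_matches(loaded_hash, expected_type_id):
    """XOR the loaded hash with each chunk; a zero result means a match."""
    scratch = loaded_hash & MASK32
    for chunk in split_type_id(expected_type_id):
        assert is_arm_modified_immediate(chunk)  # each chunk is encodable
        scratch ^= chunk
    return scratch == 0


def arm_udf_immediate(target_reg_index):
    """ARM trap immediate as given in the notes above."""
    return 0x8000 | (0x1F << 5) | target_reg_index


def thumb2_udf_immediate(target_reg_index):
    """Thumb2 trap immediate as given in the notes above."""
    return 0x80 | target_reg_index


print(kcfi_matches(0x12345678, 0x12345678), hex(arm_udf_immediate(12)))
```

A full 32-bit ID such as 0x12345678 is not itself encodable as a modified immediate, which is why the model splits it into four chunks.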

Update tests to validate all three sub targets.
2025-10-23 08:27:13 -07:00
paperchalice
542703fa68
[test][ARM] Remove unsafe-fp-math-uses (NFC) (#164744)
Post cleanup for #164534.
2025-10-23 15:07:46 +08:00
Prabhu Rajasekaran
b7c7083c1f
[llvm] Update call graph ELF section type. (#164461)
Make the call graph section have a dedicated type instead of the generic
progbits type.
2025-10-22 15:08:36 -07:00
David Green
6d5dea63ed
[ARM][SDAG] Add llvm.lround half promotion. (#164235)
Similar to #161088, add llvm.lround and llvm.llround promotion.
2025-10-21 16:56:55 +01:00
Prabhu Rajasekaran
cac8bdb56c
[NFC][llvm] Update call graph section's name. (#163429)
Call graph section emitted by LLVM was named `.callgraph`. Renaming it
to `.llvm.callgraph`.
2025-10-15 07:52:54 -07:00
paperchalice
bfee9db785
[DAGCombiner] Remove NoNaNsFPMath uses (#163504)
Users should use `nnan` flag instead.
2025-10-15 21:22:13 +08:00
Simon Pilgrim
4c3ec9cda0
[ARM] carry.ll - regenerate test checks (#163172) 2025-10-13 11:12:09 +00:00
Yatao Wang
c4bcbf02a5
[GlobalISel] Add G_SUB for computeNumSignBits (#158384)
This patch ports the ISD::SUB handling from SelectionDAG’s ComputeNumSignBits to GlobalISel.

Related to https://github.com/llvm/llvm-project/issues/150515.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-10-13 10:45:26 +00:00
beetrees
11571a005a
Fix legalizing FNEG and FABS with TypeSoftPromoteHalf (#156343)
Based on top of #157211.

`FNEG` and `FABS` must preserve signalling NaNs, meaning they should not
convert to f32 to perform the operation. Instead legalize to `XOR` and
`AND`.
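
As a minimal sketch of the bit-level operations on IEEE half-precision encodings (the function names are hypothetical and this is not the legalizer code), negation flips the sign bit and absolute value clears it, so NaN payloads, including signalling ones, pass through unchanged:

```python
import struct

SIGN_MASK = 0x8000  # sign bit of a 16-bit half-precision encoding


def fneg_f16_bits(bits):
    """Negate by flipping the sign bit (XOR); no conversion to f32."""
    return (bits ^ SIGN_MASK) & 0xFFFF


def fabs_f16_bits(bits):
    """Absolute value by clearing the sign bit (AND)."""
    return bits & ~SIGN_MASK & 0xFFFF


def f16_bits(value):
    """Encode a finite Python float as half-precision bits."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


SNAN = 0x7D00  # exponent all ones, quiet bit clear, non-zero payload

print(hex(fneg_f16_bits(SNAN)), hex(fabs_f16_bits(fneg_f16_bits(SNAN))))
```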

Fixes almost all of #104915
2025-10-11 11:08:26 +09:00
Prabhu Rajasekaran
6fb87b231f
[llvm][AsmPrinter] Call graph section format. (#159866)
Make .callgraph section's layout efficient in space. Document the layout
of the section.
2025-10-10 12:20:11 -07:00
Brad Smith
31e85cc572
[Android] Drop workarounds for older Android API levels pre 9, 17 and 21 (#161911)
Drop workarounds for Android API levels pre 9, 17, 21.

The minimum Android API currently supported by the LTS NDK is 21.
2025-10-10 03:59:44 -04:00
Erik Enikeev
5c613f287d
[ARM] Add mayRaiseFPException to appropriate instructions and mark all instructions that read/write fpscr rounding bits as doing so (#160698)
Add a new register, FPSCR_RM, to correctly model interactions with the
rounding mode control bits of fpscr and to avoid performance regressions
in the normal non-strictfp case.

This PR is part of the work on adding strict FP support in ARM, which
was previously discussed in #137101.
2025-10-07 22:19:53 +01:00
David Green
125f0ac757
[ARM][SDAG] Half promote llvm.lrint nodes. (#161088)
As shown in #137101, fp16 lrint are not handled correctly on Arm. This
adds soft-half promotion for them, reusing the function that promotes a
value with operands (and can handle strict fp once that is added).
2025-10-07 22:04:39 +01:00
Luke Lau
795a115d19
[RegAlloc] Remove default restriction on non-trivial rematerialization (#159211)
In the register allocator we define non-trivial rematerialization as the
rematerialization of an instruction with virtual register uses.

We have been able to perform non-trivial rematerialization for a while,
but it has been prevented by default unless specifically overridden by
the target in `TargetTransformInfo::isReMaterializableImpl`. The
original reasoning for this, given by the comment in the default
implementation, is that we might increase a live range of the virtual
register, but we don't actually do this.
LiveRangeEdit::allUsesAvailableAt makes sure that we only rematerialize
instructions whose virtual registers are already live at the use sites.

https://reviews.llvm.org/D106408 had originally tried to remove this
restriction but it was reverted after some performance regressions were
reported. We think it is likely that the regressions were caused by the
fact that the old isTriviallyReMaterializable API sometimes returned
true for non-trivial rematerializations.

However https://github.com/llvm/llvm-project/pull/160377 recently split
the API out into a separate non-trivial and trivial version and updated
the call-sites accordingly, and
https://github.com/llvm/llvm-project/pull/160709 and #159180 fixed
heuristics which weren't accounting for the difference between
non-trivial and trivial.

With these fixes in place, this patch proposes to again allow
non-trivial rematerialization by default, which significantly reduces
the number of spills and reloads across various targets.

For llvm-test-suite built with -O3 -flto, we get the following geomean
reduction in reloads:

- arm64-apple-darwin: 11.6%
- riscv64-linux-gnu: 8.1%
- x86_64-linux-gnu: 6.5%
2025-10-04 22:50:44 +00:00
David Green
9e4af2ffa6 [ARM] Update and cleanup lround/llround tests. NFC
Similar to f4370fb801aa, the fp16 tests do not work yet.
2025-10-04 19:52:46 +01:00
Yatao Wang
178e2a704b
[LLVM][CodeGen] Check Non Saturate Case in isSaturatingMinMax (#160637)
Fix Issue #160611
2025-10-03 20:39:45 +01:00
AZero13
90582ad284
[ARM] shouldFoldMaskToVariableShiftPair should be true for scalars up to the biggest legal type (#158070)
For ARM, we want to do this up to 32-bits. Otherwise the code ends up
bigger and bloated.
2025-10-03 08:10:22 +01:00
David Green
f4370fb801 [ARM] Update and cleanup lrint/llrint tests. NFC
Most of the fp16 cases still do not work properly. See #161088.
2025-10-02 21:51:45 +01:00
Matt Arsenault
c6e280e7ed
PeepholeOpt: Fix losing subregister indexes on full copies (#161310)
Previously if we had a subregister extract reading from a
full copy, the no-subregister incoming copy would overwrite
the DefSubReg index of the folding context.

There's one ugly rvv regression, but it's a downstream
issue of this; an unnecessary same class reg-to-reg full copy
was avoided.
2025-10-02 13:36:47 +09:00
Un1q32
133406e3d9
Reserve R9 on armv6 iOS 2.x (#150835)
The iOS 2.x ABI had R9 as a reserved register, 3.0 made it available,
but support for the 2.x ABI was never added to LLVM. We only use the 2.x
ABI on armv6 since before 3.0 armv6 was the only architecture supported
by iOS.
2025-09-30 21:05:27 -07:00
Matt Arsenault
9811226967
PeepholeOpt: Try to constrain uses to support subregister (#161338)
This allows removing a special case hack in ARM. ARM's implementation
of getExtractSubregLikeInputs has the strange property that it reports
a register with a class that does not support the reported subregister
index. We can however reconstrain the register to support this usage.

This is an alternative to #159600. I've included the test, but
the output is different. In this version the VMOVSR is
replaced with an ordinary subregister extract copy.
2025-10-01 00:18:51 +09:00
paperchalice
8ce3b8b518
[ARM] Remove UnsafeFPMath uses (#151275)
Try to remove `UnsafeFPMath` uses in arm backend. These global flags
block some improvements like
https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast/80797.
Remove them incrementally.
2025-09-28 13:50:20 +08:00
David Green
9bf51b2b19
[ARM] Generate build-attributes more correctly in the presence of intrinsic declarations. (#160749)
This code doesn't work very well, but this makes it work when intrinsic
declarations are present. It now discounts function declarations from
the set of attributes it looks at.

The code would have worked better before
0ab5b5b8581d9f2951575f7245824e6e4fc57dec when module-level attributes
could provide the information used to construct build-attributes.
2025-09-27 16:50:48 +01:00
David Green
02746f80c1 [ARM] Remove -fno-unsafe-math from a number of tests. NFC
llvm.convert/to.fp16 and from.fp16 are no longer used / deprecated and do not
need to be tested any more.
2025-09-26 11:48:34 +01:00
paperchalice
3257dc35fe
[ARM] Remove UnsafeFPMath uses in code generation part (#160801)
Factor out from #151275
Remove all UnsafeFPMath uses except the part related to ABI tags.
2025-09-26 15:54:30 +08:00
paperchalice
add906ffe4
[ARM] Consider denormal mode in ARMSubtarget (#160456)
Factor out from #151275.
Add denormal mode to subtarget.
2025-09-25 07:51:48 +08:00
Simon Pilgrim
6f188056b3
[ARM] ha-alignstack-call.ll - regenerate test checks (#159988) 2025-09-21 16:16:08 +00:00
Mikhail Gudim
562146499c
[CodeGen][NewPM] Port ReachingDefAnalysis to new pass manager. (#159572)
In this commit:
  (1) Added new pass manager support for `ReachingDefAnalysis`.
  (2) Added printer pass.
  (3) Make old pass manager use `ReachingDefInfoWrapperPass`
2025-09-19 09:38:34 -04:00
Nikita Popov
1723f80b08
[ARM] Allow s constraints on half (#157860)
Fix a regression from https://github.com/llvm/llvm-project/pull/147559.
2025-09-11 08:50:32 +02:00
Matt Arsenault
fc0f1fc695
ARM: Move remaining half convert libcall config into tablegen (#153408)
The __truncdfhf2 handling is kind of convoluted, but reproduces
the existing, likely wrong, handling.
2025-09-11 12:11:46 +09:00
Francesco Petrogalli
f82023d72e
[clang][driver][arm][macho] Default to -mframe-pointer=non-leaf. (#154216)
The commit in [1] changes the behavior of the Arm backend for the
attribute frame-pointer=all. Before [1], leaf functions marked with
frame-pointer=all were not emitting the frame-pointer.

After [1], frame-pointer=all started generating a frame pointer for all
functions, including leaf functions.

However, the default behavior for the driver in clang is to emit the
command line option `-mframe-pointer=all` on Arm if no option for
handling the frame pointer is specified on the command line. This causes
observable regressions.

This patch addresses these regressions by configuring the driver so
that it emits `-mframe-pointer=non-leaf` when targeting Arm.

Codegen tests dealing with frame pointer generation have been extended
to handle functions with a tail call, since this configuration was
missing.

[1] 4a2bd78f5b0d0661c23dff9c4b93a393a49dbf9a
2025-09-09 18:39:26 +00:00
paperchalice
667f919214
[SelectionDAG][ARM] Propagate fast math flags in visitBRCOND (#156647)
Factor out from #151275.
2025-09-06 20:44:25 +08:00
Nikita Popov
3f757a39f2
[CodeGen] Remove ExpandInlineAsm hook (#156617)
This hook replaces inline asm with LLVM intrinsics. It was intended to
match inline assembly implementations of bswap in libc headers and
replace them with more optimizable implementations.

At this point, it has outlived its usefulness (see
https://github.com/llvm/llvm-project/issues/156571#issuecomment-3247638412),
as libc implementations no longer use inline assembly for this purpose.

Additionally, it breaks the "black box" property of inline assembly,
which some languages like Rust would like to guarantee.

Fixes https://github.com/llvm/llvm-project/issues/156571.
2025-09-04 09:28:11 +02:00