5199 Commits

Author SHA1 Message Date
paperchalice
c53acf0443
[SelectionDAGBuilder] Remove NoNaNsFPMath uses (#169904)
Replaced by checking fast-math flags or value tracking results.
2026-02-09 09:48:07 +08:00
Jay Foad
48619c8ab2
[ARM] Autogenerate checks for crypto intrinsics (#180147)
2026-02-06 14:33:22 +00:00
SiliconA-Z
37aba1b5d4
[ARM] Set operation action for UMULO and SMULO as Custom if not Thumb1 (#154253)
We should specify a custom lowering for SMULO and UMULO like we do for
AArch64, but only when not targeting Thumb1.
2026-02-05 08:47:56 -08:00
Matt Arsenault
2502e3b7ba
IR: Promote "denormal-fp-math" to a first class attribute (#174293)
Convert "denormal-fp-math" and "denormal-fp-math-f32" into a first
class denormal_fpenv attribute. Previously the query for the effective
denormal mode involved two string attribute queries with parsing. I'm
introducing more uses of this, so it makes sense to convert this
to a more efficient encoding. The old representation was also awkward
since it was split across two separate attributes. The new encoding
just stores the default and float modes as bitfields, largely avoiding
the need to consider if the other mode is set.

The syntax in the common cases looks like this:
  `denormal_fpenv(preservesign,preservesign)`
  `denormal_fpenv(float: preservesign,preservesign)`
  `denormal_fpenv(dynamic,dynamic float: preservesign,preservesign)`

I wasn't sure about reusing the float type name instead of adding a
new keyword. It's parsed as a type but only accepts float. I'm also
debating switching the name to subnormal to match the current
preferred IEEE terminology (also used by nofpclass and other
contexts).

This has a behavior change when using the command-line debug
flags to set the denormal mode. The flags used to skip
functions with an explicit attribute set, tracked separately for
the default and f32 versions. Now that these are one attribute,
the flag logic can't distinguish which of the two components
were explicitly set on the function. Only one test appeared to
rely on this behavior, so I just avoided using the flags in it.

This also does not perform all the code cleanups this enables.
In particular the attributor handling could be cleaned up.

I also guessed at how to support this in MLIR. I followed
MemoryEffects as a reference; it appears bitfields are expanded
into arguments to attributes, so the representation there is
a bit uglier with the two 2-element fields flattened into 4 arguments.
2026-02-05 13:31:26 +00:00
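To illustrate the bitfield idea from the denormal_fpenv commit above (#174293), here is a minimal sketch of packing a default mode and a float mode into one integer. The 2-bit codes, the field order, and the function names are all assumptions made for illustration; the commit message does not give the actual layout.

```python
from enum import IntEnum


class DenormalModeKind(IntEnum):
    # Hypothetical 2-bit codes; the real encoding in LLVM may differ.
    IEEE = 0
    PRESERVE_SIGN = 1
    POSITIVE_ZERO = 2
    DYNAMIC = 3


def pack_mode(out_kind, in_kind):
    """Pack an (output, input) mode pair into 4 bits."""
    return (int(out_kind) << 2) | int(in_kind)


def unpack_mode(nibble):
    """Inverse of pack_mode."""
    return DenormalModeKind((nibble >> 2) & 0x3), DenormalModeKind(nibble & 0x3)


def pack_fpenv(default_mode, float_mode):
    """Pack the default and float mode pairs into one 8-bit field."""
    return (pack_mode(*float_mode) << 4) | pack_mode(*default_mode)


def unpack_fpenv(bits):
    """Return (default_mode, float_mode), each an (output, input) pair."""
    return unpack_mode(bits & 0xF), unpack_mode((bits >> 4) & 0xF)


K = DenormalModeKind
packed = pack_fpenv((K.DYNAMIC, K.DYNAMIC), (K.PRESERVE_SIGN, K.PRESERVE_SIGN))
default_mode, float_mode = unpack_fpenv(packed)
```

With a layout like this, querying either mode is a mask and a shift rather than two string attribute lookups with parsing.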
paperchalice
d1598c96e0
[ARM] Recognize abi tag module flags (#161306)
Recognize abi tag hints from frontend rather than from architecture and
options.
Frontend part: #161106.
2026-02-05 12:08:22 +00:00
Simi Pallipurath
09a68427ff
[ARM] Lower unaligned loads/stores to aeabi functions. (#172672)
When targeting architectures that do not support unaligned memory
accesses, or when -mno-unaligned-access is passed explicitly, the
compiler has to expand each unaligned load/store into an inline sequence.
For 32-bit operations this typically involves:

	1. 4× LDRB (or 2× LDRH),
	2. multiple shift/or instructions

These sequences are emitted at every unaligned access site, and
therefore contribute significant code size in workloads that touch
packed or misaligned structures.

When compiling with -Oz in combination with -mno-unaligned-access,
this patch lowers unaligned 32-bit and 64-bit loads and stores to the
following AEABI helper calls:
```
	__aeabi_uread4
	__aeabi_uread8
	__aeabi_uwrite4
	__aeabi_uwrite8
```

These helpers provide a way to perform unaligned memory accesses on targets
that do not support them, such as ARMv6-M, or when compiling with
-mno-unaligned-access. Although each use introduces a function call,
making it less straightforward than using raw loads and stores, the call
itself is often much smaller than the compiler-emitted sequence of
multiple ldrb/strb operations. As a result, these helpers can greatly
reduce code size, provided they are invoked more than once across a
program.

1. Functions become smaller in AEABI mode once they contain more than a
few unaligned accesses.
2. The total image .text size becomes smaller whenever multiple
functions call the same helpers.

This PR is derived from https://reviews.llvm.org/D57595, with some minor
changes.
 Co-authored-by: David Green
2026-02-02 16:32:12 +00:00
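The inline expansion described in the unaligned-access commit above (#172672) can be modelled roughly as follows. This is only a sketch of the "4x LDRB plus shift/or" pattern for a little-endian 32-bit load; the function name and the use of a Python `bytes` object as memory are illustrative assumptions, not code from the patch.

```python
def load_u32_bytewise(memory: bytes, addr: int) -> int:
    """Model an unaligned little-endian 32-bit load as four byte loads."""
    b0 = memory[addr]        # LDRB
    b1 = memory[addr + 1]    # LDRB
    b2 = memory[addr + 2]    # LDRB
    b3 = memory[addr + 3]    # LDRB
    # The shift/or sequence that reassembles the word.
    return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)


# The word 0x12345678 stored at the odd address 1.
memory = bytes([0x00, 0x78, 0x56, 0x34, 0x12])
value = load_u32_bytewise(memory, 1)
```

Every access site pays for all of these operations when expanded inline, which is why a single call to a shared helper can be smaller once the helper is used more than once.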
David Green
1bc655f52b
[ARM] Expand and regenerate llvm/test/CodeGen/ARM/cls.ll. NFC
2026-02-02 11:28:07 +00:00
Nikita Popov
1bad00adc4
[SDAG] Remove non-canonical fabs libcall handling (#177967)
This is a followup to https://github.com/llvm/llvm-project/pull/171288,
which removed lowering of libcalls to SDAG nodes for most libcalls that
get unconditionally canonicalized to intrinsics. This handles the
remaining fabs case, which I originally skipped due to larger test
impact.
2026-01-26 15:11:17 +00:00
Simon Tatham
0921542e3b
[ARM] Count register copies when estimating function size (#175763)
`EstimateFunctionSizeInBytes`, in `ARMFrameLowering.cpp`, provides an
early estimate of the compiled size of a function, in a context that
wants to overestimate rather than underestimate.

In some cases it was underestimating severely, by over 20%. The
discrepancy was entirely accounted for by the fact that `COPY`
operations were not being counted at all, even though each one (or at
least each one that survives any post-regalloc optimizations) takes 2
bytes in Thumb or 4 in Arm. This could lead to a compile failure, if the
underestimated function size led frame lowering to not stack LR, but
later, `ARMConstantIslandsPass` needed to insert an intra-function
branch long enough to require a `bl` instruction, needing LR to have
been stacked.

The result of `EstimateFunctionSizeInBytes` was not directly available
for testing, so I added an `LLVM_DEBUG` at the end of the function. That
way, the test file doesn't need to try to make a >2048 byte function
estimated at <2048 bytes; it just needs to exhibit a function with a
single `COPY` and make sure it's counted.

At the moment, `EstimateFunctionSizeInBytes` is only used at all in
Thumb-1 compilations, to decide whether the function is large enough to
justify stacking LR as a precaution. However, the subroutine
`ARMBaseInstrInfo::getInstSizeInBytes` which counts each individual
`MachineInstr` is called from other contexts too, so I've made it return
a sensible answer for `COPY` nodes in both of Arm and Thumb.
2026-01-26 09:28:38 +00:00
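As a rough model of the estimate fix in the commit above (#175763), the sketch below sums per-instruction sizes and charges each `COPY` 2 bytes in Thumb or 4 in Arm, as the message states. The instruction tuples, opcode names other than `COPY`, and the function name are hypothetical and only stand in for `MachineInstr` objects.

```python
# Bytes charged for a register COPY, per the commit message.
COPY_SIZE = {"thumb": 2, "arm": 4}


def estimate_function_size(instrs, mode):
    """Sum instruction sizes, counting COPY instead of treating it as free.

    instrs is a list of (opcode, size_in_bytes) pairs; the size stored
    for a COPY is ignored and replaced by the per-mode estimate.
    """
    total = 0
    for opcode, size in instrs:
        if opcode == "COPY":
            total += COPY_SIZE[mode]
        else:
            total += size
    return total


thumb_body = [("add", 2), ("COPY", 0), ("ret", 2)]
thumb_estimate = estimate_function_size(thumb_body, "thumb")
```

Since the estimate is meant to err on the high side, charging every `COPY` is the conservative choice even if some copies are later removed.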
valadaptive
cdc6a84c14
TargetLowering: Allow FMINNUM/FMAXNUM to lower to FMINIMUM/FMAXIMUM even without nsz (#177828)
This restriction was originally added in
https://reviews.llvm.org/D143256, with the given justification:

> Currently, in TargetLowering, if the target does not support fminnum,
we lower to fminimum if neither operand could be a NaN. But this isn't
quite correct because fminnum and fminimum treat +/-0 differently; so,
we need to prove that one of the operands isn't a zero.

As far as I can tell, this was never correct. Before
https://github.com/llvm/llvm-project/pull/172012, `minnum` and `maxnum`
were nondeterministic with regards to signed zero, so it's always been
perfectly legal to lower them to operations that order signed zeroes.
2026-01-25 18:24:12 -05:00
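The argument in the FMINNUM/FMAXNUM commit above (#177828) can be checked with a small model, assuming NaN-free inputs. Here `fminimum` orders -0.0 below +0.0, while the older `minnum` semantics are modelled as allowing either zero when both operands are zeros. All function names are illustrative, and the set of "allowed" results is an interpretation of the commit message rather than LangRef text.

```python
import math


def key(x):
    """Distinguish +0.0 from -0.0, which compare equal in Python."""
    return (x, math.copysign(1.0, x))


def fminimum(a, b):
    """Minimum that treats -0.0 as smaller than +0.0 (NaN-free inputs)."""
    if a == b == 0.0:
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b


def minnum_allowed_results(a, b):
    """Results permitted when signed-zero ordering is unspecified."""
    if a == 0.0 and b == 0.0:
        return {key(a), key(b)}
    return {key(min(a, b))}


samples = [-1.0, -0.0, 0.0, 2.5]
always_allowed = all(
    key(fminimum(a, b)) in minnum_allowed_results(a, b)
    for a in samples
    for b in samples
)
```

Under this model every `fminimum` result is one that `minnum` was allowed to produce, which is the reason the lowering needs no nsz requirement.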
Tony Linthicum
15b9109bc7
Make MachineBlockFrequencyInfo a required pass for the MachineScheduler pass. (#176172)
This is needed to support functionality in the AMDGPU scheduler. Various
passes have been modified to preserve MBFI to ensure that this change
does not introduce new invocations of MBFI. Some targets have passes
reordered, but there are no new runs of MBFI.
2026-01-15 20:26:51 +00:00
Usman Nadeem
c49c7e72b3
[ARM] Add size to tLDRLIT_ga_pcrel|abs Pseudo Instructions (#175663)
Compiling OpenSSL for Thumb was giving a crash in `ARMConstantIslands`
with error message: "underestimated function size". Adding a size for
`tLDRLIT_ga_pcrel` pseudo instruction fixes the issue. Also added a
size for `tLDRLIT_ga_abs` as per review comments.
2026-01-15 11:27:13 -08:00
David Green
2f43659011
[ARM] Add tablegen patterns for vsdot and vudot high index. (#174728)
The index on a vsdot and vudot instruction can be 0/1 from a D-reg, not 0/1/2/3
from a Q reg as would be expected. Add a pattern to allow extracting from the
high half of the input vector.

Fixes #174688
2026-01-14 10:26:05 +00:00
moorabbit
a5fa246435
[Clang] Add __builtin_stack_address (#148281)
Add support for `__builtin_stack_address` builtin. The semantics match
those of GCC's builtin with the same name.

`__builtin_stack_address` returns the starting address of the stack
region that may be used by called functions. It may or may not include
the space used for on-stack arguments passed to a callee (See [GCC
Bug/121013](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121013)).

Fixes #82632.
2026-01-12 10:01:57 +01:00
David Green
7635474d26
[ARM] Update and extend neon-dot-product.ll. NFC
2026-01-07 10:18:27 +00:00
Frederik Harwath
5c05824d2b
[CodeGen] Rename expand-fp to expand-ir-insts (#172681)
The pass now contains a non-fp expansion and should
be used for any similar expansions regardless of the
types involved. Hence a generic name seems apt.

Rename the source files, pass, and adjust the pass
description. Move all tests for the expansions
that have previously been merged into the pass
to a single directory.
2025-12-18 11:15:04 +00:00
Frederik Harwath
71760f324f
[CodeGen] Merge ExpandLargeDivRem into ExpandFp (#172680)
Both passes expand instructions at the IR level.
They use the same kind of instruction visitation
logic and contain significant code duplication e.g.
for scalarization.
2025-12-18 09:22:47 +01:00
Folkert de Vries
a587ccd87d
fix llvm.fma.f16 double rounding issue when there is no native support (#171904)
fixes https://github.com/llvm/llvm-project/issues/98389

As the issue describes, promoting `llvm.fma.f16` to `llvm.fma.f32` does
not work, because there is not enough precision to handle the repeated
rounding. `f64` does have sufficient space. So this PR explicitly
promotes the 16-bit fma to a 64-bit fma.

I could not find examples of a libcall being used for fma, but that's
something that could be looked into separately to work around code size
issues.
2025-12-17 22:03:01 +01:00
Craig Topper
0cdc1b6dd4
[SelectionDAG] Support integer types with multiple registers in ComputePHILiveOutRegInfo. (#172081)
PHIs that are larger than a legal integer type are split into multiple
virtual registers that are numbered sequentially. We can propagate the
known bits for each of these registers individually.

Big endian is not supported yet because the register order needs to be
reversed.

Fixes #171671
2025-12-13 13:24:41 -08:00
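The per-register propagation described in the ComputePHILiveOutRegInfo commit above (#172081) amounts to slicing the known-bits masks of the wide value. A minimal sketch, assuming little-endian register order (the low part comes first) and using hypothetical names:

```python
def split_known_bits(known_zero, known_one, total_bits, reg_bits):
    """Split known-zero/known-one masks of a wide integer into per-register
    masks, lowest part first (little-endian register order)."""
    mask = (1 << reg_bits) - 1
    parts = []
    for i in range(total_bits // reg_bits):
        shift = i * reg_bits
        parts.append(((known_zero >> shift) & mask, (known_one >> shift) & mask))
    return parts


# A 64-bit PHI whose upper half is known zero and whose bit 0 is known one,
# split across two sequentially numbered 32-bit registers.
parts = split_known_bits(0xFFFFFFFF00000000, 0x1, 64, 32)
```

On a big-endian target the list would have to be reversed before being assigned to the sequentially numbered registers, which is the case the commit leaves unsupported.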
Nikita Popov
5a24dfa339
[SDAG] Remove most non-canonical libcall handling (#171288)
This is a followup to https://github.com/llvm/llvm-project/pull/171114,
removing the handling for most libcalls that are already canonicalized
to intrinsics in the middle-end. The only remaining one is fabs, which
has more test coverage than the others.
2025-12-10 11:45:26 +01:00
Nikita Popov
d5b3ba6596
[SDAG] Don't handle non-canonical libcalls in SDAG lowering (#171114)
SDAG currently tries to lower certain libcalls to ISD opcodes. However,
many of these are already canonicalized from libcalls to intrinsics in
the middle-end (and often already emitted as intrinsics in the
front-end).

I believe that SDAG should not be doing anything for such libcalls. This
PR just drops a single libcall to get consensus on the direction, as
these changes need a non-trivial amount of test updates.

A lot of the remaining libcalls *should* probably also be canonicalized
to intrinsics in the middle-end when annotated with `memory(none)`, but
that would require additional work in SimplifyLibCalls.
2025-12-09 08:07:33 +01:00
Folkert de Vries
fdd0d53430
cmse: emit __acle_se_ symbol for aliases to entry functions (#162109)
Emitting the symbol in `emitGlobalAlias` seemed most efficient,
otherwise I think you'd have to traverse all aliases. I have verified
that the additional symbol is picked up by `arm-none-eabi-ld` and
correctly generates an entry in `veneers.o`.

Fixes #162084
2025-12-08 17:26:21 +00:00
Lewis Crawford
ea3fdc5972
Avoid maxnum(sNaN, x) optimizations / folds (#170181)
The behaviour of constant-folding `maxnum(sNaN, x)` and `minnum(sNaN,
x)` has become controversial, and there are ongoing discussions about
which behaviour we want to specify in the LLVM IR LangRef.

See:
  - https://github.com/llvm/llvm-project/issues/170082
  - https://github.com/llvm/llvm-project/pull/168838
  - https://github.com/llvm/llvm-project/pull/138451
  - https://github.com/llvm/llvm-project/pull/170067
  - https://discourse.llvm.org/t/rfc-a-consistent-set-of-semantics-for-the-floating-point-minimum-and-maximum-operations/89006

This patch removes optimizations and constant-folding support for
`maxnum(sNaN, x)` but keeps it folded/optimized for `qNaN`. This should
allow for some more flexibility so the implementation can conform to
either the old or new version of the semantics specified without any
changes.

As far as I am aware, optimizations involving constant `sNaN` should
generally be edge-cases that rarely occur, so there should hopefully be
very little real-world performance impact from disabling these
optimizations.
2025-12-02 12:43:03 +00:00
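The policy in the maxnum commit above (#170181) can be sketched on IEEE-754 binary32 bit patterns: a constant quiet NaN still folds, a constant signalling NaN does not. The function names and the `None` return convention are assumptions for illustration; this is not the actual folding code.

```python
EXP_MASK = 0x7F800000   # binary32 exponent field
MANT_MASK = 0x007FFFFF  # binary32 mantissa field
QUIET_BIT = 0x00400000  # top mantissa bit marks a quiet NaN


def is_nan(bits):
    return (bits & EXP_MASK) == EXP_MASK and (bits & MANT_MASK) != 0


def is_snan(bits):
    return is_nan(bits) and (bits & QUIET_BIT) == 0


def is_qnan(bits):
    return is_nan(bits) and (bits & QUIET_BIT) != 0


def try_fold_maxnum(const_bits, other):
    """Fold maxnum(C, other) when C is a constant NaN, or return None.

    A quiet NaN folds to the other operand; a signalling NaN is left alone
    so either of the semantics under discussion stays valid.
    """
    if is_snan(const_bits):
        return None
    if is_qnan(const_bits):
        return other
    return None


folded = try_fold_maxnum(0x7FC00000, "x")    # qNaN constant
unfolded = try_fold_maxnum(0x7FA00000, "x")  # sNaN constant
```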
Erik Enikeev
d08b0f7240
[ARM] Disable strict node mutation and use correct lowering for several strict ops (#170136)
Changes in this PR were discussed and reviewed in
https://github.com/llvm/llvm-project/pull/137101.
2025-12-01 22:03:55 +00:00
David Green
22968f5b4a
[DAG] Add strictfp implicit def reg after metadata. (#168282)
This prevents a machine verifier error, where it "Expected implicit
register after groups".

Fixes #158661
2025-11-17 10:57:21 +00:00
hstk30-hw
51c8180515
[GlobalMerge] Prefer global-merge-max-offset over the target-specific constant offset. (#165591)
In the Dhrystone benchmark, I found that some adjacent globals are not
merged, whereas GCC's anchor optimization handles them. Using
global-merge-max-offset to set the max offset can yield similar results
(still slightly different, but at least we can control the offset).
2025-11-17 15:37:51 +08:00
Austin
700aa5e376
[revert][CodeGen] add a command to force global merge (#168230)
sorry, this was my mistake
2025-11-16 03:40:07 +08:00
Austin
3705921f60
[CodeGen] add a command to force global merge
I found that in some performance scenarios, such as under O2, this PR can help with sequences of global variable loads.
2025-11-16 03:20:27 +08:00
Amara Emerson
18f29a5810
[ARM] Fix not saving FP when required to in frame-pointer=non-leaf. (#163699)
When the stars align to conspire against stack alignment, when we have
frame-pointer=non-leaf we can incorrectly skip preserving fp/r7 in the
prolog.

The fix here first makes sure we're using the right frame pointer
register in the context of preserving the incoming FP, and then makes sure that we
save the FP when re-alignment is known to be necessary.

rdar://162462271
2025-11-12 16:31:25 -08:00
David Tellenbach
a01a921004
[ARM] Prevent stack argument overwrite during tail calls (#166492)
For tail-calls we want to re-use the caller stack-frame and potentially
need to copy stack arguments.

For large stack arguments, such as by-val structs, this can lead to
overwriting incoming stack arguments when preparing outgoing ones by
copying them. E.g., in cases like

        %"struct.s1" = type { [19 x i32] }

        define void @f0(ptr byval(%"struct.s1") %0, ptr %1) {
        tail call  void @f1(ptr %1, ptr byval(%"struct.s1") %0)
        ret void
        }

        declare  void @f1(ptr, ptr)

that swap arguments, the last bytes of %0 are on the stack, followed by
%1. To prepare the outgoing arguments, %0 needs to be copied and %1
needs to be loaded into r0. However, currently the copy of %0
overwrites the location of %1, resulting in loading garbage into r0.

We fix that by forcing the load of the pointer stack argument to happen
before the copy.
2025-11-12 23:38:48 +00:00
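The hazard in the tail-call commit above (#166492) can be reproduced with a toy stack model. The sketch assumes the first four argument words travel in r0-r3 and the rest on the stack, and it copies from an already gathered list of struct words so overlap inside the struct copy itself is ignored. The function and variable names are hypothetical.

```python
def prepare_tail_call(struct_words, ptr, load_first):
    """Return the value that ends up in r0 for the swapped-argument call.

    Incoming @f0: struct words 0..3 arrive in r0-r3, words 4..18 sit on the
    stack, followed by the pointer %1.
    Outgoing @f1: %1 goes in r0, struct words 0..2 in r1-r3, and words 3..18
    reuse the same stack slots.
    """
    stack = list(struct_words[4:]) + [ptr]
    ptr_slot = len(stack) - 1

    r0 = None
    if load_first:
        r0 = stack[ptr_slot]          # load %1 before its slot is clobbered

    stack[0:16] = struct_words[3:]    # copy the outgoing byval bytes

    if not load_first:
        r0 = stack[ptr_slot]          # too late: the slot now holds word 18
    return r0


words = list(range(100, 119))         # 19 x i32
fixed = prepare_tail_call(words, 0xBEEF, load_first=True)
broken = prepare_tail_call(words, 0xBEEF, load_first=False)
```

In this model the fixed ordering yields the original pointer, while the old ordering yields the last struct word, matching the "garbage in r0" symptom the commit describes.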
Matt Arsenault
782759b757
DAG: Use poison when widening build_vector (#167631)
Test changes are mostly noise. There are a few improvements and a few
regressions.
2025-11-12 20:17:41 +00:00
David Green
4d1f2492d2
[ARM] Use TargetMachine over Subtarget in ARMAsmPrinter (#166329)
The subtarget may not be set if no functions are present in the module.
Attempt to use the TargetMachine directly in more cases.

Fixes #165422
Fixes #167577
2025-11-12 16:26:21 +00:00
Matt Arsenault
821d2825a4
RuntimeLibcalls: Remove incorrect sincospi from most targets (#166982)
sincospi/sincospif/sincospil do not appear to exist on common
targets. Darwin targets have __sincospi and __sincospif, so define
and use those implementations. I have no idea what version added
those calls, so I'm just guessing it's the same conditions as
__sincos_stret.

Most of this patch is working to preserve codegen when a vector
library is explicitly enabled. This only covers sleef and armpl,
as those are the only cases tested.

The multiple result libcalls have an aberrant process where the
legalizer looks for the scalar type's libcall in RuntimeLibcalls,
and then cross references TargetLibraryInfo to find a matching
vector call. This was unworkable in the sincospi case, since the
common case is there is no scalar call available. To preserve
codegen if the call is available, first try to match a libcall
with the vector type before falling back on the old scalar search.

Eventually all of this logic should be contained in RuntimeLibcalls,
without the link to TargetLibraryInfo. In principle we should perform
the same legalization logic as for an ordinary operation, trying
to find a matching subvector type with a libcall.
2025-11-10 11:05:08 -08:00
Matt Arsenault
5e7f7a496c
ARM: Add fp128 ldexp tests (#166619)
2025-11-05 22:42:59 -08:00
Prabhu Rajasekaran
f60e69315e
[llvm] Emit canonical linkage correct function symbol (#166487)
In the call graph section, we were emitting the temporary label
pointing to the start of the function instead of the canonical linkage
correct function symbol. This patch fixes it and updates the
corresponding tests.
2025-11-05 09:22:08 -08:00
Matt Arsenault
4d98ee2a22
ARM: Add watchos run line to llvm.sincos test (#166271)
2025-11-03 18:20:24 -08:00
Matt Arsenault
c77b614564
ARM: Add more ABIs to llvm.sincos test (#166264)
Make sure iOS is tested both with and without sincos_stret.
2025-11-03 16:00:54 -08:00
Erik Enikeev
1523332fbd
[ARM] Mark function calls as possibly changing FPSCR (#160699)
This patch does the same changes as D143001 for AArch64.

This PR is part of the work on adding strict FP support in ARM, which
was previously discussed in #137101.
2025-10-30 16:36:55 +00:00
Erik Enikeev
242ebcf13e
[ARM] Add instruction selection for strict FP (#160696)
This consists of marking the various strict opcodes as legal, and
adjusting instruction selection patterns so that 'op' is 'any_op'. The
changes are similar to those in D114946 for AArch64.

Custom lowering and promotion are set for some FP16 strict ops to work
correctly.

This PR is part of the work on adding strict FP support in ARM, which
was previously discussed in #137101.
2025-10-29 21:43:43 +00:00
AZero13
5d0f1591f8
[DAGCombine] Improve bswap lowering for machines that support bit rotates (#164848)
Source: Hacker's Delight.
2025-10-25 10:17:15 -07:00
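The bswap commit above (#164848) does not spell out the sequence it emits, so the following is only a well-known rotate-based formulation of a 32-bit byte swap, given here as an assumption about the kind of lowering meant. Helper names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def rotr32(x, n):
    """Rotate a 32-bit value right by n bits (0 < n < 32)."""
    return rotl32(x, 32 - n)


def bswap32(x):
    """Byte swap built from two rotates, two masks and an OR."""
    return (rotl32(x, 8) & 0x00FF00FF) | (rotr32(x, 8) & 0xFF00FF00)


swapped = bswap32(0xAABBCCDD)
```

Compared with shifting and masking each byte separately, this form needs fewer operations on targets that have a rotate instruction.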
David Green
a1e59bdc17
[GlobalISel] Make scalar G_SHUFFLE_VECTOR illegal. (#140508)
I'm not sure if this is the best way forward or not, but we keep running
into issues from forgetting that shuffle_vectors can be scalar.
(There is another example in the recently added known-bits code.)
As a scalar-dst shuffle vector is just an extract, and a
scalar-source shuffle vector is just a build vector, this patch makes
scalar shuffle vector illegal and adjusts the irbuilder to create the
correct node as required.

Most targets do this already through lowering or combines. Making scalar
shuffles illegal simplifies gisel as a whole; it just requires
transforms that create shuffles of new sizes to account for the scalar
shuffle being illegal (mostly IRBuilder and LessElements).
2025-10-24 08:21:35 +01:00
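The two equivalences the G_SHUFFLE_VECTOR commit above (#140508) relies on can be shown with lists standing in for vectors. This is a toy model with hypothetical helper names, not GlobalISel code.

```python
def shuffle(src1, src2, mask):
    """Generic shuffle: pick elements from the concatenation of both sources."""
    combined = list(src1) + list(src2)
    return [combined[i] for i in mask]


def extract_element(vec, index):
    return vec[index]


def build_vector(*scalars):
    return list(scalars)


v1, v2 = [10, 11], [20, 21]

# Scalar destination: a one-element mask is just an extract.
scalar_dst = shuffle(v1, v2, [2])[0]
same_as_extract = extract_element(v1 + v2, 2)

# Scalar sources: shuffling two scalars is just a build vector.
x, y = 7, 9
scalar_src = shuffle([x], [y], [0, 1, 0])
same_as_build = build_vector(x, y, x)
```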
Kees Cook
d130f40264
[ARM][KCFI] Add backend support for Kernel Control-Flow Integrity (#163698)
Implement KCFI (Kernel Control Flow Integrity) backend support for
ARM32, Thumb2, and Thumb1. The Linux kernel has supported ARM KCFI via
Clang's generic KCFI implementation, but this has finally started to
[cause problems](https://github.com/ClangBuiltLinux/linux/issues/2124)
so it's time to get the KCFI operand bundle lowering working on ARM.

Supports patchable-function-prefix with adjusted load offsets. Provides
an instruction size worst case estimate of how large the KCFI bundle is
so that range-limited instructions (e.g. cbz) know how big the indirect
calls can become.

ARM implementation notes:
- Four-instruction EOR sequence builds the 32-bit type ID byte-by-byte
  to work within ARM's modified immediate encoding constraints.
- Scratch register selection: r12 (IP) is preferred, r3 used as fallback
  when r12 holds the call target. r3 gets spilled/reloaded if it is
  being used as a call argument.
- UDF trap encoding: 0x8000 | (0x1F << 5) | target_reg_index, similar
  to aarch64's trap encoding.

Thumb2 implementation notes:
- Logically the same as ARM
- UDF trap encoding: 0x80 | target_reg_index

Thumb1 implementation notes:
- Due to register pressure, 2 scratch registers are needed: r3 and r2,
  which get spilled/reloaded if they are being used as call args.
- Instead of EOR, add/lsl sequence to load immediate, followed by
  a compare.
- No trap encoding.

Update tests to validate all three sub targets.
2025-10-23 08:27:13 -07:00
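Two details from the KCFI commit above (#163698) lend themselves to a small sketch: the UDF trap immediates, which follow the formulas given in the message, and the byte-by-byte EOR check. The comparison logic (XOR the loaded hash with each byte-sized chunk and test for zero) is an assumption about how the four-instruction sequence is used, and all names are illustrative.

```python
def arm_udf_imm(target_reg_index):
    """ARM trap immediate: 0x8000 | (0x1F << 5) | target_reg_index."""
    return 0x8000 | (0x1F << 5) | target_reg_index


def thumb2_udf_imm(target_reg_index):
    """Thumb2 trap immediate: 0x80 | target_reg_index."""
    return 0x80 | target_reg_index


def split_type_id(type_id):
    """Split a 32-bit type ID into four chunks of one byte each, kept in
    place, so each chunk fits ARM's modified immediate encoding."""
    return [type_id & (0xFF << shift) for shift in (0, 8, 16, 24)]


def kcfi_check(loaded_hash, expected_type_id):
    """Model four EORs: the accumulator is zero only if the hashes match."""
    acc = loaded_hash
    for chunk in split_type_id(expected_type_id):
        acc ^= chunk
    return acc == 0


trap_for_r3 = arm_udf_imm(3)
matches = kcfi_check(0x12345678, 0x12345678)
```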
paperchalice
542703fa68
[test][ARM] Remove unsafe-fp-math-uses (NFC) (#164744)
Post cleanup for #164534.
2025-10-23 15:07:46 +08:00
Prabhu Rajasekaran
b7c7083c1f
[llvm] Update call graph ELF section type. (#164461)
Give the call graph section a dedicated type instead of the generic
progbits type.
2025-10-22 15:08:36 -07:00
David Green
6d5dea63ed
[ARM][SDAG] Add llvm.lround half promotion. (#164235)
Similar to #161088, add llvm.lround and llvm.llround promotion.
2025-10-21 16:56:55 +01:00
Prabhu Rajasekaran
cac8bdb56c
[NFC][llvm] Update call graph section's name. (#163429)
The call graph section emitted by LLVM was named `.callgraph`. Renaming it
to `.llvm.callgraph`.
2025-10-15 07:52:54 -07:00
paperchalice
bfee9db785
[DAGCombiner] Remove NoNaNsFPMath uses (#163504)
Users should use `nnan` flag instead.
2025-10-15 21:22:13 +08:00
Simon Pilgrim
4c3ec9cda0
[ARM] carry.ll - regenerate test checks (#163172)
2025-10-13 11:12:09 +00:00
Yatao Wang
c4bcbf02a5
[GlobalISel] Add G_SUB for computeNumSignBits (#158384)
This patch ports the ISD::SUB handling from SelectionDAG’s ComputeNumSignBits to GlobalISel.

Related to https://github.com/llvm/llvm-project/issues/150515.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-10-13 10:45:26 +00:00
beetrees
11571a005a
Fix legalizing FNEG and FABS with TypeSoftPromoteHalf (#156343)
Based on top of #157211.

`FNEG` and `FABS` must preserve signalling NaNs, meaning they should not
convert to f32 to perform the operation. Instead legalize to `XOR` and
`AND`.

Fixes almost all of #104915
2025-10-11 11:08:26 +09:00
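The legalization in the FNEG/FABS commit above (#156343) works directly on the 16-bit pattern, which is why NaN payloads survive. A minimal sketch on IEEE-754 binary16 bits, with illustrative function names:

```python
import struct

SIGN_BIT = 0x8000  # sign bit of an IEEE-754 binary16 value


def fneg_f16_bits(bits):
    """Negate by flipping the sign bit (XOR); all other bits are untouched."""
    return bits ^ SIGN_BIT


def fabs_f16_bits(bits):
    """Absolute value by clearing the sign bit (AND)."""
    return bits & ~SIGN_BIT & 0xFFFF


def f16_from_bits(bits):
    """Decode a binary16 bit pattern into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


SNAN = 0x7D00  # exponent all ones, quiet bit clear, non-zero payload
negated_snan = fneg_f16_bits(SNAN)
minus_one = f16_from_bits(fneg_f16_bits(0x3C00))  # 0x3C00 encodes 1.0
```

Because only the sign bit changes, a signalling NaN stays signalling, which a round trip through f32 arithmetic would not guarantee.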