2452 Commits

Author SHA1 Message Date
SiliconA-Z
37aba1b5d4
[ARM] Set operation action for UMULO and SMULO as Custom if not Thumb1 (#154253)
We should specify a custom lowering for SMULO and UMULO like we do for
AArch64, but only if not Thumb 1 obviously.
2026-02-05 08:47:56 -08:00
Matt Arsenault
10d3859369
ARM: Avoid using isTarget wrappers around Triple predicates (#179512)
These are module level properties, and querying them through
a function-level subtarget context is confusing. Plus we don't
need an aliased name.

Continue change started in 91439817e8d19613ac6e25ca9abd5e7534a9d33b
2026-02-03 18:29:44 +00:00
Nicolai Hähnle
6f0b873f1c
[CodeGen] Refactor targets to override the new getTgtMemIntrinsic overload (NFC) (#175844)
This is a fairly mechanical change. Instead of returning true/false,
we either keep the Infos vector empty or push one entry.
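
A minimal sketch of the convention change described above, using hypothetical stand-in names (`MemInfo`, `describeOld`, `describeNew` and the intrinsic ID `1` are illustrative, not LLVM's actual types or signatures). The old style signals "this intrinsic touches memory" through a bool; the new style signals it through whether the vector received an entry.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the per-intrinsic memory description.
struct MemInfo {
  unsigned SizeInBytes;
  bool IsLoad;
};

// Old style: return true and fill Info when the intrinsic accesses memory.
bool describeOld(unsigned IntrinsicID, MemInfo &Info) {
  if (IntrinsicID != 1) // hypothetical ID of a memory-reading intrinsic
    return false;
  Info = MemInfo{16, true};
  return true;
}

// New style: leave Infos empty for "no", push one entry for "yes".
void describeNew(unsigned IntrinsicID, std::vector<MemInfo> &Infos) {
  if (IntrinsicID != 1)
    return;
  Infos.push_back(MemInfo{16, true});
}
```

A caller of the new form would test `Infos.empty()` where it previously tested the boolean result.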
2026-02-02 17:40:02 -08:00
Simi Pallipurath
09a68427ff
[ARM] Lower unaligned loads/stores to aeabi functions. (#172672)
When targeting architectures that do not support unaligned memory
accesses, or when -mno-unaligned-access is passed explicitly, the
compiler must expand each unaligned load/store into an inline sequence.
For 32-bit operations this typically involves:

	1. 4× LDRB (or 2× LDRH),
	2. multiple shift/or instructions

These sequences are emitted at every unaligned access site, and
therefore contribute significant code size in workloads that touch
packed or misaligned structures.

When compiling with -Oz in combination with -mno-unaligned-access,
this patch lowers unaligned 32-bit and 64-bit loads and stores to the
following AEABI helper calls:
```
	__aeabi_uread4
	__aeabi_uread8
	__aeabi_uwrite4
	__aeabi_uwrite8
```

These helpers provide a way to perform unaligned memory accesses on
targets that do not support them, such as ARMv6-M, or when compiling
with -mno-unaligned-access. Although each use introduces a function
call, making it less straightforward than raw loads and stores, the call
itself is often much smaller than the compiler-emitted sequence of
multiple ldrb/strb operations. As a result, these helpers can greatly
reduce code size, provided they are invoked more than once across a
program.

1. Functions become smaller in AEABI mode once they contain more than a
few unaligned accesses.
2. The total image .text size becomes smaller whenever multiple
functions call the same helpers.
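
The trade-off above can be sketched in portable C++. This is an illustration only: `loadU32Bytewise`, `helperRead4`, `PackedRecord` and `readField` are hypothetical names, and `helperRead4` merely stands in for an out-of-line helper in the spirit of `__aeabi_uread4`; it does not reproduce the real helper's signature. Little-endian byte order is assumed.

```cpp
#include <cassert>
#include <cstdint>

// Inline expansion: four byte loads combined with shifts and ORs
// (little-endian), the kind of sequence emitted at every access site.
inline uint32_t loadU32Bytewise(const unsigned char *P) {
  return static_cast<uint32_t>(P[0]) |
         (static_cast<uint32_t>(P[1]) << 8) |
         (static_cast<uint32_t>(P[2]) << 16) |
         (static_cast<uint32_t>(P[3]) << 24);
}

// Hypothetical out-of-line helper: one shared copy of the byte-wise
// sequence, reached through a call from each access site.
uint32_t helperRead4(const void *Addr) {
  assert(Addr != nullptr);
  return loadU32Bytewise(static_cast<const unsigned char *>(Addr));
}

// Hypothetical packed record: a 1-byte tag followed by a 32-bit field
// that therefore sits at a misaligned offset.
struct PackedRecord {
  unsigned char Bytes[5];
};

uint32_t readField(const PackedRecord &R) { return helperRead4(R.Bytes + 1); }
```

Each call site then costs roughly one call instead of a load/shift/or sequence, which is why the saving only appears once the helper is used more than once.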

This PR is derived from https://reviews.llvm.org/D57595, with some minor
changes.
Co-authored-by: David Green
2026-02-02 16:32:12 +00:00
Matt Arsenault
aa57ee958d
CodeGen: Use LibcallLoweringInfo for stack protector insertion (#176829)
Thread LibcallLoweringInfo into the TargetLowering hooks used
by the stack protector passes.
2026-01-20 12:37:31 +01:00
Matt Arsenault
2c9cc88e25
FastISel: Thread LibcallLoweringInfo through (#176799)
Boilerplate change to prepare to take LibcallLoweringInfo from
an analysis. For now, it just sets it from the copy inside of
TargetLowering.
2026-01-19 20:44:48 +00:00
Matt Arsenault
01e6245af4
DAG: Avoid querying libcall info from TargetLowering (#176268)
Libcall lowering decisions should come from the LibcallLoweringInfo
analysis. Query this through the DAG, so eventually the source
can be the analysis. For the moment this is just a wrapper around
the TargetLowering information.
2026-01-16 09:02:49 +00:00
Akshay Deodhar
3860147a7f
[NFC][TargetLowering] Make shouldExpandAtomicRMWInIR and shouldExpandAtomicCmpXchgInIR take a const Instruction pointer (#176073)
Splits out change from https://github.com/llvm/llvm-project/pull/176015

Changes shouldExpandAtomicRMWInIR to take a constant argument: This is
to allow some other TargetLowering constant-argument functions to call
it. This change touches several backends. An alternative solution
exists, but to me, this seems the "right" way.
2026-01-15 14:22:57 -08:00
Islam Imad
7ceecfad40
[CodeGen] Fix EVT::changeVectorElementType assertion on simple-to-extended fallback (#173413)
Fixes #171608
2025-12-28 18:51:18 +00:00
Matt Arsenault
bb993a89a8
RuntimeLibcalls: Add entries for stack probe functions (#167453)
2025-12-19 01:17:20 +01:00
Frederik Harwath
6ad41bcc49
[CodeGen] expand-fp: Change frem expansion criterion (#158285)
The existing condition for checking whether or not to expand an frem
instruction in expand-fp is not sufficiently precise.
The expansion on other targets than AMDGPU - which is the only intended
user right now - is only prevented due to the interaction with the
MaxLegalFpConvertBitWidth check.  Relying on this is conceptually wrong
and limits the use of the pass for other targets and further expansions
(e.g. merging with the similar ExpandLargeDivRem pass).

Change the expansion criterion to always expand frem of a given type
for targets that use "Expand" as the legalization action for the 
underlying scalar type and use this to exit the pass early for targets 
which do not require any expansions. This requires changing the
frem legalization action for all targets which do not want frem to
be expanded in this pass from "Expand" to "LibCall".

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-12-16 17:31:26 +01:00
Matt Arsenault
b2d9356719
DAG: Make more use of the LibcallImpl overload of getExternalSymbol (#172171)
Also add a new copy for TargetExternalSymbol that AArch64 needs.
2025-12-13 19:16:47 +00:00
Robert Imschweiler
5c3c0020af
[NFC] Refactor TargetLowering::getTgtMemIntrinsic to take CallBase parameter (#170334)
cf.
https://github.com/llvm/llvm-project/pull/133907#discussion_r2578576548
2025-12-02 19:42:31 +01:00
Erik Enikeev
d08b0f7240
[ARM] Disable strict node mutation and use correct lowering for several strict ops (#170136)
Changes in this PR were discussed and reviewed in
https://github.com/llvm/llvm-project/pull/137101.
2025-12-01 22:03:55 +00:00
Matt Arsenault
1c5b1501ca
CodeGen: Move libcall lowering configuration to subtarget (#168621)
Previously libcall lowering decisions were made directly
in the TargetLowering constructor. Pull these into the subtarget
to facilitate turning LibcallLoweringInfo into a separate analysis
in the future.
2025-11-25 11:59:56 -05:00
Matt Arsenault
a757c4e74e
CodeGen: Add subtarget to TargetLoweringBase constructor (#168620)
Currently LibcallLoweringInfo is defined inside of TargetLowering,
which is owned by the subtarget. Pass in the subtarget so we can
construct LibcallLoweringInfo with the subtarget. This is a temporary
step that should be revertable in the future, after LibcallLoweringInfo
is moved out of TargetLowering.
2025-11-19 19:18:13 +00:00
Sergei Barannikov
0ae2bccde4
[ARM] TableGen-erate node descriptions (#168212)
This allows SDNodes to be validated against their expected type profiles
and reduces the number of changes required to add a new node.

Some nodes fail validation; those are enumerated in
`ARMSelectionDAGInfo::verifyTargetNode()`. Some of the bugs are easy to
fix, but they should probably be fixed separately, since this patch is
already big.

Part of #119709.

Pull Request: https://github.com/llvm/llvm-project/pull/168212
2025-11-18 21:41:52 +03:00
David Tellenbach
a01a921004
[ARM] Prevent stack argument overwrite during tail calls (#166492)
For tail-calls we want to re-use the caller stack-frame and potentially
need to copy stack arguments.

For large stack arguments, such as by-val structs, this can lead to
overwriting incoming stack arguments when preparing outgoing ones by
copying them. E.g., in cases like

        %"struct.s1" = type { [19 x i32] }

        define void @f0(ptr byval(%"struct.s1") %0, ptr %1) {
        tail call  void @f1(ptr %1, ptr byval(%"struct.s1") %0)
        ret void
        }

        declare  void @f1(ptr, ptr)

that swap arguments, the last bytes of %0 are on the stack, followed by
%1. To prepare the outgoing arguments, %0 needs to be copied and %1
needs to be loaded into r0. However, currently the copy of %0
overwrites the location of %1, resulting in loading garbage into r0.

We fix that by forcing the load of the pointer stack argument to happen
before the copy.
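
The ordering problem can be modelled with a toy stack frame. This is a simplified sketch under stated assumptions, not the backend's code: the split of the 19-word struct between r0-r3 and the stack, the 16-slot `Frame`, and all function names are illustrative assumptions, and the struct is copied from a separate snapshot so that the overlapping copy itself is not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int StructWords = 19;
using Struct = std::array<uint32_t, StructWords>;

// Toy model of the stack argument area shared by caller and tail callee.
struct Frame {
  std::array<uint32_t, 16> Stack{};
};

// Incoming layout (assumed): words 0..3 of %0 in r0-r3, words 4..18 in
// slots 0..14, and the pointer %1 in slot 15.
Frame makeIncoming(const Struct &S, uint32_t Ptr) {
  Frame F;
  for (int I = 4; I < StructWords; ++I)
    F.Stack[I - 4] = S[I];
  F.Stack[15] = Ptr;
  return F;
}

// Outgoing layout (assumed): %1 in r0, words 0..2 of %0 in r1-r3, and
// words 3..18 in slots 0..15, so slot 15 is overwritten.
void copyOutgoingStruct(Frame &F, const Struct &S) {
  for (int I = 3; I < StructWords; ++I)
    F.Stack[I - 3] = S[I];
}

// Buggy order: copy first, then load %1 from its (now clobbered) slot.
uint32_t r0CopyFirst(Frame F, const Struct &S) {
  copyOutgoingStruct(F, S);
  return F.Stack[15];
}

// Fixed order: load %1 before the copy touches its slot.
uint32_t r0LoadFirst(Frame F, const Struct &S) {
  uint32_t R0 = F.Stack[15];
  copyOutgoingStruct(F, S);
  return R0;
}
```

In this model `r0CopyFirst` returns the last word of the struct rather than the pointer, which corresponds to the garbage value in r0 described above.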
2025-11-12 23:38:48 +00:00
Matt Arsenault
831e79adff
DAG: Merge all sincos_stret emission code into legalizer (#166295)
This avoids AArch64 legality rules depending on libcall
availability.

ARM, AArch64, and X86 all had custom lowering of fsincos, all of which
just emitted calls to sincos_stret / sincosf_stret. This
messes with the cost heuristics around legality, because really
it's an expand/libcall cost and not a favorable custom.

This is a bit ugly, because we're emitting code trying to match the
C ABI lowered IR type for the aggregate return type. This now also
gives an easy way to lift the unhandled x86_32 darwin case, since
ARM already handled the return as sret case.
2025-11-04 10:20:00 -08:00
Matt Arsenault
590a2b0a1f
Revert "ARM: Remove unnecessary manual ABI lowering for sincos_stret (#166040)" (#166262)
This reverts commit a522ae3ef6e13cb39e7756c151652e03a024b301.

The ABI handling doesn't account for matching the C ABI, only implicit
sret.
2025-11-03 16:00:29 -08:00
Matt Arsenault
a522ae3ef6
ARM: Remove unnecessary manual ABI lowering for sincos_stret (#166040)
LowerCallTo handles all of the ABI details, including the load of
implicit sret return to the expected result positions.
2025-11-03 14:17:39 -08:00
Erik Enikeev
1523332fbd
[ARM] Mark function calls as possibly changing FPSCR (#160699)
This patch does the same changes as D143001 for AArch64.

This PR is part of the work on adding strict FP support in ARM, which
was previously discussed in #137101.
2025-10-30 16:36:55 +00:00
Erik Enikeev
242ebcf13e
[ARM] Add instruction selection for strict FP (#160696)
This consists of marking the various strict opcodes as legal, and
adjusting instruction selection patterns so that 'op' is 'any_op'. The
changes are similar to those in D114946 for AArch64.

Custom lowering and promotion are set for some FP16 strict ops to work
correctly.

This PR is part of the work on adding strict FP support in ARM, which
was previously discussed in #137101.
2025-10-29 21:43:43 +00:00
Matt Arsenault
28e9a2832f
DAG: Consider __sincos_stret when deciding to form fsincos (#165169)
2025-10-28 08:28:09 -07:00
Matt Arsenault
f5a2e6bb8f
CodeGen: Remove overrides of getSSPStackGuardCheck (NFC) (#164044)
All 3 implementations are just checking if this has the
windows check function, so merge that as the only implementation.
2025-10-24 21:17:34 +09:00
Kees Cook
d130f40264
[ARM][KCFI] Add backend support for Kernel Control-Flow Integrity (#163698)
Implement KCFI (Kernel Control Flow Integrity) backend support for
ARM32, Thumb2, and Thumb1. The Linux kernel has supported ARM KCFI via
Clang's generic KCFI implementation, but this has finally started to
[cause problems](https://github.com/ClangBuiltLinux/linux/issues/2124)
so it's time to get the KCFI operand bundle lowering working on ARM.

Supports patchable-function-prefix with adjusted load offsets. Provides
an instruction size worst case estimate of how large the KCFI bundle is
so that range-limited instructions (e.g. cbz) know how big the indirect
calls can become.

ARM implementation notes:
- Four-instruction EOR sequence builds the 32-bit type ID byte-by-byte
  to work within ARM's modified immediate encoding constraints.
- Scratch register selection: r12 (IP) is preferred, r3 used as fallback
  when r12 holds the call target. r3 gets spilled/reloaded if it is
  being used as a call argument.
- UDF trap encoding: 0x8000 | (0x1F << 5) | target_reg_index, similar
  to aarch64's trap encoding.

Thumb2 implementation notes:
- Logically the same as ARM
- UDF trap encoding: 0x80 | target_reg_index

Thumb1 implementation notes:
- Due to register pressure, 2 scratch registers are needed: r3 and r2,
  which get spilled/reloaded if they are being used as call args.
- Instead of EOR, add/lsl sequence to load immediate, followed by
  a compare.
- No trap encoding.
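
The two trap encodings listed above are plain bit arithmetic and can be written out directly. The function names below are hypothetical. The third function is only one plausible reading of the "byte-by-byte EOR" note (XOR the loaded type ID with each byte of the expected ID and compare the result with zero); it is an assumption, not a transcription of the emitted instructions.

```cpp
#include <cassert>
#include <cstdint>

// ARM mode: 0x8000 | (0x1F << 5) | target_reg_index, as given above.
uint32_t armKcfiUdfImm(unsigned TargetReg) {
  assert(TargetReg < 16 && "ARM core registers are r0-r15");
  return 0x8000u | (0x1Fu << 5) | TargetReg;
}

// Thumb2: 0x80 | target_reg_index, as given above.
uint32_t thumb2KcfiUdfImm(unsigned TargetReg) {
  assert(TargetReg < 16 && "ARM core registers are r0-r15");
  return 0x80u | TargetReg;
}

// Assumed model of the four-step EOR check: each step XORs in one byte of
// the expected type ID at its position; the IDs match iff the result is 0.
bool typeIdMatchesByEor(uint32_t Loaded, uint32_t Expected) {
  uint32_t Acc = Loaded;
  for (int Shift = 0; Shift < 32; Shift += 8)
    Acc ^= Expected & (0xFFu << Shift);
  return Acc == 0;
}
```

For a call through r12, the ARM formula gives 0x83EC and the Thumb2 formula gives 0x8C.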

Update tests to validate all three sub targets.
2025-10-23 08:27:13 -07:00
Petr Hosek
7b190b79d9
[Clang][LLVM] Support for Fuchsia on ARM (#163848)
This introduces support for the 32-bit ARM Fuchsia target, which uses
the aapcs-linux ABI and defaults to thumbv8a as the target.
2025-10-21 11:08:30 -07:00
David Green
6d5dea63ed
[ARM][SDAG] Add llvm.lround half promotion. (#164235)
Similar to #161088, add llvm.lround and llvm.llround promotion.
2025-10-21 16:56:55 +01:00
Matt Arsenault
0cefd5c3c2
CodeGen: Fix hardcoded libcall names in insertSSPDeclarations (NFC) (#163710)
2025-10-17 21:50:16 +09:00
AZero13
d95f8ffee4
[ARM][TargetLowering] Combine Level should not be a factor in shouldFoldConstantShiftPairToMask (NFC) (#156949)
This should be based on the type and instructions, and only thumb uses
combine level anyway.
2025-10-11 10:58:48 +09:00
Matt Arsenault
424fa83335
CodeGen: Remove unused IntrinsicLowering includes (#162844)
2025-10-10 14:34:16 +00:00
David Green
125f0ac757
[ARM][SDAG] Half promote llvm.lrint nodes. (#161088)
As shown in #137101, fp16 lrint are not handled correctly on Arm. This
adds soft-half promotion for them, reusing the function that promotes a
value with operands (and can handle strict fp once that is added).
2025-10-07 22:04:39 +01:00
Simon Tatham
2cacf7117b
[ARM] Improve comment on the 'J' inline asm modifier. (#160712)
An inline asm constraint "Jr", in AArch32, means that if the input value
is a compile-time constant in the range -4095 to +4095, then it can be
inserted into the assembly language as an immediate operand, and
otherwise it will be placed in a register.

The comment in the Arm backend said "It is not clear what this
constraint is intended for". I believe the answer is that this range of
immediate values is the one you can use in a LDR or STR instruction.
So it's suitable for cases like this:

asm("str %0,[%1,%2]" : : "r"(data), "r"(base), "Jr"(offset) : "memory");

in the same way that the "Ir" constraint is suitable for the immediate
in a data-processing instruction such as ADD or EOR.
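
The decision the "Jr" constraint implies can be sketched as a small predicate. The names `fitsJImmediate`, `OperandKind` and `classifyJr` are hypothetical; only the range of -4095 to +4095 comes from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Range stated above for the 'J' constraint: -4095 to +4095 inclusive.
bool fitsJImmediate(int64_t Offset) {
  return Offset >= -4095 && Offset <= 4095;
}

enum class OperandKind { Immediate, Register };

// "Jr": use an immediate when the value is a compile-time constant in
// range, otherwise fall back to a register.
OperandKind classifyJr(int64_t Offset, bool IsCompileTimeConstant) {
  if (IsCompileTimeConstant && fitsJImmediate(Offset))
    return OperandKind::Immediate;
  return OperandKind::Register;
}
```

Under this model an offset of 4096, or any value not known at compile time, ends up in a register.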
2025-09-26 09:18:59 +01:00
paperchalice
3257dc35fe
[ARM] Remove UnsafeFPMath uses in code generation part (#160801)
Factored out from #151275.
Remove all UnsafeFPMath uses except the part related to ABI tags.
2025-09-26 15:54:30 +08:00
AZero13
151a80bbce
[TargetLowering][ExpandABD] Prefer selects over usubo if we do the same for ucmp (#159889)
Same deal we use for determining ucmp vs scmp.

Using selects on platforms that like selects is better than using usubo.

Rename function to be more general fitting this new description.
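
For reference, the two shapes of unsigned absolute difference being compared can be sketched as follows. Both functions are illustrative (the names are hypothetical), and the borrow flag that a usubo node would produce is modelled here with an explicit comparison.

```cpp
#include <cassert>
#include <cstdint>

// Select form: compare, then pick the non-wrapping subtraction.
uint32_t abduSelect(uint32_t A, uint32_t B) {
  return A > B ? A - B : B - A;
}

// usubo form: subtract once, then negate the result if the subtraction
// borrowed (modelled as A < B).
uint32_t abduUsubo(uint32_t A, uint32_t B) {
  uint32_t Diff = A - B; // wraps modulo 2^32 on borrow
  bool Borrow = A < B;   // the overflow bit usubo would report
  return Borrow ? 0u - Diff : Diff;
}
```

Both compute the same value; which one is cheaper depends on whether the target handles selects or overflow flags better, which is the preference the commit makes configurable.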
2025-09-25 10:33:05 +09:00
AZero13
733c1aded1
[ARM] Replace ABS and tABS machine nodes with custom lowering (#156717)
Just do a custom lowering instead.

Also copy paste the cmov-neg fold to prevent regressions in nabs.
2025-09-19 19:43:36 +01:00
Nikita Popov
1723f80b08
[ARM] Allow s constraints on half (#157860)
Fix a regression from https://github.com/llvm/llvm-project/pull/147559.
2025-09-11 08:50:32 +02:00
paperchalice
667f919214
[SelectionDAG][ARM] Propagate fast math flags in visitBRCOND (#156647)
Factor out from #151275.
2025-09-06 20:44:25 +08:00
woruyu
22fb21a64e
[DAG][ARM] canCreateUndefOrPoisonForTargetNode - ARMISD VORRIMM\VBICIMM nodes can't create poison/undef (#156831)
### Summary
This PR resolves https://github.com/llvm/llvm-project/issues/156640
2025-09-05 16:40:02 +08:00
woruyu
010f1ea3b3
[DAG][ARM] ComputeKnownBitsForTargetNode - add handling for ARMISD VORRIMM\VBICIMM nodes (#149494)
### Summary
This PR resolves https://github.com/llvm/llvm-project/issues/147179
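
The known-bits reasoning for an OR or a bit-clear with a constant can be sketched on a single 32-bit lane. This is a simplified model: `KnownBits32` and both functions are hypothetical names, and the immediate is assumed to be already decoded into a plain mask rather than the encoded form the ARMISD nodes carry.

```cpp
#include <cassert>
#include <cstdint>

// Bits set in Zero are known to be 0; bits set in One are known to be 1.
struct KnownBits32 {
  uint32_t Zero;
  uint32_t One;
};

// x | Imm: every bit set in Imm becomes known 1; other bits keep their state.
KnownBits32 knownAfterOrImm(KnownBits32 In, uint32_t Imm) {
  assert((In.Zero & In.One) == 0 && "a bit cannot be known 0 and known 1");
  return KnownBits32{In.Zero & ~Imm, In.One | Imm};
}

// x & ~Imm: every bit set in Imm becomes known 0; other bits keep their state.
KnownBits32 knownAfterBicImm(KnownBits32 In, uint32_t Imm) {
  assert((In.Zero & In.One) == 0 && "a bit cannot be known 0 and known 1");
  return KnownBits32{In.Zero | Imm, In.One & ~Imm};
}
```

Starting from a fully unknown input, the mask bits are the only ones that become known, as ones for the OR and as zeros for the bit-clear.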
2025-09-04 15:56:31 +08:00
Nikita Popov
3f757a39f2
[CodeGen] Remove ExpandInlineAsm hook (#156617)
This hook replaces inline asm with LLVM intrinsics. It was intended to
match inline assembly implementations of bswap in libc headers and
replace them with more optimizable implementations.

At this point, it has outlived its usefulness (see
https://github.com/llvm/llvm-project/issues/156571#issuecomment-3247638412),
as libc implementations no longer use inline assembly for this purpose.

Additionally, it breaks the "black box" property of inline assembly,
which some languages like Rust would like to guarantee.

Fixes https://github.com/llvm/llvm-project/issues/156571.
2025-09-04 09:28:11 +02:00
Daniel Paoliello
f99b0f3de4
[NFC] RuntimeLibcalls: Prefix the impls with 'Impl_' (#153850)
As noted in #153256, TableGen is generating reserved names for
RuntimeLibcalls, which resulted in a build failure for Arm64EC since
`vcruntime.h` defines `__security_check_cookie` as a macro.

To avoid using reserved names, all impl names will now be prefixed with
`Impl_`.

`NumLibcallImpls` was lifted out as a `constexpr size_t` instead of
being an enum field.

While I was churning the dependent code, I also removed the TODO to move
the impl enum into its own namespace and use an `enum class`: I
experimented with using an `enum class` and adding a namespace, but we
decided it was too verbose so it was dropped.
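
The collision can be illustrated with a stand-in macro. Everything here is hypothetical: `check_cookie` and `check_cookie_arm64ec` are placeholders for the reserved name that `vcruntime.h` defines as a macro, and the enum is a two-entry toy, not the generated one.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for a system header that defines a libcall name as an
// object-like macro.
#define check_cookie check_cookie_arm64ec

// An enumerator spelled `check_cookie` would be rewritten by the
// preprocessor to `check_cookie_arm64ec`. With the prefix, the enumerator
// is a different token, so the macro never touches it.
enum LibcallImpl {
  Impl_check_cookie,
  Impl_other,
};

// The count lives in a separate constant rather than a trailing enumerator.
constexpr std::size_t NumLibcallImpls = 2;
```

Because macro replacement works on whole identifier tokens, `Impl_check_cookie` is left alone even though it contains the macro's name.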
2025-09-02 09:57:33 -07:00
AZero13
2259a80c7d
[ARM] Simplify LowerCMP (NFC) (#156198)
Pass the opcode directly.
2025-08-31 15:45:12 +01:00
Min-Yih Hsu
acaa925cb2
[IA][RISCV] Recognize interleaving stores that could lower to strided segmented stores (#154647)
This is a sibling patch to #151612: passing gap masks to the renewal TLI
hooks for lowering interleaved stores that use shufflevector to do the
interleaving.
2025-08-26 13:22:42 -07:00
AZero13
79dfe48865
[ARM] Set isCheapToSpeculateCtlz as true for hasV5TOps and no Thumb 1 (#154848)
This is so that we don't expand to include unneeded 0 checks.

Also fix the logic error in LegalizerInfo so it is NOT legal on Thumb1
in Fast-ISEL.

Finally, remove the README entry regarding this issue.
2025-08-25 12:43:48 -07:00
Kazu Hirata
e9045b3cea
[ARM] Remove an unnecessary cast (NFC) (#155206)
getType() already returns Type *.
2025-08-25 07:33:34 -07:00
Matt Arsenault
65d12622fa
RuntimeLibcalls: Add entries for stackprotector globals (#154930)
Add entries for __stack_chk_guard, __ssp_canary_word, __security_cookie,
and __guard_local. As far as I can tell these are all just different
names for the same shaped functionality on different systems.

These aren't really functions, but special global variable names. They
should probably be treated the same way; all the same contexts that
need to know about emittable function names also need to know about
this. This avoids a special case check in IRSymtab.

This isn't a complete change, there's a lot more cleanup which
should be done. The stack protector configuration system is a
complete mess. There are multiple overlapping controls, used in
3 different places. Some of the target control implementations overlap
with conditions used in the emission points, and some use correlated
but not identical conditions in different contexts.

i.e. useLoadStackGuardNode, getIRStackGuard, getSSPStackGuardCheck and
insertSSPDeclarations are all used in inconsistent ways so I don't know
if I've tracked the intention of the system correctly.

The PowerPC test change is a bug fix on Linux. Previously the manual
conditions were based around !isOSOpenBSD, which is not the condition
where __stack_chk_guard is used. Now getSDagStackGuard returns the
proper global reference, resulting in LOAD_STACK_GUARD getting a
MachineMemOperand which allows scheduling.
2025-08-23 10:21:00 +09:00
Nikita Popov
01bc742185
[CodeGen] Give ArgListEntry a proper constructor (NFC) (#153817)
This ensures that the required fields are set, and also makes the
construction more convenient.
2025-08-15 18:06:07 +02:00
Matt Arsenault
4aae7bc625
ARM: Move half convert libcall config to tablegen (#153389)
2025-08-14 17:35:58 +09:00
Matt Arsenault
bbcac029db
ARM: Move more aeabi libcall config into tablegen (#152109)
2025-08-14 15:43:15 +09:00