llvm-project

Author	SHA1	Message	Date
paperchalice	c53acf0443	[SelectionDAGBuilder] Remove NoNaNsFPMath uses (#169904 ) Replaced by checking fast-math flags or value tracking results.	2026-02-09 09:48:07 +08:00
Jay Foad	48619c8ab2	[ARM] Autogenerate checks for crypto intrinsics (#180147 )	2026-02-06 14:33:22 +00:00
SiliconA-Z	37aba1b5d4	[ARM] Set operation action for UMULO and SMULO as Custom if not Thumb1 (#154253 ) We should specify a custom lowering for SMULO and UMULO like we do for AArch64, but only if not Thumb 1 obviously.	2026-02-05 08:47:56 -08:00
Matt Arsenault	2502e3b7ba	IR: Promote "denormal-fp-math" to a first class attribute (#174293 ) Convert "denormal-fp-math" and "denormal-fp-math-f32" into a first class denormal_fpenv attribute. Previously the query for the effective denormal mode involved two string attribute queries with parsing. I'm introducing more uses of this, so it makes sense to convert this to a more efficient encoding. The old representation was also awkward since it was split across two separate attributes. The new encoding just stores the default and float modes as bitfields, largely avoiding the need to consider if the other mode is set. The syntax in the common cases looks like this: `denormal_fpenv(preservesign,preservesign)` `denormal_fpenv(float: preservesign,preservesign)` `denormal_fpenv(dynamic,dynamic float: preservesign,preservesign)` I wasn't sure about reusing the float type name instead of adding a new keyword. It's parsed as a type but only accepts float. I'm also debating switching the name to subnormal to match the current preferred IEEE terminology (also used by nofpclass and other contexts). This has a behavior change when using the command flag debug options to set the denormal mode. The behavior of the flag ignored functions with an explicit attribute set, per the default and f32 version. Now that these are one attribute, the flag logic can't distinguish which of the two components were explicitly set on the function. Only one test appeared to rely on this behavior, so I just avoided using the flags in it. This also does not perform all the code cleanups this enables. In particular the attributor handling could be cleaned up. I also guessed at how to support this in MLIR. I followed MemoryEffects as a reference; it appears bitfields are expanded into arguments to attributes, so the representation there is a bit uglier with the 2 2-element fields flattened into 4 arguments.	2026-02-05 13:31:26 +00:00
paperchalice	d1598c96e0	[ARM] Recognize abi tag module flags (#161306 ) Recognize abi tag hints from frontend rather than from architecture and options. Frontend part: #161106.	2026-02-05 12:08:22 +00:00
Simi Pallipurath	09a68427ff	[ARM] Lower unaligned loads/stores to aeabi functions. (#172672 ) When targeting architectures that do not support unaligned memory accesses, or when -mno-unaligned-access is passed explicitly, the compiler must expand each unaligned load/store into an inline sequence. For 32-bit operations this typically involves: 1. 4× LDRB (or 2× LDRH), 2. multiple shift/or instructions. These sequences are emitted at every unaligned access site, and therefore contribute significant code size in workloads that touch packed or misaligned structures. When compiling with -Oz in combination with -mno-unaligned-access, this patch lowers unaligned 32-bit and 64-bit loads and stores to the following AEABI helper calls: ``` __aeabi_uread4 __aeabi_uread8 __aeabi_uwrite4 __aeabi_uwrite8 ``` These provide a way to perform unaligned memory accesses on targets that do not support them, such as ARMv6-M, or when compiling with -mno-unaligned-access. Although each use introduces a function call, making it less straightforward than using raw loads and stores, the call itself is often much smaller than the compiler-emitted sequence of multiple ldrb/strb operations. As a result, these helpers can greatly reduce code size provided they are invoked more than once across a program. 1. Functions become smaller in AEABI mode once they contain more than a few unaligned accesses. 2. The total image .text size becomes smaller whenever multiple functions call the same helpers. This PR is derived from https://reviews.llvm.org/D57595, with some minor changes. Co-authored-by: David Green	2026-02-02 16:32:12 +00:00
David Green	1bc655f52b	[ARM] Expand and regenerate llvm/test/CodeGen/ARM/cls.ll. NFC	2026-02-02 11:28:07 +00:00
Nikita Popov	1bad00adc4	[SDAG] Remove non-canonical fabs libcall handling (#177967 ) This is a followup to https://github.com/llvm/llvm-project/pull/171288, which removed lowering of libcalls to SDAG nodes for most libcalls that get unconditionally canonicalized to intrinsics. This handles the remaining fabs case, which I originally skipped due to larger test impact.	2026-01-26 15:11:17 +00:00
Simon Tatham	0921542e3b	[ARM] Count register copies when estimating function size (#175763 ) `EstimateFunctionSizeInBytes`, in `ARMFrameLowering.cpp`, provides an early estimate of the compiled size of a function, in a context that wants to overestimate rather than underestimate. In some cases it was underestimating severely, by over 20%. The discrepancy was entirely accounted for by the fact that `COPY` operations were not being counted at all, even though each one (or at least each one that survives any post-regalloc optimizations) takes 2 bytes in Thumb or 4 in Arm. This could lead to a compile failure, if the underestimated function size led frame lowering to not stack LR, but later, `ARMConstantIslandsPass` needed to insert an intra-function branch long enough to require a `bl` instruction, needing LR to have been stacked. The result of `EstimateFunctionSizeInBytes` was not directly available for testing, so I added an `LLVM_DEBUG` at the end of the function. That way, the test file doesn't need to try to make a >2048 byte function estimated at <2048 bytes; it just needs to exhibit a function with a single `COPY` and make sure it's counted. At the moment, `EstimateFunctionSizeInBytes` is only used in Thumb-1 compilations, to decide whether the function is large enough to justify stacking LR as a precaution. However, the subroutine `ARMBaseInstrInfo::getInstSizeInBytes`, which counts each individual `MachineInstr`, is called from other contexts too, so I've made it return a sensible answer for `COPY` nodes in both Arm and Thumb.	2026-01-26 09:28:38 +00:00
valadaptive	cdc6a84c14	TargetLowering: Allow FMINNUM/FMAXNUM to lower to FMINIMUM/FMAXIMUM even without `nsz` (#177828 ) This restriction was originally added in https://reviews.llvm.org/D143256, with the given justification: > Currently, in TargetLowering, if the target does not support fminnum, we lower to fminimum if neither operand could be a NaN. But this isn't quite correct because fminnum and fminimum treat +/-0 differently; so, we need to prove that one of the operands isn't a zero. As far as I can tell, this was never correct. Before https://github.com/llvm/llvm-project/pull/172012, `minnum` and `maxnum` were nondeterministic with regards to signed zero, so it's always been perfectly legal to lower them to operations that order signed zeroes.	2026-01-25 18:24:12 -05:00
Tony Linthicum	15b9109bc7	Make MachineBlockFrequencyInfo a required pass for the MachineScheduler pass. (#176172 ) This is needed to support functionality in the AMDGPU scheduler. Various passes have been modified to preserve MBFI to ensure that this change does not introduce new invocations of MBFI. Some targets have passes reordered, but there are no new runs of MBFI.	2026-01-15 20:26:51 +00:00
Usman Nadeem	c49c7e72b3	[ARM] Add size to `tLDRLIT_ga_pcrel|abs` Pseudo Instructions (#175663 ) Compiling OpenSSL for Thumb was giving a crash in `ARMConstantIslands` with error message: "underestimated function size". Adding a size for `tLDRLIT_ga_pcrel` pseudo instruction fixes the issue. Also added a size for `tLDRLIT_ga_abs` as per review comments.	2026-01-15 11:27:13 -08:00
David Green	2f43659011	[ARM] Add tablegen patterns for vsdot and vudot high index. (#174728 ) The index on a vsdot and vudot instruction can be 0/1 from a D-reg, not 0/1/2/3 from a Q reg as would be expected. Add a pattern to allow extracting from the high half of the input vector. Fixes #174688	2026-01-14 10:26:05 +00:00
moorabbit	a5fa246435	[Clang] Add `__builtin_stack_address` (#148281 ) Add support for `__builtin_stack_address` builtin. The semantics match those of GCC's builtin with the same name. `__builtin_stack_address` returns the starting address of the stack region that may be used by called functions. It may or may not include the space used for on-stack arguments passed to a callee (See [GCC Bug/121013](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121013)). Fixes #82632.	2026-01-12 10:01:57 +01:00
David Green	7635474d26	[ARM] Update and extend neon-dot-product.ll. NFC	2026-01-07 10:18:27 +00:00
Frederik Harwath	5c05824d2b	[CodeGen] Rename expand-fp to expand-ir-insts (#172681 ) The pass now contains a non-fp expansion and should be used for any similar expansions regardless of the types involved. Hence a generic name seems apt. Rename the source files, pass, and adjust the pass description. Move all tests for the expansions that have previously been merged into the pass to a single directory.	2025-12-18 11:15:04 +00:00
Frederik Harwath	71760f324f	[CodeGen] Merge ExpandLargeDivRem into ExpandFp (#172680 ) Both passes expand instructions at the IR level. They use the same kind of instruction visitation logic and contain significant code duplication e.g. for scalarization.	2025-12-18 09:22:47 +01:00
Folkert de Vries	a587ccd87d	fix `llvm.fma.f16` double rounding issue when there is no native support (#171904 ) fixes https://github.com/llvm/llvm-project/issues/98389 As the issue describes, promoting `llvm.fma.f16` to `llvm.fma.f32` does not work, because there is not enough precision to handle the repeated rounding. `f64` does have sufficient space. So this PR explicitly promotes the 16-bit fma to a 64-bit fma. I could not find examples of a libcall being used for fma, but that's something that could be looked into separately to work around code size issues.	2025-12-17 22:03:01 +01:00
Craig Topper	0cdc1b6dd4	[SelectionDAG] Support integer types with multiple registers in ComputePHILiveOutRegInfo. (#172081 ) PHIs that are larger than a legal integer type are split into multiple virtual registers that are numbered sequentially. We can propagate the known bits for each of these registers individually. Big endian is not supported yet because the register order needs to be reversed. Fixes #171671	2025-12-13 13:24:41 -08:00
Nikita Popov	5a24dfa339	[SDAG] Remove most non-canonical libcall handling (#171288 ) This is a followup to https://github.com/llvm/llvm-project/pull/171114, removing the handling for most libcalls that are already canonicalized to intrinsics in the middle-end. The only remaining one is fabs, which has more test coverage than the others.	2025-12-10 11:45:26 +01:00
Nikita Popov	d5b3ba6596	[SDAG] Don't handle non-canonical libcalls in SDAG lowering (#171114 ) SDAG currently tries to lower certain libcalls to ISD opcodes. However, many of these are already canonicalized from libcalls to intrinsic in the middle-end (and often already emitted as intrinsics in the front-end). I believe that SDAG should not be doing anything for such libcalls. This PR just drops a single libcall to get consensus on the direction, as these changes need a non-trivial amount of test updates. A lot of the remaining libcalls should probably also be canonicalized to intrinsics in the middle-end when annotated with `memory(none)`, but that would require additional work in SimplifyLibCalls.	2025-12-09 08:07:33 +01:00
Folkert de Vries	fdd0d53430	cmse: emit `__acle_se_` symbol for aliases to entry functions (#162109 ) Emitting the symbol in `emitGlobalAlias` seemed most efficient, otherwise I think you'd have to traverse all aliases. I have verified that the additional symbol is picked up by `arm-none-eabi-ld` and correctly generates an entry in `veneers.o`. Fixes #162084	2025-12-08 17:26:21 +00:00
Lewis Crawford	ea3fdc5972	Avoid maxnum(sNaN, x) optimizations / folds (#170181 ) The behaviour of constant-folding `maxnum(sNaN, x)` and `minnum(sNaN, x)` has become controversial, and there are ongoing discussions about which behaviour we want to specify in the LLVM IR LangRef. See: - https://github.com/llvm/llvm-project/issues/170082 - https://github.com/llvm/llvm-project/pull/168838 - https://github.com/llvm/llvm-project/pull/138451 - https://github.com/llvm/llvm-project/pull/170067 - https://discourse.llvm.org/t/rfc-a-consistent-set-of-semantics-for-the-floating-point-minimum-and-maximum-operations/89006 This patch removes optimizations and constant-folding support for `maxnum(sNaN, x)` but keeps it folded/optimized for `qNaN`. This should allow for some more flexibility so the implementation can conform to either the old or new version of the semantics specified without any changes. As far as I am aware, optimizations involving constant `sNaN` should generally be edge-cases that rarely occur, so there should hopefully be very little real-world performance impact from disabling these optimizations.	2025-12-02 12:43:03 +00:00
Erik Enikeev	d08b0f7240	[ARM] Disable strict node mutation and use correct lowering for several strict ops (#170136 ) Changes in this PR were discussed and reviewed in https://github.com/llvm/llvm-project/pull/137101.	2025-12-01 22:03:55 +00:00
David Green	22968f5b4a	[DAG] Add strictfp implicit def reg after metadata. (#168282 ) This prevents a machine verifier error, where it "Expected implicit register after groups". Fixes #158661	2025-11-17 10:57:21 +00:00
hstk30-hw	51c8180515	[GlobalMerge] Prefer global-merge-max-offset over the target-specific constant offset. (#165591 ) In the Dhrystone benchmark, I found that some adjacent globals are not merged, whereas GCC's anchor optimization does handle them. Using global-merge-max-offset to set the max offset can yield similar results (still slightly different, but at least we can control the offset).	2025-11-17 15:37:51 +08:00
Austin	700aa5e376	[revert][CodeGen] add a command to force global merge (#168230 ) sorry, this was my mistake	2025-11-16 03:40:07 +08:00
Austin	3705921f60	[CodeGen] add a command to force global merge I found that in some performance scenarios, such as under O2, this pr can be helpful for a series of loading global variables.	2025-11-16 03:20:27 +08:00
Amara Emerson	18f29a5810	[ARM] Fix not saving FP when required to in frame-pointer=non-leaf. (#163699 ) When the stars align to conspire against stack alignment, when we have frame-pointer=non-leaf we can incorrectly skip preserving fp/r7 in the prolog. The fix here first makes sure we're using the right frame pointer register in the context of preserving the incoming FP, and then makes sure that we save the FP when re-alignment is known to be necessary. rdar://162462271	2025-11-12 16:31:25 -08:00
David Tellenbach	a01a921004	[ARM] Prevent stack argument overwrite during tail calls (#166492 ) For tail-calls we want to re-use the caller stack-frame and potentially need to copy stack arguments. For large stack arguments, such as by-val structs, this can lead to overwriting incoming stack arguments when preparing outgoing ones by copying them. E.g., in cases like %"struct.s1" = type { [19 x i32] } define void @f0(ptr byval(%"struct.s1") %0, ptr %1) { tail call void @f1(ptr %1, ptr byval(%"struct.s1") %0) ret void } declare void @f1(ptr, ptr) that swap arguments, the last bytes of %0 are on the stack, followed by %1. To prepare the outgoing arguments, %0 needs to be copied and %1 needs to be loaded into r0. However, currently the copy of %0 overwrites the location of %1, resulting in loading garbage into r0. We fix that by forcing the load to the pointer stack argument to happen before the copy.	2025-11-12 23:38:48 +00:00
Matt Arsenault	782759b757	DAG: Use poison when widening build_vector (#167631 ) Test changes are mostly noise. There are a few improvements and a few regressions.	2025-11-12 20:17:41 +00:00
David Green	4d1f2492d2	[ARM] Use TargetMachine over Subtarget in ARMAsmPrinter (#166329 ) The subtarget may not be set if no functions are present in the module. Attempt to use the TargetMachine directly in more cases. Fixes #165422 Fixes #167577	2025-11-12 16:26:21 +00:00
Matt Arsenault	821d2825a4	RuntimeLibcalls: Remove incorrect sincospi from most targets (#166982 ) sincospi/sincospif/sincospil do not appear to exist on common targets. Darwin targets have __sincospi and __sincospif, so define and use those implementations. I have no idea what version added those calls, so I'm just guessing it's the same conditions as __sincos_stret. Most of this patch is working to preserve codegen when a vector library is explicitly enabled. This only covers sleef and armpl, as those are the only cases tested. The multiple result libcalls have an aberrant process where the legalizer looks for the scalar type's libcall in RuntimeLibcalls, and then cross references TargetLibraryInfo to find a matching vector call. This was unworkable in the sincospi case, since the common case is there is no scalar call available. To preserve codegen if the call is available, first try to match a libcall with the vector type before falling back on the old scalar search. Eventually all of this logic should be contained in RuntimeLibcalls, without the link to TargetLibraryInfo. In principle we should perform the same legalization logic as for an ordinary operation, trying to find a matching subvector type with a libcall.	2025-11-10 11:05:08 -08:00
Matt Arsenault	5e7f7a496c	ARM: Add fp128 ldexp tests (#166619 )	2025-11-05 22:42:59 -08:00
Prabhu Rajasekaran	f60e69315e	[llvm] Emit canonical linkage correct function symbol (#166487 ) In the call graph section, we were emitting the temporary label pointing to the start of the function instead of the canonical linkage correct function symbol. This patch fixes it and updates the corresponding tests.	2025-11-05 09:22:08 -08:00
Matt Arsenault	4d98ee2a22	ARM: Add watchos run line to llvm.sincos test (#166271 )	2025-11-03 18:20:24 -08:00
Matt Arsenault	c77b614564	ARM: Add more ABIs to llvm.sincos test (#166264 ) Make sure the iOS with/without sincos_stret are tested	2025-11-03 16:00:54 -08:00
Erik Enikeev	1523332fbd	[ARM] Mark function calls as possibly changing FPSCR (#160699 ) This patch does the same changes as D143001 for AArch64. This PR is part of the work on adding strict FP support in ARM, which was previously discussed in #137101.	2025-10-30 16:36:55 +00:00
Erik Enikeev	242ebcf13e	[ARM] Add instruction selection for strict FP (#160696 ) This consists of marking the various strict opcodes as legal, and adjusting instruction selection patterns so that 'op' is 'any_op'. The changes are similar to those in D114946 for AArch64. Custom lowering and promotion are set for some FP16 strict ops to work correctly. This PR is part of the work on adding strict FP support in ARM, which was previously discussed in #137101.	2025-10-29 21:43:43 +00:00
AZero13	5d0f1591f8	[DAGCombine] Improve bswap lowering for machines that support bit rotates (#164848 ) Source: Hacker's delight.	2025-10-25 10:17:15 -07:00
David Green	a1e59bdc17	[GlobalISel] Make scalar G_SHUFFLE_VECTOR illegal. (#140508 ) I'm not sure if this is the best way forward or not, but we have a lot of issues with forgetting that shuffle_vectors can be scalar again and again. (There is another example in the recently added known-bits code.) As a scalar-dst shuffle vector is just an extract, and a scalar-source shuffle vector is just a build vector, this patch makes scalar shuffle vector illegal and adjusts the irbuilder to create the correct node as required. Most targets do this already through lowering or combines. Making scalar shuffles illegal simplifies gisel as a whole; it just requires that transforms that create shuffles of new sizes account for the scalar shuffle being illegal (mostly IRBuilder and LessElements).	2025-10-24 08:21:35 +01:00
Kees Cook	d130f40264	[ARM][KCFI] Add backend support for Kernel Control-Flow Integrity (#163698 ) Implement KCFI (Kernel Control Flow Integrity) backend support for ARM32, Thumb2, and Thumb1. The Linux kernel has supported ARM KCFI via Clang's generic KCFI implementation, but this has finally started to [cause problems](https://github.com/ClangBuiltLinux/linux/issues/2124) so it's time to get the KCFI operand bundle lowering working on ARM. Supports patchable-function-prefix with adjusted load offsets. Provides an instruction size worst case estimate of how large the KCFI bundle is so that range-limited instructions (e.g. cbz) know how big the indirect calls can become. ARM implementation notes: - Four-instruction EOR sequence builds the 32-bit type ID byte-by-byte to work within ARM's modified immediate encoding constraints. - Scratch register selection: r12 (IP) is preferred, r3 used as fallback when r12 holds the call target. r3 gets spilled/reloaded if it is being used as a call argument. - UDF trap encoding: 0x8000 | (0x1F << 5) | target_reg_index, similar to aarch64's trap encoding. Thumb2 implementation notes: - Logically the same as ARM - UDF trap encoding: 0x80 | target_reg_index Thumb1 implementation notes: - Due to register pressure, 2 scratch registers are needed: r3 and r2, which get spilled/reloaded if they are being used as call args. - Instead of EOR, add/lsl sequence to load immediate, followed by a compare. - No trap encoding. Update tests to validate all three sub targets.	2025-10-23 08:27:13 -07:00
paperchalice	542703fa68	[test][ARM] Remove unsafe-fp-math-uses (NFC) (#164744 ) Post cleanup for #164534.	2025-10-23 15:07:46 +08:00
Prabhu Rajasekaran	b7c7083c1f	[llvm] Update call graph ELF section type. (#164461 ) Make call graph section to have a dedicated type instead of the generic progbits type.	2025-10-22 15:08:36 -07:00
David Green	6d5dea63ed	[ARM][SDAG] Add llvm.lround half promotion. (#164235 ) Similar to #161088, add llvm.lround and llvm.llround promotion.	2025-10-21 16:56:55 +01:00
Prabhu Rajasekaran	cac8bdb56c	[NFC][llvm] Update call graph section's name. (#163429 ) Call graph section emitted by LLVM was named `.callgraph`. Renaming it to `.llvm.callgraph`.	2025-10-15 07:52:54 -07:00
paperchalice	bfee9db785	[DAGCombiner] Remove NoNaNsFPMath uses (#163504 ) Users should use `nnan` flag instead.	2025-10-15 21:22:13 +08:00
Simon Pilgrim	4c3ec9cda0	[ARM] carry.ll - regenerate test checks (#163172 )	2025-10-13 11:12:09 +00:00
Yatao Wang	c4bcbf02a5	[GlobalISel] Add G_SUB for computeNumSignBits (#158384 ) This patch ports the ISD::SUB handling from SelectionDAG’s ComputeNumSignBits to GlobalISel. Related to https://github.com/llvm/llvm-project/issues/150515. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com> Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>	2025-10-13 10:45:26 +00:00
beetrees	11571a005a	Fix legalizing `FNEG` and `FABS` with `TypeSoftPromoteHalf` (#156343 ) Based on top of #157211. `FNEG` and `FABS` must preserve signalling NaNs, meaning they should not convert to f32 to perform the operation. Instead legalize to `XOR` and `AND`. Fixes almost all of #104915	2025-10-11 11:08:26 +09:00


5199 Commits