llvm-project

Author	SHA1	Message	Date
Folkert de Vries	a587ccd87d	fix `llvm.fma.f16` double rounding issue when there is no native support (#171904 ) fixes https://github.com/llvm/llvm-project/issues/98389 As the issue describes, promoting `llvm.fma.f16` to `llvm.fma.f32` does not work, because there is not enough precision to handle the repeated rounding. `f64` does have sufficient precision. So this PR explicitly promotes the 16-bit fma to a 64-bit fma. I could not find examples of a libcall being used for fma, but that's something that could be looked into separately to work around code size issues.	2025-12-17 22:03:01 +01:00
Matt Arsenault	a3aaa1a391	DAG: Use RuntimeLibcalls to legalize vector frem calls (#170719 ) This continues the replacement of TargetLibraryInfo uses in codegen with RuntimeLibcallsInfo started in 821d2825a4f782da3da3c03b8a002802bff4b95c. The series there handled all of the multiple result calls. This extends for the other handled case, which happened to be frem. For some reason the Libcalls for these are prefixed with "REM_", for the instruction "frem", which maps to the libcall "fmod".	2025-12-11 13:33:27 +00:00
Matt Arsenault	62832593b7	DAG: Use more RTLIB helper functions for getting libcall from type (#170563 ) We had a set of utilities which was only used for some set of floating point libcalls. Add more, most of which are for integer operations. Ideally we would generate these functions from tablegen.	2025-12-04 17:49:27 +01:00
Matt Arsenault	1c5b1501ca	CodeGen: Move libcall lowering configuration to subtarget (#168621 ) Previously libcall lowering decisions were made directly in the TargetLowering constructor. Pull these into the subtarget to facilitate turning LibcallLoweringInfo into a separate analysis in the future.	2025-11-25 11:59:56 -05:00
Matt Arsenault	1d73b68463	TargetLowering: Avoid hardcoding OpenBSD + __guard_local name (#167744 ) Query RuntimeLibcalls for the support and the name. The check that the implementation is exactly __guard_local instead of unsupported feels a bit strange.	2025-11-20 20:44:25 -05:00
Matt Arsenault	a757c4e74e	CodeGen: Add subtarget to TargetLoweringBase constructor (#168620 ) Currently LibcallLoweringInfo is defined inside of TargetLowering, which is owned by the subtarget. Pass in the subtarget so we can construct LibcallLoweringInfo with the subtarget. This is a temporary step that should be revertable in the future, after LibcallLoweringInfo is moved out of TargetLowering.	2025-11-19 19:18:13 +00:00
Matt Arsenault	862d34666f	opt: Fix bad merge of #167996 (#168110 ) After the base branch was moved to main, this somehow ended up adding a second definition of RTLCI, instead of modifying the existing one. Also fix another build error with gcc bots.	2025-11-14 12:03:26 -08:00
Matt Arsenault	590ab43e8a	RuntimeLibcalls: Move VectorLibrary handling into TargetOptions (#167996 ) This fixes the -fveclib flag getting lost on its way to the backend. Previously this was its own cl::opt with a random boolean. Move the flag handling into CommandFlags with other backend ABI-ish options, and have clang directly set it, rather than forcing it to go through command line parsing. Prior to de68181d7f, codegen used TargetLibraryInfo to find the vector function. Clang has special handling for TargetLibraryInfo, where it would directly construct one with the vector library in the pass pipeline. RuntimeLibcallsInfo currently is not used as an analysis in codegen, and needs to know the vector library when constructed. RuntimeLibraryAnalysis could follow the same trick that TargetLibraryInfo is using in the future, but a lot more boilerplate changes are needed to thread that analysis through codegen. Ideally this would come from an IR module flag, and nothing would be in TargetOptions. For now, it's better for all of these sorts of controls to be consistent.	2025-11-14 11:19:21 -08:00
Matt Arsenault	4b9771e41a	DAG: Use modf vector libcalls through RuntimeLibcalls (#166986 ) Copy new process from sincos/sincospi	2025-11-11 18:05:35 -08:00
Matt Arsenault	de68181d7f	DAG: Use sincos vector libcalls through RuntimeLibcalls (#166984 ) Copy new process from sincospi.	2025-11-11 10:51:23 -08:00
serge-sans-paille	af146462f9	Remove unused <iterator> inclusion Per https://llvm.org/docs/CodingStandards.html#include-as-little-as-possible this improves compilation time, while not being too intrusive on the codebase.	2025-11-11 13:33:38 +01:00
Matt Arsenault	b7423af8da	RuntimeLibcalls: Add entries for vector sincospi functions (#166981 ) Add libcall entries for sleef and armpl sincospi implementations. This is the start of adding the vector library functions; eventually they should all be tracked here. I'm starting with this case because this is a prerequisite to fix reporting sincospi calls which do not exist on any common targets without regressing vector codegen when these libraries are available.	2025-11-10 10:56:33 -08:00
Matt Arsenault	056d2c12f7	RuntimeLibcalls: Split lowering decisions into LibcallLoweringInfo (#164987 ) Introduce a new class for the TargetLowering usage. This tracks the subtarget specific lowering decisions for which libcall to use. RuntimeLibcallsInfo is a module level property, which may have multiple implementations of a particular libcall available. This attempts to be a minimum boilerplate patch to introduce the new concept. In the future we should have a tablegen way of selecting which implementations should be used for a subtarget. Currently we do have some conflicting implementations added; it just happens to work out that the default case to prefer is alphabetically first (plus some of these still are using manual overrides in TargetLowering constructors).	2025-11-05 17:10:36 +00:00
Matt Arsenault	28e9a2832f	DAG: Consider __sincos_stret when deciding to form fsincos (#165169 )	2025-10-28 08:28:09 -07:00
Shimin Cui	531fd45e92	[PPC] Set minimum of largest number of comparisons to use bit test for switch lowering (#155910 ) Currently it is considered suitable to lower to a bit test for a set of switch case clusters when the number of unique destinations (`NumDests`) and the total number of comparisons (`NumCmps`) satisfy: `(NumDests == 1 && NumCmps >= 3) || (NumDests == 2 && NumCmps >= 5) || (NumDests == 3 && NumCmps >= 6)`. However, for some cases on PowerPC, for example when NumDests is 3 and each destination has 2 comparisons, it is not profitable to lower the switch to a bit test. This adds an option to set a minimum for the largest per-destination number of comparisons required to use a bit test for switch lowering. --------- Co-authored-by: Shimin Cui <scui@xlperflep9.rtp.raleigh.ibm.com>	2025-10-28 10:24:32 -04:00
Matt Arsenault	f5a2e6bb8f	CodeGen: Remove overrides of getSSPStackGuardCheck (NFC) (#164044 ) All 3 implementations are just checking if this has the windows check function, so merge that as the only implementation.	2025-10-24 21:17:34 +09:00
Sam Parker	1820102167	Wasm fmuladd relaxed (#163177 ) Reland #161355, after fixing up the cross-projects-tests for the wasm simd intrinsics. Original commit message: Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions. If we have FP16, then lower v8f16 fmuladds to FMA. I've introduced an ISD node for fmuladd to maintain the rounding ambiguity through legalization / combine / isel.	2025-10-13 16:50:53 +01:00
Sam Parker	30d3441cf0	Revert "[WebAssembly] Lower fmuladd to madd and nmadd" (#163171 ) Reverts llvm/llvm-project#161355 Looks like I've broken some intrinsic code generation.	2025-10-13 11:53:40 +01:00
Sam Parker	a4eb7ea225	[WebAssembly] Lower fmuladd to madd and nmadd (#161355 ) Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions. If we have FP16, then lower v8f16 fmuladds to FMA. I've introduced an ISD node for fmuladd to maintain the rounding ambiguity through legalization / combine / isel.	2025-10-13 10:36:08 +01:00
Daniel Paoliello	f99b0f3de4	[NFC] RuntimeLibcalls: Prefix the impls with 'Impl_' (#153850 ) As noted in #153256, TableGen is generating reserved names for RuntimeLibcalls, which resulted in a build failure for Arm64EC since `vcruntime.h` defines `__security_check_cookie` as a macro. To avoid using reserved names, all impl names will now be prefixed with `Impl_`. `NumLibcallImpls` was lifted out as a `constexpr size_t` instead of being an enum field. While I was churning the dependent code, I also removed the TODO to move the impl enum into its own namespace and use an `enum class`: I experimented with using an `enum class` and adding a namespace, but we decided it was too verbose so it was dropped.	2025-09-02 09:57:33 -07:00
Sam Tebbs	569d738d4e	[Intrinsics][AArch64] Add intrinsics for masking off aliasing vector lanes (#117007 ) It can be unsafe to load a vector from an address and write a vector to an address if those two addresses have overlapping lanes within a vectorised loop iteration. This PR adds intrinsics designed to create a mask with lanes disabled if they overlap between the two pointer arguments, so that only safe lanes are loaded, operated on and stored. The `loop.dependence.war.mask` intrinsic represents cases where the store occurs after the load, and the opposite for `loop.dependence.raw.mask`. The distinction between write-after-read and read-after-write is important, since the ordering of the read and write operations affects whether the chain of those instructions can be done safely. Along with the two pointer parameters, the intrinsics also take an immediate that represents the size in bytes of the vector element types. This will be used by #100579.	2025-09-02 15:35:15 +01:00
daniel-trujillo-bsc	658a931c5b	[CodeGen][RISCV] Add support of RISCV nontemporal to vector predication instructions. (#153033 ) This PR adds support for VP intrinsics to be aware of the nontemporal metadata information.	2025-08-27 15:48:33 -07:00
Matt Arsenault	65d12622fa	RuntimeLibcalls: Add entries for stackprotector globals (#154930 ) Add entries for __stack_chk_guard, __ssp_canary_word, __security_cookie, and __guard_local. As far as I can tell these are all just different names for the same shaped functionality on different systems. These aren't really functions, but special global variable names. They should probably be treated the same way; all the same contexts that need to know about emittable function names also need to know about this. This avoids a special case check in IRSymtab. This isn't a complete change, there's a lot more cleanup which should be done. The stack protector configuration system is a complete mess. There are multiple overlapping controls, used in 3 different places. Some of the target control implementations overlap with conditions used in the emission points, and some use correlated but not identical conditions in different contexts. i.e. useLoadStackGuardNode, getIRStackGuard, getSSPStackGuardCheck and insertSSPDeclarations are all used in inconsistent ways so I don't know if I've tracked the intention of the system correctly. The PowerPC test change is a bug fix on linux. Previously the manual conditions were based around !isOSOpenBSD, which is not the condition where __stack_chk_guard is used. Now getSDagStackGuard returns the proper global reference, resulting in LOAD_STACK_GUARD getting a MachineMemOperand which allows scheduling.	2025-08-23 10:21:00 +09:00
Nikita Popov	498ef361fe	[CodeGen] Make OrigTy in CC lowering the non-aggregate type (#153414 ) https://github.com/llvm/llvm-project/pull/152709 exposed the original IR argument type to the CC lowering logic. However, in SDAG, this used the raw type, prior to aggregate splitting. This PR changes it to use the non-aggregate type instead. (This matches what happened in the GlobalISel case already.) I've also added some more detailed documentation on the InputArg/OutputArg fields, to explain how they differ. In most cases ArgVT is going to be the EVT of OrigTy, so they encode very similar information (OrigTy just preserves some additional information lost in EVTs, like pointer types). One case where they do differ is in post-legalization lowering of libcalls, where ArgVT is going to be a legalized type, while OrigTy is going to be the original non-legalized type.	2025-08-13 18:42:26 +02:00
Stephen Long	19ada02086	PreISelIntrinsicLowering: Lower llvm.log to a loop if scalable vec arg (#129744 ) Similar to ab976a1, but for llvm.log.	2025-08-12 01:04:28 +09:00
Nikita Popov	e92b7e9641	[CodeGen] Provide original IR type to CC lowering (NFC) (#152709 ) It is common to have ABI requirements for illegal types: For example, two i64 argument parts that originally came from an fp128 argument may have a different call ABI than ones that came from an i128 argument. The current calling convention lowering does not provide access to this information, so backends come up with various hacks to support it (like additional pre-analysis cached in CCState, or bypassing the default logic entirely). This PR adds the original IR type to InputArg/OutputArg and passes it down to CCAssignFn. It is not actually used anywhere yet, this just does the mechanical changes to thread through the new argument.	2025-08-11 08:57:53 +02:00
Alexander Richardson	3a4b351ba1	[IR] Introduce the `ptrtoaddr` instruction This introduces a new `ptrtoaddr` instruction which is similar to `ptrtoint` but has two differences: 1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance 2) `ptrtoaddr` only extracts (and then extends/truncates) the low index-width bits of the pointer For most architectures, difference 2) does not matter since index (address) width and pointer representation width are the same, but this does make a difference for architectures that have pointers that aren't just plain integer addresses such as AMDGPU fat pointers or CHERI capabilities. This commit introduces textual and bitcode IR support as well as basic code generation, but optimization passes do not handle the new instruction yet so it may result in worse code than using ptrtoint. Follow-up changes will update capture tracking, etc. for the new instruction. RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54 Reviewed By: nikic Pull Request: https://github.com/llvm/llvm-project/pull/139357	2025-08-08 10:12:39 -07:00
Kazu Hirata	4be22dabc5	[CodeGen] Remove an unnecessary cast (NFC) (#152441 ) getActiveBits() already returns unsigned.	2025-08-07 07:22:42 -07:00
Nikita Popov	406d9b1dd6	[CodeGen] Move IsFixed into ArgFlags (NFCI) (#152319 ) The information whether a specific argument is vararg or fixed is currently stored separately from all the other argument information in ArgFlags. This means that it is not accessible from CCAssign, and backends have developed all kinds of workarounds for how they can access it after all. Move this information to ArgFlags to make it directly available in all relevant places. I've opted to invert this and store it as IsVarArg, as I think that both makes the meaning more obvious and provides for a better default (which is IsVarArg=false).	2025-08-07 09:12:40 +02:00
Paul Walker	94d374ab6c	[LLVM][CGP] Allow finer control for sinking compares. (#151366 ) Compare sinking is selectable based on the result of hasMultipleConditionRegisters. This function is too coarse grained by not taking into account the differences between scalar and vector compares. This PR extends the interface to take an EVT to allow finer control. The new interface is used by AArch64 to disable sinking of scalable vector compares, but with isProfitableToSinkOperands updated to maintain the cases that are specifically tested.	2025-08-05 11:43:41 +01:00
Abhishek Kaushik	1c0ac80d4a	[DAG] Combine `store + vselect` to `masked_store` (#145176 ) Add a new combine to replace `(store ch (vselect cond truevec (load ch ptr offset)) ptr offset)` with `(mstore ch truevec ptr offset cond)`. This saves a blend operation on targets that support conditional stores.	2025-08-04 19:05:36 +05:30
jeremyd2019	28b3190053	[LLVM][Cygwin] Enable conditions that are shared with MinGW (#149638 ) Cygwin and MinGW share the auto import behavior that could result in __stack_check_guard being non-dso-local. Allow windres to assume a Cygwin target as well as a MinGW one, so defines like _WIN32 would not be present on Cygwin.	2025-07-29 10:01:04 -07:00
Nikita Popov	fe0dbe0f29	[CodeGen] More consistently expand float ops by default (#150597 ) These float operations were expanded for scalar f32/f64/f128, but not for f16 and more problematically, not for vectors. A small subset of them was separately set to expand for vectors. Change these to always expand by default, and adjust targets to mark these as legal where necessary instead. This is a much safer default, and avoids unnecessary legalization failures because a target failed to manually mark them as expand. Fixes https://github.com/llvm/llvm-project/issues/110753. Fixes https://github.com/llvm/llvm-project/issues/121390.	2025-07-28 09:46:00 +02:00
Matt Arsenault	f4a394fc0c	SafeStack: Check if __safestack_pointer_address is available (#147917 ) Start using RuntimeLibcalls in the base implementation of getSafeStackPointerLocation instead of hardcoding the function names.	2025-07-15 23:26:52 +09:00
Matt Arsenault	a446300d1b	TargetLowering: Avoid a use of PointerType::getUnqual (#147884 ) Use the default globals address space	2025-07-10 19:00:59 +09:00
Matt Arsenault	dc69b00b0a	RuntimeLibcalls: Remove table of soft float compare cond codes (#146082 ) Previously we had a table of entries for every Libcall for the comparison to use against an integer 0 if it was a soft float compare function. This was only relevant to a handful of opcodes, so it was wasteful. Now that we can distinguish the abstract libcall for the compare with the concrete implementation, we can just directly hardcode the comparison against the libcall impl without this configuration system.	2025-07-09 17:13:58 +09:00
Matt Arsenault	3697d6dd98	DAG: Fall back to separate sin and cos when softening sincos (#147468 ) Fix asserting in the error case.	2025-07-09 01:52:46 +09:00
Dominik Steenken	acdf1c7526	[DAG] Add generic expansion for ISD::FCANONICALIZE nodes (#142105 ) This PR takes the work previously done by @pawan-nirpal-031 on X86 in #106370, and makes it available in common code. This should enable all targets to use `__builtin_canonicalize` for all `f(16|32|64|128)` data types. Canonicalization is implemented here as multiplication by `1.0`, as suggested in [the docs](https://llvm.org/docs/LangRef.html#llvm-canonicalize-intrinsic).	2025-07-08 16:12:17 +01:00
Matt Arsenault	b5401624e1	DAG: Add RTLIB::getPOW helper (#147274 ) Co-authored-by: Paul Walker <paul.walker@arm.com>	2025-07-07 21:31:49 +09:00
Austin	a550fef906	[llvm] Use llvm::fill instead of std::fill(NFC) (#146911 ) Use llvm::fill instead of std::fill	2025-07-04 14:10:28 +08:00
Matt Arsenault	58987d2e34	RuntimeLibcalls: Pass in ABI name from MCOptions (#144894 ) ARM needs this to compute the available libcalls.	2025-06-23 22:14:44 +09:00
Matt Arsenault	1c35fe4e6b	RuntimeLibcalls: Pass in exception handling type (#144696 ) All of the ABI options that influence libcall decisions need to be passed in.	2025-06-19 19:08:52 +09:00
Matt Arsenault	5bee2c34bd	RuntimeLibcalls: Pass in FloatABI and EABI type (#144691 ) We need the full set of ABI options to accurately compute the full set of libcalls. This partially resolves missing information required to compute the set of ARM calls.	2025-06-19 19:02:42 +09:00
Craig Topper	a733c6c7bb	[TargetLowering][RISCV] Allow scalable non-simple EVTs to be split even if the element type isn't a legal scalar type. (#144007 ) This fixes an inconsistency in i64 vector handling between RV32 and RV64. Even if i64 isn't legal as a scalar, we should still be able to split a large i64 vector to get down to a legal vector type. We only need to give up if we need to split a vscale x 1 vector.	2025-06-16 10:04:28 -07:00
Peter Collingbourne	645f0e6723	IR: Make Module::getOrInsertGlobal() return a GlobalVariable. After pointer element types were removed this function can only return a GlobalVariable, so reflect that in the type and comments and clean up callers. Reviewers: nikic Reviewed By: nikic Pull Request: https://github.com/llvm/llvm-project/pull/141323	2025-05-27 12:23:12 -07:00
Nicholas Guy	a1f369e630	[AArch64][SVE] Add dot product lowering for PARTIAL_REDUCE_MLA node (#130933 ) Add lowering in tablegen for PARTIAL_REDUCE_U/SMLA ISD nodes. Only happens when the combine has been performed on the ISD node. Also adds in check to only do the DAG combine when the node can then eventually be lowered, so changes neon tests too. --------- Co-authored-by: James Chesterman <james.chesterman@arm.com>	2025-04-23 13:19:41 +01:00
Reid Kleckner	2538c607e9	[CodeGen] Prune headers and move code out of line for build efficiency, NFC (#135622 ) I noticed these destructors taking time with -ftime-trace and moved some of them for minor build efficiency improvements. The main impact of moving destructors out of line is that container fields no longer require their element types to be complete, i.e. one can have uptr<T> or vector<T> as a field with an incomplete type T, and that means we can reduce transitive includes, as with LegalizerInfo.h. Move expensive getDebugOperandsForReg template out-of-line. The std::function instantiation shows up in time trace even if you don't use the function.	2025-04-14 22:23:18 -07:00
3405691582	c180e249d0	Fix crash lowering stack guard on OpenBSD/aarch64. (#125416 ) TargetLoweringBase::getIRStackGuard refers to a platform-specific guard variable. Before this change, TargetLoweringBase::getSDagStackGuard only referred to a different variable. This means that SelectionDAGBuilder's getLoadStackGuard does not get memory operands. However, AArch64InstrInfo::expandPostRAPseudo assumes that the passed MachineInstr has nonzero memoperands, causing a segfault. We have two possible options here: either disabling the LOAD_STACK_GUARD node entirely in AArch64TargetLowering::useLoadStackGuardNode or just making the platform-specific values match across TargetLoweringBase. Here, we try the latter.	2025-03-31 09:17:55 -07:00
Jim Lin	49bb51ed91	[RISCV][LibCall] Add libcall for i64 -> bf16 (#130024 ) Add support for lowering i64 -> bf16 with libcall.	2025-03-07 09:23:50 +08:00
James Chesterman	d4a0848dc6	[SelectionDAG] Add PARTIAL_REDUCE_U/SMLA ISD Nodes (#125207 ) Add signed and unsigned PARTIAL_REDUCE_MLA ISD nodes. Add command line argument (aarch64-enable-partial-reduce-nodes) that indicates whether the intrinsic experimental_vector_partial_reduce_add will be transformed into the new ISD node. Lowering with the new ISD nodes will, for now, always be done as an expand.	2025-02-18 09:08:47 +00:00
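The double-rounding problem described in a587ccd87d can be reproduced in a few lines. This is an illustration, not LLVM's lowering. It assumes that Python's `struct` formats `'e'` and `'f'` round a double to binary16 and binary32 with round-to-nearest-even. The inputs `a = 1.5`, `b = 683/1024`, and `c = 2**-24` are chosen by hand; all three are exact in f16, and the exact result `1 + 2**-11 + 2**-24` lies just above the midpoint between the f16 values `1.0` and `1 + 2**-10`.

```python
import struct
from fractions import Fraction

def to_f16(x: float) -> float:
    # Round a double to IEEE binary16 (ties to even) and back.
    return struct.unpack('<e', struct.pack('<e', x))[0]

def to_f32(x: float) -> float:
    # Round a double to IEEE binary32 and back.
    return struct.unpack('<f', struct.pack('<f', x))[0]

a, b, c = 1.5, 683 / 1024, 2.0 ** -24        # all exactly representable in f16
exact = Fraction(a) * Fraction(b) + Fraction(c)  # 1 + 2**-11 + 2**-24, no rounding yet

# The exact value needs only 25 significant bits, so float(exact) is exact here.
# Promote to f32: one rounding to f32 (fma.f32), then another to f16 (fptrunc).
via_f32 = to_f16(to_f32(float(exact)))
# Promote to f64: one rounding to f64 (fma.f64), then another to f16 (fptrunc).
via_f64 = to_f16(float(exact))

print(via_f32, via_f64)  # 1.0 1.0009765625
```

Rounding to f32 first lands exactly on the f16 midpoint, and ties-to-even then picks `1.0`. The f64 intermediate keeps the low bit, so the final rounding correctly goes up to `1 + 2**-10`. This shows one failing input for the f32 path; it does not by itself prove that f64 is sufficient for every input.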
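The bit-test suitability rule quoted in 531fd45e92 can be written out directly. Doing so makes it clear why the PowerPC case passes the rule: three destinations with two comparisons each give `NumCmps == 6`. The second function is a hypothetical model of the new option. It assumes the option simply requires the destination with the most comparisons to reach a threshold. The function names and the threshold value of 3 are illustrative and are not taken from LLVM.

```python
def bit_test_suitable(num_dests: int, num_cmps: int) -> bool:
    # The condition as quoted in the commit message.
    return ((num_dests == 1 and num_cmps >= 3)
            or (num_dests == 2 and num_cmps >= 5)
            or (num_dests == 3 and num_cmps >= 6))

def bit_test_suitable_with_min(cmps_per_dest: list, min_largest: int) -> bool:
    # Hypothetical extra check: the destination with the most comparisons
    # must have at least `min_largest` of them.
    num_cmps = sum(cmps_per_dest)
    return (bit_test_suitable(len(cmps_per_dest), num_cmps)
            and max(cmps_per_dest) >= min_largest)

# PowerPC case from the commit: 3 destinations, 2 comparisons each.
print(bit_test_suitable(3, 6))                   # True
print(bit_test_suitable_with_min([2, 2, 2], 3))  # False
```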
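The `store + vselect` combine in 1c0ac80d4a is valid because, in lanes where `cond` is false, the store just writes back the value that was loaded from the same address. The list-based model below assumes a single thread and no intervening store between the load and the store; the real combine has to check such conditions on the chain. Under those assumptions, both forms leave memory identical. The helper names are hypothetical.

```python
def store_of_vselect(mem, ptr, cond, truevec):
    # (store ch (vselect cond truevec (load ch ptr offset)) ptr offset)
    loaded = mem[ptr:ptr + len(cond)]
    selected = [t if c else l for c, t, l in zip(cond, truevec, loaded)]
    out = list(mem)
    out[ptr:ptr + len(cond)] = selected  # every lane is written
    return out

def masked_store(mem, ptr, cond, truevec):
    # (mstore ch truevec ptr offset cond): only lanes with cond set are written
    out = list(mem)
    for i, (c, t) in enumerate(zip(cond, truevec)):
        if c:
            out[ptr + i] = t
    return out

mem = [10, 11, 12, 13, 14, 15]
cond = [True, False, True, False]
truevec = [0, 1, 2, 3]
print(store_of_vselect(mem, 1, cond, truevec) == masked_store(mem, 1, cond, truevec))  # True
```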

553 Commits