llvm-project

Author	SHA1	Message	Date
Matt Arsenault	f5a2e6bb8f	CodeGen: Remove overrides of getSSPStackGuardCheck (NFC) (#164044 ) All 3 implementations are just checking if this has the windows check function, so merge that as the only implementation.	2025-10-24 21:17:34 +09:00
Sam Parker	1820102167	Wasm fmuladd relaxed (#163177 ) Reland #161355, after fixing up the cross-projects-tests for the wasm simd intrinsics. Original commit message: Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions. If we have FP16, then lower v8f16 fmuladds to FMA. I've introduced an ISD node for fmuladd to maintain the rounding ambiguity through legalization / combine / isel.	2025-10-13 16:50:53 +01:00
Sam Parker	30d3441cf0	Revert "[WebAssembly] Lower fmuladd to madd and nmadd" (#163171 ) Reverts llvm/llvm-project#161355 Looks like I've broken some intrinsic code generation.	2025-10-13 11:53:40 +01:00
Sam Parker	a4eb7ea225	[WebAssembly] Lower fmuladd to madd and nmadd (#161355 ) Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions. If we have FP16, then lower v8f16 fmuladds to FMA. I've introduced an ISD node for fmuladd to maintain the rounding ambiguity through legalization / combine / isel.	2025-10-13 10:36:08 +01:00
Daniel Paoliello	f99b0f3de4	[NFC] RuntimeLibcalls: Prefix the impls with 'Impl_' (#153850 ) As noted in #153256, TableGen is generating reserved names for RuntimeLibcalls, which resulted in a build failure for Arm64EC since `vcruntime.h` defines `__security_check_cookie` as a macro. To avoid using reserved names, all impl names will now be prefixed with `Impl_`. `NumLibcallImpls` was lifted out as a `constexpr size_t` instead of being an enum field. While I was churning the dependent code, I also removed the TODO to move the impl enum into its own namespace and use an `enum class`: I experimented with using an `enum class` and adding a namespace, but we decided it was too verbose so it was dropped.	2025-09-02 09:57:33 -07:00
Sam Tebbs	569d738d4e	[Intrinsics][AArch64] Add intrinsics for masking off aliasing vector lanes (#117007 ) It can be unsafe to load a vector from an address and write a vector to an address if those two addresses have overlapping lanes within a vectorised loop iteration. This PR adds intrinsics designed to create a mask with lanes disabled if they overlap between the two pointer arguments, so that only safe lanes are loaded, operated on and stored. The `loop.dependence.war.mask` intrinsic represents cases where the store occurs after the load, and the opposite for `loop.dependence.raw.mask`. The distinction between write-after-read and read-after-write is important, since the ordering of the read and write operations affects if the chain of those instructions can be done safely. Along with the two pointer parameters, the intrinsics also take an immediate that represents the size in bytes of the vector element types. This will be used by #100579.	2025-09-02 15:35:15 +01:00
daniel-trujillo-bsc	658a931c5b	[CodeGen][RISCV] Add support of RISCV nontemporal to vector predication instructions. (#153033 ) This PR adds support for VP intrinsics to be aware of the nontemporal metadata information.	2025-08-27 15:48:33 -07:00
Matt Arsenault	65d12622fa	RuntimeLibcalls: Add entries for stackprotector globals (#154930 ) Add entries for __stack_chk_guard, __ssp_canary_word, __security_cookie, and __guard_local. As far as I can tell these are all just different names for the same shaped functionality on different systems. These aren't really functions, but special global variable names. They should probably be treated the same way; all the same contexts that need to know about emittable function names also need to know about this. This avoids a special case check in IRSymtab. This isn't a complete change, there's a lot more cleanup which should be done. The stack protector configuration system is a complete mess. There are multiple overlapping controls, used in 3 different places. Some of the target control implementations overlap with conditions used in the emission points, and some use correlated but not identical conditions in different contexts. i.e. useLoadStackGuardNode, getIRStackGuard, getSSPStackGuardCheck and insertSSPDeclarations are all used in inconsistent ways so I don't know if I've tracked the intention of the system correctly. The PowerPC test change is a bug fix on Linux. Previously the manual conditions were based around !isOSOpenBSD, which is not the condition where __stack_chk_guard is used. Now getSDagStackGuard returns the proper global reference, resulting in LOAD_STACK_GUARD getting a MachineMemOperand which allows scheduling.	2025-08-23 10:21:00 +09:00
Nikita Popov	498ef361fe	[CodeGen] Make OrigTy in CC lowering the non-aggregate type (#153414 ) https://github.com/llvm/llvm-project/pull/152709 exposed the original IR argument type to the CC lowering logic. However, in SDAG, this used the raw type, prior to aggregate splitting. This PR changes it to use the non-aggregate type instead. (This matches what happened in the GlobalISel case already.) I've also added some more detailed documentation on the InputArg/OutputArg fields, to explain how they differ. In most cases ArgVT is going to be the EVT of OrigTy, so they encode very similar information (OrigTy just preserves some additional information lost in EVTs, like pointer types). One case where they do differ is in post-legalization lowering of libcalls, where ArgVT is going to be a legalized type, while OrigTy is going to be the original non-legalized type.	2025-08-13 18:42:26 +02:00
Stephen Long	19ada02086	PreISelIntrinsicLowering: Lower llvm.log to a loop if scalable vec arg (#129744 ) Similar to ab976a1, but for llvm.log.	2025-08-12 01:04:28 +09:00
Nikita Popov	e92b7e9641	[CodeGen] Provide original IR type to CC lowering (NFC) (#152709 ) It is common to have ABI requirements for illegal types: For example, two i64 argument parts that originally came from an fp128 argument may have a different call ABI than ones that came from a i128 argument. The current calling convention lowering does not provide access to this information, so backends come up with various hacks to support it (like additional pre-analysis cached in CCState, or bypassing the default logic entirely). This PR adds the original IR type to InputArg/OutputArg and passes it down to CCAssignFn. It is not actually used anywhere yet, this just does the mechanical changes to thread through the new argument.	2025-08-11 08:57:53 +02:00
Alexander Richardson	3a4b351ba1	[IR] Introduce the `ptrtoaddr` instruction This introduces a new `ptrtoaddr` instruction which is similar to `ptrtoint` but has two differences: 1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance 2) `ptrtoaddr` only extracts (and then extends/truncates) the low index-width bits of the pointer For most architectures, difference 2) does not matter since index (address) width and pointer representation width are the same, but this does make a difference for architectures that have pointers that aren't just plain integer addresses such as AMDGPU fat pointers or CHERI capabilities. This commit introduces textual and bitcode IR support as well as basic code generation, but optimization passes do not handle the new instruction yet so it may result in worse code than using ptrtoint. Follow-up changes will update capture tracking, etc. for the new instruction. RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54 Reviewed By: nikic Pull Request: https://github.com/llvm/llvm-project/pull/139357	2025-08-08 10:12:39 -07:00
Kazu Hirata	4be22dabc5	[CodeGen] Remove an unnecessary cast (NFC) (#152441 ) getActiveBits() already returns unsigned.	2025-08-07 07:22:42 -07:00
Nikita Popov	406d9b1dd6	[CodeGen] Move IsFixed into ArgFlags (NFCI) (#152319 ) The information whether a specific argument is vararg or fixed is currently stored separately from all the other argument information in ArgFlags. This means that it is not accessible from CCAssign, and backends have developed all kinds of workarounds for how they can access it after all. Move this information to ArgFlags to make it directly available in all relevant places. I've opted to invert this and store it as IsVarArg, as I think that both makes the meaning more obvious and provides for a better default (which is IsVarArg=false).	2025-08-07 09:12:40 +02:00
Paul Walker	94d374ab6c	[LLVM][CGP] Allow finer control for sinking compares. (#151366 ) Compare sinking is selectable based on the result of hasMultipleConditionRegisters. This function is too coarse grained by not taking into account the differences between scalar and vector compares. This PR extends the interface to take an EVT to allow finer control. The new interface is used by AArch64 to disable sinking of scalable vector compares, but with isProfitableToSinkOperands updated to maintain the cases that are specifically tested.	2025-08-05 11:43:41 +01:00
Abhishek Kaushik	1c0ac80d4a	[DAG] Combine `store + vselect` to `masked_store` (#145176 ) Add a new combine to replace ``` (store ch (vselect cond truevec (load ch ptr offset)) ptr offset) ``` to ``` (mstore ch truevec ptr offset cond) ``` This saves a blend operation on targets that support conditional stores.	2025-08-04 19:05:36 +05:30
jeremyd2019	28b3190053	[LLVM][Cygwin] Enable conditions that are shared with MinGW (#149638 ) Cygwin and MinGW share the auto import behavior that could result in __stack_chk_guard being non-dso-local. Allow windres to assume a Cygwin target as well as a MinGW one, so defines like _WIN32 would not be present on Cygwin.	2025-07-29 10:01:04 -07:00
Nikita Popov	fe0dbe0f29	[CodeGen] More consistently expand float ops by default (#150597 ) These float operations were expanded for scalar f32/f64/f128, but not for f16 and more problematically, not for vectors. A small subset of them was separately set to expand for vectors. Change these to always expand by default, and adjust targets to mark these as legal where necessary instead. This is a much safer default, and avoids unnecessary legalization failures because a target failed to manually mark them as expand. Fixes https://github.com/llvm/llvm-project/issues/110753. Fixes https://github.com/llvm/llvm-project/issues/121390.	2025-07-28 09:46:00 +02:00
Matt Arsenault	f4a394fc0c	SafeStack: Check if __safestack_pointer_address is available (#147917 ) Start using RuntimeLibcalls in the base implementation of getSafeStackPointerLocation instead of hardcoding the function names.	2025-07-15 23:26:52 +09:00
Matt Arsenault	a446300d1b	TargetLowering: Avoid a use of PointerType::getUnqual (#147884 ) Use the default globals address space	2025-07-10 19:00:59 +09:00
Matt Arsenault	dc69b00b0a	RuntimeLibcalls: Remove table of soft float compare cond codes (#146082 ) Previously we had a table of entries for every Libcall for the comparison to use against an integer 0 if it was a soft float compare function. This was only relevant to a handful of opcodes, so it was wasteful. Now that we can distinguish the abstract libcall for the compare with the concrete implementation, we can just directly hardcode the comparison against the libcall impl without this configuration system.	2025-07-09 17:13:58 +09:00
Matt Arsenault	3697d6dd98	DAG: Fall back to separate sin and cos when softening sincos (#147468 ) Fix asserting in the error case.	2025-07-09 01:52:46 +09:00
Dominik Steenken	acdf1c7526	[DAG] Add generic expansion for ISD::FCANONICALIZE nodes (#142105 ) This PR takes the work previously done by @pawan-nirpal-031 on X86 in #106370, and makes it available in common code. This should enable all targets to use `__builtin_canonicalize` for all `f(16|32|64|128)` data types. Canonicalization is implemented here as multiplication by `1.0`, as suggested in [the docs](https://llvm.org/docs/LangRef.html#llvm-canonicalize-intrinsic).	2025-07-08 16:12:17 +01:00
Matt Arsenault	b5401624e1	DAG: Add RTLIB::getPOW helper (#147274 ) Co-authored-by: Paul Walker <paul.walker@arm.com>	2025-07-07 21:31:49 +09:00
Austin	a550fef906	[llvm] Use llvm::fill instead of std::fill(NFC) (#146911 ) Use llvm::fill instead of std::fill	2025-07-04 14:10:28 +08:00
Matt Arsenault	58987d2e34	RuntimeLibcalls: Pass in ABI name from MCOptions (#144894 ) ARM needs this to compute the available libcalls.	2025-06-23 22:14:44 +09:00
Matt Arsenault	1c35fe4e6b	RuntimeLibcalls: Pass in exception handling type (#144696 ) All of the ABI options that influence libcall decisions need to be passed in.	2025-06-19 19:08:52 +09:00
Matt Arsenault	5bee2c34bd	RuntimeLibcalls: Pass in FloatABI and EABI type (#144691 ) We need the full set of ABI options to accurately compute the full set of libcalls. This partially resolves missing information required to compute the set of ARM calls.	2025-06-19 19:02:42 +09:00
Craig Topper	a733c6c7bb	[TargetLowering][RISCV] Allow scalable non-simple EVTs to be split even if the element type isn't a legal scalar type. (#144007 ) This fixes an inconsistency in i64 vector handling between RV32 and RV64. Even if i64 isn't legal as a scalar, we should still be able to split a large i64 vector to get down to a legal vector type. We only need to give up if we need to split a vscale x 1 vector.	2025-06-16 10:04:28 -07:00
Peter Collingbourne	645f0e6723	IR: Make Module::getOrInsertGlobal() return a GlobalVariable. After pointer element types were removed this function can only return a GlobalVariable, so reflect that in the type and comments and clean up callers. Reviewers: nikic Reviewed By: nikic Pull Request: https://github.com/llvm/llvm-project/pull/141323	2025-05-27 12:23:12 -07:00
Nicholas Guy	a1f369e630	[AArch64][SVE] Add dot product lowering for PARTIAL_REDUCE_MLA node (#130933 ) Add lowering in tablegen for PARTIAL_REDUCE_U/SMLA ISD nodes. Only happens when the combine has been performed on the ISD node. Also adds in check to only do the DAG combine when the node can then eventually be lowered, so changes neon tests too. --------- Co-authored-by: James Chesterman <james.chesterman@arm.com>	2025-04-23 13:19:41 +01:00
Reid Kleckner	2538c607e9	[CodeGen] Prune headers and move code out of line for build efficiency, NFC (#135622 ) I noticed these destructors taking time with -ftime-trace and moved some of them for minor build efficiency improvements. The main impact of moving destructors out of line is that it avoids requiring container fields containing other types from being complete, i.e. one can have uptr<T> or vector<T> as a field with an incomplete type T, and that means we can reduce transitive includes, as with LegalizerInfo.h. Move expensive getDebugOperandsForReg template out-of-line. The std::function instantiation shows up in time trace even if you don't use the function.	2025-04-14 22:23:18 -07:00
3405691582	c180e249d0	Fix crash lowering stack guard on OpenBSD/aarch64. (#125416 ) TargetLoweringBase::getIRStackGuard refers to a platform-specific guard variable. Before this change, TargetLoweringBase::getSDagStackGuard only referred to a different variable. This means that SelectionDAGBuilder's getLoadStackGuard does not get memory operands. However, AArch64InstrInfo::expandPostRAPseudo assumes that the passed MachineInstr has nonzero memoperands, causing a segfault. We have two possible options here: either disabling the LOAD_STACK_GUARD node entirely in AArch64TargetLowering::useLoadStackGuardNode or just making the platform-specific values match across TargetLoweringBase. Here, we try the latter.	2025-03-31 09:17:55 -07:00
Jim Lin	49bb51ed91	[RISCV][LibCall] Add libcall for i64 -> bf16 (#130024 ) Add support for lowering i64 -> bf16 with libcall.	2025-03-07 09:23:50 +08:00
James Chesterman	d4a0848dc6	[SelectionDAG] Add PARTIAL_REDUCE_U/SMLA ISD Nodes (#125207 ) Add signed and unsigned PARTIAL_REDUCE_MLA ISD nodes. Add command line argument (aarch64-enable-partial-reduce-nodes) that indicates whether the intrinsic experimental_vector_partial_reduce_add will be transformed into the new ISD node. Lowering with the new ISD nodes will, for now, always be done as an expand.	2025-02-18 09:08:47 +00:00
Benjamin Maxwell	19556eccf6	[RTLIB] Rename getFSINCOS() to getSINCOS (NFC) (#126705 ) This makes the name more consistent with the other helpers.	2025-02-11 11:51:35 +00:00
Benjamin Maxwell	701223ac20	[IR] Add llvm.sincospi intrinsic (#125873 ) This adds the `llvm.sincospi` intrinsic, legalization, and lowering (mostly reusing the lowering for sincos and frexp). The `llvm.sincospi` intrinsic takes a floating-point value and returns both the sine and cosine of the value multiplied by pi. It computes the result more accurately than the naive approach of doing the multiplication ahead of time, especially for large input values. ``` declare { float, float } @llvm.sincospi.f32(float %Val) declare { double, double } @llvm.sincospi.f64(double %Val) declare { x86_fp80, x86_fp80 } @llvm.sincospi.f80(x86_fp80 %Val) declare { fp128, fp128 } @llvm.sincospi.f128(fp128 %Val) declare { ppc_fp128, ppc_fp128 } @llvm.sincospi.ppcf128(ppc_fp128 %Val) declare { <4 x float>, <4 x float> } @llvm.sincospi.v4f32(<4 x float> %Val) ``` Currently, the default lowering of this intrinsic relies on the `sincospi[f|l]` functions being available in the target's runtime (e.g. libc).	2025-02-11 09:01:30 +00:00
Benjamin Maxwell	4bf97aa818	[IR] Add `llvm.modf` intrinsic (#121948 ) This adds the `llvm.modf` intrinsic, legalization, and lowering (mostly reusing the lowering for sincos and frexp). The `llvm.modf` intrinsic takes a floating-point value and returns both the integral and fractional parts (as a struct). ``` declare { float, float } @llvm.modf.f32(float %Val) declare { double, double } @llvm.modf.f64(double %Val) declare { x86_fp80, x86_fp80 } @llvm.modf.f80(x86_fp80 %Val) declare { fp128, fp128 } @llvm.modf.f128(fp128 %Val) declare { ppc_fp128, ppc_fp128 } @llvm.modf.ppcf128(ppc_fp128 %Val) declare { <4 x float>, <4 x float> } @llvm.modf.v4f32(<4 x float> %Val) ``` This corresponds to the libm `modf` function but returns multiple values in a struct (rather than take output pointers), which makes it easier to vectorize.	2025-02-07 09:25:13 +00:00
Stephen Long	ab976a1712	PreISelIntrinsicLowering: Lower llvm.exp/llvm.exp2 to a loop if scalable vec arg (#117568 )	2025-01-24 14:02:06 -05:00
Graham Hunter	d9f165ddea	[SDAG] Add an ISD node to help lower vector.extract.last.active (#118810 ) Based on feedback from the clastb codegen PR, I'm refactoring basic codegen for the vector.extract.last.active intrinsic to lower to an ISD node in SelectionDAGBuilder then expand in LegalizeVectorOps, instead of doing everything in the builder. The new ISD node (vector_find_last_active) only covers finding the index of the last active element of the mask, and extracting the element + handling passthru is left to existing ISD nodes.	2025-01-20 12:57:05 +00:00
Craig Topper	0d9fc17433	[GISel] Remove unused DataLayout operand from getApproximateEVTForLLT (#119833 )	2024-12-13 09:09:20 -08:00
Sergei Barannikov	e55c167777	[TargetLowering] Return Align from getByValTypeAlignment (NFC) (#119233 )	2024-12-09 23:39:19 +03:00
Feng Zou	28e4aad45a	[X86][BF16] Add libcall for FP128 -> BF16 (#115825 ) This is to fix #115710.	2024-11-12 15:54:09 +08:00
Matt Arsenault	ea859005b5	SafeStack: Respect alloca addrspace (#112536 ) Just insert addrspacecast in cases where the alloca uses a different address space, since I don't know what else you could possibly do.	2024-11-04 17:51:30 -08:00
Benjamin Maxwell	c3260c65e8	[IR] Add `llvm.sincos` intrinsic (#109825 ) This adds the `llvm.sincos` intrinsic, legalization, and lowering. The `llvm.sincos` intrinsic takes a floating-point value and returns both the sine and cosine (as a struct). ``` declare { float, float } @llvm.sincos.f32(float %Val) declare { double, double } @llvm.sincos.f64(double %Val) declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80 %Val) declare { fp128, fp128 } @llvm.sincos.f128(fp128 %Val) declare { ppc_fp128, ppc_fp128 } @llvm.sincos.ppcf128(ppc_fp128 %Val) declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %Val) ``` The lowering is built on top of the existing FSINCOS ISD node, with additional type legalization to allow for f16, f128, and vector values.	2024-10-29 10:52:20 +00:00
Ellis Hoag	6ab26eab4f	Check hasOptSize() in shouldOptimizeForSize() (#112626 )	2024-10-28 09:45:03 -07:00
Tex Riddell	875afa939d	[X86][CodeGen] Add base atan2 intrinsic lowering (p4) (#110760 ) This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 Based on example PR #96222 and fix PR #101268, with some differences due to 2-arg intrinsic and intermediate refactor (RuntimeLibCalls.cpp). - Add llvm.experimental.constrained.atan2 - Intrinsics.td, ConstrainedOps.def, LangRef.rst - Add to ISDOpcodes.h and TargetSelectionDAG.td, connect to intrinsic in BasicTTIImpl.h, and LibFunc_ in SelectionDAGBuilder.cpp - Update LegalizeDAG.cpp, LegalizeFloatTypes.cpp, LegalizeVectorOps.cpp, and LegalizeVectorTypes.cpp - Update isKnownNeverNaN in SelectionDAG.cpp - Update SelectionDAGDumper.cpp - Update libcalls - RuntimeLibcalls.def, RuntimeLibcalls.cpp - TargetLoweringBase.cpp - Expand for vectors, promote f16 - X86ISelLowering.cpp - Expand f80, promote f32 to f64 for MSVC Part 4 for Implement the atan2 HLSL Function #70096.	2024-10-16 11:43:17 -07:00
Benjamin Maxwell	3073c3c229	[SDAG] Avoid creating redundant stack slots when lowering FSINCOS (#108401 ) When lowering `FSINCOS` to a library call (that takes output pointers) we can avoid creating new stack allocations if the results of the `FSINCOS` are being stored. Instead, we can take the destination pointers from the stores and pass those to the library call. --- Note: As a NFC this also adds (and uses) `RTLIB::getFSINCOS()`.	2024-09-24 13:36:21 +01:00
Phoebe Wang	c18be32185	Reland "[X86][BF16] Add libcall for F80 -> BF16 (#109116 )" (#109143 ) This reverts commit ababfee78714313a0cad87591b819f0944b90d09. Add X86 FP80 check.	2024-09-19 15:39:07 +08:00
Phoebe Wang	a10c9f994b	Revert "[X86][BF16] Add libcall for F80 -> BF16" (#109140 ) Reverts llvm/llvm-project#109116	2024-09-18 21:35:38 +08:00
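
The `store + vselect` to `masked_store` combine described in commit 1c0ac80d4a above can be illustrated with a scalar model. This is a minimal sketch, not LLVM code: the function names `storeOfSelect`, `maskedStore`, and `sameResult` are hypothetical, and vector lanes are modelled as `std::vector` elements. It only shows why the two forms leave memory in the same state, assuming the load and the store use the same address.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before the combine: load the old contents, blend per lane, store every lane.
void storeOfSelect(std::vector<int> &mem, const std::vector<bool> &cond,
                   const std::vector<int> &trueVec) {
  assert(mem.size() == cond.size() && mem.size() == trueVec.size());
  std::vector<int> loaded = mem;                  // (load ch ptr offset)
  for (std::size_t i = 0; i < mem.size(); ++i)    // (vselect cond truevec loaded)
    mem[i] = cond[i] ? trueVec[i] : loaded[i];    // (store ch ... ptr offset)
}

// After the combine: write only the lanes whose mask bit is set.
void maskedStore(std::vector<int> &mem, const std::vector<bool> &cond,
                 const std::vector<int> &trueVec) {
  assert(mem.size() == cond.size() && mem.size() == trueVec.size());
  for (std::size_t i = 0; i < mem.size(); ++i)    // (mstore ch truevec ptr offset cond)
    if (cond[i])
      mem[i] = trueVec[i];
}

// Returns true when both forms produce identical memory contents.
bool sameResult(const std::vector<int> &initial, const std::vector<bool> &cond,
                const std::vector<int> &trueVec) {
  std::vector<int> a = initial, b = initial;
  storeOfSelect(a, cond, trueVec);
  maskedStore(b, cond, trueVec);
  return a == b;
}
```

In this model the masked form skips both the load and the per-lane blend, which corresponds to the blend operation the commit message says is saved on targets with conditional stores.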

538 Commits