llvm-project

Author	SHA1	Message	Date
Peter Collingbourne	191af6c254	Add llvm.cond.loop intrinsic. The llvm.cond.loop intrinsic is semantically equivalent to a conditional branch conditioned on ``pred`` to a basic block consisting only of an unconditional branch to itself. Unlike such a branch, it is guaranteed to use specific instructions. This allows an interrupt handler or other introspection mechanism to straightforwardly detect whether the program is currently spinning in the infinite loop and possibly terminate the program if so. The intent is that this intrinsic may be used as a more efficient alternative to a conditional branch to a call to ``llvm.trap`` in circumstances where the loop detection is guaranteed to be present. This construct has been experimentally determined to be executed more efficiently (when the branch is not taken) than a conditional branch to a trap instruction on AMD and older Intel microarchitectures, and is also more code size efficient by avoiding the need to emit a trap instruction and possibly a long branch instruction. On i386 and x86_64, the infinite loop is guaranteed to consist of a short conditional branch instruction that branches to itself. Specifically, the first byte of the instruction will be between 0x70 and 0x7F, and the second byte will be 0xFE. Part of this RFC: https://discourse.llvm.org/t/rfc-optimizing-conditional-traps/89456 Reviewers: arsenm, RKSimon, fmayer, vitalybuka Pull Request: https://github.com/llvm/llvm-project/pull/177686	2026-02-06 17:11:15 -08:00
Stanislav Mekhanoshin	ba8df39898	Add SDNodeFlag::NoConvergent (#179323 )	2026-02-04 10:21:45 -08:00
Nicolai Hähnle	af836ff60c	[CodeGen] Add getTgtMemIntrinsic overload for multiple memory operands (NFC) (#175843 ) There are target intrinsics that logically require two MMOs, such as llvm.amdgcn.global.load.lds, which is a copy from global memory to LDS, so there's both a load and a store to different addresses. Add an overload of getTgtMemIntrinsic that produces intrinsic info in a vector, and implement it in terms of the existing (now protected) overload. GlobalISel and SelectionDAG paths are updated to support multiple MMOs. The main part of this change is supporting multiple MMOs in MemIntrinsicNodes. Converting the backends to using the new overload is a fairly mechanical step that is done in a separate change in the hope that that allows reducing merging pains during review and for downstreams. A later change will then enable using multiple MMOs in AMDGPU.	2026-02-02 21:58:42 +00:00
Peter Collingbourne	1a80fd6d8c	SelectionDAG: Add -print-sdnode-addrs flag. Similar to -print-inst-addrs and -print-mi-addrs.	2026-01-29 16:22:42 -08:00
moorabbit	a5fa246435	[Clang] Add `__builtin_stack_address` (#148281 ) Add support for `__builtin_stack_address` builtin. The semantics match those of GCC's builtin with the same name. `__builtin_stack_address` returns the starting address of the stack region that may be used by called functions. It may or may not include the space used for on-stack arguments passed to a callee (See [GCC Bug/121013](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121013)). Fixes #82632.	2026-01-12 10:01:57 +01:00
Luke Lau	ad4bfac732	[IR] Split vector.splice into vector.splice.left and vector.splice.right (#170796 ) This PR implements the first change outlined in https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974?u=lukel In order to allow non-immediate offsets in the llvm.vector.splice intrinsic, we need to separate out the "shift left" and "shift right" modes into two separate intrinsics, which were previously determined by whether or not the offset is positive or negative. The description in the LangRef has also been reworded in terms of sliding elements left or right and extracting either the upper or lower half as opposed to extracting from a certain index, which brings it inline with the definition of `llvm.fshr.`/`llvm.fshl.`. This patch teaches AutoUpgrade.cpp to upgrade the old intrinsics into their new equivalent one based on their offset, so existing uses of vector.splice should still work. Uses of llvm.vector.splice in `llvm/test/CodeGen` haven't been replaced in this PR to keep the diff small and kick the tyres on the AutoUpgrader a bit. I planned to do this in a follow up NFC but can include it in this PR if reviewers prefer. Similarly the shuffle costing kind `SK_Splice` has just been kept the same for now, to be split into `SK_SpliceLeft` and `SK_SpliceRight` later.	2026-01-06 15:41:26 +08:00
Ramkumar Ramachandra	9e5e267a03	[ISel] Introduce llvm.clmul intrinsic (#168731 ) In line with a std proposal to introduce the llvm.clmul family of intrinsics corresponding to carry-less multiply operations. This work builds upon 727ee7e ([APInt] Introduce carry-less multiply primitives), and follow-up patches will introduce custom-lowering on supported targets, replacing target-specific clmul intrinsics. Testing is done on the RISC-V target, which should be sufficient to prove that the intrinsics work, since no RISC-V specific lowering has been added. Ref: https://isocpp.org/files/papers/P3642R3.html Co-authored-by: Craig Topper <craig.topper@sifive.com>	2026-01-05 20:24:06 +00:00
Craig Topper	1b43f5cec6	[RISCV][SelectionDAG] Add a ISD::CTLS node for count leading redundant sign bits. Use it to select CLS(W). (#173417 ) The RISC-V P extension adds an instruction equivalent to __builtin_clrsb. AArch64 has a similar instruction that we currently fail to select when using the builtin. This patch adds a combine based on the canonical version of the pattern emitted by clang for the builtin, (add (ctlz (xor x, (sra x, bw-1)))), -1). I'm starting the combine at the ctlz because the outer add can easily be combined into other nodes obscuring the full pattern. So we generate (add (ctls x), 1) and hope the add will be combined away. I've also added a combine for the pattern AArch64 recognizes (ctlz_zero_undef (or (shl (xor x, (sra x, bw-1)), 1), 1)). I've only enabled the combines when the target has a Legal or Custom action for the operation, taking into account type promotion. We can relax this in the future by adding a default expansion to LegalizeDAG and adding more type legalization rules.	2026-01-04 18:00:00 -08:00
Damian Heaton	70f4b596cf	Add `llvm.vector.partial.reduce.fadd` intrinsic (#159776 ) With this intrinsic, and supporting SelectionDAG nodes, we can better make use of instructions such as AArch64's `FDOT`.	2025-11-07 15:36:54 +00:00
Daniel Thornburgh	5f08fb4d72	[IR] llvm.reloc.none intrinsic for no-op symbol references (#147427 ) This intrinsic emits a BFD_RELOC_NONE relocation at the point of call, which allows optimizations and languages to explicitly pull in symbols from static libraries without there being any code or data that has an effectual relocation against such a symbol. See issue #146159 for context.	2025-11-06 08:52:46 -08:00
Fabian Ritter	a3ea51e4f1	[SDAG] Introduce inbounds flag for ISD::PTRADD (#162477 ) This patch introduces SDNodeFlags::InBounds, to show that an ISD::PTRADD SDNode implements an inbounds getelementptr operation (i.e., the pointer operand is in bounds wrt. an allocated object it is based on, and the arithmetic does not change that). The flag is set in the DAG construction when lowering inbounds GEPs. Inbounds information is useful in the ISel when selecting memory instructions that perform address computations whose intermediate steps must be in the same memory region as the final result. Follow-up patches to propagate the flag in DAGCombines and to use it when lowering AMDGPU's flat memory instructions, where the immediate offset must not affect the memory aperture of the address (similar to this GISel patch: #153001), are planned. This mirrors #150900, which has introduced a similar flag in GlobalISel. This patch supersedes #131862, which previously attempted to introduce an SDNodeFlags::InBounds flag. The difference between this PR and #131862 is that there is now an ISD::PTRADD opcode (PR #140017) and the InBounds flag is only defined to apply to ISD::PTRADD DAG nodes. It is therefore unambiguous that in-bounds-ness refers to a memory object into which the left operand of the PTRADD node points (in contrast to #131862, where InBounds would have applied to commutative ISD::ADD nodes, so that the semantics would be more difficult to reason about). For SWDEV-516125.	2025-10-23 09:35:33 +02:00
Sam Parker	1820102167	Wasm fmuladd relaxed (#163177 ) Reland #161355, after fixing up the cross-projects-tests for the wasm simd intrinsics. Original commit message: Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions. If we have FP16, then lower v8f16 fmuladds to FMA. I've introduced an ISD node for fmuladd to maintain the rounding ambiguity through legalization / combine / isel.	2025-10-13 16:50:53 +01:00
Sam Parker	30d3441cf0	Revert "[WebAssembly] Lower fmuladd to madd and nmadd" (#163171 ) Reverts llvm/llvm-project#161355 Looks like I've broken some intrinsic code generation.	2025-10-13 11:53:40 +01:00
Sam Parker	a4eb7ea225	[WebAssembly] Lower fmuladd to madd and nmadd (#161355 ) Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions. If we have FP16, then lower v8f16 fmuladds to FMA. I've introduced an ISD node for fmuladd to maintain the rounding ambiguity through legalization / combine / isel.	2025-10-13 10:36:08 +01:00
Min-Yih Hsu	c2c2e4ec90	[SelectionDAG] Add support to dump DAGs with sorted nodes (#161097 ) An alternative approach to #149732 , which sorts the DAG before dumping it. That approach runs a risk of altering the codegen result as we don't know if any of the downstream DAG users relies on the node ID, which was updated as part of the sorting. The new method proposed by this PR does not update the node ID or any of the DAG's internal states: the newly added `SelectionDAG::getTopologicallyOrderedNodes` is a const member function that returns a list of all nodes in their topological order.	2025-10-03 16:18:21 -07:00
Sam Tebbs	569d738d4e	[Intrinsics][AArch64] Add intrinsics for masking off aliasing vector lanes (#117007 ) It can be unsafe to load a vector from an address and write a vector to an address if those two addresses have overlapping lanes within a vectorised loop iteration. This PR adds intrinsics designed to create a mask with lanes disabled if they overlap between the two pointer arguments, so that only safe lanes are loaded, operated on and stored. The `loop.dependence.war.mask` intrinsic represents cases where the store occurs after the load, and the opposite for `loop.dependence.raw.mask`. The distinction between write-after-read and read-after-write is important, since the ordering of the read and write operations affects if the chain of those instructions can be done safely. Along with the two pointer parameters, the intrinsics also take an immediate that represents the size in bytes of the vector element types. This will be used by #100579.	2025-09-02 15:35:15 +01:00
Nikita Popov	ab1f6ce482	[IR][SDAG] Remove lifetime size handling from SDAG (#150944 ) Split out from https://github.com/llvm/llvm-project/pull/150248: Specify that the argument of lifetime.start/lifetime.end is ignored and will be removed in the future. Remove lifetime size handling from SDAG. The size was previously discarded during isel, so was always ignored for stack coloring anyway. Where necessary, obtain the size of the full frame index.	2025-07-29 09:53:59 +02:00
Nikita Popov	a7a1df8f72	[CodeGen] Remove handling for lifetime.start/end on non-alloca (#149838 ) After https://github.com/llvm/llvm-project/pull/149310 we are guaranteed that the argument is an alloca, so we don't need to look at underlying objects (which was not a correct thing to do anyway). This also drops the offset argument for lifetime nodes in SDAG. The offset is fixed to zero now. (Peculiarly, while SDAG pretended to have an offset, it just gets silently dropped during selection.)	2025-07-22 09:44:59 +02:00
Philip Reames	939666380f	[SDAG] Add partial_reduce_sumla node (#141267 ) We have recently added the partial_reduce_smla and partial_reduce_umla nodes to represent Acc += ext(b) * ext(b) where the two extends have to have the same source type, and have the same extend kind. For riscv64 w/zvqdotq, we have the vqdot and vqdotu instructions which correspond to the existing nodes, but we also have vqdotsu which represents the case where the two extends are sign and zero respective (i.e. not the same type of extend). This patch adds a partial_reduce_sumla node which has sign extension for A, and zero extension for B. The addition is somewhat mechanical.	2025-06-09 07:17:45 -07:00
Fabian Ritter	8adcc8a669	[SelectionDAG] Introduce ISD::PTRADD (#140017 ) This opcode represents the addition of a pointer value (first operand) and an integer offset (second operand). PTRADD nodes are only generated if the TargetMachine opts in by overriding TargetMachine::shouldPreservePtrArith(). The PTRADD node and respective visitPTRADD() function were adapted by @rgwott from the CHERI/Morello LLVM tree. Original authors: @davidchisnall, @jrtc27, @arichardson. The changes in this PR were extracted from PR #105669. --------- Co-authored-by: David Chisnall <github@theravensnest.org> Co-authored-by: Jessica Clarke <jrtc27@jrtc27.com> Co-authored-by: Alexander Richardson <alexrichardson@google.com> Co-authored-by: Rodolfo Wottrich <rodolfo.wottrich@arm.com>	2025-05-28 09:09:17 +02:00
Jon Roelofs	714096c132	[LLVM] Skip dumping inline SDag children (#141359 ) If they're simple enough to render inline, we don't need to dump them again in the recursive walk.	2025-05-26 19:40:01 -07:00
Jon Roelofs	346a72f2ca	[LLVM] Add color to SDNode ID's when dumping (#141295 ) This is especially helpful for the recursive 'Cannot select:' dumps, where colors help distinguish nodes at a quick glance.	2025-05-24 09:40:29 -07:00
Kerry McLaughlin	0bc3993716	[SelectionDAG] Add an ISD node for for get.active.lane.mask (#139084 ) For now expansion still happens in SelectionDAGBuilder when GET_ACTIVE_LANE_MASK is not legal on the target. This patch also includes changes in AArch64ISelLowering to replace handling of the get.active.lane.mask intrinsic to use the ISD node. Tablegen patterns are added which match to whilelo for scalable types. A follow up change will add support for more types to be lowered to GET_ACTIVE_LANE_MASK by allowing splitting of the node.	2025-05-15 09:14:46 +01:00
YunQiang Su	780054d3ff	CodeGen: Add ISD::AssertNoFPClass (#138839 ) It is used to mark a value that we are sure that it is not some fcType. The examples include: * An arguments of a function is marked with nofpclass * Output value of an intrinsic can be sure to not be some type So that the following operation can make some assumptions.	2025-05-15 16:05:15 +08:00
Jonathan Thackray	6e49f73825	Reland [llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions (#137701 ) This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum` instructions. These mirror the `llvm.maximum.` and `llvm.minimum.` instructions, but are atomic and use IEEE754 2019 handling for NaNs, which is different to `fmax` and `fmin`. See: https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic for more details. Future changes will allow this LLVM IR to be lowered to specialised assembler instructions on suitable targets, such as AArch64.	2025-04-30 22:06:37 +01:00
YunQiang Su	db859db74d	Revert "CodeGen: Add ISD::AssertNoFPClass (#135946 )" This reverts commit f0c61d2242bbc7576ca5e4137a5ea8f63e4859a9.	2025-04-30 16:16:26 +08:00
Jonathan Thackray	7ee0097b48	Revert "[llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions" (#137657 ) Reverts llvm/llvm-project#136759 due to bad interaction with c792b25e4	2025-04-28 16:53:36 +01:00
Jonathan Thackray	ba420d8122	[llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions (#136759 ) This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum` instructions. These mirror the `llvm.maximum.` and `llvm.minimum.` instructions, but are atomic and use IEEE754 2019 handling for NaNs, which is different to `fmax` and `fmin`. See: https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic for more details. Future changes will allow this LLVM IR to be lowered to specialised assembler instructions on suitable targets, such as AArch64.	2025-04-28 15:31:44 +01:00
YunQiang Su	f0c61d2242	CodeGen: Add ISD::AssertNoFPClass (#135946 ) It is used to mark a value that we are sure that it is not some fcType. The examples include: * An arguments of a function is marked with nofpclass * Output value of an intrinsic can be sure to not be some type So that the following operation can make some assumptions. --------- Co-authored-by: Your Name <you@example.com>	2025-04-25 09:12:41 +08:00
zhijian lin	378ac572ac	Reland "[SelectionDAG] Introducing a new ISD::POISON SDNode to represent the poison value in the IR." (#135056 ) A new ISD::POISON SDNode is introduced to represent the poison value in the IR, replacing the previous use of ISD::UNDEF	2025-04-10 11:29:14 -04:00
Jakub Kuderski	ef1088f703	Revert "[SelectionDAG] Introducing a new ISD::POISON SDNode to represent the poison value in the IR." (#135060 ) Reverts llvm/llvm-project#125883 This PR causes crashes in RISC-V codegen around f16/f64 poison values: https://github.com/llvm/llvm-project/pull/125883#issuecomment-2787048206	2025-04-09 14:40:56 -04:00
zhijian lin	8fddef8483	[SelectionDAG] Introducing a new ISD::POISON SDNode to represent the poison value in the IR. (#125883 ) A new ISD::POISON SDNode is introduced to represent the `poison value` in the IR, replacing the previous use of ISD::UNDEF.	2025-04-07 10:03:05 -04:00
Sergei Barannikov	0a1742708d	[SelectionDAG] Wire up -gen-sdnode-info TableGen backend (#125358 ) This patch introduces SelectionDAGGenTargetInfo and SDNodeInfo classes, which provide methods for accessing the generated SDNode descriptions. Pull Request: https://github.com/llvm/llvm-project/pull/125358 Draft PR: https://github.com/llvm/llvm-project/pull/119709 RFC: https://discourse.llvm.org/t/rfc-tablegen-erating-sdnode-descriptions	2025-04-06 13:14:37 +03:00
Craig Topper	7bd2be4266	[SelectionDAG] Use Register and MCRegister. NFC Add operators to Register to supporting adding an offset to get another Register.	2025-03-02 22:33:25 -08:00
James Chesterman	d4a0848dc6	[SelectionDAG] Add PARTIAL_REDUCE_U/SMLA ISD Nodes (#125207 ) Add signed and unsigned PARTIAL_REDUCE_MLA ISD nodes. Add command line argument (aarch64-enable-partial-reduce-nodes) that indicates whether the intrinsic experimental_vector_partial_ reduce_add will be transformed into the new ISD node. Lowering with the new ISD nodes will, for now, always be done as an expand.	2025-02-18 09:08:47 +00:00
Benjamin Maxwell	701223ac20	[IR] Add llvm.sincospi intrinsic (#125873 ) This adds the `llvm.sincospi` intrinsic, legalization, and lowering (mostly reusing the lowering for sincos and frexp). The `llvm.sincospi` intrinsic takes a floating-point value and returns both the sine and cosine of the value multiplied by pi. It computes the result more accurately than the naive approach of doing the multiplication ahead of time, especially for large input values. ``` declare { float, float } @llvm.sincospi.f32(float %Val) declare { double, double } @llvm.sincospi.f64(double %Val) declare { x86_fp80, x86_fp80 } @llvm.sincospi.f80(x86_fp80 %Val) declare { fp128, fp128 } @llvm.sincospi.f128(fp128 %Val) declare { ppc_fp128, ppc_fp128 } @llvm.sincospi.ppcf128(ppc_fp128 %Val) declare { <4 x float>, <4 x float> } @llvm.sincospi.v4f32(<4 x float> %Val) ``` Currently, the default lowering of this intrinsic relies on the `sincospi[f\|l]` functions being available in the target's runtime (e.g. libc).	2025-02-11 09:01:30 +00:00
Rahul Joshi	0f674cce82	[NFC][LLVM] Remove unused `TargetIntrinsicInfo` class (#126003 ) Remove `TargetIntrinsicInfo` class as its practically unused (its pure virtual with no subclasses) and its references in the code.	2025-02-10 14:56:30 -08:00
Benjamin Maxwell	4bf97aa818	[IR] Add `llvm.modf` intrinsic (#121948 ) This adds the `llvm.modf` intrinsic, legalization, and lowering (mostly reusing the lowering for sincos and frexp). The `llvm.modf` intrinsic takes a floating-point value and returns both the integral and fractional parts (as a struct). ``` declare { float, float } @llvm.modf.f32(float %Val) declare { double, double } @llvm.modf.f64(double %Val) declare { x86_fp80, x86_fp80 } @llvm.modf.f80(x86_fp80 %Val) declare { fp128, fp128 } @llvm.modf.f128(fp128 %Val) declare { ppc_fp128, ppc_fp128 } @llvm.modf.ppcf128(ppc_fp128 %Val) declare { <4 x float>, <4 x float> } @llvm.modf.v4f32(<4 x float> %Val) ``` This corresponds to the libm `modf` function but returns multiple values in a struct (rather than take output pointers), which makes it easier to vectorize.	2025-02-07 09:25:13 +00:00
Graham Hunter	d9f165ddea	[SDAG] Add an ISD node to help lower vector.extract.last.active (#118810 ) Based on feedback from the clastb codegen PR, I'm refactoring basic codegen for the vector.extract.last.active intrinsic to lower to an ISD node in SelectionDAGBuilder then expand in LegalizeVectorOps, instead of doing everything in the builder. The new ISD node (vector_find_last_active) only covers finding the index of the last active element of the mask, and extracting the element + handling passthru is left to existing ISD nodes.	2025-01-20 12:57:05 +00:00
Antonio Frighetto	8a2113c5c5	Reapply "[SelectionDAG] Add preliminary plumbing for `samesign` flag" Original commit: 19c8475871faee5ebb06281034872a85a2552675 Multiple 2-stage sanitizer buildbots were reporting failures, the issue has been addressed separately in 29246a92aee87e86cbc2bb64ee520d7385644f34.	2024-11-02 14:19:58 +01:00
Vitaly Buka	c5eb591257	Revert "[SelectionDAG] Add preliminary plumbing for `samesign` flag" (#114647 ) Crashes on ARM builds https://lab.llvm.org/buildbot/#/builders/85/builds/2548 ``` DAGCombiner.cpp:16157: SDValue (anonymous namespace)::DAGCombiner::visitFREEZE(SDNode *): Assertion `DAG.isGuaranteedNotToBeUndefOrPoison(R, false) && "Can't create node that may be undef/poison!"' failed. ``` Issue #114648 This reverts commit 19c8475871faee5ebb06281034872a85a2552675.	2024-11-02 01:35:13 -07:00
Antonio Frighetto	19c8475871	[SelectionDAG] Add preliminary plumbing for `samesign` flag Extend recently-added poison-generating IR flag to codegen as well.	2024-10-31 19:47:50 +01:00
Tex Riddell	875afa939d	[X86][CodeGen] Add base atan2 intrinsic lowering (p4) (#110760 ) This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 Based on example PR #96222 and fix PR #101268, with some differences due to 2-arg intrinsic and intermediate refactor (RuntimeLibCalls.cpp). - Add llvm.experimental.constrained.atan2 - Intrinsics.td, ConstrainedOps.def, LangRef.rst - Add to ISDOpcodes.h and TargetSelectionDAG.td, connect to intrinsic in BasicTTIImpl.h, and LibFunc_ in SelectionDAGBuilder.cpp - Update LegalizeDAG.cpp, LegalizeFloatTypes.cpp, LegalizeVectorOps.cpp, and LegalizeVectorTypes.cpp - Update isKnownNeverNaN in SelectionDAG.cpp - Update SelectionDAGDumper.cpp - Update libcalls - RuntimeLibcalls.def, RuntimeLibcalls.cpp - TargetLoweringBase.cpp - Expand for vectors, promote f16 - X86ISelLowering.cpp - Expand f80, promote f32 to f64 for MSVC Part 4 for Implement the atan2 HLSL Function #70096.	2024-10-16 11:43:17 -07:00
Matt Arsenault	9578db9c11	DAG: Handle atomic fsub in node dumper	2024-09-13 10:22:27 +04:00
anjenner	4af249fe6e	Add usub_cond and usub_sat operations to atomicrmw (#105568 ) These both perform conditional subtraction, returning the minuend and zero respectively, if the difference is negative.	2024-09-06 16:19:20 +01:00
Stephen Tozer	3d08ade7bd	[ExtendLifetimes] Implement llvm.fake.use to extend variable lifetimes (#86149 ) This patch is part of a set of patches that add an `-fextend-lifetimes` flag to clang, which extends the lifetimes of local variables and parameters for improved debuggability. In addition to that flag, the patch series adds a pragma to selectively disable `-fextend-lifetimes`, and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes` for this pointers only. All changes and tests in these patches were written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer) has handled review and merging. The extend lifetimes flag is intended to eventually be set on by `-Og`, as discussed in the RFC here: https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850 This patch implements a new intrinsic instruction in LLVM, `llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand and has no effect other than "using" its operand, to ensure that its operand remains live until after the fake use. This patch does not emit fake uses anywhere; the next patch in this sequence causes them to be emitted from the clang frontend, such that for each variable (or this) a fake.use operand is inserted at the end of that variable's scope, using that variable's value. This patch covers everything post-frontend, which is largely just the basic plumbing for a new intrinsic/instruction, along with a few steps to preserve the fake uses through optimizations (such as moving them ahead of a tail call or translating them through SROA). Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>	2024-08-29 17:53:32 +01:00
YunQiang Su	fb9e685fc4	Intrinsic: introduce minimumnum and maximumnum for IR and SelectionDAG (#96649 ) C23 introduced new functions fminimum_num and fmaximum_num, and they follow the minimumNumber and maximumNumber of IEEE754-2019. Let's introduce new intrinsics to support them. This patch introduces support only support for scalar values. The support of vector (vp, vp.reduce, vector.reduce), experimental.constrained will be added in future patches. With this patch, MIPSr6 and LoongArch can work out of box with fcanonical and fmax/fmin. Aarch64/PowerPC64 can use the same login as MIPSr6 and LoongArch, while they have no fcanonical support yet. I will add it in future patches. The FMIN/FMAX of RISC-V instructions follows the minimumNumber/maximumNumber of IEEE754-2019. We can just add it in future patch. Background https://discourse.llvm.org/t/rfc-fix-llvm-min-f-and-llvm-max-f-intrinsics/79735 Currently we have fminnum/fmaxnum, which have different behavior on different platform for NUM vs sNaN: 1) Fallback to fmin(3)/fmax(3): return qNaN. 2) ARM64/ARM32+Neon: same as libc. 3) MIPSr6/LoongArch/RISC-V: return NUM. And the fix of fminnum/fmaxnum to follow minNUM/maxNUM of IEEE754-2008 will submit as separated patches.	2024-08-15 14:09:36 +08:00
hanbeom	0d074ba197	[DAG] Support saturated truncate (#99418 ) A truncate is considered saturated if no additional conversion is required between the target and return values. If the target is saturated when attempting to truncate from a vector, there is an opportunity to optimize it. Previously, each architecture had its own attempt at optimization, leading to redundant code. This patch implements common logic by introducing three new ISDs: `ISD::TRUNCATE_SSAT_S`: When the operand is a signed value and the range of values matches the range of signed values of the destination type. `ISD::TRUNCATE_SSAT_U`: When the operand is a signed value and the range of values matches the range of unsigned values of the destination type. `ISD::TRUNCATE_USAT_U`: When the operand is an unsigned value and the range of values matches the range of unsigned values of the destination type. These ISDs indicate a saturated truncate. Fixes https://github.com/llvm/llvm-project/issues/85903	2024-08-14 09:52:36 +01:00
Lawrence Benson	177ce1900f	[LLVM] Add `llvm.experimental.vector.compress` intrinsic (#92289 ) This PR adds a new vector intrinsic `@llvm.experimental.vector.compress` to "compress" data within a vector based on a selection mask, i.e., it moves all selected values (i.e., where `mask[i] == 1`) to consecutive lanes in the result vector. A `passthru` vector can be provided, from which remaining lanes are filled. The main reason for this is that the existing `@llvm.masked.compressstore` has very strong constraints in that it can only write values that were selected, resulting in guard branches for all targets except AVX-512 (and even there the AMD implementation is _very_ slow). More instruction sets support "compress" logic, but only within registers. So to store the values, an additional store is needed. But this combination is likely significantly faster on many target as it avoids branches. In follow up PRs, my plan is to add target-specific lowerings for x86, SVE, and possibly RISCV. I also want to combine this with a store instruction, as this is probably a common case and we can avoid some memory writes in that case. See [discussion in forum](https://discourse.llvm.org/t/new-intrinsic-for-masked-vector-compress-without-store/78663) for initial discussion on the design.	2024-07-17 14:24:24 +02:00
Farzon Lotfi	0b58f34c98	[X86][CodeGen] Add base trig intrinsic lowerings (#96222 ) This change is an implementation of https://github.com/llvm/llvm-project/issues/87367's investigation on supporting IEEE math operations as intrinsics. Which was discussed in this RFC: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 This change adds constraint intrinsics and some lowering cases for `acos`, `asin`, `atan`, `cosh`, `sinh`, and `tanh`. The only x86 specific change was for f80. https://github.com/llvm/llvm-project/issues/70079 https://github.com/llvm/llvm-project/issues/70080 https://github.com/llvm/llvm-project/issues/70081 https://github.com/llvm/llvm-project/issues/70083 https://github.com/llvm/llvm-project/issues/70084 https://github.com/llvm/llvm-project/issues/95966 The x86 lowering is going to be done in three pr changes with this being the first. A second PR will be put up for Loop Vectorizing and then SLPVectorizer. The constraint intrinsics is also going to be in multiple parts, but just 2. This part covers just the llvm specific changes, part2 will cover clang specifc changes and legalization for backends than have special legalization requirements like aarch64 and wasm.	2024-07-11 15:58:43 -04:00

1 2 3 4 5 ...

297 Commits