Currently, Flang can generate no-loop kernels for all OpenMP target
kernels in the program if the flags
-fopenmp-assume-teams-oversubscription or
-fopenmp-assume-threads-oversubscription are set.
If we add an additional parameter, we can choose
in the future which OpenMP kernels should be generated as no-loop
kernels.
This PR doesn't modify the current behavior of the oversubscription
flags.
RFC for no-loop kernels:
https://discourse.llvm.org/t/rfc-no-loop-mode-for-openmp-gpu-kernels/87517
ActionCache is used to store a mapping from CASID to CASID. The current
implementation of the ActionCache can only associate keys and values
from the same hash context.
ActionCache has two operations: `put` to store the key/value mapping
and `get` to look it up. ActionCache uses the same TrieRawHashMap data
structure to store the mapping, where the CASID of the key is the hash
used to index the map.
While the CASIDs for key/value are often associated with an actual CAS
ObjectStore, the ActionCache doesn't guarantee that such an object
exists in any ObjectStore.
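A minimal sketch of that interface, with signatures assumed for
illustration rather than copied from the actual API:
```c++
// Hypothetical shape of the two operations; the exact upstream
// signatures may differ.
class ActionCache {
public:
  // Associate Result with Key. Both CASIDs must come from the same
  // hash context.
  Error put(const CASID &Key, const CASID &Result);

  // Look up the value cached for Key, if one was stored.
  Expected<std::optional<CASID>> get(const CASID &Key);
};
```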
The previous implementation of getExactInverse used the following check
to identify powers of two:
```c++
// Check that the number is a power of two by making sure that only the
// integer bit is set in the significand.
if (significandLSB() != semantics->precision - 1)
  return false;
```
This condition verifies that the only set bit in the significand is the
integer bit, which is correct for normal numbers. However, this logic is
not correct for subnormal values.
APFloat represents subnormal numbers by shifting the significand right
while holding the exponent at its minimum value. For a power of two in
the subnormal range, the single set bit therefore sits at a position
lower than precision - 1; for IEEE double (precision 53), for example,
the subnormal 0x1p-1023 has its only set bit at position 51. The
original check would consequently fail, causing the function to
determine that these numbers do not have an exact multiplicative
inverse.
The new logic calculated this correctly, but it seems that
test/CodeGen/Thumb2/mve-vcvt-fixed-to-float.ll expected the old
behavior.
Since getExactInverse has no tests or documentation, we conservatively
maintain (and document) the old behavior.
This reverts commit 47e62e846beb267aad50eb9195dfd855e160483e.
The goal is simply to reduce direct usage of getLength and setLength so that
if we end up moving memset.pattern (whose length is in elements) there
are fewer places to audit.
Remove the ArrayRef<const Value*> Args operand from
getOperandsScalarizationOverhead and require that the callers
de-duplicate arguments and filter constant operands.
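A hedged sketch of the caller-side preparation this implies; the helper
names and the exact parameter list of the updated overload are
assumptions:
```c++
// Callers de-duplicate operands and drop constants before querying the
// cost; Seen/Tys are illustrative local names.
static InstructionCost
computeOperandsCost(Instruction *I, ElementCount VF,
                    const TargetTransformInfo &TTI,
                    TargetTransformInfo::TargetCostKind CostKind) {
  SmallPtrSet<const Value *, 4> Seen;
  SmallVector<Type *, 4> Tys;
  for (const Value *Op : I->operands())
    if (!isa<Constant>(Op) && Seen.insert(Op).second)
      Tys.push_back(toVectorTy(Op->getType(), VF));
  return TTI.getOperandsScalarizationOverhead(Tys, CostKind);
}
```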
Removing the Value *-based Args argument enables callers that have no
Value * operands available to use the function in a follow-up: computing
the scalarization cost directly for a VPlan recipe.
It also allows more accurate cost-estimates in the future: for example,
when vectorizing a loop, we could also skip operands that are live-ins,
as those also do not require scalarization.
PR: https://github.com/llvm/llvm-project/pull/154126
This PR implements the boilerplate required to use `llvm-objcopy` for
`DXContainer` object files.
It defines a minimal structure, `object`, to represent the `DXContainer`
header and the parts that follow it.
This structure is a simple representation of the object data that allows
simple modifications at the granularity of each part. It is modeled on
how the respective `object`s are defined for `ELF`, `wasm`, `XCOFF`,
etc.
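A hedged sketch of roughly what that minimal structure could look like;
the field and type names here are assumptions, not the exact
definitions:
```c++
// Simplified model: the DXContainer header followed by its parts, each
// modifiable independently.
struct Part {
  StringRef Name;          // four-character part tag, e.g. "DXIL"
  ArrayRef<uint8_t> Data;  // raw contents of the part
};

struct Object {
  dxbc::Header Header;         // the DXContainer file header
  SmallVector<Part, 8> Parts;  // parts in file order
};
```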
This is the first step toward implementing
https://github.com/llvm/llvm-project/issues/150275 and
https://github.com/llvm/llvm-project/issues/150277 as compiler actions
that invoke `llvm-objcopy` for functionality.
Add a parameter to the file lock API to allow an exclusive file lock.
Both Unix and Windows support locking a file exclusively for write by
one process, and LLVM's OnDiskCAS uses an exclusive file lock to
coordinate CAS creation.
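A hedged sketch of how the extended API might be used to coordinate
creation; the parameter name and default are assumptions:
```c++
// Take the exclusive lock, perform the one-time CAS creation, then
// release. The Exclusive parameter is the addition described above.
if (std::error_code EC = sys::fs::lockFile(FD, /*Exclusive=*/true))
  return EC;  // failed to acquire the lock
// ... create the on-disk CAS while holding the exclusive lock ...
sys::fs::unlockFile(FD);
```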
This is an attempt to reland #151660 by including a missing STL header
found by a buildbot failure.
The stable function map could be huge for a large application. Fully
loading it is slow and consumes a significant amount of memory, which is
unnecessary and drastically slows down compilation especially for
non-LTO and distributed-ThinLTO setups. This patch introduces an opt-in
lazy loading support for the stable function map. The detailed changes
are:
- `StableFunctionMap`
- The map now stores entries in an `EntryStorage` struct, which includes
offsets for serialized entries and a `std::once_flag` for thread-safe
lazy loading.
- The underlying map type is changed from `DenseMap` to
`std::unordered_map` for compatibility with `std::once_flag`.
- `contains()`, `size()` and `at()` are implemented to only load
requested entries on demand.
- Lazy Loading Mechanism
- When reading indexed codegen data, if the newly-introduced
`-indexed-codegen-data-lazy-loading` flag is set, the stable function
map is not fully deserialized up front. The binary format for the stable
function map now includes offsets and sizes to support lazy loading.
- The safety of lazy loading is guarded by the once flag per function
hash (see the sketch after this list). This guarantees that even in a
multi-threaded environment, the
deserialization for a given function hash will happen exactly once. The
first thread to request it performs the load, and subsequent threads
will wait for it to complete before using the data. For single-threaded
builds, the overhead is negligible (a single check on the once flag).
For multi-threaded scenarios, users can omit the flag to retain the
previous eager-loading behavior.
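A minimal sketch of the per-entry guard, with names invented for
illustration:
```c++
// One EntryStorage per function hash: Offset locates the serialized
// entry; the once flag guarantees a single deserialization.
struct EntryStorage {
  uint64_t Offset = 0;
  std::once_flag LoadedOnce;
  std::vector<StableFunctionEntry> Entries;  // filled lazily
};

const std::vector<StableFunctionEntry> &
getOrLoad(EntryStorage &S, function_ref<void(EntryStorage &)> Deserialize) {
  // The first caller performs the load; concurrent callers block until
  // it completes.
  std::call_once(S.LoadedOnce, [&] { Deserialize(S); });
  return S.Entries;
}
```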
This makes the optimization in optimizeStringLength for `strlen(gep
@glob, %x)` -> `sub endof@glob, %x` a little more resilient, and maybe a
bit more correct for GEPs with non-array types.
Support the following BCD format conversion builtins for PowerPC.
- `__builtin_bcdcopysign` – Conversion that returns the decimal value of
the first parameter combined with the sign code of the second
parameter.
- `__builtin_bcdsetsign` – Conversion that sets the sign code of the
input parameter in packed decimal format.
> Note: These built-in functions are valid only when all of the
following conditions are met:
> - `-qarch` is set to utilize POWER9 technology.
> - The `bcd.h` file is included.
## Prototypes
```c
vector unsigned char __builtin_bcdcopysign(vector unsigned char, vector unsigned char);
vector unsigned char __builtin_bcdsetsign(vector unsigned char, unsigned char);
```
## Usage Details
`__builtin_bcdsetsign`: Returns the packed decimal value of the first
parameter combined with the sign code (a usage sketch follows the notes
below). The sign code is set according to the following rules:
- If the packed decimal value of the first parameter is positive, the
following rules apply:
- If the second parameter is 0, the sign code is set to 0xC.
- If the second parameter is 1, the sign code is set to 0xF.
- If the packed decimal value of the first parameter is negative, the
sign code is set to 0xD.
> Notes:
> The second parameter can only be 0 or 1.
> You can determine whether a packed decimal value is positive or
negative as follows:
> - Packed decimal values with sign codes **0xA, 0xC, 0xE, or 0xF** are
interpreted as positive.
> - Packed decimal values with sign codes **0xB or 0xD** are interpreted
as negative.
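A short, hedged usage sketch of the rules above (assumes a POWER9
target and that `bcd.h` is included; the function name is invented):
```c++
#include <bcd.h>

// Positive input with a 0 second argument yields sign code 0xC;
// negative input yields 0xD regardless of the second argument.
vector unsigned char forcePreferredSign(vector unsigned char Packed) {
  return __builtin_bcdsetsign(Packed, 0);
}
```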
---------
Co-authored-by: Aditi-Medhane <aditi.medhane@ibm.com>
There are a couple of places in the loop vectoriser where we
want to calculate the cost of extracting the last lane in a
vector. However, we wrongly assume that asking for the cost
of extracting lane (VF.getKnownMinValue() - 1) is an accurate
representation of the cost of extracting the last lane. For
SVE at least, this is non-trivial as it requires the use of
whilelo and lastb instructions.
To solve this problem I have added a new
getReverseVectorInstrCost interface where the index is used
in reverse from the end of the vector. Given a vector with
ElementCount EC, the extracted/inserted lane is EC - 1 - Index. For
scalable vectors this index is unknown at compile time. I've added an
AArch64 hook that better represents the cost, and also a RISCV hook
that maintains compatibility with the behaviour prior to this PR.
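A hedged sketch of the hook's indexing convention; the exact parameter
list is an assumption:
```c++
// Index counts back from the end of the vector: for a vector with
// ElementCount EC, the touched lane is EC - 1 - Index, so Index == 0
// always names the last lane, even when EC is scalable.
InstructionCost getReverseVectorInstrCost(unsigned Opcode, Type *Val,
                                          TTI::TargetCostKind CostKind,
                                          unsigned Index) const;
```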
I've also taken the liberty of adding support in vplan for
calculating the cost of VPInstruction::ExtractLastElement.
Also starts pruning out these calls if the exception model is
forced to none.
I worked backwards from the logic in addPassesToHandleExceptions
and the pass content. There appears to be some tolerance
for mixing and matching exception modes inside of a single module.
As far as I can tell _Unwind_CallPersonality is only relevant for
wasm, so just add it there.
As usual, the arm64ec case makes things difficult and is
missing test coverage. The set of calls in list form is necessary
to use foreach for the duplication, but in every other context a
dag is more convenient. You cannot use foreach over a dag, and I
haven't found a way to flatten a dag into a list.
This removes the last manual setLibcallImpl call in generic code.
There are several libcall choices for MUL_I64 which depend on the
subtarget, but this is the base case. The manual custom ISelLowering
is still overriding the decision until we have a way to control
lowering choices, but we can still get the calling convention
set for now.
If ExtraAnalysis is requested, emit all remarks caused by
unvectorizable instructions instead of only the first.
This is in line with how other places handle DoExtraAnalysis, and it
can be quite helpful to get info about all instructions in a loop that
prevent vectorization.
Remove:
* DescriptorType enum - this almost exactly shadowed the ResourceClass
enum
* ClauseType - this aliased ResourceClass
Although these were introduced to make the HLSL root signature handling
code a bit cleaner, they were ultimately causing confusion as they
appeared to be unique enums that needed to be converted between each
other.
Closes #153890
- llvm.nvvm.reflect - Use PureIntrinsic (adding speculatable); this
will be replaced by a constant prior to lowering, so speculation is
fine.
- llvm.nvvm.tex.* - Add [IntrNoCallback, IntrNoFree, IntrWillReturn]
- llvm.nvvm.suld.* - Add [IntrNoCallback, IntrNoFree] and
[IntrWillReturn] when not using "clamp" mode
- llvm.nvvm.sust.* - Add [IntrNoCallback, IntrNoFree, IntrWriteMem] and
[IntrWillReturn] when not using "clamp" mode
- llvm.nvvm.[suq|txq|istypep].* - Use DefaultAttrsIntrinsic
- llvm.nvvm.read.ptx.sreg.* - Add [IntrNoFree, IntrWillReturn] to
non-constant reads as well.
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:
```c++
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType *, N> : public SmallPtrSet<PointeeType *, N> {};
```
We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
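For illustration, a typical mechanical replacement:
```c++
// Before: resolves to SmallPtrSet only through the specialization above.
SmallSet<const Instruction *, 16> Before;
// After: names the pointer-set type directly; behavior is identical.
SmallPtrSet<const Instruction *, 16> After;
```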
LiveVariables will mark instructions with their implicit subregister
uses. However, it will also add an implicit use of a register when the
instruction's own definition is a subregister of it, i.e. `$r3 = OP
val, implicit-def $r0_r1_r2_r3, ..., implicit $r2_r3`, even if that
register is otherwise unused, which defines $r3 on the same line it is
used.
This change ensures such uses are added without the implicit flag, i.e.
`$r3 = OP val, implicit-def $r0_r1_r2_r3, ..., $r2_r3`.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Mips requires fp128 args/returns to be passed differently than i128. It
handles this by inspecting the pre-legalization type. However, for soft
float libcalls, the original type is currently not provided (it will
look like an i128 call). To work around that, MIPS maintains a list of
libcalls working on fp128.
This patch removes that list by providing the original, pre-softening
type to calling convention lowering. This is done by carrying additional
information in CallLoweringInfo, as we unfortunately do need both types
(we want the un-softened type for OrigTy, but we need the softened type
for the actual register assignment, etc.).
This is in preparation for completely removing all the custom
pre-analysis code in the Mips backend and replacing it with use of
OrigTy.
Allow assignment of float to DocType and support output of float in
the writeToBlob method.
Expand test coverage of various missing basic I/O operations.
Co-authored-by: Xavi Zhang <Xavi.Zhang@amd.com>
The name is misleading, as setting Fragment to nullptr does not
necessarily make it undefined - common and equated symbols have
a nullptr fragment as well.
The SymContentsTargetCommon kind introduced by
https://reviews.llvm.org/D61493 lacks significance and should be
treated as a regular common symbol with a different section index.
Update ELFObjectWriter to respect the specified section index.
The new representation also works with Hexagon's SHN_HEXAGON_SCOMMON.
The names "SymbolContents" and "SymContents*" members are confusing.
Rename to kind and Kind::XXX similar to lld/ELF/Symbols.h
Rename SymContentsVariable to Kind::Equated as the former term is
"equated symbol", not "variable".
In https://github.com/llvm/llvm-project/pull/100281, we use
`TaggedBlocks` to record blocks modified by `CheckIfPHIMatches()`, so
we do not need to clear every block in `BlockList` if
`CheckIfPHIMatches()` fails to match.
If `CheckIfPHIMatches()` matches successfully, we can reuse
`TaggedBlocks` to record matching PHIs only for the modified blocks,
avoiding a check of every block in `BlockList` to see if `PHITag` is
set.
Without this patch, we use NumNonEmpty, which keeps track of the
number of valid entries plus tombstones even though we have a separate
variable to keep track of the number of tombstones.
This patch simplifies the metadata. Specifically, it changes the name
and semantics of the variable to NumEntries to keep track of the
number of valid entries.
The difference in semantics requires some code changes aside from
mechanical replacements:
- size() just returns NumEntries.
- erase_imp() and remove_if() need to decrement NumEntries in the
large mode.
- insert_imp_big() increments NumEntries for successful insertions,
regardless of whether a tombstone is being replaced with a valid
entry (see the sketch below). It also computes the number of
non-tombstone empty slots as:
CurArraySize - NumEntries - NumTombstones
- Grow() no longer needs NumNonEmpty -= NumTombstones.
Overall, the resulting code should look more intuitive and more
consistent with DenseMap.
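A hedged, simplified sketch of the insert_imp_big() accounting
described above (FindBucketFor stands in for the real probing logic):
```c++
// Free, non-tombstone slots are CurArraySize - NumEntries - NumTombstones.
std::pair<const void *const *, bool> insert_imp_big(const void *Ptr) {
  const void *const *Bucket = FindBucketFor(Ptr);
  if (*Bucket == Ptr)
    return {Bucket, false};  // already present; no count changes
  if (*Bucket == getTombstoneMarker())
    --NumTombstones;         // reusing a tombstone slot
  *const_cast<const void **>(Bucket) = Ptr;
  ++NumEntries;              // counts valid entries only
  return {Bucket, true};
}
```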
This patch refactors DXILABI to remove the dependency on ScopedPrinter.
Closes: #153827
---------
Co-authored-by: Joao Saffran <{ID}+{username}@users.noreply.github.com>