llvm-project

Author	SHA1	Message	Date
yonghong-song	3e05ab6322	[ThinLTO] Reduce the number of renaming due to promotions (#183793 ) Currently for thin-lto, the imported static global values (functions, variables, etc) will be promoted/renamed from e.g., foo() to foo.llvm.<hash>(). Such a renaming caused difficulties in live patching since the function name is changed ([1]). It is possible that some global value names have to be promoted to avoid name collision and linker failure. But in practice, the majority of name promotions can be avoided. In [2], the suggestion is that thin-lto pre-link decides whether a particular global value needs name promotion or not. If yes, later on in thinBackend() the name will be promoted. I compiled a particular linux kernel version (latest bpf-next tree) and found 1216 global values with suffix .llvm.<hash>. With this patch, the number of promoted functions is 2, 98% reduction from the original kernel build. If some native objects are not participating with LTO, name promotions have to be done to avoid potential linker issues. So the current implementation cannot be on by default. But in certain cases, e.g., linux kernel build, people can enable lld flag --lto-whole-program-visibility to reduce the number of functions like foo.llvm.<hash>(). For ThinLTOCodeGenerator.cpp, which is used by the llvm-lto tool and a few other rare cases, reducing the number of renamings due to promotion is not implemented, as the lld flag '-lto-whole-program-visibility' is not supported in ThinLTOCodeGenerator.cpp for now. In summary, this pull request only supports llvm-lto2 style workflow. The feature is off by default. To enable the feature, lld flag '-lto-whole-program-visibility' and llvm flag '-always-rename-promoted-locals=false' are needed. The link [3] has more context for the pull request discussions. 
[1] https://lpc.events/event/19/contributions/2212 [2] https://discourse.llvm.org/t/rfc-avoid-functions-like-foo-llvm-for-kernel-live-patch/89400 [3] https://github.com/llvm/llvm-project/pull/178587	2026-02-28 12:44:25 -08:00
yonghong-song	cd50a3074b	Revert "[ThinLTO] Reduce the number of renaming due to promotions (#178587 )" (#183782 ) There is a conflict with existing code. See https://github.com/llvm/llvm-project/pull/178587 Reverting to resolve the conflict; the change will be resubmitted later.	2026-02-27 10:04:30 -08:00
yonghong-song	975dba2863	[ThinLTO] Reduce the number of renaming due to promotions (#178587 ) Currently for thin-lto, the imported static global values (functions, variables, etc) will be promoted/renamed from e.g., foo() to foo.llvm.<hash>(). Such a renaming caused difficulties in live patching since the function name is changed ([1]). It is possible that some global value names have to be promoted to avoid name collision and linker failure. But in practice, the majority of name promotions can be avoided. In [2], the suggestion is that thin-lto pre-link decides whether a particular global value needs name promotion or not. If yes, later on in thinBackend() the name will be promoted. I compiled a particular linux kernel version (latest bpf-next tree) and found 1216 global values with suffix .llvm.<hash>. With this patch, the number of promoted functions is 2, 98% reduction from the original kernel build. If some native objects are not participating with LTO, name promotions have to be done to avoid potential linker issues. So the current implementation cannot be on by default. But in certain cases, e.g., linux kernel build, people can enable lld flag --lto-whole-program-visibility to reduce the number of functions like foo.llvm.<hash>(). For ThinLTOCodeGenerator.cpp, which is used by the llvm-lto tool and a few other rare cases, reducing the number of renamings due to promotion is not implemented, as the lld flag '-lto-whole-program-visibility' is not supported in ThinLTOCodeGenerator.cpp for now. In summary, this pull request only supports llvm-lto2 style workflow. [1] https://lpc.events/event/19/contributions/2212 [2] https://discourse.llvm.org/t/rfc-avoid-functions-like-foo-llvm-for-kernel-live-patch/89400	2026-02-27 09:09:54 -08:00
Peter Collingbourne	943504eb08	IR: Add prefalign attribute for function definitions. The prefalign attribute determines the function's preferred alignment. By default, the function's preferred alignment is set in a target-specific way, but it may be overridden with this attribute. The backend logic will be added in followup patches. Part of this RFC: https://discourse.llvm.org/t/rfc-enhancing-function-alignment-attributes/88019 Reviewers: efriedma-quic, nikic, arsenm Pull Request: https://github.com/llvm/llvm-project/pull/155527	2026-02-20 10:54:01 -08:00
Sam Elliott	0d08cb0e70	[outliners] Turn nooutline into an Enum Attribute (#163665 ) This change turns the `"nooutline"` attribute into an enum attribute called `nooutline`, and adds an auto-upgrader for bitcode to make the same change to existing IR. This IR attribute disables both the Machine Outliner (enabled at Oz for some targets), and the IR Outliner (disabled by default).	2026-02-10 21:44:17 -08:00
Matt Arsenault	2502e3b7ba	IR: Promote "denormal-fp-math" to a first class attribute (#174293 ) Convert "denormal-fp-math" and "denormal-fp-math-f32" into a first class denormal_fpenv attribute. Previously the query for the effective denormal mode involved two string attribute queries with parsing. I'm introducing more uses of this, so it makes sense to convert this to a more efficient encoding. The old representation was also awkward since it was split across two separate attributes. The new encoding just stores the default and float modes as bitfields, largely avoiding the need to consider if the other mode is set. The syntax in the common cases looks like this: `denormal_fpenv(preservesign,preservesign)` `denormal_fpenv(float: preservesign,preservesign)` `denormal_fpenv(dynamic,dynamic float: preservesign,preservesign)` I wasn't sure about reusing the float type name instead of adding a new keyword. It's parsed as a type but only accepts float. I'm also debating switching the name to subnormal to match the current preferred IEEE terminology (also used by nofpclass and other contexts). This has a behavior change when using the command flag debug options to set the denormal mode. The behavior of the flag ignored functions with an explicit attribute set, per the default and f32 version. Now that these are one attribute, the flag logic can't distinguish which of the two components were explicitly set on the function. Only one test appeared to rely on this behavior, so I just avoided using the flags in it. This also does not perform all the code cleanups this enables. In particular the attributor handling could be cleaned up. I also guessed at how to support this in MLIR. I followed MemoryEffects as a reference; it appears bitfields are expanded into arguments to attributes, so the representation there is a bit uglier with the 2 2-element fields flattened into 4 arguments.	2026-02-05 13:31:26 +00:00
Vladislav Dzhidzhoev	b9cecee3fb	Reland "[DebugMetadata][DwarfDebug] Support function-local types in lexical block scopes (4/7)" (#165032 ) This is an attempt to merge https://reviews.llvm.org/D144006 with an LTO fix. The last merge attempt was https://github.com/llvm/llvm-project/pull/75385. The issue with it was investigated in https://github.com/llvm/llvm-project/pull/75385#issuecomment-2386684121. The problem happens when 1. Several modules are being linked. 2. There are several DISubprograms that initially belong to different modules but represent the same source code function (for example, a function included from the same source code file). 3. Some of these DISubprograms survive IR linking. It may happen if one of them is inlined somewhere or if the functions that have these DISubprograms attached have internal linkage. 4. Each of these DISubprograms has a local type that corresponds to the same source code type. These types are initially from different modules, but have the same ODR identifier. If the same (in the sense of ODR identifier/ODR uniquing rules) local type is present in two modules, and these modules are linked together, the type gets uniqued. The DIType that happens to be loaded first survives linking, and the references to other types with the same ODR identifier from the modules loaded later are replaced with references to the DIType loaded first. Since definition subprograms, in scope of which these types are located, are not deduplicated, the linker output may contain multiple DISubprograms having the same (uniqued) type in their retainedNodes lists. Further compilation of such modules causes crashes. To tackle that, * previous solution to handle LTO linking with local types in retainedNodes is removed (cloneLocalTypes() function), * for each loaded distinct (definition) DISubprogram, its retainedNodes list is scanned after loading, and DITypes with a scope of another subprogram are removed. 
If something from a Function corresponding to the DISubprogram references uniqued type, we rely on cross-CU links. Additionally: * a check is added to Verifier to report about local types located in a wrong retainedNodes list, Original commit message follows. --------- RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544 Similar to imported declarations, the patch tracks function-local types in DISubprogram's 'retainedNodes' field. DwarfDebug is adjusted in accordance with the aforementioned metadata change and provided a support of function-local types scoped within a lexical block. The patch assumes that DICompileUnit's 'enums field' no longer tracks local types and DwarfDebug would assert if any locally-scoped types get placed there. Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com> Co-authored-by: Jeremy Morse <jeremy.morse@sony.com>	2026-02-04 00:34:52 +01:00
Teresa Johnson	b30971c4bb	[ThinLTO] Remove unused relative block frequency support (#177215 ) This removes most of the handling of the relative block frequency support added in 2018 in c73cec84c99e5a63dca961fef67998a677c53a3c, which was disabled by default and never utilized in the thin link as expected. Support for reading old Bitcode containing the record is maintained as required for backwards compatibility requirements, as is the support for parsing old LLVM assembly containing that information. Tests ensure that this backwards compatibility is maintained. This came up in the context of redundant BFI/DT computations which existed largely for the purpose of computing this information and are being addressed in PR176646.	2026-01-21 11:39:57 -08:00
Aiden Grossman	e2d7cd685d	[IR] Make dead_on_return attribute optionally sized This patch makes the dead_on_return parameter attribute optionally require a number of bytes to be passed in to specify the number of bytes known to be dead upon function return/unwind. This is aimed at enabling annotating the this pointer in C++ destructors with dead_on_return in clang. We need this to handle cases like the following: ``` struct X { int n; ~X() { this[n].n = 0; } }; void f() { X xs[] = {42, -1}; } ``` where we are only certain that sizeof(X) bytes are dead upon return of ~X. Otherwise DSE would be able to eliminate the store in ~X which would not be correct. This patch only does the wiring within IR. Future patches will make clang emit correct sizing information and update DSE to only delete stores to objects marked dead_on_return that are provably in bounds of the number of bytes specified to be dead_on_return. Reviewers: nikic, alinas, antoniofrighetto Pull Request: https://github.com/llvm/llvm-project/pull/171712	2026-01-21 08:22:05 -08:00
Luke Lau	cee36b23cc	[IR] Allow non-constant offsets in @llvm.vector.splice.{left,right} (#174693 ) Following on from #170796, this PR implements the second part of https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974 by allowing non-constant offsets in the vector splice intrinsics. Previously @llvm.vector.splice had a restriction enforced by the verifier that the offset had to be known to be within the range of the vector at compile time. Because we can't enforce this with non-constant offsets, it's been relaxed so that offsets that would slide the vector out of bounds return a poison value, similar to insertelement/extractelement. @llvm.vector.splice.left also previously only allowed offsets within the range 0 <= Offset < N, but this has been relaxed to 0 <= Offset <= N so that it's consistent with @llvm.vector.splice.right. In lieu of the verifier checks that were removed, InstSimplify has been taught to fold splices to poison when the offset is out of bounds. The cost model isn't implemented in this PR, and just returns invalid for any non-constant offsets for now. I think the correct way to cost these non-constant offsets isn't through getShuffleCost because they can't handle variable masks, but instead just through getIntrinsicInstCost.	2026-01-21 10:58:40 +00:00
Matt Arsenault	0d4a35d560	IR: Remove llvm.convert.to.fp16 and llvm.convert.from.fp16 intrinsics (#174484 ) These are long overdue for removal. These were originally a hack to support loading half values before there was any / decent support for the half type through the backend. There's no reason to continue supporting these, they're equivalent to fpext/fptrunc with a bitcast. SelectionDAG stopped translating these directly, and used the bitcast + fp cast since f7a02c17628e825, so there's been no reason to use these since 2014.	2026-01-21 09:50:28 +00:00
Nikita Popov	af7c10618b	[BitcodeReader] Improve error messages Avoid using "Invalid record" for all errors. At least mention what kind of record it is.	2026-01-19 14:28:40 +01:00
Shilei Tian	5a63367b15	Reapply "[AMDGPU] Rework the clamp support for WMMA instructions" (#174674 ) (#174697 ) This reverts commit 0b2f3cfb72a76fa90f3ec2a234caabe0d0712590.	2026-01-07 06:12:19 +00:00
dyung	0b2f3cfb72	Revert "[AMDGPU] Rework the clamp support for WMMA instructions" (#174674 ) Reverts llvm/llvm-project#174310 This change is causing 2 cross-project-test failures on https://lab.llvm.org/buildbot/#/builders/174/builds/29695	2026-01-07 01:18:23 +00:00
Shilei Tian	ccca3b8c67	[AMDGPU] Rework the clamp support for WMMA instructions (#174310 ) Fixes #166989.	2026-01-06 15:46:40 -05:00
Luke Lau	ad4bfac732	[IR] Split vector.splice into vector.splice.left and vector.splice.right (#170796 ) This PR implements the first change outlined in https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974?u=lukel In order to allow non-immediate offsets in the llvm.vector.splice intrinsic, we need to separate out the "shift left" and "shift right" modes into two separate intrinsics, which were previously determined by whether or not the offset is positive or negative. The description in the LangRef has also been reworded in terms of sliding elements left or right and extracting either the upper or lower half as opposed to extracting from a certain index, which brings it in line with the definition of `llvm.fshr.`/`llvm.fshl.`. This patch teaches AutoUpgrade.cpp to upgrade the old intrinsics into their new equivalent one based on their offset, so existing uses of vector.splice should still work. Uses of llvm.vector.splice in `llvm/test/CodeGen` haven't been replaced in this PR to keep the diff small and kick the tyres on the AutoUpgrader a bit. I planned to do this in a follow up NFC but can include it in this PR if reviewers prefer. Similarly the shuffle costing kind `SK_Splice` has just been kept the same for now, to be split into `SK_SpliceLeft` and `SK_SpliceRight` later.	2026-01-06 15:41:26 +08:00
Shilei Tian	c97de4387b	Revert "[AMDGPU] add clamp immediate operand to WMMA iu8 intrinsic (#171069 )" (#174303 ) This reverts commit 2c376ffeca490a5732e4fd6e98e5351fcf6d692a because it breaks assembler. ``` $ llvm-mc -triple=amdgcn -mcpu=gfx1250 -show-encoding <<< "v_wmma_i32_16x16x64_iu8 v[16:23], v[0:7], v[8:15], v[16:23] matrix_b_reuse" v_wmma_i32_16x16x64_iu8 v[16:23], v[0:7], v[8:15], v[16:23] clamp ; encoding: [0x10,0x80,0x72,0xcc,0x00,0x11,0x42,0x1c] ``` We have a fundamental issue in the clamp support in VOP3P instructions, which will need more changes.	2026-01-04 02:13:21 +00:00
Muhammad Abdul	2c376ffeca	[AMDGPU] add clamp immediate operand to WMMA iu8 intrinsic (#171069 ) Fixes #166989 - Adds a clamp immediate operand to the AMDGPU WMMA iu8 intrinsic and threads it through LLVM IR, MIR lowering, Clang builtins/tests, and MLIR ROCDL dialect so all layers agree on the new operand - Updates AMDGPUWmmaIntrinsicModsAB so the clamp attribute is emitted, teaches VOP3P encoding to accept the immediate, and adjusts Clang codegen/builtin headers plus MLIR op definitions and tests to match - Documents what the WMMA clamp operand does - Implement bitcode AutoUpgrade for source compatibility on WMMA IU8 Intrinsic op Possible future enhancements: - infer clamping as an optimization fold based on the use context --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-12-27 12:51:29 -05:00
Teresa Johnson	e3c621c50b	[ThinLTO][MemProf] Add option to override max ICP with larger number (#171652 ) Adds an option -module-summary-max-indirect-edges, and wiring into the ICP logic that collects promotion candidates from VP metadata, to support a larger number of promotion candidates for use in building the ThinLTO summary. Also use this in the MemProf ThinLTO backend handling where we perform memprof ICP during cloning. The new option, essentially off by default, can be used to override the value of -icp-max-prom, which is checked internally in ICP, with a larger max value when collecting candidates from the VP metadata. For MemProf in particular, where we synthesize new VP metadata targets from allocation contexts, which may not be all that frequent, we need to be able to include a larger set of these targets in the summary in order to correctly handle indirect calls in the contexts. Otherwise we will not set up the callsite graph edges correctly.	2025-12-15 10:16:06 -08:00
anjenner	27651133e2	AMDGPU: Drop and upgrade llvm.amdgcn.atomic.csub/cond.sub to atomicrmw (#105553 ) These both perform conditional subtraction, returning the minuend and zero respectively, if the difference is negative.	2025-12-09 23:13:33 +00:00
Paul Walker	b5a3b8b704	[LLVM][SVE] Remove aarch64.sve.rev intrinsic, using vector.reverse instead. (#169654 )	2025-11-28 11:59:34 +00:00
Paul Walker	8401a8d0be	[NFC][LLVM] Add bitcode tests for llvm.aarch64.sve.rev	2025-11-27 10:42:29 +00:00
Peter Collingbourne	d2379effe9	Add deactivation symbol operand to ConstantPtrAuth. Deactivation symbol operands are supported in the code generator by building on the previously added support for IRELATIVE relocations. Reviewers: ojhunt, fmayer, ahmedbougacha, nikic, efriedma-quic Reviewed By: fmayer Pull Request: https://github.com/llvm/llvm-project/pull/133537	2025-11-26 12:39:40 -08:00
Peter Collingbourne	6227eb90da	Add IR and codegen support for deactivation symbols. Deactivation symbols are a mechanism for allowing object files to disable specific instructions in other object files at link time. The initial use case is for pointer field protection. For more information, see the RFC: https://discourse.llvm.org/t/rfc-deactivation-symbols/85556 Reviewers: ojhunt, nikic, fmayer, arsenm, ahmedbougacha Reviewed By: fmayer Pull Request: https://github.com/llvm/llvm-project/pull/133536	2025-11-26 12:37:09 -08:00
CarolineConcatto	200793ac21	Extend MemoryEffects to Support Target-Specific Memory Locations (#148650 ) This patch introduces preliminary support for additional memory locations. They are: target_mem0 and target_mem1 and they model memory locations that cannot be represented with existing memory locations. It was a solution suggested in: https://discourse.llvm.org/t/rfc-improving-fpmr-handling-for-fp8-intrinsics-in-llvm/86868/6 Currently, these locations are not yet target-specific. The goal is to enable the compiler to express read/write effects on these resources.	2025-11-18 11:10:58 +00:00
Jay Foad	f037f41350	[IR] Add new function attribute nocreateundeforpoison (#164809 ) Also add a corresponding intrinsic property that can be used to mark intrinsics that do not introduce poison, for example simple arithmetic intrinsics that propagate poison just like a simple arithmetic instruction. As a smoke test this patch adds the new property to llvm.amdgcn.fmul.legacy.	2025-11-04 12:00:44 +00:00
Orlando Cazalet-Hyams	aa5fe56db4	[DebugInfo] Add dataSize to DIBasicType to add DW_AT_bit_size to _BitInt types (#164372 ) DW_TAG_base_type DIEs are permitted to have both byte_size and bit_size attributes "If the value of an object of the given type does not fully occupy the storage described by a byte size attribute" * Add DataSizeInBits to DIBasicType (`DIBasicType(... dataSize: n ...)` in IR). * Change Clang to add DataSizeInBits to _BitInt type metadata. * Change LLVM to add DW_AT_bit_size to base_type DIEs that have non-zero DataSizeInBits. TODO: Do we need to emit DW_AT_data_bit_offset for big endian targets? See discussion on the PR. Fixes [#61952](https://github.com/llvm/llvm-project/issues/61952) --------- Co-authored-by: David Stenberg <david.stenberg@ericsson.com>	2025-10-29 15:23:46 +00:00
Michael Buch	49f918d4c3	[llvm][Bitcode][ObjC] Fix order of setter/getter argument to DIObjCProperty constructor (#165421 ) Depends on: * https://github.com/llvm/llvm-project/pull/165401 We weren't testing `DIObjCProperty` roundtripping. So this was never caught. The consequence of this is that the `setter:` would have the getter name and `getter:` would have the setter name.	2025-10-29 12:14:56 +00:00
paperchalice	4a95cd14b3	[test][Bitcode] Remove unsafe-fp-math uses (NFC) (#164743 ) Post cleanup for #164534.	2025-10-23 16:38:22 +08:00
Teresa Johnson	eb74d8e03c	[ThinLTO] Add index flag for internalization/promotion status (#164530 ) Add an index-wide flag indicating whether index-based internalization and promotion have completed. This will be used in a follow on change.	2025-10-22 07:30:43 -07:00
Daniel Kiss	048070ba6f	[ARM][AArch64] BTI,GCS,PAC Module flag update. (#86212 ) Module flag is used to indicate the feature to be propagated to the function. As now the frontend emits all attributes accordingly let's help the auto upgrade to only do work when old and new bitcodes are merged. Depends on #82819 and #86031	2025-10-22 09:29:06 +02:00
Nikita Popov	573ca36753	[IR] Replace alignment argument with attribute on masked intrinsics (#163802 ) The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter` intrinsics currently accept a separate alignment immarg. Replace this with an `align` attribute on the pointer / vector of pointers argument. This is the standard representation for alignment information on intrinsics, and is already used by all other memory intrinsics. This means the signatures now match llvm.expandload, llvm.vp.load, etc. (Things like llvm.memcpy used to have a separate alignment argument as well, but were already migrated a long time ago.) It's worth noting that the masked.gather and masked.scatter intrinsics previously accepted a zero alignment to indicate the ABI type alignment of the element type. This special case is gone now: If the align attribute is omitted, the implied alignment is 1, as usual. If ABI alignment is desired, it needs to be explicitly emitted (which the IRBuilder API already requires anyway).	2025-10-20 08:50:09 +00:00
Michael Buch	cf1cdde24e	[llvm][DebugInfo] Add 'sourceLanguageVersion' field support to DICompileUnit (#162632 ) Depends on: * https://github.com/llvm/llvm-project/pull/162445 In preparation to emit DWARFv6's `DW_AT_language_version`.	2025-10-15 16:52:45 +01:00
Michael Buch	c32753a77a	[llvm][DebugInfo] Add 'sourceLanguageName' field support to DICompileUnit (#162445 ) Depends on: * https://github.com/llvm/llvm-project/pull/162255 * https://github.com/llvm/llvm-project/pull/162434 Part of a patch series to support the DWARFv6 `DW_AT_language_name`/`DW_AT_language_version` attributes.	2025-10-10 09:54:04 +01:00
Marco Elver	224873d7ac	[AllocToken] Introduce sanitize_alloc_token attribute and alloc_token metadata (#160131 ) In preparation of adding the "AllocToken" pass, add the pre-requisite `sanitize_alloc_token` function attribute and `alloc_token` metadata. --- This change is part of the following series: 1. https://github.com/llvm/llvm-project/pull/160131 2. https://github.com/llvm/llvm-project/pull/156838 3. https://github.com/llvm/llvm-project/pull/162098 4. https://github.com/llvm/llvm-project/pull/162099 5. https://github.com/llvm/llvm-project/pull/156839 6. https://github.com/llvm/llvm-project/pull/156840 7. https://github.com/llvm/llvm-project/pull/156841 8. https://github.com/llvm/llvm-project/pull/156842	2025-10-07 12:51:42 +02:00
Nikita Popov	f31bc666f4	[IR] Handle addrspacecast in findBaseObject() (#162076 ) Make findBaseObject() look through addrspacecast, so that getAliaseeObject() works with an aliasee that uses an addrspacecast. This fixes a crash during module summary index emission. Fixes https://github.com/llvm/llvm-project/issues/161646.	2025-10-06 16:18:12 +02:00
Tom Tromey	296fddc89e	Allow DW_OP_rot, DW_OP_neg, and DW_OP_abs in DIExpression (#160757 ) The Ada front end can emit somewhat complicated DWARF expressions for the offset of a field. While working in this area I found that I needed DW_OP_rot (to implement a branch-free computation -- it looked more difficult to add support for branching); and DW_OP_neg and DW_OP_abs (just basic functionality).	2025-10-03 14:36:17 +01:00
Florian Hahn	4d4cb757f9	[LLVMContext] Add OB_align assume bundle op ID. (#158078 ) Assume operand bundles are emitted in a few more places now, including in various places in libc++. Add a dedicated ID for them. PR: https://github.com/llvm/llvm-project/pull/158078	2025-09-24 16:03:46 +00:00
Sander de Smalen	17e008db17	[IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637 ) The partial reduction intrinsics are no longer experimental, because they've been used in production for a while and are unlikely to change.	2025-09-17 11:44:47 +01:00
Antonio Frighetto	370607065d	[llvm] Regenerate test checks including TBAA semantics (NFC) Tests exercising TBAA metadata (both purposefully and not), and previously generated via UTC, have been regenerated and updated to version 6.	2025-09-12 20:01:17 +02:00
Alexander Richardson	3a4b351ba1	[IR] Introduce the `ptrtoaddr` instruction This introduces a new `ptrtoaddr` instruction which is similar to `ptrtoint` but has two differences: 1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance 2) `ptrtoaddr` only extracts (and then extends/truncates) the low index-width bits of the pointer For most architectures, difference 2) does not matter since index (address) width and pointer representation width are the same, but this does make a difference for architectures that have pointers that aren't just plain integer addresses such as AMDGPU fat pointers or CHERI capabilities. This commit introduces textual and bitcode IR support as well as basic code generation, but optimization passes do not handle the new instruction yet so it may result in worse code than using ptrtoint. Follow-up changes will update capture tracking, etc. for the new instruction. RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54 Reviewed By: nikic Pull Request: https://github.com/llvm/llvm-project/pull/139357	2025-08-08 10:12:39 -07:00
Diana Picus	20d8398825	[AMDGPU] ISel & PEI for whole wave functions (#145858 ) Whole wave functions are functions that will run with a full EXEC mask. They will not be invoked directly, but instead will be launched by way of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in a future patch). These functions are meant as an alternative to the `llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics. Whole wave functions will set EXEC to -1 in the prologue and restore the original value of EXEC in the epilogue. They must have a special first argument, `i1 %active`, that is going to be mapped to EXEC. They may have either the default calling convention or amdgpu_gfx. The inactive lanes need to be preserved for all registers used, active lanes only for the CSRs. At the IR level, arguments to a whole wave function (other than `%active`) contain poison in their inactive lanes. Likewise, the return value for the inactive lanes is poison. This patch contains the following work: * 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return a SReg_1 representing `%active`, which needs to be passed into SI_WHOLE_WAVE_FUNC_RETURN. * SelectionDAG support for generating these 2 new pseudos and the special handling of %active. Since the return may be in a different basic block, it's difficult to add the virtual reg for %active to SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF which is later replaced via a custom inserter. * Expansion of the 2 pseudos during prolog/epilog insertion. PEI also marks any used VGPRs as WWM registers, which are then spilled and restored with the usual logic. Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic and a lot of optimization work (especially in order to reduce spills around function calls). 
--------- Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com> Co-authored-by: Shilei Tian <i@tianshilei.me>	2025-07-21 10:39:09 +02:00
Jeremy Morse	51f4e2cda2	[Bitcode][NFC] Add abbrev for FUNC_CODE_DEBUG_LOC (#147211 ) DILocations that are not attached to instructions are encoded using METADATA_LOCATION records which have an abbrev. DILocations attached to instructions are interleaved with instruction records as FUNC_CODE_DEBUG_LOC records, which do not have an abbrev (and FUNC_CODE_DEBUG_LOC_AGAIN which have no operands). Add a new FUNCTION_BLOCK abbrev FUNCTION_DEBUG_LOC_ABBREV for FUNC_CODE_DEBUG_LOC records. This reduces the bc file size by up to 7% in CTMark, with many between 2-4% smaller. [per-file file size compile-time-tracker](https://llvm-compile-time-tracker.com/compare.php?from=75cf826849713c00829cdf657e330e24c1a2fd03&to=1e268ebd0a581016660d9d7e942495c1be041f7d&stat=size-file&details=on) (go to stage1-ReleaseLTO-g). This optimisation is motivated by #144102, which adds the new Key Instructions fields to bitcode records. The combined patches still overall look to be a slight improvement over the base. (Originally reviewed in PR #146497) Co-authored-by: Orlando Cazalet-Hyams <orlando.hyams@sony.com>	2025-07-06 22:30:31 +01:00
Nikita Popov	0a656d8e57	[Bitcode] Add abbreviations for additional instructions (#146825 ) Add abbreviations for icmp/fcmp, store and br, which are the most common instructions that don't have abbreviations yet. This requires increasing the abbreviation size to 5 bits. This gives about 3-5% bitcode size reductions for the clang build.	2025-07-03 14:28:32 +02:00
Antonio Frighetto	f1cc0b607b	[IR] Introduce `dead_on_return` attribute Add `dead_on_return` attribute, which is meant to be taken advantage of by the frontend, and states that the memory pointed to by the argument is dead upon function return. As with `byval`, it is supposed to be used for passing aggregates by value. The difference lies in the ABI: `byval` implies that the pointer is explicitly passed as argument to the callee (during codegen the copy is emitted as per byval contract), whereas a `dead_on_return`-marked argument implies that the copy already exists in the IR, is located at a specific stack offset within the caller, and this memory will not be read further by the caller upon callee return – or otherwise poison, if read before being written. RFC: https://discourse.llvm.org/t/rfc-add-dead-on-return-attribute/86871.	2025-07-02 09:29:36 +02:00
Timothy Werquin	a8e486bfc4	[Bitcode] Fix constexpr expansion creating invalid PHIs (#141560 ) Fixes errors about duplicate PHI edges when the input had duplicates with constexprs in them. The constexpr translation makes new basic blocks, causing the verifier to complain about duplicate entries in PHI nodes.	2025-05-27 15:51:48 +02:00
Jonathan Thackray	6e49f73825	Reland [llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions (#137701 ) This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum` instructions. These mirror the `llvm.maximum.` and `llvm.minimum.` instructions, but are atomic and use IEEE754 2019 handling for NaNs, which is different to `fmax` and `fmin`. See: https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic for more details. Future changes will allow this LLVM IR to be lowered to specialised assembler instructions on suitable targets, such as AArch64.	2025-04-30 22:06:37 +01:00
Jonathan Thackray	7ee0097b48	Revert "[llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions" (#137657 ) Reverts llvm/llvm-project#136759 due to bad interaction with c792b25e4	2025-04-28 16:53:36 +01:00
Jonathan Thackray	ba420d8122	[llvm] Add support for llvm IR atomicrmw fminimum/fmaximum instructions (#136759 ) This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum` instructions. These mirror the `llvm.maximum.` and `llvm.minimum.` instructions, but are atomic and use IEEE754 2019 handling for NaNs, which is different to `fmax` and `fmin`. See: https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic for more details. Future changes will allow this LLVM IR to be lowered to specialised assembler instructions on suitable targets, such as AArch64.	2025-04-28 15:31:44 +01:00
Matt Arsenault	c5ac63e4fc	ThinLTO: Add flag to print uselistorder in bitcode writer pass (#133230 ) This is needed in llvm-reduce to avoid perturbing the uselistorder in intermediate steps. Really llvm-reduce wants pure serialization with no dependency on the pass manager. There are other optimizations mixed in to the serialization here depending on metadata in the module, which is also bad. Part of #63621	2025-04-14 20:34:02 +02:00
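
The `llvm.vector.splice.left`/`llvm.vector.splice.right` commits above describe the intrinsics in terms of concatenating the two operands, sliding by an offset, and keeping the lower or upper half, with out-of-range offsets yielding poison. The Python sketch below is an illustrative model of that description, not LLVM code. Vectors are plain lists, poison is modeled as `None`, and the helper names (`splice_left`, `splice_right`, `old_splice`) are hypothetical. `old_splice` reflects one reading of the AutoUpgrade step: a non-negative immediate maps to the left splice, and a negative one maps to the right splice with the offset negated.

```python
from typing import List, Optional, Sequence

# Illustrative model only: poison is represented as None.
Result = Optional[List[int]]


def splice_left(v1: Sequence[int], v2: Sequence[int], offset: int) -> Result:
    """Concatenate v1 and v2, slide left by `offset`, keep the lower half."""
    n = len(v1)
    if len(v2) != n:
        raise ValueError("operands must have the same length")
    if not 0 <= offset <= n:
        return None  # out-of-range offset: poison
    concat = list(v1) + list(v2)
    return concat[offset:offset + n]


def splice_right(v1: Sequence[int], v2: Sequence[int], offset: int) -> Result:
    """Concatenate v1 and v2, slide right by `offset`, keep the upper half."""
    n = len(v1)
    if len(v2) != n:
        raise ValueError("operands must have the same length")
    if not 0 <= offset <= n:
        return None  # out-of-range offset: poison
    concat = list(v1) + list(v2)
    return concat[n - offset:2 * n - offset]


def old_splice(v1: Sequence[int], v2: Sequence[int], imm: int) -> Result:
    """Hypothetical model of upgrading the old signed-immediate form:
    the sign of the immediate selects the left or right splice."""
    if imm >= 0:
        return splice_left(v1, v2, imm)
    return splice_right(v1, v2, -imm)


a, b = [1, 2, 3, 4], [5, 6, 7, 8]
print(splice_left(a, b, 1))   # [2, 3, 4, 5]
print(splice_right(a, b, 1))  # [4, 5, 6, 7]
print(old_splice(a, b, -3))   # [2, 3, 4, 5]
print(splice_left(a, b, 5))   # None (poison)
```

Both models accept an offset equal to the vector length. This matches the relaxed `0 <= Offset <= N` range described in the non-constant-offset commit.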

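The `ptrtoaddr` commit above distinguishes the new instruction from `ptrtoint` by which bits are extracted. `ptrtoint` reinterprets the whole pointer representation, while `ptrtoaddr` keeps only the low index-width bits before extending or truncating to the result type. The Python sketch below models only that bit-level difference; it does not model provenance capture, the other difference the commit names. The 128-bit pointer with a 64-bit index width and the helper names are hypothetical. They were chosen only to mimic a target whose pointers carry extra non-address bits.

```python
def _mask(width: int) -> int:
    """All-ones mask of the given bit width."""
    return (1 << width) - 1


def ptrtoint_model(ptr: int, ptr_width: int, result_width: int) -> int:
    """Interpret the full pointer representation as an integer, then
    zero-extend or truncate it to result_width bits."""
    return (ptr & _mask(ptr_width)) & _mask(result_width)


def ptrtoaddr_model(ptr: int, ptr_width: int, index_width: int,
                    result_width: int) -> int:
    """Keep only the low index_width bits (the address), then
    zero-extend or truncate them to result_width bits."""
    if index_width > ptr_width:
        raise ValueError("index width cannot exceed pointer width")
    return (ptr & _mask(index_width)) & _mask(result_width)


# Hypothetical 128-bit pointer: metadata in the high 64 bits, address below.
ptr = (0xABCD << 64) | 0x1000
print(hex(ptrtoint_model(ptr, 128, 128)))       # 0xabcd0000000000001000
print(hex(ptrtoaddr_model(ptr, 128, 64, 128)))  # 0x1000
```

When the index width equals the pointer width, as on most targets, the two models return the same value. This matches the commit's remark that difference 2) does not matter there.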

985 Commits