llvm-project

Author	SHA1	Message	Date
Kazu Hirata	ff7b42c194	[memprof] Speed up llvm-profdata (#117446 ) CallStackRadixTreeBuilder::build takes the parameter MemProfFrameIndexes by value, involving copies: std::optional<const llvm::DenseMap<FrameIdTy, LinearFrameId>> MemProfFrameIndexes Then "build" makes another copy of MemProfFrameIndexe and passes it to encodeCallStack for every call stack, which is painfully slow. This patch changes the type to a pointer so that we don't have to make a copy every time we pass the argument. Without this patch, it takes 553 seconds to run "llvm-profdata merge" on a large MemProf raw profile. This patch shortenes that down to 67 seconds.	2024-11-24 21:08:54 -08:00
Teresa Johnson	776476c282	Reapply "[MemProf] Use radix tree for alloc contexts in bitcode summaries" (#117395 ) (#117404 ) This reverts commit fdb050a5024320ec29d2edf3f2bc686c3a84abaa, and restores ccb4702038900d82d1041ff610788740f5cef723, with a fix for build bot failures. Specifically, add ProfileData to the dependences of the BitWriter library, which was causing shared library builds of LLVM to fail. Reproduced the failure with a shared library build and confirmed this change fixes that build failure.	2024-11-22 16:18:30 -08:00
Teresa Johnson	fdb050a502	Revert "[MemProf] Use radix tree for alloc contexts in bitcode summaries" (#117395 ) Reverts llvm/llvm-project#117066 This is causing some build bot failures that need investigation.	2024-11-22 14:57:58 -08:00
Teresa Johnson	ccb4702038	[MemProf] Use radix tree for alloc contexts in bitcode summaries (#117066 ) Leverage the support added to represent allocation contexts in a more compact way via a radix tree in the indexed profile to similarly reduce sizes of the bitcode summaries. For a large target, this reduced the size of the per-module summaries by about 18% and in the distributed combined index files by 28%.	2024-11-22 14:49:55 -08:00
Mingming Liu	97b2903455	[NFCI][WPD]Use unique string saver to store type id (#106932 ) Currently, both [TypeIdMap](`67a1fdb014/llvm/include/llvm/IR/ModuleSummaryIndex.h (L1356)`) and [TypeIdCompatibleVtableMap](`67a1fdb014/llvm/include/llvm/IR/ModuleSummaryIndex.h (L1363)`) keep type-id as `std::string` in the combined index for LTO indexing analysis. With this change, index uses a unique-string-saver to own the string copies and two maps above can use string references to save some memory. This shows a 3% memory reduction (from 8.2GiB to 7.9GiB) in an internal binary with high indexing memory usage.	2024-11-20 23:44:18 -08:00
Alex Voicu	53a6a11e0d	[LLVM][NFC] Use `used`'s element type if available (#116804 ) When embedding, if `compiler.used` exists, we should re-use it's element type instead of blindly assuming it's an unqualified pointer.	2024-11-20 23:57:55 +00:00
Teresa Johnson	b35f40688e	[MemProf] Change the STACK_ID record to fixed width values (#116448 ) The stack ids are hashes that are close to 64 bits in size, so emitting as a pair of 32-bit fixed-width values is more efficient than a VBR. This reduced the summary bitcode size for a large target by about 1%. Bump the index version and ensure we can read the old format.	2024-11-18 15:16:48 -08:00
Teresa Johnson	9513f2fdf2	[MemProf] Print full context hash when reporting hinted bytes (#114465 ) Improve the information printed when -memprof-report-hinted-sizes is enabled. Now print the full context hash computed from the original profile, similar to what we do when reporting matching statistics. This will make it easier to correlate with the profile. Note that the full context hash must be computed at profile match time and saved in the metadata and summary, because we may trim the context during matching when it isn't needed for distinguishing hotness. Similarly, due to the context trimming, we may have more than one full context id and total size pair per MIB in the metadata and summary, which now get a list of these pairs. Remove the old aggregate size from the metadata and summary support. One other change from the prior support is that we no longer write the size information into the combined index for the LTO backends, which don't use this information, which reduces unnecessary bloat in distributed index files.	2024-11-15 08:24:44 -08:00
Augusto Noronha	67fb2686fb	[DebugInfo] Add a specification attribute to LLVM DebugInfo (#115362 ) Add a specification attribute to LLVM DebugInfo, which is analogous to DWARF's DW_AT_specification. According to the DWARF spec: "A debugging information entry that represents a declaration that completes another (earlier) non-defining declaration may have a DW_AT_specification attribute whose value is a reference to the debugging information entry representing the non-defining declaration." This patch allows types to be specifications of other types. This is used by Swift to represent generic types. For example, given this Swift program: ``` struct MyStruct<T> { let t: T } let variable = MyStruct<Int>(t: 43) ``` The Swift compiler emits (roughly) an unsubtituted type for MyStruct<T>: ``` DW_TAG_structure_type DW_AT_name ("MyStruct") // "$s1w8MyStructVyxGD" is a Swift mangled name roughly equivalent to // MyStruct<T> DW_AT_linkage_name ("$s1w8MyStructVyxGD") // other attributes here ``` And a specification for MyStruct<Int>: ``` DW_TAG_structure_type DW_AT_specification (<link to "MyStruct">) // "$s1w8MyStructVySiGD" is a Swift mangled name equivalent to // MyStruct<Int> DW_AT_linkage_name ("$s1w8MyStructVySiGD") DW_AT_byte_size (0x08) // other attributes here ```	2024-11-13 09:55:37 -08:00
Kazu Hirata	c784d321d9	[ThinLTO] Use heterogenous lookups with std::map (NFC) (#115812 ) Heterogenous lookups allow us to call find with StringRef, avoiding a temporary heap allocation of std::string.	2024-11-12 10:09:28 -08:00
Augusto Noronha	f6617d65e4	[DebugInfo] Add num_extra_inhabitants to debug info (#112590 ) An extra inhabitant is a bit pattern that does not represent a valid value for instances of a given type. The number of extra inhabitants is the number of those bit configurations. This is used by Swift to save space when composing types. For example, because Bool only needs 2 bit patterns to represent all of its values (true and false), an Optional<Bool> only occupies 1 byte in memory by using a bit configuration that is unused by Bool. Which bit patterns are unused are part of the ABI of the language. Since Swift generics are not monomorphized, by using dynamic libraries you can have generic types whose size, alignment, etc, are known only at runtime (which is why this feature is needed). This patch adds num_extra_inhabitants to LLVM-IR debug info and in DWARF as an Apple extension.	2024-11-06 15:48:04 -08:00
Jay Foad	4831e0aa88	[IR] Disallow recursive types (#114799 ) StructType::setBody is the only mechanism that can potentially create recursion in the type system. Add a runtime check that it is not actually used to create recursion. If the check fails, report an error from LLParser, BitcodeReader and IRLinker. In all other cases assert that the check succeeds. In future StructType::setBody will be removed in favor of specifying the body when the type is created, so any performance hit from this runtime check will be temporary.	2024-11-05 09:41:10 +00:00
davidtrevelyan	4102625380	[rtsan][llvm][NFC] Rename sanitize_realtime_unsafe attr to sanitize_realtime_blocking (#113155 ) # What This PR renames the newly-introduced llvm attribute `sanitize_realtime_unsafe` to `sanitize_realtime_blocking`. Likewise, sibling variables such as `SanitizeRealtimeUnsafe` are renamed to `SanitizeRealtimeBlocking` respectively. There are no other functional changes. # Why? - There are a number of problems that can cause a function to be real-time "unsafe", - we wish to communicate what problems rtsan detects and why they're unsafe, and - a generic "unsafe" attribute is, in our opinion, too broad a net - which may lead to future implementations that need extra contextual information passed through them in order to communicate meaningful reasons to users. - We want to avoid this situation and make the runtime library boundary API/ABI as simple as possible, and - we believe that restricting the scope of attributes to names like `sanitize_realtime_blocking` is an effective means of doing so. We also feel that the symmetry between `[[clang::blocking]]` and `sanitize_realtime_blocking` is easier to follow as a developer. # Concerns - I'm aware that the LLVM attribute `sanitize_realtime_unsafe` has been part of the tree for a few weeks now (introduced here: https://github.com/llvm/llvm-project/pull/106754). Given that it hasn't been released in version 20 yet, am I correct in considering this to not be a breaking change?	2024-10-26 13:06:11 +01:00
Serge Pavlov	95e5a999ab	[Bitcode] Get rid of compiler message (#113428 ) Insert explicit cast from an enumerator to unsigned int, because some compilers issue a warning on signed vs unsigned comparison, see: https://github.com/llvm/llvm-project/pull/110805#issuecomment-2411095723.	2024-10-23 22:40:02 +07:00
goldsteinn	c85611e858	[SimplifyLibCall][Attribute] Fix bug where we may keep `range` attr with incompatible type (#112649 ) In a variety of places we change the bitwidth of a parameter but don't update the attributes. The issue in this case is from the `range` attribute when inlining `__memset_chk`. `optimizeMemSetChk` will replace an `i32` with an `i8`, and if the `i32` had a `range` attr assosiated it will cause an error. Fixes #112633	2024-10-17 10:32:55 -05:00
Nikita Popov	255a99c29f	[APInt] Fix APInt constructions where value does not fit bitwidth (NFCI) (#80309 ) This fixes all the places that hit the new assertion added in https://github.com/llvm/llvm-project/pull/106524 in tests. That is, cases where the value passed to the APInt constructor is not an N-bit signed/unsigned integer, where N is the bit width and signedness is determined by the isSigned flag. The fixes either set the correct value for isSigned, set the implicitTrunc flag, or perform more calculations inside APInt. Note that the assertion is currently still disabled by default, so this patch is mostly NFC.	2024-10-17 08:48:08 +02:00
elhewaty	9efb07f261	[IR] Add `samesign` flag to icmp instruction (#111419 ) Inspired by https://discourse.llvm.org/t/rfc-signedness-independent-icmps/81423	2024-10-15 17:11:25 +08:00
Tim Renouf	76007138f4	[LLVM] New NoDivergenceSource function attribute (#111832 ) A call to a function that has this attribute is not a source of divergence, as used by UniformityAnalysis. That allows a front-end to use known-name calls as an instruction extension mechanism (e.g. https://github.com/GPUOpen-Drivers/llvm-dialects ) without such a call being a source of divergence.	2024-10-12 09:34:45 +01:00
Serge Pavlov	15de239406	[IR] Allow MDString in operand bundles (#110805 ) This change implements support of metadata strings in operand bundle values. It makes possible calls like: call void @some_func(i32 %x) [ "foo"(i32 42, metadata !"abc") ] It requires some extension of the bitcode serialization. As SSA values and metadata are stored in different tables, there must be a way to distinguish them during deserialization. It is implemented by putting a special marker before the metadata index. The marker cannot be treated as a reference to any SSA value, so it unambiguously identifies metadata. It allows extending the bitcode serialization without breaking compatibility. Metadata as operand bundle values are intended to be used in floating-point function calls. They would represent the same information as now is passed by the constrained intrinsic arguments.	2024-10-11 12:09:10 +07:00
davidtrevelyan	0f488a0b7d	[LLVM][rtsan] Add `sanitize_realtime_unsafe` attribute (#106754 )	2024-09-19 16:45:25 -06:00
Jay Foad	e03f427196	[LLVM] Use {} instead of std::nullopt to initialize empty ArrayRef (#109133 ) It is almost always simpler to use {} instead of std::nullopt to initialize an empty ArrayRef. This patch changes all occurrences I could find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor could be deprecated or removed.	2024-09-19 16:16:38 +01:00
Jonas Paulsson	14120227a3	Target ABI: improve call parameters extensions handling (#100757 ) For the purpose of verifying proper arguments extensions per the target's ABI, introduce the NoExt attribute that may be used by a target when neither sign- or zeroextension is required (e.g. with a struct in register). The purpose of doing so is to be able to verify that there is always one of these attributes present and by this detecting cases where sign/zero extension is actually missing. As a first step, this patch has the verification step done for the SystemZ backend only, but left off by default until all known issues have been addressed. Other targets/front-ends can now also add NoExt attribute where needed and do this check in the backend.	2024-09-19 16:59:31 +02:00
Mingming Liu	7d371725cd	[NFCI][BitcodeReader]Read real GUID from VI as opposed to storing it in map (#107735 ) Currently, `ValueIdToValueInfoMap` [1] stores `std::tuple<ValueInfo, GlobalValue::GUID /* original GUID /, GlobalValue::GUID / real GUID/ >`. This change updates the stored value type to `std::pair<ValueInfo, GlobalValue::GUID / original GUID */>`, and reads real GUID from ValueInfo. When an entry is inserted into `ValueIdToValueInfoMap`, ValueInfo is created or inserted using real GUID [2]. ValueInfo keeps a pointer to GlobalValueMap [3], using either `GUID` or `{GUID, Name}` [4] when reading per-module summaries to create a combined summary. [1] owned by per module-summary bitcode reader `caebb4562c/llvm/lib/Bitcode/Reader/BitcodeReader.cpp (L947-L950)` [2] [first](`caebb4562c/llvm/lib/Bitcode/Reader/BitcodeReader.cpp (L7130-L7133)`), [second](`caebb4562c/llvm/lib/Bitcode/Reader/BitcodeReader.cpp (L7221-L7222)`), [third](`caebb4562c/llvm/lib/Bitcode/Reader/BitcodeReader.cpp (L7622-L7623)`) [3] `caebb4562c/llvm/include/llvm/IR/ModuleSummaryIndex.h (L1427-L1431)` [4] `caebb4562c/llvm/include/llvm/IR/ModuleSummaryIndex.h (L1631)` and `caebb4562c/llvm/include/llvm/IR/ModuleSummaryIndex.h (L1621)` --------- Co-authored-by: Kazu Hirata <kazu@google.com>	2024-09-09 09:43:47 -07:00
Yuxuan Chen	e17a39bc31	[Clang] C++20 Coroutines: Introduce Frontend Attribute [[clang::coro_await_elidable]] (#99282 ) This patch is the frontend implementation of the coroutine elide improvement project detailed in this discourse post: https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044 This patch proposes a C++ struct/class attribute `[[clang::coro_await_elidable]]`. This notion of await elidable task gives developers and library authors a certainty that coroutine heap elision happens in a predictable way. Originally, after we lower a coroutine to LLVM IR, CoroElide is responsible for analysis of whether an elision can happen. Take this as an example: ``` Task foo(); Task bar() { co_await foo(); } ``` For CoroElide to happen, the ramp function of `foo` must be inlined into `bar`. This inlining happens after `foo` has been split but `bar` is usually still a presplit coroutine. If `foo` is indeed a coroutine, the inlined `coro.id` intrinsics of `foo` is visible within `bar`. CoroElide then runs an analysis to figure out whether the SSA value of `coro.begin()` of `foo` gets destroyed before `bar` terminates. `Task` types are rarely simple enough for the destroy logic of the task to reference the SSA value from `coro.begin()` directly. Hence, the pass is very ineffective for even the most trivial C++ Task types. Improving CoroElide by implementing more powerful analyses is possible, however it doesn't give us the predictability when we expect elision to happen. The approach we want to take with this language extension generally originates from the philosophy that library implementations of `Task` types has the control over the structured concurrency guarantees we demand for elision to happen. That is, the lifetime for the callee's frame is shorter to that of the caller. The ``[[clang::coro_await_elidable]]`` is a class attribute which can be applied to a coroutine return type. When a coroutine function that returns such a type calls another coroutine function, the compiler performs heap allocation elision when the following conditions are all met: - callee coroutine function returns a type that is annotated with ``[[clang::coro_await_elidable]]``. - In caller coroutine, the return value of the callee is a prvalue that is immediately `co_await`ed. From the C++ perspective, it makes sense because we can ensure the lifetime of elided callee cannot exceed that of the caller if we can guarantee that the caller coroutine is never destroyed earlier than the callee coroutine. This is not generally true for any C++ programs. However, the library that implements `Task` types and executors may provide this guarantee to the compiler, providing the user with certainty that HALO will work on their programs. After this patch, when compiling coroutines that return a type with such attribute, the frontend checks that the type of the operand of `co_await` expressions (not `operator co_await`). If it's also attributed with `[[clang::coro_await_elidable]]`, the FE emits metadata on the call or invoke instruction as a hint for a later middle end pass to elide the elision. The original patch version is https://github.com/llvm/llvm-project/pull/94693 and as suggested, the patch is split into frontend and middle end solutions into stacked PRs. The middle end CoroSplit patch can be found at https://github.com/llvm/llvm-project/pull/99283 The middle end transformation that performs the elide can be found at https://github.com/llvm/llvm-project/pull/99285	2024-09-08 23:08:58 -07:00
Kazu Hirata	6f6100f19c	[Bitcode] Avoid repeated hash lookups (NFC) (#107708 )	2024-09-07 11:23:15 -07:00
Kazu Hirata	51d3829d8f	[ThinLTO] Shrink FunctionSummary by 8 bytes (#107706 ) During the ThinLTO indexing step for one of our large applications, we create 4 million instances of FunctionSummary. Changing: std::vector<EdgeTy> CallGraphEdgeList; to: SmallVector<EdgeTy, 0> CallGraphEdgeList; in FunctionSummary reduces the size of each instance by 8 bytes. The rest of the patch makes the same change to other places so that the types stay compatible across function boundaries.	2024-09-07 11:21:20 -07:00
Jie Fu	a7f152f59b	[Bitcode] Fix -Wunused-but-set-variable in BitcodeReader.cpp (NFC) /llvm-project/llvm/lib/Bitcode/Reader/BitcodeReader.cpp:7795:16: error: variable 'EntryCount' set but not used [-Werror,-Wunused-but-set-variable] uint64_t EntryCount = 0; ^ 1 error generated.	2024-09-07 07:53:17 +08:00
Mingming Liu	d4ddf06b0c	[NFCI]Remove EntryCount from FunctionSummary and clean up surrounding synthetic count passes. (#107471 ) The primary motivation is to remove `EntryCount` from `FunctionSummary`. This frees 8 bytes out of `sizeof(FunctionSummary)` (136 bytes as of `64498c5483`). While I'm at it, this PR clean up {SummaryBasedOptimizations, SyntheticCountsPropagation} since they were not used and there are no plans to further invest on them. With this patch, bitcode writer writes a placeholder 0 at the byte offset of `EntryCount` and bitcode reader can parse the function entry count at the correct byte offset. Added a TODO to stop writing `EntryCount` and bump bitcode version	2024-09-06 16:38:17 -07:00
Kazu Hirata	0ffa377c6b	[ThinLTO] Shrink GlobalValueSummary by 8 bytes (#107342 ) During the ThinLTO indexing step for one of our large applications, we create 7.5 million instances of GlobalValueSummary. Changing: std::vector<ValueInfo> RefEdgeList; to: SmallVector<ValueInfo, 0> RefEdgeList; in GlobalValueSummary reduces the size of each instance by 8 bytes. The rest of the patch makes the same change to other places so that the types stay compatible across function boundaries.	2024-09-06 10:25:08 -07:00
anjenner	4af249fe6e	Add usub_cond and usub_sat operations to atomicrmw (#105568 ) These both perform conditional subtraction, returning the minuend and zero respectively, if the difference is negative.	2024-09-06 16:19:20 +01:00
Jay Foad	2f6e4ed389	[IR] Check parameters of target extension types on construction (#107268 ) Since IR Types are immutable it makes sense to check them on construction instead of in the IR Verifier pass. This patch checks that some TargetExtTypes are well-formed in the sense that they have the expected number of type parameters and integer parameters. When called from LLParser it gives a diagnostic message. When called from anywhere else it just asserts that they are well-formed.	2024-09-05 16:48:22 +01:00
Chris Apple	fef3426ad3	Revert "[LLVM][rtsan] Add LLVM nosanitize_realtime attribute (#105447 )" (#106743 ) This reverts commit 178fc4779ece31392a2cd01472b0279e50b3a199. This attribute was not needed now that we are using the lsan style ScopedDisabler for disabling this sanitizer See #106736 #106125 For more discussion	2024-08-30 07:48:31 -07:00
Jan Voung	fa4fbaefde	Reapply: Use an abbrev to reduce size of VALUE_GUID records in ThinLTO summaries (#106165 ) This retries #90692 which was reverted previously due to issues with lld-available being set, even if the copy of lld is not built from source. This does not change any code compared to #90692 to address the lld-available issue. The main change w.r.t, lld-available is xfailing tests in PR #99056 (until a longer term fix is available).	2024-08-27 13:53:25 -04:00
Chris Apple	178fc4779e	[LLVM][rtsan] Add LLVM nosanitize_realtime attribute (#105447 )	2024-08-26 12:49:27 -07:00
Kazu Hirata	dbd7ce0ccd	[IR] Inroduce ModuleToSummariesForIndexTy (NFC) (#105906 ) This patch introduces type alias ModuleToSummariesForIndexTy. I'm planning to change the type slightly to allow heterogeneous lookup (that is, std::map<K, V, std::less<>>) in a subsequent patch. The problem is that changing the type affects many places. Using a type alias reduces the impact.	2024-08-23 17:32:52 -07:00
Kazu Hirata	ca53611c90	[llvm] Use range-based for loops (NFC) (#105861 )	2024-08-23 16:56:27 -07:00
Kazu Hirata	3b703d479f	[Bitcode] Use DenseSet instead of std::set (NFC) (#105851 ) DefOrUseGUIDs is used only for membership checking purposes. We don't need std::set's strengths like iterators staying valid or the ability to traverse in a sorted order. While I am at it, this patch replaces count with contains for slightly increased readability.	2024-08-23 14:19:48 -07:00
Kazu Hirata	5f01fda4ac	[Bitcode] Use range-based for loops (NFC) (#104534 )	2024-08-15 20:17:40 -07:00
Kazu Hirata	74c5fa12a8	[Bitcode] Use range-based for loops (NFC) (#103628 )	2024-08-13 23:31:40 -07:00
Chris Apple	b143b2483f	[LLVM][rtsan] Add sanitize_realtime attribute for the realtime sanitizer (#100596 ) Add a new "sanitize_realtime" attribute, which will correspond to the nonblocking function effect in clang. This is used in the realtime sanitizer transform. Please see the [reviewer support document](https://github.com/realtime-sanitizer/radsan/blob/doc/review-support/doc/review.md) for what our next steps are. The original discourse thread can be found [here](https://discourse.llvm.org/t/rfc-nolock-and-noalloc-attributes/76837)	2024-08-08 15:41:06 +02:00
James Y Knight	dfeb3991fb	Remove the `x86_mmx` IR type. (#98505 ) It is now translated to `<1 x i64>`, which allows the removal of a bunch of special casing. This _incompatibly_ changes the ABI of any LLVM IR function with `x86_mmx` arguments or returns: instead of passing in mmx registers, they will now be passed via integer registers. However, the real-world incompatibility caused by this is expected to be minimal, because Clang never uses the x86_mmx type -- it lowers `__m64` to either `<1 x i64>` or `double`, depending on ABI. This change does _not_ eliminate the SelectionDAG `MVT::x86mmx` type. That type simply no longer corresponds to an IR type, and is used only by MMX intrinsics and inline-asm operands. Because SelectionDAGBuilder only knows how to generate the operands/results of intrinsics based on the IR type, it thus now generates the intrinsics with the type MVT::v1i64, instead of MVT::x86mmx. We need to fix this before the DAG LegalizeTypes, and thus have the X86 backend fix them up in DAGCombine. (This may be a short-lived hack, if all the MMX intrinsics can be removed in upcoming changes.) Works towards issue #98272.	2024-07-25 09:19:22 -04:00
Jacek Caban	6cc8774228	[CodeGen][ARM64EC] Add support for hybrid_patchable attribute. (#92965 )	2024-07-19 11:43:25 +02:00
Tom Stellard	ce2b280553	[BitcodeReader] Remove dead increment (#98412 ) This was found by the Clang Static Analyzer.	2024-07-11 17:42:11 -07:00
Teresa Johnson	9f8205d9d8	[MemProf] Track and report profiled sizes through cloning (#98382 ) If requested, via the -memprof-report-hinted-sizes option, track the total profiled size of each MIB through the thin link, then report on the corresponding allocation coldness after all cloning is complete. To save size, a different bitcode record type is used for the allocation info when the option is specified, and the sizes are kept separate from the MIBs in the index.	2024-07-11 16:10:30 -07:00
Mingming Liu	50fea9943f	Reland "[ThinLTO][Bitcode] Generate import type in bitcode" (#97253 ) https://github.com/llvm/llvm-project/pull/87600 was reverted in order to revert `6262763341`. Now https://github.com/llvm/llvm-project/pull/95482 is fix forward for `6262763341`. This patch is a reland for https://github.com/llvm/llvm-project/pull/87600 Changes on top of original patch In `llvm/include/llvm/IR/ModuleSummaryIndex.h`, make the type of `GVSummaryPtrSet` an `unordered_set` which is more memory efficient when the number of elements is smaller than 128 [1] Original commit message For distributed ThinLTO, the LTO indexing step generates combined summary for each module, and postlink pipeline reads the combined summary which stores the information for link-time optimization. This patch populates the 'import type' of a summary in bitcode, and updates bitcode reader to parse the bit correctly. [1] `393eff4e02/llvm/lib/Support/SmallPtrSet.cpp (L43)`	2024-07-08 22:20:33 -07:00
Kazu Hirata	40c12648c6	[Bitcode] Use range-based for loops (NFC) (#97776 )	2024-07-07 06:34:18 +09:00
Alex Voicu	9acb533c38	[clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (#95061 ) This patch augments the HIPAMD driver to allow it to target AMDGCN flavoured SPIR-V compilation. It's mostly straightforward, as we re-use some of the existing SPIRV infra, however there are a few notable additions: - we introduce an `amdgcnspirv` offload arch, rather than relying on using `generic` (this is already fairly overloaded) or simply using `spirv` or `spirv64` (we'll want to use these to denote unflavoured SPIRV, once we bring up that capability) - initially it is won't be possible to mix-in SPIR-V and concrete AMDGPU targets, as it would require some relatively intrusive surgery in the HIPAMD Toolchain and the Driver to deal with two triples (`spirv64-amd-amdhsa` and `amdgcn-amd-amdhsa`, respectively) - in order to retain user provided compiler flags and have them available at JIT time, we rely on embedding the command line via `-fembed-bitcode=marker`, which the bitcode writer had previously not implemented for SPIRV; we only allow it conditionally for AMDGCN flavoured SPIRV, and it is handled correctly by the Translator (it ends up as a string literal) Once the SPIRV BE is no longer experimental we'll switch to using that rather than the translator. There's some additional work that'll come via a separate PR around correctly piping through AMDGCN's implementation of `printf`, for now we merely handle its flags correctly.	2024-06-25 12:19:28 +01:00
Nikita Popov	6258b5f610	[BitcodeReader] Use poison instead of undef for invalid instructions	2024-06-24 16:36:46 +02:00
Haopeng Liu	5ece35df85	Add the 'initializes' attribute langref and support (#84803 ) We propose adding a new LLVM attribute, `initializes((Lo1,Hi1),(Lo2,Hi2),...)`, which expresses the notion of memory space (i.e., intervals, in bytes) that the argument pointing to is initialized in the function. Will commit the attribute inferring in the follow-up PRs. https://discourse.llvm.org/t/rfc-llvm-new-initialized-parameter-attribute-for-improved-interprocedural-dse/77337	2024-06-21 12:09:00 -07:00
Jan Voung	e1e5ed5893	Update ModuleSummaryIndexBitcodeReader::makeCallList reserve amount (#95461 ) Tighten the reserve() to `Record.size() / 2` instead of `Record.size()` in the HasProfile/HasRelBF cases. For the uncommon old profile format cases we leave it as is, but those should be rare and not worth optimizing. This reduces peak memory during ThinLTO indexing by ~3% in one example. Alternatively, we could make the branching for reserve more complex and try to cover every case.	2024-06-20 21:04:14 -04:00

1 2 3 4 5 ...

2196 Commits