This reverts commit 14cd1339318b16e08c1363ec6896bd7d1e4ae281. The
buildbot failure seems to have been a cmake issue which has been
discussed in more detail in this Discourse post:
https://discourse.llvm.org/t/cmake-doesnt-regenerate-all-tablegen-target-files/87901
If any buildbots fail to select arbitrary intrinsics with this patch,
it's worth considering using clean builds with ccache instead of
incremental builds, as recommended here:
https://llvm.org/docs/HowToAddABuilder.html#:~:text=Use%20CCache%20and%20NOT%20incremental%20builds
The original commit message for this patch:
Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave
functions. This will take as its first argument the callee with the
amdgpu_gfx_whole_wave calling convention, followed by the call
parameters which must match the signature of the callee except for the
first function argument (the i1 original EXEC mask, which doesn't need
to be passed in). Indirect calls are not allowed.
Make direct calls to amdgpu_gfx_whole_wave functions a verifier error.
Tail calls are handled in a future patch.
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave
functions. This will take as its first argument the callee with the
amdgpu_gfx_whole_wave calling convention, followed by the call
parameters which must match the signature of the callee except for the
first function argument (the i1 original EXEC mask, which doesn't need
to be passed in). Indirect calls are not allowed.
Make direct calls to amdgpu_gfx_whole_wave functions a verifier error.
Unspeakable horrors happen around calls from whole wave functions; the
plan is to improve the handling of caller/callee-saved registers in
a future patch.
Tail calls are also handled in a future patch.
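For illustration, a sketch of the intended use (the exact intrinsic
signature and mangling may differ from what is shown here):

```llvm
; the callee uses the whole wave calling convention and receives the
; original EXEC mask as its first argument
declare amdgpu_gfx_whole_wave i32 @good_callee(i1 %active, i32 %x)

define amdgpu_gfx i32 @caller(i32 %x) {
  ; %active is not passed in explicitly; the remaining arguments must
  ; match the callee's signature
  %r = call i32 (ptr, ...) @llvm.amdgcn.call.whole.wave(ptr @good_callee, i32 %x)
  ret i32 %r
}
```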
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.
An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
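For illustration, a sketch of the new metadata based on the description
above (the second form shows a loop for which no estimate could be
computed):

```llvm
; on the loop's latch branch
br i1 %cond, label %loop, label %exit, !llvm.loop !0

!0 = distinct !{!0, !1}
!1 = !{!"llvm.loop.estimated_trip_count", i32 8} ; estimated from branch_weights
!2 = !{!"llvm.loop.estimated_trip_count"}        ; metadata present, value omitted
```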
lifetime.start and lifetime.end are primarily intended for use on
allocas, to enable stack coloring and other liveness optimizations. This
is necessary because all (static) allocas are hoisted into the entry
block, so lifetime markers are the only way to convey the actual
lifetimes.
However, lifetime.start and lifetime.end are currently *allowed* to be
used on non-alloca pointers. We don't actually do this in practice, but
the mere fact that it is possible breaks the core purpose of the
lifetime markers, which is stack coloring of allocas. Stack coloring can
only work correctly if all lifetime markers for an alloca are
analyzable.
* If a lifetime marker may operate on multiple allocas via a select/phi,
we don't know which lifetime actually starts/ends and handle it
incorrectly (https://github.com/llvm/llvm-project/issues/104776); see
the IR sketch after this list.
* Stack coloring operates on the assumption that all lifetime markers
are visible, and not, for example, hidden behind a function call or
escaped pointer. It's not possible to change this, as part of the
purpose of lifetime markers is that they work even in the presence of
escaped pointers, where simple use analysis is insufficient.
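A minimal sketch of the select case from the first bullet:

```llvm
%a = alloca i64
%b = alloca i64
%p = select i1 %c, ptr %a, ptr %b
; which alloca's lifetime ends here? Stack coloring cannot tell, and
; previously handled this incorrectly (issue #104776)
call void @llvm.lifetime.end.p0(i64 8, ptr %p)
```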
I don't think there is any way to have coherent semantics for lifetime
markers on allocas, while also permitting them on arbitrary pointer
values.
This PR restricts lifetimes to operate on allocas only. As a followup, I
will also drop the size argument, which is superfluous if we always
operate on an alloca. (This change also renders various code handling
lifetime markers on non-allocas dead. I plan to clean up that kind of
code after dropping the size argument as well.)
In practice, I've only found a few places that currently produce
lifetimes on non-allocas:
* CoroEarly replaces the promise alloca with the result of an intrinsic,
which will later be replaced back with an alloca. I think this is the
only place where there is some legitimate loss of functionality, but I
don't think this is particularly important (I don't think we'd expect
the promise in a coroutine to admit useful lifetime optimization.)
* SafeStack moves unsafe allocas onto a separate frame. We can safely
drop lifetimes here, as SafeStack performs its own stack coloring.
* Similarly for AddressSanitizer, which also moves allocas into separate
memory.
* LSR sometimes replaces the lifetime argument with a GEP chain of the
alloca (where the offsets ultimately cancel out). This is just
unnecessary. (Fixed separately in
https://github.com/llvm/llvm-project/pull/149492.)
* InferAddrSpaces sometimes makes lifetimes operate on an addrspacecast
of an alloca. I don't think this is necessary.
Whole wave functions are functions that will run with a full EXEC mask.
They will not be invoked directly, but instead will be launched by way
of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in
a future patch). These functions are meant as an alternative to the
`llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics.
Whole wave functions will set EXEC to -1 in the prologue and restore the
original value of EXEC in the epilogue. They must have a special first
argument, `i1 %active`, that is going to be mapped to EXEC. They may
have either the default calling convention or amdgpu_gfx. The inactive
lanes need to be preserved for all registers used; the active lanes only
for the CSRs.
At the IR level, arguments to a whole wave function (other than
`%active`) contain poison in their inactive lanes. Likewise, the return
value for the inactive lanes is poison.
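A sketch of what this looks like at the IR level (using the amdgpu_gfx
flavour mentioned above):

```llvm
define amdgpu_gfx i32 @whole_wave_fn(i1 %active, i32 %x) {
  ; %active is mapped to the EXEC mask on entry; the inactive lanes of
  ; %x contain poison
  %r = add i32 %x, 1
  ; the return value is likewise poison in the inactive lanes
  ret i32 %r
}
```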
This patch contains the following work:
* 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN
used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return
a SReg_1 representing `%active`, which needs to be passed into
SI_WHOLE_WAVE_FUNC_RETURN.
* SelectionDAG support for generating these 2 new pseudos and the
special handling of %active. Since the return may be in a different
basic block, it's difficult to add the virtual reg for %active to
SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF
which is later replaced via a custom inserter.
* Expansion of the 2 pseudos during prolog/epilog insertion. PEI also
marks any used VGPRs as WWM registers, which are then spilled and
restored with the usual logic.
Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic
and a lot of optimization work (especially in order to reduce spills
around function calls).
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
Co-authored-by: Shilei Tian <i@tianshilei.me>
Update isDereferenceableAndAlignedPointer to make use of dereferenceable
assumptions with variable sizes via SCEV.
To do so, factor out the logic for checking dereferenceability via an
assumption into a helper, and use SE to check whether the access size is
less than the dereferenceable size.
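For illustration, the kind of IR this enables (a sketch; the load is
only known dereferenceable when SCEV can prove the accessed range fits
in `%n` bytes):

```llvm
define i32 @load_via_assume(ptr %p, i64 %n, i64 %i) {
  ; variable-size dereferenceability; previously only constant sizes
  ; could be used
  call void @llvm.assume(i1 true) [ "dereferenceable"(ptr %p, i64 %n), "align"(ptr %p, i64 4) ]
  %gep = getelementptr i8, ptr %p, i64 %i
  %v = load i32, ptr %gep ; ok if SCEV proves %i + 4 <= %n
  ret i32 %v
}
```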
PR: https://github.com/llvm/llvm-project/pull/128436
This applies the same check as llvm.vector.splice which checks that the immediate is in the range [-VL, VL-1] where VL is the minimum vector length. If vscale_range is available, the lower bound is used to increase the known minimum vector length for this check. This ensures the immediate is in range for any possible value of vscale that satisfies the vscale_range.
Add the `dead_on_return` attribute, which is meant to be taken advantage
of by the frontend, and states that the memory pointed to by the argument
is dead upon function return. As with `byval`, it is supposed to be
used for passing aggregates by value. The difference lies in the ABI:
`byval` implies that the pointer is explicitly passed as an argument to
the callee (during codegen the copy is emitted as per the byval
contract), whereas a `dead_on_return`-marked argument implies that the
copy already exists in the IR, is located at a specific stack offset
within the caller, and that this memory will not be read again by the
caller once the callee returns (any read before it is written again
yields poison).
RFC: https://discourse.llvm.org/t/rfc-add-dead-on-return-attribute/86871.
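A sketch of the intended IR pattern (the struct and function names are
illustrative):

```llvm
%struct.S = type { i64, i64 }

declare void @callee(ptr dead_on_return)

define void @caller(ptr %src) {
  ; the by-value copy already exists in the caller's IR
  %tmp = alloca %struct.S
  call void @llvm.memcpy.p0.p0.i64(ptr %tmp, ptr %src, i64 16, i1 false)
  call void @callee(ptr dead_on_return %tmp)
  ; %tmp is not read again here; any read before writing yields poison
  ret void
}
```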
This PR is part of https://discourse.llvm.org/t/rfc-profile-information-propagation-unittesting/73595
In a slight departure from the RFC, instead of a brand-new `MD_prof_unknown` kind, this adds a first operand to the `MD_prof` metadata. This makes it easy to replace with valid metadata, since there is only one `MD_prof`; otherwise, sites inserting valid `MD_prof` would also have to check for and remove the `unknown` one.
The patch just introduces the notion and fixes the verifier accordingly. Existing APIs working with (esp. reading) `MD_prof` will be updated subsequently.
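Illustratively, a sketch of the two forms (the exact operands may differ
as the follow-up API work lands):

```llvm
br i1 %c, label %then, label %else, !prof !0 ; known profile
br i1 %d, label %a, label %b, !prof !1       ; explicitly unknown profile

!0 = !{!"branch_weights", i32 7, i32 93}
!1 = !{!"unknown"}
```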
For some reason, some of the checks for specific assume bundle elements
exit early if the check passes, meaning we don't verify other entries.
Replace the early returns with early continues.
This also requires removing some tests that are currently rejected. They will
be added back as part of https://github.com/llvm/llvm-project/pull/128436.
PR: https://github.com/llvm/llvm-project/pull/145586
Also fix the LangRef to match the implementation. This was checking
against the alloca address space size rather than that of the default
address space.
The check was also more permissive than the LangRef. The error
check permitted any size less than the pointer size; follow the
stricter wording of the LangRef.
This patch attaches the range attribute to the setmaxnreg
and fence.proxy.tensormap.* intrinsics. The range checks
are now handled generically in the Verifier, so this patch
removes the per-intrinsic error handling for range checks
from the Verifier.
This patch also adds more tests covering these cases.
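For example, roughly (the bounds shown are illustrative, not necessarily
the exact values used by the patch):

```llvm
; the range is encoded on the argument and verified generically
declare void @llvm.nvvm.setmaxnreg.inc.sync.aligned.u32(i32 range(i32 24, 257))
```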
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
By chance, two things have prevented the autoupgrade path being
exercised much so far:
* LLParser setting the debug-info mode to "old" on seeing intrinsics,
* The test in AutoUpgrade.cpp wanting to upgrade into a "new" debug-info
block.
In practice, this appears to mean this code path hasn't seen the various
invalid inputs that can come its way. This commit does a number of
things:
* Tolerates the various illegal inputs that can be written with
debug-intrinsics, which must be tolerated until the Verifier runs,
* Makes printing of illegal/null DbgRecord fields succeed,
* Localises Verifier errors to the function/block where the error is,
* Makes tests that now see debug records print debug-record errors,
Plus a few new tests for other intrinsic-to-debug-record failure modes
I found. There are also two edge cases:
* Some of the unit tests switch back and forth between intrinsic and
record modes at will; I've deleted coverage and some assertions to
tolerate this as intrinsic support is now Gone (TM),
* In sroa-extract-bits.ll, the order of debug records flips. This is
because the autoupgrader upgrades in the opposite order to the basic
block conversion routines... which doesn't change the record order, but
_does_ change the use list order in Metadata! This should (TM) have no
consequence to the correctness of LLVM, but will change the order of
various records and the order of DWARF record output too.
I tried to reduce this patch to a smaller collection of changes, but
they're all intertwined, sorry.
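For reference, the autoupgrade path in question converts the intrinsic
form into the record form, roughly:

```llvm
; old intrinsic form, now autoupgraded:
call void @llvm.dbg.value(metadata i32 %x, metadata !10, metadata !DIExpression()), !dbg !20
; new debug record form:
#dbg_value(i32 %x, !10, !DIExpression(), !20)
```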
This change is to support target extension types in vectors. The change
allows sized target extension types to opt in to being a valid vector
element.
Allowing target extension types as vector elements will allow backends
to use vector operations such as `insertelement` and `extractelement` on
their target types with minimal changes.
RFC:
https://discourse.llvm.org/t/rfc-supporting-sized-target-extension-types-in-vector/86431
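For example, something like the following becomes legal for a sized
target extension type that opts in (the type name here is hypothetical):

```llvm
define void @f(<4 x target("example.ty")> %v, target("example.ty") %e) {
  %v2 = insertelement <4 x target("example.ty")> %v, target("example.ty") %e, i32 0
  %e2 = extractelement <4 x target("example.ty")> %v2, i32 1
  ret void
}
```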
Add an `alloc-variant-zeroed` function attribute which can be used to
inform the folding of allocation+memset pairs. This addresses
https://github.com/rust-lang/rust/issues/104847, where LLVM does not
know how to perform this transformation for non-C languages.
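A sketch of how a frontend might use it (the function names are
illustrative):

```llvm
; @my_zalloc returns zero-initialized memory, so LLVM may fold
; a call to @my_malloc followed by a zeroing memset into @my_zalloc
declare ptr @my_malloc(i64) "alloc-variant-zeroed"="my_zalloc"
declare ptr @my_zalloc(i64)
```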
Co-authored-by: Jamie <jamie@osec.io>
Before this commit, having a convergence token on a non-convergent call
was considered to be an error.
This commit relaxes this requirement and allows convergence tokens to be
present on non-convergent calls.
When such a token is present, it has no effect, as the underlying call
is non-convergent.
This allows passes like DCE to strip the `convergent` attribute from
functions for which all convergent operations have been stripped. When
this happens, a convergence token can still exist in the call-site,
causing the verifier to complain.
Alternatives have been considered in #134863 and #134844.
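Concretely, IR like the following is now accepted even though `@g` is
not convergent (a sketch):

```llvm
define void @f() convergent {
  %t = call token @llvm.experimental.convergence.entry()
  ; previously a verifier error because @g lacks the convergent
  ; attribute; now allowed, and the bundle simply has no effect
  call void @g() [ "convergencectrl"(token %t) ]
  ret void
}

declare void @g()
```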
While external globals can be unsized, I don't think an unsized
initializer makes sense.
It seems like the backend currently ends up treating this as a zero-size
global. If people want that behavior, they should declare it as such.
We currently accept label arguments to inline asm calls. This support
predates both blockaddresses and callbr and is only covered by one X86
test. Remove it in favor of callbr (or at least blockaddress, though
that cannot guarantee correct codegen, just like using block labels
directly can't).
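The callbr replacement uses the standard asm-goto form:

```llvm
define void @foo() {
entry:
  ; the target block is a label constraint rather than a call argument
  callbr void asm "jmp ${0:l}", "!i"() to label %fallthrough [label %indirect]

fallthrough:
  ret void

indirect:
  ret void
}
```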
I didn't bother implementing bitcode upgrade support for this, but I can
add it if desired.
This PR updates the `Verifier` to enforce that `alloca` instructions on
AMDGPU must be in AS5. This prevents hitting a misleading backend error
like "unable to select FrameIndex," which makes it look like a backend
bug when it's actually an IR-level issue.
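That is, on AMDGPU:

```llvm
; rejected by the verifier: alloca in the default address space
%bad = alloca i32
; accepted: allocas must be in the private address space (AS5)
%ok = alloca i32, addrspace(5)
```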
In #132722 spills of ZT0 were disabled around all SME ABI routines to
avoid a case where ZT0 is spilled before ZA is enabled (resulting in a
crash).
It turns out that the ABI does not promise that routines will preserve
ZT0 (however in practice they do), so generally disabling ZT0 spills for
ABI routines is not correct.
The case where a crash was possible was "aarch64_new_zt0" functions with
ZA disabled on entry and a ZT0 spill around __arm_tpidr2_save. In this
case, ZT0 will be undefined at the call to __arm_tpidr2_save, so this
patch avoids the ZT0 spill by marking the callsite with
"aarch64_zt0_undef". This attribute only applies to callsites and marks
that at the point the call is made ZT0 is not defined, so does not need
preserving.
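A sketch of the resulting callsite:

```llvm
; ZT0 is undefined at this point, so no ZT0 spill/fill is emitted
; around the call
call void @__arm_tpidr2_save() "aarch64_zt0_undef"
```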
Fixes #126417.
Currently, assignment tracking recognizes allocas, stores, and mem
intrinsics as valid instructions to tag with DIAssignID, with allocas
representing the allocation for a variable and the others representing
instructions that may assign to the variable. There are other intrinsics
that can perform these assignments however, and if we transform a store
instruction into one of these intrinsics and correctly transfer the
DIAssignID over, this results in a verifier error. The
AssignmentTrackingAnalysis pass also does not know how to handle these
intrinsics if they are untagged, as it does not know how to extract
assignment information (base address, offset, size) from them.
This patch adds _some_ support for some intrinsics that may perform
assignments: masked store/scatter, and vp store/strided store/scatter.
This patch does not add support for extracting assignment information
from these, as they may store with either non-constant size or to
non-contiguous blocks of memory; instead it adds support for recognizing
untagged stores with "unknown" assignment info, for which we assume that
the memory location of the associated variable should not be used, as we
can't determine which fragments of it should or should not be used.
In principle, it should be possible to handle the more complex cases
mentioned above, but it would require more substantial changes to
AssignmentTrackingAnalysis, and it is mostly only needed as a fallback
if the DIAssignID is not preserved on these alternative stores.
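For example, a masked store can now carry a `DIAssignID` (a sketch; the
store is treated as an assignment with unknown assignment info):

```llvm
call void @llvm.masked.store.v4i32.p0(<4 x i32> %v, ptr %dst, i32 4, <4 x i1> %mask), !DIAssignID !5

!5 = distinct !DIAssignID()
```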
With -fpatchable-function-entry (or the patchable_function_entry
function attribute), we emit records of patchable entry locations to the
__patchable_function_entries section. Add an additional parameter to the
command line option that allows one to specify a different default
section name for the records, and an identical parameter to the function
attribute that allows one to override the section used.
The main use case for this change is the Linux kernel using prefix NOPs
for ftrace, and thus depending on __patchable_function_entries to locate
traceable functions. Functions that are not traceable currently disable
entry NOPs using the function attribute, but this creates a
compatibility issue with -fsanitize=kcfi, which expects all indirectly
callable functions to have a type hash prefix at the same offset from
the function entry.
Adding a section parameter would allow the kernel to distinguish between
traceable and non-traceable functions by adding entry records to
separate sections while maintaining a stable function prefix layout for
all functions. LKML discussion:
https://lore.kernel.org/lkml/Y1QEzk%2FA41PKLEPe@hirez.programming.kicks-ass.net/
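At the IR level this might look roughly as follows (the spelling of the
section attribute here is illustrative, not necessarily the exact name
used by the patch):

```llvm
define void @traced() #0 {
  ret void
}

; hypothetical attribute spelling for the section override
attributes #0 = { "patchable-function-entry"="5" "patchable-function-entry-section"="__patchable_function_entries_traced" }
```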
I initially thought starting with a more narrow definition and later
expanding would make more sense. But as pointed out in review for PR
#129220, this restriction is generating additional unnecessary work.
This patch alters the intrinsic to accept patterns of any type. Future
patches will update LoopIdiomRecognize and PreISelIntrinsicLowering to
take advantage of this. The verifier will complain if an unsized type is
used. I've additionally taken the opportunity to remove a comment from
the LangRef about some bit widths potentially not being supported by the
target. I don't think this is any more true than it is for arbitrary
width loads/stores, which don't carry a similar warning as far as I can
see.
A verifier check ensures that only sized types are used for the pattern.
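For instance, a non-integer pattern type becomes acceptable (a sketch,
assuming the experimental intrinsic name and the usual overloaded-name
mangling):

```llvm
; fill %count elements at %dst with a float pattern
call void @llvm.experimental.memset.pattern.p0.f32.i64(ptr %dst, float %pat, i64 %count, i1 false)
```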
The `llvm.wasm.throw` intrinsic can throw, but it was not invokable. I'm
not sure what the rationale was when it was first written that way, but
I think at least in Emscripten's C++ exception support with the Wasm
port of libunwind, `__builtin_wasm_throw`, which is lowered down to
`llvm.wasm.throw`, is used only within `_Unwind_RaiseException`, which
is a one-liner and thus does not need an `invoke`:
720e97f76d/system/lib/libunwind/src/Unwind-wasm.c (L69)
(`_Unwind_RaiseException` is called by `__cxa_throw`, which is generated
by the `throw` C++ keyword)
But this does not address other direct uses of the builtin in C++,
whose use I'm not sure about but which is not prohibited. Also, other
language frontends may need to use the builtin in functions that have
`try`-`catch`es or destructors.
This makes `llvm.wasm.throw` invokable in the backend. To do that, this
adds a custom lowering routine to `SelectionDAGBuilder::visitInvoke`,
like we did for `llvm.wasm.rethrow`.
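With this change, IR like the following can now be selected (a sketch;
tag 0 is the C++ exception tag):

```llvm
invoke void @llvm.wasm.throw(i32 0, ptr %exn)
        to label %cont unwind label %catch.dispatch
```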
This does not generate `invoke`s for `__builtin_wasm_throw` yet, which
will be done by a follow-up PR.
Addresses #124710.
Check that the input register is an inreg argument to the parent
function. (See the comment in `IntrinsicsAMDGPU.td`.)
This LLVM defect was identified via the AMD Fuzzing project.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Make sure that this intrinsic is being followed by unreachable.
This LLVM defect was identified via the AMD Fuzzing project.
Thanks to @rovka for helping me solve this issue!
This removes some erroneous debug info from tests, which should address
the test failures that showed up when this was previously committed.
This reverts commit 6716ce8b641f0e42e2343e1694ee578b027be0c4.
This came up recently in a nodebug case on CodeView that caused a null
entry in elements and crashed LLVM.
The original clang fix to avoid generating IR like this: 504dd577675e8c85cdc8525990a7c8b517a38a89
Verify that the format is valid and that the type is one of the
expected i32 vectors. Also verify that the vector types used at least
cover the requirements of the corresponding format operand.
Improve the information printed when -memprof-report-hinted-sizes is
enabled. Now print the full context hash computed from the original
profile, similar to what we do when reporting matching statistics. This
will make it easier to correlate with the profile.
Note that the full context hash must be computed at profile match time
and saved in the metadata and summary, because we may trim the context
during matching when it isn't needed for distinguishing hotness.
Similarly, due to the context trimming, we may have more than one full
context id and total size pair per MIB in the metadata and summary,
which now get a list of these pairs.
Remove the old aggregate size from the metadata and summary support.
One other change from the prior support is that we no longer write the
size information into the combined index for the LTO backends, which
don't use it; this reduces unnecessary bloat in distributed index
files.
Relands 7ff3a9acd84654c9ec2939f45ba27f162ae7fbc3 after regenerating the
test case.
Supersedes the draft PR #94992, taking a different approach following
feedback:
* Lower in PreISelIntrinsicLowering
* Don't require that the number of bytes to set is a compile-time
constant
* Define llvm.memset_pattern rather than llvm.memset_pattern.inline
As discussed in the [RFC
thread](https://discourse.llvm.org/t/rfc-introducing-an-llvm-memset-pattern-inline-intrinsic/79496),
the intent is that the intrinsic will be lowered to loops, a sequence of
stores, or libcalls depending on the expected cost and availability of
libcalls on the target. Right now, there's just a single lowering path
that aims to handle all cases. My intent would be to follow up with
additional PRs that add additional optimisations when possible (e.g.
when libcalls are available, when arguments are known to be constant
etc).
This reverts commit 7ff3a9acd84654c9ec2939f45ba27f162ae7fbc3.
Recent scheduling changes mean tests need to be re-generated. Reverting
to green while I do that.
Supersedes the draft PR #94992, taking a different approach following
feedback:
* Lower in PreISelIntrinsicLowering
* Don't require that the number of bytes to set is a compile-time
constant
* Define llvm.memset_pattern rather than llvm.memset_pattern.inline
As discussed in the [RFC
thread](https://discourse.llvm.org/t/rfc-introducing-an-llvm-memset-pattern-inline-intrinsic/79496),
the intent is that the intrinsic will be lowered to loops, a sequence of
stores, or libcalls depending on the expected cost and availability of
libcalls on the target. Right now, there's just a single lowering path
that aims to handle all cases. My intent would be to follow up with
additional PRs that add additional optimisations when possible (e.g.
when libcalls are available, when arguments are known to be constant
etc).
As defined in LangRef, branching on `undef` is undefined behavior.
This PR aims to remove undefined behavior from the tests, as UB tests
break Alive2 and may be the root cause of future optimization breakage.
Here's an Alive2 proof for one of the examples:
https://alive2.llvm.org/ce/z/TncxhP
StructType::setBody is the only mechanism that can potentially create
recursion in the type system. Add a runtime check that it is not
actually used to create recursion.
If the check fails, report an error from LLParser, BitcodeReader and
IRLinker. In all other cases assert that the check succeeds.
In future StructType::setBody will be removed in favor of specifying the
body when the type is created, so any performance hit from this runtime
check will be temporary.
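For example, IR such as the following is now rejected uniformly instead
of each consumer having to cope with it (a sketch):

```llvm
; %rec directly contains itself; giving it this body would create a
; recursive type
%rec = type { i32, %rec }
```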