713 Commits

Author SHA1 Message Date
Diana Picus
ac005e16f6
Reapply "[AMDGPU] Intrinsic for launching whole wave functions" (#153584)
This reverts commit 14cd1339318b16e08c1363ec6896bd7d1e4ae281. The
buildbot failure seems to have been a cmake issue which has been
discussed in more detail in this Discourse post:

https://discourse.llvm.org/t/cmake-doesnt-regenerate-all-tablegen-target-files/87901

If any buildbots fail to select arbitrary intrinsics with this patch,
it's worth considering using clean builds with ccache instead of
incremental builds, as recommended here:

https://llvm.org/docs/HowToAddABuilder.html#:~:text=Use%20CCache%20and%20NOT%20incremental%20builds

The original commit message for this patch:
Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave
functions. This will take as its first argument the callee with the
amdgpu_gfx_whole_wave calling convention, followed by the call
parameters which must match the signature of the callee except for the
first function argument (the i1 original EXEC mask, which doesn't need
to be passed in). Indirect calls are not allowed.

Make direct calls to amdgpu_gfx_whole_wave functions a verifier error.

Tail calls are handled in a future patch.
2025-08-15 10:12:47 +02:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Diana Picus
14cd133931
Revert "[AMDGPU] Intrinsic for launching whole wave functions" (#152286)
Reverts llvm/llvm-project#145859 because it broke a HIP test:
```
[34/59] Building CXX object External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o
FAILED: External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o 
/home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/llvm/bin/clang++ -DNDEBUG  -O3 -DNDEBUG   -w -Werror=date-time --rocm-path=/opt/botworker/llvm/External/hip/rocm-6.3.0 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -xhip -mfma -MD -MT External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o -MF External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o.d -o External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o -c /home/botworker/bbot/clang-hip-vega20/llvm-test-suite/External/HIP/workload/ray-tracing/TheNextWeek/main.cc
fatal error: error in backend: Cannot select: intrinsic %llvm.amdgcn.readfirstlane
```
2025-08-06 12:24:52 +02:00
Diana Picus
0461cd3d1d
[AMDGPU] Intrinsic for launching whole wave functions (#145859)
Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave
functions. This will take as its first argument the callee with the
amdgpu_gfx_whole_wave calling convention, followed by the call
parameters which must match the signature of the callee except for the
first function argument (the i1 original EXEC mask, which doesn't need
to be passed in). Indirect calls are not allowed.

Make direct calls to amdgpu_gfx_whole_wave functions a verifier error.

Unspeakable horrors happen around calls from whole wave functions, the
plan is to improve the handling of caller/callee-saved registers in
a future patch.

Tail calls are also handled in a future patch.
2025-08-06 10:25:53 +02:00
Joel E. Denny
37e03b56b8
Revert "[PGO] Add llvm.loop.estimated_trip_count metadata" (#151585)
Reverts llvm/llvm-project#148758

[As
requested.](https://github.com/llvm/llvm-project/pull/148758#pullrequestreview-3076627201)
2025-07-31 15:56:31 -04:00
Joel E. Denny
f7b65011de
[PGO] Add llvm.loop.estimated_trip_count metadata (#148758)
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.

An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
2025-07-31 12:28:25 -04:00
Changpeng Fang
d6094370cb
AMDGPU: Support v_wmma_f32_16x16x128_f8f6f4 on gfx1250 (#149684)
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2025-07-21 10:09:42 -07:00
Nikita Popov
92c55a315e
[IR] Only allow lifetime.start/end on allocas (#149310)
lifetime.start and lifetime.end are primarily intended for use on
allocas, to enable stack coloring and other liveness optimizations. This
is necessary because all (static) allocas are hoisted into the entry
block, so lifetime markers are the only way to convey the actual
lifetimes.

However, lifetime.start and lifetime.end are currently *allowed* to be
used on non-alloca pointers. We don't actually do this in practice, but
just the mere fact that this is possible breaks the core purpose of the
lifetime markers, which is stack coloring of allocas. Stack coloring can
only work correctly if all lifetime markers for an alloca are
analyzable.

* If a lifetime marker may operate on multiple allocas via a select/phi,
we don't know which lifetime actually starts/ends and handle it
incorrectly (https://github.com/llvm/llvm-project/issues/104776).
* Stack coloring operates on the assumption that all lifetime markers
are visible, and not, for example, hidden behind a function call or
escaped pointer. It's not possible to change this, as part of the
purpose of lifetime markers is that they work even in the presence of
escaped pointers, where simple use analysis is insufficient.

I don't think there is any way to have coherent semantics for lifetime
markers on allocas, while also permitting them on arbitrary pointer
values.

This PR restricts lifetimes to operate on allocas only. As a followup, I
will also drop the size argument, which is superfluous if we always
operate on an alloca. (This change also renders various code handling
lifetime markers on non-alloca dead. I plan to clean up that kind of
code after dropping the size argument as well.)

In practice, I've only found a few places that currently produce
lifetimes on non-allocas:

* CoroEarly replaces the promise alloca with the result of an intrinsic,
which will later be replaced back with an alloca. I think this is the
only place where there is some legitimate loss of functionality, but I
don't think this is particularly important (I don't think we'd expect
the promise in a coroutine to admit useful lifetime optimization.)
* SafeStack moves unsafe allocas onto a separate frame. We can safely
drop lifetimes here, as SafeStack performs its own stack coloring.
* Similar for AddressSanitizer, it also moves allocas into separate
memory.
* LSR sometimes replaces the lifetime argument with a GEP chain of the
alloca (where the offsets ultimately cancel out). This is just
unnecessary. (Fixed separately in
https://github.com/llvm/llvm-project/pull/149492.)
* InferAddrSpaces sometimes makes lifetimes operate on an addrspacecast
of an alloca. I don't think this is necessary.
2025-07-21 15:04:50 +02:00
Diana Picus
20d8398825
[AMDGPU] ISel & PEI for whole wave functions (#145858)
Whole wave functions are functions that will run with a full EXEC mask.
They will not be invoked directly, but instead will be launched by way
of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in
a future patch). These functions are meant as an alternative to the
`llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics.

Whole wave functions will set EXEC to -1 in the prologue and restore the
original value of EXEC in the epilogue. They must have a special first
argument, `i1 %active`, that is going to be mapped to EXEC. They may
have either the default calling convention or amdgpu_gfx. The inactive
lanes need to be preserved for all registers used, active lanes only for
the CSRs.

At the IR level, arguments to a whole wave function (other than
`%active`) contain poison in their inactive lanes. Likewise, the return
value for the inactive lanes is poison.

This patch contains the following work:
* 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN
  used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return
  a SReg_1 representing `%active`, which needs to be passed into
  SI_WHOLE_WAVE_FUNC_RETURN.
* SelectionDAG support for generating these 2 new pseudos and the
  special handling of %active. Since the return may be in a different
  basic block, it's difficult to add the virtual reg for %active to
  SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF
  which is later replaced via a custom inserter.
* Expansion of the 2 pseudos during prolog/epilog insertion. PEI also
  marks any used VGPRs as WWM registers, which are then spilled and
  restored with the usual logic.

Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic
and a lot of optimization work (especially in order to reduce spills
around function calls).

---------

Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
Co-authored-by: Shilei Tian <i@tianshilei.me>
2025-07-21 10:39:09 +02:00
Prabhu Rajasekaran
921c6dbeca
[llvm] Introduce callee_type metadata
Introduce `callee_type` metadata which will be attached to the indirect
call instructions.

The `callee_type` metadata will be used to generate `.callgraph` section
described in this RFC:
https://lists.llvm.org/pipermail/llvm-dev/2021-July/151739.html

Reviewers: morehouse, petrhosek, nikic, ilovepi

Reviewed By: nikic, ilovepi

Pull Request: https://github.com/llvm/llvm-project/pull/87573
2025-07-18 14:40:54 -07:00
Florian Hahn
cad62df49a
[Loads] Support dereferenceable assumption with variable size. (#128436)
Update isDereferenceableAndAlignedPointer to make use of dereferenceable
assumptions with variable sizes via SCEV.

To do so, factor out the logic to check via an assumption to a helper,
and use SE to check if the access size is less than the dereferenceable
size.

PR: https://github.com/llvm/llvm-project/pull/128436
2025-07-14 08:17:33 +01:00
Craig Topper
6ee87759e3
[RISCV][IR] Implement verifier check for llvm.experimental.vp.splice immediate. (#147458)
This applies the same check as llvm.vector.splice which checks that the immediate is in the range [-VL, VL-1] where VL is the minimum vector length. If vscale_range is available, the lower bound is used to increase the known minimum vector length for this check. This ensures the immediate is in range for any possible value of vscale that satisfies the vscale_range.
2025-07-08 13:54:11 -07:00
Antonio Frighetto
f1cc0b607b [IR] Introduce dead_on_return attribute
Add `dead_on_return` attribute, which is meant to be taken advantage
by the frontend, and states that the memory pointed to by the argument
is dead upon function return. As with `byval`, it is supposed to be
used for passing aggregates by value. The difference lies in the ABI:
`byval` implies that the pointer is explicitly passed as argument to
the callee (during codegen the copy is emitted as per byval contract),
whereas a `dead_on_return`-marked argument implies that the copy
already exists in the IR, is located at a specific stack offset within
the caller, and this memory will not be read further by the caller upon
callee return – or otherwise poison, if read before being written.

RFC: https://discourse.llvm.org/t/rfc-add-dead-on-return-attribute/86871.
2025-07-02 09:29:36 +02:00
Mircea Trofin
d2500e639b
[pgo] add means to specify "unknown" MD_prof (#145578)
This PR is part of https://discourse.llvm.org/t/rfc-profile-information-propagation-unittesting/73595

In a slight departure from the RFC, instead of a brand-new `MD_prof_unknown` kind, this adds a first operand to `MD_prof` metadata. This makes it easy to replace with valid metadata (only one `MD_prof`), otherwise sites inserting valid `MD_prof` would also have to check to remove the `unknown` one.

The patch just introduces the notion and fixes the verifier accordingly. Existing APIs working (esp. reading) `MD_prof` will be updated subsequently.
2025-06-30 16:57:11 -07:00
Mircea Trofin
46628718c0
[IR][PGO] Verify the structure of VP metadata. (#145584) 2025-06-30 12:31:19 -07:00
Mircea Trofin
62f8281e08
[IR][PGO] Verify invalid MD_prof metadata on instructions (#145576)
This PR places the validation of `MD_prof` instruction metadata in the Verifier.
2025-06-25 13:10:43 -07:00
Florian Hahn
17c5c19902
[Verifier] Always verify all assume bundles. (#145586)
For some reason, some of the checks for specific assumbe bundle elements
exit early if the check pass, meaning we don't verify other entries.
Replace the early returns with early continues.

This also requires removing some tests that are currently rejected. They will
be added back as part of https://github.com/llvm/llvm-project/pull/128436.

PR: https://github.com/llvm/llvm-project/pull/145586
2025-06-25 09:31:02 +01:00
Florian Hahn
3187d4da24
[Verifier] Add additional tests for dereferenceable assumptions. 2025-06-24 20:45:21 +01:00
Matt Arsenault
f0d898f36b
DAG: Move get_dynamic_area_offset type check to IR verifier (#145268)
Also fix the LangRef to match the implementation. This was checking
against the alloca address space size rather than the default address
space.

The check was also more permissive than the LangRef. The error
check permitted any size less than the pointer size; follow the
stricter wording of the LangRef.
2025-06-24 11:11:52 +09:00
Durgadoss R
bfee625821
[NVPTX] Attach Range attr to setmaxnreg and fence intrinsics (#144120)
This patch attaches the range attribute to the setmaxnreg
and fence.proxy.tensormap.* intrinsics. The range checks
are now handled generically in the Verifier. So, this patch
removes the per-intrinsic error-handling for range-checks
from the Verifier.

This patch also adds more coverage tests for these cases.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-06-19 07:49:08 +05:30
Jeremy Morse
3d7aa961ac
[DebugInfo][RemoveDIs] Use autoupgrader to convert old debug-info (#143452)
By chance, two things have prevented the autoupgrade path being
exercised much so far:
 * LLParser setting the debug-info mode to "old" on seeing intrinsics,
* The test in AutoUpgrade.cpp wanting to upgrade into a "new" debug-info
block.

In practice, this appears to mean this code path hasn't seen the various
invalid inputs that can come its way. This commit does a number of
things:
* Tolerates the various illegal inputs that can be written with
debug-intrinsics, and that must be tolerated until the Verifier runs,
 * Printing illegal/null DbgRecord fields must succeed,
* Verifier errors need to localise the function/block where the error
is,
 * Tests that now see debug records will print debug-record errors,

Plus a few new tests for other intrinsic-to-debug-record failures modes
I found. There are also two edge cases:
* Some of the unit tests switch back and forth between intrinsic and
record modes at will; I've deleted coverage and some assertions to
tolerate this as intrinsic support is now Gone (TM),
* In sroa-extract-bits.ll, the order of debug records flips. This is
because the autoupgrader upgrades in the opposite order to the basic
block conversion routines... which doesn't change the record order, but
_does_ change the use list order in Metadata! This should (TM) have no
consequence to the correctness of LLVM, but will change the order of
various records and the order of DWARF record output too.

I tried to reduce this patch to a smaller collection of changes, but
they're all intertwined, sorry.
2025-06-11 13:56:30 +01:00
arysef
78af498035
[LLVM][IR] Support target extension types in vectors (#140630)
This change is to support target extension types in vectors. The change
allows sized target extension types to opt-in to being a valid vector
element.
Allowing target extension types as vector elements will allow backends
to use vector operations such as `insertelement` and `extractelement` on
their target types with minimal changes.

RFC:
https://discourse.llvm.org/t/rfc-supporting-sized-target-extension-types-in-vector/86431
2025-06-09 12:03:15 -07:00
clubby789
c7c79d2590
[IR][DSE] Support non-malloc functions in malloc+memset->calloc fold (#138299)
Add a `alloc-variant-zeroed` function attribute which can be used to
inform folding allocation+memset. This addresses
https://github.com/rust-lang/rust/issues/104847, where LLVM does not
know how to perform this transformation for non-C languages.

Co-authored-by: Jamie <jamie@osec.io>
2025-06-04 09:35:20 +02:00
Nathan Gauër
df344285e2
[IR] Relax convergence requirements on call (#135794)
Before this commit, having a convergence token on a non-convergent call
was considered to be an error.
This commit relaxes this requirement and allows convergence tokens to be
present on non-convergent calls.

When such token is present, they have no effect as the underlying call
is non-convergent.
This allows passes like DCE to strip `convergent` attribute from
functions for which all convergent operations have been stripped. When
this happens, a convergence token can still exist in the call-site,
causing the verifier to complain.

Alternatives have been considered in #134863 and #134844.
2025-05-02 08:17:58 +00:00
Nikita Popov
a8ec6e8788
[IR] Require that global value initializers are sized (#137358)
While external globals can be unsized, I don't think an unsized
initializer makes sense.

It seems like the backend currently ends up treating this as a zero-size
global. If people want that behavior, they should declare it as such.
2025-05-02 09:52:39 +02:00
Matt Arsenault
9494464b13
Verifier: Move test from Assembler to right place and avoid grep (#138113) 2025-05-01 13:44:49 +02:00
Nikita Popov
6feb4a8ef4
[IR] Don't allow values of opaque type (#137625)
Consider opaque types as non-first-class types, i.e. do not allow SSA
values to have opaque type.
2025-04-30 15:01:00 +02:00
Nikita Popov
38cb7d5e75
[IR] Don't allow label arguments (#137799)
We currently accept label arguments to inline asm calls. This support
predates both blockaddresses and callbr and is only covered by one X86
test. Remove it in favor of callbr (or at least blockaddress, though
that cannot guarantee correct codegen, just like using block labels
directly can't).

I didn't bother implementing bitcode upgrade support for this, but I can
add it if desired.
2025-04-30 09:11:36 +02:00
Shilei Tian
3bc125490a
[AMDGPU][Verifier] Check address space of alloca instruction (#135820)
This PR updates the `Verifier` to enforce that `alloca` instructions on
AMDGPU must be in AS5. This prevents hitting a misleading backend error
like "unable to select FrameIndex," which makes it look like a backend
bug when it's actually an IR-level issue.
2025-04-26 00:54:00 -04:00
Benjamin Maxwell
8c7a2ce01a
[AArch64][SME] Allow spills of ZT0 around SME ABI routines again (#136726)
In #132722 spills of ZT0 were disabled around all SME ABI routines to
avoid a case where ZT0 is spilled before ZA is enabled (resulting in a
crash).

It turns out that the ABI does not promise that routines will preserve
ZT0 (however in practice they do), so generally disabling ZT0 spills for
ABI routines is not correct.

The case where a crash was possible was "aarch64_new_zt0" functions with
ZA disabled on entry and a ZT0 spill around __arm_tpidr2_save. In this
case, ZT0 will be undefined at the call to __arm_tpidr2_save, so this
patch avoids the ZT0 spill by marking the callsite with
"aarch64_zt0_undef". This attribute only applies to callsites and marks
that at the point the call is made ZT0 is not defined, so does not need
preserving.
2025-04-25 13:33:09 +01:00
Stephen Tozer
928c33354e
[DebugInfo] Handle additional types of stores in assignment tracking (#129070)
Fixes #126417.

Currently, assignment tracking recognizes allocas, stores, and mem
intrinsics as valid instructions to tag with DIAssignID, with allocas
representing the allocation for a variable and the others representing
instructions that may assign to the variable. There are other intrinsics
that can perform these assignments however, and if we transform a store
instruction into one of these intrinsics and correctly transfer the
DIAssignID over, this results in a verifier error. The
AssignmentTrackingAnalysis pass also does not know how to handle these
intrinsics if they are untagged, as it does not know how to extract
assignment information (base address, offset, size) from them.

This patch adds _some_ support for some intrinsics that may perform
assignments: masked store/scatter, and vp store/strided store/scatter.
This patch does not add support for extracting assignment information
from these, as they may store with either non-constant size or to
non-contiguous blocks of memory; instead it adds support for recognizing
untagged stores with "unknown" assignment info, for which we assume that
the memory location of the associated variable should not be used, as we
can't determine which fragments of it should or should not be used.

In principle, it should be possible to handle the more complex cases
mentioned above, but it would require more substantial changes to
AssignmentTrackingAnalysis, and it is mostly only needed as a fallback
if the DIAssignID is not preserved on these alternative stores.
2025-04-22 17:14:25 +01:00
YLChenZ
b9e11eade7
[llvm][ir]: fix llc crashes on function definitions with label parameters (#136285)
Closes #136144.

After the patch:
```llvm
; label-crash.ll
define void @invalid_arg_type(i32 %0) {
1:
  call void @foo(label %1)
  ret void
}

declare void @foo(label)
```
```
lambda@ubuntu22:~/test$ llc -o out.s label-crash.ll 
Function argument cannot be of label type!
label %0
ptr @foo
llc: error: 'label-crash.ll': input module cannot be verified
```
2025-04-19 09:32:40 +08:00
Nikita Popov
44e32a263a [Verifier] Fix intrinsic signatures in immarg tests (NFC) 2025-04-14 17:24:50 +02:00
Shilei Tian
a45b133d40
[AMDGPU][Verifier] Mark calls to entry functions as invalid in the IR verifier (#134910) 2025-04-11 15:32:37 -04:00
Sami Tolvanen
acc6bcdc50
Support alternative sections for patchable function entries (#131230)
With -fpatchable-function-entry (or the patchable_function_entry
function attribute), we emit records of patchable entry locations to the
__patchable_function_entries section. Add an additional parameter to the
command line option that allows one to specify a different default
section name for the records, and an identical parameter to the function
attribute that allows one to override the section used.

The main use case for this change is the Linux kernel using prefix NOPs
for ftrace, and thus depending on__patchable_function_entries to locate
traceable functions. Functions that are not traceable currently disable
entry NOPs using the function attribute, but this creates a
compatibility issue with -fsanitize=kcfi, which expects all indirectly
callable functions to have a type hash prefix at the same offset from
the function entry.

Adding a section parameter would allow the kernel to distinguish between
traceable and non-traceable functions by adding entry records to
separate sections while maintaining a stable function prefix layout for
all functions. LKML discussion:

https://lore.kernel.org/lkml/Y1QEzk%2FA41PKLEPe@hirez.programming.kicks-ass.net/
2025-04-02 21:53:55 +00:00
Alex Bradbury
be0a3b223a
[IR] Allow llvm.experimental.memset.pattern to take any sized type as the pattern argument (#132026)
I initially thought starting with a more narrow definition and later
expanding would make more sense. But as pointed out in review for PR
#129220, this restriction is generating additional unnecessary work.

This patch alters the intrinsic to accept patterns of any type. Future
patches will update LoopIdiomRecognize and PreISelIntrinsicLowering to
take advantage of this. The verifier will complain if an unsized type is
used. I've additionally taken the opportunity to remove a comment from
the LangRef about some bit widths potentially not being supported by the
target. I don't think this is any more true than it is for arbitrary
width loads/stores which don't carry a similar warning that I can see.

A verifier check ensures that only sized types are used for the pattern.
2025-03-19 14:17:42 +00:00
Heejin Ahn
d2d469eb79
[WebAssembly] Make llvm.wasm.throw invokable (#128104)
`llvm.wasm.throw` intrinsic can throw but it was not invokable. Not sure
what the rationale was when it was first written that way, but I think
at least in Emscripten's C++ exception support with the Wasm port of
libunwind, `__builtin_wasm_throw`, which is lowered down to
`llvm.wasm.rethrow`, is used only within `_Unwind_RaiseException`, which
is an one-liner and thus does not need an `invoke`:
720e97f76d/system/lib/libunwind/src/Unwind-wasm.c (L69)
(`_Unwind_RaiseException` is called by `__cxa_throw`, which is generated
by the `throw` C++ keyword)

But this does not address other direct uses of the builtin in C++, whose
use I'm not sure about but is not prohibited. Also other language
frontends may need to use the builtin in different functions, which has
`try`-`catch`es or destructors.

This makes `llvm.wasm.throw` invokable in the backend. To do that, this
adds a custom lowering routine to `SelectionDAGBuilder::visitInvoke`,
like we did for `llvm.wasm.rethrow`.

This does not generate `invoke`s for `__builtin_wasm_throw` yet, which
will be done by a follow-up PR.

Addresses #124710.
2025-02-25 09:53:01 -08:00
Robert Imschweiler
cafad2b75a
[AMDGPU] Add verification for amdgcn.init.exec.from.input (#128172)
Check that the input register is an inreg argument to the parent
function. (See the comment in `IntrinsicsAMDGPU.td`.)

This LLVM defect was identified via the AMD Fuzzing project.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-02-23 10:42:31 -05:00
Robert Imschweiler
fb25216209
[AMDGPU] Enhance verification of amdgcn.cs.chain intrinsic (#128162)
Make sure that this intrinsic is being followed by unreachable.

This LLVM defect was identified via the AMD Fuzzing project.

Thanks to @rovka for helping me solve this issue!
2025-02-22 00:24:18 +07:00
David Blaikie
42043c423f Reapply "Verifier: Add check for DICompositeType elements being null"
This remove some erroneous debug info from tests that should address the
test failures that showed up when the this was previously committed.

This reverts commit 6716ce8b641f0e42e2343e1694ee578b027be0c4.
2025-01-23 22:29:30 +00:00
David Blaikie
6716ce8b64 Revert "Verifier: Add check for DICompositeType elements being null"
Asserts on various tests/buildbots, at least one example is
DebugInfo/X86/set.ll

This reverts commit 2dc5682dacab2dbb52a771746fdede0e938fc6e9.
2025-01-17 19:36:35 +00:00
David Blaikie
2dc5682dac Verifier: Add check for DICompositeType elements being null
Came up recently with some nodebug case on codeview, that caused a null
entry in elements and crashed LLVM.

Original clang fix to avoid generating IR like this: 504dd577675e8c85cdc8525990a7c8b517a38a89
2025-01-17 19:18:25 +00:00
Sander de Smalen
2ce168baed
[AArch64] SME implementation for agnostic-ZA functions (#120150)
This implements the lowering of calls from agnostic-ZA functions to
non-agnostic-ZA functions, using the ABI routines
`__arm_sme_state_size`, `__arm_sme_save` and `__arm_sme_restore`.

This implements the proposal described in the following PRs:
* https://github.com/ARM-software/acle/pull/336
* https://github.com/ARM-software/abi-aa/pull/264
2024-12-23 19:10:21 +00:00
Matt Arsenault
a05a1d6eef
AMDGPU: Add basic verification for mfma scale intrinsics (#117048)
Verify the format is valid and the type is one of the expected
i32 vectors. Verify the used vector types at least cover the
requirements of the corresponding format operand.
2024-11-22 12:13:21 -08:00
Teresa Johnson
9513f2fdf2
[MemProf] Print full context hash when reporting hinted bytes (#114465)
Improve the information printed when -memprof-report-hinted-sizes is
enabled. Now print the full context hash computed from the original
profile, similar to what we do when reporting matching statistics. This
will make it easier to correlate with the profile.

Note that the full context hash must be computed at profile match time
and saved in the metadata and summary, because we may trim the context
during matching when it isn't needed for distinguishing hotness.
Similarly, due to the context trimming, we may have more than one full
context id and total size pair per MIB in the metadata and summary,
which now get a list of these pairs.

Remove the old aggregate size from the metadata and summary support.
One other change from the prior support is that we no longer write the
size information into the combined index for the LTO backends, which
don't use this information, which reduces unnecessary bloat in
distributed index files.
2024-11-15 08:24:44 -08:00
Alex Bradbury
298127dcbe Reapply [IR] Initial introduction of llvm.experimental.memset_pattern (#97583)
Relands 7ff3a9acd84654c9ec2939f45ba27f162ae7fbc3 after regenerating the
test case.

Supersedes the draft PR #94992, taking a different approach following
feedback:
* Lower in PreISelIntrinsicLowering
* Don't require that the number of bytes to set is a compile-time
constant
* Define llvm.memset_pattern rather than llvm.memset_pattern.inline

As discussed in the [RFC
thread](https://discourse.llvm.org/t/rfc-introducing-an-llvm-memset-pattern-inline-intrinsic/79496),
the intent is that the intrinsic will be lowered to loops, a sequence of
stores, or libcalls depending on the expected cost and availability of
libcalls on the target. Right now, there's just a single lowering path
that aims to handle all cases. My intent would be to follow up with
additional PRs that add additional optimisations when possible (e.g.
when libcalls are available, when arguments are known to be constant
etc).
2024-11-15 15:21:39 +00:00
Alex Bradbury
0fb8fac5d6 Revert "[IR] Initial introduction of llvm.experimental.memset_pattern (#97583)"
This reverts commit 7ff3a9acd84654c9ec2939f45ba27f162ae7fbc3.

Recent scheduling changes means tests need to be re-generated. Reverting
to green while I do that.
2024-11-15 14:48:32 +00:00
Alex Bradbury
7ff3a9acd8
[IR] Initial introduction of llvm.experimental.memset_pattern (#97583)
Supersedes the draft PR #94992, taking a different approach following
feedback:
* Lower in PreISelIntrinsicLowering
* Don't require that the number of bytes to set is a compile-time
constant
* Define llvm.memset_pattern rather than llvm.memset_pattern.inline

As discussed in the [RFC
thread](https://discourse.llvm.org/t/rfc-introducing-an-llvm-memset-pattern-inline-intrinsic/79496),
the intent is that the intrinsic will be lowered to loops, a sequence of
stores, or libcalls depending on the expected cost and availability of
libcalls on the target. Right now, there's just a single lowering path
that aims to handle all cases. My intent would be to follow up with
additional PRs that add additional optimisations when possible (e.g.
when libcalls are available, when arguments are known to be constant
etc).
2024-11-15 14:07:46 +00:00
Lee Wei
1469d82e1c
Remove br i1 undef from some regression tests [NFC] (#115130)
As defined in LangRef, branching on `undef` is undefined behavior.
This PR aims to remove undefined behavior from tests. As UB tests break
Alive2 and may be the root cause of breaking future optimizations.

Here's an Alive2 proof for one of the examples:
https://alive2.llvm.org/ce/z/TncxhP
2024-11-07 08:11:15 +00:00
Jay Foad
4831e0aa88
[IR] Disallow recursive types (#114799)
StructType::setBody is the only mechanism that can potentially create
recursion in the type system. Add a runtime check that it is not
actually used to create recursion.

If the check fails, report an error from LLParser, BitcodeReader and
IRLinker. In all other cases assert that the check succeeds.

In future StructType::setBody will be removed in favor of specifying the
body when the type is created, so any performance hit from this runtime
check will be temporary.
2024-11-05 09:41:10 +00:00