Also starts pruning out these calls if the exception model is
forced to none.
I worked backwards from the logic in addPassesToHandleExceptions
and the pass content. There appears to be some tolerance
for mixing and matching exception modes inside of a single module.
As far as I can tell _Unwind_CallPersonality is only relevant for
wasm, so just add it there.
As usual, the arm64ec case makes things difficult and is
missing test coverage. The set of calls in list form is necessary
to use foreach for the duplication, but in every other context a
dag is more convenient. You cannot use foreach over a dag, and I
haven't found a way to flatten a dag into a list.
This removes the last manual setLibcallImpl call in generic code.
There are several libcall choices for MUL_I64 which depend on the
subtarget, but this is the base case. The manual custom ISelLowering
is still overriding the decision until we have a way to control
lowering choices, but we can still get the calling convention
set for now.
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};
We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
…210)"
This reverts commit 9a14b1d254a43dc0d4445c3ffa3d393bca007ba3.
Revert "RuntimeLibcalls: Return StringRef for libcall names (#153209)"
This reverts commit cb1228fbd535b8f9fe78505a15292b0ba23b17de.
Revert "TableGen: Emit statically generated hash table for runtime
libcalls (#150192)"
This reverts commit 769a9058c8d04fc920994f6a5bbb03c8a4fbcd05.
Reverted three changes because of a CMake error while building llvm-nm
as reported in the following PR:
https://github.com/llvm/llvm-project/pull/150192#issuecomment-3192223073
This reverts commit 14cd1339318b16e08c1363ec6896bd7d1e4ae281. The
buildbot failure seems to have been a cmake issue which has been
discussed in more detail in this Discourse post:
https://discourse.llvm.org/t/cmake-doesnt-regenerate-all-tablegen-target-files/87901
If any buildbots fail to select arbitrary intrinsics with this patch,
it's worth considering using clean builds with ccache instead of
incremental builds, as recommended here:
https://llvm.org/docs/HowToAddABuilder.html#:~:text=Use%20CCache%20and%20NOT%20incremental%20builds
The original commit message for this patch:
Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave
functions. This will take as its first argument the callee with the
amdgpu_gfx_whole_wave calling convention, followed by the call
parameters which must match the signature of the callee except for the
first function argument (the i1 original EXEC mask, which doesn't need
to be passed in). Indirect calls are not allowed.
Make direct calls to amdgpu_gfx_whole_wave functions a verifier error.
Tail calls are handled in a future patch.
The logic isn’t instrumentation-specific, and the refactoring allows users avoid a dependency on `Instrumentation` and just take one on `ProfileData` (which a fairly low-level dependency)
a96121089b9c94e08c6632f91f2dffc73c0ffa28 reverted a change
to use a binary search on the string name table because it
was too slow. This replaces it with a static string hash
table based on the known set of libcall names. Microbenchmarking
shows this is similarly fast to using DenseMap. It's possibly
slightly slower than using StringSet, though these aren't an
exact comparison. This also saves on the one time use construction
of the map, so it could be better in practice.
This search isn't simple set check, since it does find the
range of possible matches with the same name. There's also
an additional check for whether the current target supports
the name. The runtime constructed set doesn't require this,
since it only adds the symbols live for the target.
Followed algorithm from this post
http://0x80.pl/notesen/2023-04-30-lookup-in-strings.html
I'm also thinking the 2 special case global symbols should
just be added to RuntimeLibcalls. There are also other global
references emitted in the backend that aren't tracked; we probably
should just use this as a centralized database for all compiler
selected symbols.
This change allows globals to have multiple metadata attached. The
GlobalSetMetadata function only allows only one and is clobbered if
more metadata is attempted to be added. The addDebugInfo
function calls addMetadata. This is needed because some languages have
global structs containing lots of compiler-generated globals.
The profiling - related metadata information for the hoisted conditional branch should be copied from the original branch, not from the current terminator of the block it's hoisted to.
The patch adds a way to disable the fix just so we can do an ablation test, after which the flag will be removed. The same flag will be reused for other similar fixes.
(This was identified through `profcheck` (see Issue #147390), and this PR addresses most of the test failures (when running under profcheck) under `Transforms/LICM`.)
This introduces a new `ptrtoaddr` instruction which is similar to
`ptrtoint` but has two differences:
1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance
2) `ptrtoaddr` only extracts (and then extends/truncates) the low
index-width bits of the pointer
For most architectures, difference 2) does not matter since index (address)
width and pointer representation width are the same, but this does make a
difference for architectures that have pointers that aren't just plain
integer addresses such as AMDGPU fat pointers or CHERI capabilities.
This commit introduces textual and bitcode IR support as well as basic code
generation, but optimization passes do not handle the new instruction yet
so it may result in worse code than using ptrtoint. Follow-up changes will
update capture tracking, etc. for the new instruction.
RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54
Reviewed By: nikic
Pull Request: https://github.com/llvm/llvm-project/pull/139357
The CMake flag has been on by default for a month without any issues.
This makes the feature support in LLVM unconditional (but does not
enable the feature by default).
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave
functions. This will take as its first argument the callee with the
amdgpu_gfx_whole_wave calling convention, followed by the call
parameters which must match the signature of the callee except for the
first function argument (the i1 original EXEC mask, which doesn't need
to be passed in). Indirect calls are not allowed.
Make direct calls to amdgpu_gfx_whole_wave functions a verifier error.
Unspeakable horrors happen around calls from whole wave functions, the
plan is to improve the handling of caller/callee-saved registers in
a future patch.
Tail calls are also handled in a future patch.
While testing Arm64EC, I observed that LLVM crashes when an empty
function name is used. My original fix in #151409 was to raise an error,
but this change now handles the empty name via
`Mangler::getNameWithPrefix` (which assigns a name to the function).
To get this working, I had to create the `Mangler` in
`TargetLoweringObjectFile` early so it would be available to Arm64EC's
lowering. There's no reason why `Mangler` is only created when
`Initialize` is called (or re-created if it exists), and so I moved
creation to the constructor and switched the raw pointer for a
`unique_ptr` to avoid the explicit `delete` in the destructor.
This slightly relaxes the invariant established in #149310, by also
allowing the lifetime argument to be poison. This is to support the
typical pattern of RAUWing with poison when removing an instruction.
It's worth noting that this does not require any conservative
assumptions, lifetimes with poison arguments can simply be skipped.
Fixes https://github.com/llvm/llvm-project/issues/151119.
I'm assuming this was the set of targets that were relevant
for sjlj handling. Just take the raw exception setting instead,
and assume it makes sense for the target.
These are already the default calls set for these conversions, so
they should not require explicit setting. The non-default cases are
currently overridden in ARMISelLowering. Just delete this until
the list of calls and lowering decisions are separated.
This was added back in 6402ad27c01c9503a12d41d7e40646cf0d1f919f. It
appears to not be relevant for AArch64, where calls appear to never
be used for these. It also appears to not be relevant for x86, where
the default calls seem to always end up used anyway.
Hack in the default setting so it's consistently generated like
the other cases. Maintain a list of targets where this applies.
The alternative would require new infrastructure to sort the system
library initialization in some way.
I wanted the unhandled target case to be treated as a fatal
error, but it turns out there's a hack in IRSymtab using
RuntimeLibcalls, which will fail out in many tests that
do not have a triple set. Many of the failures are simply
running llvm-as with no triple, which probably should not
depend on knowing an accurate set of calls.
std::make_optional<T> is a lot like std::make_unique<T> in that it
performs perfect forwarding of arguments for T's constructor. As a
result, we don't have to repeat type names twice.
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.
An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
This PR adds a new interface to IRBuilder called CreateVectorInterleave,
which can be used to create vector.interleave intrinsics of factors 2-8.
For convenience I have also moved getInterleaveIntrinsicID and
getDeinterleaveIntrinsicID from VectorUtils.cpp to Intrinsics.cpp where
it can be used by IRBuilder.
Currently __nv_fast_tanhf() in libdevice maps to an nvvm intrinsic that
has not been upstreamed, which is causing issues when using the NVPTX
backend from upstream. Instead of upstreaming the intrinsic, we can
instead use the existing Intrinsic::tanh with the afn flag. This change
adds NVPTX backend support for ISD::TANH, adds auto-upgrade for the old
tanh_approx intrinsic to @llvm.tanh.f32 with afn flag so that libdevice
works properly upstream, and adds a basic codegen test and a case to the
auto-upgrade test.
These were caching pointers to memory owned by LLVMContext and can
outlive the LLVMContext. The LLVMContext already caches pointer types so
we shouldn't need any caching here.
For the common case where we don't have bit width changing address
space casts, we can directly call accumulateConstantOffset() on the
original Offset. Skip the bit width reconciliation logic in that
case.
Don't modify ClassToPassName map unless ClassName is found. Instead,
just return empty StringRef if there is no matching entry. This will
prevent possible dangling references in ClassToPassName map in case of
ClassName being freed.
See https://github.com/llvm/llvm-project/pull/145059/files#r2219763671
for more context.