This commit follows up 0191307b by auto-upgrading !"align" metadata on
return values to stackalign. This allows us to remove all logic to check
the metadata from NVPTXUtilities.
After 3c8c2914e067e132af951f70d2b3577fe049e19a the lowering of 64-bit
funnel shifts has been improved to the point where this intrinsic is no
longer needed.
This commit adds a function for creating fma intrinsic calls to the IRBuilder. If the "IsFPConstrained" flag of the builder is set,
the function creates a call to "experimental.constrained.fma" instead of "llvm.fma" .
To support the creation of the constrained intrinsic, a function "CreateConstrainedFPIntrinsic" is introduced.
Replace some more nvvm.annotations with function attributes,
auto-upgrading the annotations as needed. These new attributes will be
more idiomatic and compile-time efficient than the annotations.
- !"maxntid[xyz]" -> "nvvm.maxntid"
- !"reqntid[xyz]" -> "nvvm.reqntid"
- !"cluster_dim_[xyz]" -> "nvvm.cluster_dim"
Replace some more nvvm.annotations with function attributes,
auto-upgrading the annotations as needed. These new attributes will be
more idiomatic and compile-time efficient than the annotations.
- !"maxclusterrank" / !"cluster_max_blocks" -> "nvvm.maxclusterrank"
- !"minctasm" -> "nvvm.minctasm"
- !"maxnreg" -> "nvvm.maxnreg"
Add a new AutoUpgrade function to convert some legacy nvvm.annotations
metadata to function level attributes. These attributes are quicker to
look-up so improve compile time and are more idiomatic than using
metadata which should not include required information that changes the
meaning of the program.
Currently supported annotations are:
- !"kernel" -> ptx_kernel calling convention
- !"align" -> alignstack parameter attributes (return not yet supported)
This started out as trying to combine bf16 fpround to BFCVT2
instructions, but ended up removing the aarch64.neon.nfcvt intrinsics in
favour of generating fpround instructions directly. This simplifies the
patterns and can lead to other optimizations. The BFCVT2 instruction is
adjusted to makes sure the types are valid, and a bfcvt2 is now
generated in more place. The old intrinsics are auto-upgraded to fptrunc
instructions too.
Clang [defaults to aligning `__int128_t` to 16 bytes], while LLVM
`datalayout` strings [default to aligning `i128` to 8 bytes]. Wasm is
currently using the defaults for both, so it's inconsistent. Fix this by
adding `-i128:128` to Wasm's `datalayout` string so that it aligns
`i128` to 16 bytes too.
This is similar to
[llvm/llvm-project@dbad963](dbad963a69)
for SPARC.
This fixesrust-lang/rust#133991; see that issue for further discussion.
[defaults to aligning `__int128_t` to 16 bytes]:
f8b4182f07/clang/lib/Basic/TargetInfo.cpp (L77)
[default to aligning `i128` to 8 bytes]:
https://llvm.org/docs/LangRef.html#langref-datalayout
As discussed in #112738, it may be better to have an intrinsic to represent vector element extracts based on mask bits. This intrinsic is for the case of extracting the last active element, if any, or a default value if the mask is all-false.
The target-agnostic SelectionDAG lowering is similar to the IR in #106560.
Remove these intrinsics which can be better represented by load
instructions with `!invariant.load` metadata:
- llvm.nvvm.ldg.global.i
- llvm.nvvm.ldg.global.f
- llvm.nvvm.ldg.global.p
In a variety of places we change the bitwidth of a parameter but don't
update the attributes.
The issue in this case is from the `range` attribute when inlining
`__memset_chk`. `optimizeMemSetChk` will replace an `i32` with an
`i8`, and if the `i32` had a `range` attr assosiated it will cause an
error.
Fixes#112633
MSVC has a set of qualifiers to allow using 32-bit signed/unsigned
pointers when building 64-bit targets. This is useful for WoW code
(i.e., the part of Windows that handles running 32-bit application on a
64-bit OS). Currently this is supported on x64 using the 270, 271 and
272 address spaces, but does not work for AArch64 at all.
This change adds the same 270, 271 and 272 address spaces to AArch64 and
adjusts the data layout string accordingly. Clang will generate the
correct address space casts, but these will currently be ignored until
the AArch64 backend is updated to handle them.
Partially fixes#62536
This is a resurrected version of <https://reviews.llvm.org/D158857>
(originally created by @a_vorobev) - I've cleaned it up a little, fixed
the rest of the tests and added to auto-upgrade for the data layout.
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
Affected intrinsics:
llvm.aarch64.sve.fcvt.bf16f32
llvm.aarch64.sve.fcvtnt.bf16f32
The named intrinsics took a predicate based on the smallest element type
when it should be based on the largest. The intrinsics have been replace
by v2 equivalents and affected code ported to use them.
Patch includes changes to getSVEPredicateBitCast() that ensure the
generated code for the auto-upgraded old intrinsics is unchanged.
It turns out that we cannot rely on the presence of `-i64:64` as a
position reference when adding the `-i128:128` datalayout string due to
some custom datalayout strings lacking it (e.g ones used by bugpoint,
among other things).
Do not add the `-i128:128` string in that case.
This fixes the regression introduced in
https://github.com/llvm/llvm-project/pull/106951.
Remove the following intrinsics which can be trivially replaced with an
`addrspacecast`
* llvm.nvvm.ptr.gen.to.global
* llvm.nvvm.ptr.gen.to.shared
* llvm.nvvm.ptr.gen.to.constant
* llvm.nvvm.ptr.gen.to.local
* llvm.nvvm.ptr.global.to.gen
* llvm.nvvm.ptr.shared.to.gen
* llvm.nvvm.ptr.constant.to.gen
* llvm.nvvm.ptr.local.to.gen
Also, cleanup the NVPTX lowering of `addrspacecast` making it more
concise.
This was reverted to avoid conflicts while reverting #107655. Re-landing
unchanged.
This change deprecates the following intrinsics which can be trivially
converted to llvm funnel-shift intrinsics:
- @llvm.nvvm.rotate.b32
- @llvm.nvvm.rotate.right.b64
- @llvm.nvvm.rotate.b64
This fixes a bug in the previous version (#107655) which flipped the
order of the operands to the PTX funnel shift instruction. In LLVM IR
the high bits are the first arg and the low bits are the second arg,
while in PTX this is reversed.
Remove the following intrinsics which can be trivially replaced with an
`addrspacecast`
* llvm.nvvm.ptr.gen.to.global
* llvm.nvvm.ptr.gen.to.shared
* llvm.nvvm.ptr.gen.to.constant
* llvm.nvvm.ptr.gen.to.local
* llvm.nvvm.ptr.global.to.gen
* llvm.nvvm.ptr.shared.to.gen
* llvm.nvvm.ptr.constant.to.gen
* llvm.nvvm.ptr.local.to.gen
Also, cleanup the NVPTX lowering of `addrspacecast` making it more
concise.
This change deprecates the following intrinsics which can be trivially
converted to llvm funnel-shift intrinsics:
- @llvm.nvvm.rotate.b32
- @llvm.nvvm.rotate.right.b64
- @llvm.nvvm.rotate.b64
Remove the following intrinsics which correspond directly to a bitcast:
- llvm.nvvm.bitcast.f2i
- llvm.nvvm.bitcast.i2f
- llvm.nvvm.bitcast.d2ll
- llvm.nvvm.bitcast.ll2d
This patch is moving out stepvector intrinsic from the experimental
namespace.
This intrinsic exists in LLVM for several years now, and is widely used.
8b4306ce050bd5 introduced validity checks for every module flag access,
because the auto-upgrader uses named metadata before verifying the
module.
This causes overhead for all other accesses, and the check is, in fact,
only need at that single place. Change the upgrader to be careful when
accessing module flags before the module is verified and remove the
checks on all other occasions.
There are two tangential optimizations included: first, when querying a
specific flag, don't enumerate all other flags into a vector as well.
Second, don't use a Twine for getNamedMetadata(), which has
materialization overhead -- all call sites use simple strings that can
be implicitly converted to a StringRef.
Outlining these patterns has a significant impact on the overall stack
frame size of llvm::UpgradeIntrinsicCall. This is helpful for scenarios
where compilation threads are stack-constrained. The overall impact is
low when using clang as the host compiler, but very pronounced when
using MSVC 2022 with release builds.
Clang: 1,624 -> 824 bytes
MSVC: 23,560 -> 6,120 bytes
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.
This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
Although i32 type is illegal in the backend, LA64 has pretty good
support for i32 types by using W instructions.
By adding n32 to the DataLayout string, middle end optimizations will
consider i32 to be a native type. One known effect of this is enabling
LoopStrengthReduce on loops with i32 induction variables. This can be
beneficial because C/C++ code often has loops with i32 induction
variables due to the use of `int` or `unsigned int`.
If this patch exposes performance issues, those are better addressed by
tuning LSR or other passes.
This addresses an issue where the explicit alignment of 2 (for C++ ABI
reasons) was being propagated to the back end and causing under-aligned
functions (in special sections).
This is an alternate approach suggested by @efriedma-quic in PR #90415.
Fixes#90358
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator== outnumbers StringRef::equals by a factor of 22
under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
This patch is moving out following intrinsics:
* vector.interleave2/deinterleave2
* vector.reverse
* vector.splice
from the experimental namespace.
All these intrinsics exist in LLVM for more than a year now, and are
widely used, so should not be considered as experimental.