545 Commits

Author SHA1 Message Date
Alexander Richardson
07e2ba445d
[AMDGPU] Set AS8 address width to 48 bits
Of the 128-bits of buffer descriptor only 48 bits are address bits, so
following the discussion on https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54,
the logic conclusion is to set the index width to 48 bits instead of
the current value of 128.

Most of the test changes are mechanical datalayout updates, but there
is one actual change: the ptrmask test now uses .i48 instead of .i128
and I had to update SelectionDAGBuilder to correctly extend the mask.

Reviewed By: krzysz00

Pull Request: https://github.com/llvm/llvm-project/pull/139419
2025-05-19 17:26:05 -07:00
Shilei Tian
7314d38b72
[NFC][AutoUpgrade] Use ConstantPointerNull::get instead of Constant::getNullValue for known pointer types (#139984)
This is a preparation change for upcoming PRs that will update the
semantics of
`ConstantPointerNull`, making it to represent an actual `nullptr` rather
than a
zero-valued pointer.
2025-05-15 10:03:20 -04:00
Thurston Dang
0dd2c9f7a3 Fix-forward build error from #132489
Replace deprecated use of getDeclaration that was added in #132489

llvm/lib/IR/AutoUpgrade.cpp:1480:26: error: 'getDeclaration' is deprecated: Use getOrInsertDeclaration instead [-Werror,-Wdeprecated-declarations]
 1480 |       NewFn = Intrinsic::getDeclaration(
      |                          ^~~~~~~~~~~~~~
      |                          getOrInsertDeclaration
2025-05-14 21:44:13 +00:00
Jessica Clarke
864f0ff4ef
[clang][IR] Overload @llvm.thread.pointer to support non-AS0 targets (#132489)
Thread-local globals live, by default, in the default globals address
space, which may not be 0, so we need to overload @llvm.thread.pointer
to support other address spaces, and use the default globals address
space in Clang.
2025-05-14 21:51:56 +01:00
Alex MacLean
eb5280938b
[NVPTX] Fixup AutoUpgrade of llvm.nvvm.atomic.load.{inc,dec}.32 (#138907)
The previous implementation failed to account for the fact that these
intrinsics have an overloaded pointer type. This version handles the
pointer type and adds tests for llvm.nvvm.atomic.load.add.{f32,f64}.
2025-05-08 07:17:32 -07:00
Craig Topper
123758b1f4
[IRBuilder] Add versions of createInsertVector/createExtractVector that take a uint64_t index. (#138324)
Most callers want a constant index. Instead of making every caller
create a ConstantInt, we can do it in IRBuilder. This is similar to
createInsertElement/createExtractElement.
2025-05-02 16:10:18 -07:00
Alex MacLean
a643ac4395
[NVPTX] Remove 'param' variants of nvvm.ptr.* intrinics (#137065)
After #136008 these intrinsics are no longer inserted by the
compiler and can be upgraded to addrspacecasts.
2025-04-25 10:44:12 -07:00
modiking
3d04da5bc0
[NVPTX] Add support for Shared Cluster Memory address space [2/2] (#136768)
Adds support for new Shared Cluster Memory Address Space
(SHARED_CLUSTER, addrspace 7). See
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#distributed-shared-memory
for details.

Follow-up to https://github.com/llvm/llvm-project/pull/135444

1. Update existing codegen/intrinsics in LLVM and MLIR that now use this
address space
2. Auto-upgrade previous intrinsics that used SMEM (addrspace 3) but
were really taking in a shared cluster pointer to the new address space
2025-04-22 16:50:45 -07:00
Alex MacLean
09069088dd
[NVPTX] Add auto-upgrade rules for fabs.{f,d,ftz.f} (#136150)
These auto-upgrade rules are required after these intrinsics were
removed in #135644
2025-04-17 11:44:58 -07:00
Alex MacLean
7daa5010ab
[NVPTX] Cleanup and document nvvm.fabs intrinsics, adding f16 support (#135644)
This change unifies the NVVM intrinsics for floating point absolute
value into two new overloaded intrinsics "llvm.nvvm.fabs.*" and
"llvm.nvvm.fabs.ftz.*". Documentation has been added specifying the
semantics of these intrinsics to clarify how they differ from
"llvm.fabs.*". In addition, support for these new intrinsics is
extended to cover the f16 variants.
2025-04-17 07:37:24 -07:00
Nikita Popov
c3c0b27f2d
[Intrinsics] Add support for range attributes (#135642)
Add support for specifying range attributes in Intrinsics.td. Use this
to specify the ucmp/scmp range [-1,2).

This case is trickier than existing intrinsic attributes, because we
need to create the attribute with the correct bitwidth. As such, the
attribute construction now needs to be aware of the function type.

We also need to be careful to no longer assign attributes on intrinsics
with invalid signatures, as we'd make invalid assumptions about the
number of arguments etc otherwise.

Fixes https://github.com/llvm/llvm-project/issues/130179.
2025-04-17 11:11:00 +02:00
Alex MacLean
a6853cd9af
[NVPTX] Auto-Upgrade llvm.nvvm.atomic.load.{inc,dec}.32 (#134111)
These intrinsics can be upgrade to an atomicrmw instruction.
2025-04-08 13:44:11 -07:00
Rahul Joshi
74b7abf154
[IRBuilder] Add new overload for CreateIntrinsic (#131942)
Add a new `CreateIntrinsic` overload with no `Types`, useful for
creating calls to non-overloaded intrinsics that don't need additional
mangling.
2025-03-31 08:10:34 -07:00
Alex MacLean
10fd5b925f
[NVPTX] Auto-Upgrade !"align" metadata on return values to stackalign (#131726)
This commit follows up 0191307b by auto-upgrading !"align" metadata on
return values to stackalign. This allows us to remove all logic to check
the metadata from NVPTXUtilities.
2025-03-24 12:00:44 -07:00
Alex MacLean
30ff508614
[NVPTX] Auto-Upgrade llvm.nvvm.swap.lo.hi.b64 to llvm.fshl (#132098)
After 3c8c2914e067e132af951f70d2b3577fe049e19a the lowering of 64-bit
funnel shifts has been improved to the point where this intrinsic is no
longer needed.
2025-03-20 19:21:24 -07:00
AdityaK
b17af9d8ee
[NFC][llvm/IR] comparison of unsigned expression in ‘>= 0’ is always true (#130843) 2025-03-14 11:59:29 -07:00
Frederik Harwath
c979ce7e36
Add IRBuilder::CreateFMA (#131112)
This commit adds a function for creating fma intrinsic calls to the IRBuilder.  If the "IsFPConstrained" flag of the builder is set,
the function creates a call to "experimental.constrained.fma" instead of "llvm.fma" .
To support the creation of the constrained intrinsic, a function "CreateConstrainedFPIntrinsic" is introduced.
2025-03-14 13:20:58 +01:00
Alex MacLean
6c2e170d04
[NVPTX] Convert vector function nvvm.annotations to attributes (#127736)
Replace some more nvvm.annotations with function attributes,
auto-upgrading the annotations as needed. These new attributes will be
more idiomatic and compile-time efficient than the annotations.

- !"maxntid[xyz]" -> "nvvm.maxntid"
- !"reqntid[xyz]" -> "nvvm.reqntid"
- !"cluster_dim_[xyz]" -> "nvvm.cluster_dim"
2025-02-26 08:45:27 -08:00
Alex MacLean
a282b6c486
[NVPTX] Convert scalar function nvvm.annotations to attributes (#125908)
Replace some more nvvm.annotations with function attributes,
auto-upgrading the annotations as needed. These new attributes will be
more idiomatic and compile-time efficient than the annotations.

- !"maxclusterrank" / !"cluster_max_blocks" -> "nvvm.maxclusterrank"
- !"minctasm" -> "nvvm.minctasm"
- !"maxnreg" -> "nvvm.maxnreg"
2025-02-12 07:33:22 -08:00
Alex MacLean
de7438e472
[NVPTX] Auto-Upgrade some nvvm.annotations to attributes (#119261)
Add a new AutoUpgrade function to convert some legacy nvvm.annotations
metadata to function level attributes. These attributes are quicker to
look-up so improve compile time and are more idiomatic than using
metadata which should not include required information that changes the
meaning of the program.

Currently supported annotations are:

- !"kernel" -> ptx_kernel calling convention
- !"align" -> alignstack parameter attributes (return not yet supported)
2025-01-29 16:27:27 -08:00
David Green
547bfda56b
[AArch64] Improve bcvtn2 and remove aarch64_neon_bfcvt intrinsics (#120363)
This started out as trying to combine bf16 fpround to BFCVT2
instructions, but ended up removing the aarch64.neon.nfcvt intrinsics in
favour of generating fpround instructions directly. This simplifies the
patterns and can lead to other optimizations. The BFCVT2 instruction is
adjusted to makes sure the types are valid, and a bfcvt2 is now
generated in more place. The old intrinsics are auto-upgraded to fptrunc
instructions too.
2025-01-21 09:16:04 +00:00
Nikita Popov
0b1ae8963e [AutoUpgrade] Avoid unnecessary pointer bitcasts (NFCI)
Not needed with opaque pointers.
2025-01-20 09:55:35 +01:00
Dan Gohman
c5ab70c508
[WebAssembly] Add -i128:128 to the datalayout string. (#119204)
Clang [defaults to aligning `__int128_t` to 16 bytes], while LLVM
`datalayout` strings [default to aligning `i128` to 8 bytes]. Wasm is
currently using the defaults for both, so it's inconsistent. Fix this by
adding `-i128:128` to Wasm's `datalayout` string so that it aligns
`i128` to 16 bytes too.

This is similar to
[llvm/llvm-project@dbad963](dbad963a69)
for SPARC.

This fixes rust-lang/rust#133991; see that issue for further discussion.

[defaults to aligning `__int128_t` to 16 bytes]:
f8b4182f07/clang/lib/Basic/TargetInfo.cpp (L77)
[default to aligning `i128` to 8 bytes]:
https://llvm.org/docs/LangRef.html#langref-datalayout
2024-12-10 09:21:58 -08:00
Lei Huang
a13ec9cd54
[PowerPC] Update data layout aligment of i128 to 16 (#118004)
Fix 64-bit PowerPC part of
https://github.com/llvm/llvm-project/issues/102783.
2024-12-09 18:02:24 -05:00
Graham Hunter
ed5aaddd7b
[IR] Vector extract last active element intrinsic (#113587)
As discussed in #112738, it may be better to have an intrinsic to represent vector element extracts based on mask bits. This intrinsic is for the case of extracting the last active element, if any, or a default value if the mask is all-false.

The target-agnostic SelectionDAG lowering is similar to the IR in #106560.
2024-11-14 17:48:43 +00:00
yingopq
86e4beb702
[MIPS] LLVM data layout give i128 an alignment of 16 for mips64 (#112084)
Fix parts of #102783.
2024-11-06 16:14:30 +01:00
Alex MacLean
fb33af08e4
[NVPTX] Remove nvvm.ldg.global.* intrinsics (#112834)
Remove these intrinsics which can be better represented by load
instructions with `!invariant.load` metadata:

- llvm.nvvm.ldg.global.i
- llvm.nvvm.ldg.global.f
- llvm.nvvm.ldg.global.p
2024-10-27 16:14:13 -07:00
goldsteinn
c85611e858
[SimplifyLibCall][Attribute] Fix bug where we may keep range attr with incompatible type (#112649)
In a variety of places we change the bitwidth of a parameter but don't
update the attributes.

The issue in this case is from the `range` attribute when inlining
`__memset_chk`. `optimizeMemSetChk` will replace an `i32` with an
`i8`, and if the `i32` had a `range` attr assosiated it will cause an
error.

Fixes #112633
2024-10-17 10:32:55 -05:00
Jay Foad
85c17e4092
[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706)
Convert many instances of:
  Fn = Intrinsic::getOrInsertDeclaration(...);
  CreateCall(Fn, ...)
to the equivalent CreateIntrinsic call.
2024-10-17 16:20:43 +01:00
Jay Foad
d9c95efb6c
[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112546)
Convert almost every instance of:
  CreateCall(Intrinsic::getOrInsertDeclaration(...), ...)
to the equivalent CreateIntrinsic call.
2024-10-16 15:43:30 +01:00
Daniel Paoliello
c9f27275c1
[clang][aarch64] Add support for the MSVC qualifiers __ptr32, __ptr64, __sptr, __uptr for AArch64 (#111879)
MSVC has a set of qualifiers to allow using 32-bit signed/unsigned
pointers when building 64-bit targets. This is useful for WoW code
(i.e., the part of Windows that handles running 32-bit application on a
64-bit OS). Currently this is supported on x64 using the 270, 271 and
272 address spaces, but does not work for AArch64 at all.

This change adds the same 270, 271 and 272 address spaces to AArch64 and
adjusts the data layout string accordingly. Clang will generate the
correct address space casts, but these will currently be ignored until
the AArch64 backend is updated to handle them.

Partially fixes #62536

This is a resurrected version of <https://reviews.llvm.org/D158857>
(originally created by @a_vorobev) - I've cleaned it up a little, fixed
the rest of the tests and added to auto-upgrade for the data layout.
2024-10-15 10:37:36 -07:00
Rahul Joshi
fa789dffb1
[NFC] Rename Intrinsic::getDeclaration to getOrInsertDeclaration (#111752)
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
2024-10-11 05:26:03 -07:00
Matt Arsenault
c198f775cd
AMDGPU: Remove flat/global fmin/fmax intrinsics (#105642)
These have been replaced with atomicrmw
2024-10-09 09:27:28 +04:00
Matt Arsenault
9dca83f2e1
AMDGPU: Add noalias.addrspace metadata when autoupgrading atomic intrinsics (#102599)
This will be needed to continue generating the raw instruction in the flat case.
2024-10-08 00:13:28 +04:00
Paul Walker
d283705829
[AArch64][SVE] Fix definition of bfloat fcvt intrinsics. (#110281)
Affected intrinsics:
  llvm.aarch64.sve.fcvt.bf16f32
  llvm.aarch64.sve.fcvtnt.bf16f32
    
The named intrinsics took a predicate based on the smallest element type
when it should be based on the largest. The intrinsics have been replace
by v2 equivalents and affected code ported to use them.
    
Patch includes changes to getSVEPredicateBitCast() that ensure the
generated code for the auto-upgraded old intrinsics is unchanged.
2024-10-03 12:36:01 +01:00
Koakuma
076392b0aa
[SPARC] Fix regression from UpgradeDataLayoutString change (#110608)
It turns out that we cannot rely on the presence of `-i64:64` as a
position reference when adding the `-i128:128` datalayout string due to
some custom datalayout strings lacking it (e.g ones used by bugpoint,
among other things).
Do not add the `-i128:128` string in that case.

This fixes the regression introduced in
https://github.com/llvm/llvm-project/pull/106951.
2024-10-03 05:20:56 +07:00
Koakuma
dbad963a69
[SPARC] Align i128 to 16 bytes in SPARC datalayouts (#106951)
Align i128s to 16 bytes, following the example at
https://reviews.llvm.org/D86310.

clang already does this implicitly, but do it in backend code too for
the benefit of other frontends (see e.g
https://github.com/llvm/llvm-project/issues/102783 &
https://github.com/rust-lang/rust/issues/128950).
2024-09-30 08:32:33 +07:00
Alex MacLean
e7621f4199
Reland "[NVVM] Upgrade nvvm.ptr.* intrinics to addrspace cast" (#110262)
Remove the following intrinsics which can be trivially replaced with an
`addrspacecast`

  * llvm.nvvm.ptr.gen.to.global
  * llvm.nvvm.ptr.gen.to.shared
  * llvm.nvvm.ptr.gen.to.constant
  * llvm.nvvm.ptr.gen.to.local
  * llvm.nvvm.ptr.global.to.gen
  * llvm.nvvm.ptr.shared.to.gen
  * llvm.nvvm.ptr.constant.to.gen
  * llvm.nvvm.ptr.local.to.gen

Also, cleanup the NVPTX lowering of `addrspacecast` making it more
concise.

This was reverted to avoid conflicts while reverting #107655. Re-landing
unchanged.
2024-09-28 14:13:17 -07:00
Alex MacLean
a131fbf168
Reland "[NVPTX] deprecate nvvm.rotate.* intrinsics, cleanup funnel-shift handling" (#110025)
This change deprecates the following intrinsics which can be trivially
converted to llvm funnel-shift intrinsics:

- @llvm.nvvm.rotate.b32
- @llvm.nvvm.rotate.right.b64
- @llvm.nvvm.rotate.b64

This fixes a bug in the previous version (#107655) which flipped the
order of the operands to the PTX funnel shift instruction. In LLVM IR
the high bits are the first arg and the low bits are the second arg,
while in PTX this is reversed.
2024-09-27 05:23:08 -07:00
Dmitry Chernenkov
4cb61c20ef Revert "[NVPTX] deprecate nvvm.rotate.* intrinsics, cleanup funnel-shift handling (#107655)"
This reverts commit 9ac00b85e05d21be658d6aa0c91cbe05bb5dbde2.
2024-09-25 14:50:26 +00:00
Dmitry Chernenkov
9a0e281e8c Revert "[NVVM] Upgrade nvvm.ptr.* intrinics to addrspace cast (#109710)"
This reverts commit 36757613b73908f055674a8df0b51cc00aa04373.
2024-09-25 14:50:26 +00:00
Alex MacLean
36757613b7
[NVVM] Upgrade nvvm.ptr.* intrinics to addrspace cast (#109710)
Remove the following intrinsics which can be trivially replaced with an
`addrspacecast`

  * llvm.nvvm.ptr.gen.to.global
  * llvm.nvvm.ptr.gen.to.shared
  * llvm.nvvm.ptr.gen.to.constant
  * llvm.nvvm.ptr.gen.to.local
  * llvm.nvvm.ptr.global.to.gen
  * llvm.nvvm.ptr.shared.to.gen
  * llvm.nvvm.ptr.constant.to.gen
  * llvm.nvvm.ptr.local.to.gen

Also, cleanup the NVPTX lowering of `addrspacecast` making it more
concise.
2024-09-24 08:15:14 -07:00
Alex MacLean
9ac00b85e0
[NVPTX] deprecate nvvm.rotate.* intrinsics, cleanup funnel-shift handling (#107655)
This change deprecates the following intrinsics which can be trivially
converted to llvm funnel-shift intrinsics:

- @llvm.nvvm.rotate.b32
- @llvm.nvvm.rotate.right.b64
- @llvm.nvvm.rotate.b64
2024-09-23 14:58:52 -07:00
Alex MacLean
8be6b108fb
[NVPTX] Remove nvvm.bitcast.* intrinsics (#107936)
Remove the following intrinsics which correspond directly to a bitcast:

- llvm.nvvm.bitcast.f2i
- llvm.nvvm.bitcast.i2f
- llvm.nvvm.bitcast.d2ll
- llvm.nvvm.bitcast.ll2d
2024-09-23 11:24:07 -07:00
Nikita Popov
5dcea4628d [AutoUpgrade] Preserve attributes when upgrading named struct return
For example, if the argument has an alignment attribute, preserve it.
2024-09-02 12:42:52 +02:00
Maciej Gabka
95d2d1cba0
Move stepvector intrinsic out of experimental namespace (#98043)
This patch is moving out stepvector intrinsic from the experimental
namespace.

This intrinsic exists in LLVM for several years now, and is widely used.
2024-08-28 12:48:20 +01:00
Matt Arsenault
ee08d9cba5
AMDGPU: Remove global/flat atomic fadd intrinics (#97051)
These have been replaced with atomicrmw.
2024-08-22 23:27:33 +04:00
Matt Arsenault
9d364286f3
AMDGPU: Remove flat/global atomic fadd v2bf16 intrinsics (#97050)
These are now fully covered by atomicrmw.
2024-08-21 14:26:42 +04:00
Matt Arsenault
70feafdb27
IR/AMDGPU: Autoupgrade amdgpu-unsafe-fp-atomics attribute (#101698)
Delete the attribute and annotate any atomicrmw instructions in the
function with new metadata.
2024-08-12 14:56:53 +04:00
Alexis Engelke
b7cd564fa3
[IR] Don't verify module flags on every access (#102153)
8b4306ce050bd5 introduced validity checks for every module flag access,
because the auto-upgrader uses named metadata before verifying the
module.

This causes overhead for all other accesses, and the check is, in fact,
only need at that single place. Change the upgrader to be careful when
accessing module flags before the module is verified and remove the
checks on all other occasions.

There are two tangential optimizations included: first, when querying a
specific flag, don't enumerate all other flags into a vector as well.
Second, don't use a Twine for getNamedMetadata(), which has
materialization overhead -- all call sites use simple strings that can
be implicitly converted to a StringRef.
2024-08-06 18:33:26 +02:00