445 Commits

Author SHA1 Message Date
Alex MacLean
35693daa70
[NVPTX] Fix v2i8 call lowering, use generic ld/st nodes for call params (#146930) 2025-07-28 10:41:51 -07:00
Alex MacLean
70333de6cf
[NVPTX] Consolidate and cleanup various NVPTXISD nodes (NFC) (#145581)
This change consolidates and cleans up various NVPTXISD target-specific
nodes in order to simplify SDAG ISel. While there are some whitespace
changes in the emitted PTX, it is otherwise a non-functional change.

NVPTXISD::Wrapper - This node was used to wrap external-symbol and
global-address nodes. It is redundant and has been removed. Instead we
use the non-target versions of these nodes and convert them
appropriately during ISel.

NVPTXISD::CALL - Much of the family of nodes used to represent a PTX
call instruction has been replaced by this single new node. It
corresponds to a single instruction and is therefore much simpler to
create and lower.
2025-06-25 11:42:21 -07:00
Yaxun (Sam) Liu
54ddbc6be3
[AMDGPU] fix amdgpu_max_num_work_groups in templates (#141633)
Clang does not instantiate the amdgpu_max_num_work_groups attribute
when it has one template argument, causing an assertion in codegen.

Fixes: https://github.com/llvm/llvm-project/issues/139570
2025-05-28 08:05:03 -04:00
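A hypothetical reduction of the template case this commit fixes; the kernel shape and launch are illustrative, only the attribute name comes from the commit:

```
// A kernel template whose max-num-work-groups attribute takes a single
// dependent argument (Y/Z default to 1).
template <int X>
__global__ __attribute__((amdgpu_max_num_work_groups(X))) void kern() {}

void launch() {
  // Instantiating the template previously failed to instantiate the
  // attribute and tripped an assertion in codegen.
  kern<8><<<1, 64>>>();
}
```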
Yaxun (Sam) Liu
c720f92009
[HIP] disable sanitizer for __hip_cuid (#141581)
Global variables `__hip_cuid_*` serve an identification purpose only and do
not need sanitization; therefore, disable sanitizer instrumentation for them.
2025-05-27 16:58:04 -04:00
Alex MacLean
369891b674
[NVPTX] use untyped loads and stores wherever possible (#137698)
In most cases, the type information attached to load and store
instructions is meaningless and inconsistently applied. We can usually
use ".b" loads and avoid the complexity of trying to assign the correct
type. The one exception is sign-extending loads, which will continue to
use ".s" to ensure the sign extension into a larger register is done
correctly.
2025-05-10 08:26:26 -07:00
Yaxun (Sam) Liu
83c309b905
[CUDA][HIP] capture possible ODR-used var (#136645)
In a lambda function, a call of a function may resolve to host and
device functions with different signatures. In particular, a constexpr
local variable may be passed by value to the device function but by
reference to the host function, causing the variable to be captured by
the lambda in the host compilation but not in the device compilation.
This discrepancy in the lambda captures violates the ODR and causes UB
for kernels using these lambdas.

This PR fixes the issue by identifying discrepancies in the ODR/non-ODR
usage of constexpr local variables passed to host/device functions and
conservatively capturing them.

Fixes: https://github.com/llvm/llvm-project/issues/132068
2025-04-23 12:50:28 -04:00
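An illustrative sketch of the discrepancy, with hypothetical overloads chosen to show the by-value/by-reference split described above:

```
__device__ void take(int x);        // device overload: by value, no ODR-use
__host__ void take(const int &x);   // host overload: by reference, ODR-use

template <typename F> __global__ void run(F f) { f(); }

void test() {
  constexpr int N = 42;
  auto lam = [=] __host__ __device__ () { take(N); };
  // Host compilation ODR-uses N and captures it; device compilation does
  // not, so the lambda's layout diverges between the two compilations:
  // an ODR violation and UB. The fix conservatively captures N in both.
  run<<<1, 1>>>(lam);
}
```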
modiking
3d04da5bc0
[NVPTX] Add support for Shared Cluster Memory address space [2/2] (#136768)
Adds support for the new Shared Cluster Memory address space
(SHARED_CLUSTER, addrspace 7). See
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#distributed-shared-memory
for details.

Follow-up to https://github.com/llvm/llvm-project/pull/135444

1. Update existing codegen/intrinsics in LLVM and MLIR that now use this
address space
2. Auto-upgrade previous intrinsics that used SMEM (addrspace 3) but
were really taking in a shared cluster pointer to the new address space
2025-04-22 16:50:45 -07:00
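For context, the distributed-shared-memory access this address space models looks roughly like the following at the CUDA level. This sketch assumes CUDA 12's cooperative-groups cluster API and the `__cluster_dims__` annotation from the linked programming guide:

```
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

__global__ void __cluster_dims__(2, 1, 1) kern(int *out) {
  __shared__ int buf;
  cg::cluster_group cluster = cg::this_cluster();
  buf = cluster.block_rank();
  cluster.sync();
  // map_shared_rank returns a pointer into another block's shared memory:
  // a "shared cluster" pointer, which addrspace(7) now models in LLVM.
  int *peer = cluster.map_shared_rank(&buf, 0);
  if (threadIdx.x == 0) out[cluster.block_rank()] = *peer;
  cluster.sync();
}
```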
Austin Schuh
32fd97c61d
cuda clang: Move nvptx-surface.cu test to CodeGenCUDA (#134758)
Signed-off-by: Austin Schuh <austin.linux@gmail.com>
2025-04-22 12:36:39 -07:00
Arvind Sudarsanam
6cfec29cb9
[Offload][SYCL] Refactor OffloadKind implementation (#135809)
Following are the changes:

1. Make OffloadKind enum values powers of two so we can use them like a
bitfield
2. Include OFK_SYCL enum value
3. Modify ActiveOffloadKinds support in clang-linker-wrapper to use
bitfields instead of a vector.

Thanks

---------

Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
2025-04-16 14:13:30 +00:00
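A sketch of the bitmask-friendly encoding this enables; the exact enumerator values are assumptions:

```
// Powers of two so kinds compose into a bitfield (values assumed).
enum OffloadKind : unsigned {
  OFK_None   = 0,
  OFK_OpenMP = 1u << 0,
  OFK_CUDA   = 1u << 1,
  OFK_HIP    = 1u << 2,
  OFK_SYCL   = 1u << 3, // newly included
};

// clang-linker-wrapper can track active kinds as one word instead of a vector:
unsigned ActiveKinds = OFK_CUDA | OFK_SYCL;
bool HasSYCL = (ActiveKinds & OFK_SYCL) != 0;
```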
Joseph Huber
772173f548
[Clang][AMDGPU] Remove special handling for COV4 libraries (#132870)
Summary:
When we were first porting to COV5, this led to some ABI issues due to
a change in how we looked up the work group size. Bitcode libraries
relied on the builtins to emit code, but this was changed between
versions. This prevented the bitcode libraries, like OpenMP or libc,
from being used for both COV4 and COV5. The solution was to have this
'none' functionality which effectively emitted code that branched off of
a global to resolve to either version.

This isn't a great solution because it forced every TU to have this
variable in it. The patch in
https://github.com/llvm/llvm-project/pull/131033 removed support for
COV4 from OpenMP, which was the only consumer of this functionality.
Other users like HIP and OpenCL did not use this because they linked the
ROCm Device Library directly, which has its own handling (the name was
borrowed from it, after all).

So, now that we don't need to worry about backward compatibility with
COV4, we can remove this special handling. Users can still emit COV4
code, this simply removes the special handling used to make the OpenMP
device runtime bitcode version agnostic.
2025-03-28 07:35:16 -05:00
Shilei Tian
f1ac2afe21
Reapply "[AMDGPU] Use COV6 by default (#118515)" (#130963)
This reverts commit 68bcba6d7a1cc18996c0bcb7c62267c62d2040d0.
2025-03-21 15:26:45 -04:00
Joseph Huber
0a41fb71f1
[Clang] Treat ext_vector_type as a regular type attribute (#130177)
Summary:
This attribute is mostly borrowed from OpenCL, but is useful in general
for accessing the LLVM vector types. Previously the only way to use it
was through typedefs. This patch changes that to allow use as a regular
type attribute, similar to address spaces.
2025-03-07 10:09:06 -06:00
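For illustration, the before/after usage this enables:

```
// Previously the only way to use ext_vector_type was through a typedef:
typedef float float4_t __attribute__((ext_vector_type(4)));

// With this patch it also works as a regular type attribute, much like
// an address space qualifier:
using f4 = float __attribute__((ext_vector_type(4)));

f4 add(f4 a, f4 b) { return a + b; } // element-wise; lowers to an LLVM vector
```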
Yaxun (Sam) Liu
240f2269ff
Add clang atomic control options and attribute (#114841)
Add an option and a statement attribute for controlling the emission of
target-specific metadata on atomicrmw instructions in IR.

The RFC for this attribute and option is
https://discourse.llvm.org/t/rfc-add-clang-atomic-control-options-and-pragmas/80641.
Originally a pragma was proposed; it was later changed to a Clang
attribute.

This attribute allows users to specify one, two, or all three options and
must be applied to a compound statement. The attribute can also be nested,
with inner attributes overriding the options specified by outer attributes
or the target's default options. These options will then determine the
target-specific metadata added to atomic instructions in the IR.

In addition to the attribute, three new compiler options are introduced:
`-f[no-]atomic-remote-memory`, `-f[no-]atomic-fine-grained-memory`,
 `-f[no-]atomic-ignore-denormal-mode`.
These compiler options allow users to override the default options through
the Clang driver and front end. `-m[no-]unsafe-fp-atomics` is aliased to
`-f[no-]atomic-ignore-denormal-mode`.

In terms of implementation, the atomic attribute is represented in the AST
by the existing AttributedStmt, with minimal changes to AST and Sema.

During code generation in Clang, the CodeGenModule maintains the current
atomic options, which are used to emit the relevant metadata for atomic
instructions. RAII is used to manage the saving and restoring of atomic
options when entering and exiting nested AttributedStmt.
2025-02-27 10:41:04 -05:00
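A usage sketch under the option spellings implied by the flags above; treat the exact attribute argument names as assumptions:

```
__global__ void add(float *p, float v) {
  // Applied to a compound statement; inner attributes override outer ones
  // and the target's defaults.
  [[clang::atomic(no_remote_memory, no_fine_grained_memory)]] {
    __atomic_fetch_add(p, v, __ATOMIC_RELAXED); // gets target metadata in IR
    [[clang::atomic(remote_memory)]] {
      __atomic_fetch_add(p, v, __ATOMIC_RELAXED); // remote memory re-enabled
    }
  }
}
```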
Alex MacLean
6c2e170d04
[NVPTX] Convert vector function nvvm.annotations to attributes (#127736)
Replace some more nvvm.annotations with function attributes,
auto-upgrading the annotations as needed. These new attributes will be
more idiomatic and compile-time efficient than the annotations.

- !"maxntid[xyz]" -> "nvvm.maxntid"
- !"reqntid[xyz]" -> "nvvm.reqntid"
- !"cluster_dim_[xyz]" -> "nvvm.cluster_dim"
2025-02-26 08:45:27 -08:00
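For reference, a sketch of the typical CUDA-level source of one of these annotations; the metadata/attribute encodings in the comments are assumptions, not verified compiler output:

```
// __launch_bounds__' maxThreadsPerBlock is the usual source of the ntid
// annotation: previously !"maxntidx" metadata on @k, now (roughly) a
// function attribute such as "nvvm.maxntid"="128".
__global__ void __launch_bounds__(128) k() {}
```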
Anshil Gandhi
95000fdb9e
[CUDA] Increment VTable index for device thunks (#124989)
Currently, the Clang frontend incorrectly emits the callee instead of
the thunk for the callee in the vtable. This happens because the thunk
index is not incremented when a thunk's callee cannot be emitted. This
patch fixes the bug.
2025-02-19 23:47:22 -05:00
Anshil Gandhi
bf9b72e134
[NFC][Clang] Precommit test for VTable codegen (#124983)
Associated PR: https://github.com/llvm/llvm-project/pull/124989
2025-02-19 21:14:02 -05:00
Fabian Ritter
029c8e783d
[AMDGPU][clang] Replace gfx940 and gfx941 with gfx942 in clang (#126762)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all occurrences of gfx940/gfx941 from clang that can be
removed without changes in the llvm directory. The
target-invalid-cpu-note/amdgcn.c test is not included here since it
tests a list of targets that is defined in
llvm/lib/TargetParser/TargetParser.cpp.

For SWDEV-512631
2025-02-19 10:11:48 +01:00
Srinivasa Ravi
bd860f9864
[NVPTX] Add intrinsics for redux.sync f32 instructions (#126664)
Adds NVVM intrinsics, NVPTX codegen, and Clang builtins for the
`redux.sync` f32 instructions introduced in PTX 8.6 for sm_100a.
Tests added in `CodeGen/NVPTX/redux-sync.ll` and
`CodeGenCUDA/redux-builtins.cu` and verified through ptxas 12.8.0.

PTX Spec Reference:

https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-redux-sync
2025-02-14 11:11:44 +05:30
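A usage sketch; the builtin names and signatures are assumed from the NVVM intrinsic naming convention, and compilation targets sm_100a:

```
__global__ void warp_minmax(float *out, float v) {
  unsigned mask = 0xffffffffu; // all lanes participate
  float mn = __nvvm_redux_sync_fmin(v, mask); // warp-wide reduction
  float mx = __nvvm_redux_sync_fmax(v, mask);
  if (threadIdx.x == 0) { out[0] = mn; out[1] = mx; }
}
```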
Alex MacLean
a282b6c486
[NVPTX] Convert scalar function nvvm.annotations to attributes (#125908)
Replace some more nvvm.annotations with function attributes,
auto-upgrading the annotations as needed. These new attributes will be
more idiomatic and compile-time efficient than the annotations.

- !"maxclusterrank" / !"cluster_max_blocks" -> "nvvm.maxclusterrank"
- !"minctasm" -> "nvvm.minctasm"
- !"maxnreg" -> "nvvm.maxnreg"
2025-02-12 07:33:22 -08:00
Joseph Huber
f1e917d07b
[Offload] Unify offloading entries into a single section (#125731)
Summary:
This patch unifies the existing offloading entries into a single section
called `llvm_offload_entries`. This lets us use a more unified
offloading infrastructure so that all targets share the same handling.
The effect is that the runtimes now need to check whether the kind is
what they expect, but the expectation is that you can combine multiple
potential providers into one compile job. This doesn't fully work yet
because of other runtime issues, but some day. Mostly this helps the
future of liboffload, where we want to handle languages other than
OpenMP.
2025-02-06 08:24:01 -06:00
Nikita Popov
29441e4f5f
[IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.
2025-01-29 16:56:47 +01:00
Joseph Huber
13dcc95dcd
[Offload] Rework offloading entry type to be more generic (#124018)
Summary:
The previous offloading entry type did not fit the current use-cases
very well. This widens it and adds a version to prevent further
annoyances. It also includes the kind to better sort who's using it.

The first 64 bits are reserved as zero so the OpenMP runtime can detect
the old format for binary compatibility.
2025-01-28 07:26:13 -06:00
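A rough sketch of the widened, versioned entry; the field names and exact layout here are assumptions based on the description above, not the authoritative definition:

```
#include <cstdint>

struct __tgt_offload_entry {
  uint64_t Reserved;   // zero, so the OpenMP runtime can detect the old format
  uint16_t Version;    // entry-format version
  uint16_t Kind;       // producing offload language (OpenMP, CUDA, HIP, ...)
  uint32_t Flags;
  void *Address;       // address of the kernel or global on the device
  char *SymbolName;    // name the runtime looks up in the target executable
  uint64_t Size;       // size for globals, zero for functions
  uint64_t Data;       // extra generic data
  void *AuxAddr;       // auxiliary pointer (e.g. for managed variables)
};
```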
Joseph Huber
70a16b90ff
[HIP] Support managed variables using the new driver (#123437)
Summary:
Previously, managed variables didn't work in RDC mode using the new
driver because we simply didn't register them. This was ignored at the
time because we didn't have enough space in the current struct format.
This patch amends that by emitting a struct pair for the two variables
and using the single pointer.

In the future, a more extensible entry format would be nice, but that
can be done later.
2025-01-22 09:13:14 -06:00
Alex MacLean
4583f6d344
[NVPTX] Switch front-ends and tests to ptx_kernel cc (#120806)
The `ptx_kernel` calling convention is a more idiomatic and standard way
of specifying an NVPTX kernel than using metadata, which is not
supposed to change the meaning of the program. Further, checking the
calling convention is significantly faster than traversing the metadata,
improving compile time.

This change updates the clang and mlir frontends as well as the
NVPTXCtorDtorLowering pass to emit kernels using the calling convention.
In addition, this updates all NVPTX unit tests to use the calling
convention as well.
2025-01-07 18:24:50 -08:00
Alex Voicu
66acb26946
[clang][CodeGen][SPIRV] Translate amdgpu_flat_work_group_size into max_work_group_size. (#116820)
HIPAMD relies on the `amdgpu_flat_work_group_size` attribute to
implement key functionality such as the `__launch_bounds__` `__global__`
function annotation. This attribute is not available in / directly
translatable to SPIR-V, so as it stands AMDGCN-flavoured SPIR-V suffers
from information loss.

This patch addresses that limitation by converting the unsupported
attribute into the `max_work_group_size` attribute which maps to
[`MaxWorkgroupSizeINTEL`](https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/INTEL/SPV_INTEL_kernel_attributes.asciidoc),
which is available in / handled by SPIR-V. When reverse translating from
SPIR-V to AMDGCN LLVMIR we invert the map and add the original AMDGPU
attribute.
2025-01-07 12:01:31 +02:00
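A sketch of the round trip described above; the {256, 1, 1} encoding is an assumption:

```
// HIP: __launch_bounds__ is implemented via amdgpu_flat_work_group_size.
__global__ void __launch_bounds__(256) k() {}
// For AMDGCN-flavoured SPIR-V, the patch re-encodes this as the
// max_work_group_size attribute (MaxWorkgroupSizeINTEL), conceptually
// {256, 1, 1}; reverse translation restores the AMDGPU attribute.
```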
Joseph Huber
81fae0d5e3
[Clang][AMDGPU] Stop defaulting to one-as for all atomic scopes (#120095)
Summary:
The documentation at
https://llvm.org/docs/AMDGPUUsage.html#memory-scopes states that these
'one-as' modifiers are more specific versions of the scopes that only
apply to a specific address space. This doesn't make sense for fences
which have no associated address space to use, and it's a more
restrictive version the normal scope. This should not tbe the default
behavior, but it is currently emitted in all cases except for
sequentially consistent.
2025-01-06 08:11:08 -06:00
Jun Wang
41ed16c3b3
Reapply "[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (#94647)" (#118907)
This reverts commit 1ef9410a96c1d9669a6feaf03fcab8d0a4a13bd5.

This fixes the test file attributor-flatscratchinit-globalisel.ll.
2024-12-09 16:44:48 -08:00
Nikita Popov
462cb3cd6c
[InstCombine] Infer nusw + nneg -> nuw for getelementptr (#111144)
If the gep is nusw (usually via inbounds) and the offset is
non-negative, we can infer nuw.

Proof: https://alive2.llvm.org/ce/z/ihztLy
2024-12-05 14:36:40 +01:00
Philip Reames
1ef9410a96 Revert "[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (#94647)"
This reverts commit e6aec2c12095cc7debd1a8004c8535eef41f4c36.  Commit breaks "ninja check-llvm" on x86 host.
2024-12-04 15:37:25 -08:00
Jun Wang
e6aec2c120
[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (#94647)
The AMDGPUAnnotateKernelFeatures pass infers the "amdgpu-calls" and
"amdgpu-stack-objects" attributes, which are used to infer whether we
need to initialize flat scratch. This is, however, not precise. Instead,
we should use AMDGPUAttributor and infer amdgpu-no-flat-scratch-init on
kernels. Refer to https://github.com/llvm/llvm-project/issues/63586 .
2024-12-04 14:10:15 -08:00
John Brawn
ecbe4d1e36
[IR] Allow fast math flags on fptrunc and fpext (#115894)
This consists of:
* Make these instructions part of FPMathOperator.
* Adjust bitcode/IR readers/writers to expect fast math flags on these
  instructions.
* Make IRBuilder set the fast math flags on these instructions.
* Update LangRef and release notes.
* Update a bunch of tests. Some of these are due to InstCombineCasts
  incorrectly adding fast math flags to fptrunc, which will be fixed in a
  later patch.
2024-12-04 10:53:04 +00:00
Shilei Tian
68bcba6d7a Revert "[AMDGPU] Use COV6 by default (#118515)"
This reverts commit 410cbe3cf28913cca2fc61b3437306b841d08172 because some
buildbots are not ready yet.
2024-12-03 20:17:06 -05:00
Shilei Tian
410cbe3cf2
[AMDGPU] Use COV6 by default (#118515) 2024-12-03 19:38:35 -05:00
Haopeng Liu
4d6e69143d
Add the initializes attribute inference (#117104)
reland https://github.com/llvm/llvm-project/pull/97373 after fixing
clang tests.

Confirmed with "ninja check-llvm" and "ninja check-clang"
2024-11-20 19:15:23 -08:00
Artem Belevich
7c3fdcc276
[CUDA] Add support for __grid_constant__ attribute (#114589)
LLVM support for the attribute has been implemented already, so it just
plumbs it through to the CUDA front-end.

One notable difference from NVCC is that the attribute can be used
regardless of the targeted GPU. On older GPUs it will simply be
ignored. The attribute is a performance hint and does not warrant a
hard error if the compiler can't benefit from it on a particular GPU
variant.
2024-11-05 10:48:54 -08:00
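A small usage sketch; the struct and kernel are illustrative:

```
struct Params { float alpha; float data[64]; };

// __grid_constant__ lets the kernel read the parameter in place rather
// than forcing a local copy; it must be combined with const.
__global__ void scale(__grid_constant__ const Params p, float *out) {
  out[threadIdx.x] = p.alpha * p.data[threadIdx.x];
}
```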
Joseph Huber
42eb54b774
[Clang] Put offloading globals in the .llvm.rodata.offloading section (#111890)
Summary:
For our offloading entries, we currently store all the string names of
kernels that the runtime will need to load from the target executable.
These are available via a pointer in the `__tgt_offload_entry` struct;
however, this makes them difficult to obtain from the object itself. This
patch simply puts the strings in a named section so they can be easily
queried.

The motivation behind this is that when the linker wrapper is doing
linking, it wants to know which kernels the host executable is calling.
We *could* already get this via the `.relaomp_offloading_entries`
section and trawling through the string table, but that's quite annoying
and not portable. The follow-up to this should be to make the linker
wrapper get a list of all used symbols the device link job should count
as "needed" so we can handle static linking more directly.
2024-10-28 07:17:50 -07:00
Alex Voicu
2074de252b
[clang][HIP] Don't use the OpenCLKernel CC when targeting AMDGCNSPIRV (#110447)
When compiling HIP source for AMDGCN flavoured SPIR-V that is expected
to be consumed by the ROCm HIP RT, it's not desirable to set the OpenCL
Kernel CC on `__global__` functions. On one hand, this is not an OpenCL
RT, so it doesn't compose with e.g. OCL-specific attributes. On the
other, it is a "noisy" CC that carries semantics and breaks overload
resolution when using [generic dispatchers such as those used by
RAJA](186d4194a5/src/common/HipDataUtils.hpp (L39)).
2024-10-22 17:16:46 +01:00
Youngsuk Kim
0f0a96b862
[llvm][NVPTX] Strip unneeded '+0' in PTX load/store (#113017)
Remove the extraneous '+0' immediate offset part in PTX load/stores, to
improve readability of output PTX code.
2024-10-19 10:05:36 -04:00
Matt Arsenault
51b4ada458
clang/AMDGPU: Set noalias.addrspace metadata on atomicrmw (#102462) 2024-10-17 17:10:45 +04:00
Matt Arsenault
d50302f31c
clang/AMDGPU: Stop emitting amdgpu-unsafe-fp-atomics attribute (#111579) 2024-10-09 08:52:32 +04:00
Alex Voicu
e203a67f4c
[cuda][HIP] __constant__ should imply constant (#110182)
Currently, `__constant__` variables do not get unconditionally marked as
`constant` in IR, which seems a bit odd given their definition. This is
generally inconsequential for NVPTX/AMDGPU, since said variables get
emitted in the constant address space for those BEs. However, it is
potentially significant for e.g. HIP-on-SPIR-V cases, as SPIR-V does not
allow casts to/from the constant AS (`UniformConstant`), which forces
`__constant__` variables to be emitted in the global AS, thus making IR
constness meaningful.
2024-09-29 01:22:52 +01:00
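A minimal sketch of the pattern this affects:

```
// Set from the host, read-only in device code, so it can be marked
// `constant` in IR unconditionally. This matters for HIP-on-SPIR-V,
// where the variable may land in the global AS rather than
// UniformConstant, making IR constness the only record of the fact.
__constant__ float coeffs[4];

__global__ void apply(float *x) {
  x[threadIdx.x] *= coeffs[threadIdx.x % 4];
}
```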
jofrn
b5fd9463a3
[HIP][Clang][CodeGen] Handle hip bin symbols properly. (#107458)
Remove '_' in fatbin and gpubin symbol suffixes when the TU hash ID is
missing. Internalize the gpubin symbol so that it is not unresolved at
link time when the symbol is not relocatable.
2024-09-11 18:46:46 -04:00
Alex Voicu
ad435bcc14
[clang][CodeGen][SPIR-V][AMDGPU] Tweak AMDGCNSPIRV ABI to allow for the correct handling of aggregates passed to kernels / functions. (#102776)
The AMDGPU kernel ABI is not directly representable in SPIR-V, since it
relies on passing aggregates `byref`, and SPIR-V only encodes `byval`
(which the AMDGPU BE disallows for kernel arguments). As a temporary
solution to this mismatch, we add special handling for AMDGCN flavoured
SPIR-V, whereby aggregates are passed as direct, both to kernels and to
normal functions. This is not ideal (there are pathological cases where
performance is heavily impacted), but empirically robust and guaranteed
to work as the AMDGPU BE retains handling of `direct` passing for legacy
reasons.

We will revisit this in the future, but as it stands it is enough to
pass a wide array of integration tests and generates correct SPIR-V and
correct reverse translation into LLVM IR. The
amdgpu-kernel-arg-pointer-type test is updated via the automated script,
and thus becomes quite noisy.
2024-08-21 13:16:59 +01:00
Johannes Doerfert
80525dfcde
[Offload][CUDA] Allow CUDA kernels to use LLVM/Offload (#94549)
Through the new `-foffload-via-llvm` flag, CUDA kernels can now be
lowered to the LLVM/Offload API. On the Clang side, this is simply done
by using the OpenMP offload toolchain and emitting calls to `llvm*`
functions to orchestrate the kernel launch rather than `cuda*`
functions. These `llvm*` functions are implemented on top of the
existing LLVM/Offload API.

As we are about to redefine the Offload API, this will help us in the
design process as a second offload language.

We do not support any CUDA APIs yet, however, we could:
  https://www.osti.gov/servlets/purl/1892137

For proper host execution we need to resurrect/rebase
  https://tianshilei.me/wp-content/uploads/2021/12/llpp-2021.pdf
(which was designed for debugging).

```
❯❯❯ cat test.cu
extern "C" {
void *llvm_omp_target_alloc_shared(size_t Size, int DeviceNum);
void llvm_omp_target_free_shared(void *DevicePtr, int DeviceNum);
}

__global__ void square(int *A) { *A = 42; }

int main(int argc, char **argv) {
  int DevNo = 0;
  int *Ptr = reinterpret_cast<int *>(llvm_omp_target_alloc_shared(4, DevNo));
  *Ptr = 7;
  printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
  square<<<1, 1>>>(Ptr);
  printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
  llvm_omp_target_free_shared(Ptr, DevNo);
}

❯❯❯ clang++ test.cu -O3 -o test123 -foffload-via-llvm --offload-arch=native

❯❯❯ llvm-objdump --offloading test123

test123:        file format elf64-x86-64

OFFLOADING IMAGE [0]:
kind            elf
arch            gfx90a
triple          amdgcn-amd-amdhsa
producer        openmp

❯❯❯ LIBOMPTARGET_INFO=16 ./test123
Ptr 0x155448ac8000, *Ptr 7
Ptr 0x155448ac8000, *Ptr 42
```
2024-08-12 17:44:58 -07:00
Artem Belevich
5629249575
[CUDA] Emit used function list in deterministic order. (#102661)
Fixes https://github.com/llvm/llvm-project/issues/101560
2024-08-12 10:21:23 -07:00
Hari Limaye
94473f4db6
[IRBuilder] Generate nuw GEPs for struct member accesses (#99538)
Generate nuw GEPs for struct member accesses, as inbounds + non-negative
implies nuw.

Regression tests are updated using update scripts where possible, and by
find + replace where not.
2024-08-09 13:25:04 +01:00
Eli Friedman
1762e01cca
Fix codegen of consteval functions returning an empty class, and related issues (#93115)
If a class is empty, don't store it to memory: the store might overwrite
useful data. Similarly, if a class has tail padding that might overlap
other fields, don't store the tail padding to memory.

The problem here turned out to be a bit more general than I initially
thought: basically all uses of EmitAggregateStore were broken. Call
lowering had
a method that did mostly the right thing, though: CreateCoercedStore.
Adapt CreateCoercedStore so it always does the conservatively right
thing, and use it for both calls and ConstantExpr.

Also, along the way, fix the "overlap" bit in AggValueSlot: the bit was
set incorrectly for empty classes in some cases.

Fixes #93040.
2024-08-01 16:18:20 -07:00
Matt Arsenault
41439d5bb7
AMDGPU: Handle remote/fine-grained memory in atomicrmw fmin/fmax lowering (#96759)
Consider the new atomic metadata when choosing to expand as cmpxchg
instead.
2024-08-01 22:08:01 +04:00
darkbuck
fa84297002
[clang][CUDA] Add 'noconvergent' function and statement attribute
- For languages following the SPMD/SIMT programming model, functions and
  call sites are marked 'convergent' by default. 'noconvergent' is added
  in this patch to allow developers to remove that 'convergent'
  attribute when it's safe.

Reviewers:
nhaehnle, Sirraide, yxsamliu, Artem-B, ilovepi, jayfoad, ssahasra, arsenm

Reviewed By: arsenm

Pull Request: https://github.com/llvm/llvm-project/pull/100637
2024-07-31 11:30:48 -04:00
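A hedged sketch of both forms; the exact statement-attribute spelling is an assumption:

```
// Function form: assert that calls to this device function need no
// convergence guarantees, dropping the default 'convergent' attribute.
__device__ __attribute__((noconvergent)) float fast_rsqrt(float x);

__global__ void k(float *p) {
  [[clang::noconvergent]] { // statement form; spelling assumed
    p[threadIdx.x] = fast_rsqrt(p[threadIdx.x]);
  }
}
```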
Matt Arsenault
e108853ac8
clang: Allow targets to set custom metadata on atomics (#96906)
Use this to replace the emission of the amdgpu-unsafe-fp-atomics
attribute in favor of per-instruction metadata. In the future,
new fine-grained controls should be introduced that also cover
the integer cases.

Add a wrapper around CreateAtomicRMW that appends the metadata,
and update a few use contexts to use it.
2024-07-26 09:57:28 +04:00
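As a rough illustration of the mechanism (not the actual Clang code), a helper in this spirit built on the LLVM C++ API; the helper name and the metadata kind shown are assumptions:

```
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Metadata.h"

// Hypothetical wrapper: create an atomicrmw and append target-specific
// metadata chosen by the target/front-end options.
llvm::AtomicRMWInst *emitAtomicRMW(llvm::IRBuilder<> &B,
                                   llvm::AtomicRMWInst::BinOp Op,
                                   llvm::Value *Ptr, llvm::Value *Val,
                                   llvm::AtomicOrdering Order) {
  llvm::AtomicRMWInst *RMW =
      B.CreateAtomicRMW(Op, Ptr, Val, llvm::MaybeAlign(), Order);
  // e.g. the AMDGPU metadata that superseded amdgpu-unsafe-fp-atomics:
  RMW->setMetadata("amdgpu.no.fine.grained.memory",
                   llvm::MDNode::get(RMW->getContext(), {}));
  return RMW;
}
```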