This change consolidates and cleans up various NVPTXISD target-specific
nodes in order to simplify SDAG ISel. While there are some whitespace
changes in the emitted PTX, it is otherwise a non-functional change.
NVPTXISD::Wrapper - This node was used to wrap external-symbol and
global-address nodes. It is redundant and has been removed. Instead we
use the non-target versions of these nodes and convert them
appropriately during ISel.
NVPTXISD::CALL - Much of the family of nodes used to represent a PTX
call instruction have been replaced by this new single node. It
corresponds to a single instruction and is therefore much simpler to
create and lower.
In most cases, the type information attached to load and store
instructions is meaningless and inconsistently applied. We can usually
use ".b" loads and avoid the complexity of trying to assign the correct
type. The one exception is sign-extending loads, which will continue to
use ".s" to ensure the sign extension into a larger register is done
correctly.
Inside a lambda, a function call may resolve to host and device
functions with different signatures. In particular, a constexpr local
variable may be passed by value to the device function but by
reference to the host function, causing the constexpr variable to be
captured by the lambda in host compilation but not in device
compilation. This discrepancy in the lambda captures violates the ODR
and causes undefined behavior for kernels using these lambdas.
This PR fixes the issue by identifying discrepancies between ODR and
non-ODR uses of constexpr local variables passed to host/device
functions and conservatively capturing them.
Fixes: https://github.com/llvm/llvm-project/issues/132068
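A minimal sketch of the problematic pattern, with illustrative names
(Clang treats an unattributed lambda as implicitly host+device):
```
__device__ void use(int x);       // device overload: by value, no ODR-use
__host__ void use(const int &x);  // host overload: by reference, ODR-use

template <typename F> __global__ void kernel(F f) { f(); }

void test() {
  constexpr int N = 42;
  // With [=], N is captured only if ODR-used: host compilation captures
  // it (the reference parameter ODR-uses N), while device compilation
  // previously did not, so the closure layouts diverged.
  auto lam = [=] { use(N); };
  kernel<<<1, 1>>>(lam);
}
```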
Following are the changes:
1. Make OffloadKind enum values powers of two so they can be used as a
bitmask (see the sketch below)
2. Include OFK_SYCL enum value
3. Modify ActiveOffloadKinds support in clang-linker-wrapper to use a
bitmask instead of a vector.
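A minimal sketch of the bitmask pattern (enumerator values
illustrative):
```
enum OffloadKind : unsigned {
  OFK_None   = 0,
  OFK_OpenMP = 1u << 0,
  OFK_CUDA   = 1u << 1,
  OFK_HIP    = 1u << 2,
  OFK_SYCL   = 1u << 3, // newly added
};

// Track the active kinds in a single mask rather than a vector:
unsigned ActiveOffloadKinds = OFK_OpenMP | OFK_SYCL;
bool HasSYCL = (ActiveOffloadKinds & OFK_SYCL) != 0;
```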
Thanks
---------
Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
Summary:
When we were first porting to COV5, this led to some ABI issues due to
a change in how we looked up the work group size. Bitcode libraries
relied on the builtins to emit code, but this was changed between
versions. This prevented the bitcode libraries, like OpenMP or libc,
from being used for both COV4 and COV5. The solution was to have this
'none' functionality which effectively emitted code that branched off of
a global to resolve to either version.
This wasn't a great solution because it forced every TU to have this
variable in it. The patch in
https://github.com/llvm/llvm-project/pull/131033 removed support for
COV4 from OpenMP, which was the only consumer of this functionality.
Other users like HIP and OpenCL did not use this because they linked the
ROCm Device Library directly, which has its own handling (the name was
borrowed from it, after all).
So, now that we don't need to worry about backward compatibility with
COV4, we can remove this special handling. Users can still emit COV4
code, this simply removes the special handling used to make the OpenMP
device runtime bitcode version agnostic.
Summary:
This attribute is mostly borrowed from OpenCL, but is useful in general
for accessing the LLVM vector types. Previously the only way to use it
was through typedefs. This patch changes that to allow use as a regular
type attribute, similar to address spaces.
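For illustration, the before and after spellings (a sketch; the typedef
form keeps working):
```
// Previously: only usable through a typedef.
typedef float float4 __attribute__((ext_vector_type(4)));

// Now: usable directly as a type attribute, similar to an address space.
float __attribute__((ext_vector_type(4))) v = {0.0f, 1.0f, 2.0f, 3.0f};
```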
Add an option and statement attribute for controlling the emission of
target-specific metadata to atomicrmw instructions in IR.
The RFC for this attribute and option is
https://discourse.llvm.org/t/rfc-add-clang-atomic-control-options-and-pragmas/80641.
Originally a pragma was proposed; it was later changed to a Clang
attribute.
This attribute allows users to specify one, two, or all three options
and must be applied
to a compound statement. The attribute can also be nested, with inner
attributes
overriding the options specified by outer attributes or the target's
default
options. These options will then determine the target-specific metadata
added to atomic
instructions in the IR.
In addition to the attribute, three new compiler options are introduced:
`-f[no-]atomic-remote-memory`, `-f[no-]atomic-fine-grained-memory`,
`-f[no-]atomic-ignore-denormal-mode`.
These compiler options allow users to override the default options
through the
Clang driver and front end. `-m[no-]unsafe-fp-atomics` is aliased to
`-f[no-]atomic-ignore-denormal-mode`.
In terms of implementation, the atomic attribute is represented in the
AST by the
existing AttributedStmt, with minimal changes to AST and Sema.
During code generation in Clang, the CodeGenModule maintains the current
atomic options,
which are used to emit the relevant metadata for atomic instructions.
RAII is used
to manage the saving and restoring of atomic options when entering
and exiting nested AttributedStmt.
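A minimal sketch of the attribute on nested compound statements,
assuming the option spellings proposed in the RFC:
```
void add(float *p, float v) {
  [[clang::atomic(no_remote_memory, no_fine_grained_memory)]] {
    // Atomics here get metadata reflecting the outer options.
    __atomic_fetch_add(p, v, __ATOMIC_RELAXED);
    [[clang::atomic(remote_memory)]] {
      // The inner attribute overrides only the remote-memory option.
      __atomic_fetch_add(p, v, __ATOMIC_RELAXED);
    }
  }
}
```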
Replace some more nvvm.annotations with function attributes,
auto-upgrading the annotations as needed. These new attributes will be
more idiomatic and compile-time efficient than the annotations.
- !"maxntid[xyz]" -> "nvvm.maxntid"
- !"reqntid[xyz]" -> "nvvm.reqntid"
- !"cluster_dim_[xyz]" -> "nvvm.cluster_dim"
Currently, the Clang frontend incorrectly emits the callee instead of
the callee's thunk in the vtable. This happens because the thunk index
is not incremented when a callee cannot be emitted.
This patch fixes the bug.
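For context, a minimal case where a vtable slot must hold a thunk
rather than the callee itself:
```
struct A { virtual void f(); };
struct B { virtual void g(); };
struct C : A, B {
  // B is a non-primary base, so B's vtable slot for g() must point at
  // a this-adjusting thunk to C::g, not at C::g directly.
  void g() override;
};
```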
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.
This PR removes all occurrences of gfx940/gfx941 from clang that can be
removed without changes in the llvm directory. The
target-invalid-cpu-note/amdgcn.c test is not included here since it
tests a list of targets that is defined in
llvm/lib/TargetParser/TargetParser.cpp.
For SWDEV-512631
Replace some more nvvm.annotations with function attributes,
auto-upgrading the annotations as needed. These new attributes will be
more idiomatic and compile-time efficient than the annotations.
- !"maxclusterrank" / !"cluster_max_blocks" -> "nvvm.maxclusterrank"
- !"minctasm" -> "nvvm.minctasm"
- !"maxnreg" -> "nvvm.maxnreg"
Summary:
This patch unifies the existing offloading entries into a single section
called `llvm_offload_entries`. This lets us use a more unified
offloading infrastructure so that all targets share the same handling.
The effect is that runtimes now need to check whether the kind is what
they expect, but the upside is that multiple potential providers can be
combined into one compile job. This doesn't fully work yet because of
other runtime issues, but some day it will. Mostly this helps the
future of liboffload, where we want to handle languages other than
OpenMP.
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.
Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
make it easier to use old IR files and somewhat reduce the test churn in
this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
attribute. The representation in the LLVM IR dialect should be updated
separately.
Summary:
The previous offloading entry type did not fit the current use cases
very well. This widens it and adds a version field to prevent further
annoyances. It also includes the kind to identify which offloading
language produced it. The first 64 bits are reserved as zero so the
OpenMP runtime can detect the old format, for binary compatibility.
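For illustration, a plausible shape of the widened entry; all field
names and types below are a hedged sketch, not the authoritative
definition:
```
#include <cstdint>

struct OffloadEntryV1 {
  uint64_t Reserved;   // zero; lets the runtime detect the old format
  uint16_t Version;    // entry-format version, for future changes
  uint16_t Kind;       // producing offload language (OpenMP, CUDA, ...)
  uint32_t Flags;
  void *Address;       // kernel or variable address
  char *SymbolName;    // name the runtime looks up in the image
  uint64_t Size;       // object size, e.g. for global variables
  uint64_t Data;       // extra, kind-specific data
  void *AuxAddr;       // auxiliary pointer, e.g. for managed variables
};
```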
Summary:
Previously, managed variables didn't work in RDC mode using the new
driver because we just didn't register them. This was previously ignored
because we didn't have enough space in the current struct format. This
patch amends that by just emitting a struct pair for the two variables
and using the single pointer.
In the future, a more extensible entry format would be nice, but that
can be done later.
The `ptx_kernel` calling convention is a more idiomatic and standard way
of specifying an NVPTX kernel than using metadata, which is not
supposed to change the meaning of the program. Further, checking the
calling convention is significantly faster than traversing the metadata,
improving compile time.
This change updates the clang and mlir frontends as well as the
NVPTXCtorDtorLowering pass to emit kernels using the calling convention.
In addition, this updates all NVPTX unit tests to use the calling
convention as well.
HIPAMD relies on the `amdgpu_flat_work_group_size` attribute to
implement key functionality such as the `__launch_bounds__` `__global__`
function annotation. This attribute is neither available in nor
directly translatable to SPIR-V, hence, as things stand, AMDGCN
flavoured SPIR-V suffers from information loss.
This patch addresses that limitation by converting the unsupported
attribute into the `max_work_group_size` attribute, which maps to
[`MaxWorkgroupSizeINTEL`](https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/INTEL/SPV_INTEL_kernel_attributes.asciidoc)
and is handled by SPIR-V. When reverse-translating from SPIR-V to
AMDGCN LLVM IR we invert the mapping and restore the original AMDGPU
attribute.
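For illustration, a HIP kernel whose launch bounds previously vanished
when targeting SPIR-V (a sketch):
```
// __launch_bounds__ lowers to amdgpu-flat-work-group-size, which this
// patch converts to max_work_group_size for AMDGCN-flavoured SPIR-V.
__global__ void __launch_bounds__(256) fill(float *p, float v) {
  p[threadIdx.x] = v;
}
```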
Summary:
The documentation at
https://llvm.org/docs/AMDGPUUsage.html#memory-scopes states that these
'one-as' modifiers are more specific versions of the scopes that only
apply to a specific address space. This doesn't make sense for fences,
which have no associated address space, and it's a more restrictive
version of the normal scope. This should not be the default behavior,
but it is currently emitted in all cases except for sequentially
consistent fences.
The AMDGPUAnnotateKernelFeatures pass infers the "amdgpu-calls" and
"amdgpu-stack-objects" attributes, which are used to infer whether we
need to initialize flat scratch. This is, however, not precise. Instead,
we should use AMDGPUAttributor and infer amdgpu-no-flat-scratch-init on
kernels. Refer to https://github.com/llvm/llvm-project/issues/63586 .
This consists of:
* Make these instructions part of FPMathOperator.
* Adjust bitcode/ir readers/writers to expect fast math flags on these
instructions.
* Make IRBuilder set the fast math flags on these instructions.
* Update LangRef and release notes.
* Update a bunch of tests. Some of these are due to InstCombineCasts
incorrectly adding fast math flags to fptrunc, which will be fixed in a
later patch.
LLVM support for the attribute has been implemented already, so this patch just
plumbs it through to the CUDA front-end.
One notable difference from NVCC is that the attribute can be used
regardless of the targeted GPU. On the older GPUs it will just be
ignored. The attribute is a performance hint, and does not warrant a
hard error if the compiler can't benefit from it on a particular GPU
variant.
Summary:
For our offloading entries, we currently store all the string names of
kernels that the runtime will need to load from the target executable.
These are available via a pointer in the `__tgt_offload_entry` struct;
however, this makes them difficult to obtain from the object itself. This
patch simply puts the strings in a named section so they can be easily
queried.
The motivation behind this is that when the linker wrapper is doing
linking, it wants to know which kernels the host executable is calling.
We *could* get this already via the `.relaomp_offloading_entries`
section and trawling through the string table, but that's quite annoying
and not portable. The follow-up to this should be to make the linker
wrapper get a list of all used symbols the device link job should count
as "needed" so we can handle static linking more directly.
When compiling HIP source for AMDGCN flavoured SPIR-V that is expected
to be consumed by the ROCm HIP RT, it's not desirable to set the OpenCL
Kernel CC on `__global__` functions. On one hand, this is not an OpenCL
RT, so it doesn't compose with e.g. OCL-specific attributes. On the
other hand, it is a "noisy" CC that carries semantics and breaks overload
resolution when using [generic dispatchers such as those used by
RAJA](186d4194a5/src/common/HipDataUtils.hpp (L39)).
Currently, `__constant__` variables do not get unconditionally marked as
`constant` in IR, which seems a bit odd given their definition. This is
generally inconsequential for NVPTX/AMDGPU, since said variables get
emitted in the constant address space for those BEs. However, it is
potentially significant for e.g. HIP-on-SPIR-V cases, as SPIR-V does not
allow casts to/from the constant AS (`UniformConstant`), which forces
`__constant__` variables to be emitted in the global AS, thus making IR
constness meaningful.
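A small illustration:
```
// Device code may only read a __constant__ variable, so marking it
// IR-'constant' preserves that guarantee even where (as with SPIR-V)
// the variable must be emitted in the global AS.
__constant__ float kCoeffs[4] = {1.0f, 0.5f, 0.25f, 0.125f};
```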
Remove '_' in the fatbin and gpubin symbol suffixes when the TU hash ID
is missing.
Internalize the gpubin symbol so that it is not unresolved at link time
when the symbol is not relocatable.
The AMDGPU kernel ABI is not directly representable in SPIR-V, since it
relies on passing aggregates `byref`, and SPIR-V only encodes `byval`
(which the AMDGPU BE disallows for kernel arguments). As a temporary
solution to this mismatch, we add special handling for AMDGCN flavoured
SPIR-V, whereby aggregates are passed as direct, both to kernels and to
normal functions. This is not ideal (there are pathological cases where
performance is heavily impacted), but empirically robust and guaranteed
to work as the AMDGPU BE retains handling of `direct` passing for legacy
reasons.
We will revisit this in the future, but as it stands it is enough to
pass a wide array of integration tests and generates correct SPIR-V and
correct reverse translation into LLVM IR. The
amdgpu-kernel-arg-pointer-type test is updated via the automated script,
and thus becomes quite noisy.
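For illustration, the kind of kernel signature affected (a sketch):
```
struct Params { float a, b; int n; };

// The AMDGPU BE would normally pass 'p' byref; SPIR-V cannot encode
// that, so for AMDGCN-flavoured SPIR-V it is now passed direct.
__global__ void compute(Params p) { /* ... */ }
```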
Through the new `-foffload-via-llvm` flag, CUDA kernels can now be
lowered to the LLVM/Offload API. On the Clang side, this is simply done
by using the OpenMP offload toolchain and emitting calls to `llvm*`
functions to orchestrate the kernel launch rather than `cuda*`
functions. These `llvm*` functions are implemented on top of the
existing LLVM/Offload API.
As we are about to redefine the Offload API, this will help us in the
design process as a second offload language.
We do not support any CUDA APIs yet; however, we could:
https://www.osti.gov/servlets/purl/1892137
For proper host execution we need to resurrect/rebase
https://tianshilei.me/wp-content/uploads/2021/12/llpp-2021.pdf
(which was designed for debugging).
```
❯❯❯ cat test.cu
extern "C" {
void *llvm_omp_target_alloc_shared(size_t Size, int DeviceNum);
void llvm_omp_target_free_shared(void *DevicePtr, int DeviceNum);
}
__global__ void square(int *A) { *A = 42; }
int main(int argc, char **argv) {
int DevNo = 0;
int *Ptr = reinterpret_cast<int *>(llvm_omp_target_alloc_shared(4, DevNo));
*Ptr = 7;
printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
square<<<1, 1>>>(Ptr);
printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
llvm_omp_target_free_shared(Ptr, DevNo);
}
❯❯❯ clang++ test.cu -O3 -o test123 -foffload-via-llvm --offload-arch=native
❯❯❯ llvm-objdump --offloading test123
test123: file format elf64-x86-64
OFFLOADING IMAGE [0]:
kind elf
arch gfx90a
triple amdgcn-amd-amdhsa
producer openmp
❯❯❯ LIBOMPTARGET_INFO=16 ./test123
Ptr 0x155448ac8000, *Ptr 7
Ptr 0x155448ac8000, *Ptr 42
```
Generate nuw GEPs for struct member accesses: member offsets are
non-negative, and inbounds plus a non-negative offset implies nuw.
Regression tests are updated using update scripts where possible, and by
find + replace where not.
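A small illustration of why the implication holds:
```
struct S { int a; int b; };

// The member offset is a non-negative constant, so an inbounds GEP to
// it cannot wrap in the unsigned sense; clang can therefore emit
// 'getelementptr inbounds nuw' for the access below.
int *fieldB(S *s) { return &s->b; }
```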
Fix codegen of consteval functions returning an empty class, and related
issues
If a class is empty, don't store it to memory: the store might overwrite
useful data. Similarly, if a class has tail padding that might overlap
other fields, don't store the tail padding to memory.
The problem here turned out to be a bit more general than I initially thought:
basically all uses of EmitAggregateStore were broken. Call lowering had
a method that did mostly the right thing, though: CreateCoercedStore.
Adapt CreateCoercedStore so it always does the conservatively right
thing, and use it for both calls and ConstantExpr.
Also, along the way, fix the "overlap" bit in AggValueSlot: the bit was
set incorrectly for empty classes in some cases.
Fixes #93040.
- For languages following SPMD/SIMT programming model, functions and
call sites are marked 'convergent' by default. 'noconvergent' is added
in this patch to allow developers to remove that 'convergent'
attribute when it's safe.
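A minimal sketch, assuming the attribute spelling added here:
```
// Calls to this function are no longer marked 'convergent'; the
// developer asserts that dropping the attribute is safe.
__device__ __attribute__((noconvergent)) float fetch(float *p);
```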
Reviewers:
nhaehnle, Sirraide, yxsamliu, Artem-B, ilovepi, jayfoad, ssahasra, arsenm
Reviewed By: arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/100637
Use this to replace the emission of the amdgpu-unsafe-fp-atomics
attribute in favor of per-instruction metadata. In the future,
new fine-grained controls should be introduced that also cover
the integer cases.
Add a wrapper around CreateAtomicRMW that appends the metadata,
and update a few use contexts to use it.
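A hedged sketch of the wrapper pattern (the helper name is illustrative;
the metadata kinds are the AMDGPU per-instruction ones referenced
above):
```
#include "llvm/IR/IRBuilder.h"

llvm::AtomicRMWInst *emitScopedAtomicRMW(llvm::IRBuilder<> &B,
                                         llvm::AtomicRMWInst::BinOp Op,
                                         llvm::Value *Ptr, llvm::Value *Val,
                                         llvm::AtomicOrdering Order) {
  llvm::AtomicRMWInst *RMW =
      B.CreateAtomicRMW(Op, Ptr, Val, llvm::MaybeAlign(), Order);
  // Append the target-specific metadata the current atomic options imply.
  llvm::MDNode *Empty = llvm::MDNode::get(B.getContext(), {});
  RMW->setMetadata("amdgpu.no.fine.grained.memory", Empty);
  RMW->setMetadata("amdgpu.no.remote.memory", Empty);
  return RMW;
}
```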