llvm-project

Author	SHA1	Message	Date
Daniil Fukalov	6bf794a02a	[AMDGPU] Disable generic DAG combines at -O0 to preserve debuggability. (#176304 ) Disable generic DAG combines for AMDGPU at -O0 via disableGenericCombines() to preserve instructions that users may want to set breakpoints on during debugging. Assisted-by: Cursor / Claude Opus 4.6	2026-04-01 11:55:17 +02:00
Alex Voicu	18e6958903	[SPIRV][AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (#134016 ) This change adds two builtins for AMDGPU: - `__builtin_amdgcn_processor_is`, which is similar in observable behaviour with `__builtin_cpu_is`, except that it is never "evaluated" at run time; - `__builtin_amdgcn_is_invocable`, which is behaviourally similar with `__has_builtin`, except that it is not a macro (i.e. not evaluated at preprocessing time). Neither of these is `constexpr`, even though when compiling for concrete (i.e. `gfxXXX` / `gfxXXX-generic`) targets they get evaluated in Clang, so they shouldn't tear the AST too badly / at all for multi-pass compilation cases like HIP. They can only be used in specific contexts (as args to control structures). The motivation for adding these is two-fold: - as a nice to have, it provides an AST-visible way to incorporate architecture specific code, rather than having to rely on macros and the preprocessor, which burn in the choice quite early; - as a must have, it allows featureful AMDGCN flavoured SPIR-V to be produced, where target specific capability is guarded and chosen or discarded when finalising compilation for a concrete target; this is built atop the Specialization Constant concept which is described in the SPIR-V specification under section [2.12 Specialization](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_specialization_2) I've tried to keep the overall footprint of the change small. The changes to Sema are a bit unpleasant, but there was a strong desire to have Clang validate these, and to constrain their uses, and this was the most compact solution I could come up with (suggestions welcome). --------- Co-authored-by: Juan Manuel Martinez Caamaño <jmartinezcaamao@gmail.com> Co-authored-by: Voicu <avoicu@amd.com>	2026-03-30 23:02:26 +01:00
Srivarshitha M	61b9fc1d42	[CIR] Upstream CUDA mangling test with LLVM and OGCG verification (#184444 ) This PR upstreams the `mangling.cu` test from the ClangIR incubator. Building on the feedback from my previous upstreaming PR, I have expanded the verification for this file to include: 1. CIR checks: Verifying the ClangIR generated functions. 2. LLVM checks: Verifying the LLVM IR generated via the ClangIR pipeline. 3. OGCG checks: Verifying against the original CodeGen pipeline to ensure name-mangling parity. I have also moved the `Inputs/cuda.h` mock header to the upstream `Inputs` directory to support this and future CUDA tests. If this multi-stage verification approach looks correct to the maintainers, I plan to follow up by upstreaming the other currently passing CUDA test, `simple-nvptx-triple.cu`, using the same standard. Verified locally with `llvm-lit`. Partially addresses #156747.	2026-03-19 14:53:12 -07:00
Matt Arsenault	7cb3005ba2	AMDGPU: Add dereferenceable attribute to dispatch ptr intrinsic (#185955 ) Stop manually setting it on the callsite in clang.	2026-03-12 07:28:39 +01:00
Lewis Crawford	380ac9e301	[NVPTX][clang] Ensure CLZ(0) is defined on NVPTX (#185630 ) CUDA semantics specify that clz(0) = bitwidth, so clang should emit clz / ctz intrinsics for NVPTX with zero-is-poison = false.	2026-03-11 09:49:02 +00:00
Matt Arsenault	47e31088f3	clang/AMDGPU: Fix workgroup size builtins for nonuniform work group sizes (#185098 ) These were assuming uniform work group sizes. Emit the v4 and v5 sequences to take the remainder group for the nonuniform case. Currently the device libs use this builtin on the legacy ABI path with the same sequence to calculate the remainder, and fully implement the v5 path. If you perform a franken-build of the library with the updated builtin, the result is worse. The duplicate sequence does not fully fold out. However, it does not appear to be wrong. The relevant conformance tests still pass.	2026-03-09 11:47:49 +01:00
Matt Arsenault	d840396e20	clang: Simplify emission of uniform-work-group-size attribute (#185066 )	2026-03-09 11:17:30 +01:00
Nikita Popov	b3d99ac2cd	[CodeGen] Use data layout aware constant folder in CGBuilder (#184819 ) Use the DataLayout-aware TargetFolder instead of ConstantFolder in Clang's CGBuilder. The primary impact of this change is that GEP constant expressions are now emitted in canonical `getelementptr i8` form. This is in preparation for the migration to ptradd, which requires this form. Part of the test updates were performed by Claude Code and reviewed by me.	2026-03-06 08:33:34 +00:00
Joseph Huber	5a88dffc40	[Clang] Only define `wchar_size` module flag if non-standard (#184668 ) Summary: This PR simply changes the behavior of the `wchar_size` flag. Currently, we emit this in all cases for all targets. This causes problems during LLVM-IR linking, specifically because this would vary between Linux and Windows in unintuitive ways. Now we have an llvm::Triple helper to determine the size from the known values. The module flag will only be emitted if these do not match (indicating a non-standard environment). In addition to fixing AMDGCN bitcode linking, this also means we don't need to bloat every IR module compiled by clang with this flag. The changed tests reflect this: one less unnecessary piece of metadata.	2026-03-04 16:13:48 -06:00
Shilei Tian	f05d2e8a39	[AMDGPU] Make uniform-work-group-size a valueless attribute (#183925 ) The "uniform-work-group-size" function attribute previously took a string value of "true" or "false". Since presence alone can convey the "true" semantics and absence can convey "false", the value is unnecessary. This patch converts it to a valueless string attribute: presence indicates true, absence indicates false. For backward compatibility, auto-upgrade logic is added in both UpgradeAttributes (bitcode) and UpgradeFunctionAttributes: if the old value is "true", the attribute is kept without a value; if "false", the attribute is removed.	2026-03-01 21:29:55 +00:00
Yaxun (Sam) Liu	16d0090ffc	[CUDA/HIP] Externalize __device__ const variables accessed by host code (#177292 ) In standard C++, const variables at namespace scope have internal linkage. For __device__ const variables, this makes them invisible to runtime symbol lookup APIs (cudaGetSymbolAddress/hipGetSymbolAddress). Reading a __device__ const variable from host code is a valid usage pattern — the host may need to know the value at runtime. This is also needed by libcudacxx's cuda::get_device_address. This patch extends the existing CUDADeviceVarODRUsedByHost tracking to cover __device__ const variables. When host code references such a variable, it gets externalized (same mechanism used for static device vars). Variables only used in device code keep internal linkage and can still be constant-folded. The fix is in SemaExpr: __device__ const variables are classified as CVT_Both (due to an implicit CUDAConstantAttr), so the ODR-use tracking is extended to include CVT_Both variables with an explicit CUDADeviceAttr, distinguishing them from plain const variables.	2026-02-24 12:35:13 -05:00
Joseph Huber	9f2351c27e	[HIP] Do not apply 'externally_initialized' to constant device variables (#182157 ) Summary: From the Language reference: > By default, global initializers are optimized by assuming that global variables defined within the module are not modified from their initial values before the start of the global initializer. This is true even for variables potentially accessible from outside the module, including those with external linkage or appearing in @llvm.used or dllexported variables. This assumption may be suppressed by marking the variable with externally_initialized. This is intended because device programs can be modified beyond the normal lifetime expected by the optimization pipeline. However, for constant variables we should be able to safely assume that these are truly constant within the module. In the vast majority of cases these will not get externally visible symbols, but even `extern const` uses we should assert that the user should not be writing them if they are marked const.	2026-02-24 09:11:31 -06:00
Justin Fargnoli	9475f6af81	Reland "[NVPTX] Validate user-specified PTX version against SM version" (#180116 ) Previous commit message: >Previous commit message: > >> Original commit message: >> >>>When users explicitly specify a PTX version via -mattr=+ptxNN that's insufficient for their target SM, we now emit a fatal error. Previously, we silently upgraded the PTX version to the minimum required for the target SM. >>> >>>When no SM or PTX version is specified, we now use PTX 3.2 (the minimum for the default SM 3.0) instead of PTX 6.0. >> >>The following commits should fix the failures that arose when I previously tried to land this commit: >> >> >>`9fc5fd0ad6` should address the llvm-nvptx-nvidia- build failures: https://github.com/llvm/llvm-project/pull/174834#issuecomment-3742242651 >> >> >>`600514a637` should address the MLIR failures > >The previous commit was reverted with `d23cb79ba4` because the [mlir-nvidia](https://lab.llvm.org/buildbot/#/builders/138/builds/24797) and [mlir-nvidia-gcc7](https://lab.llvm.org/buildbot/#/builders/116/builds/23929) Buildbots were failing. > >Those tests failed because MLIR's default SM was 5.0, which caused NVPTX to target PTX ISA v4.0, which did not support the intrinsics used in the failing tests. > >`243f011577` should address this by bumping MLIR's default SM to 7.5. Now, using MLIR's new default SM, NVPTX targets the PTX ISA v6.3, which supports the intrinsics used in the failing tests. --- The previous commit was reverted with e9b578a4d77025e18318efedd0f3f3764338d859 [because](https://github.com/llvm/llvm-project/pull/179304#issuecomment-3856301333) the clang driver set the default PTX ISA version to v4.2 when no CUDA installation is found. However, given our patch, we should not set a default; instead, let the LLVM backend select the appropriate PTX ISA version for the target SM.	2026-02-10 18:11:54 +00:00
Justin Fargnoli	e9b578a4d7	Revert "Reland "[NVPTX] Validate user-specified PTX version against SM version"" (#180035 ) Reverts llvm/llvm-project#179304 due to https://github.com/llvm/llvm-project/pull/179304#issuecomment-3856100622	2026-02-05 13:24:58 -08:00
Justin Fargnoli	ca93dd399e	Reland "[NVPTX] Validate user-specified PTX version against SM version" (#179304 ) Previous commit message: > Original commit message: > >>When users explicitly specify a PTX version via -mattr=+ptxNN that's insufficient for their target SM, we now emit a fatal error. Previously, we silently upgraded the PTX version to the minimum required for the target SM. >> >>When no SM or PTX version is specified, we now use PTX 3.2 (the minimum for the default SM 3.0) instead of PTX 6.0. > >The following commits should fix the failures that arose when I previously tried to land this commit: > >`9fc5fd0ad6` should address the llvm-nvptx-nvidia- build failures: https://github.com/llvm/llvm-project/pull/174834#issuecomment-3742242651 > >`600514a637` should address the MLIR failures --- The previous commit was reverted with d23cb79ba497281de050ef609cb91b91058bf323 because the [mlir-nvidia](https://lab.llvm.org/buildbot/#/builders/138/builds/24797) and [mlir-nvidia-gcc7](https://lab.llvm.org/buildbot/#/builders/116/builds/23929) Buildbots were failing. Those tests failed because MLIR's default SM was 5.0, which caused NVPTX to target PTX ISA v4.0, which did not support the intrinsics used in the failing tests. 243f011577193c99358ccc4142b296d4fa80ea11 should address this by bumping MLIR's default SM to 7.5. Now, using MLIR's new default SM, NVPTX targets the PTX ISA v6.3, which supports the intrinsics used in the failing tests.	2026-02-05 17:42:41 +00:00
Matt Arsenault	2502e3b7ba	IR: Promote "denormal-fp-math" to a first class attribute (#174293 ) Convert "denormal-fp-math" and "denormal-fp-math-f32" into a first class denormal_fpenv attribute. Previously the query for the effective denormal mode involved two string attribute queries with parsing. I'm introducing more uses of this, so it makes sense to convert this to a more efficient encoding. The old representation was also awkward since it was split across two separate attributes. The new encoding just stores the default and float modes as bitfields, largely avoiding the need to consider if the other mode is set. The syntax in the common cases looks like this: `denormal_fpenv(preservesign,preservesign)` `denormal_fpenv(float: preservesign,preservesign)` `denormal_fpenv(dynamic,dynamic float: preservesign,preservesign)` I wasn't sure about reusing the float type name instead of adding a new keyword. It's parsed as a type but only accepts float. I'm also debating switching the name to subnormal to match the current preferred IEEE terminology (also used by nofpclass and other contexts). This has a behavior change when using the command flag debug options to set the denormal mode. The behavior of the flag ignored functions with an explicit attribute set, per the default and f32 version. Now that these are one attribute, the flag logic can't distinguish which of the two components were explicitly set on the function. Only one test appeared to rely on this behavior, so I just avoided using the flags in it. This also does not perform all the code cleanups this enables. In particular the attributor handling could be cleaned up. I also guessed at how to support this in MLIR. I followed MemoryEffects as a reference; it appears bitfields are expanded into arguments to attributes, so the representation there is a bit uglier with the 2 2-element fields flattened into 4 arguments.	2026-02-05 13:31:26 +00:00
Jameson Nash	0dd21ad1c6	[clang] remove addrspace cast from CreateIRTemp (#179327 ) This just added unnecessary work to the IR, since they are only used for load and store, which just causes some IR noise. Tests updated by UTC script to remove the extra lines.	2026-02-04 13:09:32 -05:00
Justin Fargnoli	d23cb79ba4	Revert "Reland "[NVPTX] Validate user-specified PTX version against SM version"" (#178046 ) Reverts llvm/llvm-project#177459 `mlir-nvidia` and `mlir-nvidia-gcc7` Buildbots are failing. The blamelist is small and likely because of my change. Preemptively reverting.	2026-01-26 12:48:59 -08:00
Justin Fargnoli	487b69ae0a	Reland "[NVPTX] Validate user-specified PTX version against SM version" (#177459 ) Original commit message: > When users explicitly specify a PTX version via -mattr=+ptxNN that's insufficient for their target SM, we now emit a fatal error. Previously, we silently upgraded the PTX version to the minimum required for the target SM. > >When no SM or PTX version is specified, we now use PTX 3.2 (the minimum for the default SM 3.0) instead of PTX 6.0. --- The following commits should fix the failures that arose when I previously tried to land this commit: - 9fc5fd0ad689eed94f65b1d6d10f9c5642935e68 should address the `llvm-nvptx-nvidia-` build failures: https://github.com/llvm/llvm-project/pull/174834#issuecomment-3742242651 - 600514a63760c6730e4cd970d2fcead9c5a897b3 should address the MLIR failures	2026-01-26 19:47:05 +00:00
Mehdi Amini	af39416f8b	Revert "[NVPTX] Validate user-specified PTX version against SM version" (#175760 ) Reverts llvm/llvm-project#174834 Bots are broken.	2026-01-13 14:33:12 +00:00
Justin Fargnoli	21b3b37dbb	[NVPTX] Validate user-specified PTX version against SM version (#174834 ) When users explicitly specify a PTX version via `-mattr=+ptxNN` that's insufficient for their target SM, we now emit a fatal error. Previously, we silently upgraded the PTX version to the minimum required for the target SM. When no SM or PTX version is specified, we now use PTX 3.2 (the minimum for the default SM 3.0) instead of PTX 6.0.	2026-01-12 19:04:20 +00:00
Alex Voicu	68fea00dfb	[SPIRV] Use AMDGPU ABI for AMDGCN flavoured SPIRV (#169865 ) At the moment AMDGCN flavoured SPIRV uses the SPIRV ABI with some tweaks revolving around passing aggregates as direct. This is problematic in multiple ways: - it leads to divergence from code compiled for a concrete target, which makes it difficult to debug; - it incurs a run time cost, when dealing with larger aggregates; - it incurs a compile time cost, when dealing with larger aggregates. This patch switches over AMDGCN flavoured SPIRV to implement the AMDGPU ABI (except for dealing with variadic functions, which will be added in the future). One additional complication (and the primary motivation behind the current less than ideal state of affairs) stems from `byref`, which AMDGPU uses, not being expressible in SPIR-V. We deal with this by CodeGen-ing for `byref`, lowering it to the `FuncParamAttr ByVal` in SPIR-V, and restoring it when doing reverse translation from AMDGCN flavoured SPIR-V.	2025-12-07 17:15:48 +00:00
darkbuck	61881c307c	[CUDA] Add device-side kernel launch support (#165519 ) - CUDA's dynamic parallelism extension allows device-side kernel launches, which share the same syntax as host-side launches, e.g., kernel<<<Dg, Db, Ns, S>>>(arguments); but differ in code generation. A device-side kernel launch is eventually translated into the following sequence: config = cudaGetParameterBuffer(alignment, size); // set up arguments by copying them into `config`. cudaLaunchDevice(func, config, Dg, Db, Ns, S); - To support device-side kernel launches, 'CUDAKernelCallExpr' is reused but its config expr is set to a call to 'cudaLaunchDevice'. During code generation, 'CUDAKernelCallExpr' is expanded into the aforementioned sequence. - Because a device-side kernel launch requires the source to be compiled as relocatable device code and linked with '-lcudadevrt', linkers are changed to pass the relevant link options to 'nvlink'.	2025-12-01 17:45:10 +00:00
Jason-VanBeusekom	4394aa685c	[OpenMP][clang][HIP][CUDA] fix weak alias emit on device compilation (#164326 ) This PR adds checks for when emitting weak aliases in: `void CodeGenModule::EmitGlobal(GlobalDecl GD)`, before for device compilation for OpenMP, HIP and Cuda, clang would look for the aliasee even if it was never marked for device compilation. For OpenMP the following case now works: > Failed before when compiling with device, ie: `clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa` > ``` > int __Two(void) { return 2; } > int Two(void) __attribute__ ((weak, alias("__Two"))); > ``` For HIP / Cuda: > > ``` > int __HostFunc(void) { return 42; } > int HostFunc(void) __attribute__ ((weak, alias("__HostFunc"))); > ``` For HIP: >Failed before on HIP, Cuda fails due to: `NVPTX aliasee must not be '.weak'` error > ``` > __device__ int __One(void) { return 2; } > __device__ int One(void) __attribute__ ((weak, alias("__One"))); > ``` Included are Codegen LIT tests for the above cases, and also cases for weak alias cases that currently work in clang. Fixes https://github.com/llvm/llvm-project/issues/117369	2025-11-27 09:05:24 -05:00
Shilei Tian	bde0a8cd63	[Clang][HIP][CUDA] Add `__cluster_dims__` and `__no_cluster__` attribute (#156686 ) This PR adds basic frontend support for `__cluster_dims__` and `__no_cluster__` attribute. In CUDA/HIP programming, the ``__cluster_dims__`` attribute can be applied to a kernel function to set the dimensions of a thread block cluster. The ``__no_cluster__`` attribute can be applied to a kernel function to indicate that the thread block cluster feature will not be enabled at both compile time and kernel launch time. Note that `__no_cluster__` is a LLVM/Clang only attribute. Co-authored-by: Yaxun (Sam) Liu <yaxun.liu@amd.com> Co-authored-by: Jay Foad <jay.foad@amd.com>	2025-10-17 11:39:43 -04:00
paperchalice	2aeefcf40f	[clang][CodeGen] Remove "unsafe-fp-math" attribute support (#162779 ) These global flags block further improvements for clang; users should always use fast-math flags. See also https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast/80797 Remove them incrementally; this is the clang part.	2025-10-10 15:56:29 +08:00
Alex MacLean	06bcc34e3d	[NVPTX] Auto-upgrade nvvm.grid_constant to param attribute (#155489 ) Upgrade the !"grid_constant" !nvvm.annotation to a "nvvm.grid_constant" attribute. This attribute is much simpler for front-ends to apply and faster and simpler to query.	2025-08-27 16:32:28 -07:00
Alex MacLean	35693daa70	[NVPTX] Fix v2i8 call lowering, use generic ld/st nodes for call params (#146930 )	2025-07-28 10:41:51 -07:00
Alex MacLean	70333de6cf	[NVPTX] Consolidate and cleanup various NVPTXISD nodes (NFC) (#145581 ) This change consolidates and cleans up various NVPTXISD target-specific nodes in order to simplify SDAG ISel. While there are some whitespace changes in the emitted PTX it is otherwise a non-functional change. NVPTXISD::Wrapper - This node was used to wrap external-symbol and global-address nodes. It is redundant and has been removed. Instead we use the non-target versions of these nodes and convert them appropriately during ISel. NVPTXISD::CALL - Much of the family of nodes used to represent a PTX call instruction have been replaced by this new single node. It corresponds to a single instruction and is therefore much simpler to create and lower.	2025-06-25 11:42:21 -07:00
Yaxun (Sam) Liu	54ddbc6be3	[AMDGPU] fix amdgpu_max_num_work_groups in templates (#141633 ) Clang does not instantiate the amdgpu_max_num_work_groups attribute with one template argument, causing an assertion in codegen. Fixes: https://github.com/llvm/llvm-project/issues/139570	2025-05-28 08:05:03 -04:00
Yaxun (Sam) Liu	c720f92009	[HIP] disable sanitizer for `__hip_cuid` (#141581 ) Global variable `__hip_cuid_*` is for identifying purpose and does not need sanitization, therefore disable it for sanitizers.	2025-05-27 16:58:04 -04:00
Alex MacLean	369891b674	[NVPTX] use untyped loads and stores wherever possible (#137698 ) In most cases, the type information attached to load and store instructions is meaningless and inconsistently applied. We can usually use ".b" loads and avoid the complexity of trying to assign the correct type. The one exception is sign-extending load, which will continue to use ".s" to ensure the sign extension into a larger register is done correctly.	2025-05-10 08:26:26 -07:00
Yaxun (Sam) Liu	83c309b905	[CUDA][HIP] capture possible ODR-used var (#136645 ) In a lambda function, a call of a function may resolve to host and device functions with different signatures. Especially, a constexpr local variable may be passed by value by the device function and passed by reference by the host function, which will cause the constexpr variable captured by the lambda function in host compilation but not in the device compilation. The discrepancy in the lambda captures will violate ODR and causes UB for kernels using these lambdas. This PR fixes the issue by identifying discrepancy of ODR/non-ODR usages of constexpr local variables passed to host/device functions and conservatively capture them. Fixes: https://github.com/llvm/llvm-project/issues/132068	2025-04-23 12:50:28 -04:00
modiking	3d04da5bc0	[NVPTX] Add support for Shared Cluster Memory address space [2/2] (#136768 ) Adds support for new Shared Cluster Memory Address Space (SHARED_CLUSTER, addrspace 7). See https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#distributed-shared-memory for details. Follow-up to https://github.com/llvm/llvm-project/pull/135444 1. Update existing codegen/intrinsics in LLVM and MLIR that now use this address space 2. Auto-upgrade previous intrinsics that used SMEM (addrspace 3) but were really taking in a shared cluster pointer to the new address space	2025-04-22 16:50:45 -07:00
Austin Schuh	32fd97c61d	cuda clang: Move nvptx-surface.cu test to CodeGenCUDA (#134758 ) Signed-off-by: Austin Schuh <austin.linux@gmail.com>	2025-04-22 12:36:39 -07:00
Arvind Sudarsanam	6cfec29cb9	[Offload][SYCL] Refactor OffloadKind implementation (#135809 ) Following are the changes: 1. Make OffloadKind enum values to be powers of two so we can use them like a bitfield 2. Include OFK_SYCL enum value 3. Modify ActiveOffloadKinds support in clang-linker-wrapper to use bitfields instead of a vector. Thanks --------- Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>	2025-04-16 14:13:30 +00:00
Joseph Huber	772173f548	[Clang][AMDGPU] Remove special handling for COV4 libraries (#132870 ) Summary: When we were first porting to COV5, this led to some ABI issues due to a change in how we looked up the work group size. Bitcode libraries relied on the builtins to emit code, but this was changed between versions. This prevented the bitcode libraries, like OpenMP or libc, from being used for both COV4 and COV5. The solution was to have this 'none' functionality which effectively emitted code that branched off of a global to resolve to either version. This isn't a great solution because it forced every TU to have this variable in it. The patch in https://github.com/llvm/llvm-project/pull/131033 removed support for COV4 from OpenMP, which was the only consumer of this functionality. Other users like HIP and OpenCL did not use this because they linked the ROCm Device Library directly which has its own handling (The name was borrowed from it after all). So, now that we don't need to worry about backward compatibility with COV4, we can remove this special handling. Users can still emit COV4 code, this simply removes the special handling used to make the OpenMP device runtime bitcode version agnostic.	2025-03-28 07:35:16 -05:00
Shilei Tian	f1ac2afe21	Reapply "[AMDGPU] Use COV6 by default (#118515 )" (#130963 ) This reverts commit 68bcba6d7a1cc18996c0bcb7c62267c62d2040d0.	2025-03-21 15:26:45 -04:00
Joseph Huber	0a41fb71f1	[Clang] Treat `ext_vector_type` as a regular type attribute (#130177 ) Summary: This attribute is mostly borrowed from OpenCL, but is useful in general for accessing the LLVM vector types. Previously the only way to use it was through typedefs. This patch changes that to allow use as a regular type attribute, similar to address spaces.	2025-03-07 10:09:06 -06:00
Yaxun (Sam) Liu	240f2269ff	Add clang atomic control options and attribute (#114841 ) Add option and statement attribute for controlling emitting of target-specific metadata to atomicrmw instructions in IR. The RFC for this attribute and option is https://discourse.llvm.org/t/rfc-add-clang-atomic-control-options-and-pragmas/80641. Originally a pragma was proposed, then it was changed to a clang attribute. This attribute allows users to specify one, two, or all three options and must be applied to a compound statement. The attribute can also be nested, with inner attributes overriding the options specified by outer attributes or the target's default options. These options will then determine the target-specific metadata added to atomic instructions in the IR. In addition to the attribute, three new compiler options are introduced: `-f[no-]atomic-remote-memory`, `-f[no-]atomic-fine-grained-memory`, `-f[no-]atomic-ignore-denormal-mode`. These compiler options allow users to override the default options through the Clang driver and front end. `-m[no-]unsafe-fp-atomics` is aliased to `-f[no-]atomic-ignore-denormal-mode`. In terms of implementation, the atomic attribute is represented in the AST by the existing AttributedStmt, with minimal changes to AST and Sema. During code generation in Clang, the CodeGenModule maintains the current atomic options, which are used to emit the relevant metadata for atomic instructions. RAII is used to manage the saving and restoring of atomic options when entering and exiting nested AttributedStmt.	2025-02-27 10:41:04 -05:00
Alex MacLean	6c2e170d04	[NVPTX] Convert vector function nvvm.annotations to attributes (#127736 ) Replace some more nvvm.annotations with function attributes, auto-upgrading the annotations as needed. These new attributes will be more idiomatic and compile-time efficient than the annotations. - !"maxntid[xyz]" -> "nvvm.maxntid" - !"reqntid[xyz]" -> "nvvm.reqntid" - !"cluster_dim_[xyz]" -> "nvvm.cluster_dim"	2025-02-26 08:45:27 -08:00
Anshil Gandhi	95000fdb9e	[CUDA] Increment VTable index for device thunks (#124989 ) Currently, the clang frontend incorrectly emits the callee instead of the thunk for the callee in the VTable. This is the case because the thunk index is not incremented when their callees cannot be emitted. This patch fixes the bug.	2025-02-19 23:47:22 -05:00
Anshil Gandhi	bf9b72e134	[NFC][Clang] Precommit test for VTable codegen (#124983 ) Associated PR: https://github.com/llvm/llvm-project/pull/124989	2025-02-19 21:14:02 -05:00
Fabian Ritter	029c8e783d	[AMDGPU][clang] Replace gfx940 and gfx941 with gfx942 in clang (#126762 ) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all occurrences of gfx940/gfx941 from clang that can be removed without changes in the llvm directory. The target-invalid-cpu-note/amdgcn.c test is not included here since it tests a list of targets that is defined in llvm/lib/TargetParser/TargetParser.cpp. For SWDEV-512631	2025-02-19 10:11:48 +01:00
Srinivasa Ravi	bd860f9864	[NVPTX] Add intrinsics for redux.sync f32 instructions (#126664 ) Adds NVVM intrinsics, NVPTX codegen and Clang builtins for `redux.sync` f32 instructions introduced in ptx8.6 for sm_100a. Tests added in `CodeGen/NVPTX/redux-sync.ll` and `CodeGenCUDA/redux-builtins.cu` and verified through ptxas 12.8.0. PTX Spec Reference: https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-redux-sync	2025-02-14 11:11:44 +05:30
Alex MacLean	a282b6c486	[NVPTX] Convert scalar function nvvm.annotations to attributes (#125908 ) Replace some more nvvm.annotations with function attributes, auto-upgrading the annotations as needed. These new attributes will be more idiomatic and compile-time efficient than the annotations. - !"maxclusterrank" / !"cluster_max_blocks" -> "nvvm.maxclusterrank" - !"minctasm" -> "nvvm.minctasm" - !"maxnreg" -> "nvvm.maxnreg"	2025-02-12 07:33:22 -08:00
Joseph Huber	f1e917d07b	[Offload] Unify offloading entries into a single section (#125731 ) Summary: This patch unifies the existing offloading entries into a single section called `llvm_offload_entries`. This lets us use a more unified offloading infrastructure so that all targets share the same handling. The effect is that people in the runtimes now need to check if the kind is what they expect, but the expectation is that you can combine multiple potential providers into a compile job. Doesn't fully work yet because of other runtime issues, but some day. Mostly this helps the future of liboffload where we want to handle different languages than OpenMP.	2025-02-06 08:24:01 -06:00
Nikita Popov	29441e4f5f	[IR] Convert from nocapture to captures(none) (#123181 ) This PR removes the old `nocapture` attribute, replacing it with the new `captures` attribute introduced in #116990. This change is intended to be essentially NFC, replacing existing uses of `nocapture` with `captures(none)` without adding any new analysis capabilities. Making use of non-`none` values is left for a followup. Some notes: * `nocapture` will be upgraded to `captures(none)` by the bitcode reader. * `nocapture` will also be upgraded by the textual IR reader. This is to make it easier to use old IR files and somewhat reduce the test churn in this PR. * Helper APIs like `doesNotCapture()` will check for `captures(none)`. * MLIR import will convert `captures(none)` into an `llvm.nocapture` attribute. The representation in the LLVM IR dialect should be updated separately.	2025-01-29 16:56:47 +01:00
Joseph Huber	13dcc95dcd	[Offload] Rework offloading entry type to be more generic (#124018 ) Summary: The previous offloading entry type did not fit the current use-cases very well. This widens it and adds a version to prevent further annoyances. It also includes the kind to better sort who's using it. The first 64-bytes are reserved as zero so the OpenMP runtime can detect the old format for binary compatibility.	2025-01-28 07:26:13 -06:00
Joseph Huber	70a16b90ff	[HIP] Support managed variables using the new driver (#123437 ) Summary: Previously, managed variables didn't work in rdc mode using the new driver because we just didn't register them. This was previously ignored because we didn't have enough space in the current struct format. This patch amends that by just emitting a struct pair for the two variables and using the single pointer. In the future, a more extensible entry format would be nice, but that can be done later.	2025-01-22 09:13:14 -06:00
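
The PTX version rule described in the "[NVPTX] Validate user-specified PTX version against SM version" entries above can be sketched roughly as follows. This is a minimal illustration, not the backend's code: `selectPtxVersion`, `PtxChoice`, and `kMinPtxForSm` are hypothetical names, and the table holds only the three SM/PTX pairs quoted in those commit messages (SM 3.0 with PTX 3.2, SM 5.0 with PTX 4.0, SM 7.5 with PTX 6.3).

```cpp
#include <cassert>
#include <map>
#include <string>

// Minimum PTX ISA version per SM, encoded as major * 10 + minor.
// Assumption: only the pairs quoted in the log are listed here; the
// real backend table covers many more architectures.
static const std::map<unsigned, unsigned> kMinPtxForSm = {
    {30, 32}, // SM 3.0 -> PTX 3.2
    {50, 40}, // SM 5.0 -> PTX 4.0
    {75, 63}, // SM 7.5 -> PTX 6.3
};

// Hypothetical result type: either a chosen PTX version or an error.
struct PtxChoice {
  bool ok;
  unsigned ptx;
  std::string error;
};

// userPtx == 0 means "no PTX version was requested explicitly".
PtxChoice selectPtxVersion(unsigned sm, unsigned userPtx) {
  std::map<unsigned, unsigned>::const_iterator it = kMinPtxForSm.find(sm);
  if (it == kMinPtxForSm.end())
    return PtxChoice{false, 0, "SM not covered by this sketch"};

  const unsigned minPtx = it->second;
  if (userPtx == 0)
    return PtxChoice{true, minPtx, ""}; // default: minimum for this SM
  if (userPtx < minPtx)                 // new behaviour: reject, no silent upgrade
    return PtxChoice{false, 0, "requested PTX version is too old for this SM"};
  return PtxChoice{true, userPtx, ""};  // explicit and sufficient: keep as given
}

// Usage covering the cases mentioned in the commit messages.
inline void demo() {
  assert(selectPtxVersion(30, 0).ptx == 32);  // default SM 3.0 gets PTX 3.2
  assert(selectPtxVersion(50, 0).ptx == 40);  // MLIR's old default SM 5.0
  assert(selectPtxVersion(75, 0).ptx == 63);  // MLIR's new default SM 7.5
  assert(!selectPtxVersion(75, 60).ok);       // explicit but insufficient
  assert(selectPtxVersion(75, 70).ptx == 70); // explicit and sufficient
}
```

The sketch returns an error value where the commit describes a fatal error, so that the rule can be exercised in isolation; an explicit version that meets the minimum is passed through unchanged.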

472 Commits