llvm-project

Author	SHA1	Message	Date
Mirko Brkušanin	5d9eb0c76a	[AMDGPU] Define new targets gfx1171 and gfx1172 (#187735 )	2026-04-01 18:16:11 +02:00
Matt Arsenault	28efe7b554	OpenMP: Reimplement getOffloadArch (#189561 ) This function made no sense at all. It was scanning through the feature map looking for something that parsed as an OffloadArch. Directly compute the arch from the target device. I don't know why there isn't just an OffloadArch in TargetOpts, this shouldn't really require parsing.	2026-03-31 19:42:15 +02:00
Khem Raj	82f18b02d9	[Clang] Rename OffloadArch::UNUSED to UNUSED_ to avoid macro collisions (#174528 ) OffloadArch uses an enumerator named `UNUSED`, which is a very common macro name in external codebases (e.g. Mesa defines UNUSED as an attribute helper). If such a macro is visible when including clang/Basic/OffloadArch.h, the preprocessor expands the token inside the enum and breaks compilation of the installed Clang headers. Rename the enumerator to `UNUSED_` and update all in-tree references. This is a spelling-only change (no behavioral impact) and mirrors the existing approach used for SM_32_ to avoid macro clashes.	2026-03-20 17:22:17 -04:00
Jameson Nash	6c532a621a	[OpenMP] Remove NVPTX local addrspace on parameters (#183195 ) In CGOpenMPRuntimeGPU::translateParameter, reference-type captured variables were translated to pointer parameters with two address-space annotations: 1. LangAS::opencl_global on the pointee (for map'd variables), which correctly produces ptr addrspace(1) in NVPTX IR. 2. getLangASFromTargetAS(NVPTX_local_addr=5) on the pointer itself, annotating the parameter as living in NVPTX local (stack) memory. The second annotation is incorrect at the Clang type-system level: EmitParmDecl only supports parameters to be in LangAS::Default (or the special cases for OpenCL). Temporarily add an assert in EmitParmDecl that catches parameters with non-default address spaces in non-OpenCL compilations, and fix the violation by dropping the NVPTX_local_addr addAddressSpace call. Should fix the issue noticed in https://github.com/llvm/llvm-project/pull/181256#discussion_r2821894122, allowing removing that special case there for OpenMP, though I haven't tested the combination yet. That PR would fix EmitParmDecl to actually support non-default address spaces from Sema, and will remove this assert again. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-26 15:56:33 -05:00
Stanislav Mekhanoshin	33fd75f55d	[AMDGPU] Add gfx12-5-generic subtarget (#183381 ) This is functionally equivalent to gfx1250.	2026-02-25 13:34:48 -08:00
Mirko Brkušanin	20b5849e17	[AMDGPU] Define new target gfx1170 (#180185 )	2026-02-06 14:38:50 +01:00
Mariusz Sikora	6de6f7b46b	[AMDGPU] Define gfx1310 target with ELF number 0x50 (#177355 ) For now this is identical to gfx1250. --------- Co-authored-by: Jay Foad <jay.foad@amd.com>	2026-01-22 17:08:38 +01:00
Nikita Popov	3d06968437	[CodeGen] Use getAllOnesValue() for -1 constants	2025-12-16 09:48:38 +01:00
Rajat Bajpai	0df8306479	[Clang][CUDA] Add support for SM_88, SM_110, and SM_110a architectures (#170258 ) This patch adds support for new GPU architectures introduced in CUDA 13.0 in Clang: - SM_88: Ampere architecture variant - SM_110/SM_110a: Blackwell architecture variants Additionally, this patch deprecates SM_101/SM_101a support for CUDA 13.0 and later versions. The SM_101 architecture is superseded by SM_110 and is no longer supported by CUDA 13.0+ toolchain components.	2025-12-09 10:47:28 +05:30
Kevin Sala Penades	0e92beb0c0	[Clang][OpenMP] Switch to __kmpc_parallel_60 with strict parameter (#171082 ) This commit switches the `__kmpc_parallel_51` to `__kmpc_parallel_60`, and adds the strict boolean for the number of threads.	2025-12-08 09:37:11 -08:00
Kareem Ergawy	f481f5bef9	[OpenMP][flang] Add initial support for by-ref reductions on the GPU (#165714 ) Adds initial support for GPU by-ref reductions. The main problem for reduction by reference is that, prior to this PR, we were shuffling (from remote lanes within the same warp or across different warps within the block) pointers/references to the private reduction values rather than the private reduction values themselves. In particular, this diff adds support for reductions on scalar allocatables where reductions happen on loops nested in `target` regions. For example: ```fortran integer :: i real, allocatable :: scalar_alloc allocate(scalar_alloc) scalar_alloc = 0 !$omp target map(tofrom: scalar_alloc) !$omp parallel do reduction(+: scalar_alloc) do i = 1, 1000000 scalar_alloc = scalar_alloc + 1 end do !$omp end target ``` This PR supports by-ref reductions on the intra- and inter-warp levels. So far, there are still steps to be taken for full support of by-ref reductions, for example: * Inter-block value combination is still not supported. Therefore, `target teams distribute parallel do` is still not supported. * Support for dynamically-sized arrays still needs to be added. * Support for more than one allocatable/array on the same `reduction` clause.	2025-11-26 11:59:22 +01:00
Nick Sarnie	4538818c79	[OpenMP][OMPIRBuilder] Use runtime CC for runtime calls (#168608 ) Some targets have a specific calling convention that should be used for generated calls to runtime functions. Pass that down and use it. Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>	2025-11-21 15:40:20 +00:00
Nick Sarnie	e56b592189	[clang][OMPIRBuilder] Fix two missed function pointer type issues (#162914 ) Two small issues, when storing function pointers we need to use the program address space. With this change there are no asserts if I run all OpenMP tests with the offload target manually changed to SPIR-V, so we are getting somewhere. About 10 tests fail, though. Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>	2025-10-13 19:06:39 +00:00
Nick Sarnie	d59f7f2717	[clang][OMPIRBuilder] Fix reduction codegen for SPIR-V (#162529 ) When creating function pointers, make sure the pointer is in the program address space. Also fix a spot where I forgot to set `setDefaultTargetAS` and one spot where we didn't use the default AS for a pointer. --------- Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>	2025-10-09 14:25:36 +00:00
Robert Imschweiler	814a3a6b61	[OpenMP][clang] Set num_threads 'strict' to unsupported on GPUs (#160659 ) Setting the prescriptiveness of the num_threads clause to 'strict' and having a corresponding check (with message and severity clauses) does not align well with how OpenMP should be handled for GPUs. The num_threads expression may be an arbitrary integer expression which is evaluated on the target, in accordance with the OpenMP spec. This prevents the check from being done before launching the kernel, especially considering that the num_threads clause is associated with the parallel directive and that there may be multiple parallel directives with different num_threads clauses in a single target region. Acting on the result of the 'strict' check on the GPU would require doing I/O on the GPU, which can introduce performance regressions. Delaying any actions resulting from the 'strict' check and doing them on the host after executing the target region involves additional data copies and is not really semantically correct. For now, the 'strict' modifier for the num_threads clause and its associated message and severity clause are set to be unsupported on GPUs. Targets other than GPUs still support the aforementioned features in the context of an OpenMP target region.	2025-09-26 13:50:18 -05:00
Stanislav Mekhanoshin	e556dc0b23	[AMDGPU] Add gfx1251 subtarget (#159430 )	2025-09-17 13:02:02 -07:00
Robert Imschweiler	23302a2aac	[offload][OpenMP] Remove device code for num_threads strict (#157893 ) Due to potential performance issues, this commit temporarily removes support for the num_threads 'strict' modifier and its corresponding message and severity clauses on the device.	2025-09-11 13:12:29 +00:00
Robert Imschweiler	c94b5f0c0c	Reland: [OpenMP][clang] 6.0: num_threads strict (part 3: codegen) (#155839 ) OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the num_threads clause on parallel directives, along with the message and severity clauses. This commit implements necessary codegen changes.	2025-08-28 21:00:15 +02:00
Robert Imschweiler	9d7e436d86	Revert "[OpenMP][clang] 6.0: num_threads strict (part 3: codegen)" (#155809 ) Reverts llvm/llvm-project#146405	2025-08-28 12:12:53 +02:00
Robert Imschweiler	baf9d2c35d	[OpenMP][clang] 6.0: num_threads strict (part 3: codegen) (#146405 ) OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the num_threads clause on parallel directives, along with the message and severity clauses. This commit implements necessary codegen changes.	2025-08-28 08:52:27 +00:00
Matheus Izvekov	91cdd35008	[clang] Improve nested name specifier AST representation (#147835 ) This is a major change on how we represent nested name qualifications in the AST. * The nested name specifier itself and how it's stored is changed. The prefixes for types are handled within the type hierarchy, which makes canonicalization for them super cheap, no memory allocation required. Also translating a type into nested name specifier form becomes a no-op. An identifier is stored as a DependentNameType. The nested name specifier gains a lightweight handle class, to be used instead of passing around pointers, which is similar to what is implemented for TemplateName. There is still one free bit available, and this handle can be used within a PointerUnion and PointerIntPair, which should keep bit-packing aficionados happy. * The ElaboratedType node is removed, all type nodes in which it could previously apply to can now store the elaborated keyword and name qualifier, tail allocating when present. * TagTypes can now point to the exact declaration found when producing these, as opposed to the previous situation of there only existing one TagType per entity. This increases the amount of type sugar retained, and can have several applications, for example in tracking module ownership, and other tools which care about source file origins, such as IWYU. These TagTypes are lazily allocated, in order to limit the increase in AST size. This patch offers a great performance benefit. It greatly improves compilation time for [stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for `test_on2.cpp` in that project, which is the slowest compiling test, this patch improves `-c` compilation time by about 7.2%, with the `-fsyntax-only` improvement being at ~12%. 
This has great results on compile-time-tracker as well: ![image](https://github.com/user-attachments/assets/700dce98-2cab-4aa8-97d1-b038c0bee831) This patch also further enables other optimizations in the future, and will reduce the performance impact of template specialization resugaring when that lands. It has some other miscellaneous drive-by fixes. About the review: Yes the patch is huge, sorry about that. Part of the reason is that I started with the nested name specifier part, before the ElaboratedType part, but that had a huge performance downside, as ElaboratedType is a big performance hog. I didn't have the steam to go back and change the patch after the fact. There are also a lot of internal API changes, and it made sense to remove ElaboratedType in one go, versus removing it from one type at a time, as that would present much more churn to the users. Also, the nested name specifier having a different API avoids missing changes related to how prefixes work now, which could make existing code compile but not work. How to review: The important changes are all in `clang/include/clang/AST` and `clang/lib/AST`, with also important changes in `clang/lib/Sema/TreeTransform.h`. The rest and bulk of the changes are mostly consequences of the changes in API. PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just for easier rebasing. I plan to rename it back after this lands. Fixes #136624 Fixes https://github.com/llvm/llvm-project/issues/43179 Fixes https://github.com/llvm/llvm-project/issues/68670 Fixes https://github.com/llvm/llvm-project/issues/92757	2025-08-09 05:06:53 -03:00
Artem Belevich	507b879b6e	[CUDA] add support for targeting sm_103/sm_121 with CUDA-12.9 (#151587 )	2025-07-31 13:38:54 -07:00
Robert Imschweiler	775a69b237	[OpenMP] Fix comma -> semicolon (#145900 ) Fix small typo.	2025-06-26 17:27:20 +02:00
Stanislav Mekhanoshin	69974658f0	[AMDGPU] Initial support for gfx1250 target. (#144965 ) This is just a stub for now.	2025-06-19 22:52:51 -07:00
Devon Loehr	63de20c0de	Reland "Add macro to suppress -Wunnecessary-virtual-specifier" (#141091 ) This fixes #139614 on non-clang compilers by moving `__has_warning` completely inside the `#if defined(__clang__)` block. This prevents a parse failure from compilers which don't recognize `__has_warning`. Original description: Followup to #138741. This adds the requested macro to silence `-Wunnecessary-virtual-specifier` when declaring virtual anchor functions in `final` classes, per [LLVM policy](https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers). It also cleans up any remaining instances of the warning, allowing us to stop disabling it when we build LLVM.	2025-05-28 12:15:22 +02:00
Philip Reames	e4e7a7e64e	Revert "Add macro to suppress -Wunnecessary-virtual-specifier (#139614 )" This reverts commit 0954c9d487e7cb30673df9f0ac125f71320d2936. It breaks the build when built with gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04).	2025-05-21 11:31:26 -07:00
Devon Loehr	0954c9d487	Add macro to suppress -Wunnecessary-virtual-specifier (#139614 ) Followup to #138741. This adds the requested macro to silence `-Wunnecessary-virtual-specifier` when declaring virtual anchor functions in `final` classes, per [LLVM policy](https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers). It also cleans up any remaining instances of the warning, allowing us to stop disabling it when we build LLVM.	2025-05-21 10:54:36 -07:00
Kazu Hirata	325281631a	[clang] Use Map::try_emplace (NFC) (#140477 ) We can simplify the code with Map::try_emplace where we need default-constructed values while avoiding calling constructors when keys are already present.	2025-05-19 06:19:53 -07:00
Kazu Hirata	f9f69dac2a	[clang] Remove redundant control flow statements (NFC) (#140359 )	2025-05-17 12:59:47 -07:00
Justin Cai	faf4e8af74	[Clang][SYCL] Add initial set of Intel OffloadArch values (#138158 ) Following #137070, this PR adds an initial set of Intel `OffloadArch` values with corresponding predicates that will be used in SYCL offloading. More Intel architectures will be added in a future PR.	2025-05-01 16:29:48 -05:00
Kazu Hirata	55651e743b	[clang] Use range constructors of *Set (NFC) (#137574 )	2025-04-27 21:17:14 -07:00
Jan Leyonberg	fbc8335311	[MLIR][OpenMP] Add codegen for teams reductions (#133310 ) This patch adds the lowering of teams reductions from the omp dialect to LLVM-IR. Some minor cleanup was done in clang to remove an unused parameter.	2025-04-07 12:47:16 -04:00
Nikita Popov	b384d6d6cc	[CodeGen] Don't include CGDebugInfo.h in CodeGenFunction.h (NFC) (#134100 ) This is an expensive header, only include it where needed. Move some functions out of line to achieve that. This reduces time to build clang by ~0.5% in terms of instructions retired.	2025-04-03 08:04:19 +02:00
Sebastian Jodłowski	0127f169dc	[CUDA] Add support for sm101 and sm120 target architectures (#127187 ) Add support for sm101 and sm120 target architectures. It requires CUDA 12.8. --------- Co-authored-by: Sebastian Jodlowski <sjodlowski@nuro.ai>	2025-02-19 14:41:07 -08:00
Fabian Ritter	029c8e783d	[AMDGPU][clang] Replace gfx940 and gfx941 with gfx942 in clang (#126762 ) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all occurrences of gfx940/gfx941 from clang that can be removed without changes in the llvm directory. The target-invalid-cpu-note/amdgcn.c test is not included here since it tests a list of targets that is defined in llvm/lib/TargetParser/TargetParser.cpp. For SWDEV-512631	2025-02-19 10:11:48 +01:00
Sergey Kozub	616979ebd7	[NVPTX] Add support for PTX 8.6 and CUDA 12.6 (12.8) (#123398 ) Add CUDA versions 12.7, 12.8, 12.9 which support PTX8.6+ (enables using Blackwell-specific instructions).	2025-01-21 11:00:24 +01:00
Sergio Afonso	fabc443e93	[OMPIRBuilder] Support runtime number of teams and threads, and SPMD mode (#116051 ) This patch introduces a `TargetKernelRuntimeAttrs` structure to hold host-evaluated `num_teams`, `thread_limit`, `num_threads` and trip count values passed to the runtime kernel offloading call. Additionally, kernel type information is used to influence target device code generation and the `IsSPMD` flag is replaced by `ExecFlags`, which provides more granularity.	2025-01-14 12:34:37 +00:00
Sergio Afonso	27bc6bdaba	[OMPIRBuilder] Introduce struct to hold default kernel teams/threads (#116050 ) This patch introduces the `OpenMPIRBuilder::TargetKernelDefaultAttrs` structure used to simplify passing default and constant values for number of teams and threads, and possibly other target kernel-related information in the future. This is used to forward values passed to `createTarget` to `createTargetInit`, which previously used a default unrelated set of values.	2025-01-14 11:08:55 +00:00
Sergio Afonso	b79ed8729b	[OpenMP][OMPIRBuilder] Handle non-failing calls properly (#115863 ) The preprocessor definition used to enable asserts and the one that `llvm::Error` and `llvm::Expected` use to ensure all created instances are checked are not the same. By making these checks inside of an `assert` in cases where errors are not expected, certain build configurations would trigger runtime failures (e.g. `-DLLVM_ENABLE_ASSERTIONS=OFF -DLLVM_UNREACHABLE_OPTIMIZE=ON`). The `llvm::cantFail()` function, which was intended for this use case, is used by this patch in place of `assert` to prevent these runtime failures. In tests, new preprocessor definitions based on `ASSERT_THAT_EXPECTED` and `EXPECT_THAT_EXPECTED` are used instead, to avoid silent failures in release builds.	2025-01-09 10:28:16 +00:00
Matt Arsenault	a6fc489bb7	AMDGPU: Add gfx950 subtarget definitions (#116307 ) Mostly a stub, but adds some baseline tests and tests for removed instructions.	2024-11-18 10:41:14 -08:00
Kazu Hirata	e8a6624325	[CodeGen] Remove unused includes (NFC) (#116459 ) Identified with misc-include-cleaner.	2024-11-16 07:37:13 -08:00
Shilei Tian	de0fd64bed	[AMDGPU] Introduce a new generic target `gfx9-4-generic` (#115190 ) This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.	2024-11-12 23:11:05 -05:00
Sergio Afonso	d87964de78	[OpenMP][OMPIRBuilder] Error propagation across callbacks (#112533 ) This patch implements an approach to communicate errors between the OMPIRBuilder and its users. It introduces `llvm::Error` and `llvm::Expected` objects to replace the values returned by callbacks passed to `OMPIRBuilder` codegen functions. These functions then check the result for errors when callbacks are called and forward them back to the caller, which has the flexibility to recover, exit cleanly or dump a stack trace. This prevents a failed callback from leaving the IR in an invalid state and still continuing the codegen process, triggering unrelated assertions or segmentation faults. In the case of MLIR to LLVM IR translation of the 'omp' dialect, this change results in the compiler emitting errors and exiting early instead of triggering a crash for not-yet-implemented errors. The behavior in Clang and openmp-opt stays unchanged, since callbacks will continue always returning 'success'.	2024-10-25 11:30:16 +01:00
Jay Foad	4dd55c567a	[clang] Use {} instead of std::nullopt to initialize empty ArrayRef (#109399 ) Follow up to #109133.	2024-10-24 10:23:40 +01:00
Carl Ritson	076aac59ac	[AMDGPU] Add a new target for gfx1153 (#113138 )	2024-10-23 12:56:58 +09:00
Artem Belevich	30a06e8022	[CUDA] Add support for CUDA-12.6 and sm_100 (#112028 ) This is a copy of #97402(with minor updates), which is now ready to land. --------- Co-authored-by: Sergey Kozub <skozub@nvidia.com>	2024-10-14 11:51:05 -07:00
Youngsuk Kim	29d0a84704	[clang][CGOpenMPRuntimeGPU] Avoid llvm::Type::getPointerTo() (NFC) (#110357 ) `llvm::Type::getPointerTo()` is to be removed soon.	2024-09-28 09:57:20 -04:00
Joseph Huber	e0326b668e	[OpenMP] Map `omp_default_mem_alloc` to global memory (#104790 ) Summary: Currently, we assign this to private memory. This causes failures on some SOLLVE tests. The standard isn't clear on the semantics of this allocation type, but there seems to be a consensus that it's supposed to be shared memory.	2024-08-20 12:00:41 -05:00
Jan Leyonberg	5b15d9c441	[clang][OpenMP] Propagate debug location to OMPIRBuilder reduction codegen (#100358 ) This patch propagates the debug location from Clang to the OpenMPIRBuilder. Fixes https://github.com/llvm/llvm-project/issues/97458	2024-07-24 08:57:39 -05:00
Jakub Chlanda	ab20086422	[CUDA][NFC] CudaArch to OffloadArch rename (#97028 ) Rename `CudaArch` to `OffloadArch` to better reflect its content and the use. Apply a similar rename to helpers handling the enum.	2024-06-30 07:56:07 +02:00


211 Commits