llvm-project

Author	SHA1	Message	Date
Matheus Izvekov	91cdd35008	[clang] Improve nested name specifier AST representation (#147835 ) This is a major change on how we represent nested name qualifications in the AST. * The nested name specifier itself and how it's stored is changed. The prefixes for types are handled within the type hierarchy, which makes canonicalization for them super cheap, no memory allocation required. Also translating a type into nested name specifier form becomes a no-op. An identifier is stored as a DependentNameType. The nested name specifier gains a lightweight handle class, to be used instead of passing around pointers, which is similar to what is implemented for TemplateName. There is still one free bit available, and this handle can be used within a PointerUnion and PointerIntPair, which should keep bit-packing aficionados happy. * The ElaboratedType node is removed, all type nodes in which it could previously apply to can now store the elaborated keyword and name qualifier, tail allocating when present. * TagTypes can now point to the exact declaration found when producing these, as opposed to the previous situation of there only existing one TagType per entity. This increases the amount of type sugar retained, and can have several applications, for example in tracking module ownership, and other tools which care about source file origins, such as IWYU. These TagTypes are lazily allocated, in order to limit the increase in AST size. This patch offers a great performance benefit. It greatly improves compilation time for [stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for `test_on2.cpp` in that project, which is the slowest compiling test, this patch improves `-c` compilation time by about 7.2%, with the `-fsyntax-only` improvement being at ~12%. 
This has great results on compile-time-tracker as well: ![image](https://github.com/user-attachments/assets/700dce98-2cab-4aa8-97d1-b038c0bee831) This patch also further enables other optimizations in the future, and will reduce the performance impact of template specialization resugaring when that lands. It has some other miscellaneous drive-by fixes. About the review: Yes the patch is huge, sorry about that. Part of the reason is that I started with the nested name specifier part, before the ElaboratedType part, but that had a huge performance downside, as ElaboratedType is a big performance hog. I didn't have the steam to go back and change the patch after the fact. There are also a lot of internal API changes, and it made sense to remove ElaboratedType in one go, versus removing it from one type at a time, as that would present much more churn to the users. Also, the nested name specifier having a different API avoids missing changes related to how prefixes work now, which could make existing code compile but not work. How to review: The important changes are all in `clang/include/clang/AST` and `clang/lib/AST`, with also important changes in `clang/lib/Sema/TreeTransform.h`. The rest and bulk of the changes are mostly consequences of the changes in API. PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just to make rebasing easier. I plan to rename it back after this lands. Fixes #136624 Fixes https://github.com/llvm/llvm-project/issues/43179 Fixes https://github.com/llvm/llvm-project/issues/68670 Fixes https://github.com/llvm/llvm-project/issues/92757	2025-08-09 05:06:53 -03:00
Artem Belevich	507b879b6e	[CUDA] add support for targeting sm_103/sm_121 with CUDA-12.9 (#151587 )	2025-07-31 13:38:54 -07:00
Robert Imschweiler	775a69b237	[OpenMP] Fix comma -> semicolon (#145900 ) Fix small typo.	2025-06-26 17:27:20 +02:00
Stanislav Mekhanoshin	69974658f0	[AMDGPU] Initial support for gfx1250 target. (#144965 ) This is just a stub for now.	2025-06-19 22:52:51 -07:00
Devon Loehr	63de20c0de	Reland "Add macro to suppress -Wunnecessary-virtual-specifier" (#141091 ) This fixes #139614 on non-clang compilers by moving `__has_warning` completely inside the `#if defined(__clang__)` block. This prevents a parse failure from compilers which don't recognize `__has_warning`. Original description: Followup to #138741. This adds the requested macro to silence `-Wunnecessary-virtual-specifier` when declaring virtual anchor functions in `final` classes, per [LLVM policy](https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers). It also cleans up any remaining instances of the warning, allowing us to stop disabling it when we build LLVM.	2025-05-28 12:15:22 +02:00
Philip Reames	e4e7a7e64e	Revert "Add macro to suppress -Wunnecessary-virtual-specifier (#139614 )" This reverts commit 0954c9d487e7cb30673df9f0ac125f71320d2936. It breaks the build when built with gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04).	2025-05-21 11:31:26 -07:00
Devon Loehr	0954c9d487	Add macro to suppress -Wunnecessary-virtual-specifier (#139614 ) Followup to #138741. This adds the requested macro to silence `-Wunnecessary-virtual-specifier` when declaring virtual anchor functions in `final` classes, per [LLVM policy](https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers). It also cleans up any remaining instances of the warning, allowing us to stop disabling it when we build LLVM.	2025-05-21 10:54:36 -07:00
Kazu Hirata	325281631a	[clang] Use Map::try_emplace (NFC) (#140477 ) We can simplify the code with Map::try_emplace where we need default-constructed values while avoiding calling constructors when keys are already present.	2025-05-19 06:19:53 -07:00
Kazu Hirata	f9f69dac2a	[clang] Remove redundant control flow statements (NFC) (#140359 )	2025-05-17 12:59:47 -07:00
Justin Cai	faf4e8af74	[Clang][SYCL] Add initial set of Intel OffloadArch values (#138158 ) Following #137070, this PR adds an initial set of Intel `OffloadArch` values with corresponding predicates that will be used in SYCL offloading. More Intel architectures will be added in a future PR.	2025-05-01 16:29:48 -05:00
Kazu Hirata	55651e743b	[clang] Use range constructors of *Set (NFC) (#137574 )	2025-04-27 21:17:14 -07:00
Jan Leyonberg	fbc8335311	[MLIR][OpenMP] Add codegen for teams reductions (#133310 ) This patch adds the lowering of teams reductions from the omp dialect to LLVM-IR. Some minor cleanup was done in clang to remove an unused parameter.	2025-04-07 12:47:16 -04:00
Nikita Popov	b384d6d6cc	[CodeGen] Don't include CGDebugInfo.h in CodeGenFunction.h (NFC) (#134100 ) This is an expensive header, only include it where needed. Move some functions out of line to achieve that. This reduces time to build clang by ~0.5% in terms of instructions retired.	2025-04-03 08:04:19 +02:00
Sebastian Jodłowski	0127f169dc	[CUDA] Add support for sm101 and sm120 target architectures (#127187 ) Add support for sm101 and sm120 target architectures. It requires CUDA 12.8. --------- Co-authored-by: Sebastian Jodlowski <sjodlowski@nuro.ai>	2025-02-19 14:41:07 -08:00
Fabian Ritter	029c8e783d	[AMDGPU][clang] Replace gfx940 and gfx941 with gfx942 in clang (#126762 ) gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all occurrences of gfx940/gfx941 from clang that can be removed without changes in the llvm directory. The target-invalid-cpu-note/amdgcn.c test is not included here since it tests a list of targets that is defined in llvm/lib/TargetParser/TargetParser.cpp. For SWDEV-512631	2025-02-19 10:11:48 +01:00
Sergey Kozub	616979ebd7	[NVPTX] Add support for PTX 8.6 and CUDA 12.6 (12.8) (#123398 ) Add CUDA versions 12.7, 12.8, 12.9 which support PTX8.6+ (enables using Blackwell-specific instructions).	2025-01-21 11:00:24 +01:00
Sergio Afonso	fabc443e93	[OMPIRBuilder] Support runtime number of teams and threads, and SPMD mode (#116051 ) This patch introduces a `TargetKernelRuntimeAttrs` structure to hold host-evaluated `num_teams`, `thread_limit`, `num_threads` and trip count values passed to the runtime kernel offloading call. Additionally, kernel type information is used to influence target device code generation and the `IsSPMD` flag is replaced by `ExecFlags`, which provides more granularity.	2025-01-14 12:34:37 +00:00
Sergio Afonso	27bc6bdaba	[OMPIRBuilder] Introduce struct to hold default kernel teams/threads (#116050 ) This patch introduces the `OpenMPIRBuilder::TargetKernelDefaultAttrs` structure used to simplify passing default and constant values for number of teams and threads, and possibly other target kernel-related information in the future. This is used to forward values passed to `createTarget` to `createTargetInit`, which previously used a default unrelated set of values.	2025-01-14 11:08:55 +00:00
Sergio Afonso	b79ed8729b	[OpenMP][OMPIRBuilder] Handle non-failing calls properly (#115863 ) The preprocessor definition used to enable asserts and the one that `llvm::Error` and `llvm::Expected` use to ensure all created instances are checked are not the same. By making these checks inside of an `assert` in cases where errors are not expected, certain build configurations would trigger runtime failures (e.g. `-DLLVM_ENABLE_ASSERTIONS=OFF -DLLVM_UNREACHABLE_OPTIMIZE=ON`). The `llvm::cantFail()` function, which was intended for this use case, is used by this patch in place of `assert` to prevent these runtime failures. In tests, new preprocessor definitions based on `ASSERT_THAT_EXPECTED` and `EXPECT_THAT_EXPECTED` are used instead, to avoid silent failures in release builds.	2025-01-09 10:28:16 +00:00
Matt Arsenault	a6fc489bb7	AMDGPU: Add gfx950 subtarget definitions (#116307 ) Mostly a stub, but adds some baseline tests and tests for removed instructions.	2024-11-18 10:41:14 -08:00
Kazu Hirata	e8a6624325	[CodeGen] Remove unused includes (NFC) (#116459 ) Identified with misc-include-cleaner.	2024-11-16 07:37:13 -08:00
Shilei Tian	de0fd64bed	[AMDGPU] Introduce a new generic target `gfx9-4-generic` (#115190 ) This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.	2024-11-12 23:11:05 -05:00
Sergio Afonso	d87964de78	[OpenMP][OMPIRBuilder] Error propagation across callbacks (#112533 ) This patch implements an approach to communicate errors between the OMPIRBuilder and its users. It introduces `llvm::Error` and `llvm::Expected` objects to replace the values returned by callbacks passed to `OMPIRBuilder` codegen functions. These functions then check the result for errors when callbacks are called and forward them back to the caller, which has the flexibility to recover, exit cleanly or dump a stack trace. This prevents a failed callback from leaving the IR in an invalid state while the codegen process continues, triggering unrelated assertions or segmentation faults. In the case of MLIR to LLVM IR translation of the 'omp' dialect, this change results in the compiler emitting errors and exiting early instead of triggering a crash for not-yet-implemented errors. The behavior in Clang and openmp-opt stays unchanged, since callbacks will continue to always return 'success'.	2024-10-25 11:30:16 +01:00
Jay Foad	4dd55c567a	[clang] Use {} instead of std::nullopt to initialize empty ArrayRef (#109399 ) Follow up to #109133.	2024-10-24 10:23:40 +01:00
Carl Ritson	076aac59ac	[AMDGPU] Add a new target for gfx1153 (#113138 )	2024-10-23 12:56:58 +09:00
Artem Belevich	30a06e8022	[CUDA] Add support for CUDA-12.6 and sm_100 (#112028 ) This is a copy of #97402(with minor updates), which is now ready to land. --------- Co-authored-by: Sergey Kozub <skozub@nvidia.com>	2024-10-14 11:51:05 -07:00
Youngsuk Kim	29d0a84704	[clang][CGOpenMPRuntimeGPU] Avoid llvm::Type::getPointerTo() (NFC) (#110357 ) `llvm::Type::getPointerTo()` is to be removed soon.	2024-09-28 09:57:20 -04:00
Joseph Huber	e0326b668e	[OpenMP] Map `omp_default_mem_alloc` to global memory (#104790 ) Summary: Currently, we assign this to private memory. This causes failures on some SOLLVE tests. The standard isn't clear on the semantics of this allocation type, but there seems to be a consensus that it's supposed to be shared memory.	2024-08-20 12:00:41 -05:00
Jan Leyonberg	5b15d9c441	[clang][OpenMP] Propagate debug location to OMPIRBuilder reduction codegen (#100358 ) This patch propagates the debug location from Clang to the OpenMPIRBuilder. Fixes https://github.com/llvm/llvm-project/issues/97458	2024-07-24 08:57:39 -05:00
Jakub Chlanda	ab20086422	[CUDA][NFC] CudaArch to OffloadArch rename (#97028 ) Rename `CudaArch` to `OffloadArch` to better reflect its content and the use. Apply a similar rename to helpers handling the enum.	2024-06-30 07:56:07 +02:00
Kazu Hirata	2f57df5826	[CodeGen] Fix a warning This patch fixes: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:1662:8: error: unused variable 'ParallelReduction' [-Werror,-Wunused-variable]	2024-06-26 14:37:06 -07:00
Akash Banerjee	6b1c51bc05	[OpenMP] Migrate GPU Reductions CodeGen from Clang to OMPIRBuilder (#80343 ) This patch migrates the CGOpenMPRuntimeGPU::emitReduction and related functions to the OpenMPIRBuilder. In future patches MLIR OpenMP translation would be making use of these functions. Co-authored-by: Jan Leyonberg <jan.leyonberg@amd.com>	2024-06-26 20:18:38 +01:00
Alex Voicu	9acb533c38	[clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (#95061 ) This patch augments the HIPAMD driver to allow it to target AMDGCN flavoured SPIR-V compilation. It's mostly straightforward, as we re-use some of the existing SPIRV infra, however there are a few notable additions: - we introduce an `amdgcnspirv` offload arch, rather than relying on using `generic` (this is already fairly overloaded) or simply using `spirv` or `spirv64` (we'll want to use these to denote unflavoured SPIRV, once we bring up that capability) - initially it won't be possible to mix-in SPIR-V and concrete AMDGPU targets, as it would require some relatively intrusive surgery in the HIPAMD Toolchain and the Driver to deal with two triples (`spirv64-amd-amdhsa` and `amdgcn-amd-amdhsa`, respectively) - in order to retain user provided compiler flags and have them available at JIT time, we rely on embedding the command line via `-fembed-bitcode=marker`, which the bitcode writer had previously not implemented for SPIRV; we only allow it conditionally for AMDGCN flavoured SPIRV, and it is handled correctly by the Translator (it ends up as a string literal) Once the SPIRV BE is no longer experimental we'll switch to using that rather than the translator. There's some additional work that'll come via a separate PR around correctly piping through AMDGCN's implementation of `printf`, for now we merely handle its flags correctly.	2024-06-25 12:19:28 +01:00
Shilei Tian	1ca0055f45	[AMDGPU] Add a new target gfx1152 (#94534 )	2024-06-06 12:16:11 -04:00
Konstantin Zhuravlyov	2bfa26d30f	AMDGPU: Add missing gfx* generic targets handling in clang (NVPTX, OpenMP runtime) (#94483 )	2024-06-05 11:57:17 -04:00
Ahmed Bougacha	3575d23ca8	[clang][CodeGen] Remove unused LValue::getAddress CGF arg. (#92465 ) This is in effect a revert of f139ae3d93797, as we have since gained a more sophisticated way of doing extra IRGen with the addition of RawAddress in #86923.	2024-05-20 10:23:04 -07:00
Erich Keane	39adc8f423	[NFC] Generalize ArraySections to work for OpenACC in the future (#89639 ) OpenACC is going to need an array sections implementation that is a simpler version/more restrictive version of the OpenMP version. This patch moves `OMPArraySectionExpr` to `Expr.h` and renames it `ArraySectionExpr`, then adds an enum to choose between the two. This also fixes a couple of 'drive-by' issues that I discovered on the way, but leaves the OpenACC Sema parts reasonably unimplemented (no semantic analysis implementation), as that will be a followup patch.	2024-04-25 10:22:03 -07:00
Joseph Huber	9e7aab951f	[CUDA] Rename SM_32 to SM_32_ to work around AIX headers (#88779 ) Summary: AIX headers define this, so we need to work around it. In the future this will be removed but for now we should just rename it to avoid these issues.	2024-04-16 07:43:13 -05:00
David Pagan	a12836647e	[OpenMP][CodeGen] Improved codegen for combined loop directives (#87278 ) IR for 'target teams loop' is now dependent on suitability of associated loop-nest. If a loop-nest: - does not contain a function call, or - the -fopenmp-assume-no-nested-parallelism has been specified, - or the call is to an OpenMP API AND - does not contain nested loop bind(parallel) directives then it can be emitted as 'target teams distribute parallel for', which is the current default. Otherwise, it is emitted as 'target teams distribute'. Added debug output indicating how 'target teams loop' was emitted. Flag is -mllvm -debug-only=target-teams-loop-codegen Added LIT tests explicitly verifying 'target teams loop' emitted as a parallel loop and a distribute loop. Updated other 'loop' related tests as needed to reflect change in IR. - These updates account for most of the changed files and additions/deletions.	2024-04-10 13:09:17 -07:00
Akira Hatanaka	84780af4b0	[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86923 ) To authenticate pointers, CodeGen needs access to the key and discriminators that were used to sign the pointer. That information is sometimes known from the context, but not always, which is why `Address` needs to hold that information. This patch adds methods and data members to `Address`, which will be needed in subsequent patches to authenticate signed pointers, and uses the newly added methods throughout CodeGen. Although this patch isn't strictly NFC as it causes CodeGen to use different code paths in some cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any changes in functionality as it doesn't add any information needed for authentication. In addition to the changes mentioned above, this patch introduces class `RawAddress`, which contains a pointer that we know is unsigned, and adds several new functions for creating `Address` and `LValue` objects. This reapplies d9a685a9dd589486e882b722e513ee7b8c84870c, which was reverted because it broke ubsan bots. There seems to be a bug in coroutine code-gen, which is causing EmitTypeCheck to use the wrong alignment. For now, pass alignment zero to EmitTypeCheck so that it can compute the correct alignment based on the passed type (see function EmitCXXMemberOrOperatorMemberCallExpr).	2024-03-28 06:54:36 -07:00
Akira Hatanaka	f75eebab88	Revert "[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86721 )" (#86898 ) This reverts commit d9a685a9dd589486e882b722e513ee7b8c84870c. The commit broke ubsan bots.	2024-03-27 18:14:04 -07:00
Akira Hatanaka	d9a685a9dd	[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86721 ) To authenticate pointers, CodeGen needs access to the key and discriminators that were used to sign the pointer. That information is sometimes known from the context, but not always, which is why `Address` needs to hold that information. This patch adds methods and data members to `Address`, which will be needed in subsequent patches to authenticate signed pointers, and uses the newly added methods throughout CodeGen. Although this patch isn't strictly NFC as it causes CodeGen to use different code paths in some cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any changes in functionality as it doesn't add any information needed for authentication. In addition to the changes mentioned above, this patch introduces class `RawAddress`, which contains a pointer that we know is unsigned, and adds several new functions for creating `Address` and `LValue` objects. This reapplies 8bd1f9116aab879183f34707e6d21c7051d083b6. The commit broke msan bots because LValue::IsKnownNonNull was uninitialized.	2024-03-27 12:24:49 -07:00
Akira Hatanaka	b311756450	Revert "[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#67454 )" (#86674 ) This reverts commit 8bd1f9116aab879183f34707e6d21c7051d083b6. It appears that the commit broke msan bots.	2024-03-26 07:37:57 -07:00
Akira Hatanaka	8bd1f9116a	[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#67454 ) To authenticate pointers, CodeGen needs access to the key and discriminators that were used to sign the pointer. That information is sometimes known from the context, but not always, which is why `Address` needs to hold that information. This patch adds methods and data members to `Address`, which will be needed in subsequent patches to authenticate signed pointers, and uses the newly added methods throughout CodeGen. Although this patch isn't strictly NFC as it causes CodeGen to use different code paths in some cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any changes in functionality as it doesn't add any information needed for authentication. In addition to the changes mentioned above, this patch introduces class `RawAddress`, which contains a pointer that we know is unsigned, and adds several new functions for creating `Address` and `LValue` objects.	2024-03-25 18:05:42 -07:00
Artem Belevich	631c6e834c	[CUDA] Add support for CUDA-12.3 and sm_90a (#74895 )	2023-12-11 12:18:28 -08:00
Youngsuk Kim	d43c081aef	[clang][CGOpenMPRuntimeGPU] Merge consecutive AddrSpaceCasts (NFC) (#74279 ) Merge consecutive AddrSpaceCasts into a single AddrSpaceCast.	2023-12-04 07:03:09 -05:00
Johannes Doerfert	fae233c63f	[OpenMP] Avoid initializing the KernelLaunchEnvironment if possible (#73864 ) If we don't have a team reduction we don't need a kernel launch environment (for now). In that case we can avoid the cost.	2023-11-29 14:49:13 -08:00
Youngsuk Kim	bc6b632723	[CGOpenMPRuntimeGPU] Remove no-op ptr-to-ptr bitcasts (NFC) Opaque ptr cleanup effort	2023-11-25 11:28:18 -06:00
Jay Foad	cf1e0c0b07	[AMDGPU] Define new targets gfx1200 and gfx1201 (#73133 ) Define target names and ELF numbers for new GFX12 targets gfx1200 and gfx1201. For now they behave identically to GFX11.	2023-11-23 16:44:05 +00:00
Youngsuk Kim	b4db24e330	[CGOpenMPRuntimeGPU] Replace unneeded use of CreatePointerBitCastOrAddrSpaceCast (NFC) Opaque ptr cleanup effort (NFC)	2023-11-18 04:17:46 -06:00
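
The `Map::try_emplace` cleanup described in 325281631a can be illustrated with a minimal sketch. It assumes `std::map` from the C++17 standard library rather than the LLVM map types the commit actually touches, and `groupByInitial` is a hypothetical helper that does not appear in the patch.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not from the commit): group words by first letter.
// try_emplace(key) default-constructs the mapped vector only when the key
// is absent; when the key is already present, no value is constructed and
// the existing entry is returned.
std::map<char, std::vector<std::string>>
groupByInitial(const std::vector<std::string> &Words) {
  std::map<char, std::vector<std::string>> Groups;
  for (const std::string &W : Words) {
    if (W.empty())
      continue; // no initial letter to group by
    auto [It, Inserted] = Groups.try_emplace(W.front());
    (void)Inserted; // true only on the first word for a given letter
    It->second.push_back(W);
  }
  return Groups;
}
```

A single `try_emplace` call replaces the "look up, insert a default value if missing, look up again" pattern with one lookup.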


191 Commits