191 Commits

Author SHA1 Message Date
Matheus Izvekov
91cdd35008
[clang] Improve nested name specifier AST representation (#147835)
This is a major change on how we represent nested name qualifications in
the AST.

* The nested name specifier itself and how it's stored is changed. The
prefixes for types are handled within the type hierarchy, which makes
canonicalization for them super cheap, no memory allocation required.
Also translating a type into nested name specifier form becomes a no-op.
An identifier is stored as a DependentNameType. The nested name
specifier gains a lightweight handle class, to be used instead of
passing around pointers, which is similar to what is implemented for
TemplateName. There is still one free bit available, and this handle can
be used within a PointerUnion and PointerIntPair, which should keep
bit-packing aficionados happy.
* The ElaboratedType node is removed, all type nodes in which it could
previously apply to can now store the elaborated keyword and name
qualifier, tail allocating when present.
* TagTypes can now point to the exact declaration found when producing
these, as opposed to the previous situation of there only existing one
TagType per entity. This increases the amount of type sugar retained,
and can have several applications, for example in tracking module
ownership, and other tools which care about source file origins, such as
IWYU. These TagTypes are lazily allocated, in order to limit the
increase in AST size.

This patch offers a great performance benefit.

It greatly improves compilation time for
[stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for
`test_on2.cpp` in that project, which is the slowest compiling test,
this patch improves `-c` compilation time by about 7.2%, with the
`-fsyntax-only` improvement being at ~12%.

This has great results on compile-time-tracker as well:

![image](https://github.com/user-attachments/assets/700dce98-2cab-4aa8-97d1-b038c0bee831)

This patch also further enables other optimziations in the future, and
will reduce the performance impact of template specialization resugaring
when that lands.

It has some other miscelaneous drive-by fixes.

About the review: Yes the patch is huge, sorry about that. Part of the
reason is that I started by the nested name specifier part, before the
ElaboratedType part, but that had a huge performance downside, as
ElaboratedType is a big performance hog. I didn't have the steam to go
back and change the patch after the fact.

There is also a lot of internal API changes, and it made sense to remove
ElaboratedType in one go, versus removing it from one type at a time, as
that would present much more churn to the users. Also, the nested name
specifier having a different API avoids missing changes related to how
prefixes work now, which could make existing code compile but not work.

How to review: The important changes are all in
`clang/include/clang/AST` and `clang/lib/AST`, with also important
changes in `clang/lib/Sema/TreeTransform.h`.

The rest and bulk of the changes are mostly consequences of the changes
in API.

PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just
for easier to rebasing. I plan to rename it back after this lands.

Fixes #136624
Fixes https://github.com/llvm/llvm-project/issues/43179
Fixes https://github.com/llvm/llvm-project/issues/68670
Fixes https://github.com/llvm/llvm-project/issues/92757
2025-08-09 05:06:53 -03:00
Artem Belevich
507b879b6e
[CUDA] add support for targeting sm_103/sm_121 with CUDA-12.9 (#151587) 2025-07-31 13:38:54 -07:00
Robert Imschweiler
775a69b237
[OpenMP] Fix comma -> semicolon (#145900)
Fix small typo.
2025-06-26 17:27:20 +02:00
Stanislav Mekhanoshin
69974658f0
[AMDGPU] Initial support for gfx1250 target. (#144965)
This is just a stub for now.
2025-06-19 22:52:51 -07:00
Devon Loehr
63de20c0de
Reland "Add macro to suppress -Wunnecessary-virtual-specifier" (#141091)
This fixes #139614 on non-clang compilers by moving `__has_warning`
completely inside the `#if defined(__clang__)` block. This prevents a
parse failure from compilers which don't recognize `__has_warning`.

Original description:
Followup to #138741.

This adds the requested macro to silence
`-Wunnecessary-virtual-specifier` when declaring virtual anchor
functions in `final` classes, per [LLVM
policy](https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers).

It also cleans up any remaining instances of the warning, allowing us to
stop disabling it when we build LLVM.
2025-05-28 12:15:22 +02:00
Philip Reames
e4e7a7e64e Revert "Add macro to suppress -Wunnecessary-virtual-specifier (#139614)"
This reverts commit 0954c9d487e7cb30673df9f0ac125f71320d2936.

It breaks the build when built with gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04).
2025-05-21 11:31:26 -07:00
Devon Loehr
0954c9d487
Add macro to suppress -Wunnecessary-virtual-specifier (#139614)
Followup to #138741.

This adds the requested macro to silence
`-Wunnecessary-virtual-specifier` when declaring virtual anchor
functions in `final` classes, per [LLVM
policy](https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers).

It also cleans up any remaining instances of the warning, allowing us to
stop disabling it when we build LLVM.
2025-05-21 10:54:36 -07:00
Kazu Hirata
325281631a
[clang] Use *Map::try_emplace (NFC) (#140477)
We can simplify the code with *Map::try_emplace where we need
default-constructed values while avoding calling constructors when
keys are already present.
2025-05-19 06:19:53 -07:00
Kazu Hirata
f9f69dac2a
[clang] Remove redundant control flow statements (NFC) (#140359) 2025-05-17 12:59:47 -07:00
Justin Cai
faf4e8af74
[Clang][SYCL] Add initial set of Intel OffloadArch values (#138158)
Following #137070, this PR adds an initial set of Intel `OffloadArch`
values with corresponding predicates that will be used in SYCL
offloading. More Intel architectures will be added in a future PR.
2025-05-01 16:29:48 -05:00
Kazu Hirata
55651e743b
[clang] Use range constructors of *Set (NFC) (#137574) 2025-04-27 21:17:14 -07:00
Jan Leyonberg
fbc8335311
[MLIR][OpenMP] Add codegen for teams reductions (#133310)
This patch adds the lowering of teams reductions from the omp dialect to
LLVM-IR. Some minor cleanup was done in clang to remove an unused
parameter.
2025-04-07 12:47:16 -04:00
Nikita Popov
b384d6d6cc
[CodeGen] Don't include CGDebugInfo.h in CodeGenFunction.h (NFC) (#134100)
This is an expensive header, only include it where needed. Move some
functions out of line to achieve that.

This reduces time to build clang by ~0.5% in terms of instructions
retired.
2025-04-03 08:04:19 +02:00
Sebastian Jodłowski
0127f169dc
[CUDA] Add support for sm101 and sm120 target architectures (#127187)
Add support for sm101 and sm120 target architectures. It requires CUDA
12.8.

---------

Co-authored-by: Sebastian Jodlowski <sjodlowski@nuro.ai>
2025-02-19 14:41:07 -08:00
Fabian Ritter
029c8e783d
[AMDGPU][clang] Replace gfx940 and gfx941 with gfx942 in clang (#126762)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all occurrences of gfx940/gfx941 from clang that can be
removed without changes in the llvm directory. The
target-invalid-cpu-note/amdgcn.c test is not included here since it
tests a list of targets that is defined in
llvm/lib/TargetParser/TargetParser.cpp.

For SWDEV-512631
2025-02-19 10:11:48 +01:00
Sergey Kozub
616979ebd7
[NVPTX] Add support for PTX 8.6 and CUDA 12.6 (12.8) (#123398)
Add CUDA versions 12.7, 12.8, 12.9 which support PTX8.6+ (enables using Blackwell-specific instructions).
2025-01-21 11:00:24 +01:00
Sergio Afonso
fabc443e93
[OMPIRBuilder] Support runtime number of teams and threads, and SPMD mode (#116051)
This patch introduces a `TargetKernelRuntimeAttrs` structure to hold
host-evaluated `num_teams`, `thread_limit`, `num_threads` and trip count
values passed to the runtime kernel offloading call.

Additionally, kernel type information is used to influence target device
code generation and the `IsSPMD` flag is replaced by `ExecFlags`, which
provides more granularity.
2025-01-14 12:34:37 +00:00
Sergio Afonso
27bc6bdaba
[OMPIRBuilder] Introduce struct to hold default kernel teams/threads (#116050)
This patch introduces the `OpenMPIRBuilder::TargetKernelDefaultAttrs`
structure used to simplify passing default and constant values for
number of teams and threads, and possibly other target kernel-related
information in the future.

This is used to forward values passed to `createTarget` to
`createTargetInit`, which previously used a default unrelated set of
values.
2025-01-14 11:08:55 +00:00
Sergio Afonso
b79ed8729b
[OpenMP][OMPIRBuilder] Handle non-failing calls properly (#115863)
The preprocessor definition used to enable asserts and the one that
`llvm::Error` and `llvm::Expected` use to ensure all created instances are
checked are not the same. By making these checks inside of an `assert` in cases
where errors are not expected, certain build configurations would trigger
runtime failures (e.g. `-DLLVM_ENABLE_ASSERTIONS=OFF
-DLLVM_UNREACHABLE_OPTIMIZE=ON`).

The `llvm::cantFail()` function, which was intended for this use case, is used
by this patch in place of `assert` to prevent these runtime failures. In tests,
new preprocessor definitions based on `ASSERT_THAT_EXPECTED` and
`EXPECT_THAT_EXPECTED` are used instead, to avoid silent failures in release
builds.
2025-01-09 10:28:16 +00:00
Matt Arsenault
a6fc489bb7
AMDGPU: Add gfx950 subtarget definitions (#116307)
Mostly a stub, but adds some baseline tests and
tests for removed instructions.
2024-11-18 10:41:14 -08:00
Kazu Hirata
e8a6624325
[CodeGen] Remove unused includes (NFC) (#116459)
Identified with misc-include-cleaner.
2024-11-16 07:37:13 -08:00
Shilei Tian
de0fd64bed
[AMDGPU] Introduce a new generic target gfx9-4-generic (#115190)
This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.
2024-11-12 23:11:05 -05:00
Sergio Afonso
d87964de78
[OpenMP][OMPIRBuilder] Error propagation across callbacks (#112533)
This patch implements an approach to communicate errors between the
OMPIRBuilder and its users. It introduces `llvm::Error` and
`llvm::Expected` objects to replace the values returned by callbacks
passed to `OMPIRBuilder` codegen functions. These functions then check
the result for errors when callbacks are called and forward them back to
the caller, which has the flexibility to recover, exit cleanly or dump a
stack trace.

This prevents a failed callback to leave the IR in an invalid state and
still continue the codegen process, triggering unrelated assertions or
segmentation faults. In the case of MLIR to LLVM IR translation of the
'omp' dialect, this change results in the compiler emitting errors and
exiting early instead of triggering a crash for not-yet-implemented
errors. The behavior in Clang and openmp-opt stays unchanged, since
callbacks will continue always returning 'success'.
2024-10-25 11:30:16 +01:00
Jay Foad
4dd55c567a
[clang] Use {} instead of std::nullopt to initialize empty ArrayRef (#109399)
Follow up to #109133.
2024-10-24 10:23:40 +01:00
Carl Ritson
076aac59ac
[AMDGPU] Add a new target for gfx1153 (#113138) 2024-10-23 12:56:58 +09:00
Artem Belevich
30a06e8022
[CUDA] Add support for CUDA-12.6 and sm_100 (#112028)
This is a copy of #97402(with minor updates), which is now ready to land.

---------

Co-authored-by: Sergey Kozub <skozub@nvidia.com>
2024-10-14 11:51:05 -07:00
Youngsuk Kim
29d0a84704
[clang][CGOpenMPRuntimeGPU] Avoid llvm::Type::getPointerTo() (NFC) (#110357)
`llvm::Type::getPointerTo()` is to be removed soon.
2024-09-28 09:57:20 -04:00
Joseph Huber
e0326b668e
[OpenMP] Map omp_default_mem_alloc to global memory (#104790)
Summary:
Currently, we assign this to private memory. This causes failures on
some SOLLVE tests. The standard isn't clear on the semantics of this
allocation type, but there seems to be a consensus that it's supposed to
be shared memory.
2024-08-20 12:00:41 -05:00
Jan Leyonberg
5b15d9c441
[clang][OpenMP] Propoagate debug location to OMPIRBuilder reduction codegen (#100358)
This patch propagates the debug location from Clang to the
OpenMPIRBuilder.

Fixes https://github.com/llvm/llvm-project/issues/97458
2024-07-24 08:57:39 -05:00
Jakub Chlanda
ab20086422
[CUDA][NFC] CudaArch to OffloadArch rename (#97028)
Rename `CudaArch` to `OffloadArch` to better reflect its content and the
use.
Apply a similar rename to helpers handling the enum.
2024-06-30 07:56:07 +02:00
Kazu Hirata
2f57df5826 [CodeGen] Fix a warning
This patch fixes:

  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp:1662:8: error: unused
  variable 'ParallelReduction' [-Werror,-Wunused-variable]
2024-06-26 14:37:06 -07:00
Akash Banerjee
6b1c51bc05
[OpenMP] Migrate GPU Reductions CodeGen from Clang to OMPIRBuilder (#80343)
This patch migrates the CGOpenMPRuntimeGPU::emitReduction and related functions to the OpenMPIRBUilder. In future patches MLIR OpenMP translation would be making use of these functions.

Co-authored-by: Jan Leyonberg <jan.leyonberg@amd.com>
2024-06-26 20:18:38 +01:00
Alex Voicu
9acb533c38
[clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (#95061)
This patch augments the HIPAMD driver to allow it to target AMDGCN
flavoured SPIR-V compilation. It's mostly straightforward, as we re-use
some of the existing SPIRV infra, however there are a few notable
additions:

- we introduce an `amdgcnspirv` offload arch, rather than relying on
using `generic` (this is already fairly overloaded) or simply using
`spirv` or `spirv64` (we'll want to use these to denote unflavoured
SPIRV, once we bring up that capability)
- initially it is won't be possible to mix-in SPIR-V and concrete AMDGPU
targets, as it would require some relatively intrusive surgery in the
HIPAMD Toolchain and the Driver to deal with two triples
(`spirv64-amd-amdhsa` and `amdgcn-amd-amdhsa`, respectively)
- in order to retain user provided compiler flags and have them
available at JIT time, we rely on embedding the command line via
`-fembed-bitcode=marker`, which the bitcode writer had previously not
implemented for SPIRV; we only allow it conditionally for AMDGCN
flavoured SPIRV, and it is handled correctly by the Translator (it ends
up as a string literal)

Once the SPIRV BE is no longer experimental we'll switch to using that
rather than the translator. There's some additional work that'll come
via a separate PR around correctly piping through AMDGCN's
implementation of `printf`, for now we merely handle its flags
correctly.
2024-06-25 12:19:28 +01:00
Shilei Tian
1ca0055f45
[AMDGPU] Add a new target gfx1152 (#94534) 2024-06-06 12:16:11 -04:00
Konstantin Zhuravlyov
2bfa26d30f
AMDGPU: Add missing gfx* generic targets handling in clang (NVPTX, OpenMP runtime) (#94483) 2024-06-05 11:57:17 -04:00
Ahmed Bougacha
3575d23ca8
[clang][CodeGen] Remove unused LValue::getAddress CGF arg. (#92465)
This is in effect a revert of f139ae3d93797, as we have since gained a
more sophisticated way of doing extra IRGen with the addition of
RawAddress in #86923.
2024-05-20 10:23:04 -07:00
Erich Keane
39adc8f423
[NFC] Generalize ArraySections to work for OpenACC in the future (#89639)
OpenACC is going to need an array sections implementation that is a
simpler version/more restrictive version of the OpenMP version. 

This patch moves `OMPArraySectionExpr` to `Expr.h` and renames it `ArraySectionExpr`,
 then adds an enum to choose between the two.

This also fixes a couple of 'drive-by' issues that I discovered on the way,
but leaves the OpenACC Sema parts reasonably unimplemented (no semantic
analysis implementation), as that will be a followup patch.
2024-04-25 10:22:03 -07:00
Joseph Huber
9e7aab951f
[CUDA] Rename SM_32 to SM_32_ to work around AIX headers (#88779)
Summary:
AIX headers define this, so we need to work around it. In the future
this will be removed but for now we should just rename it to avoid these
issues.
2024-04-16 07:43:13 -05:00
David Pagan
a12836647e
[OpenMP][CodeGen] Improved codegen for combined loop directives (#87278)
IR for 'target teams loop' is now dependent on suitability of associated
loop-nest.

If a loop-nest:

- does not contain a function call, or
- the -fopenmp-assume-no-nested-parallelism has been specified,
- or the call is to an OpenMP API AND
- does not contain nested loop bind(parallel) directives

then it can be emitted as 'target teams distribute parallel for', which
is the current default. Otherwise, it is emitted as 'target teams
distribute'.

Added debug output indicating how 'target teams loop' was emitted. Flag
is -mllvm -debug-only=target-teams-loop-codegen

Added LIT tests explicitly verifying 'target teams loop' emitted as a
parallel loop and a distribute loop.

Updated other 'loop' related tests as needed to reflect change in IR.
- These updates account for most of the changed files and
additions/deletions.
2024-04-10 13:09:17 -07:00
Akira Hatanaka
84780af4b0
[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86923)
To authenticate pointers, CodeGen needs access to the key and
discriminators that were used to sign the pointer. That information is
sometimes known from the context, but not always, which is why `Address`
needs to hold that information.

This patch adds methods and data members to `Address`, which will be
needed in subsequent patches to authenticate signed pointers, and uses
the newly added methods throughout CodeGen. Although this patch isn't
strictly NFC as it causes CodeGen to use different code paths in some
cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any
changes in functionality as it doesn't add any information needed for
authentication.

In addition to the changes mentioned above, this patch introduces class
`RawAddress`, which contains a pointer that we know is unsigned, and
adds several new functions for creating `Address` and `LValue` objects.

This reapplies d9a685a9dd589486e882b722e513ee7b8c84870c, which was
reverted because it broke ubsan bots. There seems to be a bug in
coroutine code-gen, which is causing EmitTypeCheck to use the wrong
alignment. For now, pass alignment zero to EmitTypeCheck so that it can
compute the correct alignment based on the passed type (see function
EmitCXXMemberOrOperatorMemberCallExpr).
2024-03-28 06:54:36 -07:00
Akira Hatanaka
f75eebab88
Revert "[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86721)" (#86898)
This reverts commit d9a685a9dd589486e882b722e513ee7b8c84870c.

The commit broke ubsan bots.
2024-03-27 18:14:04 -07:00
Akira Hatanaka
d9a685a9dd
[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86721)
To authenticate pointers, CodeGen needs access to the key and
discriminators that were used to sign the pointer. That information is
sometimes known from the context, but not always, which is why `Address`
needs to hold that information.

This patch adds methods and data members to `Address`, which will be
needed in subsequent patches to authenticate signed pointers, and uses
the newly added methods throughout CodeGen. Although this patch isn't
strictly NFC as it causes CodeGen to use different code paths in some
cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any
changes in functionality as it doesn't add any information needed for
authentication.

In addition to the changes mentioned above, this patch introduces class
`RawAddress`, which contains a pointer that we know is unsigned, and
adds several new functions for creating `Address` and `LValue` objects.

This reapplies 8bd1f9116aab879183f34707e6d21c7051d083b6. The commit
broke msan bots because LValue::IsKnownNonNull was uninitialized.
2024-03-27 12:24:49 -07:00
Akira Hatanaka
b311756450
Revert "[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#67454)" (#86674)
This reverts commit 8bd1f9116aab879183f34707e6d21c7051d083b6.

It appears that the commit broke msan bots.
2024-03-26 07:37:57 -07:00
Akira Hatanaka
8bd1f9116a
[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#67454)
To authenticate pointers, CodeGen needs access to the key and
discriminators that were used to sign the pointer. That information is
sometimes known from the context, but not always, which is why `Address`
needs to hold that information.

This patch adds methods and data members to `Address`, which will be
needed in subsequent patches to authenticate signed pointers, and uses
the newly added methods throughout CodeGen. Although this patch isn't
strictly NFC as it causes CodeGen to use different code paths in some
cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any
changes in functionality as it doesn't add any information needed for
authentication.

In addition to the changes mentioned above, this patch introduces class
`RawAddress`, which contains a pointer that we know is unsigned, and
adds several new functions for creating `Address` and `LValue` objects.
2024-03-25 18:05:42 -07:00
Artem Belevich
631c6e834c
[CUDA] Add support for CUDA-12.3 and sm_90a (#74895) 2023-12-11 12:18:28 -08:00
Youngsuk Kim
d43c081aef
[clang][CGOpenMPRuntimeGPU] Merge consecutive AddrSpaceCasts (NFC) (#74279)
Merge consecutive AddrSpaceCasts into a single AddrSpaceCast.
2023-12-04 07:03:09 -05:00
Johannes Doerfert
fae233c63f
[OpenMP] Avoid initializing the KernelLaunchEnvironment if possible (#73864)
If we don't have a team reduction we don't need a kernel launch
environment (for now). In that case we can avoid the cost.
2023-11-29 14:49:13 -08:00
Youngsuk Kim
bc6b632723 [CGOpenMPRuntimeGPU] Remove no-op ptr-to-ptr bitcasts (NFC)
Opaque ptr cleanup effort
2023-11-25 11:28:18 -06:00
Jay Foad
cf1e0c0b07
[AMDGPU] Define new targets gfx1200 and gfx1201 (#73133)
Define target names and ELF numbers for new GFX12 targets gfx1200 and
gfx1201. For now they behave identically to GFX11.
2023-11-23 16:44:05 +00:00
Youngsuk Kim
b4db24e330 [CGOpenMPRuntimeGPU] Replace unneeded use of CreatePointerBitCastOrAddrSpaceCast (NFC)
Opaque ptr cleanup effort (NFC)
2023-11-18 04:17:46 -06:00