211 Commits

Author SHA1 Message Date
Mirko Brkušanin
5d9eb0c76a
[AMDGPU] Define new targets gfx1171 and gfx1172 (#187735) 2026-04-01 18:16:11 +02:00
Matt Arsenault
28efe7b554
OpenMP: Reimplement getOffloadArch (#189561)
This function made no sense at all. It was scanning through
the feature map looking for something that parsed as an OffloadArch.
Directly compute the arch from the target device.

I don't know why there isn't just an OffloadArch in TargetOpts,
this shouldn't really require parsing.
2026-03-31 19:42:15 +02:00
Khem Raj
82f18b02d9
[Clang] Rename OffloadArch::UNUSED to UNUSED_ to avoid macro collisions (#174528)
OffloadArch uses an enumerator named `UNUSED`, which is a very common
macro name in external codebases (e.g. Mesa defines UNUSED as an
attribute helper). If such a macro is visible when including
clang/Basic/OffloadArch.h, the preprocessor expands the token inside the
enum and breaks compilation of the installed Clang headers.

Rename the enumerator to `UNUSED_` and update all in-tree references.
This is a spelling-only change (no behavioral impact) and mirrors the
existing approach used for SM_32_ to avoid macro clashes.
2026-03-20 17:22:17 -04:00
Jameson Nash
6c532a621a
[OpenMP] Remove NVPTX local addrspace on parameters (#183195)
In CGOpenMPRuntimeGPU::translateParameter, reference-type captured
variables were translated to pointer parameters with two address-space
annotations:

1. LangAS::opencl_global on the pointee (for map'd variables), which
correctly produces ptr addrspace(1) in NVPTX IR.
2. getLangASFromTargetAS(NVPTX_local_addr=5) on the pointer itself,
annotating the parameter as living in NVPTX local (stack) memory.

The second annotation is incorrect at the Clang type-system level:
EmitParmDecl only supports parameters to be in LangAS::Default (or the
special cases for OpenCL).

Temporarily add an assert in EmitParmDecl that catches parameters with
non-default address spaces in non-OpenCL compilations, and fix the
violation by dropping the NVPTX_local_addr addAddressSpace call.

Should fix the issue noticed in

https://github.com/llvm/llvm-project/pull/181256#discussion_r2821894122,
allowing removing that special case there for OpenMP, though I haven't
tested the combination yet. That PR would fix EmitParmDecl to actually
support non-default address spaces from Sema, and will remove this
assert again.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 15:56:33 -05:00
Stanislav Mekhanoshin
33fd75f55d
[AMDGPU] Add gfx12-5-generic subtarget (#183381)
This is functionally equivalent to gfx1250.
2026-02-25 13:34:48 -08:00
Mirko Brkušanin
20b5849e17
[AMDGPU] Define new target gfx1170 (#180185) 2026-02-06 14:38:50 +01:00
Mariusz Sikora
6de6f7b46b
[AMDGPU] Define gfx1310 target with ELF number 0x50 (#177355)
For now this is identical to gfx1250.

---------

Co-authored-by: Jay Foad <jay.foad@amd.com>
2026-01-22 17:08:38 +01:00
Nikita Popov
3d06968437 [CodeGen] Use getAllOnesValue() for -1 constants 2025-12-16 09:48:38 +01:00
Rajat Bajpai
0df8306479
[Clang][CUDA] Add support for SM_88, SM_110, and SM_110a architectures (#170258)
This patch adds support for new GPU architectures introduced in CUDA
13.0 in Clang:
- SM_88: Ampere architecture variant
- SM_110/SM_110a: Blackwell architecture variants

Additionally, this patch deprecates SM_101/SM_101a support for CUDA 13.0
and later versions. The SM_101 architecture is superseded by SM_110 and
is no longer supported by CUDA 13.0+ toolchain components.
2025-12-09 10:47:28 +05:30
Kevin Sala Penades
0e92beb0c0
[Clang][OpenMP] Switch to __kmpc_parallel_60 with strict parameter (#171082)
This commit switches the `__kmpc_parallel_51` to `__kmpc_parallel_60`,
and adds the strict boolean for the number of threads.
2025-12-08 09:37:11 -08:00
Kareem Ergawy
f481f5bef9
[OpenMP][flang] Add initial support for by-ref reductions on the GPU (#165714)
Adds initial support for GPU by-ref reductions. The main problem for
reduction by reference is that, prior to this PR, we were shuffling
(from remote lanes within the same warp or across different warps within
the block) pointers/references to the private reduction values rather
than the private reduction values themselves.

In particular, this diff adds support for reductions on scalar
allocatables where reductions happen on loops nested in `target`
regions. For example:

```fortran
  integer :: i
  real, allocatable :: scalar_alloc

  allocate(scalar_alloc)
  scalar_alloc = 0

  !$omp target map(tofrom: scalar_alloc)
  !$omp parallel do reduction(+: scalar_alloc)
  do i = 1, 1000000
    scalar_alloc = scalar_alloc + 1
  end do
  !$omp end target
```

This PR supports by-ref reductions on the intra- and inter-warp levels.

So far, there are still steps to be takens for full support of by-ref
reductions, for example:
* Support inter-block value combination is still not supported.
Therefore, `target teams distribute parallel do` is still not supported.
* Support for dynamically-sized arrays still needs to be added.
* Support for more than one allocatable/array on the same `reduction`
clause.
2025-11-26 11:59:22 +01:00
Nick Sarnie
4538818c79
[OpenMP][OMPIRBuilder] Use runtime CC for runtime calls (#168608)
Some targets have a specific calling convention that should be used for
generated calls to runtime functions.

Pass that down and use it.

Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
2025-11-21 15:40:20 +00:00
Nick Sarnie
e56b592189
[clang][OMPIRBuilder] Fix two missed function pointer type issues (#162914)
Two small issues, when storing function pointers we need to use the
program address space.

With this change there are no asserts if I run all OpenMP tests with the
offload target manually changed to SPIR-V, so we are getting somewhere.

About 10 test fails though.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2025-10-13 19:06:39 +00:00
Nick Sarnie
d59f7f2717
[clang][OMPIRBuilder] Fix reduction codegen for SPIR-V (#162529)
When creating function pointers, make sure the pointer is in the program
address space.

Also fix a spot where I forgot to set `setDefaultTargetAS` and one spot
where we didn't use the default AS for a pointer.

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2025-10-09 14:25:36 +00:00
Robert Imschweiler
814a3a6b61
[OpenMP][clang] Set num_threads 'strict' to unsupported on GPUs (#160659)
Setting the prescriptiveness of the num_threads clause to 'strict' and
having a corresponding check (with message and severity clauses) does
not align well with how OpenMP should be handled for GPUs.

The num_threads expression may be an arbitrary integer expression which
is evaluated on the target, in correspondance to the OpenMP spec. This
prevents the check from being done before launching the kernel,
especially considering that the num_threads clause is associated with
the parallel directive and that there may be multiple parallel
directives with different num_threads clauses in a single target region.
Acting on the result of the 'strict' check on the GPU would require
doing I/O on the GPU, which can introduce performance regressions.
Delaying any actions resulting from the 'strict' check and doing them on
the host after executing the target region involves additional data
copies and is not really semantically correct.

For now, the 'strict' modifier for the num_threads clause and its
associated message and severity clause are set to be unsupported on
GPUs. Targets other than GPUs still support the aforementioned features
in the context of an OpenMP target region.
2025-09-26 13:50:18 -05:00
Stanislav Mekhanoshin
e556dc0b23
[AMDGPU] Add gfx1251 subtarget (#159430) 2025-09-17 13:02:02 -07:00
Robert Imschweiler
23302a2aac
[offload][OpenMP] Remove device code for num_threads strict (#157893)
Due to potential performance issues, this commit temporarily removes
support for the num_threads 'strict' modifier and its corresponding
message and severity clauses on the device.
2025-09-11 13:12:29 +00:00
Robert Imschweiler
c94b5f0c0c
Reland: [OpenMP][clang] 6.0: num_threads strict (part 3: codegen) (#155839)
OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the
num_threads clause on parallel directives, along with the message and
severity clauses. This commit implements necessary codegen changes.
2025-08-28 21:00:15 +02:00
Robert Imschweiler
9d7e436d86
Revert "[OpenMP][clang] 6.0: num_threads strict (part 3: codegen)" (#155809)
Reverts llvm/llvm-project#146405
2025-08-28 12:12:53 +02:00
Robert Imschweiler
baf9d2c35d
[OpenMP][clang] 6.0: num_threads strict (part 3: codegen) (#146405)
OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the
num_threads clause on parallel directives, along with the message and
severity clauses. This commit implements necessary codegen changes.
2025-08-28 08:52:27 +00:00
Matheus Izvekov
91cdd35008
[clang] Improve nested name specifier AST representation (#147835)
This is a major change on how we represent nested name qualifications in
the AST.

* The nested name specifier itself and how it's stored is changed. The
prefixes for types are handled within the type hierarchy, which makes
canonicalization for them super cheap, no memory allocation required.
Also translating a type into nested name specifier form becomes a no-op.
An identifier is stored as a DependentNameType. The nested name
specifier gains a lightweight handle class, to be used instead of
passing around pointers, which is similar to what is implemented for
TemplateName. There is still one free bit available, and this handle can
be used within a PointerUnion and PointerIntPair, which should keep
bit-packing aficionados happy.
* The ElaboratedType node is removed, all type nodes in which it could
previously apply to can now store the elaborated keyword and name
qualifier, tail allocating when present.
* TagTypes can now point to the exact declaration found when producing
these, as opposed to the previous situation of there only existing one
TagType per entity. This increases the amount of type sugar retained,
and can have several applications, for example in tracking module
ownership, and other tools which care about source file origins, such as
IWYU. These TagTypes are lazily allocated, in order to limit the
increase in AST size.

This patch offers a great performance benefit.

It greatly improves compilation time for
[stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for
`test_on2.cpp` in that project, which is the slowest compiling test,
this patch improves `-c` compilation time by about 7.2%, with the
`-fsyntax-only` improvement being at ~12%.

This has great results on compile-time-tracker as well:

![image](https://github.com/user-attachments/assets/700dce98-2cab-4aa8-97d1-b038c0bee831)

This patch also further enables other optimziations in the future, and
will reduce the performance impact of template specialization resugaring
when that lands.

It has some other miscelaneous drive-by fixes.

About the review: Yes the patch is huge, sorry about that. Part of the
reason is that I started by the nested name specifier part, before the
ElaboratedType part, but that had a huge performance downside, as
ElaboratedType is a big performance hog. I didn't have the steam to go
back and change the patch after the fact.

There is also a lot of internal API changes, and it made sense to remove
ElaboratedType in one go, versus removing it from one type at a time, as
that would present much more churn to the users. Also, the nested name
specifier having a different API avoids missing changes related to how
prefixes work now, which could make existing code compile but not work.

How to review: The important changes are all in
`clang/include/clang/AST` and `clang/lib/AST`, with also important
changes in `clang/lib/Sema/TreeTransform.h`.

The rest and bulk of the changes are mostly consequences of the changes
in API.

PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just
for easier to rebasing. I plan to rename it back after this lands.

Fixes #136624
Fixes https://github.com/llvm/llvm-project/issues/43179
Fixes https://github.com/llvm/llvm-project/issues/68670
Fixes https://github.com/llvm/llvm-project/issues/92757
2025-08-09 05:06:53 -03:00
Artem Belevich
507b879b6e
[CUDA] add support for targeting sm_103/sm_121 with CUDA-12.9 (#151587) 2025-07-31 13:38:54 -07:00
Robert Imschweiler
775a69b237
[OpenMP] Fix comma -> semicolon (#145900)
Fix small typo.
2025-06-26 17:27:20 +02:00
Stanislav Mekhanoshin
69974658f0
[AMDGPU] Initial support for gfx1250 target. (#144965)
This is just a stub for now.
2025-06-19 22:52:51 -07:00
Devon Loehr
63de20c0de
Reland "Add macro to suppress -Wunnecessary-virtual-specifier" (#141091)
This fixes #139614 on non-clang compilers by moving `__has_warning`
completely inside the `#if defined(__clang__)` block. This prevents a
parse failure from compilers which don't recognize `__has_warning`.

Original description:
Followup to #138741.

This adds the requested macro to silence
`-Wunnecessary-virtual-specifier` when declaring virtual anchor
functions in `final` classes, per [LLVM
policy](https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers).

It also cleans up any remaining instances of the warning, allowing us to
stop disabling it when we build LLVM.
2025-05-28 12:15:22 +02:00
Philip Reames
e4e7a7e64e Revert "Add macro to suppress -Wunnecessary-virtual-specifier (#139614)"
This reverts commit 0954c9d487e7cb30673df9f0ac125f71320d2936.

It breaks the build when built with gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04).
2025-05-21 11:31:26 -07:00
Devon Loehr
0954c9d487
Add macro to suppress -Wunnecessary-virtual-specifier (#139614)
Followup to #138741.

This adds the requested macro to silence
`-Wunnecessary-virtual-specifier` when declaring virtual anchor
functions in `final` classes, per [LLVM
policy](https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers).

It also cleans up any remaining instances of the warning, allowing us to
stop disabling it when we build LLVM.
2025-05-21 10:54:36 -07:00
Kazu Hirata
325281631a
[clang] Use *Map::try_emplace (NFC) (#140477)
We can simplify the code with *Map::try_emplace where we need
default-constructed values while avoding calling constructors when
keys are already present.
2025-05-19 06:19:53 -07:00
Kazu Hirata
f9f69dac2a
[clang] Remove redundant control flow statements (NFC) (#140359) 2025-05-17 12:59:47 -07:00
Justin Cai
faf4e8af74
[Clang][SYCL] Add initial set of Intel OffloadArch values (#138158)
Following #137070, this PR adds an initial set of Intel `OffloadArch`
values with corresponding predicates that will be used in SYCL
offloading. More Intel architectures will be added in a future PR.
2025-05-01 16:29:48 -05:00
Kazu Hirata
55651e743b
[clang] Use range constructors of *Set (NFC) (#137574) 2025-04-27 21:17:14 -07:00
Jan Leyonberg
fbc8335311
[MLIR][OpenMP] Add codegen for teams reductions (#133310)
This patch adds the lowering of teams reductions from the omp dialect to
LLVM-IR. Some minor cleanup was done in clang to remove an unused
parameter.
2025-04-07 12:47:16 -04:00
Nikita Popov
b384d6d6cc
[CodeGen] Don't include CGDebugInfo.h in CodeGenFunction.h (NFC) (#134100)
This is an expensive header, only include it where needed. Move some
functions out of line to achieve that.

This reduces time to build clang by ~0.5% in terms of instructions
retired.
2025-04-03 08:04:19 +02:00
Sebastian Jodłowski
0127f169dc
[CUDA] Add support for sm101 and sm120 target architectures (#127187)
Add support for sm101 and sm120 target architectures. It requires CUDA
12.8.

---------

Co-authored-by: Sebastian Jodlowski <sjodlowski@nuro.ai>
2025-02-19 14:41:07 -08:00
Fabian Ritter
029c8e783d
[AMDGPU][clang] Replace gfx940 and gfx941 with gfx942 in clang (#126762)
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.

This PR removes all occurrences of gfx940/gfx941 from clang that can be
removed without changes in the llvm directory. The
target-invalid-cpu-note/amdgcn.c test is not included here since it
tests a list of targets that is defined in
llvm/lib/TargetParser/TargetParser.cpp.

For SWDEV-512631
2025-02-19 10:11:48 +01:00
Sergey Kozub
616979ebd7
[NVPTX] Add support for PTX 8.6 and CUDA 12.6 (12.8) (#123398)
Add CUDA versions 12.7, 12.8, 12.9 which support PTX8.6+ (enables using Blackwell-specific instructions).
2025-01-21 11:00:24 +01:00
Sergio Afonso
fabc443e93
[OMPIRBuilder] Support runtime number of teams and threads, and SPMD mode (#116051)
This patch introduces a `TargetKernelRuntimeAttrs` structure to hold
host-evaluated `num_teams`, `thread_limit`, `num_threads` and trip count
values passed to the runtime kernel offloading call.

Additionally, kernel type information is used to influence target device
code generation and the `IsSPMD` flag is replaced by `ExecFlags`, which
provides more granularity.
2025-01-14 12:34:37 +00:00
Sergio Afonso
27bc6bdaba
[OMPIRBuilder] Introduce struct to hold default kernel teams/threads (#116050)
This patch introduces the `OpenMPIRBuilder::TargetKernelDefaultAttrs`
structure used to simplify passing default and constant values for
number of teams and threads, and possibly other target kernel-related
information in the future.

This is used to forward values passed to `createTarget` to
`createTargetInit`, which previously used a default unrelated set of
values.
2025-01-14 11:08:55 +00:00
Sergio Afonso
b79ed8729b
[OpenMP][OMPIRBuilder] Handle non-failing calls properly (#115863)
The preprocessor definition used to enable asserts and the one that
`llvm::Error` and `llvm::Expected` use to ensure all created instances are
checked are not the same. By making these checks inside of an `assert` in cases
where errors are not expected, certain build configurations would trigger
runtime failures (e.g. `-DLLVM_ENABLE_ASSERTIONS=OFF
-DLLVM_UNREACHABLE_OPTIMIZE=ON`).

The `llvm::cantFail()` function, which was intended for this use case, is used
by this patch in place of `assert` to prevent these runtime failures. In tests,
new preprocessor definitions based on `ASSERT_THAT_EXPECTED` and
`EXPECT_THAT_EXPECTED` are used instead, to avoid silent failures in release
builds.
2025-01-09 10:28:16 +00:00
Matt Arsenault
a6fc489bb7
AMDGPU: Add gfx950 subtarget definitions (#116307)
Mostly a stub, but adds some baseline tests and
tests for removed instructions.
2024-11-18 10:41:14 -08:00
Kazu Hirata
e8a6624325
[CodeGen] Remove unused includes (NFC) (#116459)
Identified with misc-include-cleaner.
2024-11-16 07:37:13 -08:00
Shilei Tian
de0fd64bed
[AMDGPU] Introduce a new generic target gfx9-4-generic (#115190)
This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.
2024-11-12 23:11:05 -05:00
Sergio Afonso
d87964de78
[OpenMP][OMPIRBuilder] Error propagation across callbacks (#112533)
This patch implements an approach to communicate errors between the
OMPIRBuilder and its users. It introduces `llvm::Error` and
`llvm::Expected` objects to replace the values returned by callbacks
passed to `OMPIRBuilder` codegen functions. These functions then check
the result for errors when callbacks are called and forward them back to
the caller, which has the flexibility to recover, exit cleanly or dump a
stack trace.

This prevents a failed callback to leave the IR in an invalid state and
still continue the codegen process, triggering unrelated assertions or
segmentation faults. In the case of MLIR to LLVM IR translation of the
'omp' dialect, this change results in the compiler emitting errors and
exiting early instead of triggering a crash for not-yet-implemented
errors. The behavior in Clang and openmp-opt stays unchanged, since
callbacks will continue always returning 'success'.
2024-10-25 11:30:16 +01:00
Jay Foad
4dd55c567a
[clang] Use {} instead of std::nullopt to initialize empty ArrayRef (#109399)
Follow up to #109133.
2024-10-24 10:23:40 +01:00
Carl Ritson
076aac59ac
[AMDGPU] Add a new target for gfx1153 (#113138) 2024-10-23 12:56:58 +09:00
Artem Belevich
30a06e8022
[CUDA] Add support for CUDA-12.6 and sm_100 (#112028)
This is a copy of #97402(with minor updates), which is now ready to land.

---------

Co-authored-by: Sergey Kozub <skozub@nvidia.com>
2024-10-14 11:51:05 -07:00
Youngsuk Kim
29d0a84704
[clang][CGOpenMPRuntimeGPU] Avoid llvm::Type::getPointerTo() (NFC) (#110357)
`llvm::Type::getPointerTo()` is to be removed soon.
2024-09-28 09:57:20 -04:00
Joseph Huber
e0326b668e
[OpenMP] Map omp_default_mem_alloc to global memory (#104790)
Summary:
Currently, we assign this to private memory. This causes failures on
some SOLLVE tests. The standard isn't clear on the semantics of this
allocation type, but there seems to be a consensus that it's supposed to
be shared memory.
2024-08-20 12:00:41 -05:00
Jan Leyonberg
5b15d9c441
[clang][OpenMP] Propoagate debug location to OMPIRBuilder reduction codegen (#100358)
This patch propagates the debug location from Clang to the
OpenMPIRBuilder.

Fixes https://github.com/llvm/llvm-project/issues/97458
2024-07-24 08:57:39 -05:00
Jakub Chlanda
ab20086422
[CUDA][NFC] CudaArch to OffloadArch rename (#97028)
Rename `CudaArch` to `OffloadArch` to better reflect its content and the
use.
Apply a similar rename to helpers handling the enum.
2024-06-30 07:56:07 +02:00