This function made no sense at all. It was scanning through
the feature map looking for something that parsed as an OffloadArch.
Directly compute the arch from the target device.
I don't know why there isn't just an OffloadArch in TargetOpts,
this shouldn't really require parsing.
OffloadArch uses an enumerator named `UNUSED`, which is a very common
macro name in external codebases (e.g. Mesa defines UNUSED as an
attribute helper). If such a macro is visible when including
clang/Basic/OffloadArch.h, the preprocessor expands the token inside the
enum and breaks compilation of the installed Clang headers.
Rename the enumerator to `UNUSED_` and update all in-tree references.
This is a spelling-only change (no behavioral impact) and mirrors the
existing approach used for SM_32_ to avoid macro clashes.
In CGOpenMPRuntimeGPU::translateParameter, reference-type captured
variables were translated to pointer parameters with two address-space
annotations:
1. LangAS::opencl_global on the pointee (for map'd variables), which
correctly produces ptr addrspace(1) in NVPTX IR.
2. getLangASFromTargetAS(NVPTX_local_addr=5) on the pointer itself,
annotating the parameter as living in NVPTX local (stack) memory.
The second annotation is incorrect at the Clang type-system level:
EmitParmDecl only supports parameters to be in LangAS::Default (or the
special cases for OpenCL).
Temporarily add an assert in EmitParmDecl that catches parameters with
non-default address spaces in non-OpenCL compilations, and fix the
violation by dropping the NVPTX_local_addr addAddressSpace call.
Should fix the issue noticed in
https://github.com/llvm/llvm-project/pull/181256#discussion_r2821894122,
allowing removing that special case there for OpenMP, though I haven't
tested the combination yet. That PR would fix EmitParmDecl to actually
support non-default address spaces from Sema, and will remove this
assert again.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
This patch adds support for new GPU architectures introduced in CUDA
13.0 in Clang:
- SM_88: Ampere architecture variant
- SM_110/SM_110a: Blackwell architecture variants
Additionally, this patch deprecates SM_101/SM_101a support for CUDA 13.0
and later versions. The SM_101 architecture is superseded by SM_110 and
is no longer supported by CUDA 13.0+ toolchain components.
Adds initial support for GPU by-ref reductions. The main problem for
reduction by reference is that, prior to this PR, we were shuffling
(from remote lanes within the same warp or across different warps within
the block) pointers/references to the private reduction values rather
than the private reduction values themselves.
In particular, this diff adds support for reductions on scalar
allocatables where reductions happen on loops nested in `target`
regions. For example:
```fortran
integer :: i
real, allocatable :: scalar_alloc
allocate(scalar_alloc)
scalar_alloc = 0
!$omp target map(tofrom: scalar_alloc)
!$omp parallel do reduction(+: scalar_alloc)
do i = 1, 1000000
scalar_alloc = scalar_alloc + 1
end do
!$omp end target
```
This PR supports by-ref reductions on the intra- and inter-warp levels.
So far, there are still steps to be takens for full support of by-ref
reductions, for example:
* Support inter-block value combination is still not supported.
Therefore, `target teams distribute parallel do` is still not supported.
* Support for dynamically-sized arrays still needs to be added.
* Support for more than one allocatable/array on the same `reduction`
clause.
Some targets have a specific calling convention that should be used for
generated calls to runtime functions.
Pass that down and use it.
Signed-off-by: Nick Sarnie <nick.sarnie@intel.com>
Two small issues, when storing function pointers we need to use the
program address space.
With this change there are no asserts if I run all OpenMP tests with the
offload target manually changed to SPIR-V, so we are getting somewhere.
About 10 test fails though.
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
When creating function pointers, make sure the pointer is in the program
address space.
Also fix a spot where I forgot to set `setDefaultTargetAS` and one spot
where we didn't use the default AS for a pointer.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Setting the prescriptiveness of the num_threads clause to 'strict' and
having a corresponding check (with message and severity clauses) does
not align well with how OpenMP should be handled for GPUs.
The num_threads expression may be an arbitrary integer expression which
is evaluated on the target, in correspondance to the OpenMP spec. This
prevents the check from being done before launching the kernel,
especially considering that the num_threads clause is associated with
the parallel directive and that there may be multiple parallel
directives with different num_threads clauses in a single target region.
Acting on the result of the 'strict' check on the GPU would require
doing I/O on the GPU, which can introduce performance regressions.
Delaying any actions resulting from the 'strict' check and doing them on
the host after executing the target region involves additional data
copies and is not really semantically correct.
For now, the 'strict' modifier for the num_threads clause and its
associated message and severity clause are set to be unsupported on
GPUs. Targets other than GPUs still support the aforementioned features
in the context of an OpenMP target region.
Due to potential performance issues, this commit temporarily removes
support for the num_threads 'strict' modifier and its corresponding
message and severity clauses on the device.
OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the
num_threads clause on parallel directives, along with the message and
severity clauses. This commit implements necessary codegen changes.
OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the
num_threads clause on parallel directives, along with the message and
severity clauses. This commit implements necessary codegen changes.
This is a major change on how we represent nested name qualifications in
the AST.
* The nested name specifier itself and how it's stored is changed. The
prefixes for types are handled within the type hierarchy, which makes
canonicalization for them super cheap, no memory allocation required.
Also translating a type into nested name specifier form becomes a no-op.
An identifier is stored as a DependentNameType. The nested name
specifier gains a lightweight handle class, to be used instead of
passing around pointers, which is similar to what is implemented for
TemplateName. There is still one free bit available, and this handle can
be used within a PointerUnion and PointerIntPair, which should keep
bit-packing aficionados happy.
* The ElaboratedType node is removed, all type nodes in which it could
previously apply to can now store the elaborated keyword and name
qualifier, tail allocating when present.
* TagTypes can now point to the exact declaration found when producing
these, as opposed to the previous situation of there only existing one
TagType per entity. This increases the amount of type sugar retained,
and can have several applications, for example in tracking module
ownership, and other tools which care about source file origins, such as
IWYU. These TagTypes are lazily allocated, in order to limit the
increase in AST size.
This patch offers a great performance benefit.
It greatly improves compilation time for
[stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for
`test_on2.cpp` in that project, which is the slowest compiling test,
this patch improves `-c` compilation time by about 7.2%, with the
`-fsyntax-only` improvement being at ~12%.
This has great results on compile-time-tracker as well:

This patch also further enables other optimziations in the future, and
will reduce the performance impact of template specialization resugaring
when that lands.
It has some other miscelaneous drive-by fixes.
About the review: Yes the patch is huge, sorry about that. Part of the
reason is that I started by the nested name specifier part, before the
ElaboratedType part, but that had a huge performance downside, as
ElaboratedType is a big performance hog. I didn't have the steam to go
back and change the patch after the fact.
There is also a lot of internal API changes, and it made sense to remove
ElaboratedType in one go, versus removing it from one type at a time, as
that would present much more churn to the users. Also, the nested name
specifier having a different API avoids missing changes related to how
prefixes work now, which could make existing code compile but not work.
How to review: The important changes are all in
`clang/include/clang/AST` and `clang/lib/AST`, with also important
changes in `clang/lib/Sema/TreeTransform.h`.
The rest and bulk of the changes are mostly consequences of the changes
in API.
PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just
for easier to rebasing. I plan to rename it back after this lands.
Fixes#136624
Fixes https://github.com/llvm/llvm-project/issues/43179
Fixes https://github.com/llvm/llvm-project/issues/68670
Fixes https://github.com/llvm/llvm-project/issues/92757
This fixes#139614 on non-clang compilers by moving `__has_warning`
completely inside the `#if defined(__clang__)` block. This prevents a
parse failure from compilers which don't recognize `__has_warning`.
Original description:
Followup to #138741.
This adds the requested macro to silence
`-Wunnecessary-virtual-specifier` when declaring virtual anchor
functions in `final` classes, per [LLVM
policy](https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers).
It also cleans up any remaining instances of the warning, allowing us to
stop disabling it when we build LLVM.
We can simplify the code with *Map::try_emplace where we need
default-constructed values while avoding calling constructors when
keys are already present.
Following #137070, this PR adds an initial set of Intel `OffloadArch`
values with corresponding predicates that will be used in SYCL
offloading. More Intel architectures will be added in a future PR.
This is an expensive header, only include it where needed. Move some
functions out of line to achieve that.
This reduces time to build clang by ~0.5% in terms of instructions
retired.
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.
This PR removes all occurrences of gfx940/gfx941 from clang that can be
removed without changes in the llvm directory. The
target-invalid-cpu-note/amdgcn.c test is not included here since it
tests a list of targets that is defined in
llvm/lib/TargetParser/TargetParser.cpp.
For SWDEV-512631
This patch introduces a `TargetKernelRuntimeAttrs` structure to hold
host-evaluated `num_teams`, `thread_limit`, `num_threads` and trip count
values passed to the runtime kernel offloading call.
Additionally, kernel type information is used to influence target device
code generation and the `IsSPMD` flag is replaced by `ExecFlags`, which
provides more granularity.
This patch introduces the `OpenMPIRBuilder::TargetKernelDefaultAttrs`
structure used to simplify passing default and constant values for
number of teams and threads, and possibly other target kernel-related
information in the future.
This is used to forward values passed to `createTarget` to
`createTargetInit`, which previously used a default unrelated set of
values.
The preprocessor definition used to enable asserts and the one that
`llvm::Error` and `llvm::Expected` use to ensure all created instances are
checked are not the same. By making these checks inside of an `assert` in cases
where errors are not expected, certain build configurations would trigger
runtime failures (e.g. `-DLLVM_ENABLE_ASSERTIONS=OFF
-DLLVM_UNREACHABLE_OPTIMIZE=ON`).
The `llvm::cantFail()` function, which was intended for this use case, is used
by this patch in place of `assert` to prevent these runtime failures. In tests,
new preprocessor definitions based on `ASSERT_THAT_EXPECTED` and
`EXPECT_THAT_EXPECTED` are used instead, to avoid silent failures in release
builds.
This patch introduces a new generic target, `gfx9-4-generic`. Since it doesn’t support FP8 and XF32-related instructions, the patch includes several code reorganizations to accommodate these changes.
This patch implements an approach to communicate errors between the
OMPIRBuilder and its users. It introduces `llvm::Error` and
`llvm::Expected` objects to replace the values returned by callbacks
passed to `OMPIRBuilder` codegen functions. These functions then check
the result for errors when callbacks are called and forward them back to
the caller, which has the flexibility to recover, exit cleanly or dump a
stack trace.
This prevents a failed callback to leave the IR in an invalid state and
still continue the codegen process, triggering unrelated assertions or
segmentation faults. In the case of MLIR to LLVM IR translation of the
'omp' dialect, this change results in the compiler emitting errors and
exiting early instead of triggering a crash for not-yet-implemented
errors. The behavior in Clang and openmp-opt stays unchanged, since
callbacks will continue always returning 'success'.
Summary:
Currently, we assign this to private memory. This causes failures on
some SOLLVE tests. The standard isn't clear on the semantics of this
allocation type, but there seems to be a consensus that it's supposed to
be shared memory.