172 Commits

Author SHA1 Message Date
Shilei Tian
10068cd654 [OpenMP] Introduce kernel environment
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.

This is a combination and refinement of patch series D116908, D116909, and D116910.

Depend on D155886.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142569
2023-07-26 13:35:14 -04:00
Shilei Tian
6bd74fd65f Revert commits for kernel environment
This reverts commits for kernel environments as they causes issues in AMD BB.
2023-07-23 23:32:31 -04:00
Shilei Tian
c5c8040390 [OpenMP] Introduce kernel environment
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.

This is a combination and refinement of patch series D116908, D116909, and D116910.

Depend on D155886.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142569
2023-07-23 18:36:01 -04:00
Jay Foad
92542f2a40 [AMDGPU] Add targets gfx1150 and gfx1151
This is the target definition only. Currently they are treated the same
as GFX 11.0.x.

Differential Revision: https://reviews.llvm.org/D155429
2023-07-17 13:06:12 +01:00
Sergio Afonso
63ca93c7d1
[OpenMP][OMPIRBuilder] Rename IsEmbedded and IsTargetCodegen flags
This patch renames the `OpenMPIRBuilderConfig` flags to reduce confusion over
their meaning. `IsTargetCodegen` becomes `IsGPU`, whereas `IsEmbedded` becomes
`IsTargetDevice`. The `-fopenmp-is-device` compiler option is also renamed to
`-fopenmp-is-target-device` and the `omp.is_device` MLIR attribute is renamed
to `omp.is_target_device`. Getters and setters of all these renamed properties
are also updated accordingly. Many unit tests have been updated to use the new
names, but an alias for the `-fopenmp-is-device` option is created so that
external programs do not stop working after the name change.

`IsGPU` is set when the target triple is AMDGCN or NVIDIA PTX, and it is only
valid if `IsTargetDevice` is specified as well. `IsTargetDevice` is set by the
`-fopenmp-is-target-device` compiler frontend option, which is only added to
the OpenMP device invocation for offloading-enabled programs.

Differential Revision: https://reviews.llvm.org/D154591
2023-07-10 14:14:16 +01:00
Doru Bercea
13888870e5 Enable dynamic-sized VLAs for data sharing in OpenMP offloaded target regions.
Review: https://reviews.llvm.org/D153883
2023-07-06 10:57:10 -04:00
Dave Pagan
eb61bde829 [OpenMP][CodeGen] Add codegen for combined 'loop' directives.
The loop directive is a descriptive construct which allows the compiler
flexibility in how it generates code for the directive's associated
loop(s). See OpenMP specification 5.2 [257:8-9].

Codegen added in this patch for the combined 'loop' directives are:

'target teams loop'     -> 'target teams distribute parallel for'
'teams loop'            -> 'teams distribute parallel for'
'target parallel loop'  -> 'target parallel for'
'parallel loop'         -> 'parallel for'

NOTE: The implementation of the 'loop' directive itself is unchanged.

Differential Revision: https://reviews.llvm.org/D145823
2023-07-05 12:31:59 -05:00
Youngsuk Kim
5f32baf17d [clang] Replace uses of CreateElementBitCast (NFC)
Partial progress towards replacing uses of CreateElementBitCast, as it
no longer does what its name suggests.

Reviewed By: barannikov88

Differential Revision: https://reviews.llvm.org/D154229
2023-06-30 17:35:36 -04:00
Sergei Barannikov
2348902268 [clang][CodeGen] Remove no-op EmitCastToVoidPtr (NFC)
Reviewed By: JOE1994

Differential Revision: https://reviews.llvm.org/D153694
2023-06-29 20:29:38 +03:00
Manna, Soumi
213709e7be [CLANG] Fix Static Code Analyzer Concerns with bad bit right shift operation in getNVPTXLaneID()
In getNVPTXLaneID(CodeGenFunction &), the value of LaneIDBits is 4294967295 since function call llvm::Log2_32(CGF->getTarget()->getGridValue().GV_Warp_Size) might return 4294967295.

  unsigned LaneIDBits =
       llvm::Log2_32(CGF.getTarget().getGridValue().GV_Warp_Size);
  unsigned LaneIDMask = ~0u >> (32u - LaneIDBits);

The shift amount (32U - LaneIDBits) might be 33, So it has undefined behavior for right shifting by more than 31 bits.

This patch adds an assert to guard the LaneIDBits overflow issue with LaneIDMask value.

Reviewed By: tahonermann

Differential Revision: https://reviews.llvm.org/D151606
2023-06-22 13:29:28 -07:00
Nikita Popov
8a19af513d [Clang] Remove uses of PointerType::getWithSamePointeeType (NFC)
No longer relevant with opaque pointers.
2023-06-12 12:18:28 +02:00
Kazu Hirata
706c442e72 [CodeGen] Use DenseMapBase::lookup (NFC) 2023-06-11 13:19:26 -07:00
Konstantin Zhuravlyov
9d05727972 AMDGPU: Add basic gfx942 target
Differential Revision: https://reviews.llvm.org/D149983
2023-05-10 11:51:06 -04:00
Konstantin Zhuravlyov
1fc70210a6 AMDGPU: Add basic gfx941 target
Differential Revision: https://reviews.llvm.org/D149982
2023-05-10 11:51:06 -04:00
Manna, Soumi
07996804a0 [NFC] ][CLANG] Fix static code analyzer concerns
Reported by Coverity:

1. Inside "ASTReader.cpp" file,  in clang::ASTReader::FindExternalLexicalDecls(clang::DeclContext const *, llvm::function_ref<bool (clang::Decl::Kind)>, llvm::SmallVectorImpl<clang::Decl *> &): Using the auto keyword without an & causes a copy.

auto_causes_copy: Using the auto keyword without an & causes the copy of an object of type pair.

2. Inside "ASTReader.cpp" file, in clang::ASTReader::ReadAST(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, unsigned int, llvm::SmallVectorImpl<clang::ASTReader::ImportedSubmodule> *): Using the auto keyword without an & causes a copy.

auto_causes_copy: Using the auto keyword without an & causes the copy of an object of type DenseMapPair.

3. Inside "CGOpenMPRuntimeGPU.cpp" file, in clang::CodeGen::CGOpenMPRuntimeGPU::emitGenericVarsEpilog(clang::CodeGen::CodeGenFunction &, bool): Using the auto keyword without an & causes a copy.

auto_causes_copy: Using the auto keyword without an & causes the copy of an object of type pair.

4. Inside "ASTWriter.cpp" file, in clang::ASTWriter::WriteHeaderSearch(clang::HeaderSearch const &): Using the auto keyword without an & causes a copy.

auto_causes_copy: Using the auto keyword without an & causes the copy of an object of type UnresolvedHeaderDirective.

Reviewed By: tahonermann

Differential Revision: https://reviews.llvm.org/D149461
2023-05-05 14:34:36 -07:00
Shilei Tian
d4ecd1241c Revert "[OpenMP] Introduce kernel environment"
This reverts commit 35cfadfbe2decd9633560b3046fa6c17523b2fa9.

It makes a couple of buildbots unhappy because of the following test failures:
- `Transforms/OpenMP/add_attributes.ll'`
- `mapping/declare_mapper_target_data.cpp` on AMDGPU
2023-04-22 20:56:35 -04:00
Shilei Tian
35cfadfbe2 [OpenMP] Introduce kernel environment
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.

This is a combination and refinement of patch series D116908, D116909, and D116910.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142569
2023-04-22 20:46:38 -04:00
Doru Bercea
01910787d3 Fix failure with team-wide allocated variable
Review: https://reviews.llvm.org/D147572
2023-04-20 14:40:35 -04:00
Itay Bookstein
782c59a4ee [OpenMP] Prefix outlined and reduction func names with original func's name
This patch prefixes omp outlined helpers and reduction funcs
with the original function's name.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D140722
2023-04-19 23:00:26 +03:00
Itay Bookstein
6fdd13e0ec Revert "[OpenMP] Prefix outlined and reduction func names with original func's name"
This reverts commit 029bfc311d4d7d3cd90be81bb08c046848796d02.
2023-04-19 19:08:49 +03:00
Itay Bookstein
029bfc311d [OpenMP] Prefix outlined and reduction func names with original func's name
This patch attempts to prefix omp outlined helpers and reduction funcs
with the original function's name.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D140722
2023-04-19 19:05:21 +03:00
Richard Sandiford
b6d4d51f8f [clang] Specify attribute syntax & spelling with a single argument
When constructing an attribute, the syntactic form was specified
using two arguments: an attribute-independent syntax type and an
attribute-specific spelling index.  This patch replaces them with
a single argument.

In most cases, that's done using a new Form class that combines the
syntax and spelling into a single object.  This has the minor benefit
of removing a couple of constructors.  But the main purpose is to allow
additional information to be stored as well, beyond just the syntax and
spelling enums.

In the case of the attribute-specific Create and CreateImplicit
functions, the patch instead uses the attribute-specific spelling
enum.  This helps to ensure that the syntax and spelling are
consistent with each other and with the Attr.td definition.

If a Create or CreateImplicit caller specified a syntax and
a spelling, the patch drops the syntax argument and keeps the
spelling.  If the caller instead specified only a syntax
(so that the spelling was SpellingNotCalculated), the patch
simply drops the syntax argument.

There were two cases of the latter: TargetVersion and Weak.
TargetVersionAttrs were created with GNU syntax, which matches
their definition in Attr.td, but which is also the default.
WeakAttrs were created with Pragma syntax, which does not match
their definition in Attr.td.  Dropping the argument switches
them to AS_GNU too (to match [GCC<"weak">]).

Differential Revision: https://reviews.llvm.org/D148102
2023-04-13 10:14:49 +01:00
Jan Sjodin
85faee6992 [OpenMP][OMPIRBuilder] Make OffloadEntriesInfoManager a member of OpenMPIRBuilder
This patch adds the OffloadEntriesInfoManager to the OpenMPIRBuilder, and
allows the OffloadEntriesInfoManager to access the Configuration in the
OpenMPIRBuilder.  With the shared Config there is no risk for inconsistencies,
and there is no longer the need for clang to have a separate
OffloadEntriesInfoManager.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D146549
2023-03-23 11:46:28 -04:00
Johannes Doerfert
40f9bf082f [OpenMP] Introduce the ompx_dyn_cgroup_mem(<N>) clause
Dynamic memory allows users to allocate fast shared memory when a kernel
is launched. We support a single size for all kernels via the
`LIBOMPTARGET_SHARED_MEMORY_SIZE` environment variable but now we can
control it per kernel invocation, hence allow computed values.

Note: Only the nextgen plugins will allocate memory based on the clause,
      the old plugins will silently miscompile.

Differential Revision: https://reviews.llvm.org/D141233
2023-01-21 18:46:36 -08:00
Johannes Doerfert
e2d75f9b6b [OpenMP][NFC] Remove more unused code, eliminate warning 2022-12-13 20:53:45 -08:00
Johannes Doerfert
90609fb68f [OpenMP][NFCI] Remove effectively dead code in clang and the runtime
Differential Revision: https://reviews.llvm.org/D136903
2022-12-13 18:44:19 -08:00
Johannes Doerfert
f9c29878b0 Revert "[OpenMP][NFCI] Remove effectively dead code in clang and the runtime"
This reverts commit c1c8cbbf5f29257d084a23a2f6c4236c40b7afb9. One of the
tests seems to be flaky/non-deterministic.
2022-12-12 22:08:28 -08:00
Johannes Doerfert
c1c8cbbf5f [OpenMP][NFCI] Remove effectively dead code in clang and the runtime 2022-12-12 20:55:36 -08:00
Jan Sjodin
1f8fecf26b [OpenMP][OMPIRBuilder] Migrate code to emit target region functions from clang to OMPIRBuilder
This patch moves some of the logic on how to emit target region functions and
adds emitTargetRegionFunction to the OpenMPIRBuilder. Also the
OpenMPOffloadMandatory flag is added to the config.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D139634
2022-12-12 10:28:58 -05:00
Kazu Hirata
bb666c6930 [CodeGen] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-03 11:13:43 -08:00
Alex Richardson
f3a17d0595 [clang] Avoid duplicating ProgramAddressSpace in TargetInfo. NFCI
This value was added to clang/Basic in D111566, but is only used during
codegen, where we can use the LLVM IR DataLayout instead. I noticed this
because the downstream CHERI targets would have to also set this value
for AArch64/RISC-V/MIPS. Instead of duplicating more information between
LLVM IR and Clang, this patch moves getTargetAddressSpace(QualType T) to
CodeGenTypes, where we can consult the DataLayout.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D138296
2022-12-01 20:40:58 +00:00
Jan Sjodin
2aa338f68e [OpenMP][OMPIRBuilder] Mirgrate getName from clang to OMPIRBuilder
This change moves the getName function from clang and moves the separator class
members from CGOpenMPRuntime into OMPIRBuilder. Also enusre all the getters
in the config class are const.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D137725
2022-11-24 10:11:13 -05:00
Jan Sjodin
969d787a47 [OpenMP][OMPIRBuilder] Add a configuration class that captures flags that affect codegen
This patch introudces the OpenMPIRBuilderConfig class which contains various
flags that are needed to lower OMP constructs to LLVM-IR. The purpose is to
keep the flags in one place so they do not have to be passed in every time.
The flags can be set optionally since some uses cases don't rely on functions
that depend on these flags.

Reviewed By: jdoerfert, tschuett

Differential Revision: https://reviews.llvm.org/D138220
2022-11-22 09:25:04 -05:00
Jan Sjodin
9ea2b150b5 [OpenMP][OMPIRBuilder] Migrate createOffloadEntriesAndInfoMetadata from clang to OpenMPIRBuilder
This patch moves the createOffloadEntriesAndInfoMetadata to OpenMPIRBuilder,
the createOffloadEntry helper function. The clang specific error handling is
invoked using a callback. This code will also be used by flang in the future.
2022-11-03 10:27:44 -04:00
Artem Belevich
9a01cca660 Add support for CUDA-11.8 and sm_{87,89,90} GPUs.
Differential Revision: https://reviews.llvm.org/D135306
2022-10-07 13:59:28 -07:00
Joseph Huber
a8ec170e01 [OpenMP] Make the exec_mode global have protected visibility
We use protected visibility for almost everything with offloading. This
is because it provides us with the ability to read things from the host
without the expectation that it will be preempted by a shared library
load, bugs related to this have happened when offloading to the host.
This patch just makes the `exec_mode` global generated for each plugin
have protected visibility.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D135285
2022-10-05 14:39:22 -05:00
Dhruva Chakrabarti
839ac62c50 Revert "[OpenMP] Codegen aggregate for outlined function captures"
This reverts commit 7539e9cf811e590d9f12ae39673ca789e26386b4.
2022-09-15 03:08:46 +00:00
Giorgis Georgakoudis
7539e9cf81 [OpenMP] Codegen aggregate for outlined function captures
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3)  forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.

Reviewed By: jdoerfert, jhuber6, ABataev

Differential Revision: https://reviews.llvm.org/D102107
2022-09-15 00:54:05 +00:00
Haojian Wu
f6e759bd26 Remove some unused static functions in CGOpenMPRuntimeGPU.cpp, NFC 2022-09-14 17:20:02 +02:00
Joseph Huber
bae1a2cf3c [OpenMP] Remove unused function after removing simplified interface
Summary:
A previous patch removed the user of this function but did not remove
the function causing unused function warnings. Remove it.
2022-09-14 10:14:43 -05:00
Joseph Huber
2d26ecb1fb [OpenMP] Remove simplified device runtime handling
The old device runtime had a "simplified" version that prevented many of
the runtime features from being initialized. The old device runtime was
deleted in LLVM 14 and is no longer in use. Selectively deactivating
features is now done using specific flags rather than the old technique.
This patch simply removes the extra logic required for handling the old
simple runtime scheme.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D133802
2022-09-14 09:41:50 -05:00
Joseph Huber
2b8f722e63 [OpenMP] Add option to assert no nested OpenMP parallelism on the GPU
The OpenMP device runtime needs to support the OpenMP standard. However
constructs like nested parallelism are very uncommon in real application
yet lead to complexity in the runtime that is sometimes difficult to
optimize out. As a stop-gap for performance we should supply an argument
that selectively disables this feature. This patch adds the
`-fopenmp-assume-no-nested-parallelism` argument which explicitly
disables the usee of nested parallelism in OpenMP.

Reviewed By: carlo.bertolli

Differential Revision: https://reviews.llvm.org/D132074
2022-08-23 14:09:51 -05:00
Corentin Jabot
127bf44385 [Clang][C++20] Support capturing structured bindings in lambdas
This completes the implementation of P1091R3 and P1381R1.

This patch allow the capture of structured bindings
both for C++20+ and C++17, with extension/compat warning.

In addition, capturing an anonymous union member,
a bitfield, or a structured binding thereof now has a
better diagnostic.

We only support structured bindings - as opposed to other kinds
of structured statements/blocks. We still emit an error for those.

In addition, support for structured bindings capture is entirely disabled in
OpenMP mode as this needs more investigation - a specific diagnostic indicate the feature is not yet supported there.

Note that the rest of P1091R3 (static/thread_local structured bindings) was already implemented.

at the request of @shafik, i can confirm the correct behavior of lldb wit this change.

Fixes https://github.com/llvm/llvm-project/issues/54300
Fixes https://github.com/llvm/llvm-project/issues/54300
Fixes https://github.com/llvm/llvm-project/issues/52720

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D122768
2022-08-04 10:12:53 +02:00
Corentin Jabot
a274219600 Revert "[Clang][C++20] Support capturing structured bindings in lambdas"
This reverts commit 44f2baa3804a62ca793f0ff3e43aa71cea91a795.

Breaks self builds and seems to have conformance issues.
2022-08-03 21:00:29 +02:00
Corentin Jabot
44f2baa380 [Clang][C++20] Support capturing structured bindings in lambdas
This completes the implementation of P1091R3 and P1381R1.

This patch allow the capture of structured bindings
both for C++20+ and C++17, with extension/compat warning.

In addition, capturing an anonymous union member,
a bitfield, or a structured binding thereof now has a
better diagnostic.

We only support structured bindings - as opposed to other kinds
of structured statements/blocks. We still emit an error for those.

In addition, support for structured bindings capture is entirely disabled in
OpenMP mode as this needs more investigation - a specific diagnostic indicate the feature is not yet supported there.

Note that the rest of P1091R3 (static/thread_local structured bindings) was already implemented.

at the request of @shafik, i can confirm the correct behavior of lldb wit this change.

Fixes https://github.com/llvm/llvm-project/issues/54300
Fixes https://github.com/llvm/llvm-project/issues/54300
Fixes https://github.com/llvm/llvm-project/issues/52720

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D122768
2022-08-03 20:00:01 +02:00
Gabriel Ravier
5674a3c880 Fixed a number of typos
I went over the output of the following mess of a command:

(ulimit -m 2000000; ulimit -v 2000000; git ls-files -z |
 parallel --xargs -0 cat | aspell list --mode=none --ignore-case |
 grep -E '^[A-Za-z][a-z]*$' | sort | uniq -c | sort -n |
 grep -vE '.{25}' | aspell pipe -W3 | grep : | cut -d' ' -f2 | less)

and proceeded to spend a few days looking at it to find probable typos
and fixed a few hundred of them in all of the llvm project (note, the
ones I found are not anywhere near all of them, but it seems like a
good start).

Differential Revision: https://reviews.llvm.org/D130827
2022-08-01 13:13:18 -04:00
Kazu Hirata
ca4af13e48 [clang] Don't use Optional::getValue (NFC) 2022-06-20 22:59:26 -07:00
Joseph Huber
af757f8980 [OpenMP] Don't set device runtime debugging flags if using '-nogpulib'
We use globals to configure debugging at compile-time for the device
runtime. Because these are only used by the OpenMP runtime we shouldn't
define them if we aren't using the device runtime. When a user passes in
'-nogpulib' this indicates that we are not using the device runtime, so
we should check for the precense of this flag and not emit these globals
if used.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D125314
2022-05-13 14:38:43 -04:00
Joe Nash
8bdfc73f63 [AMDGPU][clang] Definition of gfx11 subtarget
Contributors:
Jay Foad <jay.foad@amd.com>
Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>

Patch 2/N for upstreaming of AMDGPU gfx11 architecture

Depends on D124536

Reviewed By: foad, kzhuravl, #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D124537
2022-04-29 13:55:56 -04:00
Nikita Popov
8b62dd3cd6 Reapply [CodeGen] Avoid deprecated Address ctor in EmitLoadOfPointer()
This requires some adjustment in caller code, because there was
a confusion regarding the meaning of the PtrTy argument: This
argument is the type of the pointer being loaded, not the addresses
being loaded from.

Reapply after fixing the specified pointer type for one call in
47eb4f7dcd845878b16a53dadd765195b9c24b6e, where the used type is
important for determining alignment.
2022-03-23 12:06:11 +01:00