877 Commits

Author SHA1 Message Date
jyu2-git
32f7672acc
[Clang][OpenMP] This is addition fix for #92210. (#94802)
Fix another runtime problem when explicit map both pointer and pointee
in target data region.

In #92210, problem is only addressed in target region, but missing for
target data region.

The change just passing AreBothBasePtrAndPteeMapped in
generateInfoForComponentList when processing target data.

---------

Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
2024-07-03 20:56:53 -07:00
Gheorghe-Teodor Bercea
1a478a69bc
[OpenMP][offload] Fix dynamic schedule tracking (#97065)
This patch fixes the dynamic schedule tracking.
2024-07-01 10:23:11 -04:00
Stephen Tozer
d75f9dd1d2 Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)"
Reverts the above commit, as it updates a common header function and
did not update all callsites:

  https://lab.llvm.org/buildbot/#/builders/29/builds/382

This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.
2024-06-24 18:00:22 +01:00
Stephen Tozer
6481dc5761
[IR][NFC] Update IRBuilder to use InsertPosition (#96497)
Uses the new InsertPosition class (added in #94226) to simplify some of
the IRBuilder interface, and removes the need to pass a BasicBlock
alongside a BasicBlock::iterator, using the fact that we can now get the
parent basic block from the iterator even if it points to the sentinel.
This patch removes the BasicBlock argument from each constructor or call
to setInsertPoint.

This has no functional effect, but later on as we look to remove the
`Instruction *InsertBefore` argument from instruction-creation
(discussed
[here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)),
this will simplify the process by allowing us to deprecate the
InsertPosition constructor directly and catch all the cases where we use
instructions rather than iterators.
2024-06-24 17:27:43 +01:00
Ahmed Bougacha
3575d23ca8
[clang][CodeGen] Remove unused LValue::getAddress CGF arg. (#92465)
This is in effect a revert of f139ae3d93797, as we have since gained a
more sophisticated way of doing extra IRGen with the addition of
RawAddress in #86923.
2024-05-20 10:23:04 -07:00
jyu2-git
8e00703be9
[Clang][OpenMP] Fix runtime problem when explicit map both pointer and pointee (#92210)
ponter int *p for following map, test currently crash.

  map(p, p[:100]) or map(p, p[1])

Currly IR looks like
// &p, &p, sizeof(int), TARGET_PARAM | TO | FROM
// &p, p[0], 100sizeof(float) TO | FROM

Worrking IR is
// map(p, p[0:100]) to map(p[0:100])
// &p, &p[0], 100*sizeof(float), TARGET_PARAM | TO | FROM | PTR_AND_OBJ

The change is add new argument AreBothBasePtrAndPteeMapped in
generateInfoForComponentList

Use that to skip map for map(p), when processing map(p[:100]) generate
map with right flag.
2024-05-15 08:20:25 -07:00
Erich Keane
39adc8f423
[NFC] Generalize ArraySections to work for OpenACC in the future (#89639)
OpenACC is going to need an array sections implementation that is a
simpler version/more restrictive version of the OpenMP version. 

This patch moves `OMPArraySectionExpr` to `Expr.h` and renames it `ArraySectionExpr`,
 then adds an enum to choose between the two.

This also fixes a couple of 'drive-by' issues that I discovered on the way,
but leaves the OpenACC Sema parts reasonably unimplemented (no semantic
analysis implementation), as that will be a followup patch.
2024-04-25 10:22:03 -07:00
David Pagan
a12836647e
[OpenMP][CodeGen] Improved codegen for combined loop directives (#87278)
IR for 'target teams loop' is now dependent on suitability of associated
loop-nest.

If a loop-nest:

- does not contain a function call, or
- the -fopenmp-assume-no-nested-parallelism has been specified,
- or the call is to an OpenMP API AND
- does not contain nested loop bind(parallel) directives

then it can be emitted as 'target teams distribute parallel for', which
is the current default. Otherwise, it is emitted as 'target teams
distribute'.

Added debug output indicating how 'target teams loop' was emitted. Flag
is -mllvm -debug-only=target-teams-loop-codegen

Added LIT tests explicitly verifying 'target teams loop' emitted as a
parallel loop and a distribute loop.

Updated other 'loop' related tests as needed to reflect change in IR.
- These updates account for most of the changed files and
additions/deletions.
2024-04-10 13:09:17 -07:00
Simon Pilgrim
110e933b7a CGOpenMPRuntime.cpp - fix Wparentheses warning. NFC. 2024-04-04 14:59:00 +01:00
Akira Hatanaka
84780af4b0
[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86923)
To authenticate pointers, CodeGen needs access to the key and
discriminators that were used to sign the pointer. That information is
sometimes known from the context, but not always, which is why `Address`
needs to hold that information.

This patch adds methods and data members to `Address`, which will be
needed in subsequent patches to authenticate signed pointers, and uses
the newly added methods throughout CodeGen. Although this patch isn't
strictly NFC as it causes CodeGen to use different code paths in some
cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any
changes in functionality as it doesn't add any information needed for
authentication.

In addition to the changes mentioned above, this patch introduces class
`RawAddress`, which contains a pointer that we know is unsigned, and
adds several new functions for creating `Address` and `LValue` objects.

This reapplies d9a685a9dd589486e882b722e513ee7b8c84870c, which was
reverted because it broke ubsan bots. There seems to be a bug in
coroutine code-gen, which is causing EmitTypeCheck to use the wrong
alignment. For now, pass alignment zero to EmitTypeCheck so that it can
compute the correct alignment based on the passed type (see function
EmitCXXMemberOrOperatorMemberCallExpr).
2024-03-28 06:54:36 -07:00
Akira Hatanaka
f75eebab88
Revert "[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86721)" (#86898)
This reverts commit d9a685a9dd589486e882b722e513ee7b8c84870c.

The commit broke ubsan bots.
2024-03-27 18:14:04 -07:00
Akira Hatanaka
d9a685a9dd
[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86721)
To authenticate pointers, CodeGen needs access to the key and
discriminators that were used to sign the pointer. That information is
sometimes known from the context, but not always, which is why `Address`
needs to hold that information.

This patch adds methods and data members to `Address`, which will be
needed in subsequent patches to authenticate signed pointers, and uses
the newly added methods throughout CodeGen. Although this patch isn't
strictly NFC as it causes CodeGen to use different code paths in some
cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any
changes in functionality as it doesn't add any information needed for
authentication.

In addition to the changes mentioned above, this patch introduces class
`RawAddress`, which contains a pointer that we know is unsigned, and
adds several new functions for creating `Address` and `LValue` objects.

This reapplies 8bd1f9116aab879183f34707e6d21c7051d083b6. The commit
broke msan bots because LValue::IsKnownNonNull was uninitialized.
2024-03-27 12:24:49 -07:00
Chris B
28ddbd4a86
[NFC] Refactor ConstantArrayType size storage (#85716)
In PR #79382, I need to add a new type that derives from
ConstantArrayType. This means that ConstantArrayType can no longer use
`llvm::TrailingObjects` to store the trailing optional Expr*.

This change refactors ConstantArrayType to store a 60-bit integer and
4-bits for the integer size in bytes. This replaces the APInt field
previously in the type but preserves enough information to recreate it
where needed.

To reduce the number of places where the APInt is re-constructed I've
also added some helper methods to the ConstantArrayType to allow some
common use cases that operate on either the stored small integer or the
APInt as appropriate.

Resolves #85124.
2024-03-26 14:15:56 -05:00
Akira Hatanaka
b311756450
Revert "[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#67454)" (#86674)
This reverts commit 8bd1f9116aab879183f34707e6d21c7051d083b6.

It appears that the commit broke msan bots.
2024-03-26 07:37:57 -07:00
Akira Hatanaka
8bd1f9116a
[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#67454)
To authenticate pointers, CodeGen needs access to the key and
discriminators that were used to sign the pointer. That information is
sometimes known from the context, but not always, which is why `Address`
needs to hold that information.

This patch adds methods and data members to `Address`, which will be
needed in subsequent patches to authenticate signed pointers, and uses
the newly added methods throughout CodeGen. Although this patch isn't
strictly NFC as it causes CodeGen to use different code paths in some
cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any
changes in functionality as it doesn't add any information needed for
authentication.

In addition to the changes mentioned above, this patch introduces class
`RawAddress`, which contains a pointer that we know is unsigned, and
adds several new functions for creating `Address` and `LValue` objects.
2024-03-25 18:05:42 -07:00
mikaoP
4e3310a813
[clang] Fix OMPT ident flag in combined distribute parallel for pragma (#80987)
Authored-by: Raúl Peñacoba Veigas <rpenacob@bsc.es>
2024-03-12 07:50:35 -04:00
Joseph Huber
cc374d8056
[OpenMP] Remove register_requires global constructor (#80460)
Summary:
Currently, OpenMP handles the `omp requires` clause by emitting a global
constructor into the runtime for every translation unit that requires
it. However, this is not a great solution because it prevents us from
having a defined order in which the runtime is accessed and used.

This patch changes the approach to no longer use global constructors,
but to instead group the flag with the other offloading entires that we
already handle. This has the effect of still registering each flag per
requires TU, but now we have a single constructor that handles
everything.

This function removes support for the old `__tgt_register_requires` and
replaces it with a warning message. We just had a recent release, and
the OpenMP policy for the past four releases since we switched to LLVM
is that we do not provide strict backwards compatibility between major
LLVM releases now that the library is versioned. This means that a user
will need to recompile if they have an old binary that relied on
`register_requires` having the old behavior. It is important that we
actively deprecate this, as otherwise it would not solve the problem of
having no defined init and shutdown order for `libomptarget`. The
problem of `libomptarget` not having a define init and shutdown order
cascades into a lot of other issues so I have a strong incentive to be
rid of it.

It is worth noting that the current `__tgt_offload_entry` only has space
for a 32-bit integer here. I am planning to overhaul these at some point
as well.
2024-02-21 11:33:32 -06:00
Jan Patrick Lehr
fa4780fa6c
[OpenMP][USM] Introduces -fopenmp-force-usm flag (#76571)
This flag forces the compiler to generate code for OpenMP target regions
as if the user specified the #pragma omp requires unified_shared_memory
in each source file.

The option does not have a -fno-* friend since OpenMP requires the
unified_shared_memory clause to be present in all source files. Since
this flag does no harm if the clause is present, it can be used in
conjunction. My understanding is that USM should not be turned off
selectively, hence, no -fno- version.

This adds a basic test to check the correct generation of double
indirect access to declare target globals in USM mode vs non-USM mode.
Which I think is the only difference observable in code generation.

This runtime test checks for the (non-)occurence of data movement between host
and device. It does one run without the flag and one with the flag to
also see that both versions behave as expected. In the case w/o the new
flag data movement between host and device is expected. In the case with
the flag such data movement should not be present / reported.
2024-01-22 21:59:26 +01:00
Gheorghe-Teodor Bercea
4ef6587715
[Clang][OpenMP] Fix mapping of structs to device (#75642)
Fix mapping of structs to device.

The following example fails:

```
#include <stdio.h>
#include <stdlib.h>

struct Descriptor {
  int *datum;
  long int x;
  int xi;
  long int arr[1][30];
};

int main() {
  Descriptor dat = Descriptor();
  dat.datum = (int *)malloc(sizeof(int)*10);
  dat.xi = 3;
  dat.arr[0][0] = 1;

  #pragma omp target enter data map(to: dat.datum[:10]) map(to: dat)

  #pragma omp target
  {
    dat.xi = 4;
    dat.datum[dat.arr[0][0]] = dat.xi;
  }

  #pragma omp target exit data map(from: dat)

 return 0;
}
```

This is a rework of the previous attempt:
https://github.com/llvm/llvm-project/pull/72410
2023-12-18 09:47:59 -05:00
jyu2-git
0113722d82
[OpenMP] Fix runtime problem due to wrong map size. (#74692)
Currently we are missing set up-boundary address for FinalArraySection
as highests elements in partial struct data.

Currently for:
\#pragma omp target map(D.a) map(D.b[:2])
The size is:
  %a = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 0
  %b = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1
  %arrayidx = getelementptr inbounds [2 x float], ptr %b, i64 0, i64 0
  %2 = getelementptr float, ptr %arrayidx, i32 1
  %3 = ptrtoint ptr %2 to i64
  %4 = ptrtoint ptr %a to i64
  %5 = sub i64 %3, %4
%6 = sdiv exact i64 %5, ptrtoint (ptr getelementptr (i8, ptr null, i32
1) to i64)

Where %2 is wrong for (D.b[:2]) is pointer to first element of array
section. It should pointe to last element of array section.
  
The fix is to emit the pointer to the last element of array section and
use this pointer as the highest element in partial struct data.

After change IR:
  %a = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 0
  %b = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1
  %arrayidx = getelementptr inbounds [2 x float], ptr %b, i64 0, i64 0
  %b1 = getelementptr inbounds %struct.DataTy, ptr %D, i32 0, i32 1
  %arrayidx2 = getelementptr inbounds [2 x float], ptr %b1, i64 0, i64 1
  %1 = getelementptr float, ptr %arrayidx2, i32 1
  %2 = ptrtoint ptr %1 to i64
  %3 = ptrtoint ptr %a to i64
  %4 = sub i64 %2, %3
%5 = sdiv exact i64 %4, ptrtoint (ptr getelementptr (i8, ptr null, i32
1) to i64)
2023-12-07 09:38:56 -08:00
Ivan R. Ivanov
065796bb92
[clang][OpenMP] Fix missing DI for __kmpc_global_thread_num (#73856)
Co-authored-by: Ivan Radanov Ivanov <ivanov2@llnl.gov>
2023-11-30 13:21:03 -06:00
Joseph Huber
237adfca4e
[OpenMP] Rework handling of global ctor/dtors in OpenMP (#71739)
Summary:
This patch reworks how we handle global constructors in OpenMP.
Previously, we emitted individual kernels that were all registered and
called individually. In order to provide more generic support, this
patch moves all handling of this to the target backend and the runtime
plugin. This has the benefit of supporting the GNU extensions for
constructors an destructors, removing a class of failures related to
shared library destruction order, and allows targets other than OpenMP
to use the same support without needing to change the frontend.

This is primarily done by calling kernels that the backend emits to
iterate a list of ctor / dtor functions. For x64, this is automatic and
we get it for free with the standard `dlopen` handling. For AMDGPU, we
emit `amdgcn.device.init` and `amdgcn.device.fini` functions which
handle everything atuomatically and simply need to be called. For NVPTX,
a patch https://github.com/llvm/llvm-project/pull/71549 provides the
kernels to call, but the runtime needs to set up the array manually by
pulling out all the known constructor / destructor functions.

One concession that this patch requires is the change that for GPU
targets in OpenMP offloading we will use `llvm.global_dtors` instead of
using `atexit`. This is because `atexit` is a separate runtime function
that does not mesh well with the handling we're trying to do here. This
should be equivalent in all cases except for cases where we would need
to destruct manually such as:

```
struct S { ~S() { foo(); } };
void foo() {
  static S s;
}
```

However this is broken in many other ways on the GPU, so it is not
regressing any support, simply increasing the scope of what we can
handle.

This changes the handling of ctors / dtors. This patch now outputs a
information message regarding the deprecation if the old format is used.
This will be completely removed in a later release.

Depends on: https://github.com/llvm/llvm-project/pull/71549
2023-11-10 14:53:53 -06:00
Vlad Serebrennikov
dda8e3de35 [clang][NFC] Refactor ImplicitParamDecl::ImplicitParamKind
This patch converts `ImplicitParamDecl::ImplicitParamKind` into a scoped enum at namespace scope, making it eligible for forward declaring. This is useful for `preferred_type` annotations on bit-fields.
2023-11-06 12:01:09 +03:00
Vlad Serebrennikov
edd690b02e
[clang][NFC] Refactor TagTypeKind (#71160)
This patch converts TagTypeKind into scoped enum. Among other benefits,
this allows us to forward-declare it where necessary.
2023-11-03 21:45:39 +04:00
Vlad Serebrennikov
50dec541f3 [clang][NFC] Refactor OMPDeclareReductionDecl::InitKind
This patch moves `OMPDeclareReductionDecl::InitKind` to DeclBase.h, so that it's complete at the point where corresponding bit-field is declared. This patch also converts it to scoped enum named `OMPDeclareReductionInitKind`
2023-11-01 12:40:13 +03:00
Vlad Serebrennikov
49fd28d960 [clang][NFC] Refactor ArrayType::ArraySizeModifier
This patch moves `ArraySizeModifier` before `Type` declaration so that it's complete at `ArrayTypeBitfields` declaration. It's also converted to scoped enum along the way.
2023-10-31 18:06:34 +03:00
Youngsuk Kim
c1183399a8 [clang] Remove no-op ptr-to-ptr bitcasts (NFC) 2023-10-30 16:51:55 -05:00
Johannes Doerfert
31b91213bd [OpenMP] Unify the min/max thread/teams pathways
We used to pass the min/max threads/teams values through different paths
from the frontend to the middle end. This simplifies the situation by
passing the values once, only when we will create the KernelEnvironment,
which contains the values. At that point we also manifest the metadata,
as appropriate. Some footguns have also been removed, e.g., our target
check is now triple-based, not calling convention-based, as the latter
is dependent on the ordering of operations. The types of the values have
been unified to int32_t.
2023-10-29 10:53:20 -07:00
Johannes Doerfert
0ba57c8bba
[OpenMP] Pass min/max thread and team count to the OMPIRBuilder (#70247)
We now provide the information about the min/max thread and team count
from to the OMPIRBuilder, no matter what the source was. That means we
unify `thread_limit`, `num_teams`, `num_threads` handling with the
target specific attriutes (`__launch_bounds__` and
`amdgpu_flat_work_group_size`). This is in preparation to pass the
values to the runtime, and to allow the middle-end (OpenMP-opt) to
tighten the values if it seems appropriate. There is no "real" change
after this commit.
2023-10-26 14:45:07 -07:00
David Pagan
52315f9b75
[clang][OpenMP] Fix target data if/logical expression assert fail (#70268)
Fixed assertion failure

  Basic Block in function 'main' does not have terminator!
  label %land.end

caused by premature setting of CodeGenIP upon entry to
emitTargetDataCalls, where subsequent evaluation of logical expression
created new basic blocks, leaving CodeGenIP pointing to the wrong basic
block. CodeGenIP is now set near the end of the function, just prior to
generating a comparison of the logical expression result (from the if
clause) which uses CodeGenIP to insert new IR.
2023-10-26 13:19:37 -07:00
agozillon
bc0c1783fd
[Clang][OpenMP][OMPIRBuilder] Move Clang's OpenMP Member/MemberOf flag helpers into the OMPIRBuilder (#67844)
This patch seeks to move the following functions to the OMPIRBuilder:
 - getFlagMemberOffset
 - getMemberOfFlag
 - setCorrectMemberOfFlag

These small helper functions help set the end bits of the
OpenMPOffloadMappingFlags flag that correspond to the reserved segment
for OMP_MAP_MEMBER_OF.

They will be of use in the future for lowering MLIR types/values that
can contian members and can be lowered similarly to a structure or class
type within the OpenMPToLLVMIRTranslation step of the OpenMP dialects
lowering to LLVM-IR. In particular for Flang which currently uses this
flow. Types with descriptors like pointers/allocatables, and likely
derived types in certain cases can be lowered as if they were structures
with explicitly mapped members.
2023-10-03 15:20:44 +02:00
Matheus Izvekov
9c89b29555
-fsanitize=function: fix MSVC hashing to sugared type (#66816)
Hashing the sugared type instead of the canonical type meant that
a simple example like this would always fail under MSVC:

```
static auto l() {}
int main() {
  auto a = l;
  a();
}
```
`clang --target=x86_64-pc-windows-msvc -fno-exceptions
-fsanitize=function -g -O0 -fuse-ld=lld -o test.exe test.cc`

produces:
```
test.cc:4:3: runtime error: call to function l through pointer to incorrect function type 'void (*)()'
```
2023-10-02 19:09:39 +02:00
Corentin Jabot
af4751738d [C++] Implement "Deducing this" (P0847R7)
This patch implements P0847R7 (partially),
CWG2561 and CWG2653.

Reviewed By: aaron.ballman, #clang-language-wg

Differential Revision: https://reviews.llvm.org/D140828
2023-10-02 14:33:02 +02:00
Jan Svoboda
8a2fb1391b
[clang] NFCI: Use FileEntryRef in SourceManager::FileInfos (#67742) 2023-09-29 08:04:34 -07:00
Sergio Afonso
9058762789
[OpenMP][Flang][MLIR] Lowering of requires directive from MLIR to LLVM IR
Default atomic ordering information is processed in the OpenMP dialect
to LLVM IR lowering stage at every spot where an operation can be
affected by it. The rest of clauses are stored globally in the
OpenMPIRBuilderConfig object before starting that lowering stage, so
that the OMPIRBuilder can conditionally modify code generation
depending on these. At the end of the process, the omp.requires
attribute is itself lowered into a global constructor that passes these
clauses as flags to the OpenMP runtime.

Depends on D147217, D147218 and D158278.

Differential Revision: https://reviews.llvm.org/D147219
2023-09-14 10:35:44 +01:00
Sergio Afonso
094a63a20b
[OpenMP][OMPIRBuilder] OpenMPIRBuilder support for requires directive
This patch updates the `OpenMPIRBuilderConfig` structure to hold all
available 'requires' clauses, and it replicates part of the code
generation for the 'requires' registration function from clang in the
`OMPIRBuilder`, to be used with flang.

Porting the rest of features of the clang implementation to the IRBuilder
and sharing it between clang and flang remains for a future patch, due to the
complexity of the logic selecting the attributes of the generated
registration function.

Differential Revision: https://reviews.llvm.org/D147217
2023-09-14 10:33:54 +01:00
Chris Bieneman
400d3261a0 [HLSL] Cleanup support for this as an l-value
The goal of this change is to clean up some of the code surrounding
HLSL using CXXThisExpr as a non-pointer l-value. This change cleans up
a bunch of assumptions and inconsistencies around how the type of
`this` is handled through the AST and code generation.

This change is be mostly NFC for HLSL, and completely NFC for other
language modes.

This change introduces a new member to query for the this object's type
and seeks to clarify the normal usages of the this type.

With the introudction of HLSL to clang, CXXThisExpr may now be an
l-value and behave like a reference type rather than C++'s normal
method of it being an r-value of pointer type.

With this change there are now three ways in which a caller might need
to query the type of `this`:

* The type of the `CXXThisExpr`
* The type of the object `this` referrs to
* The type of the implicit (or explicit) `this` argument

This change codifies those three ways you may need to query
respectively as:

* CXXMethodDecl::getThisType()
* CXXMethodDecl::getThisObjectType()
* CXXMethodDecl::getThisArgType()

This change then revisits all uses of `getThisType()`, and in cases
where the only use was to resolve the pointee type, it replaces the
call with `getThisObjectType()`. In other cases it evaluates whether
the desired returned type is the type of the `this` expr, or the type
of the `this` function argument. The `this` expr type is used for
creating additional expr AST nodes and for member lookup, while the
argument type is used mostly for code generation.

Additionally some cases that used `getThisType` in simple queries could
be substituted for `getThisObjectType`. Since `getThisType` is
implemented in terms of `getThisObjectType` calling the later should be
more efficient if the former isn't needed.

Reviewed By: aaron.ballman, bogner

Differential Revision: https://reviews.llvm.org/D159247
2023-09-05 19:38:50 -05:00
Takuya Shimizu
01b88dd66d [NFC] Remove unused variables declared in conditions
D152495 makes clang warn on unused variables that are declared in conditions like `if (int var = init) {}`
This patch is an NFC fix to suppress the new warning in llvm,clang,lld builds to pass CI in the above patch.

Differential Revision: https://reviews.llvm.org/D158016
2023-08-30 10:05:06 +09:00
Kazu Hirata
24fcc0a7c5 [clang] Remove redundant control flow statements (NFC) 2023-08-27 12:13:25 -07:00
Sandeep Kosuri
08bbff4aad [OpenMP] Codegen support for thread_limit on target directive for host
offloading

- This patch adds support for thread_limit clause on target directive according to OpenMP 51 [2.14.5]
- The idea is to create an outer task for target region, when there is a thread_limit clause, and manipulate the thread_limit of task instead. This way, thread_limit will be applied to all the relevant constructs enclosed by the target region.

Differential Revision: https://reviews.llvm.org/D152054
2023-08-26 22:18:49 -05:00
Joseph Huber
9da61aed75 [OpenMP] Emit offloading entries for indirect target variables
OpenMP 5.1 allows emission of the `indirect` clause on declare target
functions, see https://www.openmp.org/spec-html/5.1/openmpsu70.html#x98-1080002.14.7.
The intended use of this is to permit calling device functions via their
associated host pointer. In order to do this the first step will be
building a map associating these variables. Doing this will require the
same offloading entry handling we use for other kernels and globals.

We intentionally emit a new global on the device side. Although it's
possible to look up the device function's address directly, this would
require changing the visibility and would prevent us from making static
functions indirect. Also, the CUDA toolchain will optimize out unused
functions and using a global prevents that. The downside is that the
runtime will need to read the global and copy its value, but there
shouldn't be any other costs.

Note that this patch just performs the codegen, currently this new
offloading entry type is unused and will be ignored by the runtime.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D157738
2023-08-24 18:21:13 -05:00
Johannes Doerfert
c5488c8dcc [OpenMP] Properly set static thread limit (w/o analysis)
We used to have two separate implementations to derive the number of
threads used in a target region. This lead us to sometimes miss out on
user provided thread bounds (num_threads, or thread_limit) when we
looked for "constant default values". If we might miss out on the
presence of those bounds, we cannot set the thread_limit statically
since the runtime will try to honor user input rather than cap it at the
"preferred default". This patch replaces the secondary implementation
with the primary in a mode that will not emit code but just look for the
presence, and potentially upper bounds, of thread limiting clauses.

The runtime test would not pass without this rewrite as we missed some
clauses, set the static limit on the device to the preferred value, but
then violated that value at runtime.

Fixes: https://github.com/llvm/llvm-project/issues/64845

Differential Revision: https://reviews.llvm.org/D158381
2023-08-23 11:12:03 -07:00
Jonas Hahnfeld
2f3fe3ed97 [CodeGen] Remove Constant arguments from linkage functions, NFCI.
This was unused since commit dd2362a8ba last year.

Differential Revision: https://reviews.llvm.org/D156891
2023-08-17 08:28:51 +02:00
Akash Banerjee
5d9ccd7a96 [OpenMP] Migrate dispatch related utility functions from Clang codegen to OMPIRBuilder
Migrate createForStaticInitFunction, createDispatchInitFunction, createDispatchNextFunction and createDispatchFiniFunction from Clang CodeGen to OMPIRBuilder.

Differential Revision: https://reviews.llvm.org/D157994
2023-08-16 16:35:28 +01:00
Hao Jin
c384e79675 [OpenMP][IR] Set correct alignment for internal variables
OpenMP runtime functions assume the pointers are aligned to sizeof(pointer),
but it is being aligned incorrectly. Fix with the proper alignment in the IR builder.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D157040
2023-08-11 15:20:15 -04:00
Joseph Huber
141c4e7a94 [OpenMP] Do not always emit unused extern variables
Currently, the precense of the OpenMP target declare metadata requires
that we always codegen a global declaration. This is undesirable in the
case that we could defer or omit this declaration as is common with
unused extern variables. This is important as it allows us, in the
runtime, to rely on static linking semantics to omit unused symbols so
they are not included when the user links it in.

This patch changes the check for always emitting these variables.
Because of this we also need to extend this logic to the generation of
the offloading entries. This has the result of derring the offload entry
generation to the canonical definitoin. So we are effectively assuming
whoever owns the storage for this variable will perform that operation.
This makes an exception for `link` attributes as those require their own
special handling.

Let me know if this is sound in the implementation, I do not have the
largest view of the standards here.

Fixes: https://github.com/llvm/llvm-project/issues/64133

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D156368
2023-07-28 11:52:05 -05:00
Johannes Doerfert
08a220764b Reapply "[OpenMP] Add the ompx_attribute clause for target directives"
This reverts commit 0d12683046ca75fb08e285f4622f2af5c82609dc and
reapplies ef9ec4bbcca2fa4f64df47bc426f1d1c59ea47e2 with an extension to
fix the Flang build.

Differential Revision: https://reviews.llvm.org/D156184
2023-07-25 10:40:35 -07:00
Aaron Ballman
0d12683046 Revert "[OpenMP] Add the ompx_attribute clause for target directives"
This reverts commit ef9ec4bbcca2fa4f64df47bc426f1d1c59ea47e2.

The changes broke several bots:
https://lab.llvm.org/buildbot/#/builders/176/builds/3408
https://lab.llvm.org/buildbot/#/builders/198/builds/4028
https://lab.llvm.org/buildbot/#/builders/197/builds/8491
https://lab.llvm.org/buildbot/#/builders/197/builds/8491
2023-07-25 07:57:36 -04:00
Johannes Doerfert
ef9ec4bbcc [OpenMP] Add the ompx_attribute clause for target directives
CUDA and HIP have kernel attributes to tune the code generation (in the
backend). To reuse this functionality for OpenMP target regions we
introduce the `ompx_attribute` clause that takes these kernel
attributes and emits code as if they had been attached to the kernel
fuction (which is implicitly generated).

To limit the impact, we only support three kernel attributes:
`amdgpu_waves_per_eu`, for AMDGPU
`amdgpu_flat_work_group_size`, for AMDGPU
`launch_bounds`, for NVPTX

The existing implementations of those attributes are used for error
checking and code generation. `ompx_attribute` can be attached to any
executable target region and it can hold more than one kernel attribute.

Differential Revision: https://reviews.llvm.org/D156184
2023-07-24 22:04:45 -07:00
Akash Banerjee
227012cbd7 [OpenMP] Migrate device code privatisation from Clang CodeGen to OMPIRBuilder
This patch migrates the UseDevicePtr and UseDeviceAddr clause related code for handling privatisation from Clang codegen to the OMPIRBuilder

Depends on D150860

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D152554
2023-07-12 12:03:28 +01:00