With the addition of `__attribute__((visibility("hidden")))` to the test, the test fails because AIX's current default behaviour is to ignore hidden visibility, so the expected error is not seen. This patch marks the test `XFAIL` on AIX for now.
Reviewed By: cebowleratibm
Differential Revision: https://reviews.llvm.org/D122519
This patch adds the necessary AMDGPU calling convention to the ctor /
dtor kernels. These are fundamentally device kenels called by the host
on image load. Without this calling convention information the AMDGPU
plugin is unable to identify them.
Depends on D122504
Fixes#54091
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D122515
The default construction of constructor functions by LLVM tends to make
them have internal linkage. When we call a ctor / dtor function in the
target region we are actually creating a kernel that is called at
registration. Because the ctor is a kernel we need to make sure it's
externally visible so we can actually call it. This prevented AMDGPU
from correctly using constructors while NVPTX could use them simply
because it ignored internal visibility.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D122504
Currently the device kernels all have weak linkage to prevent linkage
errors on multiple defintions. However, this prevents some optimizations
from adequately analyzing them because of the nature of weak linkage.
This patch replaces the weak linkage with weak_odr linkage so we can
statically assert that multiple declarations of the same kernel will
have the same definition.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D122443
A previous patch removed the compiler generating offloading entries
for variables that were declared on the device but were internal or
hidden. This allowed us to compile programs but turns any attempt to run
'#pragma omp target update' on one of those variables a silent failure.
This patch adds a check in the semantic analysis for if the user is
attempting the update a variable on the device from the host that is not
externally visible.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D122403
Current clang generates extra set of simd variant function attribute
with extra 'v' encoding.
For example:
_ZGVbN2v__Z5add_1Pf vs _ZGVbN2vv__Z5add_1Pf
The problem is due to declaration of ParamAttrs following:
llvm::SmallVector<ParamAttrTy, 8> ParamAttrs(ParamPositions.size());
where ParamPositions.size() is grown after following assignment:
Pos = ParamPositions[PVD];
So the PVD is not find in ParamPositions.
The problem is ParamPositions need to set for each FD decl. To fix this
Move ParamPositions's init inside while loop for each FD.
Differential Revision: https://reviews.llvm.org/D122338
Adds basic parsing/sema/serialization support for the
#pragma omp target parallel loop directive.
Differential Revision: https://reviews.llvm.org/D122359
Currently we create offloading entries to register device variables with
the host. When we register a variable we will look up the symbol in the
device image and map the device address to the host address. This is a
problem when the symbol is declared with hidden visibility or internal
linkage. This means the symbol is not accessible externally and we
cannot get its address. We should still allow static variables to be
declared on the device, but ew should not create an offloading entry for
them so they exist independently on the host and device.
Fixes#54309
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D122352
Previously, OpenMP variant declarations for a function declaration that included
the 'cpu_dispatch', 'cpu_specific', or 'target' attributes was diagnosed, but
one with the 'target_clones' attribute was not. Now fixed.
Reviewed By: erichkeane, jdoerfert
Differential Revision: https://reviews.llvm.org/D121963
This change extends the existing diagnostic tests for OpenMP variant
declarations to cover diagnostics for declarations that include
multiversion function attributes. The new tests demonstrate a missing
check for the 'target_clones' attribute.
Reviewed By: erichkeane, jdoerfert
Differential Revision: https://reviews.llvm.org/D121962
The EmitLoadOfPointer() call already specified the right pointer
type, but it did not match the Address we're loading from, so we
need to insert a bitcast first.
Rather than using a dummy void pointer type, we should specify the
correct private type and perform the bitcast beforehand rather than
afterwards. This way, the Address will have correct alignment
information.
The ppc64be bot emits the dtor metadata first for some reason. We should
investigate this or make the _cc_ update script able to use variables
instead of fixed numbers (e.g., !1). The IR update script does that
already.
Rather than specifying a dummy type in EmitLoadOfPointer() and
then casting it to the correct one, we should instead specify the
correct type and cast beforehand. Otherwise the computed alignment
will be incorrect.
This reverts commit a597d6a780b184539f504392168b004bf392a135 and
reapplies 07b176646134.
In AMD GPU device code the globals are in AS(1). Before, we crashed if
the global was a structure. Now we simply cast away the AS before we
generate the code to initialize the global.
Differential Revision: https://reviews.llvm.org/D121837
Fixes: https://github.com/llvm/llvm-project/issues/54421
In AMD GPU device code the globals are in AS(1). Before, we crashed if
the global was a structure. Now we simply cast away the AS before we
generate the code to initialize the global.
Differential Revision: https://reviews.llvm.org/D121837
The paramemter of hint clause in OpenMP critical hint should be
non-negative. The omp_lock_hint_none is 0 in omp.h.
Reviewed By: Alexey Bataev
Differential Revision: https://reviews.llvm.org/D121101
The OpenMPIRBuilder has a bug. Specifically, suppose you have two nested openmp parallel regions (writing with MLIR for ease)
```
omp.parallel {
%a = ...
omp.parallel {
use(%a)
}
}
```
As OpenMP only permits pointer-like inputs, the builder will wrap all of the inputs into a stack allocation, and then pass this
allocation to the inner parallel. For example, we would want to get something like the following:
```
omp.parallel {
%a = ...
%tmp = alloc
store %tmp[] = %a
kmpc_fork(outlined, %tmp)
}
```
However, in practice, this is not what currently occurs in the context of nested parallel regions. Specifically to the OpenMPIRBuilder,
the entirety of the function (at the LLVM level) is currently inlined with blocks marking the corresponding start and end of each
region.
```
entry:
...
parallel1:
%a = ...
...
parallel2:
use(%a)
...
endparallel2:
...
endparallel1:
...
```
When the allocation is inserted, it presently inserted into the parent of the entire function (e.g. entry) rather than the parent
allocation scope to the function being outlined. If we were outlining parallel2, the corresponding alloca location would be parallel1.
This causes a variety of bugs, including https://github.com/llvm/llvm-project/issues/54165 as one example.
This PR allows the stack allocation to be created at the correct allocation block, and thus remedies such issues.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D121061
Add applyStaticChunkedWorkshareLoop method implementing static schedule when chunk-size is specified. Unlike a static schedule without chunk-size (where chunk-size is chosen by the runtime such that each thread receives one chunk), we need two nested loops: one for looping over the iterations of a chunk, and a second for looping over all chunks assigned to the threads.
This patch includes the following related changes:
* Adapt applyWorkshareLoop to triage between the schedule types, now possible since all schedules have been implemented. The default schedule is assumed to be non-chunked static, as without OpenMPIRBuilder.
* Remove the chunk parameter from applyStaticWorkshareLoop, it is ignored by the runtime. Change the value for the value passed to the init function to 0, as without OpenMPIRBuilder.
* Refactor CanonicalLoopInfo::setTripCount and CanonicalLoopInfo::mapIndVar as used by both, applyStaticWorkshareLoop and applyStaticChunkedWorkshareLoop.
* Enable Clang to use the OpenMPIRBuilder in the presence of the schedule clause.
Differential Revision: https://reviews.llvm.org/D114413
Changed the we handle llvm::Constants in sizes arrays. ConstExprs and
GlobalValues cannot be used as initializers, need to put them at the
runtime, otherwise there wight be the compilation errors.
Differential Revision: https://reviews.llvm.org/D105297
This patch adds -Wno-strict-prototypes to all of the test cases that
use functions without prototypes, but not as the primary concern of the
test. e.g., attributes testing whether they can/cannot be applied to a
function without a prototype, etc.
This is done in preparation for enabling -Wstrict-prototypes by
default.
Changed the we handle llvm::Constants in sizes arrays. ConstExprs and
GlobalValues cannot be used as initializers, need to put them at the
runtime, otherwise there wight be the compilation errors.
Differential Revision: https://reviews.llvm.org/D105297
Currently when we generate OpenMP offloading code we always make
fallback code for the CPU. This is necessary for implementing features
like conditional offloading and ensuring that unhandled pragmas don't
result in missing symbols. However, this is problematic for a few cases.
For offloading tests we can silently fail to the host without realizing
that offloading failed. Additionally, this makes it impossible to
provide interoperabiility to other offloading schemes like HIP or CUDA
because those methods do not provide any such host fallback guaruntee.
this patch adds the `-fopenmp-offload-mandatory` flag to prevent
generating the fallback symbol on the CPU and instead replaces the
function with a dummy global and the failed branch with 'unreachable'.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D120353
We need to capture the local variables into a record in task untied
regions but clang does not support record with VLA data members.
Differential Revision: https://reviews.llvm.org/D99436
In `clang/test/OpenMP/atomic_ast_print.cpp` for `atomic compare capture`,
it was using 'cond-expr-stmt' instead of 'cond-update-stmt'. The spec only supports
'cond-update-stmt'.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D120252
In 'cond-update-stmt', `else` statement is not expected. This patch adds
the check in Sema.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D120225
This patch adds the support for `atomic compare capture` in parser and part of
sema. We don't create an AST node for this because the spec doesn't say `compare`
and `capture` clauses should be used tightly, so we cannot look one more token
ahead in the parser.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D116261
The runtime uses thread state values to indicate when we use an ICV or
are in nested parallelism. This is done for OpenMP correctness, but it
not needed in the majority of cases. The new flag added is
`-fopenmp-assume-no-thread-state`.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D120106
This flag was previously renamed `enable_noundef_analysis` to
`disable-noundef-analysis,` which is not a conventional name. (Driver and
CC1's boolean options are using [no-] prefix)
As discussed at https://reviews.llvm.org/D105169, this patch reverts its
name to `[no-]enable_noundef_analysis` and enables noundef-analysis as
default.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D119998
Previously, we would take a declaration like void f(void) and print it
as void f(). That's correct in C++ as far as it goes, but is incorrect
in C because that converts the function from having a prototype to one
which does not.
This turns out to matter for some of our tests that use the pretty
printer where we'd like to get rid of the K&R prototypes from the test
but can't because the test is checking the pretty printed function
signature, as done with the ARCMT tests.
When an a variant is specified that is the same as the base function
the compiler will end up crashing in CodeGen. Give an error instead.
Differential Revision: https://reviews.llvm.org/D119979