This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};
We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
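For illustration, the replacement is of this shape (the element type here is
just an example):
```cpp
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallSet.h"
#include "llvm/IR/BasicBlock.h"

// Before: compiles only because SmallSet redirects to SmallPtrSet for
// pointer element types.
llvm::SmallSet<llvm::BasicBlock *, 8> VisitedViaRedirect;

// After: names the container that is actually used.
llvm::SmallPtrSet<llvm::BasicBlock *, 8> Visited;
```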
Fixes #153043.
This is another case of the debug location not getting updated when the
insert point is changed by `restoreIP`. Fixed by using the wrapper
function that updates the debug location.
A function added in PR #151306 was under the NDEBUG macro, which caused
the build to fail in certain cases. It has been moved out of the #ifdef
check to ensure it is always compiled.
This fixes #147063.
I tried to fix this issue in a more general way in
https://github.com/llvm/llvm-project/pull/147091, but the reviewer
suggested fixing the locations that are causing this issue, so this is a
more targeted approach.
`restoreIP` is frequently used in the `OMPIRBuilder` to change the
insert position. This function eventually calls
`SetInsertPoint(BasicBlock *TheBB, BasicBlock::iterator IP)`, which
updates both the insert point and the debug location. But if `IP` points
to the end of `TheBB`, the debug location is not updated and we can end
up with a mismatch between the insert point and the debug location.
The problem can occur in 2 different code patterns.
This code below shows the first scenario.
```
1. auto curPos = builder.saveIP();
2. builder.restoreIP(/* some new pos */);
3. // generate some code
4. builder.restoreIP(curPos);
```
If `curPos` points to the end of a basic block, we could have a problem.
But it is an easy one to handle, as we have the location beforehand and
can save the correct debug location before 2 and then restore it after 3.
This can be done either manually or using the `llvm::InsertPointGuard`
as shown below.
```
// manual approach
auto curPos = builder.saveIP();
llvm::DebugLoc DbgLoc = builder.getCurrentDebugLocation();
builder.restoreIP(/* some new pos */);
// generate some code
builder.SetCurrentDebugLocation(DbgLoc);
builder.restoreIP(curPos);

{
  // using InsertPointGuard
  llvm::InsertPointGuard IPG(builder);
  builder.restoreIP(/* some new pos */);
  // generate some code
}
```
This PR fixes one problematic case using the manual approach.
For the 2nd scenario, look at the code below.
```
1. void fn(InsertPointTy allocIP, InsertPointTy codegenIP) {
2. builder.setInsertPoint(allocIP);
3. // generate some alloca
4. builder.setInsertPoint(codegenIP);
5. }
```
The `fn` can be called from anywhere and we can't assume that the debug
location of the builder is valid at the start of the function. So if 4
does not update the debug location because `codegenIP` points at the end
of the block, the rest of the code can end up using the debug location
of the `allocIP`. Unlike the first case, we don't have a debug location
that we can save beforehand and restore afterwards.
The solution here is to use the location of the last instruction in that
block. I have added a wrapper function over `restoreIP` that can be
called in such cases. This PR uses it to fix one problematic case.
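A minimal sketch of what such a wrapper can look like (the name and exact
logic here are illustrative of the idea, not the API added by this PR):
```cpp
#include "llvm/IR/IRBuilder.h"

// Restore the insertion point and, when IP is at the end of a non-empty block,
// take the debug location from the block's last instruction so that the two
// stay consistent.
static void restoreIPWithDebugLoc(llvm::IRBuilderBase &Builder,
                                  llvm::IRBuilderBase::InsertPoint IP) {
  Builder.restoreIP(IP);
  llvm::BasicBlock *BB = IP.getBlock();
  if (BB && IP.getPoint() == BB->end() && !BB->empty())
    Builder.SetCurrentDebugLocation(BB->back().getDebugLoc());
}
```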
Scan reductions are supported in OpenMP with the help of the `scan`
directive. The reduction clause of the for loop/simd directive can take
an `inscan` modifier, with the body of the directive containing a `scan`
directive. This PR implements the lowering logic for scan reductions in
OpenMP worksharing loops.
The body of the loop is split into two loops (the input phase loop and
the scan phase loop) and a scan reduction loop is added in the middle.
The input phase loop populates a temporary buffer with the initial
values that are to be reduced. The buffer is used by the reduction loop
to perform the scan reduction. The scan phase loop copies the values of
the buffer to the reduction variable before executing the scan phase.
Below is a high-level view of the code generated.
```
<declare pointer to buffer> ptr
omp parallel {
  size num_iters = <num_iters>
  // temp buffer allocation
  omp masked {
    buff = malloc(num_iters * scanvarstype)
    *ptr = buff
  }
  barrier;
  // input phase loop
  for (i: 0..<num_iters>) {
    <input phase>;
    buffer = *ptr;
    buffer[i] = red;
  }
  // scan reduction
  omp masked {
    for (int k = 0; k != ceil(log2(num_iters)); ++k) {
      i = pow(2, k)
      for (size cnt = last_iter; cnt >= i; --cnt) {
        buffer = *ptr;
        buffer[cnt] op= buffer[cnt-i];
      }
    }
  }
  barrier;
  // scan phase loop
  for (i: 0..<num_iters>) {
    buffer = *ptr;
    red = buffer[i];
    <scan phase>;
  }
  // temp buffer deletion
  omp masked {
    free(*ptr)
  }
  barrier;
}
```
The temporary buffer needs to be shared between all threads performing
the reduction since it is read/written in the input and scan phase
worksharing loops. This is achieved by declaring a pointer to the buffer
in the shared region and dynamically allocating the buffer from the
master thread. This is why allocation, deallocation and scan reduction
are performed within `masked`. The code is verified to produce correct
results for Fortran programs with the code changes in
https://github.com/llvm/llvm-project/pull/133149
This adds MLIR-to-LLVM support for atomic control options.
Atomic control options are used to specify architectural characteristics
that help the lowering of atomic operations. The options are:
`-f[no-]atomic-remote-memory`, `-f[no-]atomic-fine-grained-memory`, and
`-f[no-]atomic-ignore-denormal-mode`.
The legacy option `-m[no-]unsafe-fp-atomics` is aliased to
`-f[no-]atomic-ignore-denormal-mode`.
More details can be found in
https://github.com/llvm/llvm-project/pull/102569. This PR implements the
MLIR to LLVM lowering support of atomic control attributes specified
with OpenMP `atomicUpdateOp`.
Initial support can be found in PR:
https://github.com/llvm/llvm-project/pull/150860
`LocationDescription` contains both the insertion point and the debug
location. When a `LocationDescription` is available, it is better to use
`updateToLocation`, which will update both. This PR replaces
`restoreIP(Loc.IP)` with `updateToLocation(Loc)` as the former may not
update the debug location in all cases.
I am not checking the return value of `updateToLocation` because it is
checked just a few lines above in all cases and we would have returned
early if it had failed.
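A sketch of the shape of the change at an illustrative call site:
```cpp
// Before: restores only the insertion point; the builder's debug location can
// become stale relative to the new block.
Builder.restoreIP(Loc.IP);

// After: updates both the insertion point and the debug location. The return
// value is already checked a few lines earlier on these paths.
updateToLocation(Loc);
```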
Reverts llvm/llvm-project#147950
I noticed some failures in the reduction tests with clang after this
change. I need to understand the failures better, so I am reverting this
for now.
Reduction support: https://github.com/llvm/llvm-project/pull/146671
IF support is fixed in this PR
The problem with the IF clause in composite constructs was that wsloop
and simd both operate on the same CanonicalLoopInfo structure, with the
SIMD processed first, followed by the wsloop. Previously the IF clause
generated code like
```
if (cond) {
  while (...) {
    simd_loop_body;
  }
} else {
  while (...) {
    nonsimd_loop_body;
  }
}
```
The problem is that this invalidates the CanonicalLoopInfo structure
that is to be processed by the wsloop later. To avoid this, in this
patch I preserve the original loop, moving the IF clause inside of the
loop:
```
while (...) {
  if (cond) {
    simd_loop_body;
  } else {
    non_simd_loop_body;
  }
}
```
On simple examples I tried, LLVM was able to hoist the if condition out
of the loop at -O3.
The disadvantage of this is that we cannot add the
llvm.loop.vectorize.enable attribute on either the SIMD or non-SIMD
loops because they both share a loop back edge. There's no way of
solving this without keeping the old design of having two different
loops, which cannot be represented using only one CanonicalLoopInfo
structure. I don't think the presence or absence of this attribute makes
much difference; in my testing it is the llvm.loop.parallel_accesses
metadata which makes the difference to vectorization. LLVM will
vectorize if legal whether or not this attribute is there in the TRUE
branch. In the FALSE branch this means the loop might be vectorized even
when the condition is false, but I think this is still standards
compliant: OpenMP 6.0 says that when the IF clause is false it should be
treated as if the SIMDLEN clause were one, and the SIMDLEN clause is
defined as a "hint". For the same reason, the SIMDLEN and SAFELEN
clauses are silently ignored when SIMD IF is used.
I think it is better to implement SIMD IF and ignore SIMDLEN and SAFELEN
and some vectorization-encouragement metadata when combined with IF than
to ignore IF, because IF could have correctness consequences whereas the
rest are optimization hints. For example, the user might use the IF
clause to disable SIMD programmatically when it is known not to be safe
to vectorize the loop. In that case it is not at all safe to add the
parallel access or SAFELEN metadata.
We have this pattern of code in OMPIRBuilder for many functions that are
used in reduction operations.
```
Function *LtGRFunc = Function::Create
BasicBlock *EntryBlock = BasicBlock::Create(Ctx, "entry", LtGRFunc);
Builder.SetInsertPoint(EntryBlock);
```
The insertion point is moved to the new function but the debug location
is not updated. This means that the reduction function will use a debug
location that points to another function. The problem gets hidden
because these functions get inlined, but the potential for failure
exists.
This patch resets the debug location when the insertion point is moved
to a new function. Some `InsertPointGuard`s have been added to make sure
we restore the debug location correctly when we are done with the
reduction function.
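A minimal sketch of the fixed shape (assuming `Builder`, `FuncTy`, `M`, and
`Ctx` are in scope; the function and block names are illustrative):
```cpp
{
  // Saves and later restores both the insertion point and the debug location.
  llvm::IRBuilderBase::InsertPointGuard IPG(Builder);

  llvm::Function *LtGRFunc = llvm::Function::Create(
      FuncTy, llvm::GlobalValue::InternalLinkage, ".omp.reduction.func", &M);
  llvm::BasicBlock *EntryBlock =
      llvm::BasicBlock::Create(Ctx, "entry", LtGRFunc);
  Builder.SetInsertPoint(EntryBlock);
  // The new function has no meaningful location yet; clear the stale one so we
  // don't emit instructions whose !dbg points at the caller's subprogram.
  Builder.SetCurrentDebugLocation(llvm::DebugLoc());
  // ... emit the body of the reduction helper ...
} // The previous insertion point and debug location are restored here.
```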
Implement a state machine that consumes tokens (words delimited by
whitespace) and returns the corresponding directive id, or fails if the
tokens do not form a valid name.
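Roughly, the matcher has this shape (a hand-written sketch, not the
TableGen-generated code; the table contents and the `Directive` enum below are
stand-ins):
```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/StringRef.h"
#include <optional>
#include <vector>

enum class Directive { TargetTeamsDistribute /*, ... */ };

// Transitions[state] maps the next token to the next state; Accept maps an
// accepting state to the directive id it spells.
static std::vector<llvm::StringMap<int>> Transitions;
static llvm::DenseMap<int, Directive> Accept;

static std::optional<Directive>
matchDirective(llvm::ArrayRef<llvm::StringRef> Tokens) {
  int State = 0; // start state
  for (llvm::StringRef Tok : Tokens) {
    auto It = Transitions[State].find(Tok);
    if (It == Transitions[State].end())
      return std::nullopt; // no transition: tokens do not form a valid name
    State = It->second;
  }
  auto Acc = Accept.find(State);
  if (Acc == Accept.end())
    return std::nullopt; // stopped in a non-accepting state
  return Acc->second;
}
```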
This is a combination of https://github.com/llvm/llvm-project/pull/138149
and https://github.com/llvm/llvm-project/pull/138039, which were opened
separately for ease of reviewing. The only other change is adjustments
to 2 tests which have gone in since.
There are `DeclareOp`s present for the variables mapped into the target
region, which allow us to generate debug information for them. But the
`TargetOp` is still part of the parent function, so those variables get
the parent function's `DISubprogram` as their scope.
In `OMPIRBuilder`, a new function is created for the `TargetOp`. We also
create a new `DISubprogram` for it. All the variables that were in the
target region now have to be updated to have the correct scope. This
after-the-fact updating of debug information becomes very difficult in
certain cases. Take the example of variable-sized arrays: the types of
those arrays depend on the artificial `DILocalVariable`(s) which hold
the size(s) of the array. The new function would now require that we
generate new variables and new types. A similar issue exists for
character-type variables.
To avoid this after-the-fact updating, this PR generates a
`DISubprogramAttr` for the `TargetOp` while generating the debug info in
`flang`. Then we don't need to generate a `DISubprogram` in
`OMPIRBuilder`. This change is made a bit more complicated by the fact
that, in the new scheme, the debug location already points to the new
`DISubprogram` by the time it reaches `convertOmpTarget`. But we need
some code generation in the parent function, so we have to carefully
manage the debug locations.
This fixes issue #134991.
The code in `OpenMPIRBuilder::getTargetEntryUniqueInfo` calls
`ID.getDevice()` even when `getUniqueID` has failed and `ID` is
uninitialized. This caused a sanitizer failure for me in
https://github.com/llvm/llvm-project/pull/145026. Fix it by giving a
default value to `ID`. The value chosen is the same as used in
`OpenMPToLLVMIRTranslation.cpp`.
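A sketch of the shape of the fix (`FileName` and the placeholder default are
assumptions; the actual default matches `OpenMPToLLVMIRTranslation.cpp`):
```cpp
llvm::sys::fs::UniqueID ID(/*Device=*/0, /*File=*/0); // placeholder default
if (llvm::sys::fs::getUniqueID(FileName, ID)) {
  // getUniqueID failed: ID keeps its default value instead of being read
  // uninitialized by the ID.getDevice() call that follows.
}
```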
CodeExtractor can currently erroneously insert an alloca into a
different function than the one it inserts the alloca's users into, in
cases where code is being extracted out of a function that has already
been outlined. Add an assertion that the two blocks being inserted into
are actually in the same function.
Add a check to `findAllocaInsertPoint` in the OpenMP to LLVMIR
translation to prevent the aforementioned scenario from happening.
OpenMPIRBuilder relies on a callback mechanism to fix up a module later
on during the finalise step. In some cases this results in the module
being invalid prior to the finalise step running. Remove calls to
verifyModule wrapped in LLVM_DEBUG from CodeExtractor, as their presence
results in the compiler crashing with `-mllvm -debug` due to premature
module verification, where it would not crash without `-debug`.
Call `ompBuilder->finalize()` at the end of
`mlir::translateModuleToLLVMIR` to make sure the module has actually
been finalized prior to trying to verify it.
Resolves https://github.com/llvm/llvm-project/issues/138102.
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
Without this, gcc warned like:
/repo/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp:7559:68: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
7559 | NumStaleCIArgs == (OffloadingArraysToPrivatize.size() + 2) &&
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
7560 | "Wrong number of arguments for StaleCI when shareds are present");
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
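The warning points at operator precedence inside an assertion; the fix is to
parenthesize the `&&` explicitly. A sketch of the shape (the `assert` and its
left-hand `!HasShareds` operand are assumptions reconstructed from the
diagnostic):
```cpp
assert(!HasShareds ||
       (NumStaleCIArgs == (OffloadingArraysToPrivatize.size() + 2) &&
        "Wrong number of arguments for StaleCI when shareds are present"));
```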
In OMPIRBuilder, we have many cases where we don't handle the debug
location correctly while changing the location or insertion point. This
is one of those cases.
Please see the following test program.
```
program main
implicit none
integer i, j
integer array(16384)
!$omp target teams distribute
DO i=1,16384
!$omp parallel do
DO j=1,16384
array(j) = i
ENDDO
!$omp end parallel do
ENDDO
!$omp end target teams distribute
print *, array
end program main
```
When compiled with the command
`flang -g -O2 -fopenmp test.f90 -o test --offload-arch=gfx90a`, it fails
verification with the following error: `!dbg attachment points at wrong
subprogram for function`.
This happens because we were dropping the debug location in
`createCanonicalLoop`, so calls to functions like
`__kmpc_distribute_static_4u` get generated without a debug location.
When such a function gets inlined, the locations inside it are not
adjusted because the call instruction does not have a debug location
(`llvm/lib/Transforms/Utils/InlineFunction.cpp:fixupLineNumbers`).
Later, the verifier finds that the caller has instructions with debug
locations that point to another function, and fails.
The fix is simply to not drop the debug location.
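A sketch of the shape of the fix (block and builder names are illustrative):
```cpp
// Preserve the current debug location across the insert-point change instead
// of letting it be dropped, so that runtime calls such as
// __kmpc_distribute_static_4u are emitted with a !dbg attachment.
llvm::DebugLoc DL = Builder.getCurrentDebugLocation();
Builder.SetInsertPoint(LoopBodyBB);
Builder.SetCurrentDebugLocation(DL);
```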
When we offload to the target, the pointers to data used by the kernel
are passed in arrays created by `OMPIRBuilder`. These arrays of pointers
are allocated on the stack on the host. This is fine for the most part
because, absent the `nowait` clause, the default behavior is that target
tasks are included tasks. That is, the host is blocked until the
offloaded target kernel is done. In turn, this means that the host's
stack frame is intact and accessing the array of pointers when
offloading is safe. However, when `nowait` is used on the `!$omp target`
construct, the target task is a deferred task, meaning the generating
task on the host does not have to wait for the target task to finish. In
such cases, it is very likely that the stack frame of the function
invoking the target call has been unwound, leading to memory access
errors as shown below.
```
AMDGPU error: Error in hsa_amd_memory_pool_allocate: HSA_STATUS_ERROR_INVALID_ALLOCATION: The requested allocation is not valid.
AMDGPU error: Error in hsa_amd_memory_pool_allocate: HSA_STATUS_ERROR_INVALID_ALLOCATION: The requested allocation is not valid. "PluginInterface" error: Failure to allocate device memory: Failed to allocate from memory manager
fort.cod.out: /llvm/llvm-project/offload/plugins-nextgen/common/src/PluginInterface.cpp:1434: Error llvm::omp::target::plugin::PinnedAllocationMapTy::lockMappedHostBuffer(void *, size_t): Assertion `HstPtr && "Invalid pointer"' failed.
Aborted (core dumped)
```
This PR implements support in `OMPIRBuilder` to store these arrays of
pointers in the task structure that is passed to the target task,
thereby ensuring they are available when the target task is eventually
scheduled.
---------
Co-authored-by: Sergio Afonso <safonsof@amd.com>
In "get<lang>DirectiveName(Kind, Version)", return the spelling that
corresponds to Version, and in "get<lang>DirectiveKindAndVersions(Name)"
return the pair {Kind, VersionRange}, where VersionRange contains the
minimum and the maximum versions that allow "Name" as a spelling. This
applies to clauses as well. In general it applies to classes that have
spellings (defined via TableGen class "Spelling").
Given a Kind and a Version, getting the corresponding spelling requires
a runtime search (which can fail in the general case). To avoid
generating the search function inline, a small additional component of
llvm/Frontend was added: LLVMFrontendDirective. The corresponding header
file also defines the C++ classes "Spelling" and "VersionRange", which
are used in TableGen/DirectiveEmitter as well.
For background information see
https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507
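A hypothetical usage sketch of the OpenMP instantiations (the exact namespaces,
signatures, and the `Min`/`Max` field names are assumptions based on the
description above):
```cpp
// Spelling of a directive as of a given version of the spec.
llvm::StringRef Name =
    llvm::omp::getOpenMPDirectiveName(llvm::omp::OMPD_target, /*Version=*/52);

// Kind plus the range of versions that accept a given spelling.
auto [Kind, Versions] = llvm::omp::getOpenMPDirectiveKindAndVersions("target");
// Versions.Min / Versions.Max bound the versions allowing this spelling.
```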
This PR fixes a crash that currently happens given the following input:
```fortran
subroutine caller()
real :: x
integer :: i
!$omp target
x = i
call callee(x,x)
!$omp end target
endsubroutine caller
subroutine callee(x1,x2)
real :: x1, x2
endsubroutine callee
```
The crash happens because the following sequence of events takes place
in the `OMPIRBuilder`:
1. ....
2. An outlined function for the target region is created. At first, the
outlined function still refers to the SSA values from the original
function of the target region.
3. The builder then iterates over the users of the SSA values used in
the target region to replace them with the corresponding function
arguments of the outlined function.
4. If the same instruction references the SSA value more than once (say
m times), all uses of that SSA value in the instruction are replaced at
once, deleting all m uses of the value.
5. The next m-1 iterations will still iterate over the same instruction,
dropping the last m-1 actual users of the value.
Hence, we collect all users first before modifying them, as sketched
below.
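A minimal sketch of the collect-then-rewrite pattern (`Input`, `OutlinedFn`,
and `ReplacementArg` are illustrative names):
```cpp
// Snapshot the users before rewriting: replacing several uses inside a single
// instruction must not invalidate the iteration over users.
llvm::SmallVector<llvm::User *> Users(Input->users());
for (llvm::User *U : Users)
  if (auto *I = llvm::dyn_cast<llvm::Instruction>(U))
    if (I->getFunction() == OutlinedFn)
      I->replaceUsesOfWith(Input, ReplacementArg);
```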
Static analysis flagged the passing of Dependencies to emitTargetCall
as a place where we could use std::move to avoid copying. A closer look
indicated we could instead turn the parameter into a const & and drop
the default value, since the default was only used in two lines in a
test and changing those two locations was easy.
The device and implementation sets should trigger elision of tokens if
they do not match statically in a begin/end declare variant. This simply
extends the existing logic for the device set to also cover the
implementation set.
Reported by @kkwli.
Static analysis flagged `MapnamesName` because we could move it into
`createOffloadMapnames`. I just replaced the local with a direct call to
`createPlatformSpecificName` at the function argument location.
This patch adds support for LLVM IR atomicrmw `fmaximum` and `fminimum`
instructions.
These mirror the `llvm.maximum.*` and `llvm.minimum.*` instructions, but
are atomic and use IEEE754 2019 handling for NaNs, which is different to
`fmax` and `fmin`. See:
https://llvm.org/docs/LangRef.html#llvm-minimum-intrinsic
for more details.
Future changes will allow this LLVM IR to be lowered to specialised
assembler instructions on suitable targets, such as AArch64.
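As a sketch, emitting one of the new operations through `IRBuilder` (the enum
spellings follow this change; `Builder`, `Ptr`, and `Val` are assumed to be in
scope):
```cpp
// %old = atomicrmw fmaximum ptr %p, float %v seq_cst
llvm::Value *Old = Builder.CreateAtomicRMW(
    llvm::AtomicRMWInst::FMaximum, Ptr, Val, llvm::MaybeAlign(),
    llvm::AtomicOrdering::SequentiallyConsistent);
```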
When we get a function back from `CodeExtractor`, we discard its entry
block after copying its instructions into the entry block we prepared.
While copying the instructions, the terminator is discarded for obvious
reasons. But if there were debug values attached to the terminator,
those are useful and need to be copied.
This is quite ugly, but it is the best I could think of. The old
FiniCBWrapper was way too brittle, depending on the exact block
structure inside the section, and could be confused by any control flow
in the section (e.g. an if clause on cancel). The wording in the comment
and the variable names didn't seem to match where it was actually
branching to, either.
Clang's (non-OpenMPIRBuilder) lowering for cancel inside of sections
branches to a block containing __kmpc_for_static_fini.
This was hard to achieve here because sometimes the FiniCBWrapper has to
run before the worksharing loop finalization has been created.
To get around this ordering issue I created a dummy branch to a dummy
block, which is then fixed later once all of the information is
available.
SPIR-V has strict address space rules; constant globals cannot be in the
default address space.
The OMPIRBuilder change was required for lit tests to pass: we were
missing an addrspacecast.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
When creating the name of an outlined region, Clang uses the file's
inode ID to generate a unique name. When the file does not exist, this
causes a fatal abort of the compiler. This PR switches to using a hash
value instead.
---------
Co-authored-by: Michael Kruse <github@meinersbur.de>
A small clean-up following up on #131795. It seems we had 2 quite
similar implementations of the same thing: emitting the task
dependencies struct and filling it in. This PR unifies the 2 versions
into one. This is better since we had to fix a bug in one of them in
#131795, so this applies the fix to both.
Fixes #121289
Given the following MLIR, where a variable `x` is `private` for the
`omp.parallel` op and a `depend` operand for the `omp.task` op:
```mlir
omp.private {type = private} @_QFEx_private_i32 : i32
llvm.func @nested_task_with_deps() {
%0 = llvm.mlir.constant(1 : i64) : i64
%1 = llvm.alloca %0 x i32 {bindc_name = "x"} : (i64) -> !llvm.ptr
omp.parallel private(@_QFEx_private_i32 %1 -> %arg0 : !llvm.ptr) {
omp.task depend(taskdependout -> %arg0 : !llvm.ptr) {
omp.terminator
}
omp.terminator
}
llvm.return
}
```
Before the fix proposed by this PR, the IR builder would emit the
allocation and the initialization logic for the task's dependency info
struct in the parent function's alloc region:
```llvm
define void @nested_task_with_deps() {
....
%.dep.arr.addr = alloca [1 x %struct.kmp_dep_info], align 8
%2 = getelementptr inbounds [1 x %struct.kmp_dep_info], ptr %.dep.arr.addr, i64 0, i64 0
%3 = getelementptr inbounds nuw %struct.kmp_dep_info, ptr %2, i32 0, i32 0
%4 = ptrtoint ptr %omp.private.alloc to i64
store i64 %4, ptr %3, align 4
....
br label %entry
omp.par.entry: ; preds = %entry
....
%omp.private.alloc = alloca i32, align 4
```
Note the following:
- The private value `x` is allocated where it should be, in the parallel
op's entry region.
- However, since the private value is also a dependency of the task, and
since allocation and initialization of the task dependency info struct
both happen in the alloc region,
- this results in the private value being referenced before it is
actually defined.
This PR fixes the issue by only allocating the task dependency info in
the alloc region, while initialization happens at the current IP of the
function with the rest of the logic that depends on it, as sketched
below.
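A minimal sketch of the split (names are illustrative): only the `alloca` goes
to the alloca insertion point, while the initialization stays at the current
insertion point, where `%omp.private.alloc` is already defined:
```cpp
// Allocation only, at the alloc region / alloca insertion point.
Builder.restoreIP(AllocaIP);
llvm::AllocaInst *DepArray =
    Builder.CreateAlloca(DepArrayTy, /*ArraySize=*/nullptr, ".dep.arr.addr");

// Initialization at the current codegen IP, after privatized values exist.
Builder.restoreIP(CodeGenIP);
llvm::Value *Base =
    Builder.CreateConstInBoundsGEP2_64(DepArrayTy, DepArray, 0, 0);
// ... store each dependency's base address, length, and flags here ...
```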