From OpenMP 4.0:
> When an if clause is present on a cancel construct and the if
expression
> evaluates to false, the cancel construct does not activate
cancellation.
> The cancellation point associated with the cancel construct is always
> encountered regardless of the value of the if expression.
This wording is retained unmodified in OpenMP 6.0.
This re-opens the already approved PR #164587, which was closed by
accident. The only changes are a rebase.
A barrier will pause execution until all threads reach it. If some go to
a different barrier then we deadlock. This manifests in that the
finalization callback must only be run once. Fix by ensuring we always
go through the same finalization block whether the thread in cancelled
or not and no matter which cancellation point causes the cancellation.
The old callback only affected PARALLEL, so it has been moved into the
code generating PARALLEL. For this reason, we don't need similar changes
for other cancellable constructs. We need to create the barrier on the
shared exit from the outlined function instead of only on the cancelled
branch to make sure that threads exiting normally (without cancellation)
meet the same barriers as those which were cancelled. For example,
previously we might have generated code like
```
...
%ret = call i32 @__kmpc_cancel(...)
%cond = icmp eq i32 %ret, 0
br i1 %cond, label %continue, label %cancel
continue:
// do the rest of the callback, eventually branching to %fini
br label %fini
cancel:
// Populated by the callback:
// unsafe: if any thread makes it to the end without being cancelled
// it won't reach this barrier and then the program will deadlock
%unused = call i32 @__kmpc_cancel_barrier(...)
br label %fini
fini:
// run destructors etc
ret
```
In the new version the barrier is moved into fini. I generate it *after*
the destructors because the standard describes the barrier as occurring
after the end of the parallel region.
```
...
%ret = call i32 @__kmpc_cancel(...)
%cond = icmp eq i32 %ret, 0
br i1 %cond, label %continue, label %cancel
continue:
// do the rest of the callback, eventually branching to %fini
br label %fini
cancel:
br label %fini
fini:
// run destructors etc
// safe so long as every exit from the function happens via this block:
%unused = call i32 @__kmpc_cancel_barrier(...)
ret
```
To achieve this, the barrier is now generated alongside the finalization
code instead of in the callback. This is the reason for the changes to
the unit test.
I'm unsure if I should keep the incorrect barrier generation callback
only on the cancellation branch in clang with the OMPIRBuilder backend
because that would match clang's ordinary codegen. Right now I have
opted to remove it entirely because it is a deadlock waiting to happen.
---
This re-lands #164586 with a small fix for a failing buildbot running
address sanitizer on clang lit tests.
In the previous version of the patch I added an insertion point guard
"just to be safe" and never removed it. There isn't insertion point
guarding on the other route out of this function and we do not
preserve the insertion point around getFiniBB either so it is not
needed here.
The problem flagged by the sanitizers was because the saved insertion
point pointed to an instruction which was then removed inside the FiniCB
for some clang codegen functions. The instruction was freed when it was
removed. Then accessing it to restore the insertion point was a use
after free bug.
A barrier will pause execution until all threads reach it. If some go to
a different barrier then we deadlock. This manifests in that the
finalization callback must only be run once. Fix by ensuring we always
go through the same finalization block whether the thread in cancelled
or not and no matter which cancellation point causes the cancellation.
The old callback only affected PARALLEL, so it has been moved into the
code generating PARALLEL. For this reason, we don't need similar changes
for other cancellable constructs. We need to create the barrier on the
shared exit from the outlined function instead of only on the cancelled
branch to make sure that threads exiting normally (without cancellation)
meet the same barriers as those which were cancelled. For example,
previously we might have generated code like
```
...
%ret = call i32 @__kmpc_cancel(...)
%cond = icmp eq i32 %ret, 0
br i1 %cond, label %continue, label %cancel
continue:
// do the rest of the callback, eventually branching to %fini
br label %fini
cancel:
// Populated by the callback:
// unsafe: if any thread makes it to the end without being cancelled
// it won't reach this barrier and then the program will deadlock
%unused = call i32 @__kmpc_cancel_barrier(...)
br label %fini
fini:
// run destructors etc
ret
```
In the new version the barrier is moved into fini. I generate it *after*
the destructors because the standard describes the barrier as occurring
after the end of the parallel region.
```
...
%ret = call i32 @__kmpc_cancel(...)
%cond = icmp eq i32 %ret, 0
br i1 %cond, label %continue, label %cancel
continue:
// do the rest of the callback, eventually branching to %fini
br label %fini
cancel:
br label %fini
fini:
// run destructors etc
// safe so long as every exit from the function happens via this block:
%unused = call i32 @__kmpc_cancel_barrier(...)
ret
```
To achieve this, the barrier is now generated alongside the finalization
code instead of in the callback. This is the reason for the changes to
the unit test.
I'm unsure if I should keep the incorrect barrier generation callback
only on the cancellation branch in clang with the OMPIRBuilder backend
because that would match clang's ordinary codegen. Right now I have
opted to remove it entirely because it is a deadlock waiting to happen.
Previously an extra block was created by splitting the previous exit
block. This produced incorrect results when the outlined region
statically never terminated because then there wouldn't be a valid exit
block for the outlined region, this caused this newly added block to
have an incoming edge from outside of the outlining region, which caused
outlining to fail.
So far as I can tell this extra block no longer serves any purpose. The
comment says it is supposed to collate multiple control flow edges into
one place, but the code as it is now does not achieve this. In fact, as
can be seen from the changes to lit tests, this block was not actually
outlined in the end. This is because there are actually two code
extractors: one in the callback for creating a parallel op which is used
to find what the input/output variables are (which does have this block
added to it), and another one which actually does the outlining (which
this block was not added to).
Tested with the gfortran and fujitsu test suites.
Fixes#112884
Generate nuw GEPs for struct member accesses, as inbounds + non-negative
implies nuw.
Regression tests are updated using update scripts where possible, and by
find + replace where not.
This patch prefixes omp outlined helpers and reduction funcs
with the original function's name.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D140722
This patch attempts to prefix omp outlined helpers and reduction funcs
with the original function's name.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D140722
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.
Reviewed By: jdoerfert, jhuber6, ABataev
Differential Revision: https://reviews.llvm.org/D102107
When F calls G calls H, G is nounwind, and G is inlined into F, then the
inlined call-site to H should be effectively nounwind so as not to lose
information during inlining.
If H itself is nounwind (which often happens when H is an intrinsic), we
no longer mark the callsite explicitly as nounwind. Previously, there
were cases where the inlined call-site of H differs from a pre-existing
call-site of H in F *only* in the explicitly added nounwind attribute,
thus preventing common subexpression elimination.
v2:
- just check CI->doesNotThrow
v3 (resubmit after revert at 344378808778c61d5599f4e0ac783ef7e6f8ed05):
- update Clang tests
Differential Revision: https://reviews.llvm.org/D129860
I used a script to reuse existing check lines rather than creating new
ones. There are more opportunities to reduce the line count but the
"check generated functions" logic makes that somewhat tricky.
FWIW, we really should redo the update script with all these use cases
in mind...
Differential Revision: https://reviews.llvm.org/D128686
The callback is expected to create a branch to the ContinuationBB (sometimes called FiniBB in some lambdas) argument when finishing. This creates problems:
1. The InsertPoint used for CodeGenIP does not need to be the end of a block. If it is not, a naive callback will insert a branch instruction into the middle of the block.
2. The BasicBlock the CodeGenIP is pointing to may or may not have a terminator. There is an conflict where to branch to if the block already has a terminator.
3. Some API functions work only with block having a terminator. Some workarounds have been used to insert a temporary terminator that is removed again.
4. Some callbacks are sensitive to whether the BasicBlock has a terminator or not. This creates a callback ordering problem where different callback may have different behaviour depending on whether a previous callback created a terminator or not. The problem also exists for FinalizeCallbackTy where some callbacks do create branch to another "continue" block, but unlike BodyGenCallbackTy does not receive the target as argument. This is not addressed in this patch.
With this patch, the callback receives an CodeGenIP into a BasicBlock where to insert instructions. If it has to insert control flow, it can split the block at that position as needed but otherwise no separate ContinuationBB is needed. In particular, a callback can be empty without breaking the emitted IR. If the caller needs the control flow to branch to a specific target, it can insert the branch instruction itself and pass an InsertPoint before the terminator to the callback.
Certain frontends such as Clang may expect the current IRBuilder position to be at the end of a basic block. In this case its callbacks must split the block at CodeGenIP before setting the IRBuilder position such that the instructions after CodeGenIP are moved to another basic block and before returning create a new branch instruction to the split block.
Some utility functions such as `splitBB` are supporting correct splitting of BasicBlocks, independent of whether they have a terminator or not, returning/setting the InsertPoint of an IRBuilder to the end of split predecessor block, and optionally omitting creating a branch to the split successor block to be added later.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D118409
This adds -no-opaque-pointers to clang tests whose output will
change when opaque pointers are enabled by default. This is
intended to be part of the migration approach described in
https://discourse.llvm.org/t/enabling-opaque-pointers-by-default/61322/9.
The patch has been produced by replacing %clang_cc1 with
%clang_cc1 -no-opaque-pointers for tests that fail with opaque
pointers enabled. Worth noting that this doesn't cover all tests,
there's a remaining ~40 tests not using %clang_cc1 that will need
a followup change.
Differential Revision: https://reviews.llvm.org/D123115
Add applyStaticChunkedWorkshareLoop method implementing static schedule when chunk-size is specified. Unlike a static schedule without chunk-size (where chunk-size is chosen by the runtime such that each thread receives one chunk), we need two nested loops: one for looping over the iterations of a chunk, and a second for looping over all chunks assigned to the threads.
This patch includes the following related changes:
* Adapt applyWorkshareLoop to triage between the schedule types, now possible since all schedules have been implemented. The default schedule is assumed to be non-chunked static, as without OpenMPIRBuilder.
* Remove the chunk parameter from applyStaticWorkshareLoop, it is ignored by the runtime. Change the value for the value passed to the init function to 0, as without OpenMPIRBuilder.
* Refactor CanonicalLoopInfo::setTripCount and CanonicalLoopInfo::mapIndVar as used by both, applyStaticWorkshareLoop and applyStaticChunkedWorkshareLoop.
* Enable Clang to use the OpenMPIRBuilder in the presence of the schedule clause.
Differential Revision: https://reviews.llvm.org/D114413
Summary:
This patch modifies code generation in OpenMPIRBuilder to pass arguments
to the parallel region outlined function in an aggregate (struct),
besides the global_tid and bound_tid arguments. It depends on the
updated CodeExtractor (see D96854) for support. It mirrors functionality
of Clang codegen (see D102107).
Differential Revision: https://reviews.llvm.org/D110114
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.
Test updates are made as a separate patch: D108453
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105169
Fix for the case when there are no instructions in the entry basic block before the call
to `createSections`
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D114143
This reverts commit aacfbb953eb705af2ecfeb95a6262818fa85dd92.
Revert "Fix lit test failures in CodeGenCoroutines"
This reverts commit 63fff0f5bffe20fa2c84a45a41161afa0043cb34.
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.
Test updates are made as a separate patch: D108453
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105169
[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2)
This patch updates test files after D105169.
Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows:
(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.
(2) The remaining tests are updated manually.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D108453
Resolve lit failures in clang after 8ca4b3e's land
Fix lit test failures in clang-ppc* and clang-x64-windows-msvc
Fix missing failures in clang-ppc64be* and retry fixing clang-x64-windows-msvc
Fix internal_clone(aarch64) inline assembly
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.
Test updates are made as a separate patch: D108453
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105169
This reverts the following commits:
37ca7a795b277c20c02a218bf44052278c03344b
9aa6c72b92b6c89cc6d23b693257df9af7de2d15
705387c5074bcca36d626882462ebbc2bcc3bed4
8ca4b3ef19fe82d7ad6a6e1515317dcc01b41515
80dba72a669b5416e97a42fd2c2a7bc5a6d3f44a
This patch updates test files after D105169.
Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows:
(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.
(2) The remaining tests are updated manually.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D108453
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.
Reviewed By: jdoerfert, jhuber6
Differential Revision: https://reviews.llvm.org/D102107
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102107
This patch adds a module level metadata flag indicating that the module
was compiled with the `-fopenmp` flag. This will make it easier for
passes like OpenMPOpt to determine if it should be run.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102361
Need to emit a call for __kmpc_cancel_barrier in the exit block for
__kmpc_cancel function call if cancellation of the parallel block is
requested.
Differential Revision: https://reviews.llvm.org/D103646
If a test does not contain an " simd" but -fopenmp-simd RUN lines we can
just check that we do not create __kmpc|__tgt calls.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D101973
This patch refactors a subset of Clang OpenMP tests, generating checklines using the update_cc_test_checks script. This refactoring facilitates updating the Clang OpenMP code generation codebase by automating test generation.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D101849
This patch refactors a subset of Clang OpenMP tests, generating checklines using the update_cc_test_checks script. This refactoring facilitates updating the Clang OpenMP code generation codebase by automating test generation.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D101849
This patch is child of D89671, contains the clang
implementation to use the OpenMP IRBuilder's section
construct.
Co-author: @anchu-rajendran
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D91054
Many OpenMP Clang tests do not RUN for version 4.5 and the default
version. This first patch in the series only handles test cases
which do not require any modifications in the CHECK lines after
adding RUN lines for default version.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D84844
When we use the OpenMPIRBuilder for the parallel region we need to also
use it to get the thread ID (among other things) in the body. This is
because CGOpenMPRuntime::getThreadID() and
CGOpenMPRuntime::emitUpdateLocation implicitly assumes that if they are
called from within a parallel region there is a certain structure to the
code and certain members of the OMPRegionInfo are initialized. It might
make sense to initialize them even if we use the OpenMPIRBuilder but we
would preferably get rid of such state instead.
Bug reported by Anchu Rajendran Sudhakumari.
Depends on D82470.
Reviewed By: anchu-rajendran
Differential Revision: https://reviews.llvm.org/D82822
During code generation we might change/add basic blocks so keeping a
list of them is fairly easy to break. Nested parallel regions were
enough. The new scheme does recompute the list of blocks to be outlined
once it is needed.
Reviewed By: anchu-rajendran
Differential Revision: https://reviews.llvm.org/D82722
This patch introduces a new helper class `OMPBuilderCBHelpers`,
which will contain all reusable C/C++ language specific function-
alities required by the `OMPIRBuilder`.
Initially, this helper class contains the body and finalization
codegen functionalities implemented using callbacks which were
moved here for reusability among the different directives
implemented in the `OMPIRBuilder`, along with RAIIs for preserving
state prior to emitting outlined and/or inlined OpenMP regions.
In the future this helper class will also contain all the different
call backs required by OpenMP clauses/variable privatization.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D74562
Finalization can introduce new blocks we need to outline as well so it
makes sense to identify the blocks that need to be outlined after
finalization happened. There was also a minor unit test adjustment to
account for the fact that we have a single outlined exit block now.
This allows to use the OpenMPIRBuilder for parallel regions. Code was
extracted from D61953 and adapted to work with the new version (D70109).
All but one feature should be supported. An update of this patch will
provide test coverage and privatization other than shared.
Reviewed By: fghanim
Differential Revision: https://reviews.llvm.org/D70290