This is a follow-up fix for commit 0f5e9bee.
Only write effects to thread-local memory should be considered safe to
parallelize in workshare lowering, not reads. When both reads and writes
were safe, the cascading effect in moveToSingle could cause entire
SingleRegions to become fully parallelized, eliminating the omp.single
and its implicit barrier. This removed synchronization points needed to
keep threads coordinated inside sequential loops containing workshared
operations, causing race conditions in forall-workshare patterns.
This was exposed by the Fujitsu Test Suite and made the following tests
regress:
FAIL: test-suite :: Fujitsu/Fortran/0398/Fujitsu-Fortran-0398_0031.test
FAIL: test-suite :: Fujitsu/Fortran/0398/Fujitsu-Fortran-0398_0013.test
FAIL: test-suite :: Fujitsu/Fortran/0398/Fujitsu-Fortran-0398_0030.test
FAIL: test-suite :: Fujitsu/Fortran/0398/Fujitsu-Fortran-0398_0014.test
Updates #143330
This is a fix for two problems that caused a crash:
1. Thread-local variables sometimes are required to be parallelized.
Added a special case to handle this in
`LowerWorkshare.cpp:isSafeToParallelize`.
2. Race condition caused by a `nowait` added to the `omp.workshare` if
it is the last operation in a block. This allowed multiple threads to
execute the `omp.workshare` region concurrently. Since
_FortranAPushValue modifies a shared stack, this concurrent access
causes a crash. Disable the addition of `nowait` and rely on the
implicit barrier at the the of the `omp.workshare` region.
Fixes#143330
When compiling WORKSHARE construct in different compilation units, a
linker error happened, when two equal WORKSHARE constructs with a copy
operation have been compiled:
```
/usr/bin/ld: module2.o: in function `_workshare_copy_f64':
FIRModule:(.text+0x0): multiple definition of `_workshare_copy_f64'; module1.o:FIRModule:(.text+0x0): first defined here
```
Reason is that the generate copy function has the wrong linkage:
```
0000000000000000 T _workshare_copy_f64
```
while it should be
```
0000000000000000 t _workshare_copy_f64
```
Add a new pass that lowers an `omp.workshare` with its binding `omp.workshare.loop_wrapper` loop nests into other OpenMP constructs that can be lowered to LLVM.
More specifically, in order to preserve the sequential execution semantics of the code contained, it wraps portions that needs to be executed on a single thread in `omp.single` blocks, converts code that must be parallelized into `omp.wsloop` nests and inserts the appropriate synchronization.