41 Commits

Author SHA1 Message Date
Nikita Popov
aa8e9fac2a [OpenMP] Convert some tests to opaque pointers (NFC) 2023-01-03 15:03:14 +01:00
Johannes Doerfert
90609fb68f [OpenMP][NFCI] Remove effectively dead code in clang and the runtime
Differential Revision: https://reviews.llvm.org/D136903
2022-12-13 18:44:19 -08:00
Johannes Doerfert
f9c29878b0 Revert "[OpenMP][NFCI] Remove effectively dead code in clang and the runtime"
This reverts commit c1c8cbbf5f29257d084a23a2f6c4236c40b7afb9. One of the
tests seems to be flaky/non-deterministic.
2022-12-12 22:08:28 -08:00
Johannes Doerfert
c1c8cbbf5f [OpenMP][NFCI] Remove effectively dead code in clang and the runtime 2022-12-12 20:55:36 -08:00
Nikita Popov
304f1d59ca [IR] Switch everything to use memory attribute
This switches everything to use the memory attribute proposed in
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly
attributes are dropped. The readnone, readonly and writeonly attributes
are restricted to parameters only.

The old attributes are auto-upgraded both in bitcode and IR.
The bitcode upgrade is a policy requirement that has to be retained
indefinitely. The IR upgrade is mainly there so it's not necessary
to update all tests using memory attributes in this patch, which
is already large enough. We could drop that part after migrating
tests, or retain it longer term, to make it easier to import IR
from older LLVM versions.

High-level Function/CallBase APIs like doesNotAccessMemory() or
setDoesNotAccessMemory() are mapped transparently to the memory
attribute. Code that directly manipulates attributes (e.g. via
AttributeList) on the other hand needs to switch to working with
the memory attribute instead.

Differential Revision: https://reviews.llvm.org/D135780
2022-11-04 10:21:38 +01:00
Johannes Rudolf Doerfert
41a278f56a [OpenMP][FIX] Do not add custom state machine eagerly in LTO runs
If we run LTO optimization we migth end up introducing a custom state machine
and later transforming the region into SPMD. This is a problem. While a follow
up will introduce a check for the SPMD conversion, this already prevents the
eager custom state machine generation. Only if the kernel init function is
defined, rather then declared, we will emit a custom state machine. SPMD-zation
can happen eagerly though. Tests are adjusted via a weak definition. The LTO
test was added to verify this works as expected.

Differential Revision: https://reviews.llvm.org/D136740
2022-10-26 10:40:11 -07:00
Doru Bercea
c9adeca501 Move allocas converted from __kmpc_alloc_shared to entry block. 2022-09-27 17:16:58 +00:00
Johannes Doerfert
c922cac868 Revert "[Attributor] AAPointerInfo should allow "harmless" uses"
Revert "[Attributor] Teach AAPointerInfo to look into aggregates"

This reverts commit 844f6c5d03d58e7ac0c6b838e4a7834ac575ab9b and
4ed0a88cd8a77370073feb270d77a9e8b27bd68c as they broke the buildbots
that run openmp/libomptarget/test/offloading/bug49021.cpp.
2022-09-11 21:37:54 -07:00
Johannes Doerfert
4ed0a88cd8 [Attributor] Teach AAPointerInfo to look into aggregates
If we have a constant aggregate, e.g., as an initializer, we usually
failed to extract the proper value/type from it. This patch provides the
size and offset information necessary to extract the right part of the
constant.
2022-09-11 20:16:11 -07:00
Johannes Doerfert
21711039e3 [OpenMP] Allow the Attributor to look at functions we also internalized
This is important as we have accesses to globals in those which we need to
categorize.
2022-09-11 20:16:11 -07:00
Augie Fackler
4fea8ee540 OpenMP: mark allocptr attribute on __kmpc_free_shared
Differential Revision: https://reviews.llvm.org/D124491
2022-09-09 14:09:18 -04:00
Johannes Doerfert
bf789b1957 [Attributor] Replace AAValueSimplify with AAPotentialValues
For the longest time we used `AAValueSimplify` and
`genericValueTraversal` to determine "potential values". This was
problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9
  locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification
  only as an afterthought. `genericValueTraversal` did offer an option
  but `AAValueSimplify` did not. Thus, we might end up with "too much"
  simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with
  problems like the infinite recursion bug (#54981) as well as code
  duplication.

This patch introduces `AAPotentialValues` and replaces the
`AAValueSimplify` uses with it. `genericValueTraversal` is folded into
`AAPotentialValues` as are the instruction simplifications performed in
`AAValueSimplify` before. We further distinguish "intra" and "inter"
procedural simplification now.

`AAValueSimplify` was not deleted as we haven't ported the
re-materialization of instructions yet. There are other differences over
the former handling, e.g., we may not fold trivially foldable
instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2`
but if an operand would be simplified to `i32 1` we would fold it still.

We are also even more aware of function/SCC boundaries in CGSCC passes,
which is good even if some tests look like they regress.

Fixes: https://github.com/llvm/llvm-project/issues/54981

Note: A previous version was flawed and consequently reverted in
      6555558a80589d1c5a1154b92cc3af9495f8f86c.
2022-07-19 16:24:42 -05:00
Johannes Doerfert
f6e0c05e3d Revert "[Attributor] Replace AAValueSimplify with AAPotentialValues"
This reverts commit f17639ea0cd30f52ac853ba2eb25518426cc3bb8 as three
AMDGPU tests haven't been updated. Will need to verify the changes are
not regressions we should avoid.
2022-07-08 00:53:38 -05:00
Johannes Doerfert
f17639ea0c [Attributor] Replace AAValueSimplify with AAPotentialValues
For the longest time we used `AAValueSimplify` and
`genericValueTraversal` to determine "potential values". This was
problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9
  locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification
  only as an afterthought. `genericValueTraversal` did offer an option
  but `AAValueSimplify` did not. Thus, we might end up with "too much"
  simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with
  problems like the infinite recursion bug (#54981) as well as code
  duplication.

This patch introduces `AAPotentialValues` and replaces the
`AAValueSimplify` uses with it. `genericValueTraversal` is folded into
`AAPotentialValues` as are the instruction simplifications performed in
`AAValueSimplify` before. We further distinguish "intra" and "inter"
procedural simplification now.

`AAValueSimplify` was not deleted as we haven't ported the
re-materialization of instructions yet. There are other differences over
the former handling, e.g., we may not fold trivially foldable
instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2`
but if an operand would be simplified to `i32 1` we would fold it still.

We are also even more aware of function/SCC boundaries in CGSCC passes,
which is good even if some tests look like they regress.

Fixes: https://github.com/llvm/llvm-project/issues/54981

Note: A previous version was flawed and consequently reverted in
      6555558a80589d1c5a1154b92cc3af9495f8f86c.
2022-07-08 00:38:27 -05:00
Johannes Doerfert
6555558a80 Revert "[Attributor] Replace AAValueSimplify with AAPotentialValues"
This reverts commit da50dab1ae111e9e6cb0248a47a038b17f798705.

Patch broke AMD GPU OpenMP offload buildbots.
https://lab.llvm.org/buildbot/#/builders/193/builds/13246
2022-06-09 17:04:01 +02:00
Johannes Doerfert
da50dab1ae [Attributor] Replace AAValueSimplify with AAPotentialValues
For the longest time we used `AAValueSimplify` and
`genericValueTraversal` to determine "potential values". This was
problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9
  locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification
  only as an afterthought. `genericValueTraversal` did offer an option
  but `AAValueSimplify` did not. Thus, we might end up with "too much"
  simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with
  problems like the infinite recursion bug (#54981) as well as code
  duplication.

This patch introduces `AAPotentialValues` and replaces the
`AAValueSimplify` uses with it. `genericValueTraversal` is folded into
`AAPotentialValues` as are the instruction simplifications performed in
`AAValueSimplify` before. We further distinguish "intra" and "inter"
procedural simplification now.

`AAValueSimplify` was not deleted as we haven't ported the
re-materialization of instructions yet. There are other differences over
the former handling, e.g., we may not fold trivially foldable
instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2`
but if an operand would be simplified to `i32 1` we would fold it still.

We are also even more aware of function/SCC boundaries in CGSCC passes,
which is good.

Fixes: https://github.com/llvm/llvm-project/issues/54981
2022-06-09 16:48:53 +02:00
Augie Fackler
e90bce8f91 CallBase: fix getFnAttr so it also checks the function
Prior to this change, CallBase::hasFnAttr checked the called function to
see if it had an attribute if it wasn't set on the CallBase, but
getFnAttr didn't do the same delegation, which led to very confusing
behavior. This patch fixes the issue by making CallBase::getFnAttr also
check the function under the same circumstances.

Test changes look (to me) like they're cleaning up redundant attributes
which no longer get specified both on the callee and call. We also clean
up the one ad-hoc implementation of this getter over in InlineCost.cpp.

Differential Revision: https://reviews.llvm.org/D122821
2022-04-03 23:19:23 -04:00
Johannes Doerfert
e92891f864 [Attributor] Allow not to default initialize AAs for live internal functions
Outside users of the Attributor, e.g., OpenMP-opt, want to seed AAs
themselves. We should not seed all default AAs one an internal function
becomes live. That said, there should be a callback such that they can
do lazy seeding as well.

Differential Revision: https://reviews.llvm.org/D121489
2022-03-11 16:46:03 -06:00
Johannes Doerfert
d1387a26a5 [Attributor][FIX] Reachability needs to account for readonly callees
The oversight caused us to ignore call sites that are effectively dead
when we computed reachability (or more precise the call edges of a
function). The problem is that loads in the readonly callee might depend
on stores prior to the callee. If we do not track the call edge we
mistakenly assumed the store before the call cannot reach the load.
The problem is nicely visible in:
  `llvm/test/Transforms/Attributor/ArgumentPromotion/basictest.ll`

Caused by D118673.

Fixes https://github.com/llvm/llvm-project/issues/53726
2022-02-10 13:52:24 -06:00
Joseph Huber
6b78526b1b [OpenMP] Emit remark on the captured call instead of the variable
Changes the remark to emit on the function call that captures the globalized
variable instead of the globalized variable itself. The user should be able to
see which variable it was in the argument list of the function.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106980
2022-02-04 17:50:53 -05:00
Johannes Doerfert
ac3ec22df9 [Attributor] Use AAFunctionReachability to determine AANoRecurse
We missed out on AANoRecurse in the module pass because we had no call
graph. With AAFunctionReachability we can simply ask if the function may
reach itself.

Differential Revision: https://reviews.llvm.org/D110099
2022-02-01 01:40:44 -06:00
Johannes Doerfert
944aa0421c Reapply "[OpenMP][NFCI] Embed the source location string size in the ident_t"
This reverts commit 73ece231ee0cf048d56841f47915beb1db6afc26 and
reapplies 7bfcdbcbf368cea14a5236080af975d5878a46eb with mlir changes.
Also reverts commit 423ba12971bac8397c87fcf975ba6a4b7530ed28 and
includes the unit test changes of
16da2140045808b2aea1d28366ca7d326eb3c809.
2021-12-29 01:10:38 -06:00
Mehdi Amini
73ece231ee Revert "[OpenMP][NFCI] Embed the source location string size in the ident_t"
This reverts commit 7bfcdbcbf368cea14a5236080af975d5878a46eb.
Broke MLIR build
2021-12-29 06:57:36 +00:00
Johannes Doerfert
7bfcdbcbf3 [OpenMP][NFCI] Embed the source location string size in the ident_t
One of the unused ident_t fields now holds the size of the string
(=const char *) field so we have an easier time dealing with those
in the future.

Differential Revision: https://reviews.llvm.org/D113126
2021-12-28 23:53:29 -06:00
Joseph Huber
f074a6a041 [OpenMP] Add options to change Attributor max iterations in OpenMPOpt
This patch adds a new command line option `openmp-opt-max-iterations`
that controls the maximum number of iterations the attributor will run
for when compiling OpenMP target device code. This patch also adds a
remark to indicate when the attributor failed because it did not run
for enough iterations.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110749
2021-10-04 09:39:04 -04:00
Shilei Tian
423d34f74a [OpenMP][Offloading] Change bool IsSPMD to int8_t Mode in __kmpc_target_init and __kmpc_target_deinit
This is a follow-up of D110029, which uses bitset to indicate execution mode. This patches makes the changes in the function call.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110279
2021-09-22 17:16:41 -04:00
Joseph Huber
fec2927e07 [OpenMP] Add NoSync attributes to alloc / free shared RTL calls
This patch adds the `nosync` attribute to the `__kmpc_alloc_shared` and
`__kmpc_free_shared` runtime library calls. This allows code analysis to
know that these functins dont contain any barriers. This will help
optimizations reason about the CFG of blocks containing these calls.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109995
2021-09-17 19:50:13 -04:00
Giorgis Georgakoudis
29a3e3dd7b [OpenMPOpt] Expand SPMDization with guarding for target parallel regions
This patch expands SPMDization (converting generic execution mode to SPMD for target regions) by guarding code regions that should be executed only by the main thread. Specifically, it generates guarded regions, which only the main thread executes, and the synchronization with worker threads using simple barriers. For correctness, the patch aborts SPMDization for target regions if the same code executes in a parallel region, thus must be not be guarded. This check is implemented using the ParallelLevels AA.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D106892
2021-08-04 11:49:24 -07:00
Joseph Huber
cd0dd8ece8 [OpenMP] Adding flags for disabling the following optimizations: Deglobalization SPMDization State machine rewrites Folding
This work provides four flags to disable four different sets of OpenMP optimizations. These flags take effect in llvm/lib/Transforms/IPO/OpenMPOpt.cpp and include the following:
 - openmp-opt-disable-deglobalization: Defaults to false, adding this flag sets the variable DisableOpenMPOptDeglobalization to true. This prevents AA registration for HeapToStack and HeapToShared.
 - openmp-opt-disable-spmdization: Defaults to false, adding this flag sets the variable DisableOpenMPOptSPMDization to true. This indicates a pessimistic fixpoint in changeToSPMDMode.
 - openmp-opt-disable-folding: Defaults to false, adding this flag sets the variable DisableOpenMPOptFolding to true. This indicates a pessimistic fixpoint in the attributor init for AAFoldRuntimeCall.
 - openmp-opt-disable-state-machine-rewrite: Defaults to false, adding this flag sets the variable DisableOpenMPOptStateMachineRewrite to true. This first prevents changes to the state machine in rewriteDeviceCodeStateMachine by returning before changes are made, and if a custom state machine is built in buildCustomStateMachine, stops by returning a pessimistic fixpoint.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D106802
2021-07-29 19:28:31 -04:00
Joseph Huber
754eb1c210 [OpenMP] Change __kmpc_free_shared to include the paired allocation size
This patch changes `__kmpc_free_shared` to take an additional argument
corresponding to the associated allocation's size. This makes it easier to
implement the allocator in the runtime.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106496
2021-07-21 20:56:21 -04:00
Joseph Huber
eef6601b0f [OpenMP] Rework OpenMP remarks
This patch rewrites and reworks a few of the existing remarks to make the mmore
concise and consistent prior to writing the documentation for them.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105898
2021-07-16 14:07:00 -04:00
Johannes Doerfert
d9659bf6a0 [OpenMP] Create custom state machines for generic target regions
In the spirit of TRegions [0], this patch creates a custom state
machine for a generic target region based on the potentially called
parallel regions.

The code analysis is done interprocedurally via an abstract attribute
(AAKernelInfo). All outermost parallel regions are collected and we
check if there might be unknown outermost parallel regions for which
we need an indirect call. Other AAKernelInfo extensions are expected.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

Differential Revision: https://reviews.llvm.org/D101977
2021-07-10 17:57:08 -05:00
Johannes Doerfert
c1c1fe9385 [Attributor] Reorganize AAHeapToStack
In order to simplify future extensions, e.g., the merge of
AAHeapToShared in to AAHeapToStack, we reorganize AAHeapToStack and the
state we keep for each malloc-like call. The result is also less
confusing as we only track malloc-like calls, not all calls. Further, we
only perform the updates necessary for a malloc-like to argue it can go
to the stack, e.g., we won't check all uses if we moved on to the
"must-be-freed" argument.

This patch also uses Attributor helps to simplify the allocated size,
alignment, and the potentially freed objects.

Overall, this is mostly a reorganization and only the use of the
optimistic helpers should change (=improve) the capabilities a bit.

Differential Revision: https://reviews.llvm.org/D104993
2021-07-10 16:32:24 -05:00
Nico Weber
d3e7491333 Revert Attributor patch series
Broke check-clang, see https://reviews.llvm.org/D102307#2869065
Ran `git revert -n ebbe149a6f08535ede848a531a601ae6591cfbc5..269416d41908bb670f67af689155d5ab8eea689a`
2021-07-10 16:15:55 -04:00
Johannes Doerfert
f0628c6ff7 [OpenMP] Create custom state machines for generic target regions
In the spirit of TRegions [0], this patch creates a custom state
machine for a generic target region based on the potentially called
parallel regions.

The code analysis is done interprocedurally via an abstract attribute
(AAKernelInfo). All outermost parallel regions are collected and we
check if there might be unknown outermost parallel regions for which
we need an indirect call. Other AAKernelInfo extensions are expected.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

Differential Revision: https://reviews.llvm.org/D101977
2021-07-10 12:32:50 -05:00
Johannes Doerfert
1eb31d6de3 [Attributor] Reorganize AAHeapToStack
In order to simplify future extensions, e.g., the merge of
AAHeapToShared in to AAHeapToStack, we reorganize AAHeapToStack and the
state we keep for each malloc-like call. The result is also less
confusing as we only track malloc-like calls, not all calls. Further, we
only perform the updates necessary for a malloc-like to argue it can go
to the stack, e.g., we won't check all uses if we moved on to the
"must-be-freed" argument.

This patch also uses Attributor helps to simplify the allocated size,
alignment, and the potentially freed objects.

Overall, this is mostly a reorganization and only the use of the
optimistic helpers should change (=improve) the capabilities a bit.

Differential Revision: https://reviews.llvm.org/D104993
2021-07-10 12:32:50 -05:00
Joseph Huber
0edb87773b [OpenMP] Add additional remarks for OpenMPOpt
This patch adds additional remarks, suggesting the use of `noescape` for failed
globalization and indicating when internalization failed.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105150
2021-06-30 09:49:25 -04:00
Joseph Huber
57ad2e1067 [OpenMP] Prevent OpenMPOpt from internalizing uncalled functions
Currently OpenMPOpt will only check if a function is a kernel before deciding not to internalize it. Any uncalled function that gets internalized will be trivially dead in the module so this is unnnecessary.

Depends on D102423

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104890
2021-06-28 16:47:53 -04:00
Joseph Huber
5ccb7424fa [OpenMP] Change OpenMPOpt to check openmp metadata
The metadata added in D102361 introduces a module flag that we can check
to determine if the module was compiled with `-fopenmp` enables. We can
now check for the precense of this instead of scanning the call graph
for OpenMP runtime functions.

Depends on D102361

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102423
2021-06-25 16:34:22 -04:00
Joseph Huber
30e36c9b3c [Attributor] Add interface to emit remarks in Attributor
Summary:
This patch adds support for the Attributor to emit remarks on behalf of some
other pass. The attributor can now optionally take a callback function that
returns an OptimizationRemarkEmitter object when given a Function pointer. If
this is availible then a remark will be emitted for the corresponding pass
name.

Depends on D102197

Reviewed By: sstefan1 thegameg

Differential Revision: https://reviews.llvm.org/D102444
2021-06-22 14:12:46 -04:00
Joseph Huber
7d69da71dd [OpenMP] Enable HeapToStack conversion in OpenMPOpt for new RTL globalization calls
Summary:
The changes to globalization introduced in D97680 introduce a large amount of overhead by default. The old globalization method would always ignore globalization code if executing in SPMD mode. This wasn't strictly correct as data sharing is still possible in SPMD mode. The new interface is correct but introduces globalization code even when unnecessary. This optimization will use the existing HeapToStack transformation in the attributor to allow for unneeded globalization to be replaced with thread-private stack memory. This is done using the newly introduced library instances for the RTL functions added in D102087.

Depends on D97818

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102197
2021-06-22 13:23:05 -04:00