55 Commits

Author SHA1 Message Date
Sirraide
c44fa3e8a9
[Clang] Refactor __attribute__((assume)) (#84934)
This is a follow-up to #81014 and #84582: Before this patch, Clang 
would accept `__attribute__((assume))` and `[[clang::assume]]` as 
nonstandard spellings for the `[[omp::assume]]` attribute; this 
resulted in a potentially very confusing name clash with C++23’s 
`[[assume]]` attribute (and GCC’s `assume` attribute with the same
semantics).

This PR replaces every usage of `__attribute__((assume))` with 
`[[omp::assume]]` and makes `__attribute__((assume))` and 
`[[clang::assume]]` alternative spellings for C++23’s `[[assume]]`; 
this shouldn’t cause any problems due to differences in appertainment
and because almost no one was using this variant spelling to begin
with (a use in libclc has already been changed to use a different
attribute).
2024-05-22 17:58:48 +02:00
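For illustration of the change above (not part of the commit), a minimal sketch of the affected spellings, assuming a Clang new enough to include this refactor:

```cpp
int f(int x) {
  // After this change, all three spellings denote the C++23 assumption
  // attribute: the expression is assumed to hold and is not evaluated.
  [[assume(x > 0)]];
  [[clang::assume(x > 0)]];
  __attribute__((assume(x > 0)));
  return x / 2;
}

// The OpenMP variant keeps its distinct spelling and appertains to the
// declaration; "omp_no_openmp" is just one example assumption clause.
[[omp::assume("omp_no_openmp")]] void g();
```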
Johannes Doerfert
cd3a4c31bc
[Attributor][NFC] update tests (#91011) 2024-05-03 16:38:55 -07:00
Youngsuk Kim
c57ef2c698
[llvm][OpenMPOpt] Remove no-op ptr-to-ptr bitcast (NFC) (#73869)
* Remove a call to CreatePointerBitCastOrAddrSpaceCast which merely adds
a no-op ptr-to-ptr bitcast.

* Most of the diff is from removing checks for no-op ptr-to-ptr bitcasts
in relevant LIT tests.
2023-11-29 20:47:37 -05:00
Johannes Doerfert
3de645efe3 [OpenMP][NFC] Split the reduction buffer size into two components
Before, we tracked the size of the teams reduction buffer in order to
allocate it at runtime per kernel launch. This patch splits that number
into two parts: the size of the reduction data (= all reduction
variables) and the (maximal) length of the buffer. This allows us to
allocate less if we need less, e.g., if we have fewer teams than the
maximal length. It also allows us to move code from Clang's codegen into
the runtime, as we now know how large the reduction data is.
2023-11-06 11:50:41 -08:00
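As a rough illustration of the split (the helper and parameter names below are invented for this sketch, not the runtime's actual interface), the launch-time allocation can now be computed from the two components:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: with the reduction data size and the maximal buffer
// length tracked separately, a launch only allocates what its actual team
// count requires instead of always using the maximal length.
uint64_t reductionBufferBytes(uint64_t ReductionDataSize, // bytes per team
                              uint32_t MaxBufferLength,   // compile-time max
                              uint32_t NumTeams) {        // actual launch
  return ReductionDataSize * std::min<uint64_t>(MaxBufferLength, NumTeams);
}
```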
Johannes Doerfert
b8cbc5c02c
[OpenMP] Introduce the KernelLaunchEnvironment as implicit argument (#70401)
The KernelEnvironment is for compile-time information about a kernel. It
allows the compiler to feed information to the runtime. The
KernelLaunchEnvironment is for dynamic information *per* kernel launch.
It allows the runtime to feed information to the kernel that is not
shared with other invocations of the kernel. The first use case is to
replace the globals that synchronize teams reductions with per-launch
versions. This allows concurrent teams reductions. More use cases will
follow, e.g., per-launch memory pools.

Fixes: https://github.com/llvm/llvm-project/issues/70249
2023-10-31 19:38:43 -07:00
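A hypothetical sketch of the idea behind this commit (field and type names are invented here, not the runtime's actual layout): compile-time facts stay in the kernel environment, while per-launch state travels in a separate structure passed as an implicit argument:

```cpp
#include <cstdint>

// Compile-time information, emitted once per kernel (sketch).
struct KernelEnvironmentSketch {
  uint8_t ExecMode;                // generic vs. SPMD
  uint8_t MayUseNestedParallelism;
};

// Per-launch information, set up by the runtime for every kernel launch
// (sketch). Moving the teams-reduction synchronization state from globals
// into this structure is what enables concurrent teams reductions.
struct KernelLaunchEnvironmentSketch {
  uint32_t ReductionCnt;  // per-launch team reduction counter
  void *ReductionBuffer;  // per-launch teams reduction buffer
};
```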
Mehdi Amini
f390a76b7e Revert "Revert "[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257)""
This reverts commit ddbaa11e9f43a38d50d62a9b9b07c3653b6bf8ab.

Reapply the original commit; the broken test was repaired in 5e51363f38d083ab326736c0d4d1b5f9fe0de080 in the meantime.
2023-10-26 17:30:01 -07:00
Mehdi Amini
ddbaa11e9f Revert "[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257)"
This reverts commit c2a1249a8257ed033a98e32e425539c6da6700ec.

The MLIR bots are broken with an omp test failure.
2023-10-26 17:25:20 -07:00
Johannes Doerfert
c2a1249a82
[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257)
The runtime needs to know about the acceptable launch bounds, especially
if the compiler (middle end or backend) assumed those bounds. While this
patch does not yet inform the runtime, it stores the bounds in a place
that can/will be accessed and is associated with the kernel.
2023-10-26 14:46:55 -07:00
Johannes Doerfert
4fcd5f93d6 [OpenMPOpt] Mark more runtime functions as SPMD compatible
Fixes: https://github.com/llvm/llvm-project/issues/64421
2023-08-17 18:33:24 -07:00
Shilei Tian
10068cd654 [OpenMP] Introduce kernel environment
This patch introduces a per-kernel environment. Previously, flags such as the execution mode were set through global variables with names like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not tracked on a per-kernel basis, preventing us from applying per-kernel optimizations in the device runtime.

This is a combination and refinement of patch series D116908, D116909, and D116910.

Depends on D155886.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142569
2023-07-26 13:35:14 -04:00
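For context, a hedged before/after sketch (identifiers simplified and invented) of the scheme this commit message describes: previously each flag was a separately named global readable by the host, now such flags live in one per-kernel environment visible to both host and device:

```cpp
#include <cstdint>

// Before (sketch): one magic global per kernel and per flag, found by the
// host via its name but not usable from device code.
extern "C" int8_t __my_kernel_exec_mode;

// After (sketch): a single per-kernel environment aggregating such flags,
// so the device runtime can also specialize per kernel.
struct KernelEnvironmentSketch {
  int8_t ExecMode;
  int8_t MayUseNestedParallelism;
};
extern "C" KernelEnvironmentSketch __my_kernel_environment;
```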
Shilei Tian
6bd74fd65f Revert commits for kernel environment
This reverts the commits for the kernel environment as they cause issues on the AMD buildbot.
2023-07-23 23:32:31 -04:00
Shilei Tian
c5c8040390 [OpenMP] Introduce kernel environment
This patch introduces a per-kernel environment. Previously, flags such as the execution mode were set through global variables with names like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not tracked on a per-kernel basis, preventing us from applying per-kernel optimizations in the device runtime.

This is a combination and refinement of patch series D116908, D116909, and D116910.

Depends on D155886.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142569
2023-07-23 18:36:01 -04:00
Johannes Doerfert
d33bca840a [Attributor] Introduce helpers to judge AAs prior to creation
This is a partial cleanup to centralize the initialization and update
decisions for AAs. It lifts the burden and boilerplate from users and
makes it harder to accidentally perform unsound deductions.

The two static helpers show how we can lift the decision to generate an
AA into the Attributor, avoiding trivial AAs that just cost us compile
time and maintenance code (to check for preconditions).
2023-06-29 12:32:45 -07:00
Johannes Doerfert
732bdb6073 [Attributor] Avoid the type check in getCalledFunction
We now consistently use `CallBase::getCalledOperand` rather than
`getCalledFunction`, as we do not want the type check performed by the
latter. This exposed various missing checks to handle mismatches
properly, but it is good to have them explicit now.

In a follow up we might want to flag certain calls as UB, but for now,
we allow everything to cut down on unexpected differences.
2023-06-23 20:10:12 -07:00
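A short sketch of the difference using the real `CallBase` API (the surrounding handling is simplified): `getCalledFunction` returns null when the call-site type does not match the callee, so code that wants the callee regardless of such mismatches goes through `getCalledOperand`:

```cpp
#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"

using namespace llvm;

// getCalledFunction() performs a function-type check and returns nullptr on
// a mismatch; getCalledOperand() always yields the callee value. Stripping
// pointer casts recovers the underlying Function for direct calls even when
// the call-site type disagrees with the callee's type.
const Function *getCalleeIgnoringTypeMismatch(const CallBase &CB) {
  return dyn_cast<Function>(CB.getCalledOperand()->stripPointerCasts());
}
```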
Johannes Doerfert
8f4fadd1b4 [OpenMP] Use "kernel" attribute consistently 2023-06-05 16:33:53 -07:00
Shilei Tian
d4ecd1241c Revert "[OpenMP] Introduce kernel environment"
This reverts commit 35cfadfbe2decd9633560b3046fa6c17523b2fa9.

It makes a couple of buildbots unhappy because of the following test failures:
- `Transforms/OpenMP/add_attributes.ll`
- `mapping/declare_mapper_target_data.cpp` on AMDGPU
2023-04-22 20:56:35 -04:00
Shilei Tian
35cfadfbe2 [OpenMP] Introduce kernel environment
This patch introduces a per-kernel environment. Previously, flags such as the execution mode were set through global variables with names like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not tracked on a per-kernel basis, preventing us from applying per-kernel optimizations in the device runtime.

This is a combination and refinement of patch series D116908, D116909, and D116910.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142569
2023-04-22 20:46:38 -04:00
Ishaan Gandhi
aead502b11 [Attributor] Add convergent abstract attribute
This patch adds the AANonConvergent abstract attribute. It removes the
convergent attribute from functions that only call non-convergent
functions.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D143228
2023-03-20 22:33:50 -07:00
Nikita Popov
aa8e9fac2a [OpenMP] Convert some tests to opaque pointers (NFC) 2023-01-03 15:03:14 +01:00
Johannes Doerfert
90609fb68f [OpenMP][NFCI] Remove effectively dead code in clang and the runtime
Differential Revision: https://reviews.llvm.org/D136903
2022-12-13 18:44:19 -08:00
Johannes Doerfert
f9c29878b0 Revert "[OpenMP][NFCI] Remove effectively dead code in clang and the runtime"
This reverts commit c1c8cbbf5f29257d084a23a2f6c4236c40b7afb9. One of the
tests seems to be flaky/non-deterministic.
2022-12-12 22:08:28 -08:00
Johannes Doerfert
c1c8cbbf5f [OpenMP][NFCI] Remove effectively dead code in clang and the runtime 2022-12-12 20:55:36 -08:00
Johannes Rudolf Doerfert
41a278f56a [OpenMP][FIX] Do not add custom state machine eagerly in LTO runs
If we run LTO optimization we might end up introducing a custom state machine
and later transforming the region into SPMD. This is a problem. While a follow-up
will introduce a check for the SPMD conversion, this already prevents the
eager custom state machine generation. Only if the kernel init function is
defined, rather than declared, will we emit a custom state machine. SPMDization
can happen eagerly though. Tests are adjusted via a weak definition. The LTO
test was added to verify this works as expected.

Differential Revision: https://reviews.llvm.org/D136740
2022-10-26 10:40:11 -07:00
Johannes Doerfert
857bf306d7 [Attributor] Remove broken and duplicated load simplification
We look through loads in the "generic value traversal" and consequently
don't need to look through them again in AAValueSimplify*.
The test changes stem from the fact that we allowed any simplified
value, including non-dynamically unique ones, as long as the underlying
memory was an alloca. This doesn't seem to make sense, as allocas do not
protect against dynamically non-unique values. We need to make the
uniqueness check better rather than excluding allocas. With that in mind,
we can remove a lot of code by simply relying on the generic value
traversal's load look-through.

To soften the blow, some minor adjustments have been made that allow more
simplification through the now-used scheme, and some tests have been
given a `norecurse` for now.
2022-04-05 20:49:03 -05:00
Johannes Doerfert
e92891f864 [Attributor] Allow not to default initialize AAs for live internal functions
Outside users of the Attributor, e.g., OpenMP-opt, want to seed AAs
themselves. We should not seed all default AAs once an internal function
becomes live. That said, there should be a callback such that they can
do lazy seeding as well.

Differential Revision: https://reviews.llvm.org/D121489
2022-03-11 16:46:03 -06:00
Johannes Doerfert
5af11ec34b [Attributor] Determine potentially loaded values through memory
We already look through memory to determine where a value that is stored
might pop up again (potential copies). This patch introduces the other
direction with similar logic. If a value is loaded, we can follow all
the accesses to the pointer (or better object) and try to determine what
value might have been stored.
2022-03-06 23:26:37 -06:00
Johannes Doerfert
f9c2d6005e [OpenMP][FIX] Ensure custom state machine works
The custom state machine had a check for surplus threads that filtered
the main thread if the kernel was executed by a single warp only. We
now first check for the main thread, then for surplus threads, so the
former is no longer filtered out.

Fixes #54214.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D121011
2022-03-04 13:51:19 -05:00
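The shape of the fix, as a hedged sketch (all names are placeholders, not the actual device runtime symbols): the main-thread test has to come before the surplus-thread filter, otherwise a kernel executed by a single warp filters out its own main thread:

```cpp
// Sketch of the corrected ordering in the custom state machine prologue.
void stateMachinePrologueSketch(int ThreadId, int MainThreadId,
                                int NumUsefulThreads) {
  if (ThreadId == MainThreadId) {
    // ... run the kernel's main (generic-mode) path ...
    return;
  }
  if (ThreadId >= NumUsefulThreads) {
    // Surplus thread: nothing to do. Performing this check before the
    // main-thread check was the bug when only a single warp ran the kernel.
    return;
  }
  // ... worker loop: wait for parallel work ...
}
```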
Johannes Doerfert
6ed1ef0643 [Attributor][FIX] Pipe UsedAssumedInformation through more interfaces
`UsedAssumedInformation` is a return argument used to determine what
information is known. Most APIs used it already, but
`genericValueTraversal` did not. This adds it to `genericValueTraversal`
and replaces `AllCallSitesKnown` of `checkForAllCallSites` with the
commonly used `UsedAssumedInformation`.

This was supposed to be an NFC commit, but then the test change appeared.
It turns out we had one user of `AllCallSitesKnown` (AANoReturn), and the
way we set `AllCallSitesKnown` was wrong, as we ignored the fact that some
call sites were optimistically assumed dead. A dedicated test for this is
now included as well.

Fixes https://github.com/llvm/llvm-project/issues/53884
2022-02-16 14:44:20 -06:00
Johannes Doerfert
ac3ec22df9 [Attributor] Use AAFunctionReachability to determine AANoRecurse
We missed out on AANoRecurse in the module pass because we had no call
graph. With AAFunctionReachability we can simply ask if the function may
reach itself.

Differential Revision: https://reviews.llvm.org/D110099
2022-02-01 01:40:44 -06:00
Johannes Doerfert
a1db0e523d [Attributor][FIX] Liveness handling in the isAssumedDead helpers
This fixes a conceptual problem with our AAIsDead usage, which conflated
call site liveness with call site return value liveness. Without the
fix, tests would obviously miscompile as we make genericValueTraversal
more powerful (in a follow-up). The effects on the tests are mixed but
mostly marginal. The most prominent one is the lack of `noreturn` for
functions. The reason is that we make entire blocks live at the same
time (for compile-time reasons). Now that we actually look at the block
liveness, which we need to do, the return instructions are live and
will survive. As an example, `noreturn_async.ll` has been modified
to retain the `noreturn` even with block granularity. We could address
this easily but there is little need in practice.
2022-02-01 01:18:52 -06:00
Johannes Doerfert
3e0c512ce6 [OpenMP] Simplify all stores in the device code
Similar to loads, we want to be aggressive when it comes to store
simplification. Not everything in LLVM handles dead stores well when
address space casts are involved, but we can simply ask the Attributor to
do it for us.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D109998
2021-12-29 00:19:38 -06:00
Joel E. Denny
c9dfe322ee [OpenMP] Fix main thread barrier for Pascal and amdgpu
Fixes what's left of https://bugs.llvm.org/show_bug.cgi?id=51781.

Reviewed By: jdoerfert, JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D113602
2021-11-12 11:18:45 -05:00
Johannes Doerfert
73720c8059 [OpenMP][FIX] Introduce and use a simple generic-mode barrier
Before we had aligned barriers, `__kmpc_barrier_simple_spmd` was
OK to use in the custom state machine. Now that SPMD barriers are
assumed to be aligned, we need to use a "generic" barrier in places
that are not aligned.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112893
2021-11-02 23:22:01 -05:00
Shilei Tian
423d34f74a [OpenMP][Offloading] Change bool IsSPMD to int8_t Mode in __kmpc_target_init and __kmpc_target_deinit
This is a follow-up of D110029, which uses a bitset to indicate the execution mode. This patch makes the corresponding changes to the function call.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110279
2021-09-22 17:16:41 -04:00
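Roughly, the shape of the change (constants and names below are illustrative; the real flag values live in the OpenMP offload headers): the boolean becomes a byte-sized bitset, leaving room for more execution-mode bits:

```cpp
#include <cstdint>

// Illustrative values only; see OMPTgtExecModeFlags for the real ones.
enum : int8_t {
  ExecModeGenericSketch = 1 << 0,
  ExecModeSPMDSketch    = 1 << 1,
};

// Before: __kmpc_target_init(..., bool IsSPMD, ...)
// After:  __kmpc_target_init(..., int8_t Mode, ...)  // Mode is a bitset
inline bool isSPMDModeSketch(int8_t Mode) {
  return (Mode & ExecModeSPMDSketch) != 0;
}
```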
Jon Chesterfield
6775ad2025 [openmp] Apply test change from D109500 2021-09-13 18:36:29 +01:00
Jon Chesterfield
bfcf979978 Revert "[openmp] Fix 51647, corrupt bitcode on amdgpu"
This reverts commit d5c049a3f68791490e8c87d9974e9831ede0ada0.
Going to re-commit it in pieces for easier application to 13.
2021-09-13 18:25:07 +01:00
dpalermo
d5c049a3f6 [openmp] Fix 51647, corrupt bitcode on amdgpu
Patch by @dpalermo

The corrupt bitcode reported in https://bugs.llvm.org/show_bug.cgi?id=51647 seems to be a result of a later pass changing the workfn variable to addrspace(5) (thread private, on the stack). That seems reasonable for an alloca without an address space so it's an open question why that can crash the bitcode reader.

This change puts it in the thread-private address space to begin with, which means whatever misfired further down the pipeline does not break it. That matches the codegen from clang, where stack variables are always annotated (5) and then addrspace-cast prior to subsequent use.

This therefore patches around whatever unsuccessfully moved the alloca variable to addrspace(5). That solves the problem of openmp opt producing code that crashes the bitcode reader. It should be possible to create a minimal repro for the underlying bug based on some handwritten IR that uses an alloca in a generic address space.

Reviewed By: ronlieb, jdoerfert, dpalermo-phab

Differential Revision: https://reviews.llvm.org/D109500
2021-09-13 15:24:48 +01:00
Giorgis Georgakoudis
29a3e3dd7b [OpenMPOpt] Expand SPMDization with guarding for target parallel regions
This patch expands SPMDization (converting generic execution mode to SPMD for target regions) by guarding code regions that should be executed only by the main thread. Specifically, it generates guarded regions, which only the main thread executes, and synchronizes with worker threads using simple barriers. For correctness, the patch aborts SPMDization for target regions if the same code executes in a parallel region and thus must not be guarded. This check is implemented using the ParallelLevels AA.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D106892
2021-08-04 11:49:24 -07:00
Joseph Huber
cd0dd8ece8 [OpenMP] Adding flags for disabling the following optimizations: Deglobalization, SPMDization, State machine rewrites, Folding
This work provides four flags to disable four different sets of OpenMP optimizations. These flags take effect in llvm/lib/Transforms/IPO/OpenMPOpt.cpp and include the following:
 - openmp-opt-disable-deglobalization: Defaults to false, adding this flag sets the variable DisableOpenMPOptDeglobalization to true. This prevents AA registration for HeapToStack and HeapToShared.
 - openmp-opt-disable-spmdization: Defaults to false, adding this flag sets the variable DisableOpenMPOptSPMDization to true. This indicates a pessimistic fixpoint in changeToSPMDMode.
 - openmp-opt-disable-folding: Defaults to false, adding this flag sets the variable DisableOpenMPOptFolding to true. This indicates a pessimistic fixpoint in the attributor init for AAFoldRuntimeCall.
 - openmp-opt-disable-state-machine-rewrite: Defaults to false, adding this flag sets the variable DisableOpenMPOptStateMachineRewrite to true. This first prevents changes to the state machine in rewriteDeviceCodeStateMachine by returning before changes are made, and if a custom state machine is built in buildCustomStateMachine, stops by returning a pessimistic fixpoint.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D106802
2021-07-29 19:28:31 -04:00
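The flags described above are ordinary LLVM `cl::opt` command-line options; a hedged sketch of how one of them might be declared in OpenMPOpt.cpp (description text paraphrased):

```cpp
#include "llvm/Support/CommandLine.h"

using namespace llvm;

// Sketch: an internal option defaulting to false. Passing
// -openmp-opt-disable-spmdization (e.g. via -mllvm from clang) sets it,
// and changeToSPMDMode then indicates a pessimistic fixpoint.
static cl::opt<bool> DisableOpenMPOptSPMDization(
    "openmp-opt-disable-spmdization",
    cl::desc("Disable OpenMP optimizations involving SPMD-ization."),
    cl::Hidden, cl::init(false));
```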
Joseph Huber
adbaa39dfc [Attributor] Change function internalization to not replace uses in internalized callers
The current implementation of function internalization creates a copy of each
function and replaces every use. This has the downside that the external
versions of the functions will call into the internalized versions of the
functions. This prevents them from being fully independent of each other. This
patch replaces the current internalization scheme with a method that creates
all the copies of the functions intended to be internalized first and then
replaces the uses as long as their caller is not already internalized.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106931
2021-07-28 18:57:28 -04:00
Johannes Doerfert
41bd26dff9 [Attributor] Delete dead stores
D106185 allows us to easily determine if a store is needed. Using that
knowledge, we can start to delete dead stores.

In AAIsDead we now track more state, as an instruction can be dead (= the
old optimistic state) or just "removable". A store instruction can be
removable while being very much alive, e.g., if it stores a constant
into an alloca or internal global. If we pretended it was dead
instead of only removable, we would ignore it when we determine what
values a load can see, so that is not what we want.

Differential Revision: https://reviews.llvm.org/D106188
2021-07-26 23:33:36 -05:00
Johannes Doerfert
adddd3dbda [Attributor] Introduce getPotentialCopiesOfStoredValue and use it
This patch introduces `getPotentialCopiesOfStoredValue` which uses
AAPointerInfo to determine all "aliases" or "potential copies" of a
value that is stored into memory. This operation can fail but if it
succeeds it means we can visit all "uses" of a value even if it is
temporarily stored in memory.

There are two users for the function:
  1) `Attributor::checkForAllUses`, which will now ignore the use of a
     value in a store if all "potential copies" can be identified and
     visited instead. This allows various AAs, including AAPointerInfo
     itself, to look through memory.
  2) `AANoCapture`, which uses custom use tracking through the
     CaptureTracker interface and therefore needs to be taught
     explicitly.

Differential Revision: https://reviews.llvm.org/D106185
2021-07-26 23:33:36 -05:00
Johannes Doerfert
be2b569646 [OpenMP] Run rewriteDeviceCodeStateMachine in the Module not CGSCC pass
While rewriteDeviceCodeStateMachine should probably be folded into
buildCustomStateMachine, we at least need the optimization to happen.
This was not reliably the case in the CGSCC pass but in the Module pass
it seems to work reliably.

This also ports a test to the new kernel encoding (target_init/deinit),
and makes sure we cannot run the kernel in SPMD mode.

Differential Revision: https://reviews.llvm.org/D106345
2021-07-26 21:26:07 -05:00
Giorgis Georgakoudis
e8439ec893 [OpenMP] Set RequiresFullRuntime false in SPMDization
SPMDization in D102307 does not change the RequiresFullRuntime argument of kmpc_target_init/deinit calls. However, the constraints of SPMDization detection for converting a target region to SPMD mode should guarantee that the region does not require full runtime support. Hence, this patch sets RequiresFullRuntime to false for improved execution performance.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105556
2021-07-20 09:54:51 -07:00
Johannes Doerfert
205c520387 [OpenMP] Remove XFAIL and update check lines properly
Undo 15c5701c8324d2dea519fa379c04d5c619a570ab and update check lines.
2021-07-20 00:35:13 -05:00
Johannes Doerfert
15c5701c83 [OpenMP][FIX] Temporarily XFAIL tests waiting for new check lines
The test is not wrong, nor is the current main broken; it is just an
interplay issue. Check lines will be updated shortly.
2021-07-19 23:14:35 -05:00
Johannes Doerfert
c2281f1565 [Attributor] Introduce AAPointerInfo
This patch introduces AAPointerInfo which tracks the uses of a pointer
and places them in "bins" based on their offset from the base and access
size.

As with other AAs, any pointer can be tracked but it is up to the user
to make sense of the results. The user in this patch is AAValueSimplify
and AAPotentialValues which both utilize AAPointerInfo to determine the
value of a load. For now, this is restricted to loads of allocas and
internal globals. Through the use of AAPointerInfo and the "bins" we can
track struct members separately. The users also know that storing only
zeros (at unknown indices) will result in loading only 0 (from unknown
indices). Other than that, the users are flow and context insensitive
(for now).

To deal with the "bins" more easily, AAPointerInfo provides a
forallInterferingAccesses that applies a callback on all accesses
that might interfere with a given load or store.

Differential Revision: https://reviews.llvm.org/D104432
2021-07-19 22:48:35 -05:00
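A small hedged illustration of the binning idea (the code is invented for this sketch): accesses to the same underlying object are grouped by (offset, size), so the two struct members below land in different bins and can be reasoned about separately:

```cpp
#include <cstdint>

struct Pair {
  int32_t A; // accesses land in bin (offset 0, size 4)
  int32_t B; // accesses land in bin (offset 4, size 4)
};

void writeZeros(Pair &P) {
  P.A = 0; // store recorded in bin (0, 4)
  P.B = 0; // store recorded in bin (4, 4)
}

int32_t readA(Pair &P) {
  // A load from bin (0, 4) only has to consider interfering accesses in
  // that bin; if P is an alloca or internal global whose only stores are
  // the zeros above, the load simplifies to 0.
  return P.A;
}
```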
Shilei Tian
ca662297d5 [AbstractAttributor] Fold function calls to __kmpc_is_spmd_exec_mode if possible
In the device runtime there are many function calls to `__kmpc_is_spmd_exec_mode`
to query the execution mode of current kernels. In many cases, user programs
only contain target regions executing in one mode. As a consequence, those runtime
function calls will only return one value. If we can get rid of these function
calls during compilation, it can potentially improve performance.

In this patch, we use `AAKernelInfo` to analyze kernel execution. Basically, for
each kernel (device) function `F`, we collect all kernel entries `K` that can
reach `F`. A new AA, `AAFoldRuntimeCall`, is created for each call site. In each
iteration, it will check all reaching kernel entries, and update the folded value
accordingly.

In the future we will support more functions.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105787
2021-07-15 18:23:23 -04:00
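For illustration, a hedged sketch of what the fold buys (the runtime call is real, its prototype is abridged, and the surrounding function is made up): if every kernel entry that can reach the call is known to run in SPMD mode, the call collapses to a constant and the dead branch disappears:

```cpp
#include <cstdint>

extern "C" int8_t __kmpc_is_spmd_exec_mode();

// Before folding: a runtime query on every execution (device-code sketch).
int pickPathSketch() {
  if (__kmpc_is_spmd_exec_mode())
    return 1; // SPMD-mode path
  return 2;   // generic-mode path
}

// After AAFoldRuntimeCall (conceptually): when all reaching kernel entries
// are SPMD, the call is folded to a constant and the branch is removed, as
// if the function were:  int pickPathSketch() { return 1; }
```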
Shilei Tian
a70ef3f568 Revert "[AbstractAttributor] Fold function calls to __kmpc_is_spmd_exec_mode if possible"
This reverts commit 1100e4aafea233bc8bbc307c5758a7d287ad3bae.
2021-07-15 11:19:28 -04:00
Shilei Tian
1100e4aafe [AbstractAttributor] Fold function calls to __kmpc_is_spmd_exec_mode if possible
In the device runtime there are many function calls to `__kmpc_is_spmd_exec_mode`
to query the execution mode of current kernels. In many cases, user programs
only contain target regions executing in one mode. As a consequence, those runtime
function calls will only return one value. If we can get rid of these function
calls during compilation, it can potentially improve performance.

In this patch, we use `AAKernelInfo` to analyze kernel execution. Basically, for
each kernel (device) function `F`, we collect all kernel entries `K` that can
reach `F`. A new AA, `AAFoldRuntimeCall`, is created for each call site. In each
iteration, it will check all reaching kernel entries, and update the folded value
accordingly.

In the future we will support more functions.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105787
2021-07-13 22:28:35 -04:00