294 Commits

Author SHA1 Message Date
Johannes Doerfert
cb17c48fdd [Attributor] Identify and remove no-op fences
The logic and implementation follows the removal of no-op barriers. If
the fence is not making updates visible, either to the world or the
current thread, it is not needed. Said differently, the fences we remove
do not establish synchronization (happens-before) edges.
This allows us to eliminate some of the regression caused by:
  https://reviews.llvm.org/D145290
2023-06-05 17:14:00 -07:00
Johannes Doerfert
8f4fadd1b4 [OpenMP] Use "kernel" attribute consistently 2023-06-05 16:33:53 -07:00
Johannes Doerfert
dbbe9b3776 [Attributor] Create AAMustProgress for the mustprogress attribute
Derive the mustprogress attribute based on the willreturn attribute
or the fact that all callers are mustprogress.

Differential Revision: https://reviews.llvm.org/D94740
2023-06-05 16:33:52 -07:00
Johannes Doerfert
787d6bb59f [Attributor][OpenMP-Opt][NFC] Run the update test checks script 2023-05-18 13:27:44 -07:00
Joseph Huber
e494ebf9d0 [OpenMP] Fix incorrect interop type for number of dependencies
The interop types use the number of dependencies in the function
interface. Every other function uses an `i32` to count the number of
dependencies except for the initialization function. This leads to
codegen issues when the rest of the compiler passes in an `i32` that
then creates an invalid call. Fix this to be consistent with the other
uses.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D150156
2023-05-08 21:02:43 -05:00
Krzysztof Drewniak
f0415f2a45 Re-land "[AMDGPU] Define data layout entries for buffers""
Re-land D145441 with data layout upgrade code fixed to not break OpenMP.

This reverts commit 3f2fbe92d0f40bcb46db7636db9ec3f7e7899b27.

Differential Revision: https://reviews.llvm.org/D149776
2023-05-03 19:43:56 +00:00
Krzysztof Drewniak
3f2fbe92d0 Revert "[AMDGPU] Define data layout entries for buffers"
This reverts commit f9c1ede2543b37fabe9f2d8f8fed5073c475d850.

Differential Revision: https://reviews.llvm.org/D149758
2023-05-03 16:11:00 +00:00
Krzysztof Drewniak
f9c1ede254 [AMDGPU] Define data layout entries for buffers
Per discussion at
https://discourse.llvm.org/t/representing-buffer-descriptors-in-the-amdgpu-target-call-for-suggestions/68798,
we define two new address spaces for AMDGCN targets.

The first is address space 7, a non-integral address space (which was
already in the data layout) that has 160-bit pointers (which are
256-bit aligned) and uses a 32-bit offset. These pointers combine a
128-bit buffer descriptor and a 32-bit offset, and will be usable with
normal LLVM operations (load, store, GEP). However, they will be
rewritten out of existence before code generation.

The second of these is address space 8, the address space for "buffer
resources". These will be used to represent the resource arguments to
buffer instructions, and new buffer intrinsics will be defined that
take them instead of <4 x i32> as resource arguments. ptr
addrspace(8). These pointers are 128-bits long (with the same
alignment). They must not be used as the arguments to getelementptr or
otherwise used in address computations, since they can have
arbitrarily complex inherent addressing semantics that can't be
represented in LLVM. Even though, like their address space 7 cousins,
these pointers have deterministic ptrtoint/inttoptr semantics, they
are defined to be non-integral in order to prevent optimizations that
rely on pointers being a [0, [addr_max]] value from applying to them.

Future work includes:
- Defining new buffer intrinsics that take ptr addrspace(8) resources.
- A late rewrite to turn address space 7 operations into buffer
intrinsics and offset computations.

This commit also updates the "fallback address space" for buffer
intrinsics to the buffer resource, and updates the alias analysis
table.

Depends on D143437

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D145441
2023-05-03 15:25:58 +00:00
Shilei Tian
d4ecd1241c Revert "[OpenMP] Introduce kernel environment"
This reverts commit 35cfadfbe2decd9633560b3046fa6c17523b2fa9.

It makes a couple of buildbots unhappy because of the following test failures:
- `Transforms/OpenMP/add_attributes.ll'`
- `mapping/declare_mapper_target_data.cpp` on AMDGPU
2023-04-22 20:56:35 -04:00
Shilei Tian
35cfadfbe2 [OpenMP] Introduce kernel environment
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.

This is a combination and refinement of patch series D116908, D116909, and D116910.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142569
2023-04-22 20:46:38 -04:00
Joseph Huber
46ee1021d9 [OpenMP] Replace HeapToShared's initial value with poison
There's a desire to move away from `undef` in LLVM. Currently we want to
have the `addressspace(3)` variables use `poison` instead.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D147719
2023-04-14 09:39:32 -05:00
Johannes Doerfert
94d14536a9 [OpenMP][FIX] More AAExecutionDomain fixes
We missed certain updates, mostly to call site information, and
dependent AAs did not get recomputed. We also did not properly
distinguish and propagate incoming and outgoing information of call
sites.

The runtime tests passes now, I'll add a proper test for
AAExecutionDomain soon that covers all the cases and ensures we haven't
forgotten more updates. To help unblock some apps, I'll put the fix
first.
2023-03-27 21:36:21 -07:00
Johannes Doerfert
7f7e1749c5 [OpenMP] Be smarter about the insertion point for deduplication
We can use dominance and avoid the special handling of kernels and
prevent inserting code before allocas accidentally (as happend in the
runtime test).
2023-03-27 21:30:23 -07:00
Ishaan Gandhi
aead502b11 [Attributor] Add convergent abstract attribute
This patch adds the AANonConvergent abstract attribute. It removes the
convergent attribute from functions that only call non-convergent
functions.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D143228
2023-03-20 22:33:50 -07:00
Johannes Doerfert
8f47fd05d5 [OpenMPOpt][FIX] Avoid removing barriers in callees
We could be smarter about this, e.g., if the callee has a single call
site, but for now we first avoid the miscompile.
2023-03-20 17:44:24 -07:00
Johannes Doerfert
b89558a2ae [OpenMP][FIX] Properly track and lookup Execution Domains
This is a two part fix. First, we need two Execution Domains (ED) to
track the values of a function. One for incoming values and one for
outgoing values. This was conflated before. Second, at the function
entry we need to look at the incoming information from call sites not
iterate over non-existing predecessors.
2023-03-20 17:44:24 -07:00
Johannes Doerfert
0fc63d4e64 [Attributor][FIX] Ensure loop PHI replacements are dynamically unique
Similar to loads, PHIs can be used to introduce non-dynamically unique
values into the simplification "algorithm". We need to check that PHIs
do not carry such a value from one iteration into the next as can cause
downstream reasoning to fail, e.g., downstream could think a comparison
is equal because the simplified values are equal while they are defined
in different loop iterations. Similarly, instructions in cycles are now
conservatively treated as non-dynamically unique. We could do better but
I'll leave that for the future.

The change in AAUnderlyingObjects allows us to ignore dynamically unique
when we simply look for underlying objects. The user of that AA should
be aware that the result might not be a dynamically unique value.
2023-03-20 17:44:24 -07:00
Matt Arsenault
25a461046e OpenMP: Regenerate test checks 2023-02-16 22:40:15 -04:00
Johannes Doerfert
578d507359 [OpenMP][FIX] Ensure to determine aligned regions properly
There were missing checks in the aligned region code, copy-paste errors
(= usage of the IsReachedFromAlignedBarrierOnly value instead of
IsReachingAlignedBarrierOnly value on the forward pass), and a missing
update of the call state for sync declarations and definitions.

Partially fixes https://github.com/llvm/llvm-project/issues/60425
2023-02-02 02:28:10 -08:00
Joseph Huber
0bdde9dfb9 [OpenMP] Make OpenMPOpt aware of the OpenMP runtime's status
The `OpenMPOpt` pass contains optimizations that generate new calls into
the OpenMP runtime. This causes problems if we are in a state where the
runtime has already been linked statically. Generating these new calls
will result in them never being resolved. We should indicate if we are
in a "post-link" LTO phase and prevent OpenMPOpt from generating new
runtime calls.

Generally, it's not desireable for passes to maintain state about the
context in which they're called. But this is the only reasonable
solution to static linking when we have a pass that generates new
runtime calls.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142646
2023-01-26 13:23:44 -06:00
Johannes Doerfert
5238df7ed5 [Attributor] Allow (inter-procedural) "CFG" reasoning for aligned regions
If an instruction is executed in an aligned region we can ignore
threading effects and use CFG reasoning (dominance and reachability).
This is true because all threads are together in an aligned region and
there cannot be one waiting for a signal at a place not connected via
the control flow.

More dedicated tests will follow.

More details can be found here:
"Co-Designing an OpenMP GPU Runtime and Optimizations for Near-Zero
Overhead Execution", IPDPS 2022,
https://www.osti.gov/servlets/purl/1890094
2023-01-23 22:45:48 -08:00
Johannes Doerfert
fedbc689e1 [Attributor] Check assumptions to improve isAlignedBarrier queries 2023-01-23 20:34:26 -08:00
Johannes Doerfert
129faec711 [OpenMP] Identify non-aligned barriers executed in an aligned context
Even if a barrier does not enforce aligned execution, it will
effectively be like an aligned barrier if it is executed by all threads
in an aligned way. We lack control flow divergence analysis here so we
can only do (basic block) local reasoning for now.
2023-01-22 21:42:07 -08:00
Johannes Doerfert
43c1c59f73 [OpenMP] Merge barrier elimination into AAExecutionDomain
With this patch we track aligned barriers in AAExecutionDomain and also
delete unnecessary barriers there. This allows us to eliminate barriers
across blocks, across functions, and in the presence of complex accesses
that do not force a barrier. Further, we can use the collected
information to enable store-load forwarding in a threaded environment
(follow up patch).

Differential Revision: https://reviews.llvm.org/D140463
2023-01-22 16:34:59 -08:00
Johannes Doerfert
2275e325e4 [OpenMP] Guarding restrictions are required only for guarding
If we do not guard code during SPMDzation, we do not need to check
conditions for successfull guarding. That is, even if some code is
executed in different modes, it does not prevent SPMDzation if there is
no guarded code in there.
2023-01-22 15:53:42 -08:00
Johannes Doerfert
ea3c24932a [OpenMP][FIX] Properly update ParallelLevels tracker 2023-01-22 15:52:45 -08:00
Johannes Doerfert
7bc88cbe5c [OpenMP] Simplify llvm.assume operands in device code 2023-01-22 01:27:41 -08:00
Shilei Tian
bdf30603f2 [LLVM][OpenMP] Correct the function signature of __kmpc_parallel_level
`__kmpc_parallel_level` used to be a function w/o any argument, but in the new
device runtime, it accepts two. This patch simply corrects it in `OMPKinds.def`.
```
uint16_t __kmpc_parallel_level(IdentTy *Loc, uint32_t);
```

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D141655
2023-01-20 09:46:45 -05:00
Jonas Paulsson
dc3875e468 Add parameter extension attributes in various instrumentation passes.
For the targets that have in their ABI the requirement that arguments and
return values are extended to the full register bitwidth, it is important
that calls when built also take care of this detail.

The OMPIRBuilder, AddressSanitizer, GCOVProfiling, MemorySanitizer and
ThreadSanitizer passes are with this patch hopefully now doing this properly.

Reviewed By: Eli Friedman, Ulrich Weigand, Johannes Doerfert

Differential Revision: https://reviews.llvm.org/D133949
2023-01-18 18:29:12 -06:00
Johannes Doerfert
27944bbbe7 [Attributor][FIX] Avoid deleting (internal) library functions
In CGSCC mode we cannot delete internal library functions, esp.
__kmpc_alloc_shared, or we trigger an assertion. While the assertion is
probably too narrow, we avoid deleting those unused functions for now to
unblock the AMDGPU buildbot.
2023-01-12 01:17:23 -08:00
Johannes Doerfert
cefa5cefdc [OpenMP] Replace ExternalizationRAII with virtual uses
The externalization was always a stopgap solution. One of the drawbacks
is that it is very conservative no matter if we actually require the
functions at the end of the pass. The new concept is more generic and
properly integrates into the dependence graph. Whenever we might need a
function, it has a "virtual use" that cannot be analyzed. If we do not
because of some AA state, there will be a dependence to ensure state
changes trigger revisits of uses, including a potentially new virtual
use.
2023-01-12 00:14:06 -08:00
Johannes Doerfert
96c335e2cc [Attributor] Always ensure the correct AAIsDead object is used
Since the Attributor::isAssumedDead lookups can jump between functions
we need to potentially replace a given FnLivenessAA for it to be useful.
2023-01-11 23:49:09 -08:00
Johannes Doerfert
91f06dd732 [OpenMP][NFC] Include global alias test 2023-01-11 22:24:22 -08:00
Johannes Doerfert
cddcbfae14 [OpenMP][FIX] Avoid performance regression accidentally introduced 2023-01-11 00:58:34 -08:00
Johannes Doerfert
b2a8d2c69b [OpenMP] Avoid running openmp-opt on dead functions
The Attributor has logic to run only on assumed live functions and this
is exposed to users now. OpenMP-opt will (mostly) ignore dead internal
functions now but run the same deduction as before if an internal
function is marked live.

This should lower compile time as we run on less code and delete more
code early on. For the full OpenMC module compiled with noinline and
JITed at runtime, we save ~25%, or ~10s on my machine during JITing.
2023-01-10 15:03:51 -08:00
Johannes Doerfert
d1033e3cad [OpenMP] Disable ICV deduction by default.
This is not tested well and needs to be revisited in the future.
2023-01-10 15:03:51 -08:00
Johannes Doerfert
22c898dbfd [OpenMP] Use Attributor to find underlying objects of stores
When we see a store in generic mode we need to decide if we should guard
it for SPMDzation. This patch changes the getUnderlyingObjects call to
the more optimistic getAssumedUnderlyingObjects call to identify more
thread local pointers.
2023-01-09 23:34:52 -08:00
Johannes Doerfert
56be9123ca [Attributor][OpenMP][NFC] Cleanup tests via update script 2023-01-09 16:40:20 -08:00
Rafael A Herrera Guaitero
13b909ef27 OpenMPOpt: Check nested parallelism in target region
Analysis that determines if a parallel region can reach another parallel region in any target region of the TU.
A new global var is emitted with the name of the kernel + "_nested_parallelism", which is either 0 or 1 depending on the result.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D141010
2023-01-09 15:55:30 -06:00
Nikita Popov
ae1cf4577c [OpenMP] Convert some tests to opaque pointers (NFC) 2023-01-04 17:03:10 +01:00
Matt Arsenault
2e7640e6dc OpenMPOpt: Fix null dereference on missing declaration cache
Found by llvm-reduce fuzzing.
2023-01-03 16:26:37 -05:00
Matt Arsenault
c3054aeb5a OpenMPOpt: Fix using wrong address space for alloca
Using the function's address space makes no sense. Copied from the
existing test, with more addrspace variation. Could just replace the
existing one with this version if it's redundant.
2023-01-03 16:26:37 -05:00
Matt Arsenault
a87de3a6dc OpenMPOpt: Fix introducing empty nvvm.annotations into module 2023-01-03 10:32:10 -05:00
Nikita Popov
aa8e9fac2a [OpenMP] Convert some tests to opaque pointers (NFC) 2023-01-03 15:03:14 +01:00
Joseph Huber
bb4c6e7a06 [OpenMP] Remove folding logic for removed runtime function
This function was removed from the device runtime at some point but we
still have specialized code for it and an entry in the runtime kinds.
Remove it as it is no longer necessary.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D140402
2022-12-20 13:37:38 -06:00
Johannes Doerfert
4e0f464ce2 Reapply "[OpenMP][FIX] Restrict more unsound assmptions about threading"
This reverts commit 3b052558125cbedf18c2ddb65780b50d6f437d54.

This patch got reverted due to an unrelated memory leak that has been
fixed.
2022-12-19 18:27:52 -08:00
Johannes Doerfert
d4f3d8212a [OpenMP][FIX] Ensure to inline ompx:: functions after the rename in D140334 2022-12-19 16:41:49 -08:00
Mitch Phillips
3b05255812 Revert "[OpenMP][FIX] Restrict more unsound assmptions about threading"
This reverts commit 07c375348083170e39c9498a42a9679c7e08f07f.

Reason: This change is dependent on a commit that needs to be rolled
back because it broke the ASan buildbot. See
https://reviews.llvm.org/rGfc21f2d7bae2e0be630470cc7ca9323ed5859892 for
more information.
2022-12-16 17:56:38 -08:00
Johannes Doerfert
07c3753480 [OpenMP][FIX] Restrict more unsound assmptions about threading
Even if all loads and stores are in `nosync` functions we cannot
guarantee there is no synchronization going on between them. As such, we
cannot use CFG reasoning. We could check the entire module, or, what
happens now to minimize test churn, is to check if all accesses are in
the same function that is `nosync`. A follow up will undo some of the
regressions where possible.

Similarly, reachability cannot be used to exclude an access if the
access is not known to be executed by the same thread as the given
instruction.

The OpenMP-opt test was added for the latter problem.
2022-12-13 22:58:33 -08:00
Johannes Doerfert
23333bb6b7 [NFC] Rerun update test checks on Attributor and OpenMP-Opt tests 2022-12-13 18:44:19 -08:00