This reverts commit 35cfadfbe2decd9633560b3046fa6c17523b2fa9.
It makes a couple of buildbots unhappy because of the following test failures:
- `Transforms/OpenMP/add_attributes.ll'`
- `mapping/declare_mapper_target_data.cpp` on AMDGPU
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.
This is a combination and refinement of patch series D116908, D116909, and D116910.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142569
There's a desire to move away from `undef` in LLVM. Currently we want to
have the `addressspace(3)` variables use `poison` instead.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D147719
We missed certain updates, mostly to call site information, and
dependent AAs did not get recomputed. We also did not properly
distinguish and propagate incoming and outgoing information of call
sites.
The runtime tests passes now, I'll add a proper test for
AAExecutionDomain soon that covers all the cases and ensures we haven't
forgotten more updates. To help unblock some apps, I'll put the fix
first.
This callback caused us to potentially miss out on call edges if we were
expecting a custom state machine since the custom state machine was not
created but the workers also did not enter the generic one. I have not
observed an issue and don't know how to create a test for sure, but it
is saver to err on the conservative side for now.
We can use dominance and avoid the special handling of kernels and
prevent inserting code before allocas accidentally (as happend in the
runtime test).
This patch adds the AANonConvergent abstract attribute. It removes the
convergent attribute from functions that only call non-convergent
functions.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D143228
This is a two part fix. First, we need two Execution Domains (ED) to
track the values of a function. One for incoming values and one for
outgoing values. This was conflated before. Second, at the function
entry we need to look at the incoming information from call sites not
iterate over non-existing predecessors.
There were missing checks in the aligned region code, copy-paste errors
(= usage of the IsReachedFromAlignedBarrierOnly value instead of
IsReachingAlignedBarrierOnly value on the forward pass), and a missing
update of the call state for sync declarations and definitions.
Partially fixes https://github.com/llvm/llvm-project/issues/60425
Before we might have ended up queriying the AAExecutionDomain of a
different function, which resulted in wrong optimistic results.
Partially fixes https://github.com/llvm/llvm-project/issues/60425
The `OpenMPOpt` pass contains optimizations that generate new calls into
the OpenMP runtime. This causes problems if we are in a state where the
runtime has already been linked statically. Generating these new calls
will result in them never being resolved. We should indicate if we are
in a "post-link" LTO phase and prevent OpenMPOpt from generating new
runtime calls.
Generally, it's not desireable for passes to maintain state about the
context in which they're called. But this is the only reasonable
solution to static linking when we have a pass that generates new
runtime calls.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142646
If an instruction is executed in an aligned region we can ignore
threading effects and use CFG reasoning (dominance and reachability).
This is true because all threads are together in an aligned region and
there cannot be one waiting for a signal at a place not connected via
the control flow.
More dedicated tests will follow.
More details can be found here:
"Co-Designing an OpenMP GPU Runtime and Optimizations for Near-Zero
Overhead Execution", IPDPS 2022,
https://www.osti.gov/servlets/purl/1890094
Even if a barrier does not enforce aligned execution, it will
effectively be like an aligned barrier if it is executed by all threads
in an aligned way. We lack control flow divergence analysis here so we
can only do (basic block) local reasoning for now.
With this patch we track aligned barriers in AAExecutionDomain and also
delete unnecessary barriers there. This allows us to eliminate barriers
across blocks, across functions, and in the presence of complex accesses
that do not force a barrier. Further, we can use the collected
information to enable store-load forwarding in a threaded environment
(follow up patch).
Differential Revision: https://reviews.llvm.org/D140463
If we do not guard code during SPMDzation, we do not need to check
conditions for successfull guarding. That is, even if some code is
executed in different modes, it does not prevent SPMDzation if there is
no guarded code in there.
The externalization was always a stopgap solution. One of the drawbacks
is that it is very conservative no matter if we actually require the
functions at the end of the pass. The new concept is more generic and
properly integrates into the dependence graph. Whenever we might need a
function, it has a "virtual use" that cannot be analyzed. If we do not
because of some AA state, there will be a dependence to ensure state
changes trigger revisits of uses, including a potentially new virtual
use.
The Attributor has logic to run only on assumed live functions and this
is exposed to users now. OpenMP-opt will (mostly) ignore dead internal
functions now but run the same deduction as before if an internal
function is marked live.
This should lower compile time as we run on less code and delete more
code early on. For the full OpenMC module compiled with noinline and
JITed at runtime, we save ~25%, or ~10s on my machine during JITing.
When we collect and process allocations we did not verify the call
against the anchor scope / associated function. This should be done to
avoid processing calls multiple times and generally looking at calls not
in the AAs scope.
When we see a store in generic mode we need to decide if we should guard
it for SPMDzation. This patch changes the getUnderlyingObjects call to
the more optimistic getAssumedUnderlyingObjects call to identify more
thread local pointers.
Analysis that determines if a parallel region can reach another parallel region in any target region of the TU.
A new global var is emitted with the name of the kernel + "_nested_parallelism", which is either 0 or 1 depending on the result.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D141010
Using the function's address space makes no sense. Copied from the
existing test, with more addrspace variation. Could just replace the
existing one with this version if it's redundant.
This function was removed from the device runtime at some point but we
still have specialized code for it and an entry in the runtime kinds.
Remove it as it is no longer necessary.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D140402
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).
This fixes clang.