347 Commits

Author SHA1 Message Date
Matt
88e31f64a0
[OpenMP][FIX] Remove unsound omp_get_thread_limit deduplication (#79524)
The deduplication of the calls to `omp_get_thread_limit` used to be
legal when originally added in
<e28936f613 (diff-de101c82aff66b2bda2d1f53fde3dde7b0d370f14f1ff37b7919ce38531230dfR123)>,
as the result (thread_limit) was immutable.

However, now that we have `thread_limit` clause, we no longer have
immutability; therefore `omp_get_thread_limit()` is not a deduplicable
runtime call.

Thus, removing `omp_get_thread_limit` from the
`DeduplicableRuntimeCallIDs` array.

Here's a simple example:
```
#include <omp.h>
#include <stdio.h>

int main()
{
#pragma omp target thread_limit(4)
{
printf("\n1:target thread_limit: %d\n", omp_get_thread_limit());
}

#pragma omp target thread_limit(3)
{
printf("\n2:target thread_limit: %d\n", omp_get_thread_limit());
}
return 0;
}
```

GCC-compiled binary execution: https://gcc.godbolt.org/z/Pjv3TWoTq
```
1:target thread_limit: 4
2:target thread_limit: 3
```

Clang/LLVM-compiled binary execution:
https://clang.godbolt.org/z/zdPbrdMPn
```
1:target thread_limit: 4
2:target thread_limit: 4
```

By my reading of the OpenMP spec GCC does the right thing here; cf.
<https://www.openmp.org/spec-html/5.2/openmpse12.html#x34-330002.4>:
> If a target construct with a thread_limit clause is encountered, the
thread-limit-var ICV from the data environment of the generated initial
task is instead set to an implementation defined value between one and
the value specified in the clause.

The common subexpression elimination (CSE) of the second call to
`omp_get_thread_limit` by LLVM does not seem to be correct, as it's not
an available expression at any program point(s) (in the scope of the
clause in question) after the second target construct with a
`thread_limit` clause is encountered.

Compiling with `-Rpass=openmp-opt -Rpass-analysis=openmp-opt
-Rpass-missed=openmp-opt` we have:
https://clang.godbolt.org/z/G7dfhP7jh
```
<source>:8:42: remark: OpenMP runtime call omp_get_thread_limit deduplicated. [OMP170] [-Rpass=openmp-opt]
8 | printf("\n1:target thread_limit: %d\n",omp_get_thread_limit());
| ^
```

OMP170 has the following explanation:
https://openmp.llvm.org/remarks/OMP170.html

> This optimization remark indicates that a call to an OpenMP runtime
call was replaced with the result of an existing one. This occurs when
the compiler knows that the result of a runtime call is immutable.
Removing duplicate calls is done by replacing all calls to that function
with the result of the first call. This cannot be done automatically by
the compiler because the implementations of the OpenMP runtime calls
live in a separate library the compiler cannot see.
This optimization will trigger for known OpenMP runtime calls whose
return value will not change.

At the same time I do not believe we have an analysis checking whether
this precondition holds here: "This occurs when the compiler knows that
the result of a runtime call is immutable."

AFAICT, such analysis doesn't appear to exist in the original patch
introducing deduplication, either:

-
9548b74a83
- https://reviews.llvm.org/D69930

The fix is to remove it from `DeduplicableRuntimeCallIDs`, effectively
reverting the addition in this commit (noting that `omp_get_max_threads`
is not present in `DeduplicableRuntimeCallIDs`, so it's possible this
addition was incorrect in the first place):

- [OpenMP][Opt] Annotate known runtime functions and deduplicate more,
-
e28936f613 (diff-de101c82aff66b2bda2d1f53fde3dde7b0d370f14f1ff37b7919ce38531230dfR123)

As a result, we're no longer unsoundly deduplicating the OpenMP runtime
call `omp_get_thread_limit` as illustrated by the test case: Note the
(correctly) repeated `call i32 @omp_get_thread_limit()`.

---------

Co-authored-by: Joseph Huber <huberjn@outlook.com>
2024-02-22 08:13:41 -06:00
Jessica Del
32f9983c06
[AMDGPU] - Add address space for strided buffers (#74471)
This is an experimental address space for strided buffers. These buffers
can have structs as elements and
a stride > 1.
These pointers allow the indexed access in units of stride, i.e., they
point at `buffer[index * stride]`.
Thus, we can use the `idxen` modifier for buffer loads.

We assign address space 9 to 192-bit buffer pointers which contain a
128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially,
they are fat buffer pointers with an additional 32-bit index.
2023-12-15 15:49:25 +01:00
Youngsuk Kim
c57ef2c698
[llvm][OpenMPOpt] Remove no-op ptr-to-ptr bitcast (NFC) (#73869)
* Remove a call to CreatePointerBitCastOrAddrSpaceCast which merely adds
a no-op ptr-to-ptr bitcast.

* Most of the diff is from removing checks for no-op ptr-to-ptr bitcasts
in relevant LIT tests
2023-11-29 20:47:37 -05:00
Johannes Doerfert
3de645efe3 [OpenMP][NFC] Split the reduction buffer size into two components
Before we tracked the size of the teams reduction buffer in order to
allocate it at runtime per kernel launch. This patch splits the number
into two parts, the size of the reduction data (=all reduction
variables) and the (maximal) length of the buffer. This will allow us to
allocate less if we need less, e.g., if we have less teams than the
maximal length. It also allows us to move code from clangs codegen into
the runtime as we now know how large the reduction data is.
2023-11-06 11:50:41 -08:00
Johannes Doerfert
d3e7a48cbd [OpenMP][NFC] Remove a no-op function 2023-11-03 10:28:36 -07:00
Johannes Doerfert
b8cbc5c02c
[OpenMP] Introduce the KernelLaunchEnvironment as implicit argument (#70401)
The KernelEnvironment is for compile time information about a kernel. It
allows the compiler to feed information to the runtime. The
KernelLaunchEnvironment is for dynamic information *per* kernel launch.
It allows the rutime to feed information to the kernel that is not
shared with other invocations of the kernel. The first use case is to
replace the globals that synchronize teams reductions with per-launch
versions. This allows concurrent teams reductions. More uses cases will
follow, e.g., per launch memory pools.

Fixes: https://github.com/llvm/llvm-project/issues/70249
2023-10-31 19:38:43 -07:00
Joseph Huber
e8c0ae60d7
[OpenMP] Add optimization to remove the RPC client (#70683)
Summary:
Part of the work done in the `libc` project is to provide host services
for things like `printf` or `malloc`, or generally any syscall-like
behaviour. This scheme works by emitting an externally visible global
called `__llvm_libc_rpc_client` that the host runtime can pick up to get
a handle to the global memory associated with the client. We use the
presence of this symbol to indicate whether or not we need to run an RPC
server. Normally, this symbol is only present if something requiring an
RPC server was linked in, such as `printf`. However, if this call to
`printf` was subsequently optimizated out, the symbol would remain and
cannot be removed (rightfully so) because of its linkage. This patch
adds a special-case optimization to remove this symbol so we can
indicate that an RPC server is no longer needed.

This patch puts this logic in `OpenMPOpt` as the most readily available
place for it. In the future, we should think how to move this somewhere
more generic. Furthermore, we use a hard-coded runtime name (which isn't
uncommon given all the other magic symbol names). But it might be nice
to abstract that part away.
2023-10-31 17:23:24 -05:00
Mehdi Amini
f390a76b7e Revert "Revert "[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257)""
This reverts commit ddbaa11e9f43a38d50d62a9b9b07c3653b6bf8ab.

Reapply the original commit, the broken test was repaired in 5e51363f38d083ab326736c0d4d1b5f9fe0de080 in the meantime.
2023-10-26 17:30:01 -07:00
Mehdi Amini
ddbaa11e9f Revert "[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257)"
This reverts commit c2a1249a8257ed033a98e32e425539c6da6700ec.

The MLIR bots are broken with an omp test failure.
2023-10-26 17:25:20 -07:00
Johannes Doerfert
c2a1249a82
[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257)
The runtime needs to know about the acceptable launch bounds, especially
if the compiler (middle- or backend) assumed those bounds. While this
patch does not yet inform the runtime, it stores the bounds in a place
that can/will be accessed and is associated with the kernel.
2023-10-26 14:46:55 -07:00
Alex Richardson
1e029cf53b Add missing REQUIRES lines to unbreak buildbots
Since e39f6c1844fab59c638d8059a6cf139adb42279a these tests require
a valid target in order to compute the data layout.
2023-10-26 14:28:40 -07:00
Alex Richardson
e39f6c1844 [opt] Infer DataLayout from triple if not specified
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.

One thing that is not currently possible to differentiate from a missing
datalayout `target datalayout = ""` in the IR file since the current
APIs don't allow detecting this case. If it is considered useful to
support this case (instead of passing "-data-layout=" on the command
line), I can change IR parsers to track whether they have seen such a
directive and change the callback type.

Differential Revision: https://reviews.llvm.org/D141060
2023-10-26 12:07:37 -07:00
Johannes Doerfert
0a0c23b9ce [OpenMPOpt][FIX] Properly track changes to NestedParallelism
If we update the state, or indicate a pessimistic fixpoint, we need to
consider NestedParallelism too.

Fixes part of https://github.com/llvm/llvm-project/issues/66708

That said, the reproducer still needs malloc which we don't support on
AMD GPU. Will be added later.
2023-10-20 19:28:09 -07:00
Daniel Woodworth
ac29405b93
[OpenMPOpt] Fix incorrect end-of-kernel barrier removal (#65670)
Barrier removal in OpenMPOpt normally removes barriers by proving that
they are redundant with barriers preceding them. However, it can't do
this with the "pseudo-barrier" at the end of kernels because that can't
be removed. Instead, it removes the barriers preceding the end of the
kernel which that end-of-kernel barrier is redundant with. However,
these barriers aren't always redundant with the end-of-kernel barrier
when loops are involved, and removing them can lead to incorrect results
in compiled code.

This change fixes this by requiring that these pre-end-of-kernel
barriers also have the kernel end as a unique successor before removing
them. It also changes the initialization of `ExitED` for kernels since
the kernel end is not an aligned barrier.
2023-09-27 09:35:42 -07:00
Shilei Tian
186a4b3b65
[LLVM][OpenMP] Allow OpenMPOpt to handle non-OpenMP target regions (#67075)
Current OpenMPOpt assumes all kernels are OpenMP kernels (aka. with
"kernel"
attribute). This doesn't hold if we mix OpenMP code and CUDA code by
lingking
them together because CUDA kernels are not annotated with the attribute.
This
patch removes the assumption and added a new counter for those
non-OpenMP
kernels.

Fix #66687.
2023-09-23 22:34:07 -04:00
Shilei Tian
22e1df7f5b
[LLVM][OpenMPOpt] Fix a crash when associated function is nullptr (#66274)
The associated function can be a nullptr if it is an indirect call.
This causes a crash in `CheckCallee` which always assumes the callee
is a valid pointer.

Fix #66904.
2023-09-13 20:22:59 -04:00
Johannes Doerfert
d47cf2bff3
[OpenMPOpt] Allow indirect calls in AAKernelInfoCallSite (#65836)
The Attributor has gained support for indirect calls but it is opt-in.
This patch makes AAKernelInfoCallSite able to handle multiple potential
callees.
2023-09-10 19:02:09 -07:00
Johannes Doerfert
67635b6e23 [OpenMPOpt][NFC] Precommit test 2023-09-08 22:40:33 -07:00
Shilei Tian
499f691be1 Revert "Reapply "[Attributor] Enable AAAddressSpace for OpenMPOpt (#65544)"""
This reverts commit c5525a6e8fb7f7c2ce7126ac5b17aaff01ac407f.
AMD BB is not happy again.
2023-09-08 15:46:23 -04:00
Shilei Tian
c5525a6e8f Reapply "[Attributor] Enable AAAddressSpace for OpenMPOpt (#65544)""
This reverts commit e592c2dcf5b7d2da6c2564f5d9990aa34079bad4 that
reverts e91e3cf.
2023-09-08 15:39:16 -04:00
Shilei Tian
e592c2dcf5 Revert "[Attributor] Enable AAAddressSpace for OpenMPOpt (#65544)"
This reverts commit e91e3cf0748a80e1d7219c13fa6a7622321f4936 because
AMD BB is not happy with it.
2023-09-07 12:31:11 -04:00
Shilei Tian
e91e3cf074
[Attributor] Enable AAAddressSpace for OpenMPOpt (#65544) 2023-09-07 12:23:52 -04:00
Johannes Doerfert
8b08287cb3 [OpenMPOpt] Eliminate assumptions only "late"
When we remove barriers, we might need to remove llvm.assume
assumptions as well. However, doing this early, thus in the module pass,
will cause us to miss out on information we might need. There are few
situations we can eliminate barriers across functions, for now we simply
disable elimination of barriers that require assumptions to be removed
during the early module pass.
2023-08-23 16:11:43 -07:00
Johannes Doerfert
78b8f1f78f [Attributor][FIX] Remove the visited set from AAInterFnReachability
The visited set was used to not visit the same function twice, however,
the (new) algorithm requires we do since we start the queries at
different call sites.
2023-08-23 11:48:18 -07:00
Johannes Doerfert
fb0e49f230 [OpenMP] Add noalias to runtime allocator functions 2023-08-17 19:25:32 -07:00
Johannes Doerfert
bfa1afb81c [OpenMPOpt] Improve __kmpc_alloc_shared handling
We know that __kmpc_alloc_shared is by construction matched with a
unique __kmpc_free_shared. Making the compiler aware of these facts
helps to avoid mallocs/allocas.

Fixes: https://github.com/llvm/llvm-project/issues/64551
2023-08-17 19:25:32 -07:00
Johannes Doerfert
4fcd5f93d6 [OpenMPOpt] Mark more runtime functions as SPMD compatible
Fixes: https://github.com/llvm/llvm-project/issues/64421
2023-08-17 18:33:24 -07:00
Johannes Doerfert
2ece6d939b [OpenMPOpt] SPMD-amenable implies no unknown parallel regions 2023-08-17 18:33:23 -07:00
Johannes Doerfert
dfc821ae89 [OpenMPOpt][FIX] Ensure a dependence for KernelEnvC queries
When other AAs query the current value of KernelEnvC via the callback
KernelConfigurationSimplifyCB we need to ensure they are now dependent
on the AAKernelInfo that is in charge of the KernelEnvC.
2023-08-10 23:16:25 -07:00
Johannes Doerfert
27f9a26668 [OpenMP][NFC] Precommit reduced test 2023-08-10 23:16:24 -07:00
Johannes Doerfert
fa367d159a [IR] Mark llvm.assume as memory(inaccessiblemem: write)
It was `inaccessiblemem: readwrite` before, no need for the read.
No real benefit is expected but it can help debugging and other efforts.

Differential Revision: https://reviews.llvm.org/D156478
2023-07-31 13:44:52 -07:00
Shilei Tian
10068cd654 [OpenMP] Introduce kernel environment
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.

This is a combination and refinement of patch series D116908, D116909, and D116910.

Depend on D155886.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142569
2023-07-26 13:35:14 -04:00
Johannes Doerfert
88b5d23021 [Attributor] Allow multiple LHS/RHS values when simplifying comparisons
We use to deal with multiple values but not in the handleCmp function.
Now we also allow multiple simplified operands there.
2023-07-25 20:31:21 -07:00
Johannes Doerfert
0cd8a28941 [Attributor][FIX] No IntraFnReachability does not mean unreachable
Also, first check inter fn reachability as it seems to be cheaper in
practise.
2023-07-25 17:47:33 -07:00
Johannes Doerfert
4223c9b354 [Attributor] Always deduce nosync from readonly + non-convergent
This adds the deduction also if the function is not IPO amendable.
2023-07-25 17:47:33 -07:00
Joseph Huber
05b181d851 [OpenMP] Make the nested parallelism global hidden
Summary:
These will probably be removed with the kernel environment, but they
should have hidden visibliity so they can be optimized out.
2023-07-24 08:28:54 -05:00
Shilei Tian
6bd74fd65f Revert commits for kernel environment
This reverts commits for kernel environments as they causes issues in AMD BB.
2023-07-23 23:32:31 -04:00
Shilei Tian
c5c8040390 [OpenMP] Introduce kernel environment
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.

This is a combination and refinement of patch series D116908, D116909, and D116910.

Depend on D155886.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142569
2023-07-23 18:36:01 -04:00
Johannes Doerfert
232ce90541 [OpenMP][FIX] Adjust "known" attributes for runtime functions
This showed up when we started to deduce readnone for the argument of
__kmpc_global_thread_num. The known attributes for "getters" did not
allow to read arguments, but that is sometimes the case.
2023-07-14 17:01:48 -07:00
Johannes Doerfert
4dc5662c27 [Attributor][NFC] Update all tests with the script
Three tests needed manual adjustment after
https://reviews.llvm.org/D148216 got reverted. See
https://github.com/llvm/llvm-project/issues/63746.
2023-07-14 13:53:38 -07:00
Matt Arsenault
357d19a8fd OpenMP: Convert some tests to opaque pointers 2023-07-11 18:03:20 -04:00
Johannes Doerfert
02a4fcec6b [Attributor] Port AANonNull to the isImpliedByIR interface
AANonNull is now the first AA that is always queried via the new APIs
and not created manually. Others will follow shortly to avoid trivial
AAs whenever possible.

This commit introduced some helper logic that will make it simpler to
port the next one. It also untangles AADereferenceable and AANonNull
such that the former does not keep a handle on the latter. Finally,
we stop deducing `nonnull` for `undef`, which was incorrect.
2023-07-09 16:04:19 -07:00
Johannes Doerfert
fe12d313ba [OpenMPOpt][FIX] Propagate IsReachingAlignedBarrier flag through calls 2023-07-07 16:38:34 -07:00
Johannes Doerfert
7e77e812ab [Attributor][FIX] Require the store to be aligned for value propagation 2023-07-07 16:38:34 -07:00
Johannes Doerfert
24656e995a [OpenMPOpt] The kernel end is not necessarily an aligned barrier
A kernel can be exited in a non-aligned fashion, so we cannot pretend it
always ends in an aligned barrier. Instead, we require an explicit
aligned barrier as we lack a divergence analysis at this point.
2023-07-07 16:38:34 -07:00
Johannes Doerfert
4009f84d2d [OpenMPOpt] Check for execution with an aligned barrier
If the next or last synchronizing instruction was an aligned barrier,
the instruction is executed in an aligned region.
2023-07-07 16:38:33 -07:00
Johannes Doerfert
3a3ea43078 [OpenMPOpt][NFC] Precommit test for AAExecutionDomain bug 2023-07-07 16:38:33 -07:00
Johannes Doerfert
77dbd1d712 [Attributor][NFCI] Manifest assumption attributes explicitly
We had some custom manifest for assumption attributes but we use the
generic manifest logic. If we later decide to curb duplication (of
attributes on the call site and callee), we can do that at a single
location and for all attributes.

The test changes basically add known `llvm.assume` callee information to
the call sites.
2023-07-03 11:57:29 -07:00
Johannes Doerfert
b672c602c7 [Attributor][NFCI] Merge MemoryEffects explicitly
We had some custom handling for existing MemoryEffects but we now move
it to the place we check other existing attributes before we manifest
new ones. If we later decide to curb duplication (of attributes on the
call site and callee), we can do that at a single location and for all
attributes.

The test changes basically add known `memory` callee information to the
call sites.
2023-07-03 11:57:29 -07:00
Johannes Doerfert
d33bca840a [Attributor] Introduce helpers to judge AAs prior to creation
This is a partial cleanup to centralize the initialization and update
decisions for AAs. Lifting the burdon and boilerplate on users and
making it harder to accidentally perform unsound deductions.

The two static helpers show how we can lift the decisions to generate an
AA into the Attributor, avoiding trivial AAs that just cost us compile
time and maintenance code (to check for pre-conditions).
2023-06-29 12:32:45 -07:00