llvm-project

Author	SHA1	Message	Date
Matt	88e31f64a0	[OpenMP][FIX] Remove unsound omp_get_thread_limit deduplication (#79524)
The deduplication of the calls to `omp_get_thread_limit` used to be legal when originally added in <`e28936f613 (diff-de101c82aff66b2bda2d1f53fde3dde7b0d370f14f1ff37b7919ce38531230dfR123)`>, as the result (thread_limit) was immutable.
However, now that we have the `thread_limit` clause, we no longer have immutability; therefore `omp_get_thread_limit()` is not a deduplicable runtime call.
Thus, removing `omp_get_thread_limit` from the `DeduplicableRuntimeCallIDs` array.
Here's a simple example:
```
#include <omp.h>
#include <stdio.h>

int main() {
#pragma omp target thread_limit(4)
  {
    printf("\n1:target thread_limit: %d\n", omp_get_thread_limit());
  }
#pragma omp target thread_limit(3)
  {
    printf("\n2:target thread_limit: %d\n", omp_get_thread_limit());
  }
  return 0;
}
```
GCC-compiled binary execution: https://gcc.godbolt.org/z/Pjv3TWoTq
```
1:target thread_limit: 4
2:target thread_limit: 3
```
Clang/LLVM-compiled binary execution: https://clang.godbolt.org/z/zdPbrdMPn
```
1:target thread_limit: 4
2:target thread_limit: 4
```
By my reading of the OpenMP spec GCC does the right thing here; cf. <https://www.openmp.org/spec-html/5.2/openmpse12.html#x34-330002.4>:
> If a target construct with a thread_limit clause is encountered, the thread-limit-var ICV from the data environment of the generated initial task is instead set to an implementation defined value between one and the value specified in the clause.
The common subexpression elimination (CSE) of the second call to `omp_get_thread_limit` by LLVM does not seem to be correct, as it's not an available expression at any program point(s) (in the scope of the clause in question) after the second target construct with a `thread_limit` clause is encountered.
Compiling with `-Rpass=openmp-opt -Rpass-analysis=openmp-opt -Rpass-missed=openmp-opt` we have: https://clang.godbolt.org/z/G7dfhP7jh
```
<source>:8:42: remark: OpenMP runtime call omp_get_thread_limit deduplicated. [OMP170] [-Rpass=openmp-opt]
    8 |   printf("\n1:target thread_limit: %d\n",omp_get_thread_limit());
      |                                          ^
```
OMP170 has the following explanation: https://openmp.llvm.org/remarks/OMP170.html
> This optimization remark indicates that a call to an OpenMP runtime call was replaced with the result of an existing one. This occurs when the compiler knows that the result of a runtime call is immutable. Removing duplicate calls is done by replacing all calls to that function with the result of the first call. This cannot be done automatically by the compiler because the implementations of the OpenMP runtime calls live in a separate library the compiler cannot see. This optimization will trigger for known OpenMP runtime calls whose return value will not change.
At the same time I do not believe we have an analysis checking whether this precondition holds here: "This occurs when the compiler knows that the result of a runtime call is immutable."
AFAICT, such analysis doesn't appear to exist in the original patch introducing deduplication, either:
- `9548b74a83`
- https://reviews.llvm.org/D69930
The fix is to remove it from `DeduplicableRuntimeCallIDs`, effectively reverting the addition in this commit (noting that `omp_get_max_threads` is not present in `DeduplicableRuntimeCallIDs`, so it's possible this addition was incorrect in the first place):
- [OpenMP][Opt] Annotate known runtime functions and deduplicate more,
- `e28936f613 (diff-de101c82aff66b2bda2d1f53fde3dde7b0d370f14f1ff37b7919ce38531230dfR123)`
As a result, we're no longer unsoundly deduplicating the OpenMP runtime call `omp_get_thread_limit` as illustrated by the test case: Note the (correctly) repeated `call i32 @omp_get_thread_limit()`.
---------
Co-authored-by: Joseph Huber <huberjn@outlook.com>	2024-02-22 08:13:41 -06:00
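The unsoundness described in the commit above can be modeled without an OpenMP toolchain. The following is a minimal sketch, not LLVM's implementation: `FakeRuntime`, `enterTarget`, `reportDeduplicated`, and `reportRequeried` are hypothetical names invented for illustration. It treats the thread limit as mutable state and, as a simplification, sets it to exactly the clause value (the spec only requires a value between one and the clause value).

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for the OpenMP runtime: the thread limit is mutable
// state that each `target thread_limit(N)` region may change.
struct FakeRuntime {
  int threadLimit = 0;
  void enterTarget(int limit) { threadLimit = limit; }
  int getThreadLimit() const { return threadLimit; }
};

// Models the old behavior: the second call is replaced by the first result.
std::pair<int, int> reportDeduplicated(FakeRuntime &rt) {
  rt.enterTarget(4);
  int first = rt.getThreadLimit();
  rt.enterTarget(3);
  int second = first; // stale: the limit changed after `first` was read
  return {first, second};
}

// Models the fixed behavior: the call is repeated in each region.
std::pair<int, int> reportRequeried(FakeRuntime &rt) {
  rt.enterTarget(4);
  int first = rt.getThreadLimit();
  rt.enterTarget(3);
  int second = rt.getThreadLimit();
  return {first, second};
}
```

With a fresh `FakeRuntime`, `reportDeduplicated` yields the pair (4, 4), mirroring the Clang/LLVM output quoted in the commit message, while `reportRequeried` yields (4, 3), mirroring the GCC output.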
Jessica Del	32f9983c06	[AMDGPU] - Add address space for strided buffers (#74471) This is an experimental address space for strided buffers. These buffers can have structs as elements and a stride > 1. These pointers allow the indexed access in units of stride, i.e., they point at `buffer[index * stride]`. Thus, we can use the `idxen` modifier for buffer loads. We assign address space 9 to 192-bit buffer pointers which contain a 128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially, they are fat buffer pointers with an additional 32-bit index.	2023-12-15 15:49:25 +01:00
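To make the 192-bit layout concrete, here is a rough host-side sketch. The struct and function names are hypothetical, and the field order is an assumption: the commit message only states the three components, their widths, and that the pointer refers to `buffer[index * stride]`. Adding the offset on top of that address is likewise an assumption about how the fields combine.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of an address-space-9 pointer: 128 + 32 + 32 = 192 bits.
struct StridedBufferPointer {
  std::array<std::uint32_t, 4> descriptor; // 128-bit buffer descriptor
  std::uint32_t offset;                    // 32-bit byte offset
  std::uint32_t index;                     // 32-bit element index
};

static_assert(sizeof(StridedBufferPointer) * 8 == 192,
              "descriptor + offset + index should occupy 192 bits");

// Byte position inside the buffer, assuming the offset is added to
// index * stride (the stride is given in bytes here).
constexpr std::uint64_t byteAddress(const StridedBufferPointer &p,
                                    std::uint32_t strideBytes) {
  return std::uint64_t{p.index} * strideBytes + p.offset;
}
```

For instance, under these assumptions a pointer with index 3 and offset 4 into a buffer with a 16-byte stride refers to byte 3 * 16 + 4 = 52.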
Youngsuk Kim	c57ef2c698	[llvm][OpenMPOpt] Remove no-op ptr-to-ptr bitcast (NFC) (#73869) * Remove a call to CreatePointerBitCastOrAddrSpaceCast which merely adds a no-op ptr-to-ptr bitcast. * Most of the diff is from removing checks for no-op ptr-to-ptr bitcasts in relevant LIT tests	2023-11-29 20:47:37 -05:00
Johannes Doerfert	3de645efe3	[OpenMP][NFC] Split the reduction buffer size into two components Before, we tracked the size of the teams reduction buffer in order to allocate it at runtime per kernel launch. This patch splits the number into two parts, the size of the reduction data (=all reduction variables) and the (maximal) length of the buffer. This will allow us to allocate less if we need less, e.g., if we have fewer teams than the maximal length. It also allows us to move code from Clang's codegen into the runtime as we now know how large the reduction data is.	2023-11-06 11:50:41 -08:00
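The benefit of splitting the size into two components can be illustrated with a small calculation. This is only a sketch: the function and parameter names are hypothetical, and the `min` rule is an assumption drawn from the remark about allocating less when there are fewer teams than the maximal length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Bytes needed for the teams reduction buffer of one kernel launch.
// reductionDataSize: size in bytes of all reduction variables together.
// maxBufferLength:   maximal number of slots the buffer may have.
// numTeams:          teams actually launched (hypothetical runtime input).
constexpr std::size_t reductionBufferBytes(std::size_t reductionDataSize,
                                           std::size_t maxBufferLength,
                                           std::size_t numTeams) {
  return reductionDataSize * std::min(maxBufferLength, numTeams);
}
```

With 24 bytes of reduction data and a maximal length of 1024, a launch with 64 teams would need 24 * 64 = 1536 bytes under this rule, rather than the 24 * 1024 = 24576 bytes implied by a single worst-case size.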
Johannes Doerfert	d3e7a48cbd	[OpenMP][NFC] Remove a no-op function	2023-11-03 10:28:36 -07:00
Johannes Doerfert	b8cbc5c02c	[OpenMP] Introduce the KernelLaunchEnvironment as implicit argument (#70401) The KernelEnvironment is for compile time information about a kernel. It allows the compiler to feed information to the runtime. The KernelLaunchEnvironment is for dynamic information per kernel launch. It allows the runtime to feed information to the kernel that is not shared with other invocations of the kernel. The first use case is to replace the globals that synchronize teams reductions with per-launch versions. This allows concurrent teams reductions. More use cases will follow, e.g., per launch memory pools. Fixes: https://github.com/llvm/llvm-project/issues/70249	2023-10-31 19:38:43 -07:00
Joseph Huber	e8c0ae60d7	[OpenMP] Add optimization to remove the RPC client (#70683) Summary: Part of the work done in the `libc` project is to provide host services for things like `printf` or `malloc`, or generally any syscall-like behaviour. This scheme works by emitting an externally visible global called `__llvm_libc_rpc_client` that the host runtime can pick up to get a handle to the global memory associated with the client. We use the presence of this symbol to indicate whether or not we need to run an RPC server. Normally, this symbol is only present if something requiring an RPC server was linked in, such as `printf`. However, if this call to `printf` was subsequently optimized out, the symbol would remain and cannot be removed (rightfully so) because of its linkage. This patch adds a special-case optimization to remove this symbol so we can indicate that an RPC server is no longer needed. This patch puts this logic in `OpenMPOpt` as the most readily available place for it. In the future, we should think about how to move this somewhere more generic. Furthermore, we use a hard-coded runtime name (which isn't uncommon given all the other magic symbol names). But it might be nice to abstract that part away.	2023-10-31 17:23:24 -05:00
Mehdi Amini	f390a76b7e	Revert "Revert "[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257)"" This reverts commit ddbaa11e9f43a38d50d62a9b9b07c3653b6bf8ab. Reapply the original commit, the broken test was repaired in 5e51363f38d083ab326736c0d4d1b5f9fe0de080 in the meantime.	2023-10-26 17:30:01 -07:00
Mehdi Amini	ddbaa11e9f	Revert "[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257)" This reverts commit c2a1249a8257ed033a98e32e425539c6da6700ec. The MLIR bots are broken with an omp test failure.	2023-10-26 17:25:20 -07:00
Johannes Doerfert	c2a1249a82	[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257) The runtime needs to know about the acceptable launch bounds, especially if the compiler (middle- or backend) assumed those bounds. While this patch does not yet inform the runtime, it stores the bounds in a place that can/will be accessed and is associated with the kernel.	2023-10-26 14:46:55 -07:00
Alex Richardson	1e029cf53b	Add missing REQUIRES lines to unbreak buildbots Since e39f6c1844fab59c638d8059a6cf139adb42279a these tests require a valid target in order to compute the data layout.	2023-10-26 14:28:40 -07:00
Alex Richardson	e39f6c1844	[opt] Infer DataLayout from triple if not specified There are many tests that specify a target triple/CPU flags but no DataLayout which can lead to IR being generated that has unusual behaviour. This commit attempts to use the default DataLayout based on the relevant flags if there is no explicit override on the command line or in the IR file. One thing that is currently not possible is differentiating a missing datalayout from an explicit `target datalayout = ""` in the IR file, since the current APIs don't allow detecting this case. If it is considered useful to support this case (instead of passing "-data-layout=" on the command line), I can change IR parsers to track whether they have seen such a directive and change the callback type. Differential Revision: https://reviews.llvm.org/D141060	2023-10-26 12:07:37 -07:00
Johannes Doerfert	0a0c23b9ce	[OpenMPOpt][FIX] Properly track changes to NestedParallelism If we update the state, or indicate a pessimistic fixpoint, we need to consider NestedParallelism too. Fixes part of https://github.com/llvm/llvm-project/issues/66708 That said, the reproducer still needs malloc which we don't support on AMD GPU. Will be added later.	2023-10-20 19:28:09 -07:00
Daniel Woodworth	ac29405b93	[OpenMPOpt] Fix incorrect end-of-kernel barrier removal (#65670) Barrier removal in OpenMPOpt normally removes barriers by proving that they are redundant with barriers preceding them. However, it can't do this with the "pseudo-barrier" at the end of kernels because that can't be removed. Instead, it removes the barriers preceding the end of the kernel which that end-of-kernel barrier is redundant with. However, these barriers aren't always redundant with the end-of-kernel barrier when loops are involved, and removing them can lead to incorrect results in compiled code. This change fixes this by requiring that these pre-end-of-kernel barriers also have the kernel end as a unique successor before removing them. It also changes the initialization of `ExitED` for kernels since the kernel end is not an aligned barrier.	2023-09-27 09:35:42 -07:00
Shilei Tian	186a4b3b65	[LLVM][OpenMP] Allow OpenMPOpt to handle non-OpenMP target regions (#67075) Currently, OpenMPOpt assumes all kernels are OpenMP kernels (i.e., have the "kernel" attribute). This doesn't hold if we mix OpenMP code and CUDA code by linking them together, because CUDA kernels are not annotated with the attribute. This patch removes the assumption and adds a new counter for those non-OpenMP kernels. Fix #66687.	2023-09-23 22:34:07 -04:00
Shilei Tian	22e1df7f5b	[LLVM][OpenMPOpt] Fix a crash when associated function is nullptr (#66274) The associated function can be a nullptr if it is an indirect call. This causes a crash in `CheckCallee` which always assumes the callee is a valid pointer. Fix #66904.	2023-09-13 20:22:59 -04:00
Johannes Doerfert	d47cf2bff3	[OpenMPOpt] Allow indirect calls in AAKernelInfoCallSite (#65836) The Attributor has gained support for indirect calls but it is opt-in. This patch makes AAKernelInfoCallSite able to handle multiple potential callees.	2023-09-10 19:02:09 -07:00
Johannes Doerfert	67635b6e23	[OpenMPOpt][NFC] Precommit test	2023-09-08 22:40:33 -07:00
Shilei Tian	499f691be1	Revert "Reapply "[Attributor] Enable AAAddressSpace for OpenMPOpt (#65544)""" This reverts commit c5525a6e8fb7f7c2ce7126ac5b17aaff01ac407f. AMD BB is not happy again.	2023-09-08 15:46:23 -04:00
Shilei Tian	c5525a6e8f	Reapply "[Attributor] Enable AAAddressSpace for OpenMPOpt (#65544)"" This reverts commit e592c2dcf5b7d2da6c2564f5d9990aa34079bad4 that reverts e91e3cf.	2023-09-08 15:39:16 -04:00
Shilei Tian	e592c2dcf5	Revert "[Attributor] Enable AAAddressSpace for OpenMPOpt (#65544)" This reverts commit e91e3cf0748a80e1d7219c13fa6a7622321f4936 because AMD BB is not happy with it.	2023-09-07 12:31:11 -04:00
Shilei Tian	e91e3cf074	[Attributor] Enable AAAddressSpace for OpenMPOpt (#65544)	2023-09-07 12:23:52 -04:00
Johannes Doerfert	8b08287cb3	[OpenMPOpt] Eliminate assumptions only "late" When we remove barriers, we might need to remove llvm.assume assumptions as well. However, doing this early, thus in the module pass, will cause us to miss out on information we might need. There are few situations in which we can eliminate barriers across functions; for now we simply disable elimination of barriers that require assumptions to be removed during the early module pass.	2023-08-23 16:11:43 -07:00
Johannes Doerfert	78b8f1f78f	[Attributor][FIX] Remove the visited set from AAInterFnReachability The visited set was used to not visit the same function twice, however, the (new) algorithm requires we do since we start the queries at different call sites.	2023-08-23 11:48:18 -07:00
Johannes Doerfert	fb0e49f230	[OpenMP] Add `noalias` to runtime allocator functions	2023-08-17 19:25:32 -07:00
Johannes Doerfert	bfa1afb81c	[OpenMPOpt] Improve __kmpc_alloc_shared handling We know that __kmpc_alloc_shared is by construction matched with a unique __kmpc_free_shared. Making the compiler aware of these facts helps to avoid mallocs/allocas. Fixes: https://github.com/llvm/llvm-project/issues/64551	2023-08-17 19:25:32 -07:00
Johannes Doerfert	4fcd5f93d6	[OpenMPOpt] Mark more runtime functions as SPMD compatible Fixes: https://github.com/llvm/llvm-project/issues/64421	2023-08-17 18:33:24 -07:00
Johannes Doerfert	2ece6d939b	[OpenMPOpt] SPMD-amenable implies no unknown parallel regions	2023-08-17 18:33:23 -07:00
Johannes Doerfert	dfc821ae89	[OpenMPOpt][FIX] Ensure a dependence for KernelEnvC queries When other AAs query the current value of KernelEnvC via the callback KernelConfigurationSimplifyCB we need to ensure they are now dependent on the AAKernelInfo that is in charge of the KernelEnvC.	2023-08-10 23:16:25 -07:00
Johannes Doerfert	27f9a26668	[OpenMP][NFC] Precommit reduced test	2023-08-10 23:16:24 -07:00
Johannes Doerfert	fa367d159a	[IR] Mark `llvm.assume` as `memory(inaccessiblemem: write)` It was `inaccessiblemem: readwrite` before, no need for the read. No real benefit is expected but it can help debugging and other efforts. Differential Revision: https://reviews.llvm.org/D156478	2023-07-31 13:44:52 -07:00
Shilei Tian	10068cd654	[OpenMP] Introduce kernel environment This patch introduces a per-kernel environment. Previously, flags such as execution mode were set through global variables with names like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not tracked on a per-kernel basis, preventing us from applying per-kernel optimizations in the device runtime. This is a combination and refinement of patch series D116908, D116909, and D116910. Depends on D155886. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142569	2023-07-26 13:35:14 -04:00
Johannes Doerfert	88b5d23021	[Attributor] Allow multiple LHS/RHS values when simplifying comparisons We used to deal with multiple values but not in the handleCmp function. Now we also allow multiple simplified operands there.	2023-07-25 20:31:21 -07:00
Johannes Doerfert	0cd8a28941	[Attributor][FIX] No IntraFnReachability does not mean unreachable Also, first check inter fn reachability as it seems to be cheaper in practice.	2023-07-25 17:47:33 -07:00
Johannes Doerfert	4223c9b354	[Attributor] Always deduce nosync from readonly + non-convergent This adds the deduction also if the function is not IPO amendable.	2023-07-25 17:47:33 -07:00
Joseph Huber	05b181d851	[OpenMP] Make the nested parallelism global hidden Summary: These will probably be removed with the kernel environment, but they should have hidden visibility so they can be optimized out.	2023-07-24 08:28:54 -05:00
Shilei Tian	6bd74fd65f	Revert commits for kernel environment This reverts commits for kernel environments as they cause issues in AMD BB.	2023-07-23 23:32:31 -04:00
Shilei Tian	c5c8040390	[OpenMP] Introduce kernel environment This patch introduces a per-kernel environment. Previously, flags such as execution mode were set through global variables with names like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not tracked on a per-kernel basis, preventing us from applying per-kernel optimizations in the device runtime. This is a combination and refinement of patch series D116908, D116909, and D116910. Depends on D155886. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142569	2023-07-23 18:36:01 -04:00
Johannes Doerfert	232ce90541	[OpenMP][FIX] Adjust "known" attributes for runtime functions This showed up when we started to deduce readnone for the argument of __kmpc_global_thread_num. The known attributes for "getters" did not allow reading arguments, but that is sometimes the case.	2023-07-14 17:01:48 -07:00
Johannes Doerfert	4dc5662c27	[Attributor][NFC] Update all tests with the script Three tests needed manual adjustment after https://reviews.llvm.org/D148216 got reverted. See https://github.com/llvm/llvm-project/issues/63746.	2023-07-14 13:53:38 -07:00
Matt Arsenault	357d19a8fd	OpenMP: Convert some tests to opaque pointers	2023-07-11 18:03:20 -04:00
Johannes Doerfert	02a4fcec6b	[Attributor] Port AANonNull to the isImpliedByIR interface AANonNull is now the first AA that is always queried via the new APIs and not created manually. Others will follow shortly to avoid trivial AAs whenever possible. This commit introduced some helper logic that will make it simpler to port the next one. It also untangles AADereferenceable and AANonNull such that the former does not keep a handle on the latter. Finally, we stop deducing `nonnull` for `undef`, which was incorrect.	2023-07-09 16:04:19 -07:00
Johannes Doerfert	fe12d313ba	[OpenMPOpt][FIX] Propagate IsReachingAlignedBarrier flag through calls	2023-07-07 16:38:34 -07:00
Johannes Doerfert	7e77e812ab	[Attributor][FIX] Require the store to be aligned for value propagation	2023-07-07 16:38:34 -07:00
Johannes Doerfert	24656e995a	[OpenMPOpt] The kernel end is not necessarily an aligned barrier A kernel can be exited in a non-aligned fashion, so we cannot pretend it always ends in an aligned barrier. Instead, we require an explicit aligned barrier as we lack a divergence analysis at this point.	2023-07-07 16:38:34 -07:00
Johannes Doerfert	4009f84d2d	[OpenMPOpt] Check for execution with an aligned barrier If the next or last synchronizing instruction was an aligned barrier, the instruction is executed in an aligned region.	2023-07-07 16:38:33 -07:00
Johannes Doerfert	3a3ea43078	[OpenMPOpt][NFC] Precommit test for AAExecutionDomain bug	2023-07-07 16:38:33 -07:00
Johannes Doerfert	77dbd1d712	[Attributor][NFCI] Manifest assumption attributes explicitly We had some custom manifest for assumption attributes but we now use the generic manifest logic. If we later decide to curb duplication (of attributes on the call site and callee), we can do that at a single location and for all attributes. The test changes basically add known `llvm.assume` callee information to the call sites.	2023-07-03 11:57:29 -07:00
Johannes Doerfert	b672c602c7	[Attributor][NFCI] Merge MemoryEffects explicitly We had some custom handling for existing MemoryEffects but we now move it to the place we check other existing attributes before we manifest new ones. If we later decide to curb duplication (of attributes on the call site and callee), we can do that at a single location and for all attributes. The test changes basically add known `memory` callee information to the call sites.	2023-07-03 11:57:29 -07:00
Johannes Doerfert	d33bca840a	[Attributor] Introduce helpers to judge AAs prior to creation This is a partial cleanup to centralize the initialization and update decisions for AAs. It lifts the burden and boilerplate from users and makes it harder to accidentally perform unsound deductions. The two static helpers show how we can lift the decisions to generate an AA into the Attributor, avoiding trivial AAs that just cost us compile time and maintenance code (to check for pre-conditions).	2023-06-29 12:32:45 -07:00

347 Commits