llvm-project

Author	SHA1	Message	Date
Kazu Hirata	f4a3309c9a	[IPO] Avoid repeated hash lookups (NFC) (#108796)	2024-09-16 06:44:34 -07:00
Sergio Afonso	07bef02831	[OpenMPOpt] Initialize OpenMPIRBuilderConfig::IsGPU flag (#104456 ) This patch ensures the `IsGPU` flag is set by the OpenMPOpt pass, so that it can be relied upon by `OpenMPIRBuilder` methods when called by that pass as well. Since currently there are very limited callers for the `OpenMPIRBuilder::isGPU()` method, no assertions are being triggered by the lack of initialization of this flag. However, when more offloading-related features are implemented, it will eventually start happening.	2024-09-05 12:30:20 +01:00
Johannes Doerfert	2641ed7d26	[OpenMP][FIX] Check for requirements early (#104836 ) If we can't transform the region to SPMD, we should not wait till the end to decide that. Other AAs might assume SPMD, and we did set the constant initializer to indicate SPMD, but we did not change the code properly.	2024-08-20 09:05:23 -07:00
Shilei Tian	907c7eb311	[Attributor] Enable `AAAddressSpace` in `OpenMPOpt` (#104363 ) This reverts commit e592c2dcf5b7d2da6c2564f5d9990aa34079bad4. We can finally reland the PR since the issue that caused the PR to be reverted has been resolved in https://github.com/llvm/llvm-project/pull/104051.	2024-08-16 13:33:48 -04:00
Matt Arsenault	23209d1c1a	OpenMPOpt: Remove dead include	2024-08-09 20:52:27 +04:00
Arthur Eubanks	58bc98cd3a	[CallGraphUpdater] Remove some legacy pass manager support (#98362 ) We don't have any legacy pass manager CGSCC passes that modify the call graph (we only use it in the codegen pipeline to run function passes in call graph order). This is the beginning of removing CallGraphUpdater and making all the relevant CGSCC passes directly use the new pass manager APIs.	2024-07-12 10:02:50 -07:00
Youngsuk Kim	2051736f7b	[llvm][Transforms] Avoid 'raw_string_ostream::str' (NFC) Since `raw_string_ostream` doesn't own the string buffer, it is desirable (in terms of memory safety) for users to directly reference the string buffer rather than use `raw_string_ostream::str()`. Work towards TODO comment to remove `raw_string_ostream::str()`.	2024-06-30 09:03:29 -05:00
Nikita Popov	2d209d964a	[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902 ) This is a helper to avoid writing `getModule()->getDataLayout()`. I regularly try to use this method only to remember it doesn't exist... `getModule()->getDataLayout()` is also a common (the most common?) reason why code has to include the Module.h header.	2024-06-27 16:38:15 +02:00
Stephen Tozer	d75f9dd1d2	Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497 )" Reverts the above commit, as it updates a common header function and did not update all callsites: https://lab.llvm.org/buildbot/#/builders/29/builds/382 This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.	2024-06-24 18:00:22 +01:00
Stephen Tozer	6481dc5761	[IR][NFC] Update IRBuilder to use InsertPosition (#96497 ) Uses the new InsertPosition class (added in #94226) to simplify some of the IRBuilder interface, and removes the need to pass a BasicBlock alongside a BasicBlock::iterator, using the fact that we can now get the parent basic block from the iterator even if it points to the sentinel. This patch removes the BasicBlock argument from each constructor or call to setInsertPoint. This has no functional effect, but later on as we look to remove the `Instruction *InsertBefore` argument from instruction-creation (discussed [here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)), this will simplify the process by allowing us to deprecate the InsertPosition constructor directly and catch all the cases where we use instructions rather than iterators.	2024-06-24 17:27:43 +01:00
Sirraide	c44fa3e8a9	[Clang] Refactor `__attribute__((assume))` (#84934) This is a followup to #81014 and #84582: Before this patch, Clang would accept `__attribute__((assume))` and `[[clang::assume]]` as nonstandard spellings for the `[[omp::assume]]` attribute; this resulted in a potentially very confusing name clash with C++23’s `[[assume]]` attribute (and GCC’s `assume` attribute with the same semantics). This PR replaces every usage of `__attribute__((assume))` with `[[omp::assume]]` and makes `__attribute__((assume))` and `[[clang::assume]]` alternative spellings for C++23’s `[[assume]]`; this shouldn’t cause any problems due to differences in appertainment and because almost no-one was using this variant spelling to begin with (a use in libclc has already been changed to use a different attribute).	2024-05-22 17:58:48 +02:00
Jeremy Morse	2fe81edef6	[NFC][RemoveDIs] Insert instruction using iterators in Transforms/ As part of the RemoveDIs project we need LLVM to insert instructions using iterators wherever possible, so that the iterators can carry a bit of debug-info. This commit implements some of that by updating the contents of llvm/lib/Transforms/Utils to always use iterator-versions of instruction constructors. There are two general flavours of update: * Almost all call-sites just call getIterator on an instruction * Several make use of an existing iterator (scenarios where the code is actually significant for debug-info) The underlying logic is that any call to getFirstInsertionPt or similar APIs that identify the start of a block need to have that iterator passed directly to the insertion function, without being converted to a bare Instruction pointer along the way. Noteworthy changes: * FindInsertedValue now takes an optional iterator rather than an instruction pointer, as we need to always insert with iterators, * I've added a few iterator-taking versions of some value-tracking and DomTree methods -- they just unwrap the iterator. These are purely convenience methods to avoid extra syntax in some passes. * A few calls to getNextNode become std::next instead (to keep in the theme of using iterators for positions), * SeparateConstOffsetFromGEP has its insertion-position field changed. Noteworthy because it's not a purely localised spelling change. All this should be NFC.	2024-03-05 15:12:22 +00:00
Matt	88e31f64a0	[OpenMP][FIX] Remove unsound omp_get_thread_limit deduplication (#79524)
The deduplication of the calls to `omp_get_thread_limit` used to be legal when originally added in <`e28936f613 (diff-de101c82aff66b2bda2d1f53fde3dde7b0d370f14f1ff37b7919ce38531230dfR123)`>, as the result (thread_limit) was immutable. However, now that we have `thread_limit` clause, we no longer have immutability; therefore `omp_get_thread_limit()` is not a deduplicable runtime call. Thus, removing `omp_get_thread_limit` from the `DeduplicableRuntimeCallIDs` array.
Here's a simple example:
```
#include <omp.h>
#include <stdio.h>
int main() {
#pragma omp target thread_limit(4)
  {
    printf("\n1:target thread_limit: %d\n", omp_get_thread_limit());
  }
#pragma omp target thread_limit(3)
  {
    printf("\n2:target thread_limit: %d\n", omp_get_thread_limit());
  }
  return 0;
}
```
GCC-compiled binary execution: https://gcc.godbolt.org/z/Pjv3TWoTq
```
1:target thread_limit: 4
2:target thread_limit: 3
```
Clang/LLVM-compiled binary execution: https://clang.godbolt.org/z/zdPbrdMPn
```
1:target thread_limit: 4
2:target thread_limit: 4
```
By my reading of the OpenMP spec GCC does the right thing here; cf. <https://www.openmp.org/spec-html/5.2/openmpse12.html#x34-330002.4>:
> If a target construct with a thread_limit clause is encountered, the thread-limit-var ICV from the data environment of the generated initial task is instead set to an implementation defined value between one and the value specified in the clause.
The common subexpression elimination (CSE) of the second call to `omp_get_thread_limit` by LLVM does not seem to be correct, as it's not an available expression at any program point(s) (in the scope of the clause in question) after the second target construct with a `thread_limit` clause is encountered.
Compiling with `-Rpass=openmp-opt -Rpass-analysis=openmp-opt -Rpass-missed=openmp-opt` we have: https://clang.godbolt.org/z/G7dfhP7jh
```
<source>:8:42: remark: OpenMP runtime call omp_get_thread_limit deduplicated. [OMP170] [-Rpass=openmp-opt]
    8 | printf("\n1:target thread_limit: %d\n",omp_get_thread_limit());
      | ^
```
OMP170 has the following explanation: https://openmp.llvm.org/remarks/OMP170.html
> This optimization remark indicates that a call to an OpenMP runtime call was replaced with the result of an existing one. This occurs when the compiler knows that the result of a runtime call is immutable. Removing duplicate calls is done by replacing all calls to that function with the result of the first call. This cannot be done automatically by the compiler because the implementations of the OpenMP runtime calls live in a separate library the compiler cannot see. This optimization will trigger for known OpenMP runtime calls whose return value will not change.
At the same time I do not believe we have an analysis checking whether this precondition holds here: "This occurs when the compiler knows that the result of a runtime call is immutable." AFAICT, such analysis doesn't appear to exist in the original patch introducing deduplication, either:
- `9548b74a83`
- https://reviews.llvm.org/D69930
The fix is to remove it from `DeduplicableRuntimeCallIDs`, effectively reverting the addition in this commit (noting that `omp_get_max_threads` is not present in `DeduplicableRuntimeCallIDs`, so it's possible this addition was incorrect in the first place):
- [OpenMP][Opt] Annotate known runtime functions and deduplicate more,
- `e28936f613 (diff-de101c82aff66b2bda2d1f53fde3dde7b0d370f14f1ff37b7919ce38531230dfR123)`
As a result, we're no longer unsoundly deduplicating the OpenMP runtime call `omp_get_thread_limit` as illustrated by the test case: Note the (correctly) repeated `call i32 @omp_get_thread_limit()`.
--------- Co-authored-by: Joseph Huber <huberjn@outlook.com>	2024-02-22 08:13:41 -06:00
Ivan R. Ivanov	39f09ec245	Invalidate analyses after running Attributor in OpenMPOpt (#74908 ) Using the LoopInfo from OMPInfoCache after the Attributor ran resulted in a crash due to it being in an invalid state. --------- Co-authored-by: Ivan Radanov Ivanov <ivanov2@llnl.gov>	2023-12-20 15:01:21 -08:00
Paul Walker	dea16ebd26	[LLVM][IR] Replace ConstantInt's specialisation of getType() with getIntegerType(). (#75217 ) The specialisation will not be valid when ConstantInt gains native support for vector types. This is largely a mechanical change but with extra attention paid to constant folding, InstCombineVectorOps.cpp, LoopFlatten.cpp and Verifier.cpp to remove the need to call `getIntegerType()`. Co-authored-by: Nikita Popov <github@npopov.com>	2023-12-18 11:58:42 +00:00
Youngsuk Kim	c57ef2c698	[llvm][OpenMPOpt] Remove no-op ptr-to-ptr bitcast (NFC) (#73869 ) * Remove a call to CreatePointerBitCastOrAddrSpaceCast which merely adds a no-op ptr-to-ptr bitcast. * Most of the diff is from removing checks for no-op ptr-to-ptr bitcasts in relevant LIT tests	2023-11-29 20:47:37 -05:00
Simon Pilgrim	3ca4fe80d4	[Transforms] Use StringRef::starts_with/ends_with instead of startswith/endswith. NFC. startswith/endswith wrap starts_with/ends_with and will eventually go away (to more closely match string_view)	2023-11-06 16:50:18 +00:00
Dominik Adamski	2cce0f6c57	[OpenMP][OMPIRBuilder] Add support to omp target parallel (#67000 ) Added support for LLVM IR code generation which is used for handling omp target parallel code. The call for __kmpc_parallel_51 is generated and the parallel region is outlined to separate function. The proper setup of kmpc_target_init mode is not included in the commit. It is assumed that the SPMD mode for target initialization is properly set by other codegen functions.	2023-11-06 11:44:00 +01:00
Johannes Doerfert	d3e7a48cbd	[OpenMP][NFC] Remove a no-op function	2023-11-03 10:28:36 -07:00
Johannes Doerfert	a8152086ff	[Attributor][FIX] Ensure new BBs are registered	2023-11-01 12:12:14 -07:00
Joseph Huber	e8c0ae60d7	[OpenMP] Add optimization to remove the RPC client (#70683) Summary: Part of the work done in the `libc` project is to provide host services for things like `printf` or `malloc`, or generally any syscall-like behaviour. This scheme works by emitting an externally visible global called `__llvm_libc_rpc_client` that the host runtime can pick up to get a handle to the global memory associated with the client. We use the presence of this symbol to indicate whether or not we need to run an RPC server. Normally, this symbol is only present if something requiring an RPC server was linked in, such as `printf`. However, if this call to `printf` was subsequently optimized out, the symbol would remain and cannot be removed (rightfully so) because of its linkage. This patch adds a special-case optimization to remove this symbol so we can indicate that an RPC server is no longer needed. This patch puts this logic in `OpenMPOpt` as the most readily available place for it. In the future, we should think about how to move this somewhere more generic. Furthermore, we use a hard-coded runtime name (which isn't uncommon given all the other magic symbol names). But it might be nice to abstract that part away.	2023-10-31 17:23:24 -05:00
Johannes Doerfert	31b91213bd	[OpenMP] Unify the min/max thread/teams pathways We used to pass the min/max threads/teams values through different paths from the frontend to the middle end. This simplifies the situation by passing the values once, only when we will create the KernelEnvironment, which contains the values. At that point we also manifest the metadata, as appropriate. Some footguns have also been removed, e.g., our target check is now triple-based, not calling convention-based, as the latter is dependent on the ordering of operations. The types of the values have been unified to int32_t.	2023-10-29 10:53:20 -07:00
Mehdi Amini	f390a76b7e	Revert "Revert "[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257 )"" This reverts commit ddbaa11e9f43a38d50d62a9b9b07c3653b6bf8ab. Reapply the original commit, the broken test was repaired in 5e51363f38d083ab326736c0d4d1b5f9fe0de080 in the meantime.	2023-10-26 17:30:01 -07:00
Mehdi Amini	ddbaa11e9f	Revert "[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257 )" This reverts commit c2a1249a8257ed033a98e32e425539c6da6700ec. The MLIR bots are broken with an omp test failure.	2023-10-26 17:25:20 -07:00
Johannes Doerfert	c2a1249a82	[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257 ) The runtime needs to know about the acceptable launch bounds, especially if the compiler (middle- or backend) assumed those bounds. While this patch does not yet inform the runtime, it stores the bounds in a place that can/will be accessed and is associated with the kernel.	2023-10-26 14:46:55 -07:00
Johannes Doerfert	0a0c23b9ce	[OpenMPOpt][FIX] Properly track changes to NestedParallelism If we update the state, or indicate a pessimistic fixpoint, we need to consider NestedParallelism too. Fixes part of https://github.com/llvm/llvm-project/issues/66708 That said, the reproducer still needs malloc which we don't support on AMD GPU. Will be added later.	2023-10-20 19:28:09 -07:00
Daniel Woodworth	ac29405b93	[OpenMPOpt] Fix incorrect end-of-kernel barrier removal (#65670 ) Barrier removal in OpenMPOpt normally removes barriers by proving that they are redundant with barriers preceding them. However, it can't do this with the "pseudo-barrier" at the end of kernels because that can't be removed. Instead, it removes the barriers preceding the end of the kernel which that end-of-kernel barrier is redundant with. However, these barriers aren't always redundant with the end-of-kernel barrier when loops are involved, and removing them can lead to incorrect results in compiled code. This change fixes this by requiring that these pre-end-of-kernel barriers also have the kernel end as a unique successor before removing them. It also changes the initialization of `ExitED` for kernels since the kernel end is not an aligned barrier.	2023-09-27 09:35:42 -07:00
Shilei Tian	186a4b3b65	[LLVM][OpenMP] Allow OpenMPOpt to handle non-OpenMP target regions (#67075) Currently, OpenMPOpt assumes all kernels are OpenMP kernels (i.e., carry the "kernel" attribute). This doesn't hold if we mix OpenMP code and CUDA code by linking them together, because CUDA kernels are not annotated with the attribute. This patch removes the assumption and adds a new counter for those non-OpenMP kernels. Fix #66687.	2023-09-23 22:34:07 -04:00
Shilei Tian	22e1df7f5b	[LLVM][OpenMPOpt] Fix a crash when associated function is nullptr (#66274 ) The associated function can be a nullptr if it is an indirect call. This causes a crash in `CheckCallee` which always assumes the callee is a valid pointer. Fix #66904.	2023-09-13 20:22:59 -04:00
Johannes Doerfert	d47cf2bff3	[OpenMPOpt] Allow indirect calls in AAKernelInfoCallSite (#65836 ) The Attributor has gained support for indirect calls but it is opt-in. This patch makes AAKernelInfoCallSite able to handle multiple potential callees.	2023-09-10 19:02:09 -07:00
Shilei Tian	499f691be1	Revert "Reapply "[Attributor] Enable AAAddressSpace for OpenMPOpt (#65544)""" This reverts commit c5525a6e8fb7f7c2ce7126ac5b17aaff01ac407f. AMD BB is not happy again.	2023-09-08 15:46:23 -04:00
Shilei Tian	c5525a6e8f	Reapply "[Attributor] Enable AAAddressSpace for OpenMPOpt (#65544)"" This reverts commit e592c2dcf5b7d2da6c2564f5d9990aa34079bad4 that reverts e91e3cf.	2023-09-08 15:39:16 -04:00
Shilei Tian	e592c2dcf5	Revert "[Attributor] Enable AAAddressSpace for OpenMPOpt (#65544)" This reverts commit e91e3cf0748a80e1d7219c13fa6a7622321f4936 because AMD BB is not happy with it.	2023-09-07 12:31:11 -04:00
Shilei Tian	e91e3cf074	[Attributor] Enable AAAddressSpace for OpenMPOpt (#65544)	2023-09-07 12:23:52 -04:00
Johannes Doerfert	a01398156a	[OpenMPOpt][FIX] Ensure to propagate information about parallel regions Before, we checked the parallel region only once, and ignored updates in the KernelInfo for the parallel region that happened later. This caused us to think nested parallel sections are not present even if they are, among other things.	2023-08-25 10:46:56 -07:00
Johannes Doerfert	8b08287cb3	[OpenMPOpt] Eliminate assumptions only "late" When we remove barriers, we might need to remove llvm.assume assumptions as well. However, doing this early, thus in the module pass, will cause us to miss out on information we might need. There are few situations we can eliminate barriers across functions, for now we simply disable elimination of barriers that require assumptions to be removed during the early module pass.	2023-08-23 16:11:43 -07:00
Johannes Doerfert	9c08e76f3e	[Attributor] Introduce AAIndirectCallInfo AAIndirectCallInfo will collect information and specialize indirect call sites. It is similar to our IndirectCallPromotion but runs as part of the Attributor (so with assumed callee information). It also expands more calls and lets the rest of the pipeline figure out what is UB, for now. We use existing call promotion logic to improve the result, otherwise we rely on the (implicit) function pointer cast. This effectively "fixes" #60327 as it will undo the type punning early enough for the inliner to work with the (now specialized, thus direct) call. Fixes: https://github.com/llvm/llvm-project/issues/60327	2023-08-18 16:44:05 -07:00
Johannes Doerfert	97c24a16fd	[OpenMPOpt][NFC] Allow missing wrapper functions for parallel_51 Clang does not create a wrapper function for SPMD kernels. If it does not, we still want to collect the parallel region, even if we have no use for it right now.	2023-08-17 18:33:24 -07:00
Johannes Doerfert	4fcd5f93d6	[OpenMPOpt] Mark more runtime functions as SPMD compatible Fixes: https://github.com/llvm/llvm-project/issues/64421	2023-08-17 18:33:24 -07:00
Johannes Doerfert	2ece6d939b	[OpenMPOpt] SPMD-amenable implies no unknown parallel regions	2023-08-17 18:33:23 -07:00
Johannes Doerfert	dfc821ae89	[OpenMPOpt][FIX] Ensure a dependence for KernelEnvC queries When other AAs query the current value of KernelEnvC via the callback KernelConfigurationSimplifyCB we need to ensure they are now dependent on the AAKernelInfo that is in charge of the KernelEnvC.	2023-08-10 23:16:25 -07:00
Bjorn Pettersson	fd05c34b18	Stop using legacy helpers indicating typed pointer types. NFC Since we no longer support typed LLVM IR pointer types, the code can be simplified into for example using PointerType::get directly instead of using Type::getInt8PtrTy and Type::getInt32PtrTy etc. Differential Revision: https://reviews.llvm.org/D156733	2023-08-02 12:08:37 +02:00
Shilei Tian	10068cd654	[OpenMP] Introduce kernel environment This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime. This is a combination and refinement of patch series D116908, D116909, and D116910. Depend on D155886. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142569	2023-07-26 13:35:14 -04:00
Joseph Huber	05b181d851	[OpenMP] Make the nested parallelism global hidden Summary: These will probably be removed with the kernel environment, but they should have hidden visibility so they can be optimized out.	2023-07-24 08:28:54 -05:00
Shilei Tian	6bd74fd65f	Revert commits for kernel environment This reverts commits for kernel environments as they cause issues in AMD BB.	2023-07-23 23:32:31 -04:00
Shilei Tian	c5c8040390	[OpenMP] Introduce kernel environment This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime. This is a combination and refinement of patch series D116908, D116909, and D116910. Depend on D155886. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142569	2023-07-23 18:36:01 -04:00
Nikita Popov	7be7f23269	[llvm] Remove uses of getWithSamePointeeType() (NFC)	2023-07-18 12:07:09 +02:00
Johannes Doerfert	02a4fcec6b	[Attributor] Port AANonNull to the isImpliedByIR interface AANonNull is now the first AA that is always queried via the new APIs and not created manually. Others will follow shortly to avoid trivial AAs whenever possible. This commit introduced some helper logic that will make it simpler to port the next one. It also untangles AADereferenceable and AANonNull such that the former does not keep a handle on the latter. Finally, we stop deducing `nonnull` for `undef`, which was incorrect.	2023-07-09 16:04:19 -07:00
Johannes Doerfert	fe12d313ba	[OpenMPOpt][FIX] Propagate IsReachingAlignedBarrier flag through calls	2023-07-07 16:38:34 -07:00
Johannes Doerfert	24656e995a	[OpenMPOpt] The kernel end is not necessarily an aligned barrier A kernel can be exited in a non-aligned fashion, so we cannot pretend it always ends in an aligned barrier. Instead, we require an explicit aligned barrier as we lack a divergence analysis at this point.	2023-07-07 16:38:34 -07:00


350 Commits