Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to mark only a prefix of an alloca alive/dead.
We never used that capability, and dropping it removes the need to
handle that possibility everywhere (many key places, including stack
coloring, did not actually respect it anyway).
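A before/after sketch in IR (a minimal illustration, not lifted from the patch):
```llvm
%a = alloca i64
; before: the size was passed explicitly
call void @llvm.lifetime.start.p0(i64 8, ptr %a)
; after: the size is implied by the alloca
call void @llvm.lifetime.start.p0(ptr %a)
```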
Those tests set nvptx64 in the IR but don't require the target. The
optimization now needs TTI, so if nvptx is not registered, whatever the
default target happens to be is used instead, which causes the check
lines to mismatch.
Note: This relands #140615 adding a ".count" suffix to the non-".all"
variants.
Our current support for barrier intrinsics is confusing and incomplete:
multiple intrinsics map to the same instruction, and the intrinsic names
do not clearly convey their semantics. Further, we lack support for some
variants. This change unifies the IR representation into a single,
consistently named set of intrinsics.
- llvm.nvvm.barrier.cta.sync.aligned.all(i32)
- llvm.nvvm.barrier.cta.sync.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.aligned.count(i32, i32)
- llvm.nvvm.barrier.cta.sync.all(i32)
- llvm.nvvm.barrier.cta.sync.count(i32, i32)
- llvm.nvvm.barrier.cta.arrive.count(i32, i32)
The following Auto-Upgrade rules are used to maintain compatibility with
IR using the legacy intrinsics:
* llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
* llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned.count(x, y)
* llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
* llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync.count(x, y)
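For example, per the first rule above, legacy IR is rewritten on load roughly as follows (a sketch based on the upgrade table):
```llvm
; legacy intrinsic in old IR:
call void @llvm.nvvm.barrier0()
; auto-upgraded form:
call void @llvm.nvvm.barrier.cta.sync.aligned.all(i32 0)
```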
Our current support for barrier intrinsics is confusing and incomplete:
multiple intrinsics map to the same instruction, and the intrinsic names
do not clearly convey their semantics. Further, we lack support for some
variants. This change unifies the IR representation into a single,
consistently named set of intrinsics.
- llvm.nvvm.barrier.cta.sync.aligned.all(i32)
- llvm.nvvm.barrier.cta.sync.aligned(i32, i32)
- llvm.nvvm.barrier.cta.arrive.aligned(i32, i32)
- llvm.nvvm.barrier.cta.sync.all(i32)
- llvm.nvvm.barrier.cta.sync(i32, i32)
- llvm.nvvm.barrier.cta.arrive(i32, i32)
The following Auto-Upgrade rules are used to maintain compatibility with
IR using the legacy intrinsics:
* llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
* llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
* llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned(x, y)
* llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
* llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync(x, y)
Of the 128 bits of a buffer descriptor, only 48 are address bits, so
following the discussion on https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54,
the logical conclusion is to set the index width to 48 bits instead of
the current value of 128.
Most of the test changes are mechanical datalayout updates, but there
is one actual change: the ptrmask test now uses .i48 instead of .i128
and I had to update SelectionDAGBuilder to correctly extend the mask.
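A sketch of the new form, assuming AS8 buffer fat pointers (the exact mangling used in the tests may differ):
```llvm
; the mask width now matches the 48-bit index width rather than the
; full 128-bit pointer representation
%masked = call ptr addrspace(8) @llvm.ptrmask.p8.i48(ptr addrspace(8) %p, i48 %mask)
```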
Reviewed By: krzysz00
Pull Request: https://github.com/llvm/llvm-project/pull/139419
This PR updates the `Verifier` to enforce that `alloca` instructions on
AMDGPU must be in AS5. This prevents hitting a misleading backend error
like "unable to select FrameIndex," which makes it look like a backend
bug when it's actually an IR-level issue.
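A minimal illustration of what the Verifier now enforces (illustrative IR, not from the patch):
```llvm
; rejected for AMDGPU targets: alloca in the generic address space
%bad = alloca i32
; accepted: allocas must live in AS5 (private/scratch)
%ok = alloca i32, addrspace(5)
```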
Previously an extra block was created by splitting the previous exit
block. This produced incorrect results when the outlined region
statically never terminated: in that case there wasn't a valid exit
block for the outlined region, so the newly added block had an incoming
edge from outside of the outlining region, which caused outlining to
fail.
So far as I can tell this extra block no longer serves any purpose. The
comment says it is supposed to collate multiple control flow edges into
one place, but the code as it is now does not achieve this. In fact, as
can be seen from the changes to lit tests, this block was not actually
outlined in the end. This is because there are actually two code
extractors: one in the callback for creating a parallel op which is used
to find what the input/output variables are (which does have this block
added to it), and another one which actually does the outlining (which
this block was not added to).
Tested with the gfortran and fujitsu test suites.
Fixes #112884
Summary:
This code is intended to block transformations if the call isn't
present; however, the way it's written, it silently passes if the
definition doesn't exist at all. This was previously always valid, since
we shipped the runtime as one giant blob so everything was always
there, but now that we want to move towards separate runtimes it's no
longer quite correct.
Add initial parsing/sema support for the new assumption clause so the
clause can be specified. For now it's ignored, just like the others.
Added support for 'no_openmp_construct' to the release notes.
Testing
- Updated appropriate LIT tests.
- Testing: check-all
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.
Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
make it easier to use old IR files and somewhat reduce the test churn in
this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
attribute. The representation in the LLVM IR dialect should be updated
separately.
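A before/after sketch of the upgrade described in the first note (the same declaration shown twice for illustration):
```llvm
; old attribute, as written in existing IR and bitcode:
declare void @f(ptr nocapture)
; upgraded form produced by both readers:
declare void @f(ptr captures(none))
```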
Specifying a kernel with the `ptx_kernel` or `amdgpu_kernel` calling
convention is more idiomatic and compile-time performant than using
the `nvvm.annotations !"kernel"` metadata.
Transition OMPIRBuilder to use calling conventions for PTX kernels and
no longer emit `nvvm.annotations`. Update OpenMPOpt to work with kernels
specified via calling convention as well as metadata. Update OpenMP
tests to use the calling conventions.
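Roughly, the change swaps module metadata for a calling convention (a sketch; the exact metadata shape is abbreviated):
```llvm
; legacy: kernel-ness recorded as nvvm.annotations metadata
define void @kern() { ret void }
!nvvm.annotations = !{!0}
!0 = !{ptr @kern, !"kernel", i32 1}

; new: kernel-ness carried directly by the calling convention
define ptx_kernel void @kern() { ret void }
```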
Summary:
We currently have an unnecessary level of indirection when initializing
the RPC client. This is a holdover from when the RPC client was not
trivially copyable and simply makes it more complicated. Here we use the
`asm` syntax to give the C++ variable a valid name so that we can just
copy to it directly.
Another advantage is that if users want to piggy-back on the same RPC
interface, they need only declare theirs as extern with the same symbol
name, or make it weak to optionally use it if LIBC isn't available.
When we propagate call site arguments we always need to translate them;
this is important, as we used to end up picking the function argument
for a recursive call rather than the call site argument. `@recBad` and
`@recGood` in `returned.ll` show the problem, as they used to be
transformed the same way. The restructuring cleans up the code and helps
derive more "returned" arguments and better information in the presence
of recursive calls. The "dropped" attributes are simply dropped because
we no longer query them, not because we cannot derive them.
If we can't transform the region to SPMD, we should not wait until the
end to decide that. Other AAs might assume SPMD, and we did set the
constant initializer to indicate SPMD, but we did not change the code
properly.
This reverts commit e592c2dcf5b7d2da6c2564f5d9990aa34079bad4.
We can finally reland the PR since the issue that caused the PR to be
reverted has been resolved in
https://github.com/llvm/llvm-project/pull/104051.
While lowering `#pragma omp target update from`, clang's generated
.omp_task_entry. sets up 9 arguments when calling
__tgt_target_data_update_nowait_mapper.
At the same time, in __tgt_target_data_update_nowait_mapper, the call to
targetData<TaskAsyncInfoWrapperTy>() is converted to a sibcall, assuming
it has the argument count listed in the signature.
The AArch64 asm sequence for this is as follows (unrelated insns removed):
```
.omp_task_entry..108:
        sub     sp, sp, #32
        stp     x29, x30, [sp, #16]     // 16-byte Folded Spill
        add     x29, sp, #16
        str     x8, [sp, #8]            // stack canary
        str     xzr, [sp]
        bl      __tgt_target_data_update_nowait_mapper

__tgt_target_data_update_nowait_mapper:
        sub     sp, sp, #32
        stp     x29, x30, [sp, #16]     // 16-byte Folded Spill
        add     x29, sp, #16
        str     x8, [sp, #8]            // stack canary
        // Sibcall argument setup
        adrp    x8, :got:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb
        ldr     x8, [x8, :got_lo12:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb]
        stp     x9, x8, [x29, #16]
        adrp    x8, .L.str.8
        add     x8, x8, :lo12:.L.str.8
        str     x8, [x29, #32]          // <== this is the insn that erases $fp
        ldp     x29, x30, [sp, #16]     // 16-byte Folded Reload
        add     sp, sp, #32
        // Sibcall
        b       _ZL10targetDataI22TaskAsyncInfoWrapperTyEvP7ident_tliPPvS4_PlS5_S4_S4_PFiS2_R8DeviceTyiS4_S4_S5_S5_S4_S4_R11AsyncInfoTybEPKcSD
```
On AArch64, the call to __tgt_target_data_update_nowait_mapper in
.omp_task_entry. sets up only a single stack slot, and this results in
overwriting $fp and subsequent stack corruption. This issue can be
credited to a discrepancy in the __tgt_target_data_update_nowait_mapper
signature: openmp/libomptarget/include/omptarget.h takes 13 arguments,
while clang/lib/CodeGen/CGOpenMPRuntime.cpp and
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def take only 9.
This patch modifies the __tgt_target_data_update_nowait_mapper signature
to match the .omp_task_entry usage (and the other two files mentioned
above).
Co-authored-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>
Remove support for the icmp and fcmp constant expressions.
This is part of:
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179
As usual, many of the updated tests will no longer test what they were
originally intended to -- this is hard to preserve when constant
expressions get removed, and in many cases just impossible as the
existence of a specific kind of constant expression was the cause of the
issue in the first place.
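For example, IR like the following initializer is no longer accepted; the comparison has to be materialized as an instruction instead (an illustrative sketch):
```llvm
; no longer valid: icmp as a constant expression
@b = global i1 icmp eq (ptr @g1, ptr @g2)
; the comparison must now be an ordinary instruction:
%c = icmp eq ptr @g1, @g2
```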
This is a followup to #81014 and #84582: Before this patch, Clang
would accept `__attribute__((assume))` and `[[clang::assume]]` as
nonstandard spellings for the `[[omp::assume]]` attribute; this
resulted in a potentially very confusing name clash with C++23’s
`[[assume]]` attribute (and GCC’s `assume` attribute with the same
semantics).
This PR replaces every usage of `__attribute__((assume))` with
`[[omp::assume]]` and makes `__attribute__((assume))` and
`[[clang::assume]]` alternative spellings for C++23’s `[[assume]]`;
this shouldn’t cause any problems due to differences in appertainment
and because almost no-one was using this variant spelling to begin
with (a use in libclc has already been changed to use a different
attribute).
Summary:
This call was removed a few months ago to allow the runtime to actually
init / deinit in a correct order. However, that patch forgot to remove a
few leftover uses.
The deduplication of the calls to `omp_get_thread_limit` used to be
legal when originally added in
<e28936f613 (diff-de101c82aff66b2bda2d1f53fde3dde7b0d370f14f1ff37b7919ce38531230dfR123)>,
as the result (thread_limit) was immutable.
However, now that we have `thread_limit` clause, we no longer have
immutability; therefore `omp_get_thread_limit()` is not a deduplicable
runtime call.
Thus, removing `omp_get_thread_limit` from the
`DeduplicableRuntimeCallIDs` array.
Here's a simple example:
```
#include <omp.h>
#include <stdio.h>
int main()
{
#pragma omp target thread_limit(4)
  {
    printf("\n1:target thread_limit: %d\n", omp_get_thread_limit());
  }
#pragma omp target thread_limit(3)
  {
    printf("\n2:target thread_limit: %d\n", omp_get_thread_limit());
  }
  return 0;
}
```
GCC-compiled binary execution: https://gcc.godbolt.org/z/Pjv3TWoTq
```
1:target thread_limit: 4
2:target thread_limit: 3
```
Clang/LLVM-compiled binary execution:
https://clang.godbolt.org/z/zdPbrdMPn
```
1:target thread_limit: 4
2:target thread_limit: 4
```
By my reading of the OpenMP spec, GCC does the right thing here; cf.
<https://www.openmp.org/spec-html/5.2/openmpse12.html#x34-330002.4>:
> If a target construct with a thread_limit clause is encountered, the
thread-limit-var ICV from the data environment of the generated initial
task is instead set to an implementation defined value between one and
the value specified in the clause.
The common subexpression elimination (CSE) of the second call to
`omp_get_thread_limit` by LLVM does not seem to be correct, as it's not
an available expression at any program point(s) (in the scope of the
clause in question) after the second target construct with a
`thread_limit` clause is encountered.
Compiling with `-Rpass=openmp-opt -Rpass-analysis=openmp-opt
-Rpass-missed=openmp-opt` we have:
https://clang.godbolt.org/z/G7dfhP7jh
```
<source>:8:42: remark: OpenMP runtime call omp_get_thread_limit deduplicated. [OMP170] [-Rpass=openmp-opt]
8 | printf("\n1:target thread_limit: %d\n",omp_get_thread_limit());
| ^
```
OMP170 has the following explanation:
https://openmp.llvm.org/remarks/OMP170.html
> This optimization remark indicates that a call to an OpenMP runtime
call was replaced with the result of an existing one. This occurs when
the compiler knows that the result of a runtime call is immutable.
Removing duplicate calls is done by replacing all calls to that function
with the result of the first call. This cannot be done automatically by
the compiler because the implementations of the OpenMP runtime calls
live in a separate library the compiler cannot see.
This optimization will trigger for known OpenMP runtime calls whose
return value will not change.
At the same time I do not believe we have an analysis checking whether
this precondition holds here: "This occurs when the compiler knows that
the result of a runtime call is immutable."
AFAICT, such analysis doesn't appear to exist in the original patch
introducing deduplication, either:
- 9548b74a83
- https://reviews.llvm.org/D69930
The fix is to remove it from `DeduplicableRuntimeCallIDs`, effectively
reverting the addition in this commit (noting that `omp_get_max_threads`
is not present in `DeduplicableRuntimeCallIDs`, so it's possible this
addition was incorrect in the first place):
- [OpenMP][Opt] Annotate known runtime functions and deduplicate more:
  e28936f613 (diff-de101c82aff66b2bda2d1f53fde3dde7b0d370f14f1ff37b7919ce38531230dfR123)
As a result, we're no longer unsoundly deduplicating the OpenMP runtime
call `omp_get_thread_limit` as illustrated by the test case: Note the
(correctly) repeated `call i32 @omp_get_thread_limit()`.
---------
Co-authored-by: Joseph Huber <huberjn@outlook.com>
This is an experimental address space for strided buffers. These buffers
can have structs as elements and a stride > 1.
These pointers allow indexed access in units of the stride, i.e., they
point at `buffer[index * stride]`. Thus, we can use the `idxen` modifier
for buffer loads.
We assign address space 9 to 192-bit buffer pointers which contain a
128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially,
they are fat buffer pointers with an additional 32-bit index.
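A minimal sketch of the new pointer type in use (illustrative only):
```llvm
; a ptr addrspace(9) value is 192 bits wide: a 128-bit descriptor,
; a 32-bit offset, and a 32-bit index; indexing is in units of the stride
define float @elem(ptr addrspace(9) %p) {
  %v = load float, ptr addrspace(9) %p
  ret float %v
}
```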
* Remove a call to CreatePointerBitCastOrAddrSpaceCast which merely adds
a no-op ptr-to-ptr bitcast.
* Most of the diff is from removing checks for no-op ptr-to-ptr bitcasts
in relevant LIT tests
Before we tracked the size of the teams reduction buffer in order to
allocate it at runtime per kernel launch. This patch splits the number
into two parts, the size of the reduction data (=all reduction
variables) and the (maximal) length of the buffer. This will allow us to
allocate less if we need less, e.g., if we have fewer teams than the
maximal length. It also allows us to move code from clang's codegen into
the runtime, as we now know how large the reduction data is.
The KernelEnvironment is for compile time information about a kernel. It
allows the compiler to feed information to the runtime. The
KernelLaunchEnvironment is for dynamic information *per* kernel launch.
It allows the runtime to feed information to the kernel that is not
shared with other invocations of the kernel. The first use case is to
replace the globals that synchronize teams reductions with per-launch
versions. This allows concurrent teams reductions. More use cases will
follow, e.g., per-launch memory pools.
Fixes: https://github.com/llvm/llvm-project/issues/70249
Summary:
Part of the work done in the `libc` project is to provide host services
for things like `printf` or `malloc`, or generally any syscall-like
behaviour. This scheme works by emitting an externally visible global
called `__llvm_libc_rpc_client` that the host runtime can pick up to get
a handle to the global memory associated with the client. We use the
presence of this symbol to indicate whether or not we need to run an RPC
server. Normally, this symbol is only present if something requiring an
RPC server was linked in, such as `printf`. However, if this call to
`printf` was subsequently optimized out, the symbol would remain and
cannot be removed (rightfully so) because of its linkage. This patch
adds a special-case optimization to remove this symbol so we can
indicate that an RPC server is no longer needed.
This patch puts this logic in `OpenMPOpt` as the most readily available
place for it. In the future, we should think about how to move this
somewhere more generic. Furthermore, we use a hard-coded runtime name
(which isn't uncommon, given all the other magic symbol names), but it
might be nice to abstract that part away.
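A sketch of the symbol in question (hypothetical type and linkage; the real declaration in the libc runtime may differ):
```llvm
; externally visible handle the host runtime looks up; normally not
; removable because of its linkage
@__llvm_libc_rpc_client = protected global ptr null
; OpenMPOpt now special-cases this global and deletes it when nothing
; requiring the RPC server survived optimization
```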
This reverts commit ddbaa11e9f43a38d50d62a9b9b07c3653b6bf8ab.
Reapply the original commit, the broken test was repaired in 5e51363f38d083ab326736c0d4d1b5f9fe0de080 in the meantime.
The runtime needs to know about the acceptable launch bounds, especially
if the compiler (middle- or backend) assumed those bounds. While this
patch does not yet inform the runtime, it stores the bounds in a place
that can/will be accessed and is associated with the kernel.
There are many tests that specify a target triple/CPU flags but no
DataLayout which can lead to IR being generated that has unusual
behaviour. This commit attempts to use the default DataLayout based
on the relevant flags if there is no explicit override on the command
line or in the IR file.
One thing that is currently not possible is differentiating a missing
datalayout from an explicitly empty one (`target datalayout = ""`) in
the IR file, since the current APIs don't allow detecting this case. If
it is considered useful to support this case (instead of passing
"-data-layout=" on the command line), I can change the IR parsers to
track whether they have seen such a directive and change the callback
type.
Differential Revision: https://reviews.llvm.org/D141060
If we update the state, or indicate a pessimistic fixpoint, we need to
consider NestedParallelism too.
Fixes part of https://github.com/llvm/llvm-project/issues/66708
That said, the reproducer still needs malloc, which we don't support on
AMD GPUs. That will be added later.
Barrier removal in OpenMPOpt normally removes barriers by proving that
they are redundant with barriers preceding them. However, it can't do
this with the "pseudo-barrier" at the end of kernels because that can't
be removed. Instead, it removes the barriers preceding the end of the
kernel which that end-of-kernel barrier is redundant with. However,
these barriers aren't always redundant with the end-of-kernel barrier
when loops are involved, and removing them can lead to incorrect results
in compiled code.
This change fixes this by requiring that these pre-end-of-kernel
barriers also have the kernel end as a unique successor before removing
them. It also changes the initialization of `ExitED` for kernels since
the kernel end is not an aligned barrier.
Current OpenMPOpt assumes all kernels are OpenMP kernels (i.e., they
have the "kernel" attribute). This doesn't hold if we mix OpenMP code
and CUDA code by linking them together, because CUDA kernels are not
annotated with the attribute. This patch removes the assumption and adds
a new counter for those non-OpenMP kernels.
Fixes #66687.
The associated function can be a nullptr if it is an indirect call.
This causes a crash in `CheckCallee`, which always assumes the callee
is a valid pointer.
Fixes #66904.
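A minimal IR sketch of the crashing case (illustrative; names are hypothetical):
```llvm
; an indirect call site: the callee is not a statically known function,
; so the associated function is nullptr, which used to crash CheckCallee
%fn = load ptr, ptr @fptr
call void %fn()
```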