llvm-project

Author	SHA1	Message	Date
Maksim Levental	eaa67a3cf0	[mlir][NFC] update `Conversion` create APIs (5/n) (#149887 ) See https://github.com/llvm/llvm-project/pull/147168 for more info.	2025-07-22 10:40:45 -04:00
Kazu Hirata	fa9adbfda9	[mlir] Remove unused includes (NFC) (#147101 ) These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.	2025-07-04 13:30:21 -07:00
lorenzo chelini	c5613dc863	[MLIR] Mark LLVM::FMAOp as legal (#144671 ) Mark LLVM::FMAOp as legal in configureGpuToNVVMConversionLegality, since we can handle intrinsic lowering in the NVPTX backend and emit fma.rn.f32.	2025-06-18 15:49:00 +02:00
William Moses	d9c65af626	[MLIR][GPUToNVVM] Support 32-bit isfinite (#131699 ) Co-authored-by: Ivan Radanov Ivanov <ivanov.i.aa@m.titech.ac.jp>	2025-03-18 02:11:38 +01:00
Valentin Clement (バレンタインクレメン)	b3e05d58b9	[mlir][nvvm] Add conversion for math.erfc (#129329 ) Add missing pattern to convert `math.erfc` operation to `__nv_erfcf` or `__nv_erfc` function call.	2025-02-28 14:51:56 -08:00
Matthias Springer	4defac91db	[mlir][GPUToNVVM] Add `benefit` to `populate` functions (#128484 ) Certain GPU->NVVM patterns compete with Arith->LLVM patterns. (The ones that lower to libdevice.) Add an optional `benefit` parameter to all `populate` functions so that users can give preference to GPU->NVVM patterns.	2025-02-24 17:27:55 +01:00
Oleksandr "Alex" Zinenko	79d8a34bc5	[mlir] add some FP classification ops and their lowering to libdevice (#127322 ) Introduce a subset of floating point classification ops to the Math dialect. These ops mirror functions provided by the C math library and, similarly to the existing `math.copysign`, belong to the math dialect. Add a lowering of those ops to Nvidia libdevice calls when possible as the first mechanism to exercise them.	2025-02-16 14:51:21 +01:00
Oleksandr "Alex" Zinenko	963ff1c305	[mlir] lower min/maxnum to libdevice calls (#127323 ) Introduce lowering from arith.minnum/maxxnum operations to the corresponding Nvidia libdevice calls. This requires to reorder pattern population methods so that the libdevice-targeting patterns are prioritized over default patterns targeting LLVM IR intrinsics from the Arith dialect. The tests are placed into a separate file as the existing gpu-to-nvvm.mlir files has a mode that forces Arith dialect operations to be preserved as is without using a separate FileCheck tag to differentiate. Co-authored-by: William Moses <gh@wsmoses.com>	2025-02-15 17:53:36 -05:00
Ivan Butygin	aecb764cc2	[mlir][gpu] GPUToROCDL/NVVM: use generic llvm conversion interface instead of hardcoded conversions. (#124439 ) Using `ConvertToLLVMPatternInterface` allows to unhardcode specific dialect conversions from passes and, more importantly, allows downstream projects to inject their ops/types translation here by registering corresponding interface. Add `allowed-dialects` option so user can control which dialects can be used to populate conversions.	2025-02-13 17:53:12 +03:00
Oleksandr "Alex" Zinenko	4df28af713	[mlir] Add lowering of absi and fpowi to libdevice (#123644 ) More concise version of #123422. --------- Co-authored-by: William S. Moses <gh@wsmoses.com>	2025-01-20 12:53:44 -06:00
Matthias Springer	599c739905	[mlir][GPU] Add NVVM-specific `cf.assert` lowering (#120431 ) This commit add an NVIDIA-specific lowering of `cf.assert` to to `__assertfail`. Note: `getUniqueFormatGlobalName`, `getOrCreateFormatStringConstant` and `getOrDefineFunction` are moved to `GPUOpsLowering.h`, so that they can be reused.	2025-01-06 12:00:11 +01:00
Jacques Pienaar	09dfc5713d	[mlir] Enable decoupling two kinds of greedy behavior. (#104649 ) The greedy rewriter is used in many different flows and it has a lot of convenience (work list management, debugging actions, tracing, etc). But it combines two kinds of greedy behavior 1) how ops are matched, 2) folding wherever it can. These are independent forms of greedy and leads to inefficiency. E.g., cases where one need to create different phases in lowering and is required to applying patterns in specific order split across different passes. Using the driver one ends up needlessly retrying folding/having multiple rounds of folding attempts, where one final run would have sufficed. Of course folks can locally avoid this behavior by just building their own, but this is also a common requested feature that folks keep on working around locally in suboptimal ways. For downstream users, there should be no behavioral change. Updating from the deprecated should just be a find and replace (e.g., `find ./ -type f -exec sed -i 's\|applyPatternsAndFoldGreedily\|applyPatternsGreedily\|g' {} \;` variety) as the API arguments hasn't changed between the two.	2024-12-20 08:15:48 -08:00
Fabian Mora	7498eaa9ab	[mlir][LLVM] Add the `ConvertToLLVMAttrInterface` and `ConvertToLLVMOpInterface` interfaces (#99566 ) This patch adds the `ConvertToLLVMAttrInterface` and `ConvertToLLVMOpInterface` interfaces. It also modifies the `convert-to-llvm` pass to use these interfaces when available. The `ConvertToLLVMAttrInterface` interfaces allows attributes to configure conversion to LLVM, including the conversion target, LLVM type converter, and populating conversion patterns. See the `NVVMTargetAttr` implementation of this interface for an example of how this interface can be used to configure conversion to LLVM. The `ConvertToLLVMOpInterface` interface collects all convert to LLVM attributes stored in an operation. Finally, the `convert-to-llvm` pass was modified to use these interfaces when available. This allows applying `convert-to-llvm` to GPU modules and letting the `NVVMTargetAttr` decide which patterns to populate.	2024-11-24 10:09:43 -05:00
Durgadoss R	a8b5115441	[MLIR][NVGPU] Fix the cga_cluster.mlir test (#112191 ) This patch fixes the sm90 cluster test by: * Fixing a typo in LowerGpuOpsToNVVMOps where one of the ClusterDim Op conversion pattern should actually be for the ClusterDimBlocks Op. This addresses the compilation error for this test. * The grid-size should be (4,4,1) instead of (2,2,1). This passes the scf-if check against the threshold of 3 below and actually generates the required prints from the GPU. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2024-10-14 19:44:13 +05:30
Matthias Springer	206fad0e21	[mlir][NFC] Mark type converter in `populate...` functions as `const` (#111250 ) This commit marks the type converter in `populate...` functions as `const`. This is useful for debugging. Patterns already take a `const` type converter. However, some `populate...` functions do not only add new patterns, but also add additional type conversion rules. That makes it difficult to find the place where a type conversion was added in the code base. With this change, all `populate...` functions that only populate pattern now have a `const` type converter. Programmers can then conclude from the function signature that these functions do not register any new type conversion rules. Also some minor cleanups around the 1:N dialect conversion infrastructure, which did not always pass the type converter as a `const` object internally.	2024-10-05 21:32:40 +02:00
Daniel Hernandez-Juarez	1c47fa9b62	[mlir][AMDGPU] Add support for AMD f16 math library calls (#108809 ) In this PR we add support for AMD f16 math library calls (`__ocml_*_f16`) CC: @krzysz00 @manupak	2024-09-23 12:52:00 -05:00
Benjamin Kramer	ac11945386	[mlir][GPU] block_id has the grid size as its range	2024-09-17 18:38:04 +02:00
Krzysztof Drewniak	a953982cb7	[mlir][GPU] Plumb range information through the NVVM lowerings (#107659 ) Update the GPU to NVVM lowerings to correctly propagate range information on IDs and dimension queries, etiher from known_{block,grid}_size attributes or from `upperBound` annotations on the operations themselves.	2024-09-13 12:07:51 -05:00
Andrea Faulds	7aa22f013e	[mlir][gpu] Add 'cluster_size' attribute to gpu.subgroup_reduce (#104851 ) This enables performing several reductions in parallel, each smaller than the size of the subgroup. One potential application is flash attention with subgroup-wide matrix multiplication and reduction combined in one kernel. The multiplication operation requires a 2D matrix to be distributed over the lanes of the subgroup, which then constrains the shape the following reduction can have if we want to keep data in registers.	2024-08-20 13:37:03 -04:00
Matthias Springer	7030280329	[mlir][GPU] Improve `gpu.module` op implementation (#102866 ) - Replace hand-written parser/printer with auto-generated assembly format. - Remove implicit `gpu.module_end` terminator and use the `NoTerminator` trait instead. (Same as `builtin.module`.) - Turn the region into a graph region. (Same as `builtin.module`.)	2024-08-13 09:37:36 +02:00
Victor Perez	d45de8003a	[MLIR][GPU-LLVM] Convert `gpu.func` to `llvm.func` (#101664 ) Add support in `-convert-gpu-to-llvm-spv` to convert `gpu.func` to `llvm.func` operations. - `spir_kernel`/`spir_func` calling conventions used for kernels/functions. - `workgroup` attributions encoded as additional `llvm.ptr<3>` arguments. - No attribute used to annotate kernels - `reqd_work_group_size` attribute using to encode `gpu.known_block_size`. - `llvm.mlir.workgroup_attrib_size` used to encode workgroup attribution sizes. This will be attached to the pointer argument workgroup attributions lower to. Note: A notable missing feature that will be addressed in a follow-up PR is a `-use-bare-ptr-memref-call-conv` option to replace MemRef arguments with bare pointers to the MemRef element types instead of the current MemRef descriptor approach. --------- Signed-off-by: Victor Perez <victor.perez@codeplay.com>	2024-08-09 16:09:11 +02:00
runseny	f6431f0c52	[MLIR][GPUToNVVM] support fastMath and other non-supported mathOp (#99890 ) Support fastMath and other non-supported mathOp which only require float operands and call libdevice function directly to nvvm. 1. lowering mathOp with fastMath attribute to correct libdevice intrinsic. 2. some mathOp in math dialect has been lowered to libdevice now, but it doesn't cover all mathOp. so this mr lowers all the remaining mathOp which only require float operands.	2024-07-25 13:54:58 +02:00
Krzysztof Drewniak	43fd4c49bd	[mlir][GPU] Improve handling of GPU bounds (#95166 ) This change reworks how range information for GPU dispatch IDs (block IDs, thread IDs, and so on) is handled. 1. `known_block_size` and `known_grid_size` become inherent attributes of GPU functions. This makes them less clunky to work with. As a consequence, the `gpu.func` lowering patterns now only look at the inherent attributes when setting target-specific attributes on the `llvm.func` that they lower to. 2. At the same time, `gpu.known_block_size` and `gpu.known_grid_size` are made official dialect-level discardable attributes which can be placed on arbitrary functions. This allows for progressive lowerings (without this, a lowering for `gpu.thread_id` couldn't know about the bounds if it had already been moved from a `gpu.func` to an `llvm.func`) and allows for range information to be provided even when `gpu._{id,dim}` are being used outside of a `gpu.func` context. 3. All of these index operations have gained an optional `upper_bound` attribute, allowing for an alternate mode of operation where the bounds are specified locally and not inherited from the operation's context. These also allow handling of cases where the precise launch sizes aren't known, but can be bounded more precisely than the maximum of what any platform's API allows. (I'd like to thank @benvanik for pointing out that this could be useful.) When inferring bounds (either for range inference or for setting `range` during lowering) these sources of information are consulted in order of specificity (`upper_bound` > inherent attribute > discardable attribute, except that dimension sizes check for `known__bounds` to see if they can be constant-folded before checking their `upper_bound`). This patch also updates the documentation about the bounds and inference behavior to clarify what these attributes do when set and the consequences of setting them up incorrectly. --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>	2024-06-17 23:47:38 -05:00
Pradeep Kumar	bd6568c98a	[MLIR][GPU] Add gpu.cluster_dim_blocks and gpu.cluster_block_id Ops (#95245 ) This commit adds support for `gpu.cluster_dim_blocks` and `gpu.cluster_block_id` Ops to represent number of blocks per cluster and block id inside a cluster respectively. Also, fixed the description of `gpu.cluster_dim` Op and updated the `cga_cluster.mlir` test file to use `gpu.cluster_dim_blocks` Co-authored-by: pradeepku <pradeepku@nvidia.com> Co-authored-by: Guray Ozen <guray.ozen@gmail.com>	2024-06-14 10:35:35 +05:30
Johannes Reifferscheid	a6a9215b93	Lower shuffle to single-result form if possible. (#84321 ) We currently always lower shuffle to the struct-returning variant. I saw some cases where this survived all the way through ptx, resulting in increased register usage. The easiest fix is to simply lower to the single-result version when the predicate is unused.	2024-03-21 10:33:49 +01:00
David Majnemer	0039a2ff4f	[mlir][gpu] Add support for lowering math.erf to __nv_erf (#79848 )	2024-01-29 19:35:23 +00:00
Guray Ozen	763109e346	[mlir][gpu] Use `known_block_size` to set `maxntid` for NVVM target (#77301 ) Setting thread block size with `maxntid` on the kernel has great performance benefits. In this way, downstream PTX compiler can do better register allocation. MLIR's `gpu.launch` and `gpu.launch_func` already has an attribute (`known_block_size`) that keeps the thread block size when it is known. This PR simply uses this attribute to set `maxntid`.	2024-01-08 14:49:19 +01:00
Jakub Kuderski	560564f51c	[mlir][vector][gpu] Align minf/maxf reduction kind names with arith (#75901 ) This is to avoid confusion when dealing with reduction/combining kinds. For example, see a recent PR comment: https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175. Previously, they were picked to mostly mirror the names of the llvm vector reduction intrinsics: https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In isolation, it was not clear if `<maxf>` has `arith.maxnumf` or `arith.maximumf` semantics. The new reduction kind names map 1:1 to arith ops, which makes it easier to tell/look up their semantics. Because both the vector and the gpu dialect depend on the arith dialect, it's more natural to align names with those in arith than with the lowering to llvm intrinsics. Issue: https://github.com/llvm/llvm-project/issues/72354	2023-12-20 00:14:43 -05:00
Guray Ozen	641e05decc	[mlir][gpu] Support dynamic_shared_memory Op with vector dialect (#74475 ) `gpu.dynamic_shared_memory` currently does not get lowered when it is used with vector dialect. The reason is that vector-to-llvm conversion is not included in gpu-to-nvvm. This PR includes that and adds a test.	2023-12-06 10:41:57 +01:00
Jakub Kuderski	7eccd52842	Reland "[mlir][gpu] Align reduction operations with vector combining kinds (#73423 )" This reverts commit dd09221a29506031415cad8a1308998358633d48 and relands https://github.com/llvm/llvm-project/pull/73423. * Updated `gpu.all_reduce` `min`/`max` in CUDA integration tests.	2023-11-27 11:38:18 -05:00
Jakub Kuderski	dd09221a29	Revert "[mlir][gpu] Align reduction operations with vector combining kinds (#73423 )" This reverts commit e0aac8c88d0d30e8da0f8a240ad1e6b4d88782e0. I'm seeing some nvidia integration test failures: https://lab.llvm.org/buildbot/#/builders/61/builds/52334.	2023-11-27 11:29:23 -05:00
Jakub Kuderski	e0aac8c88d	[mlir][gpu] Align reduction operations with vector combining kinds (#73423 ) The motivation for this change is explained in https://github.com/llvm/llvm-project/issues/72354. Before this change, we could not tell between signed/unsigned minimum/maximum and NaN treatment for floating point values. The mapping of old reduction operations to the new ones is as follows: * `min` --> `minsi` for ints, `minf` for floats * `max` --> `maxsi` for ints, `maxf` for floats New reduction kinds not represented in the old enum: `minui`, `maxui`, `minimumf`, `maximumf`. As a next step, I would like to have a common definition of combining kinds used by the `vector` and `gpu` dialects. Separately, the GPU to SPIR-V lowering does not yet properly handle zero and NaN values -- the behavior of floating point min/max group reductions is not specified by the SPIR-V spec, see https://github.com/llvm/llvm-project/issues/73459. Issue: https://github.com/llvm/llvm-project/issues/72354	2023-11-27 11:19:20 -05:00
Guray Ozen	edf5cae739	[mlir][gpu] Support Cluster of Thread Blocks in `gpu.launch_func` (#72871 ) NVIDIA Hopper architecture introduced the Cooperative Group Array (CGA). It is a new level of parallelism, allowing clustering of Cooperative Thread Arrays (CTA) to synchronize and communicate through shared memory while running concurrently. This PR enables support for CGA within the `gpu.launch_func` in the GPU dialect. It extends `gpu.launch_func` to accommodate this functionality. The GPU dialect remains architecture-agnostic, so we've added CGA functionality as optional parameters. We want to leverage mechanisms that we have in the GPU dialects such as outlining and kernel launching, making it a practical and convenient choice. An example of this implementation can be seen below: ``` gpu.launch_func @kernel_module::@kernel clusters in (%1, %0, %0) // <-- Optional blocks in (%0, %0, %0) threads in (%0, %0, %0) ``` The PR also introduces index and dimensions Ops specific to clusters, binding them to NVVM Ops: ``` %cidX = gpu.cluster_id x %cidY = gpu.cluster_id y %cidZ = gpu.cluster_id z %cdimX = gpu.cluster_dim x %cdimY = gpu.cluster_dim y %cdimZ = gpu.cluster_dim z ``` We will introduce cluster support in `gpu.launch` Op in an upcoming PR. See [the documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays) provided by NVIDIA for details.	2023-11-27 11:05:07 +01:00
Guray Ozen	ea84897ba3	[mlir][gpu] Introduce `gpu.dynamic_shared_memory` Op (#71546 ) While the `gpu.launch` Op allows setting the size via the `dynamic_shared_memory_size` argument, accessing the dynamic shared memory is very convoluted. This PR implements the proposed Op, `gpu.dynamic_shared_memory` that aims to simplify the utilization of dynamic shared memory. RFC: https://discourse.llvm.org/t/rfc-simplifying-dynamic-shared-memory-access-in-gpu/ Proposal from RFC This PR `gpu.dynamic.shared.memory` Op to use dynamic shared memory feature efficiently. It is is a powerful feature that enables the allocation of shared memory at runtime with the kernel launch on the host. Afterwards, the memory can be accessed directly from the device. I believe similar story exists for AMDGPU. Current way Using Dynamic Shared Memory with MLIR Let me illustrate the challenges of using dynamic shared memory in MLIR with an example below. The process involves several steps: - memref.global 0-sized array LLVM's NVPTX backend expects - dynamic_shared_memory_size Set the size of dynamic shared memory - memref.get_global Access the global symbol - reinterpret_cast and subview Many OPs for pointer arithmetic ``` // Step 1. Create 0-sized global symbol. Manually set the alignment memref.global "private" @dynamicShmem : memref<0xf16, 3> { alignment = 16 } func.func @main() { // Step 2. Allocate shared memory gpu.launch blocks(...) threads(...) dynamic_shared_memory_size %c10000 { // Step 3. Access the global object %shmem = memref.get_global @dynamicShmem : memref<0xf16, 3> // Step 4. A sequence of `memref.reinterpret_cast` and `memref.subview` operations. %4 = memref.reinterpret_cast %shmem to offset: [0], sizes: [14, 64, 128], strides: [8192,128,1] : memref<0xf16, 3> to memref<14x64x128xf16,3> %5 = memref.subview %4[7, 0, 0][7, 64, 128][1,1,1] : memref<14x64x128xf16,3> to memref<7x64x128xf16, strided<[8192, 128, 1], offset: 57344>, 3> %6 = memref.subview %5[2, 0, 0][1, 64, 128][1,1,1] : memref<7x64x128xf16, strided<[8192, 128, 1], offset: 57344>, 3> to memref<64x128xf16, strided<[128, 1], offset: 73728>, 3> %7 = memref.subview %6[0, 0][64, 64][1,1] : memref<64x128xf16, strided<[128, 1], offset: 73728>, 3> to memref<64x64xf16, strided<[128, 1], offset: 73728>, 3> %8 = memref.subview %6[32, 0][64, 64][1,1] : memref<64x128xf16, strided<[128, 1], offset: 73728>, 3> to memref<64x64xf16, strided<[128, 1], offset: 77824>, 3> // Step.5 Use "test.use.shared.memory"(%7) : (memref<64x64xf16, strided<[128, 1], offset: 73728>, 3>) -> (index) "test.use.shared.memory"(%8) : (memref<64x64xf16, strided<[128, 1], offset: 77824>, 3>) -> (index) gpu.terminator } ``` Let’s write the program above with that: ``` func.func @main() { gpu.launch blocks(...) threads(...) dynamic_shared_memory_size %c10000 { %i = arith.constant 18 : index // Step 1: Obtain shared memory directly %shmem = gpu.dynamic_shared_memory : memref<?xi8, 3> %c147456 = arith.constant 147456 : index %c155648 = arith.constant 155648 : index %7 = memref.view %shmem[%c147456][] : memref<?xi8, 3> to memref<64x64xf16, 3> %8 = memref.view %shmem[%c155648][] : memref<?xi8, 3> to memref<64x64xf16, 3> // Step 2: Utilize the shared memory "test.use.shared.memory"(%7) : (memref<64x64xf16, 3>) -> (index) "test.use.shared.memory"(%8) : (memref<64x64xf16, 3>) -> (index) } } ``` This PR resolves #72513	2023-11-16 14:42:17 +01:00
Christian Ulmann	02307a1444	[MLIR][GPUToNVVM] Remove typed pointer support (#70861 ) This commit removes the support for lowering GPU to NVVM dialect with typed pointers. Typed pointers have been deprecated for a while now and it's planned to soon remove them from the LLVM dialect. Related PSA: https://discourse.llvm.org/t/psa-removal-of-typed-pointers-from-the-llvm-dialect/74502	2023-11-01 08:40:32 +01:00
Fabian Mora	119c489cc1	Reland [mlir][test][gpu] Migrate CUDA tests to the TargetAttr compilation workflow (llvm#65768) The revert happened due to a build bot failure that threw 'CUDA_ERROR_UNSUPPORTED_PTX_VERSION'. The failure's root cause was a pass using "+ptx76" for compilation and an old CUDA driver on the bot. This commit relands the patch with "+ptx60". Original Gh PR: #65768 Original commit message: Migrate tests referencing `gpu-to-cubin` to the new compilation workflow using `TargetAttrs`. The `test-lower-to-nvvm` pass pipeline was modified to use the new compilation workflow to simplify the introduction of future tests. The `createLowerGpuOpsToNVVMOpsPass` function was removed, as it didn't allow for passing all options available in the `ConvertGpuOpsToNVVMOp` pass.	2023-09-09 12:45:21 +00:00
Fabian Mora	2c596ea951	Revert "[mlir][test][gpu] Migrate CUDA tests to the TargetAttr compilation workflow (#65768 ) (#65848 ) This reverts commit d21b67293be15f8a89378e4785d70cc037866406.	2023-09-09 07:14:19 -04:00
Fabian Mora	d21b67293b	[mlir][test][gpu] Migrate CUDA tests to the TargetAttr compilation workflow (#65768 ) Migrate tests referencing `gpu-to-cubin` to the new compilation workflow using `TargetAttrs`. The `test-lower-to-nvvm` pass pipeline was modified to use the new compilation workflow to simplify the introduction of future tests. The `createLowerGpuOpsToNVVMOpsPass` function was removed, as it didn't allow for passing all options available in the `ConvertGpuOpsToNVVMOp` pass.	2023-09-09 07:03:38 -04:00
Adrian Kuegel	640b95da80	[mlir][GPU] Lower arith.remf to GPU intrinsic. Differential Revision: https://reviews.llvm.org/D159422	2023-09-04 13:38:45 +02:00
Nicolas Vasilache	888717e853	[mlir][transform] Enable gpu-to-nvvm via conversion patterns driven by TD This revision untangles a few more conversion pieces and allows rewriting the relatively intricate (and somewhat inconsistent) LowerGpuOpsToNVVMOpsPass in a declarative fashion that provides a much better understanding and control. Differential Revision: https://reviews.llvm.org/D157617	2023-08-10 15:30:48 +00:00
Uday Bondhugula	27ccf0f407	[MLIR] Provide bare pointer memref lowering option on gpu-to-nvvm pass Provide the bare pointer memref lowering option on gpu-to-nvvm pass. This is needed whenever we lower memrefs on the host function side and the kernel calls on the host-side (gpu-to-llvm) with the bare ptr convention. The GPU module side of the lowering should also "align" and use the bare pointer convention. Reviewed By: krzysz00 Differential Revision: https://reviews.llvm.org/D152480	2023-06-18 22:53:50 +05:30
Markus Böck	0e5aeae6f5	[mlir][GPUToLLVM] Add support for emitting opaque pointers Part of https://discourse.llvm.org/t/rfc-switching-the-llvm-dialect-and-dialect-lowerings-to-opaque-pointers/68179 This patch adds the new pass option `use-opaque-pointers` to the GPU to LLVM lowerings (including ROCD and NVVM) and adapts the code to support using opaque pointers in addition to typed pointers. The required changes mostly boil down to avoiding `getElementType` and specifying base types in GEP and Alloca. In the future opaque pointers will be the only supported model, hence tests have been ported to using opaque pointers by default. Additional regression tests for typed-pointers have been added to avoid breaking existing clients. Note: This does not yet port the `GpuToVulkan` passes. Differential Revision: https://reviews.llvm.org/D144448	2023-02-21 20:46:33 +01:00
Krzysztof Drewniak	499abb243c	Add generic type attribute mapping infrastructure, use it in GpuToX Remapping memory spaces is a function often needed in type conversions, most often when going to LLVM or to/from SPIR-V (a future commit), and it is possible that such remappings may become more common in the future as dialects take advantage of the more generic memory space infrastructure. Currently, memory space remappings are handled by running a special-purpose conversion pass before the main conversion that changes the address space attributes. In this commit, this approach is replaced by adding a notion of type attribute conversions TypeConverter, which is then used to convert memory space attributes. Then, we use this infrastructure throughout the ToLLVM conversions. This has the advantage of loosing the requirements on the inputs to those passes from "all address spaces must be integers" to "all memory spaces must be convertible to integer spaces", a looser requirement that reduces the coupling between portions of MLIR. ON top of that, this change leads to the removal of most of the calls to getMemorySpaceAsInt(), bringing us closer to removing it. (A rework of the SPIR-V conversions to use this new system will be in a folowup commit.) As a note, one long-term motivation for this change is that I would eventually like to add an allocaMemorySpace key to MLIR data layouts and then call getMemRefAddressSpace(allocaMemorySpace) in the relevant ToLLVM in order to ensure all alloca()s, whether incoming or produces during the LLVM lowering, have the correct address space for a given target. I expect that the type attribute conversion system may be useful in other contexts. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D142159	2023-02-09 18:00:46 +00:00
Kazu Hirata	5c9013e266	Use std::optional instead of llvm::Optional (NFC)	2023-01-28 00:45:19 -08:00
Quentin Colombet	cb4ccd38fa	[mlir][Conversion] Rename the MemRefToLLVM pass Since the recent MemRef refactoring that centralizes the lowering of complex MemRef operations outside of the conversion framework, the MemRefToLLVM pass doesn't directly convert these complex operations. Instead, to fully convert the whole MemRef dialect space, MemRefToLLVM needs to run after `expand-strided-metadata`. Make this more obvious by changing the name of the pass and the option associated with it from `convert-memref-to-llvm` to `finalize-memref-to-llvm`. The word "finalize" conveys that this pass needs to run after something else and that something else is documented in its tablegen description. This is a follow-up patch related to the conversation at: https://discourse.llvm.org/t/psa-you-need-to-run-expand-strided-metadata-before-memref-to-llvm-now/66956/14 Differential Revision: https://reviews.llvm.org/D142463	2023-01-27 09:10:10 +00:00
Guray Ozen	a3388f3e2a	[mlir] Introduce a pattern to lower `gpu.subgroup_reduce` to `nvvm.redux_op` This revision introduces a pattern to lower `gpu.subgroup_reduce` op into to the `nvvm.redux_sync` op. The op must be run by the entire subgroup, otherwise it is undefined behaviour. It also adds a flag and populate function, because the op is not avaiable for every gpu (sm80+), so it can be used when it is desired. Depends on D142088 Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D142103	2023-01-20 13:56:23 +01:00
Christopher Bate	60dd937d56	[mlir][gpu] Fix build failure / silence windows build warnings Fixes Windows build failure (C4715) caused by 6ca1a09f03e8e940f306bea73efa935e4ee38173.	2023-01-13 13:40:57 -07:00
Christopher Bate	6ca1a09f03	[mlir][gpu] Migrate hard-coded address space integers to an enum attribute (gpu::AddressSpaceAttr) This is a purely mechanical change that introduces an enum attribute in the GPU dialect to represent the various memref memory spaces as opposed to the hard-coded integer attributes that are currently used. The following steps were taken to make the transition across the codebase: 1. Introduce a pass "gpu-lower-memory-space-attributes": The pass updates all memref types that have a memory space attribute that is a `gpu::AddressSpaceAttr`. These attributes are changed to `IntegerAttr`'s using a mapping that is given by the caller. This pass is based on the "map-memref-spirv-storage-class" pass and the common functions can probably be refactored into a set of utilities under the MemRef dialect. 2. Update the verifiers of GPU/NVGPU dialect operations. If a verifier currently checks the address space of an operand using e.g.`getWorkspaceAddressSpace`, then it can continue to do so. However, the checks are changed to only fail if the memory space is either missing or a wrong value of type `gpu::AddressSpaceAttr`. Otherwise, it just assumes the address space is correct because it was specifically lowered to something other than a `gpu::AddressSpaceAttr`. 3. Update existing gpu-to-llvm conversion infrastructure. In the existing gpu-to-X passes, we add a full conversion equivalent to `gpu-lower-memory-space-attributes` just before doing the conversion to the LLVMDialect. This is done because currently both the gpu-to-llvm passes (rocdl,nvvm) run gpu-to-gpu rewrites within the pass, which introduce `AddressSpaceAttr` memory space annotations. Therefore, I inserted the memory space conversion between the gpu-to-gpu rewrites and the LLVM conversion. For more context see the below discourse discussion: https://discourse.llvm.org/t/gpu-workgroup-shared-memory-address-space-is-hard-coded/ Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D140644	2023-01-13 11:00:10 -07:00
Goran Flegar	b7fe0d346b	Lower math.tan to __nv_tan[f] / __ocml_tan_f{32\|64} At present math.tan fails to lower for NVVM and ROCDL. Differential Revision: https://reviews.llvm.org/D141505	2023-01-12 15:26:11 +01:00
Johannes Reifferscheid	059cf735a9	Lower math.cbrt to NVVM/ROCDL. Reviewed By: pifon2a Differential Revision: https://reviews.llvm.org/D141270	2023-01-09 13:17:35 +01:00

1 2 3 4

198 Commits