llvm-project

Author	SHA1	Message	Date
Ivan Butygin	d8350712b3	[mlir] GPUToROCDL: lower `gpu.subgroup_id` to the intrinsic where possible (#179422 ) Lower `gpu.subgroup_id` to `wave.id` intrinsic on gfx12+, lower to `linearized_thread_id / subgroup_size` on older.	2026-02-04 00:53:07 +03:00
Krzysztof Drewniak	df739ba008	[mlir][gpu] Add address space modifier to gpu.barrier (#177425 ) This is a takeover of PR #110527. This commit adds an optional list of memory fences to gpu.barrier, allowing users to specify which memory scopes they wish to fence explicitly, while leaving the default semantics (which are equivalent to calling for a global and local fence by analogy to CUDA's __syncthreads) unchanged. The new expanded semantics are implemented for SPIR-V and for the AMDGPU backend. See also https://discourse.llvm.org/t/rfc-add-memory-scope-to-gpu-barrier/81021/2?u=fmarno, where the default behavior of a gpu.barrier was hashed out (though note that the examples based on VMCNT are outdated for AMDGPU in that memory fences can now be annotated with the correct set of address spaces). This commit also deprecates amdgpu.lds_barrier for use cases that don't involve targeting a gfx908. Assisted-by: Cursor/Claude code (tests and extending amdgpu.lds_barrier pattern while copying it over) --------- Co-authored-by: Finlay Marno <finlay.marno@codeplay.com> Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com> Co-authored-by: Alan Li <alan.li@me.com>	2026-01-26 12:08:47 -08:00
Ivan Butygin	8b907a3a20	[mlir] GPUToROCDL: repack unsupported types when lowering `subgroup_broadcast` (#174206 ) Use the same repacking logic as for shuffle/swizzle.	2026-01-06 23:26:47 +03:00
Adam Paszke	9a93769853	[MLIR] Propagate known cluster sizes from gpu.launch to gpu.func (#174404 ) This lets us properly annotate ranges for gpu.cluster_block_id and gpu.cluster_dim_blocks. It also allows us to fill in the nvvm.cluster_dim attribute for use in the NVVM backend.	2026-01-06 03:49:02 -08:00
Ivan Butygin	f785ca0d72	[mlir][nvgpu] Move memref memspace attributes conversion to single place (#172156 ) Also, some fixes for AMDGPU part for better naming.	2025-12-14 12:44:47 +03:00
Ivan Butygin	c22d82a1d4	[mlir][amdgpu] Move GPU memory spaces conversion to single place (#171876 )	2025-12-11 21:39:57 +03:00
Keshav Vinayak Jha	fbbffc1169	[MLIR][ROCDL] Add math.clampf -> rocdl.fmed3 conversion (#163520 ) Added Pattern for lowering `Math::ClampFOp` to `ROCDL::FMED3`. Also added `chipset` option to `MathToRocdl` pass to check for arch support ISA instructions Solves [#157052](https://github.com/llvm/llvm-project/issues/157052) Reapplies https://github.com/llvm/llvm-project/pull/160100 Un-reverts the merged https://github.com/llvm/llvm-project/pull/163259, and fixes the error. --------- Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>	2025-10-17 08:14:58 -04:00
Fabian Mora	e34b71e351	Revert "[MLIR][ROCDL] Add math.clampf -> rocdl.fmed3 conversion" (#163447 ) Reverts llvm/llvm-project#163259. Reverting due to missing link libraries causing failures in shared build bots.	2025-10-14 16:33:09 -04:00
Keshav Vinayak Jha	1e6df640e2	[MLIR][ROCDL] Add math.clampf -> rocdl.fmed3 conversion (#163259 ) Added Pattern for lowering `Math::ClampFOp` to `ROCDL::FMED3`. Also added `chipset` option to `MathToRocdl` pass to check for arch support ISA instructions Solves [#157052](https://github.com/llvm/llvm-project/issues/157052) Reapplies https://github.com/llvm/llvm-project/pull/160100 --------- Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>	2025-10-14 15:21:26 -05:00
Mehdi Amini	beb6bab87e	[MLIR] Apply clang-tidy fixes for llvm-qualified-auto in LowerGpuOpsToROCDLOps.cpp (NFC)	2025-09-16 08:00:52 -07:00
Pablo Antonio Martinez	dd04668138	[mlir][gpu] Refactor GpuOpsToROCDLOps pass interface (NFC) (#157402 ) This PR deletes the `createLowerGpuOpsToROCDLOpsPass` constructor from the .td file, making the `createConvertGpuOpsToROCDLOps` pass available to users. This has the following effects: 1. `createLowerGpuOpsToROCDLOpsPass` is not available anymore. Instead, `createConvertGpuOpsToROCDLOps` should be used. This makes the interface consistent with ConvertGpuOpsToNVVMOps. 2. To call `createConvertGpuOpsToROCDLOps`, the options must be passed via ConvertGpuOpsToROCDLOpsOptions. This has the side effect of making the `allowed-dialects` option available, which was not accessible via C++ before.	2025-09-10 09:04:34 +02:00
Jakub Kuderski	2b3d3fce73	[mlir][gpu] Revert gpu.subgroup_broadcast with any_lane (#157373 ) This partially reverts https://github.com/llvm/llvm-project/pull/152808. Post-commit comments revealed that the `any_lane` variant hasn't been fully agreed upon at the time of landing.	2025-09-08 00:43:57 +00:00
Ivan Butygin	4880940c84	[mlir][gpu] Add `subgroup_broadcast` op (#152808 ) `subgroup_broadcast` allows broadcasting a value from one lane to all lanes in a subgroup. Supported modes: * `first_active_lane` - broadcast value from the first active lane in subgroup. * `specific_lane` - broadcast value from the specified lane, lane index must be within subgroup. * `any_lane` - if `src` value is uniform across all the subgroup lanes return it unchanged, otherwise result is poison. This variant is essentially a uniformity hint for the compiler, conveying that a specific value is uniform across all subgroup lanes. Dropping `any_lane` broadcast should not change the code semantics.	2025-08-30 09:25:49 +03:00
Tim Gymnich	003cbbd4ca	[mlir][amdgpu] Promote gpu.shuffle to amdgpu.permlane_swap (#154933 ) - promote `gpu.shuffle %src xor {16,32} 64` to `amdgpu.permlane_swap %src {16,32}`	2025-08-24 12:41:09 +02:00
Krzysztof Drewniak	bbe3d64b39	[mlir][ROCDL] Annotate lane ID functions with noundef, ranges (#151396 ) Now that we have general support for setting argument and result attributes on LLVM intrinsics, extend the definitions of mbcnt.lo and mbcnt.hi to carry such attributes. With that, update the construction of the mbcnt.lo/mbcnt.hi calls used to get the lane ID to be `noundef` (since the lane ID is always defined) and to be annotated with the correct ranges (so that generic LLVM passes can correctly optimize based on the fact that there are never more than 32/64 lanes). (Also, handle a pattern that wasn't using getLaneId() and get rid of a dead argument)	2025-08-13 17:44:03 -05:00
Maksim Levental	eaa67a3cf0	[mlir][NFC] update `Conversion` create APIs (5/n) (#149887 ) See https://github.com/llvm/llvm-project/pull/147168 for more info.	2025-07-22 10:40:45 -04:00
Kazu Hirata	fa9adbfda9	[mlir] Remove unused includes (NFC) (#147101 ) These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.	2025-07-04 13:30:21 -07:00
Alexander Richardson	07e2ba445d	[AMDGPU] Set AS8 address width to 48 bits Of the 128-bits of buffer descriptor only 48 bits are address bits, so following the discussion on https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54, the logical conclusion is to set the index width to 48 bits instead of the current value of 128. Most of the test changes are mechanical datalayout updates, but there is one actual change: the ptrmask test now uses .i48 instead of .i128 and I had to update SelectionDAGBuilder to correctly extend the mask. Reviewed By: krzysz00 Pull Request: https://github.com/llvm/llvm-project/pull/139419	2025-05-19 17:26:05 -07:00
Ivan Butygin	91f3cdbd4f	[mlir][gpu] Pattern to promote `gpu.shuffle` to specialized AMDGPU ops (#137109 ) Only swizzle promotion for now, may add DPP ops support later.	2025-05-13 13:26:46 +03:00
Krzysztof Drewniak	2880859604	[mlir][ROCDL] Remove unneeded bf16 expansion in LowerGPUToROCDL (#139603 ) The umbrella pass for lowering GPU ops to ROCDL (aka lowering to LLVM + the AMDGPU-specific setup) would call the arith patterns that manually implemented extf and truncf on bfloat because the LLVM AMDGPU backend used to not support those operations. Since the backend does now support these operations and has for quite some time, remove these patterns from the default lowering flow.	2025-05-12 16:42:15 -05:00
Stanley Winata	1c8e5e223f	[mlir][gpu] Fix breaking constructor from GPUSubgroupSizeToROCDL (#137439 ) This PR addresses a bug from llvm/llvm-project#137360, which was using GPUSubgroupSizeToROCDL in a patterns function that does not have a valid constructor for it. This causes the compilation error below: error: constructor inherited by 'GPUSubgroupSizeOpToROCDL' from base class 'ConvertOpToLLVMPattern<mlir::gpu::SubgroupSizeOp>' is implicitly deleted Signed-off-by: Stanley Winata <stanley.winata@amd.com>	2025-04-25 20:25:06 -07:00
Kazu Hirata	9799746fea	[mlir] Fix a warning This patch fixes: mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp:170:5: error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default]	2025-04-25 16:21:29 -07:00
Alan Li	6e47937eed	[MLIR][ROCDL] Lower `gpu.subgroup_size` to `wavefrontsize` (#137360 )	2025-04-25 19:21:15 -04:00
Gaurav Verma	60a1f5a8a0	[mlir] added gpu.shuffle mode UP support (#137300 ) Added support for `gpu.shuffle` mode `UP` Signed-off-by: xintin <gaurav.verma@amd.com>	2025-04-25 11:15:37 -07:00
Kazu Hirata	4c17a5c663	[mlir] Fix a warning This patch fixes: mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp:140:10: error: unused variable 'shflType' [-Werror,-Wunused-variable]	2025-04-18 09:59:19 -07:00
Ivan Butygin	46d1cb8335	[mlir] GPUToROCDL: Add support for non-i32/f32 shuffle types (#136320 ) Use recently added repacking utilities to support other datatypes. Also, tighten `gpu.shuffle` verification to reject scalable vectors	2025-04-18 19:53:24 +03:00
Ivan Butygin	d893d129e6	[mlir] GPUToROCDL: Fix crashes with unsupported shuffle datatypes (#135504 ) Calling `getIntOrFloatBitWidth` on non-int/float types (`gpu.shuffle` also accepts vectors) will crash.	2025-04-13 20:26:19 +02:00
Ivan Butygin	aecb764cc2	[mlir][gpu] GPUToROCDL/NVVM: use generic llvm conversion interface instead of hardcoded conversions. (#124439 ) Using `ConvertToLLVMPatternInterface` makes it possible to stop hardcoding specific dialect conversions in passes and, more importantly, allows downstream projects to inject their ops/types translation here by registering the corresponding interface. Add `allowed-dialects` option so the user can control which dialects can be used to populate conversions.	2025-02-13 17:53:12 +03:00
Matthias Springer	599c739905	[mlir][GPU] Add NVVM-specific `cf.assert` lowering (#120431 ) This commit adds an NVIDIA-specific lowering of `cf.assert` to `__assertfail`. Note: `getUniqueFormatGlobalName`, `getOrCreateFormatStringConstant` and `getOrDefineFunction` are moved to `GPUOpsLowering.h`, so that they can be reused.	2025-01-06 12:00:11 +01:00
Ivan Butygin	0e23cb0cc5	[mlir][nfc] GpuToROCDL: Remove some dead code (#121403 )	2024-12-31 20:39:31 +03:00
Ivan Butygin	018b32ca1f	Revert "[mlir][nfc] GpuToROCDL: Remove some dead code" (#121402 ) Reverts llvm/llvm-project#121395	2024-12-31 18:55:00 +03:00
Ivan Butygin	0b08e095cc	[mlir][nfc] GpuToROCDL: Remove some dead code (#121395 )	2024-12-31 18:54:41 +03:00
Jacques Pienaar	09dfc5713d	[mlir] Enable decoupling two kinds of greedy behavior. (#104649 ) The greedy rewriter is used in many different flows and it has a lot of convenience (work list management, debugging actions, tracing, etc). But it combines two kinds of greedy behavior 1) how ops are matched, 2) folding wherever it can. These are independent forms of greedy and lead to inefficiency. E.g., cases where one needs to create different phases in lowering and is required to apply patterns in a specific order split across different passes. Using the driver one ends up needlessly retrying folding/having multiple rounds of folding attempts, where one final run would have sufficed. Of course folks can locally avoid this behavior by just building their own, but this is also a commonly requested feature that folks keep on working around locally in suboptimal ways. For downstream users, there should be no behavioral change. Updating from the deprecated should just be a find and replace (e.g., `find ./ -type f -exec sed -i 's|applyPatternsAndFoldGreedily|applyPatternsGreedily|g' {} \;` variety) as the API arguments haven't changed between the two.	2024-12-20 08:15:48 -08:00
Dragan Mladjenovic	596bfb804b	[MLIR][AMDGPU] Support gpu::ShuffleMode::DOWN lowering in ROCDL (#106237 )	2024-11-20 03:00:05 -06:00
Matthias Springer	206fad0e21	[mlir][NFC] Mark type converter in `populate...` functions as `const` (#111250 ) This commit marks the type converter in `populate...` functions as `const`. This is useful for debugging. Patterns already take a `const` type converter. However, some `populate...` functions do not only add new patterns, but also add additional type conversion rules. That makes it difficult to find the place where a type conversion was added in the code base. With this change, all `populate...` functions that only populate patterns now have a `const` type converter. Programmers can then conclude from the function signature that these functions do not register any new type conversion rules. Also some minor cleanups around the 1:N dialect conversion infrastructure, which did not always pass the type converter as a `const` object internally.	2024-10-05 21:32:40 +02:00
Daniel Hernandez-Juarez	1c47fa9b62	[mlir][AMDGPU] Add support for AMD f16 math library calls (#108809 ) In this PR we add support for AMD f16 math library calls (`__ocml_*_f16`) CC: @krzysz00 @manupak	2024-09-23 12:52:00 -05:00
Nirvedh Meshram	a16164d0c2	[MLIR][ROCDL] Add dynamically legal ops to LowerGpuOpsToROCDLOpsPass (#108302 ) Similar to https://github.com/llvm/llvm-project/pull/108266 After https://github.com/llvm/llvm-project/pull/102971 It is legal to generate `LLVM::ExpOp` and `LLVM::LogOp` if the type is a float16 or float32	2024-09-12 11:20:27 -05:00
Nirvedh Meshram	c31d343857	Update legalizations for LowerGpuOpsToROCDLOps (#108266 ) LLVM::FAbsOp and LLVM::SqrtOp are legal after https://github.com/llvm/llvm-project/pull/102971	2024-09-11 15:02:38 -05:00
Matthias Springer	7030280329	[mlir][GPU] Improve `gpu.module` op implementation (#102866 ) - Replace hand-written parser/printer with auto-generated assembly format. - Remove implicit `gpu.module_end` terminator and use the `NoTerminator` trait instead. (Same as `builtin.module`.) - Turn the region into a graph region. (Same as `builtin.module`.)	2024-08-13 09:37:36 +02:00
Victor Perez	d45de8003a	[MLIR][GPU-LLVM] Convert `gpu.func` to `llvm.func` (#101664 ) Add support in `-convert-gpu-to-llvm-spv` to convert `gpu.func` to `llvm.func` operations. - `spir_kernel`/`spir_func` calling conventions used for kernels/functions. - `workgroup` attributions encoded as additional `llvm.ptr<3>` arguments. - No attribute used to annotate kernels - `reqd_work_group_size` attribute used to encode `gpu.known_block_size`. - `llvm.mlir.workgroup_attrib_size` used to encode workgroup attribution sizes. This will be attached to the pointer argument workgroup attributions lower to. Note: A notable missing feature that will be addressed in a follow-up PR is a `-use-bare-ptr-memref-call-conv` option to replace MemRef arguments with bare pointers to the MemRef element types instead of the current MemRef descriptor approach. --------- Signed-off-by: Victor Perez <victor.perez@codeplay.com>	2024-08-09 16:09:11 +02:00
Jan Leyonberg	3fae5551de	[MLIR][ROCDL] Refactor conversion of math operations to ROCDL calls to a separate pass (#98653 ) This patch refactors the conversion of math operations to ROCDL library calls. This pass will also be used in flang to lower Fortran intrinsics/math functions for OpenMP target offloading codegen.	2024-07-17 09:33:04 -04:00
Krzysztof Drewniak	43fd4c49bd	[mlir][GPU] Improve handling of GPU bounds (#95166 ) This change reworks how range information for GPU dispatch IDs (block IDs, thread IDs, and so on) is handled. 1. `known_block_size` and `known_grid_size` become inherent attributes of GPU functions. This makes them less clunky to work with. As a consequence, the `gpu.func` lowering patterns now only look at the inherent attributes when setting target-specific attributes on the `llvm.func` that they lower to. 2. At the same time, `gpu.known_block_size` and `gpu.known_grid_size` are made official dialect-level discardable attributes which can be placed on arbitrary functions. This allows for progressive lowerings (without this, a lowering for `gpu.thread_id` couldn't know about the bounds if it had already been moved from a `gpu.func` to an `llvm.func`) and allows for range information to be provided even when `gpu.*_{id,dim}` are being used outside of a `gpu.func` context. 3. All of these index operations have gained an optional `upper_bound` attribute, allowing for an alternate mode of operation where the bounds are specified locally and not inherited from the operation's context. These also allow handling of cases where the precise launch sizes aren't known, but can be bounded more precisely than the maximum of what any platform's API allows. (I'd like to thank @benvanik for pointing out that this could be useful.) When inferring bounds (either for range inference or for setting `range` during lowering) these sources of information are consulted in order of specificity (`upper_bound` > inherent attribute > discardable attribute, except that dimension sizes check for `known_*_bounds` to see if they can be constant-folded before checking their `upper_bound`). This patch also updates the documentation about the bounds and inference behavior to clarify what these attributes do when set and the consequences of setting them up incorrectly. --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>	2024-06-17 23:47:38 -05:00
stefankoncarevic	94be801879	[mlir][ROCDL] Update the LLVM data layout for ROCDL lowering. (#92127 ) This change updates the dataLayout string to ensure alignment with the latest LLVM TargetMachine configuration. The aim is to maintain consistency and prevent potential compilation issues related to memory address space handling.	2024-05-28 10:17:02 -05:00
Krzysztof Drewniak	4cba5957e6	[mlir][ROCDL] Set the LLVM data layout when lowering to ROCDL LLVM (#74501 ) In order to ensure operations lower correctly (especially memref.addrspacecast, which relies on the data layout being set correctly when dealing with dynamic memrefs) and to prevent compilation issues later down the line, set the `llvm.data_layout` attribute on GPU modules when lowering their contents to a ROCDL / AMDGPU target. If there's a good way to test the embedded string to prevent it from going out of sync with the LLVM TargetMachine, I'd appreciate hearing about it. (Or, alternatively, if there's a place I could factor the string out to).	2024-02-27 09:59:50 -06:00
Mehdi Amini	45c226d452	[MLIR] Add ODS support for generating helpers for dialect (discardable) attributes (#77024 ) This is a new ODS feature that allows dialects to define a list of key/value pairs, each representing an attribute type and a name. This will generate helper classes on the dialect to be able to manage discardable attributes on operations in a type safe way. For example the `test` dialect can define: ``` let discardableAttrs = (ins "mlir::IntegerAttr":$discardable_attr_key, ); ``` And the following will be generated in the TestDialect class: ``` /// Helper to manage the discardable attribute `discardable_attr_key`. class DiscardableAttrKeyAttrHelper { ::mlir::StringAttr name; public: static constexpr ::llvm::StringLiteral getNameStr() { return "test.discardable_attr_key"; } constexpr ::mlir::StringAttr getName() { return name; } DiscardableAttrKeyAttrHelper(::mlir::MLIRContext *ctx) : name(::mlir::StringAttr::get(ctx, getNameStr())) {} mlir::IntegerAttr getAttr(::mlir::Operation *op) { return op->getAttrOfType<mlir::IntegerAttr>(name); } void setAttr(::mlir::Operation *op, mlir::IntegerAttr val) { op->setAttr(name, val); } bool isAttrPresent(::mlir::Operation *op) { return op->hasAttrOfType<mlir::IntegerAttr>(name); } void removeAttr(::mlir::Operation *op) { assert(op->hasAttrOfType<mlir::IntegerAttr>(name)); op->removeAttr(name); } }; DiscardableAttrKeyAttrHelper getDiscardableAttrKeyAttrHelper() { return discardableAttrKeyAttrName; } ``` User code having an instance of the TestDialect can then manipulate this attribute on operation using: ``` auto helper = testDialect.getDiscardableAttrKeyAttrHelper(); helper.setAttr(op, value); helper.isAttrPresent(op); ... ```	2024-02-19 23:30:03 -08:00
Hugo Trachino	65066c0277	[mlir] Use `create` instead of `createOrFold` for ConstantOp as folding has no effect (NFC) (#80129 ) This aims to clean-up confusing uses of builder.createOrFold<ConstantOp> since folding of constants fails.	2024-01-31 23:40:37 -08:00
Guray Ozen	391a7577e7	[mlir][gpu] Add lowering dynamic_shared_memory op for rocdl (#74473 ) This PR adds lowering of `gpu.dynamic_shared_memory` to rocdl target.	2023-12-05 19:56:43 +01:00
Christian Ulmann	4279a642fb	[MLIR][GPUToROCDL] Remove typed pointer support (#70908 ) This commit removes the support for lowering GPU to ROCDL dialect with typed pointers. Typed pointers have been deprecated for a while now and it's planned to soon remove them from the LLVM dialect. Related PSA: https://discourse.llvm.org/t/psa-removal-of-typed-pointers-from-the-llvm-dialect/74502	2023-11-01 10:13:06 +01:00
Adrian Kuegel	baf2d13519	[mlir][GPUToROCDL] Lower arith.remf to GPU intrinsic. Differential Revision: https://reviews.llvm.org/D159423	2023-09-04 14:05:04 +02:00
Stanley Winata	1896096002	[mlir][ROCM] Add Wave/Warp shuffle lowering and op for ROCM. Reduction is heavily used for many DL workloads, especially with softmax/Attention layers. Wave/Warp shuffle and reduction is known to be a speedy/efficient way to do these reductions. In this patch we introduce AMD shuffle intrinsic Ops to ROCDL, along with its corresponding lowering from gpu.shuffle. This should speed up a lot of DL workloads on ROCM backend. Currently, we have support for xor and idx, which are the more common ones. In the future, we plan on adding support for Down and Up, as well as using the ds_swizzle to further enhance its performance when width and offsets are constant. Reviewed By: antiagainst Differential Revision: https://reviews.llvm.org/D158684	2023-08-24 17:35:34 -07:00

137 Commits