llvm-project

Author	SHA1	Message	Date
Jakub Kuderski	59e44799bd	[mlir] Fix new clang-tidy warning llvm-type-switch-case-types. NFC. (#178487 ) Pre-committing this before landing the new check in https://github.com/llvm/llvm-project/pull/177892	2026-01-28 19:13:47 +00:00
Krzysztof Drewniak	3446ff1e67	[mlir] Update all-reduce (& vector tests) to use workgroup barriers (#178285 ) This commit updates the lowering of all-reduce operations to annotate the generated barriers with `memfence [#gpu.address_space<workgroup>]` so that these barriers do not force unrelated global memory operations to complete. It similarly sets up the warp synchronization function in the vector distribute tests, since they also only read/write shared memory. In addition, this commit adds convenience builders for gpu.barrier, which will allow it to either fence on a given address space or on the address space of a provided memref.	2026-01-27 13:09:18 -08:00
Krzysztof Drewniak	df739ba008	[mlir][gpu] Add address space modifier to gpu.barrier (#177425 ) This is a takeover of PR #110527. This commit adds an optional list of memory fences to gpu.barrier, allowing users to specify which memory scopes they wish to fence explicitly, while leaving the default semantics (which are equivalent to calling for a global and local fence by analogy to CUDA's __syncthreads) unchanged. The new expanded semantics are implemented for SPIR-V and for the AMDGPU backend. See also https://discourse.llvm.org/t/rfc-add-memory-scope-to-gpu-barrier/81021/2?u=fmarno, where the default behavior of a gpu.barrier was hashed out (though note that the examples based on VMCNT are outdated for AMDGPU in that memory fences can now be annotated with the correct set of address spaces). This commit also deprecates amdgpu.lds_barrier for use cases that don't involve targeting a gfx908. Assisted-by: Cursor/Claude code (tests and extending amdgpu.lds_barrier pattern while copying it over) --------- Co-authored-by: Finlay Marno <finlay.marno@codeplay.com> Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com> Co-authored-by: Alan Li <alan.li@me.com>	2026-01-26 12:08:47 -08:00
foxtran	61c162169c	[MLIR] Fix GCC's `-Wreturn-type` warnings (#177654 ) This patch fixes `-Wreturn-type` warnings that occur when MLIR is built with GCC (11.5 was used for detection). Errors found: ``` build/llvm-llvmorg-21.1.8/mlir/lib/CAPI/Transforms/Rewrite.cpp: In function ‘MlirGreedyRewriteStrictness mlirGreedyRewriteDriverConfigGetStrictness(MlirGreedyRewriteDriverConfig)’: build/llvm-llvmorg-21.1.8/mlir/lib/CAPI/Transforms/Rewrite.cpp:399:1: warning: control reaches end of non-void function [-Wreturn-type] 399 | } | ^ build/llvm-llvmorg-21.1.8/mlir/lib/CAPI/Transforms/Rewrite.cpp: In function ‘MlirGreedySimplifyRegionLevel mlirGreedyRewriteDriverConfigGetRegionSimplificationLevel(MlirGreedyRewriteDriverConfig)’: build/llvm-llvmorg-21.1.8/mlir/lib/CAPI/Transforms/Rewrite.cpp:414:1: warning: control reaches end of non-void function [-Wreturn-type] 414 | } | ^ build/llvm-llvmorg-21.1.8/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp: In member function ‘mlir::Speculation::Speculatability mlir::gpu::SubgroupBroadcastOp::getSpeculatability()’: build/llvm-llvmorg-21.1.8/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp:2522:1: warning: control reaches end of non-void function [-Wreturn-type] 2522 | } | ^ build/llvm-llvmorg-21.1.8/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp: In member function ‘llvm::LogicalResult mlir::gpu::SubgroupBroadcastOp::verify()’: build/llvm-llvmorg-21.1.8/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp:2537:1: warning: control reaches end of non-void function [-Wreturn-type] 2537 | } | ^ build/llvm-llvmorg-21.1.8/mlir/lib/Dialect/ArmNeon/Transforms/LowerContractToNeonPatterns.cpp: In member function ‘mlir::Value {anonymous}::VectorContractRewriter::createMMLA(mlir::PatternRewriter&, mlir::Location, mlir::Value, mlir::Value, mlir::Value)’: build/llvm-llvmorg-21.1.8/mlir/lib/Dialect/ArmNeon/Transforms/LowerContractToNeonPatterns.cpp:153:3: warning: control reaches end of non-void function [-Wreturn-type] 153 | } | ^ build/llvm-llvmorg-21.1.8/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp: In function ‘std::pair<long int, long int> mlir::linalg::getFmrFromWinogradConv2DFmr(mlir::linalg::WinogradConv2DFmr)’: build/llvm-llvmorg-21.1.8/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp:3776:1: warning: control reaches end of non-void function [-Wreturn-type] 3776 | } | ^ build/llvm-llvmorg-21.1.8/mlir/test/lib/Dialect/Test/TestOpDefs.cpp: In function ‘llvm::StringLiteral getVisibilityString(mlir::SymbolTable::Visibility)’: build/llvm-llvmorg-21.1.8/mlir/test/lib/Dialect/Test/TestOpDefs.cpp:37:1: warning: control reaches end of non-void function [-Wreturn-type] 37 | } | ^ ```	2026-01-25 16:29:37 +01:00
Akimasa Watanuki	ce2a5919cd	[mlir][gpu] Enforce async keyword when parsing gpu.launch with results (#176570 ) The `gpu.launch` parser attempts to add a null `asyncTokenType` to the results list if a result is requested but the `async` keyword is missing, leading to an assertion failure. Explicitly verify that `asyncTokenType` is valid when `parser.getNumResults() > 0`. Emit a diagnostic error if the `async` keyword is missing instead of crashing. Add a regression test to `mlir/test/Dialect/GPU/invalid.mlir`. Fix: https://github.com/llvm/llvm-project/issues/176530	2026-01-23 06:58:04 +09:00
Matthias Springer	cc98eb0380	[mlir] Fix build after #175815 (#176332 ) Fix this build error, which is reported by some compilers after #175815: ``` error: operands to ?: have different types ‘mlir::Operation::result_range {aka mlir::ResultRange}’ and ‘mlir::ValueRange’ return successor.isParent() ? getOperation()->getResults() : ValueRange(); ```	2026-01-16 11:02:12 +01:00
Matthias Springer	f76433761a	[mlir][Interfaces] Split successor inputs from region successor (#175815 ) This commit simplifies the design of the `RegionBranchOpInterface`. The property of being a successor input is now independent of the region branch point. There is a new API for querying successor inputs: `RegionBranchOpInterface::getSuccessorInputs(RegionSuccessor)`. Note that this function does not take a `RegionBranchPoint` as parameter. The `RegionSuccessor` API is now also simpler: it no longer stores successor inputs. A region successor is simply `Region *`, wrapped around a convenience API. Note: This commit is mostly mechanical. Analyses / transformations that build on top of the `RegionBranchOpInterface` (e.g., `visitNonControlFlowArguments` API) can likely be simplified in follow-up commits. Note for LLVM integration: Split `RegionBranchOpInterface::getSuccessorRegion` implementations into two functions: `getSuccessorRegion` and `getSuccessorInputs`. (There are many examples in this commit.) RFC: https://discourse.llvm.org/t/rfc-simplify-regionbranchopinterface-separate-successor-inputs-from-region-successor/89420/7	2026-01-16 10:16:53 +01:00
Matthias Springer	5f3b40ec7a	[mlir][Interfaces][NFC] Simplify and align `RegionSuccessor` design / API (#174945 ) Simplify the design of `RegionSuccessor`. There is no need to store the `Operation *` pointer when branching out of the region branch op (to the parent). There is no API to even access the `Operation *` pointer. Add a new helper function `RegionSuccessor::parent` to construct a region successor that points to the parent. This aligns the `RegionSuccessor` design and API with `RegionBranchPoint`: * Both classes now have a `parent()` helper function. `ClassName::parent()` can be used in documentation to precisely describe the source/target of a region branch. * Both classes now use `nullptr` internally to represent "parent". This API change also protects against incorrect API usage: users can no longer pass an incorrect parent op. If a region successor is not a region of the region branch op, it must branch out of the region branch op itself ("parent"). However, the previous API allowed passing other operations. There was one such API violation in a [test case](https://github.com/llvm/llvm-project/pull/174945/files#diff-d5717e4a8d7344b2ff77762b8fa480bcfec0eeee97a86195c787d791a6217e13L71). Also clean up the documentation to use the correct terminology (such as "successor operands", "successor inputs") consistently. Note: This PR effectively rolls back some changes from #161575. That PR introduced `llvm::PointerUnion<Region *, Operation *> successor{nullptr};`. It is unclear from the commit message why that change was made. Note for LLVM integration: You may have to slightly modify `getSuccessorRegion` implementations: Replace `RegionSuccessor(getOperation(), getOperation()->getResults())` with `RegionSuccessor::parent(getResults())`.	2026-01-14 10:57:22 +01:00
Nick Kreeger	e289b2e765	[mlir][Utils] Add VerificationUtils (NFC) (#174336 ) Introduces `VerificationUtils` to consolidate common operation verification patterns in MLIR. This initial implementation provides `verifyDynamicDimensionCount()` to reduce code duplication across dialect verifiers. This is an NFC (No Functional Change) refactoring that improves code maintainability by extracting reusable verification logic into a shared utility.	2026-01-12 16:52:44 +00:00
Ivan Butygin	ac62f12192	[mlir][amdgpu] Remove redundant barriers (#175436 )	2026-01-12 14:47:58 +03:00
Adam Paszke	9a93769853	[MLIR] Propagate known cluster sizes from gpu.launch to gpu.func (#174404 ) This lets us properly annotate ranges for gpu.cluster_block_id and gpu.cluster_dim_blocks. It also allows us to fill in the nvvm.cluster_dim attribute for use in the NVVM backend.	2026-01-06 03:49:02 -08:00
Ivan Butygin	62709a1111	[mlir][gpu] Fold `subgroup_broadcast(subgroup_broadcast(%val))` chains (#174159 )	2026-01-02 01:44:47 +03:00
Giacomo Castiglioni	d3edc94d11	[MLIR][GPU] subgroup_mma fp64 extension - take 2 (#169061 ) This PR re-lands #165873. This PR extends the gpu.subgroup_mma_* ops to support fp64 type. The extension requires special handling during the lowering to nvvm due to the return type for load ops for fragment a and b (they return a scalar instead of a struct). The original PR did not guard the new test based on the required architecture (sm80), which led to a failure on the cuda runners with T4 GPUs.	2025-12-01 07:39:59 -05:00
Fabian Mora	8c3f59f1b2	Revert "[MLIR][GPU] subgroup_mma fp64 extension" (#169049 ) Reverts llvm/llvm-project#165873 The revert is triggered by a failing integration test on a couple of buildbots.	2025-11-21 10:02:59 -05:00
Giacomo Castiglioni	49995b2af0	[MLIR][GPU] subgroup_mma fp64 extension (#165873 ) This PR extends the `gpu.subgroup_mma_*` ops to support fp64 type. The extension requires special handling during the lowering to `nvvm` due to the return type for load ops for fragment a and b (they return a scalar instead of a struct).	2025-11-21 09:07:43 -05:00
Mehdi Amini	41f65666f6	[MLIR] Revamp RegionBranchOpInterface (#165429 ) This is still somewhat of a WIP; we have some issues with this interface that are not trivial to solve. This patch tries to make the concepts of RegionBranchPoint and RegionSuccessor more robust and aligned with their definition: - A `RegionBranchPoint` is either the parent (`RegionBranchOpInterface`) op or a `RegionBranchTerminatorOpInterface` operation in a nested region. - A `RegionSuccessor` is either one of the nested regions or the parent `RegionBranchOpInterface`. Some new methods with reasonable default implementations are added to help resolve the flow of values across the RegionBranchOpInterface. It is still not trivial in the current state to walk the def-use chain backward with this interface. For example, when you have the 3rd block argument in the entry block of a for-loop, finding the matching operands requires knowing about the hidden loop iterator block argument and where the iterargs start. The API is designed around forward-tracking of the chain, unfortunately. Try to reland #161575 ; I suspect a buildbot incremental build issue.	2025-10-28 09:53:56 -07:00
Mehdi Amini	e3c547179f	Revert " [MLIR] Revamp RegionBranchOpInterface " (#165356 ) Reverts llvm/llvm-project#161575 Broke the Windows on ARM buildbot build; needs investigation.	2025-10-28 01:06:14 -07:00
Mehdi Amini	ab1fd21b54	[MLIR] Revamp RegionBranchOpInterface (#161575 ) This is still somewhat of a WIP; we have some issues with this interface that are not trivial to solve. This patch tries to make the concepts of RegionBranchPoint and RegionSuccessor more robust and aligned with their definition: - A `RegionBranchPoint` is either the parent (`RegionBranchOpInterface`) op or a `RegionBranchTerminatorOpInterface` operation in a nested region. - A `RegionSuccessor` is either one of the nested regions or the parent `RegionBranchOpInterface`. Some new methods with reasonable default implementations are added to help resolve the flow of values across the RegionBranchOpInterface. It is still not trivial in the current state to walk the def-use chain backward with this interface. For example, when you have the 3rd block argument in the entry block of a for-loop, finding the matching operands requires knowing about the hidden loop iterator block argument and where the iterargs start. The API is designed around forward-tracking of the chain, unfortunately.	2025-10-28 07:47:26 +00:00
Jakub Kuderski	0820266651	[mlir] Use llvm accumulate wrappers. NFCI. (#162957 ) Use wrappers around `std::accumulate` to make the code more concise and less bug-prone: https://github.com/llvm/llvm-project/pull/162129. With `std::accumulate`, it's the initial value that determines the accumulator type. `llvm::sum_of` and `llvm::product_of` pick the right accumulator type based on the range element type. Found some funny bugs like a local accumulate helper that calculated a sum with initial value of 1 -- we didn't hit the bug because the code was actually dead...	2025-10-11 11:33:18 -04:00
Jakub Kuderski	8bab6c4e8c	[mlir] Simplify unreachable type switch cases. NFC. (#162032 ) Use `DefaultUnreachable` from https://github.com/llvm/llvm-project/pull/161970.	2025-10-06 09:23:25 -04:00
Mehdi Amini	ecea2b542b	[MLIR] Fix gpu.launch attribution argument printing (#161408 ) This was broken and never tested. Not only could this crash due to a stack-use-after-scope, but it would also have printed something like: ``` value <block argument> of type 'memref<7x8xf64, #gpu.address_space<workgroup>>' at index: 12 ``` instead of the SSA value. It turns out that gpu.func already has a very similar helper that we can reuse here. Fixes #161394	2025-09-30 19:49:22 +02:00
Jakub Kuderski	2b3d3fce73	[mlir][gpu] Revert gpu.subgroup_broadcast with any_lane (#157373 ) This partially reverts https://github.com/llvm/llvm-project/pull/152808. Post-commit comments revealed that the `any_lane` variant hadn't been fully agreed upon at the time of landing.	2025-09-08 00:43:57 +00:00
Ivan Butygin	4880940c84	[mlir][gpu] Add `subgroup_broadcast` op (#152808 ) `subgroup_broadcast` allows broadcasting a value from one lane to all lanes in a subgroup. Supported modes: * `first_active_lane` - broadcast the value from the first active lane in the subgroup. * `specific_lane` - broadcast the value from the specified lane; the lane index must be within the subgroup. * `any_lane` - if the `src` value is uniform across all the subgroup lanes, return it unchanged; otherwise the result is poison. This variant is essentially a uniformity hint for the compiler, conveying that a specific value is uniform across all subgroup lanes. Dropping an `any_lane` broadcast should not change the code semantics.	2025-08-30 09:25:49 +03:00
Adam Siemieniuk	533ddcd989	[mlir][gpu] Warp execute terminator getter (#154729 ) Adds a utility getter to `warp_execute_on_lane_0` which simplifies access to the op's terminator. Uses are refactored to utilize the new terminator getter.	2025-08-22 18:24:23 +02:00
Longsheng Mou	7d886fab74	[mlir][gpu] Update attribute definitions in `gpu::LaunchOp` (#152106 ) `gpu::LaunchOp` is updated in the following way: - Change the attribute type of kernel function and module from `SymbolRefAttr` to `FlatSymbolRefAttr` to avoid nested symbol references. - Rename variables from camel case (kernelFunc, kernelModule) to lower case (function, module) and update the syntax. - `LaunchOp::build` supports passing `module` and `function` attributes.	2025-08-08 11:43:21 +08:00
Hsiangkai Wang	0d21522c00	[mlir][gpu] Make offset and width in gpu.rotate as attributes (#150901 ) `offset` and `width` must be constants and there are constraints on their values. Update the operation definition to use attributes instead of operands.	2025-07-29 09:02:42 +01:00
Maksim Levental	dce6679cf5	[mlir][NFC] update `mlir/Dialect` create APIs (16/n) (#149922 ) See https://github.com/llvm/llvm-project/pull/147168 for more info.	2025-07-21 19:57:30 -04:00
Nicolas Vasilache	2b28d10022	[mlir][SCF][GPU] Add DeviceMaskingAttrInterface (#146943 ) This revision adds DeviceMaskingAttrInterface and extends DeviceMappingArrayAttr to accept a union of DeviceMappingAttrInterface and DeviceMaskingAttrInterface. Support is added to GPUTransformOps to take advantage of this information and lower to block/warpgroup/warp/thread specialization when mapped to linear ids. The revision also connects to scf::ForallOp and uses the new attribute to implement warp specialization. The implementation is in the form of a GPUMappingMaskAttr, which can be additionally passed to the scf.forall.mapping attribute to specify a mask on compute resources that should be active. In the first implementation the masking is a bitfield that specifies for each processing unit whether it is active or not. In the future, we may want to implement this as a symbol to refer to dynamically defined values. Extending op semantics with an operand is deemed too intrusive at this time. --------- Co-authored-by: Oleksandr "Alex" Zinenko <git@ozinenko.com>	2025-07-07 18:06:41 +02:00
Nicolas Vasilache	0a62836969	[mlir][gpu][transforms] Add support for mapping to lanes (#146912 ) This revision adds a new attribute for mapping `scf.forall` to linear lane ids. Example: ``` // %arg2 and %arg3 map to lanes [0, 6) and are turned into expressions // involving threadIdx.x/y by the map_nested_forall_to_threads // transformation. This results in an if (linear_thread_id < 6) conditional. scf.forall (%arg2, %arg3) in (2, 3) { ... } {mapping = [#gpu.lane<linear_dim_0>, #gpu.lane<linear_dim_1>]} ``` --------- Co-authored-by: Oleksandr "Alex" Zinenko <git@ozinenko.com>	2025-07-07 15:14:52 +02:00
Kazu Hirata	ed0ee3a419	[mlir] Use llvm::fill (NFC) (#147100 ) We can pass a range to llvm::fill.	2025-07-04 13:30:14 -07:00
Hsiangkai Wang	f581ef5b66	[mlir][gpu] Add gpu.rotate operation (#142796 ) Add gpu.rotate operation and a pattern to convert gpu.rotate to SPIR-V OpGroupNonUniformRotateKHR.	2025-07-01 11:32:25 +01:00
Kazu Hirata	63f30d7d82	[mlir] Migrate away from {TypeRange,ValueRange}(std::nullopt) (NFC) (#145445 ) ArrayRef has a constructor that accepts std::nullopt. This constructor dates back to the days when we still had llvm::Optional. Since the use of std::nullopt outside the context of std::optional is kind of an abuse and not intuitive to newcomers, I would like to move away from the constructor and eventually remove it. This patch migrates away from TypeRange(std::nullopt) and ValueRange(std::nullopt).	2025-06-24 07:03:59 -07:00
Srinivasa Ravi	9a553d3766	[MLIR][NVVM] Add NVVMRequiresSM op traits (#126886 ) Motivation: Currently, the NVVMOps are not verified against the supported SM architectures. This can manifest as an ISel failure in the NVPTX LLVM backend during CodeGen to PTX ISA. This PR addresses this issue by adding verifier checks for Target-SM architectures in the NVVM Dialect itself, thereby catching the errors early on. Summary: * Parametric traits named `NVVMRequiresSM` and `NVVMRequiresSMa` are added to facilitate the version checks for typical and arch-accelerated versions respectively. * These traits can be attached to any NVVM Op to enable the checks for the particular Op. (example shown below) * An attribute interface named `TargetAttrVerifyInterface` is added to the GPU dialect which any target attribute seeking to perform target-verification on the module can implement. * The checks are performed by the `NVVMTargetAttr` (implementing the `TargetAttrVerifyInterface` interface) when called from the GPU module verifier where it walks through the module and performs the checks for Ops with the `NVVMRequiresSM` traits. * A few Ops in `NVVMOps.td` have been updated to serve as examples. Example Usage: ``` def NVVM_ReduxOp : NVVM_Op<"redux.sync"> {...} ----> def NVVM_ReduxOp : NVVM_Op<"redux.sync", [NVVMRequiresSM<80>]> {...} def NVVM_WgmmaFenceAlignedOp : NVVM_Op<"wgmma.fence.aligned"> {...} ----> def NVVM_WgmmaFenceAlignedOp : NVVM_Op<"wgmma.fence.aligned", [NVVMRequiresSMa<[90]>]> {...} ``` --------- Co-authored-by: Guray Ozen <guray.ozen@gmail.com>	2025-05-21 08:53:00 +05:30
Kazu Hirata	15f7c6ed70	[mlir] Remove unused local variables (NFC) (#138481 )	2025-05-05 10:08:00 -07:00
Jakub Kuderski	c62afbfeda	[mlir][linalg][gpu] Clean up printing. NFC. (#136330 ) * Use `llvm::interleaved` from #135517 to simplify printing * Avoid needless vector allocations	2025-04-18 15:05:27 -04:00
Jakub Kuderski	4be84a142e	[mlir][gpu] Clean up prints in GPU dialect. NFC. (#136250 ) Clean up printing code by switching to `llvm::interleaved` from https://github.com/llvm/llvm-project/pull/135517. Also make some minor readability & performance fixes.	2025-04-18 11:10:17 -04:00
Zichen Lu	360630b567	[mlir][GPUDialect] Add cmdOption suffix consumer in GpuModuleToBinary Pass (#127646 ) Add cmdOption suffix consumer function in GpuModuleToBinary Pass, which can tokenize and remove a specific suffix of cmdOption.	2025-02-18 19:02:23 +01:00
Guray Ozen	837b89fc0f	[MLIR][NVVM] Add `ptxas-cmd-options` to pass flags to the downstream compiler (#127457 ) This PR adds `cmd-options` to the `gpu-lower-to-nvvm-pipeline` pipeline and the `nvvm-attach-target` pass, allowing users to pass flags to the downstream compiler, ptxas. Example: ``` mlir-opt -gpu-lower-to-nvvm-pipeline="cubin-chip=sm_80 ptxas-cmd-options='-v --register-usage-level=8'" ```	2025-02-17 12:09:27 +01:00
jeanPerier	327d627066	[mlir] share argument attributes interface between calls and callables (#123176 ) This patch shares core interface methods dealing with argument and result attributes from CallableOpInterface with the CallOpInterface and makes them mandatory to give more consistent guarantees about concrete operations using these interfaces. This allows adding argument attributes on call-like operations, which is sometimes required to get the proper ABI, as with llvm.call (and llvm.invoke). The patch adds optional `arg_attrs` and `res_attrs` attributes to operations using these interfaces that did not have them already. They can then re-use the common "rich function signature" printing/parsing helpers if they want (for the LLVM dialect, this is done in the next patch). Part of RFC: https://discourse.llvm.org/t/mlir-rfc-adding-argument-and-result-attributes-to-llvm-call/84107	2025-02-03 11:27:14 +01:00
Matthias Springer	6aaa8f25b6	[mlir][IR][NFC] Move free-standing functions to `MemRefType` (#123465 ) Turn free-standing `MemRefType`-related helper functions in `BuiltinTypes.h` into member functions.	2025-01-21 08:48:09 +01:00
Krzysztof Drewniak	0aa831e0ed	[mlir][GPU] Implement ValueBoundsOpInterface for GPU ID operations (#122190 ) The GPU ID operations already implement InferIntRangeInterface, which gives constant lower and upper bounds on those IDs when appropriate metadata is present on the operations or in the surrounding context. This commit uses that existing code to implement the ValueBoundsOpInterface, which is used when analyzing affine operations (unlike the integer range interface, which is used for arithmetic optimization). It also implements the interface for gpu.launch, where we can use it to express the constraint that block/grid sizes are equal to their value from outside the launch op and that the corresponding IDs are bounded above by that size. As a consequence, the test pass for this inference is updated to work on a FunctionOpInterface and not a func.func, creating minor churn in other tests.	2025-01-09 11:42:22 -08:00
Timothy Hoffman	fbbbd65b25	[MLIR] correct return type of parse() functions (#120180 ) The `parseX()` functions that are defined to support `custom<X>` in `assemblyFormat` should return `ParseResult` rather than `LogicalResult`. The `ParseResult` type is necessary due to tablegen generating code that expects this type within an Op `parseX()` function.	2024-12-17 09:06:55 -08:00
Mehdi Amini	72e8b9aeaa	[MLIR] Add a BlobAttr interface for attribute to wrap arbitrary content and use it as linkLibs for ModuleToObject (#120116 ) This change allows exposing, through an interface, attributes that wrap content as external resources. The usage inside ModuleToObject shows how we will be able to provide runtime libraries without relying on the filesystem.	2024-12-17 01:30:56 +01:00
Renaud Kauffmann	9919295cfd	[mlir][gpu] Adding ELF section option to the gpu-module-to-binary pass (#119440 ) This is a follow-up of #117246. I thought then it would be easy to edit a DictionaryAttr but it turns out that these attributes are immutable and need to be passed during the construction of the gpu.binary Op. The first commit was using the NVVMTargetAttr to pass the information. After feedback from @fabianmcg, this PR now passes the information through a new option of the gpu-module-to-binary pass. Please add reviewers, as you see fit.	2024-12-16 09:09:41 -08:00
Mehdi Amini	a9b399aeef	[MLIR][GPU] Fix memref.dim folding with out-of-bound index (#118890 ) Fixes #118760	2024-12-05 16:36:33 -08:00
Petr Kurapov	ecaf2c335c	[MLIR] Move warp_execute_on_lane_0 from vector to gpu (#116994 ) Please see the related RFC here: https://discourse.llvm.org/t/rfc-move-execute-on-lane-0-from-vector-to-gpu-dialect/82989. This patch does exactly one thing - moves the op to gpu.	2024-11-22 15:30:47 +01:00
Zichen Lu	08e7609692	[mlir][fix] Add callback functions for ModuleToObject (#116916 ) Here is the [merged MR](https://github.com/llvm/llvm-project/pull/116007) which caused a failure and [was reverted](https://github.com/llvm/llvm-project/pull/116811). Thanks to @joker-eph for the help, I fixed it (`ModuleObject` was not being constructed with the callback functions in `mlir/lib/Target/LLVM/NVVM/Target.cpp`) and split the unit tests that don't need `ptxas` out of the original test so that they can run more widely.	2024-11-20 13:22:08 +01:00
Mehdi Amini	af41c55673	Revert "[MLIR] Add callback functions for ModuleToObject" (#116811 ) Reverts llvm/llvm-project#116007 Bot is broken.	2024-11-19 15:28:17 +01:00
Zichen Lu	2153672ba3	[MLIR] Add callback functions for ModuleToObject (#116007 ) In ModuleToObject flow, users may want to add some callback functions invoked with LLVM IR/ISA for debugging or other purposes.	2024-11-19 13:51:08 +01:00
Andrzej Warzyński	bfde17834d	[mlir] Update the return type of `getNum{Dynamic|Scalable}Dims` (#110472 ) Updates the return type of `getNumDynamicDims` and `getNumScalableDims` from `int64_t` to `size_t`. This is for consistency with other helpers/methods that return "size" and to reduce the number of `static_cast`s in various places.	2024-09-30 14:53:50 +01:00
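
The accumulator-type pitfall described in the "[mlir] Use llvm accumulate wrappers" entry (0820266651) can be reproduced with the standard library alone. The sketch below is illustrative only: `truncatingSum` and `elementTypedSum` are hypothetical names, not LLVM APIs, and `elementTypedSum` only imitates the idea of picking the accumulator type from the range's element type. It does not reproduce the actual implementation of `llvm::sum_of`.

```cpp
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical helper: the int literal 0 fixes the accumulator type to int,
// so each partial sum is converted back to int and loses its fraction.
inline int truncatingSum(const std::vector<double> &values) {
  return std::accumulate(values.begin(), values.end(), 0);
}

// Hypothetical helper: derive the accumulator type from the element type of
// the range, so the initial value can no longer pick the wrong type.
template <typename Range>
auto elementTypedSum(const Range &range) {
  using Iter = decltype(std::begin(range));
  using Elem = typename std::iterator_traits<Iter>::value_type;
  return std::accumulate(std::begin(range), std::end(range), Elem{});
}
```

For a `std::vector<double>` holding `{0.5, 0.5, 0.5}`, `truncatingSum` yields 0 because every intermediate result is truncated, while `elementTypedSum` yields 1.5.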


277 Commits