llvm-project

Author	SHA1	Message	Date
Matthias Springer	e7790fbed3	[mlir] Add `test-convergence` option to Canonicalizer tests This new option is set to `false` by default. It should be set only in Canonicalizer tests to detect faulty canonicalization patterns, i.e., patterns that prevent the canonicalizer from converging. The canonicalizer should always converge on the small unit tests that we have in `canonicalize.mlir`. Two faulty canonicalization patterns were detected and fixed with this change. Differential Revision: https://reviews.llvm.org/D140873	2023-01-04 12:02:21 +01:00
Krzysztof Drewniak	be575c5dfc	Re-land D139865 "Add known_block_size and known_grid_size to gpu.func" This should fix the MSVC warning that caused the previous revert. Reviewed By: antiagainst Differential Revision: https://reviews.llvm.org/D140766	2023-01-02 16:39:00 +00:00
Stella Stamenova	828b4762ca	Revert "[mlir][GPU] Add known_block_size and known_grid_size to gpu.func" This reverts commit 85e38d7cd670371206f6067772dc822049d2cbd8. This broke the windows mlir buildbot: https://lab.llvm.org/buildbot/#/builders/13/builds/30180/steps/6/logs/stdio	2022-12-23 17:29:42 -08:00
Krzysztof Drewniak	85e38d7cd6	[mlir][GPU] Add known_block_size and known_grid_size to gpu.func In many cases, the number of workgroups (the grid size) and the number of workitems within each group (the block size) that a GPU kernel will be launched with are known. For example, if gpu.launch is called with constant block and grid sizes, we know that those are the only possible sizes that will be used to launch that kernel. In other cases, a custom code-generation pipeline that eventually produces GPU kernels may know the launch dimensions of those kernels, or at least may be able to provide an upper bound on them. Other GPU programming systems, such as OpenCL, allow capturing such information to enable compiler optimizations - see reqd_work_group_size, but MLIR currently has no mechanism for doing so. This set of attributes is the first step in enabling optimizations based on the known launch dimensions of kernels. It extends the kernel outline pass to set these bounds on kernels with constant launch dimensions and extends integer range inference for GPU index operations to account for the bounds when they are known. Subsequent revisions will use this data when lowering GPU operations to the ROCDL dialect. Reviewed By: antiagainst Differential Revision: https://reviews.llvm.org/D139865	2022-12-22 21:41:46 +00:00
Ivan Butygin	247d8d4f7a	[mlir][gpu] Add `uniform` flag to gpu reduction ops Differential Revision: https://reviews.llvm.org/D138758	2022-12-14 13:15:58 +01:00
Hanhan Wang	0a1569a400	[mlir][NFC] Remove trailing whitespaces from `.td` and `.mlir` files. This is generated by running ``` sed --in-place 's/[[:space:]]\+$//' mlir/**/*.td sed --in-place 's/[[:space:]]\+$//' mlir/**/*.mlir ``` Reviewed By: rriddle, dcaballe Differential Revision: https://reviews.llvm.org/D138866	2022-11-28 15:26:30 -08:00
Guray Ozen	c5798fae05	[mlir] [transform] Error for duplicated processor mapping In a nested loop nest, it is not feasible to map different loops to the same processing unit; for an example, check the code below. This modification includes a check in this circumstance. ``` scf.foreach_thread (%i, %j) in (%c32, %c32) {...} { mapping = [#gpu.thread<x>, #gpu.thread<x>] } ``` Note: It also deletes a test because it is not possible to reproduce this error. Depends on D138020 Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D138032	2022-11-18 08:38:53 +01:00
Guray Ozen	63ca939783	[mlir] [transform] Fix for RAUW error in transform gpu dialect The given test fails due to the error below. One `memref.store` and two `memref.load` are consumers of the loop index for which I do RAUW. `memref.store` is first in the list. If I RAUW on this, the loop of `llvm::make_early_inc_range(threadIdx.getUsers())` does not return the two `memref.load` as users. They remain unchanged. I'm not really certain why. This change applies RAUW after collecting the users. If a better solution exists, I would be happy to implement it. ``` mlir-opt: ...llvm-project/mlir/include/mlir/IR/UseDefLists.h:175: mlir::IRObjectWithUseList<mlir::OpOperand>::~IRObjectWithUseList() [OperandType = mlir::OpOperand]: Assertion `use_empty() && "Cannot destroy a value that still has uses!"' failed. PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. ``` Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D138029	2022-11-16 09:55:24 +01:00
Guray Ozen	beaffb041c	[mlir][transform] Decouple GPUDeviceMapping attribute from the GPU transform dialect code generator `DeviceMappingAttrInterface` is implemented as a unifying mechanism for thread mapping. A code generator could use any attribute that implements this interface to lower `scf.foreach_thread` to device specific code. It is allowed to choose its own mapping and interpretation. Currently, GPU transform dialect supports only `GPUThreadMapping` and `GPUBlockMapping`; however, other mappings should be supported as well. This change addresses this issue. It decouples gpu transform dialect from the `GPUThreadMapping` and `GPUBlockMapping`. Now, they can work with any other mapping. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D138020	2022-11-15 18:16:32 +01:00
Nicolas Vasilache	f0a411da77	[mlir][Transform]Significantly cleanup scf.foreach_thread and GPU transform permutation handling Previously, the need for a dense permutation leaked into the thread_dim_mapping specification. This revision allows using a sparse specification of the thread_dim_mapping and the proper completion / sorting is applied automatically. In the process, the semantics of scf.foreach_thread is tightened to require a matching number of thread dimensions and mappings. The relevant negative test is added. Differential Revision: https://reviews.llvm.org/D137906	2022-11-14 09:19:49 -08:00
Guray Ozen	6663f34704	[mlir] Introduce device mapper attribute for `thread_dim_map` and `mapped to dims` `scf.foreach_thread` defines mapping its loops to processors via an integer array, see an example below. A lowering can use this mapping. However, expressing mapping as an integer array is very confusing, especially when there are multiple levels of parallelism. In addition, the op does not verify the integer array. This change introduces device mapping attribute to make mapping descriptive and verifiable. Then it makes GPU transform dialect use it. ``` scf.foreach_thread (%i, %j) in (%c1, %c2) { scf.foreach_thread (%i2, %j2) in (%c1, %c2) {...} { thread_dim_mapping = [0, 1]} } { thread_dim_mapping = [0, 1]} ``` It first introduces a `DeviceMappingInterface` which is an attribute interface. `scf.foreach_thread` defines its mapping via this interface. A lowering must define its attributes and implement this interface as well. This gives us clear validation. The change also introduces two new attributes (`#gpu.thread<x/y/z>` and `#gpu.block<x/y/z>`). After this change, the above code prints as below, which clarifies the loop mappings. The change also implements consumption of these two new attributes by the transform dialect. Transform dialect binds the outermost loops to the thread blocks and innermost loops to threads. ``` scf.foreach_thread (%i, %j) in (%c1, %c2) { scf.foreach_thread (%i2, %j2) in (%c1, %c2) {...} { thread_dim_mapping = [#gpu.thread<x>, #gpu.thread<y>]} } { thread_dim_mapping = [#gpu.block<x>, #gpu.block<y>]} ``` Reviewed By: ftynse, nicolasvasilache Differential Revision: https://reviews.llvm.org/D137413	2022-11-11 08:44:57 +01:00
rkayaith	13bd410962	[mlir][Pass] Include anchor op in -pass-pipeline In D134622 the printed form of a pass manager is changed to include the name of the op that the pass manager is anchored on. This updates the `-pass-pipeline` argument format to include the anchor op as well, so that the printed form of a pipeline can be directly passed to `-pass-pipeline`. In most cases this requires updating `-pass-pipeline='pipeline'` to `-pass-pipeline='builtin.module(pipeline)'`. This also fixes an outdated assert that prevented running a `PassManager` anchored on `'any'`. Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D134900	2022-11-03 11:36:12 -04:00
rkayaith	1c0f541a4d	[mlir] Don't mix -pass-pipeline with other pass options These are test updates required for D135745, which disallows mixing `-pass-pipeline` and the individual `-pass-name` options. Reviewed By: rriddle, mehdi_amini Differential Revision: https://reviews.llvm.org/D135746	2022-11-02 12:10:51 -04:00
Lei Zhang	b270fbe035	[mlir][gpu] Relax MMA load/store to allow vector memref This is useful for converting to SPIR-V, where we'd like to have memref of vector element types. Reviewed By: ThomasRaoux, bondhugula Differential Revision: https://reviews.llvm.org/D137143	2022-11-01 11:38:14 -04:00
Nirvedh Meshram	c441070665	[mlir][spirv] Add conversion from GPU WMMA ops to SPIRV Cooperative matrix Reviewed By: ThomasRaoux Differential Revision: https://reviews.llvm.org/D136521	2022-10-22 18:29:40 -07:00
Krzysztof Drewniak	44027f3908	[mlir][GPU] Prevent adding duplicate async tokens If, in the GPU async transformation, the operation being given an async dependency already depended on the token in question, we would add duplicate tokens, creating issues in GPU to LLVM lowering. To resolve this issue, add a check to addAsyncDependency() to ensure that duplicate tokens are not present in the token list. (I'm open to a different approach here, this is just what I went with initially) Reviewed By: Mogball Differential Revision: https://reviews.llvm.org/D136105	2022-10-18 15:37:20 +00:00
Alex Zinenko	2e9abc0c71	[mlir] drop unnecessary transform.with_pdl_patterns from tests, NFC Many tests wrap the piece of the IR related to the transform dialect into `transform.with_pdl_patterns` without actually using PDL patterns inside. Some of these are leftovers from migration to `structured.match` and some others are cargo cult, both are useless and pollute the tests. Reviewed By: guraypp Differential Revision: https://reviews.llvm.org/D135661	2022-10-11 12:26:11 +00:00
Alex Zinenko	6fe0309602	[mlir] switch transform dialect ops to use TransformTypeInterface Use the recently introduced TransformTypeInterface instead of hardcoding the PDLOperationType. This will allow the operations to use more specific transform types to express pre/post-conditions in the future. It requires the syntax and Python op construction API to be updated. Dialect extensions will be switched separately. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D135584	2022-10-11 09:55:13 +00:00
Ivan Butygin	b845addae8	[mlir][gpu] Add `subgroup_reduce` operation Introduce `subgroup_reduce` operation, similar to `all_reduce`, but operating on subgroup scope instead of workgroup. It is intended as a low-level building block for higher-level abstractions (e.g. for workgroup-wide `all_reduce` ops). Only introduce the version taking a reduce operation enum, for simplicity's sake. Differential Revision: https://reviews.llvm.org/D135323	2022-10-11 11:47:15 +02:00
Ivan Butygin	a93ec06ae6	[mlir][gpu] Introduce `host_shared` flag to `gpu.alloc` Motivation: we have a lowering pipeline based on upstream gpu and spirv dialects and we are using host shared gpu memory to transfer data between host and device. Add `host_shared` flag to `gpu.alloc` to distinguish between shared and device-only gpu memory allocations. Differential Revision: https://reviews.llvm.org/D133533	2022-10-05 22:01:30 +02:00
Guray Ozen	e68a7bed59	[mlir][transform] Add failing test for GPU transform dialect The GPU transform dialect currently has restrictions and several situations where we can't use transform dialect. This update includes a method to test failing cases in GPU transform dialect. Differential Revision: https://reviews.llvm.org/D135063	2022-10-05 13:10:13 +02:00
Guray Ozen	89bb0cae46	[mlir][transform] Create GPU transform dialect This revision adds GPU transform dialect. It also introduces a prefix such as "transform.gpu" for all ops related to this dialect. MLIR already had two GPU transform ops in linalg. This revision moves these ops into GPUTransformOps. The Ops are as follows: `transform.structured.map_nested_foreach_thread_to_gpu_blocks` -> `transform.gpu.map_foreach_to_blocks` This op selects the outermost (toplevel) foreach_thread and parallelizes it across GPU blocks. It can also generate `gpu_launch`. `transform.structured.map_nested_foreach_thread_to_gpu_threads` -> `transform.gpu.map_nested_foreach_to_threads` This op parallelizes nested foreach_thread that are inside `gpu_launch` across GPU threads. It doesn't add new functionality, but there is some minor refactoring of the code. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D134800	2022-10-04 13:09:08 +02:00
Alex Zinenko	8a8bacb973	[mlir][GPU] treat the absence of workgroup attributes correctly The helper function in GPUFuncOp incorrectly assumed the workgroup attribution attribute is always present. Instead, treat its absence as if its value was zero, i.e., no workgroup attributions are specified. Closes #58045. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D134865	2022-09-29 12:36:10 +00:00
River Riddle	986b5c56ea	[mlir] Flip Async/GPU/OpenACC/OpenMP to use Both accessors This allows for incrementally updating the old API usages without needing to update everything at once. These will be left on Both for a little bit and then flipped to prefixed when all APIs have been updated. Differential Revision: https://reviews.llvm.org/D134386	2022-09-21 17:36:13 -07:00
Christian Sigg	50c33a3a9c	[MLIR] Harden gpu.func verification GPUFuncOpLowering moves the body out of gpu.func op and erases it. An empty gpu.func may fail verification but should not crash it. Verification of an erased op is triggered e.g. with debug printing on. Reviewed By: akuegel Differential Revision: https://reviews.llvm.org/D132446	2022-08-23 14:58:46 +02:00
Jacques Pienaar	7d273fde11	[mlir] Populate default attributes on op creation Default attributes were only handled by ODS accessors generated with the intention that these behave as if set attributes. This addresses the long standing TODO to address this inconsistency. Moving the initialization to construction vs every access. Removing need for duplicated default attribute population in python bindings. Switch some of the OpenMP ones to optional attribute with default as the currently set default values are not legal. May need to dig more there. Switched LinAlg generated ones to optional attribute with default as it's quite widely used and unclear where it falls on two different interpretations. Differential Revision: https://reviews.llvm.org/D130916	2022-08-22 16:49:46 -07:00
Jeff Niu	58a47508f0	(Reland) [mlir] Switch segment size attributes to DenseI32ArrayAttr This reland includes changes to the Python bindings. Switch variadic operand and result segment size attributes to use the dense i32 array. Dense integer arrays were introduced primarily to represent index lists. They are a better fit for segment sizes than dense elements attrs. Depends on D131801 Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D131803	2022-08-12 19:44:52 -04:00
Alex Zinenko	e8e718fa4b	Revert "[mlir] Switch segment size attributes to DenseI32ArrayAttr" This reverts commit 30171e76f0e5ea8037bc4d1450dd3e12af4d9938. Breaks Python tests in MLIR, missing C API and Python changes.	2022-08-12 10:22:47 +02:00
Jeff Niu	30171e76f0	[mlir] Switch segment size attributes to DenseI32ArrayAttr Switch variadic operand and result segment size attributes to use the dense i32 array. Dense integer arrays were introduced primarily to represent index lists. They are a better fit for segment sizes than dense elements attrs. Depends on D131738 Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D131702	2022-08-11 20:56:45 -04:00
River Riddle	ab9cdf09f4	[mlir:Parser] Don't use strings for the "ugly" form of Attribute/Type syntax This commit refactors the syntax of "ugly" attribute/type formats to not use strings for wrapping. This means that moving forward attributes and type formats will always need to be in some recognizable form, i.e. if they use incompatible characters they will need to manually wrap those in a string, the framework will no longer do it automatically. This has the benefit of greatly simplifying how parsing attributes/types work, given that we currently rely on some extremely complicated nested parser logic which is quite problematic for a myriad of reasons; unnecessary complexity (we create a nested source manager/lexer/etc.), diagnostic locations can be off/wrong given string escaping, etc. Differential Revision: https://reviews.llvm.org/D118505	2022-07-05 16:20:30 -07:00
Christian Sigg	3e01af093f	[mlir] Add InferIntRangeInterface to gpu.launch Infers block/grid dimensions/indices or ranges of such dimensions/indices. Reviewed By: krzysz00 Differential Revision: https://reviews.llvm.org/D129036	2022-07-05 07:14:54 +02:00
Mogball	7bdd3722f2	[mlir][gpu] Change ParalellLoopMappingAttr to AttrDef It was a StructAttr. Also adds a FieldParser for AffineMap. Depends on D127348 Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D127350	2022-06-09 22:23:21 +00:00
Christian Sigg	bcf3d52486	[MLIR][GPU] Expose GpuParallelLoopMapping as non-test pass. Reviewed By: bondhugula, herhut Differential Revision: https://reviews.llvm.org/D126199	2022-05-30 09:20:48 +02:00
Arnab Dutta	16219f8c94	[MLIR][GPU] Add canonicalizer for gpu.memcpy Erase gpu.memcpy op when only uses of dest are the memcpy op in question, its allocation and deallocation ops. Reviewed By: bondhugula, csigg Differential Revision: https://reviews.llvm.org/D124257	2022-05-14 19:01:04 +05:30
Thomas Raoux	15bcc36eed	[mlir][gpu] Move async copy ops to NVGPU and add caching hints Move async copy operations to NVGPU as they only exist on NV target and are designed to match ptx semantic. This allows us to also add more fine grain caching hint attribute to the op. Add hint to bypass L1 and hook it up to NVVM op. Differential Revision: https://reviews.llvm.org/D125244	2022-05-10 22:30:24 +00:00
River Riddle	a8308020ac	[mlir] Remove special case parsing/printing of `func` operations This was leftover from when the standard dialect was destroyed, and when FuncOp moved to the func dialect. Now that these transitions have settled a bit we can drop these. Most updates were handled using a simple regex: replace `^( *)func` with `$1func.func` Differential Revision: https://reviews.llvm.org/D124146	2022-05-06 13:36:15 -07:00
Chris Lattner	d85eb4e2d6	[AsmParser] Introduce a new "Argument" abstraction + supporting logic MLIR has a common pattern for "arguments" that uses syntax like `%x : i32 {attrs} loc("sourceloc")` which is implemented in adhoc ways throughout the codebase. The approach this uses is verbose (because it is implemented with parallel arrays) and inconsistent (e.g. lots of things drop source location info). Solve this by introducing OpAsmParser::Argument and make addRegion (which sets up BlockArguments for the region) take it. Convert the world to propagating this down. This means that we correctly capture and propagate source location information in a lot more cases (e.g. see the affine.for testcase example), and it also simplifies much code. Differential Revision: https://reviews.llvm.org/D124649	2022-04-29 12:19:34 -07:00
Fangrui Song	ae46b3e01f	Revert D121279 "[MLIR][GPU] Add canonicalizer for gpu.memcpy" This reverts commit 12f55cac69d8978d1c433756a8b2114bf9ed1e1b. Causes miscompile. Will follow up with a reproduce.	2022-04-21 08:55:13 -07:00
Uday Bondhugula	f47a38f517	Add async dependencies support for gpu.launch op Add async dependencies support for gpu.launch op: this allows specifying a list of async tokens ("streams") as dependencies for the launch. Update the GPU kernel outlining pass lowering to propagate async dependencies from gpu.launch to gpu.launch_func op. Previously, a new stream was being created and destroyed for a kernel launch. The async deps support allows the kernel launch to be serialized on an existing stream. Differential Revision: https://reviews.llvm.org/D123499	2022-04-21 16:25:59 +05:30
River Riddle	0fd3a1ce60	[mlir][NFC] Update remaining textual references of un-namespaced `func` operations The special case parsing of operations in the `func` dialect is being removed, and operations will require the dialect namespace prefix.	2022-04-20 22:17:31 -07:00
River Riddle	412b8850f6	[mlir][NFC] Update textual references of `func` to `func.func` in Bufferization/Complex/EmitC/CF/Func/GPU tests The special case parsing of `func` operations is being removed.	2022-04-20 22:17:28 -07:00
Arnab Dutta	12f55cac69	[MLIR][GPU] Add canonicalizer for gpu.memcpy Fold away gpu.memcpy op when only uses of dest are the memcpy op in question, its allocation and deallocation ops. Reviewed By: bondhugula Differential Revision: https://reviews.llvm.org/D121279	2022-04-19 17:54:00 +05:30
Arnab Dutta	392d55c1e2	[MLIR][GPU] Add canonicalization patterns for folding simple gpu.wait ops. * Fold away redundant %t = gpu.wait async + gpu.wait [%t] pairs. * Fold away %t = gpu.wait async ... ops when %t has no uses. * Fold away gpu.wait [] ops. * In case of %t1 = gpu.wait async [%t0], replace all uses of %t1 with %t0. Differential Revision: https://reviews.llvm.org/D121878	2022-04-14 12:30:55 +05:30
Thomas Raoux	d77f483640	[mlir][gpu] Relax restriction on mma load/store op Those ops can support more complex layout as long as the most inner dimension is contiguous. Differential Revision: https://reviews.llvm.org/D122452	2022-03-25 04:03:40 +00:00
River Riddle	4a3460a791	[mlir:FunctionOpInterface] Rename the "type" attribute to "function_type" This removes any potential confusion with the `getType` accessors which correspond to SSA results of an operation, and makes it clear what the intent is (i.e. to represent the type of the function). Differential Revision: https://reviews.llvm.org/D121762	2022-03-16 17:07:04 -07:00
Ivan Butygin	9f864a5447	[mlir][gpu] Introduce gpu.global_id op Introduce OpenCL-style global_id op and corresponding spirv lowering. Differential Revision: https://reviews.llvm.org/D121548	2022-03-15 13:25:50 +03:00
Chia-hung Duan	ed645f6336	[mlir] Support verification order (3/3) In this CL, update the function name of verifier according to the behavior. If a verifier needs to access the region then it'll be updated to `verifyRegions`. Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D120373	2022-03-11 01:16:28 +00:00
Chia-hung Duan	9445b39673	[mlir] Support verification order (2/3) This change gives explicit order of verifier execution and adds `hasRegionVerifier` and `verifyWithRegions` to increase the granularity of verifier classification. The order is as follows: 1. InternalOpTrait will be verified first, they can be run independently. 2. `verifyInvariants` which is constructed by ODS, it verifies the type, attributes, etc. 3. Other Traits/Interfaces that have marked their verifier as `verifyTrait` or `verifyWithRegions=0`. 4. Custom verifier which is defined in the op and has marked `hasVerifier=1` If an operation has regions, then it may have the second phase, 5. Traits/Interfaces that have marked their verifier as `verifyRegionTrait` or `verifyWithRegions=1`. This implies the verifier needs to access the operations in its regions. 6. Custom verifier which is defined in the op and has marked `hasRegionVerifier=1` Note that the second phase will be run after the operations in the region are verified. Based on the verification order, you will be able to avoid verifying duplicate things. Reviewed By: Mogball Differential Revision: https://reviews.llvm.org/D116789	2022-02-25 19:04:56 +00:00
Krzysztof Drewniak	84718d37db	[MLIR][GPU] Add gpu.set_default_device op This op is added to allow MLIR code running on multi-GPU systems to select the GPU they want to execute operations on when no GPU is otherwise specified. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D119883	2022-02-17 21:30:09 +00:00
Ivan Butygin	d271fc04d5	[mlir][gpu] Split ops sinking from gpu-kernel-outlining pass into separate pass Previously, the `gpu-kernel-outlining` pass was also sinking index computations into gpu.launch before the actual outlining. Split ops sinking from the `gpu-kernel-outlining` pass into a separate pass, so users can use their own sinking pass before outlining. To achieve the old behavior, users will need to call both passes: `-gpu-launch-sink-index-computations -gpu-kernel-outlining`. Differential Revision: https://reviews.llvm.org/D119932	2022-02-17 10:34:20 +03:00

150 Commits