39 Commits

Author SHA1 Message Date
Kazu Hirata
70c73d1b72 [mlir] Use std::nullopt instead of None in comments (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 17:23:50 -08:00
Kazu Hirata
1a36588ec6 [mlir] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-03 18:50:27 -08:00
Christian Sigg
be065c41d8 [mlir] Change scf::LoopNest to store 'results'.
This fixes the case where scf::LoopNest::loops is empty.

Change LoopVector and ValueVector to SmallVector.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D136926
2022-12-01 06:51:45 +01:00
Matthias Springer
0d9761d50e [mlir][SCF] Add tensor.dim(scf.foreach_thread) folding
Dim sizes of `scf.foreach_thread` op results match the dim sizes of their respective tied shared_outs operands.

Differential Revision: https://reviews.llvm.org/D138484
2022-11-22 11:28:27 +01:00
Mohammed Anany
77533d79f7 [mlir][SCF] Adding custom builder to SCF::WhileOp.
This is a similar builder to the one for SCF::IfOp which allows users to pass region builders to it. Refer to the builders for IfOp.

Reviewed By: tpopp

Differential Revision: https://reviews.llvm.org/D137709
2022-11-15 18:16:49 +01:00
Nicolas Vasilache
f0a411da77 [mlir][Transform]Significantly cleanup scf.foreach_thread and GPU transform permutation handling
Previously, the need for a dense permutation leaked into the thread_dim_mapping specification.
This revision allows to use a sparse specification of the thread_dim_mapping and the proper completion / sorting is applied automatically.

In the process, the sematics of scf.foreach_thread is tightened to require a matching number of thread dimensions and mappings.
The relevant negative test is added.

Differential Revision: https://reviews.llvm.org/D137906
2022-11-14 09:19:49 -08:00
Alexander Belyaev
07665e78cb [mlir] Fix forward the fix for incorrect Optional<ArrayAttr> usage. 2022-11-11 10:53:04 +01:00
Alexander Belyaev
b7162e136e [mlir] Fix incorrect access to the Optional<ArrayAttr> underlying values. 2022-11-11 10:46:04 +01:00
Guray Ozen
6663f34704 [mlir] Introduce device mapper attribute for thread_dim_map and mapped to dims
`scf.foreach_thread` defines mapping its loops to processors via an integer array, see an example below. A lowering can use this mapping. However, expressing mapping as an integer array is very confusing, especially when there are multiple levels of parallelism. In addition, the op does not verify the integer array. This change introduces device mapping attribute to make mapping descriptive and verifiable. Then it makes GPU transform dialect use it.

```
scf.foreach_thread (%i, %j) in (%c1, %c2) {
	scf.foreach_thread (%i2, %j2) in (%c1, %c2)
	{...} { thread_dim_mapping = [0, 1]}
} { thread_dim_mapping = [0, 1]}
```

It first introduces a `DeviceMappingInterface` which is an attribute interface. `scf.foreach_thread` defines its mapping via this interface. A lowering must define its attributes and implement this interface as well. This way gives us a clear validation.

The change also introduces two new attributes (`#gpu.thread<x/y/z>` and `#gpu.block<x,y,z>` ). After this change, the above code prints as below, as seen here, this way clarifies the loop mappings. The change also implements consuming of these two new attribute by the transform dialect. Transform dialect binds the outermost loops to the thread blocks and innermost loops to threads.

```
scf.foreach_thread (%i, %j) in (%c1, %c2) {
	scf.foreach_thread (%i2, %j2) in (%c1, %c2)
	{...} { thread_dim_mapping = [#gpu.thread<x>, #gpu.thread<y>]}
} { thread_dim_mapping = [#gpu.block<x>, #gpu.block<y>]}
```

Reviewed By: ftynse, nicolasvasilache

Differential Revision: https://reviews.llvm.org/D137413
2022-11-11 08:44:57 +01:00
Mehdi Amini
34233d4995 Apply clang-tidy fixes for readability-redundant-smartptr-get in SCF.cpp (NFC) 2022-11-09 00:41:30 +00:00
Mehdi Amini
f5865c8701 Apply clang-tidy fixes for llvm-qualified-auto in SCF.cpp (NFC) 2022-11-09 00:41:30 +00:00
Kazu Hirata
585e35a998 [mlir] Use llvm::is_contained (NFC) 2022-11-06 19:56:15 -08:00
Sanjoy Das
788390c130 Make scf.for and affine.for conditionally speculatable
for (I = Start; I < End; I += 1) always terminates so mark
{scf|affine}.for as RecursivelySpeculatable when step is known to be
1.

Reviewed By: chelini

Differential Revision: https://reviews.llvm.org/D136376
2022-10-30 16:08:42 -07:00
Jeff Niu
07d8fe9391 [mlir][scf] Add an IndexSwitchOp
The `scf.index_switch` is a control-flow operation that branches to one of the
given regions based on the values of the argument and the cases. The
argument is always of type `index`.

Example:

```mlir
%0 = scf.index_switch %arg0 -> i32
case 2 {
  %1 = arith.constant 10 : i32
  scf.yield %1 : i32
}
case 5 {
  %2 = arith.constant 20 : i32
  scf.yield %2 : i32
}
default {
  %3 = arith.constant 30 : i32
  scf.yield %3 : i32
}
```

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D136003
2022-10-21 09:21:10 -07:00
Jeff Niu
6005a1d8af [mlir][scf] Match any constants instead of arith.constant
By matching `arith.constant` specifically, SCF canonicalizers/folders
are incompatible with other kinds of constants. Use the generic
matchers instead.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D135517
2022-10-12 18:01:57 -07:00
Jakub Kuderski
abc362a107 [mlir][arith] Change dialect name from Arithmetic to Arith
Suggested by @lattner in https://discourse.llvm.org/t/rfc-define-precise-arith-semantics/65507/22.

Tested with:
`ninja check-mlir check-mlir-integration check-mlir-mlir-spirv-cpu-runner check-mlir-mlir-vulkan-runner check-mlir-examples`

and `bazel build --config=generic_clang @llvm-project//mlir:all`.

Reviewed By: lattner, Mogball, rriddle, jpienaar, mehdi_amini

Differential Revision: https://reviews.llvm.org/D134762
2022-09-29 11:23:28 -04:00
Peiming Liu
52887071ea [mlir][scf] Support simple symbolic expression without depending on AffineDialect to simply trivial loops.
Remove dependence of AffineDialect

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D134291
2022-09-20 18:13:05 +00:00
Peiming Liu
d518fc28b6 [mlir][scf] Support simple symbolic expression when simplify loops
Reviewed By: aartbik, ThomasRaoux

Differential Revision: https://reviews.llvm.org/D134204
2022-09-19 21:50:01 +00:00
Guray Ozen
233de4e808 [mlir] Add map_nested_foreach_thread_to_gpu_threads op to transform dialect
This revision adds a new op `map_nested_foreach_thread_to_gpu_threads` to transform dialect. The op searches `scf.foreach_threads` inside the `gpu_launch` and distributes them with `gpu.thread_id` attribute.

Loop mapping is explicit and given by the `map_nested_foreach_thread_to_gpu_threads` op. Mapping is done one-to-one, therefore the loops dissappear.

The dynamic trip count or trip count that are larger than thread size are not supported for the time being. However, we can indeed support them by generating a loop inside with cyclic scheduling.

For the time being, trip counts that are dynamic or bigger than thread sizes are not supported. However, in the future the compiler can indeed generate a loop with static cyclic scheduling to support these cases.

Current mechanism allows `scf.foreach_threads` to be siblings or nested. There cannot be interleaving code between the loops when they are nested.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D133950
2022-09-19 16:27:30 +02:00
Matthias Springer
4cd7362083 [mlir][SCF] foreach_thread: Capture shared output tensors explicitly
This change refines the semantics of scf.foreach_thread. Tensors that are inserted into in the terminator must now be passed to the region explicitly via `shared_outs`. Inside of the body of the op, those tensors are then accessed via block arguments.

The body of a scf.foreach_thread is now treated as a repetitive region. I.e., op dominance can no longer be used in conflict detection when using a value that is defined outside of the body. Such uses may now be considered as conflicts (if there is at least one read and one write in the body), effectively privatizing the tensor. Shared outputs are not privatized when they are used via their corresponding block arguments.

As part of this change, it was also necessary to update the "tiling to scf.foreach_thread", such that the generated tensor.extract_slice ops use the scf.foreach_thread's block arguments. This is implemented by cloning the TilingInterface op inside the scf.foreach_thread, rewriting all of its outputs with block arguments and then calling the tiling implementation. Afterwards, the cloned op is deleted again.

Differential Revision: https://reviews.llvm.org/D133114
2022-09-02 14:54:04 +02:00
Jeff Niu
27e8ee208c [mlir] Remove a not very useful eraseArguments overload
This overload just wraps a bitvector, and in most cases a bitvector
could be used directly instead of a list.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D132896
2022-08-29 16:07:32 -07:00
Jeff Niu
58a47508f0 (Reland) [mlir] Switch segment size attributes to DenseI32ArrayAttr
This reland includes changes to the Python bindings.

Switch variadic operand and result segment size attributes to use the
dense i32 array. Dense integer arrays were introduced primarily to
represent index lists. They are a better fit for segment sizes than
dense elements attrs.

Depends on D131801

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D131803
2022-08-12 19:44:52 -04:00
Alex Zinenko
e8e718fa4b Revert "[mlir] Switch segment size attributes to DenseI32ArrayAttr"
This reverts commit 30171e76f0e5ea8037bc4d1450dd3e12af4d9938.

Breaks Python tests in MLIR, missing C API and Python changes.
2022-08-12 10:22:47 +02:00
Jeff Niu
30171e76f0 [mlir] Switch segment size attributes to DenseI32ArrayAttr
Switch variadic operand and result segment size attributes to use the
dense i32 array. Dense integer arrays were introduced primarily to
represent index lists. They are a better fit for segment sizes than
dense elements attrs.

Depends on D131738

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D131702
2022-08-11 20:56:45 -04:00
Benjamin Kramer
9fa59e7643 [mlir] Use C++17 structured bindings instead of std::tie where applicable. NFCI 2022-08-09 13:34:17 +02:00
Kazu Hirata
c8e6ebd74e Use value instead of getValue (NFC) 2022-08-06 11:21:39 -07:00
Kazu Hirata
9750648cb4 [mlir, flang] Use has_value instead of hasValue (NFC) 2022-08-06 11:12:47 -07:00
Nicolas Vasilache
7fbf55c927 [mlir][Tensor] Move ParallelInsertSlice to the tensor dialect
This is moslty NFC and will allow tensor.parallel_insert_slice to gain
rank-reducing semantics by reusing the vast majority of the tensor.insert_slice impl.

Depends on D128857

Differential Revision: https://reviews.llvm.org/D128920
2022-07-04 01:53:12 -07:00
Nicolas Vasilache
b994d388ae [mlir][SCF] Add a ParallelCombiningOpInterface to decouple scf::PerformConcurrently from its contained operations
This allows purging references of scf.ForeachThreadOp and scf.PerformConcurrentlyOp from
ParallelInsertSliceOp.
This will allowmoving the op closer to tensor::InsertSliceOp with which it should share much more
code.

In the future, the decoupling will also allow extending the type of ops that can be used in the
parallel combinator as well as semantics related to multiple concurrent inserts to the same
result.

Differential Revision: https://reviews.llvm.org/D128857
2022-07-01 00:16:02 -07:00
Matthias Springer
04dac2ca7c [mlir][SCF][bufferize][NFC] Implement resolveConflicts for ParallelInsertSliceOp
This was previous implemented as part of the BufferizableOpInterface of ForEachThreadOp. Moving the implementation to ParallelInsertSliceOp to be consistent with the remaining ops and to have a nice example op that can serve as a blueprint for other ops.

Differential Revision: https://reviews.llvm.org/D128666
2022-06-28 12:18:22 +02:00
Nicolas Vasilache
a0f843fdaf [SCF] Add thread_dim_mapping attribute to scf.foreach_thread
An optional thread_dim_mapping index array attribute specifies for each
virtual thread dimension, how it remaps 1-1 to a set of concrete processing
element resources (e.g. a CUDA grid dimension or a level of concrete nested
async parallelism). At this time, the specification is backend-dependent and
is not verified by the op, beyond being an index array attribute.
It is the reponsibility of the lowering to interpret the index array in the
context of the concrete target the op is lowered to, or to ignore it when
the specification is ill-formed or unsupported for a particular target.

Differential Revision: https://reviews.llvm.org/D128633
2022-06-27 04:58:36 -07:00
Jacques Pienaar
2d70eff802 [mlir] Flip more uses to prefixed accessor form (NFC).
Try to keep the final flip small. Need to flip MemRef as there are many
templated cases with it and Tensor.
2022-06-26 19:12:38 -07:00
Kazu Hirata
3b7c3a654c Revert "Don't use Optional::hasValue (NFC)"
This reverts commit aa8feeefd3ac6c78ee8f67bf033976fc7d68bc6d.
2022-06-25 11:56:50 -07:00
Kazu Hirata
aa8feeefd3 Don't use Optional::hasValue (NFC) 2022-06-25 11:55:57 -07:00
Nicolas Vasilache
98dbaed1e6 [mlir][SCF] Fold tensor.cast feeding into scf.foreach_thread.parallel_insert_slice
Differential Revision: https://reviews.llvm.org/D128247
2022-06-21 01:19:18 -07:00
Nicolas Vasilache
a489aa745b [mlir][SCF] Add scf::ForeachThread canonicalization.
This revision adds the necessary plumbing for canonicalizing scf::ForeachThread with the
`AffineOpSCFCanonicalizationPattern`.
In the process the `loopMatcher` helper is updated to take OpFoldResult instead of just values.
This allows composing various scenarios without the need for an artificial builder.

Differential Revision: https://reviews.llvm.org/D128244
2022-06-21 00:54:46 -07:00
Kazu Hirata
6d5fc1e3d5 [mlir] Don't use Optional::getValue (NFC) 2022-06-20 23:20:25 -07:00
Kazu Hirata
037f09959a [mlir] Don't use Optional::hasValue (NFC) 2022-06-20 11:22:37 -07:00
Alex Zinenko
8b68da2c7d [mlir] move SCF headers to SCF/{IR,Transforms} respectively
This aligns the SCF dialect file layout with the majority of the dialects.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D128049
2022-06-20 10:18:01 +02:00