8 Commits

Author SHA1 Message Date
Petr Kurapov
bc29fc937c
[MLIR] Create GPU utils library & move distribution utils (#119264)
Continue the move of `warp_execute_on_lane_0` op to the gpu dialect
(#116994). This patch creates a utils library in GPU and moves generic
helper functions there.
2024-12-13 10:26:57 +01:00
Andrea Faulds
a800ffac41
[mlir][gpu] Disjoint patterns for lowering clustered subgroup reduce (#109158)
Making the existing populateGpuLowerSubgroupReduceToShufflePatterns()
function also cover the new "clustered" subgroup reductions is proving
to be inconvenient, because certain backends may have more specific
lowerings that only cover the non-clustered type, and this creates pass
ordering constraints. This commit removes coverage of clustered
reductions from this function in favour of a new separate function,
which makes controlling the lowering much more straightforward.
2024-09-18 15:55:53 -04:00
Andrea Faulds
fd26f8444a
[mlir][gpu] Rename two misspelled pattern population functions (#109015) 2024-09-17 15:26:14 -04:00
Andrea Faulds
3d01f0a33b
[mlir][gpu] Add 'cluster_stride' attribute to gpu.subgroup_reduce (#107142)
Follow-up to 7aa22f013e24d20291aad745368ff907baa9dfa4, adding an
additional attribute needed in some applications.
2024-09-05 09:03:22 -04:00
Andrea Faulds
7aa22f013e
[mlir][gpu] Add 'cluster_size' attribute to gpu.subgroup_reduce (#104851)
This enables performing several reductions in parallel, each smaller
than the size of the subgroup.

One potential application is flash attention with subgroup-wide matrix
multiplication and reduction combined in one kernel. The multiplication
operation requires a 2D matrix to be distributed over the lanes of the
subgroup, which then constrains the shape the following reduction can
have if we want to keep data in registers.
2024-08-20 13:37:03 -04:00
Ramkumar Ramachandra
db791b278a
mlir/LogicalResult: move into llvm (#97309)
This patch is part of a project to move the Presburger library into
LLVM.
2024-07-02 10:42:33 +01:00
Jakub Kuderski
c0345b4648
[mlir][gpu] Add subgroup_reduce to shuffle lowering (#76530)
This supports both the scalar and the vector multi-reduction cases.
2024-01-02 16:14:22 -05:00
Jakub Kuderski
2af186f9bd
[mlir][gpu] Add patterns to break down subgroup reduce (#76271)
The new patterns break down subgroup reduce ops with vector values into
a sequence of subgroup reductions that fit the native shuffle size. The
maximum/native shuffle size is parametrized.

The overall goal is to be able to perform multi-element reductions with
a sequence of `gpu.shuffle` ops.
2023-12-28 14:39:46 -05:00