150 Commits

Author SHA1 Message Date
Matthias Springer
e7790fbed3 [mlir] Add test-convergence option to Canonicalizer tests
This new option is set to `false` by default. It should  be set only in Canonicalizer tests to detect faulty canonicalization patterns. I.e., patterns that prevent the canonicalizer from converging. The canonicalizer should always convergence on such small unit tests that we have in `canonicalize.mlir`.

Two faulty canonicalization patterns were detected and fixed with this change.

Differential Revision: https://reviews.llvm.org/D140873
2023-01-04 12:02:21 +01:00
Krzysztof Drewniak
be575c5dfc Re-land D139865 "Add known_block_size and known_grid_size to gpu.func"
This should fix the MSVC warning that caused the previous revert.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D140766
2023-01-02 16:39:00 +00:00
Stella Stamenova
828b4762ca Revert "[mlir][GPU] Add known_block_size and known_grid_size to gpu.func"
This reverts commit 85e38d7cd670371206f6067772dc822049d2cbd8.

This broke the windows mlir buildbot:
https://lab.llvm.org/buildbot/#/builders/13/builds/30180/steps/6/logs/stdio
2022-12-23 17:29:42 -08:00
Krzysztof Drewniak
85e38d7cd6 [mlir][GPU] Add known_block_size and known_grid_size to gpu.func
In many cases, the the number of workgroups (the grid size) and the
number of workitems within each group (the block size) that a GPU
kernel will be launched with are known. For example, if gpu.launch is
called with constant block and grid sizes, we know that those are the
only possible sizes that will be used to launch that kernel. In other
cases, a custom code-generation pipeline that eventually produces GPU
kernels may know the launch dimensions of those kernels, or at least
may be able to provide an upper bound on them.

Other GPU programming systems, such as OpenCL, allow capturing such
information to enable compiler optimizations - see
reqd_work_group_size, but MLIR currently has no mechanism for doing so.

This set of attributes is the first step in enabling optimizations
based on the known launch dimensions of kernels. It extends the kernel
outline pass to set these bounds on kernels with constant launch
dimensions and extends integer range inference for GPU index
operations to account for the bounds when they are known.

Subsequent revisions will use this data when lowering GPU operations
to the ROCDL dialect.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D139865
2022-12-22 21:41:46 +00:00
Ivan Butygin
247d8d4f7a [mlir][gpu] Add uniform flag to gpu reduction ops
Differential Revision: https://reviews.llvm.org/D138758
2022-12-14 13:15:58 +01:00
Hanhan Wang
0a1569a400 [mlir][NFC] Remove trailing whitespaces from *.td and *.mlir files.
This is generated by running

```
sed --in-place 's/[[:space:]]\+$//' mlir/**/*.td
sed --in-place 's/[[:space:]]\+$//' mlir/**/*.mlir
```

Reviewed By: rriddle, dcaballe

Differential Revision: https://reviews.llvm.org/D138866
2022-11-28 15:26:30 -08:00
Guray Ozen
c5798fae05 [mlir] [transform] Error for duplicated processor mapping
In a nested loop nest, it is not feasible to map different loops to the same processing unit; for an example, check the code below. This modification includes a check in this circumstance.

```
scf.foreach_thread (%i, %j) in (%c32, %c32) {...}
{ mapping = [#gpu.thread<x>, #gpu.thread<x>] }

```

Note: It also deletes a test because it is not possible to reproduce this error.

Depends on D138020

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D138032
2022-11-18 08:38:53 +01:00
Guray Ozen
63ca939783 [mlir] [transform] Fix for RAUW error in transform gpu dialect
The given test fails due to error below.

The following error is why the test is failing. One `memref.store` and two `memref.load` are consumers of the loop index for which I do RAUW. `memref.store` is first in the list. If I RAUW on this the loop of `llvm::make early inc range(threadIdx.getUsers())` does not return two `memref.load` as users. They remain unchanged. I'm not really certain why.

This change applies RAUW after collecting the users. If a better solution exists, I would be happy to implement it.

```
mlir-opt: ...llvm-project/mlir/include/mlir/IR/UseDefLists.h:175: mlir::IRObjectWithUseList<mlir::OpOperand>::~IRObjectWithUseList() [OperandType = mlir::OpOperand]: Assertion `use_empty() && "Cannot destroy a value that still has uses!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
```

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D138029
2022-11-16 09:55:24 +01:00
Guray Ozen
beaffb041c [mlir][transform] Decouple GPUDeviceMapping attribute from the GPU transfrom dialect code generator
`DeviceMappingAttrInterface` is implemented as unifiying mechanism for thread mapping. A code generator could use any attribute that implements this interface to lower `scf.foreach_thread` to device specific code. It is allowed to choose its own mapping and interpretation.

Currently, GPU transform dialect supports only `GPUThreadMapping` and `GPUBlockMapping`; however, other mappings should to be supported as well. This change addresses this issue. It decouples gpu transform dialect from the `GPUThreadMapping` and `GPUBlockMapping`. Now, they can work any other mapping.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D138020
2022-11-15 18:16:32 +01:00
Nicolas Vasilache
f0a411da77 [mlir][Transform]Significantly cleanup scf.foreach_thread and GPU transform permutation handling
Previously, the need for a dense permutation leaked into the thread_dim_mapping specification.
This revision allows to use a sparse specification of the thread_dim_mapping and the proper completion / sorting is applied automatically.

In the process, the sematics of scf.foreach_thread is tightened to require a matching number of thread dimensions and mappings.
The relevant negative test is added.

Differential Revision: https://reviews.llvm.org/D137906
2022-11-14 09:19:49 -08:00
Guray Ozen
6663f34704 [mlir] Introduce device mapper attribute for thread_dim_map and mapped to dims
`scf.foreach_thread` defines mapping its loops to processors via an integer array, see an example below. A lowering can use this mapping. However, expressing mapping as an integer array is very confusing, especially when there are multiple levels of parallelism. In addition, the op does not verify the integer array. This change introduces device mapping attribute to make mapping descriptive and verifiable. Then it makes GPU transform dialect use it.

```
scf.foreach_thread (%i, %j) in (%c1, %c2) {
	scf.foreach_thread (%i2, %j2) in (%c1, %c2)
	{...} { thread_dim_mapping = [0, 1]}
} { thread_dim_mapping = [0, 1]}
```

It first introduces a `DeviceMappingInterface` which is an attribute interface. `scf.foreach_thread` defines its mapping via this interface. A lowering must define its attributes and implement this interface as well. This way gives us a clear validation.

The change also introduces two new attributes (`#gpu.thread<x/y/z>` and `#gpu.block<x,y,z>` ). After this change, the above code prints as below, as seen here, this way clarifies the loop mappings. The change also implements consuming of these two new attribute by the transform dialect. Transform dialect binds the outermost loops to the thread blocks and innermost loops to threads.

```
scf.foreach_thread (%i, %j) in (%c1, %c2) {
	scf.foreach_thread (%i2, %j2) in (%c1, %c2)
	{...} { thread_dim_mapping = [#gpu.thread<x>, #gpu.thread<y>]}
} { thread_dim_mapping = [#gpu.block<x>, #gpu.block<y>]}
```

Reviewed By: ftynse, nicolasvasilache

Differential Revision: https://reviews.llvm.org/D137413
2022-11-11 08:44:57 +01:00
rkayaith
13bd410962 [mlir][Pass] Include anchor op in -pass-pipeline
In D134622 the printed form of a pass manager is changed to include the
name of the op that the pass manager is anchored on. This updates the
`-pass-pipeline` argument format to include the anchor op as well, so
that the printed form of a pipeline can be directly passed to
`-pass-pipeline`. In most cases this requires updating
`-pass-pipeline='pipeline'` to
`-pass-pipeline='builtin.module(pipeline)'`.

This also fixes an outdated assert that prevented running a
`PassManager` anchored on `'any'`.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D134900
2022-11-03 11:36:12 -04:00
rkayaith
1c0f541a4d [mlir] Don't mix -pass-pipeline with other pass options
These are test updates required for D135745, which disallows mixing
`-pass-pipeline` and the individual `-pass-name` options.

Reviewed By: rriddle, mehdi_amini

Differential Revision: https://reviews.llvm.org/D135746
2022-11-02 12:10:51 -04:00
Lei Zhang
b270fbe035 [mlir][gpu] Relax MMA load/store to allow vector memref
This is useful for converting to SPIR-V, where we'd like to have
memref of vector element types.

Reviewed By: ThomasRaoux, bondhugula

Differential Revision: https://reviews.llvm.org/D137143
2022-11-01 11:38:14 -04:00
Nirvedh Meshram
c441070665 [mlir][spirv] Add conversion from GPU WMMA ops to SPIRV Cooperative matrix
Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D136521
2022-10-22 18:29:40 -07:00
Krzysztof Drewniak
44027f3908 [mlir][GPU] Prevent adding duplicate async tokens
If, in the GPU async transformation, the operation being given an
async dependency already depended on the token in question, we
would add duplicate tokens, creating issues in GPU to LLVM lowering.

To resolve this issue, add a check to addAsyncDependency() to ensure
that duplicate tokens are not present in the token list.

(I'm open to a different approach here, this is just what I went with
initially)

Reviewed By: Mogball

Differential Revision: https://reviews.llvm.org/D136105
2022-10-18 15:37:20 +00:00
Alex Zinenko
2e9abc0c71 [mlir] drop unnecssary transform.with_pdl_patterns from tests, NFC
Many tests wrap the piece of the IR related to the transform dialect
into `transform.with_pdl_patterns` without actually using PDL patterns
inside. Some of these are leftovers from migration to `structured.match`
and some others are cargo cult, both are useless and pollute the tests.

Reviewed By: guraypp

Differential Revision: https://reviews.llvm.org/D135661
2022-10-11 12:26:11 +00:00
Alex Zinenko
6fe0309602 [mlir] switch transform dialect ops to use TransformTypeInterface
Use the recently introduced TransformTypeInterface instead of hardcoding
the PDLOperationType. This will allow the operations to use more
specific transform types to express pre/post-conditions in the future.
It requires the syntax and Python op construction API to be updated.
Dialect extensions will be switched separately.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D135584
2022-10-11 09:55:13 +00:00
Ivan Butygin
b845addae8 [mlir][gpu] Add subgroup_reduce operation
Introduce `subgroup_reduce` operation, similar to `all_reduce`, but operating on subgroup scope instead of workgroup.
It is intended as low-level building block for more high level abstractions (e.g for workgroup-wide `all_reduce` ops).
Only introduce version taking reduce operation enum for simplicity sake.

Differential Revision: https://reviews.llvm.org/D135323
2022-10-11 11:47:15 +02:00
Ivan Butygin
a93ec06ae6 [mlir][gpu] Introduce host_shared flag to gpu.alloc
Motivation: we have lowering pipeline based on upstream gpu and spirv dialects and and we are using host shared gpu memory to transfer data between host and device.
Add `host_shared` flag to `gpu.alloc` to distinguish between shared and device-only gpu memory allocations.

Differential Revision: https://reviews.llvm.org/D133533
2022-10-05 22:01:30 +02:00
Guray Ozen
e68a7bed59 [mlir][transform] Add failing test for GPU transform dialect
The GPU transform dialect currently has restrictions and several situations where we can't use transform dialect.

This update includes a method to test a failing cases in GPU transform dialect.

Differential Revision: https://reviews.llvm.org/D135063
2022-10-05 13:10:13 +02:00
Guray Ozen
89bb0cae46 [mlir][transform] Create GPU transform dialect
This revision adds GPU transform dialect. It also introduce a prefix such as "transform.gpu" for all ops related to this dialect.

MLIR already had two GPU transform op in linalg. This revision moves these ops into GPUTransformOps. The Ops are as follows:

`transform.structured.map_nested_foreach_thread_to_gpu_blocks`  -> `transform.gpu.map_foreach_to_blocks`
This op selects the outermost (toplevel) foreach_thread and parallelize across GPU blocks. It can also generate `gpu_launch`.

`transform.structured.map_nested_foreach_thread_to_gpu_threads` -> `transform.gpu.map_nested_foreach_to_threads`
This op parallelizes nested foreach_thread that are inside `gpu_launch` across GPU threads.

It doesn't add new functionality, but there are some minor refactoring of the code.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D134800
2022-10-04 13:09:08 +02:00
Alex Zinenko
8a8bacb973 [mlir][GPU] treat the absence of workgroup attributes correctly
The helper function in GPUFuncOp incorrectly assumed the workgroup
attribution attribute is always present. Instead, treat its absence as
if its value was zero, i.e., no workgroup attributions are specified.

Closes #58045.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D134865
2022-09-29 12:36:10 +00:00
River Riddle
986b5c56ea [mlir] Flip Async/GPU/OpenACC/OpenMP to use Both accessors
This allows for incrementally updating the old API usages without
needing to update everything at once. These will be left on Both
for a little bit and then flipped to prefixed when all APIs have been
updated.

Differential Revision: https://reviews.llvm.org/D134386
2022-09-21 17:36:13 -07:00
Christian Sigg
50c33a3a9c [MLIR] Harden gpu.func verification
GPUFuncOpLowering moves the body out of gpu.func op and erases it. An empty gpu.func may fail verification but should not crash it. Verification of an erased op is triggered e.g. with debug printing on.

Reviewed By: akuegel

Differential Revision: https://reviews.llvm.org/D132446
2022-08-23 14:58:46 +02:00
Jacques Pienaar
7d273fde11 [mlir] Populate default attributes on op creation
Default attributes were only handled by ODS accessors generated with the
intention that these behave as if set attributes. This addresses the
long standing TODO to address this inconsistency. Moving the
initialization to construction vs every access. Removing need for
duplicated default attribute population in python bindings.

Switch some of the OpenMP ones to optional attribute with default as the
currently set default values are not legal. May need to dig more there.

Switched LinAlg generated ones to optional attribute with default as its
quite widely used and unclear where it falls on two different
interpretations.

Differential Revision: https://reviews.llvm.org/D130916
2022-08-22 16:49:46 -07:00
Jeff Niu
58a47508f0 (Reland) [mlir] Switch segment size attributes to DenseI32ArrayAttr
This reland includes changes to the Python bindings.

Switch variadic operand and result segment size attributes to use the
dense i32 array. Dense integer arrays were introduced primarily to
represent index lists. They are a better fit for segment sizes than
dense elements attrs.

Depends on D131801

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D131803
2022-08-12 19:44:52 -04:00
Alex Zinenko
e8e718fa4b Revert "[mlir] Switch segment size attributes to DenseI32ArrayAttr"
This reverts commit 30171e76f0e5ea8037bc4d1450dd3e12af4d9938.

Breaks Python tests in MLIR, missing C API and Python changes.
2022-08-12 10:22:47 +02:00
Jeff Niu
30171e76f0 [mlir] Switch segment size attributes to DenseI32ArrayAttr
Switch variadic operand and result segment size attributes to use the
dense i32 array. Dense integer arrays were introduced primarily to
represent index lists. They are a better fit for segment sizes than
dense elements attrs.

Depends on D131738

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D131702
2022-08-11 20:56:45 -04:00
River Riddle
ab9cdf09f4 [mlir:Parser] Don't use strings for the "ugly" form of Attribute/Type syntax
This commit refactors the syntax of "ugly" attribute/type formats to not use
strings for wrapping. This means that moving forward attirbutes and type formats
will always need to be in some recognizable form, i.e. if they use incompatible
characters they will need to manually wrap those in a string, the framework will
no longer do it automatically.

This has the benefit of greatly simplifying how parsing attributes/types work, given
that we currently rely on some extremely complicated nested parser logic which is
quite problematic for a myriad of reasons; unecessary complexity(we create a nested
source manager/lexer/etc.), diagnostic locations can be off/wrong given string escaping,
etc.

Differential Revision: https://reviews.llvm.org/D118505
2022-07-05 16:20:30 -07:00
Christian Sigg
3e01af093f [mlir] Add InferIntRangeInterface to gpu.launch
Infers block/grid dimensions/indices or ranges of such dimensions/indices.

Reviewed By: krzysz00

Differential Revision: https://reviews.llvm.org/D129036
2022-07-05 07:14:54 +02:00
Mogball
7bdd3722f2 [mlir][gpu] Change ParalellLoopMappingAttr to AttrDef
It was a StructAttr. Also adds a FieldParser for AffineMap.

Depends on D127348

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D127350
2022-06-09 22:23:21 +00:00
Christian Sigg
bcf3d52486 [MLIR][GPU] Expose GpuParallelLoopMapping as non-test pass.
Reviewed By: bondhugula, herhut

Differential Revision: https://reviews.llvm.org/D126199
2022-05-30 09:20:48 +02:00
Arnab Dutta
16219f8c94 [MLIR][GPU] Add canonicalizer for gpu.memcpy
Erase gpu.memcpy op when only uses of dest are
the memcpy op in question, its allocation and deallocation
ops.

Reviewed By: bondhugula, csigg

Differential Revision: https://reviews.llvm.org/D124257
2022-05-14 19:01:04 +05:30
Thomas Raoux
15bcc36eed [mlir][gpu] Move async copy ops to NVGPU and add caching hints
Move async copy operations to NVGPU as they only exist on NV target and are
designed to match ptx semantic. This allows us to also add more fine grain
caching hint attribute to the op.
Add hint to bypass L1 and hook it up to NVVM op.

Differential Revision: https://reviews.llvm.org/D125244
2022-05-10 22:30:24 +00:00
River Riddle
a8308020ac [mlir] Remove special case parsing/printing of func operations
This was leftover from when the standard dialect was destroyed, and
when FuncOp moved to the func dialect. Now that these transitions
have settled a bit we can drop these.

Most updates were handled using a simple regex: replace `^( *)func` with `$1func.func`

Differential Revision: https://reviews.llvm.org/D124146
2022-05-06 13:36:15 -07:00
Chris Lattner
d85eb4e2d6 [AsmParser] Introduce a new "Argument" abstraction + supporting logic
MLIR has a common pattern for "arguments" that uses syntax
like `%x : i32 {attrs} loc("sourceloc")` which is implemented
in adhoc ways throughout the codebase.  The approach this uses
is verbose (because it is implemented with parallel arrays) and
inconsistent (e.g. lots of things drop source location info).

Solve this by introducing OpAsmParser::Argument and make addRegion
(which sets up BlockArguments for the region) take it.  Convert the
world to propagating this down.  This means that we correctly
capture and propagate source location information in a lot more
cases (e.g. see the affine.for testcase example), and it also
simplifies much code.

Differential Revision: https://reviews.llvm.org/D124649
2022-04-29 12:19:34 -07:00
Fangrui Song
ae46b3e01f Revert D121279 "[MLIR][GPU] Add canonicalizer for gpu.memcpy"
This reverts commit 12f55cac69d8978d1c433756a8b2114bf9ed1e1b.

Causes miscompile. Will follow up with a reproduce.
2022-04-21 08:55:13 -07:00
Uday Bondhugula
f47a38f517 Add async dependencies support for gpu.launch op
Add async dependencies support for gpu.launch op: this allows specifying
a list of async tokens ("streams") as dependencies for the launch.

Update the GPU kernel outlining pass lowering to propagate async
dependencies from gpu.launch to gpu.launch_func op. Previously, a new
stream was being created and destroyed for a kernel launch. The async
deps support allows the kernel launch to be serialized on an existing
stream.

Differential Revision: https://reviews.llvm.org/D123499
2022-04-21 16:25:59 +05:30
River Riddle
0fd3a1ce60 [mlir][NFC] Update remaining textual references of un-namespaced func operations
The special case parsing of operations in the `func` dialect is being removed, and
operations will require the dialect namespace prefix.
2022-04-20 22:17:31 -07:00
River Riddle
412b8850f6 [mlir][NFC] Update textual references of func to func.func in Bufferization/Complex/EmitC/CF/Func/GPU tests
The special case parsing of `func` operations is being removed.
2022-04-20 22:17:28 -07:00
Arnab Dutta
12f55cac69 [MLIR][GPU] Add canonicalizer for gpu.memcpy
Fold away gpu.memcpy op when only uses of dest are
the memcpy op in question, its allocation and deallocation
ops.

Reviewed By: bondhugula

Differential Revision: https://reviews.llvm.org/D121279
2022-04-19 17:54:00 +05:30
Arnab Dutta
392d55c1e2 [MLIR][GPU] Add canonicalization patterns for folding simple gpu.wait ops.
* Fold away redundant %t = gpu.wait async + gpu.wait [%t] pairs.

* Fold away %t = gpu.wait async ... ops when %t has no uses.

* Fold away gpu.wait [] ops.

* In case of %t1 = gpu.wait async [%t0], replace all uses of %t1
  with %t0.

Differential Revision: https://reviews.llvm.org/D121878
2022-04-14 12:30:55 +05:30
Thomas Raoux
d77f483640 [mlir][gpu] Relax restriction on mma load/store op
Those ops can support more complex layout as long as the most inner
dimension is contiguous.

Differential Revision: https://reviews.llvm.org/D122452
2022-03-25 04:03:40 +00:00
River Riddle
4a3460a791 [mlir:FunctionOpInterface] Rename the "type" attribute to "function_type"
This removes any potential confusion with the `getType` accessors
which correspond to SSA results of an operation, and makes it
clear what the intent is (i.e. to represent the type of the function).

Differential Revision: https://reviews.llvm.org/D121762
2022-03-16 17:07:04 -07:00
Ivan Butygin
9f864a5447 [mlir][gpu] Introduce gpu.global_id op
Introduce OpenCL-style global_id op and corresponding spirv lowering.

Differential Revision: https://reviews.llvm.org/D121548
2022-03-15 13:25:50 +03:00
Chia-hung Duan
ed645f6336 [mlir] Support verification order (3/3)
In this CL, update the function name of verifier according to the
behavior. If a verifier needs to access the region then it'll be updated
to `verifyRegions`.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D120373
2022-03-11 01:16:28 +00:00
Chia-hung Duan
9445b39673 [mlir] Support verification order (2/3)
This change gives explicit order of verifier execution and adds
    `hasRegionVerifier` and `verifyWithRegions` to increase the granularity
    of verifier classification. The orders are as below,

    1. InternalOpTrait will be verified first, they can be run independently.
    2. `verifyInvariants` which is constructed by ODS, it verifies the type,
       attributes, .etc.
    3. Other Traits/Interfaces that have marked their verifier as
       `verifyTrait` or `verifyWithRegions=0`.
    4. Custom verifier which is defined in the op and has marked
       `hasVerifier=1`

    If an operation has regions, then it may have the second phase,

    5. Traits/Interfaces that have marked their verifier as
       `verifyRegionTrait` or
       `verifyWithRegions=1`. This implies the verifier needs to access the
       operations in its regions.
    6. Custom verifier which is defined in the op and has marked
       `hasRegionVerifier=1`

    Note that the second phase will be run after the operations in the
    region are verified. Based on the verification order, you will be able to
    avoid verifying duplicate things.

Reviewed By: Mogball

Differential Revision: https://reviews.llvm.org/D116789
2022-02-25 19:04:56 +00:00
Krzysztof Drewniak
84718d37db [MLIR][GPU] Add gpu.set_default_device op
This op is added to allow MLIR code running on multi-GPU systems to
select the GPU they want to execute operations on when no GPU is
otherwise specified.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D119883
2022-02-17 21:30:09 +00:00
Ivan Butygin
d271fc04d5 [mlir][gpu] Split ops sinking from gpu-kernel-outlining pass into separate pass
Previously `gpu-kernel-outlining` pass was also doing index computation sinking into gpu.launch before actual outlining.
Split ops sinking from `gpu-kernel-outlining` pass into separate pass, so users can use theirs own sinking pass before outlining.
To achieve old behavior users will need to call both passes: `-gpu-launch-sink-index-computations -gpu-kernel-outlining`.

Differential Revision: https://reviews.llvm.org/D119932
2022-02-17 10:34:20 +03:00