21585 Commits

Author SHA1 Message Date
vfdev
f136c800b6
Enabled freethreading support in MLIR python bindings (#122684)
Reland reverted https://github.com/llvm/llvm-project/pull/107103 with
the fixes for Python 3.8

cc @jpienaar

Co-authored-by: Peter Hawkins <phawkins@google.com>
2025-01-13 03:00:31 -08:00
xiaoleis-nv
d03f35f9b6
[MLIR][NVVM] Fix the datatype error for nvvm.mma.sync when the operand is bf16 (#122664)
The PR fixes the datatype error for `nvvm.mma.sync` when the operand is
`bf16`. This operation originally requires the A/B type to be `f16x2`
for the `bf16` MMA. However, it violates the NVVM intrinsic
[[here](372044ee09/llvm/include/llvm/IR/IntrinsicsNVVM.td (L119))],
where the A/B operand type should be `i32`. This is a bug, and there are
no tests in MLIR that cover this datatype.

```
    // mma bf16 -> s32 @ m16n8k16/m16n8k8
    !eq(gft,"m16n8k16:a:bf16") : !listsplat(llvm_i32_ty, 4),
    !eq(gft,"m16n8k16:b:bf16") : !listsplat(llvm_i32_ty, 2),
    !eq(gft,"m16n8k8:a:bf16") : !listsplat(llvm_i32_ty, 2),
    !eq(gft,"m16n8k8:b:bf16") : [llvm_i32_ty],
```

This PR addresses this bug and adds tests to guarantee correctness.

Co-authored-by: Xiaolei Shi <xiaoleis@nvidia.com>
2025-01-13 15:03:05 +05:30
Clément Fournier
36c3466aef
[mlir][linalg] Fix neutral elt for softmax (#118952)
The decomposition of `linalg.softmax` uses `maxnumf`, but the identity
element that is used in the generated code is the one for `maximumf`.
They are not the same: the identity for `maxnumf` is `NaN`, while the
one for `maximumf` is `-Inf`. Using the wrong identity is incorrect and
prevents the `maxnumf` from being folded.
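
The difference between the two identities can be checked with a small Python model. This is only a sketch: `maxnumf` and `maximumf` below are hand-written stand-ins for the NaN-ignoring and NaN-propagating semantics, and `reduce_with` is a hypothetical helper, not MLIR code.

```python
import math


def maxnumf(a: float, b: float) -> float:
    """NaN-ignoring maximum: if one operand is NaN, return the other."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)


def maximumf(a: float, b: float) -> float:
    """NaN-propagating maximum: any NaN operand yields NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return max(a, b)


def reduce_with(op, init, values):
    """Left fold of `values` with `op`, starting from `init` (hypothetical helper)."""
    acc = init
    for v in values:
        acc = op(acc, v)
    return acc
```

With `NaN` as the initial value, `reduce_with(maxnumf, math.nan, xs)` leaves every input unchanged. `-inf` is not neutral for `maxnumf`, because `maxnumf(-inf, NaN)` returns `-inf` rather than `NaN`.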

Related to #114595, which fixed the folder for maxnumf.
2025-01-13 15:21:07 +08:00
Jacques Pienaar
3f1486f08e Revert "Added free-threading CPython mode support in MLIR Python bindings (#107103)"
Breaks on 3.8, rolling back to avoid breakage while fixing.

This reverts commit 9dee7c44491635ec9037b90050bcdbd3d5291e38.
2025-01-12 18:30:42 +00:00
vfdev
9dee7c4449
Added free-threading CPython mode support in MLIR Python bindings (#107103)
Related to https://github.com/llvm/llvm-project/issues/105522

Description:

This PR is joint work with Peter Hawkins (@hawkinsp). I originally wrote
it for pybind11, and it was then reworked for nanobind based on Peter's
branch: https://github.com/hawkinsp/llvm-project/tree/nbdev .

- Added free-threading CPython mode support for MLIR Python bindings
- Added a test which can reveal data races when CPython and LLVM/MLIR
are compiled with TSAN

Context:
- Related to https://github.com/google/jax/issues/23073

Co-authored-by: Peter Hawkins <phawkins@google.com>
2025-01-12 09:56:49 -08:00
Twice
b91d5af1ac
[MLIR][Vector] Allow any strided memref for one-element vector.load in lowering vector.gather (#122437)
In `Gather1DToConditionalLoads`, we currently check whether the stride
of the innermost dimension of the input memref is 1; if it is not, the
rewrite pattern is not applied. However, according to the
verification of `vector.load` here:

4e32271e8b/mlir/lib/Dialect/Vector/IR/VectorOps.cpp (L4971-L4975)

... if the output vector type of `vector.load` contains only one element,
we can ignore the stride requirement on the input memref, i.e. the input
memref can have any strided layout attribute in that case.

So here we can allow more cases in lowering `vector.gather` by relaxing
such check.

As shown in the test case attached in this patch
[here](1933fbad58/mlir/test/Dialect/Vector/vector-gather-lowering.mlir (L151)),
now `vector.gather` of memref with non-trivial stride can be lowered
successfully if the result vector contains only one element.

---------

Signed-off-by: PragmaTwice <twice@apache.org>
Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>
2025-01-12 16:02:41 +00:00
Matthias Springer
6422546e99
[mlir][LLVM] Fix conversion of non-standard MLIR float types (#122634)
Certain non-standard float types were directly passed through in the
LLVM type converter, resulting in invalid IR or failed assertions:

```
mlir-opt: mlir/lib/Conversion/LLVMCommon/TypeConverter.cpp:638: FailureOr<Type> mlir::LLVMTypeConverter::convertVectorType(VectorType) const: Assertion `LLVM::isCompatibleVectorType(vectorType) && "expected vector type compatible with the LLVM dialect"' failed.
```

The LLVM type converter should not define invalid type conversion rules
for such types. If there is no type conversion rule, conversion patterns
will not apply to ops with such operand types.
2025-01-12 15:17:12 +01:00
Kareem Ergawy
42da12063f
[flang][OpenMP] Extend delayed privatization for omp.simd (#122156)
Adds support for delayed privatization for `simd` directives. This PR
includes PFT down to LLVM IR lowering.
2025-01-12 07:46:58 +01:00
Kazu Hirata
4f4e2abb1a
[mlir] Migrate away from PointerUnion::{is,get} (NFC) (#122591)
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
2025-01-11 13:16:43 -08:00
William Moses
38fcf62483
[MLIR] Import LLVM add flag to disable loadalldialects (#122574)
Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>
2025-01-11 09:11:22 -05:00
William Moses
b306eff56f
[MLIR] Enable inlining for private symbols (#122572)
The inlining code for llvm funcs seems to have needlessly forbidden
inlining of private (e.g. non-cloning) symbols.
2025-01-11 09:10:27 -05:00
Kazu Hirata
35e89897a4
[Dialect] Migrate away from PointerUnion::{is,get} (NFC) (#122568)
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>
2025-01-11 02:06:33 -08:00
Kazu Hirata
26d513d197
[TableGen] Migrate away from PointerUnion::{is,get} (NFC) (#122569)
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>
2025-01-11 00:17:40 -08:00
Kazu Hirata
129ec84574
[Conversion] Migrate away from PointerUnion::{is,get} (NFC) (#122421)
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
2025-01-10 15:10:17 -08:00
Matthias Springer
5d26a6d759
[mlir][Interfaces] ViewLikeOpInterface: Remove parser/printer overloads (#122436)
#115808 adds additional `custom<>` parser/printer variants. The overall
list of overloads/variants is getting larger.

This commit removes overloads that are not needed, to keep the
parser/printer simple.
2025-01-10 17:18:53 +01:00
Guray Ozen
2e6030ef6a [MLIR][NVVM] Add missing cmake dependency
Another fix
2025-01-10 12:22:20 +01:00
Guray Ozen
1ef2580972 [MLIR][NVVM] Add missing cmake dependency
The NVVM dialect uses InferIntRangeInterface, but the dependency was missing in cmake. This PR adds it.
2025-01-10 11:26:59 +01:00
Guray Ozen
66e41a1a20
[MLIR][NVVM] Declare InferIntRangeInterface for RangeableRegisterOp (#122263) 2025-01-10 10:32:25 +01:00
Lukas Sommer
4adeb6cf55
[mlir][spirv] Add convergent attribute to builtin (#122131)
Add the `convergent` attribute to builtin functions and builtin function
calls when lowering SPIR-V non-uniform group functions to LLVM dialect.

---------

Signed-off-by: Lukas Sommer <lukas.sommer@codeplay.com>
2025-01-10 09:15:18 +01:00
Longsheng Mou
9190e1c0ef
[mlir][linalg] Handle reassociationIndices correctly for 0D tensor (#121683)
This PR fixes a bug where a value is assigned to a 0-sized
reassociationIndices, preventing a crash. Fixes #116043.
2025-01-10 09:23:50 +08:00
Krzysztof Drewniak
0aa831e0ed
[mlir][GPU] Implement ValueBoundsOpInterface for GPU ID operations (#122190)
The GPU ID operations already implement InferIntRangeInterface, which
gives constant lower and upper bounds on those IDs when appropriate
metadata is present on the operations or in the surrounding context.

This commit uses that existing code to implement the
ValueBoundsOpInterface, which is used when analyzing affine operations
(unlike the integer range interface, which is used for arithmetic
optimization).

It also implements the interface for gpu.launch, where we can use it to
express the constraint that block/grid sizes are equal to their value
from outside the launch op and that the corresponding IDs are bounded
above by that size.

As a consequence, the test pass for this inference is updated to work on
a FunctionOpInterface and not a func.func, creating minor churn in other
tests.
2025-01-09 11:42:22 -08:00
Razvan Lupusoru
cbcb7ad32e
[mlir][acc] Introduce MappableType interface (#122146)
OpenACC data clause operations previously required that the variable
operand implemented PointerLikeType interface. This was a reasonable
constraint because the dialects currently mixed with `acc` do use
pointers to represent variables. However, this forces the "pointer"
abstraction to be exposed too early and some cases are not cleanly
representable through this approach (more specifically FIR's `fir.box`
abstraction).

Thus, relax this by allowing a variable to be a type which implements
either `PointerLikeType` interface or `MappableType` interface.
2025-01-09 10:27:37 -08:00
Andrea Faulds
7724be9728
[mlir][spirv] Do SPIR-V serialization in -test-vulkan-runner-pipeline (#121494)
This commit is a further incremental step toward moving the whole
mlir-vulkan-runner MLIR pass pipeline into mlir-opt (see #73457). The
previous step was b225b3adf7b78387c9fcb97a3ff0e0a1e26eafe2, which moved
all device passes prior to SPIR-V serialization into a new mlir-opt test
pass, `-test-vulkan-runner-pipeline`.

This commit changes how SPIR-V serialization is accomplished for Vulkan
runner tests. Until now, this was done by the Vulkan-specific
ConvertGpuLaunchFuncToVulkanLaunchFunc pass. With this commit, this
responsibility is removed from that pass, and is instead done with the
existing generic GpuModuleToBinaryPass. In addition, the SPIR-V
serialization step is no longer done inside mlir-vulkan-runner, but
rather inside mlir-opt (in the `-test-vulkan-runner-pipeline` pass).
Both of these changes represent a greater alignment between
mlir-vulkan-runner and the other GPU integration tests. Notably, the IR
shapes produced by the mlir-opt pipelines for the Vulkan and SYCL
runners are now much more similar, with both using a gpu.binary op for
the serialized SPIR-V kernel.

In order to enable this, this commit includes these supporting changes:

- ConvertToSPIRVPass is enhanced to support producing the IR shape where
a spirv.module is nested inside a gpu.module, since this is what
GpuModuleToBinaryPass expects.
- ConvertGPULaunchFuncToVulkanLaunchFunc is changed to remove its SPIR-V
serialization functionality, and instead now extracts the SPIR-V from a
gpu.binary operation (as produced by ConvertToSPIRVPass).
- `-test-vulkan-runner-pipeline` now attaches SPIR-V target information
required by GpuModuleToBinaryPass.
- The WebGPU pass option, which had been removed from mlir-vulkan-runner
in the previous commit in this series, is restored as an option to
`-test-vulkan-runner-pipeline` instead, so that the WebGPU pass
continues being inserted into the pipeline just before SPIR-V
serialization.
2025-01-09 17:58:51 +01:00
Alexander Belyaev
d056c756ae
[mlir][scf] Fix unrolling when the yielded value is defined above the loop. (#122177) 2025-01-09 17:31:17 +01:00
Andrzej Warzyński
dfa5ee2af2
[mlir][tensor] Add e2e test for tensor.unpack with dynamic tile sizes (#121557)
Adds an end-to-end test for `tensor.unpack` with dynamic inner tile sizes.
While relatively simple (e.g., no vectorization), this example required a few
fixes in handling `tensor.unpack` (and similar fixes for `tensor.pack` before
that):

* #119379, #121393, #121400.

The end goal for this test is to incrementally increase its complexity
and to work towards scalable tile sizes.

Note that this PR complements #115698, in which a similar test for
`tensor.pack` was added.
2025-01-09 15:30:23 +00:00
Andrzej Warzyński
21ba7aef3b
[mlir][vector][nfc] Update alignedConversionPrecondition (#122136)
Adds some comments and renames variables to clarify the usage.
2025-01-09 15:14:34 +00:00
Kareem Ergawy
6f9e688203
[flang][OpenMP] Fix reduction init region block management (#122079)
Replaces https://github.com/llvm/llvm-project/pull/121886
Fixes https://github.com/llvm/llvm-project/issues/120254 (hopefully 🤞)

## Problem

Consider the following example:
```fortran
program test
  real :: x(1)
  integer :: i
  !$omp parallel do reduction(+:x)
    do i = 1,1
      x = 1
    end do
  !$omp end parallel do
end program
```

The HLFIR+OMP IR for this example looks like this:
```mlir
  func.func @_QQmain() {
    ...
    omp.parallel {
      %5 = fir.embox %4#0(%3) : (!fir.ref<!fir.array<1xf32>>, !fir.shape<1>) -> !fir.box<!fir.array<1xf32>>
      %6 = fir.alloca !fir.box<!fir.array<1xf32>>
      ...
      omp.wsloop private(@_QFEi_private_ref_i32 %1#0 -> %arg0 : !fir.ref<i32>) reduction(byref @add_reduction_byref_box_1xf32 %6 -> %arg1 : !fir.ref<!fir.box<!fir.array<1xf32>>>) {
        omp.loop_nest (%arg2) : i32 = (%c1_i32) to (%c1_i32_0) inclusive step (%c1_i32_1) {
          ...
          omp.yield
        }
      }
      omp.terminator
    }
    return
  }
```

The problem addressed by this PR is related to: the `alloca` in the
`omp.parallel` region + the related `reduction` clause on the
`omp.wsloop` op. When we try to translate the reduction from MLIR to LLVM,
we have to choose an `alloca` insertion point. This happens in
`convertOmpWsloop` where at entry to that function, this is what the
LLVM module looks like:

```llvm
define void @_QQmain() {
  %tid.addr = alloca i32, align 4
  ...

entry:
  %omp_global_thread_num = call i32 @__kmpc_global_thread_num(ptr @1)
  br label %omp.par.entry

omp.par.entry:
  %tid.addr.local = alloca i32, align 4
  ...
  br label %omp.par.region

omp.par.region:
  br label %omp.par.region1

omp.par.region1:
  ...
  %5 = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
```

Now, when we choose an `alloca` insertion point for the reduction, the
chosen block is `omp.par.entry` (without the changes in this PR).
The problem is that the allocation needed for the reduction needs to
reference the `%5` SSA value. This results in inserting allocations in
`omp.par.entry` that reference allocations in a later block
`omp.par.region1` which causes the `Instruction does not dominate all
uses!` error.

## Possible solution - take 2:

This PR contains a more localized solution than
https://github.com/llvm/llvm-project/pull/121886. It makes sure that on
entry to `initReductionVars`, the IR builder is at a point where we can
start inserting the initialization region; to make things cleaner, we
still split the builder insertion point to a dedicated
`omp.reduction.init`. This way we avoid splitting after the latest
allocation block, which is what was causing the issue.
2025-01-09 16:11:18 +01:00
Pietro Ghiglio
cdd652eb28
[MLIR][GPU] Support bf16 and i1 gpu::shuffles to LLVMSPIRV conversion (#119675)
This PR adds support for the `bf16` and `i1` data types when converting
`gpu::shuffle` to the `LLVMSPV` dialect, by inserting `bitcast` to/from
`i16` (for `bf16`) and extending/truncating to `i8` (for `i1`).
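
The idea of routing unsupported element types through an integer type can be modelled in Python. This is a rough sketch, not the conversion pattern itself: `shuffle_int` is a hypothetical integer-only shuffle, and `bf16` values are modelled as floats that are exactly representable in `bf16` (the upper 16 bits of an `f32`).

```python
import struct


def f32_to_bf16_bits(x: float) -> int:
    """Return the upper 16 bits of the float32 encoding of x (a bf16 bit pattern)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16


def bf16_bits_to_f32(bits: int) -> float:
    """Reinterpret a 16-bit bf16 pattern as a float by zero-filling the low bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))
    return x


def shuffle_int(lanes, src_lane_of):
    """Hypothetical integer-only shuffle: lane i reads the value held by lane src_lane_of(i)."""
    return [lanes[src_lane_of(i)] for i in range(len(lanes))]


def shuffle_bf16(values, src_lane_of):
    as_i16 = [f32_to_bf16_bits(v) for v in values]   # "bitcast" bf16 -> i16
    moved = shuffle_int(as_i16, src_lane_of)
    return [bf16_bits_to_f32(b) for b in moved]      # "bitcast" i16 -> bf16


def shuffle_i1(values, src_lane_of):
    widened = [int(v) for v in values]               # extend i1 -> i8
    moved = shuffle_int(widened, src_lane_of)
    return [bool(v & 1) for v in moved]              # truncate i8 -> i1
```

For example, `shuffle_bf16([1.0, 2.0, -0.5, 4.0], lambda i: (i + 1) % 4)` rotates the lanes by one without the shuffle ever seeing a `bf16` value.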
2025-01-09 13:16:18 +01:00
Arda Unal
b3ce6dc723
[mlir][licm] Make scf.if recursively speculatable (#122031)
This change:

- makes **scf.if** recursively speculatable, like **affine.if**

- also introduces related LICM tests for both **scf.if** and
**affine.if**
2025-01-08 09:54:18 -08:00
Matthias Springer
4751f47c7a
[mlir][Transforms] Dialect conversion: Turn LLVM_DEPRECATED into comments (#122073)
Some functions of the deprecated 1:N dialect conversion were marked as
`LLVM_DEPRECATED`. This caused compilation warnings because there are
still test cases of the 1:N dialect conversion framework. (These test
cases will be deleted at the same time when the 1:N driver is deleted.)
2025-01-08 17:10:06 +01:00
Benjamin Kramer
0d7022ed75 [MLIR][GPU] Fix gpu.printf test syntax after f50f9698ad012882df8dd605f5482e280c138266 2025-01-08 15:17:39 +01:00
Benjamin Kramer
35c5e56b61 Clean up -Wdangling-assignment-gsl in clang and mlir
These are triggering after b037bceef6a40c5c00c1f67cc5a334e2c4e5e041.
2025-01-08 14:46:15 +01:00
William Moses
1c067a513c
[MLIR] Enable import of non self referential alias scopes (#121987)
Fixes #121965.

---------

Co-authored-by: Christian Ulmann <christianulmann@gmail.com>
Co-authored-by: Alex Zinenko <git@ozinenko.com>
2025-01-08 13:40:05 +01:00
Jack Frankland
360a03c980
[mlir][tosa] Add acc_type to Tosa-v1.0 Conv Ops (#121466)
Tosa v1.0 adds accumulator type attributes to the various convolution
operations defined in the spec. Update the dialect and any lit tests to
include these attributes.

Signed-off-by: Tai Ly <tai.ly@arm.com>
Co-authored-by: Tai Ly <tai.ly@arm.com>
2025-01-08 12:12:26 +02:00
Longsheng Mou
c1d01b2fc2
[mlir][tosa] Add missing verifier for tosa.pad (#120934)
This PR adds a missing verifier for `tosa.pad`, ensuring that the
padding shape matches [2*rank(shape1)] according to V1.0.0
Specification. Fixes #119840.
2025-01-08 10:45:59 +02:00
Guray Ozen
f50f9698ad
[MLIR][GPU] Fix gpu.printf (#121940) 2025-01-08 08:25:57 +01:00
Alex MacLean
4583f6d344
[NVPTX] Switch front-ends and tests to ptx_kernel cc (#120806)
The `ptx_kernel` calling convention is a more idiomatic and standard way
of specifying an NVPTX kernel than using the metadata which is not
of specifying a NVPTX kernel than using the metadata which is not
supposed to change the meaning of the program. Further, checking the
calling convention is significantly faster than traversing the metadata,
improving compile time.

This change updates the clang and mlir frontends as well as the
NVPTXCtorDtorLowering pass to emit kernels using the calling convention.
In addition, this updates all NVPTX unit tests to use the calling
convention as well.
2025-01-07 18:24:50 -08:00
Krzysztof Drewniak
c6f67b8e39
[mlir][affine] Add ValueBoundsOpInterface to [de]linearize_index (#121833)
Since a need for it came up downstream (in proving that loops run at
least once), this commit implements the ValueBoundsOpInterface for
affine.delinearize_index and affine.linearize_index, using affine map
representations of the operations they perform.

These implementations also use information from outer bounds to impose
additional constraints when those are available.
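
For intuition, the arithmetic behind the two operations can be written out in Python. This is a minimal sketch assuming row-major linearization and non-negative indices; the function names mirror the op names but are otherwise hypothetical, and no MLIR API is used.

```python
from math import prod


def linearize_index(indices, basis):
    """Row-major linearization: ((i0 * b1) + i1) * b2 + i2 ..."""
    assert len(indices) == len(basis)
    linear = 0
    for idx, size in zip(indices, basis):
        linear = linear * size + idx
    return linear


def delinearize_index(linear, basis):
    """Inverse of linearize_index, using mod/floordiv on the inner basis sizes."""
    out = []
    for size in reversed(basis[1:]):
        out.append(linear % size)
        linear //= size
    # The outermost result is not reduced modulo basis[0]; it is bounded by
    # basis[0] only when the input is known to be below prod(basis).
    out.append(linear)
    return list(reversed(out))
```

If every index satisfies `0 <= i_k < b_k`, the linearized value lies in `[0, prod(basis))`, and each inner delinearized result lies in `[0, b_k)`. These are the kinds of bounds a value-bounds analysis can derive from such expressions.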
2025-01-07 16:28:14 -06:00
vfdev
a0f5bbcfb7
Fixed typo in dunder get/set methods in PyAttrBuilderMap (#121794)
Description:
- fixed a typo in the method name: dunde -> dunder
2025-01-07 10:33:01 -05:00
vfdev
96f8cfe4d0
Cosmetic fixes in the code and typos in Python bindings docs (#121791)
Description:
- removed trailing spaces in a few files
- fixed markdown link definition:
2025-01-07 10:32:01 -05:00
Michael Jungmair
1fb98b5a7e
[mlir][Transforms] Make LocationSnapshotPass respect OpPrintingFlags (#119373)
The current implementation of LocationSnapshotPass takes an
OpPrintingFlags argument and stores it as member, but does not use it
for printing.

Properly implement the printing flags, also supporting command line args.

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
2025-01-07 12:14:35 +01:00
William Moses
5656cbca52
[MLIR][CAPI] export LLVMFunctionType param getter and setters (#121888) 2025-01-07 02:39:44 -05:00
MaheshRavishankar
8cd94e0b6d
[mlir][Affine] Add nsw to lowering of AffineMulExpr. (#121535)
Since index operations have no set bitwidth, signed/unsigned wrapping
behavior is ill-defined for them. As a corollary, it is always safe to
add nsw/nuw to the lowering of affine ops.

Also add a folder to fold `div(s|u)i (mul (a, v), v) -> a`
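
To see why such a fold depends on the multiplication not wrapping, consider a small Python model. This is a sketch under stated assumptions: the 8-bit width is chosen only for illustration, and `wrap_i8` and `divsi` are hypothetical helpers modelling two's-complement wrapping and signed division that truncates toward zero.

```python
def wrap_i8(x: int) -> int:
    """Wrap an integer to signed 8-bit two's complement."""
    x &= 0xFF
    return x - 0x100 if x >= 0x80 else x


def divsi(a: int, b: int) -> int:
    """Signed division truncating toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def div_of_mul_exact(a: int, v: int) -> int:
    """div(mul(a, v), v) when the multiplication cannot overflow."""
    return divsi(a * v, v)


def div_of_mul_wrapping(a: int, v: int) -> int:
    """div(mul(a, v), v) when the multiplication wraps at 8 bits."""
    return divsi(wrap_i8(a * v), v)
```

With `a = 100` and `v = 2`, the exact version returns `100`, so folding to `a` is valid. The wrapping version computes `divsi(-56, 2)`, which is `-28`, so the fold would change the result if overflow were allowed.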

Signed-off-by: MaheshRavishankar <mravisha@amd.com>
2025-01-06 14:57:24 -08:00
Markus Böck
97ea0aba15
[TableGen] Do not exit in template argument check (#121636)
The signature of `CheckTemplateArgValues` implements error handling via
the `bool` return type, yet always returned false. The single possible
error case instead used `PrintFatalError`, which exits the program
afterward.

This behavior is undesirable: It prevents any further errors from being
printed and makes TableGen less usable as a library as it crashes the
entire process (e.g. `tblgen-lsp-server`).

This PR therefore fixes the issue by using `Error` instead and returning
true if an error occurred. All callers already perform proper error
handling.

As `llvm-tblgen` exits on error, a test was also added to the LSP to
ensure it exits normally despite the error.
2025-01-06 21:06:17 +01:00
Ian Wood
fe42e63d7b
[mlir][NFC] Refactor eraseState to take constant time (#121670)
Refactors `analysisStates` to use two nested maps. This prevents
`eraseState` from having to scan through every analysis state which can
be costly when there are many analysis states and/or `eraseState` is
called frequently.
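
The data-structure change can be illustrated with a small Python sketch. The class and method names here are hypothetical and only model the idea; the real code is C++ and its exact key types are not shown in this message.

```python
from collections import defaultdict


class FlatStates:
    """Single map keyed by (anchor, kind): erasing an anchor scans all keys."""

    def __init__(self):
        self.states = {}

    def add(self, anchor, kind, state):
        self.states[(anchor, kind)] = state

    def erase(self, anchor):
        # Cost grows with the total number of stored states.
        for key in [k for k in self.states if k[0] == anchor]:
            del self.states[key]


class NestedStates:
    """Outer map keyed by anchor, inner map keyed by kind."""

    def __init__(self):
        self.states = defaultdict(dict)

    def add(self, anchor, kind, state):
        self.states[anchor][kind] = state

    def erase(self, anchor):
        # A single lookup, independent of how many other anchors exist.
        self.states.pop(anchor, None)
```

Both classes end up with the same contents after the same sequence of `add` and `erase` calls; only the cost of `erase` differs.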

Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
2025-01-06 10:05:14 -08:00
Maksim Levental
0c1cf75300
[mlir] DCE RegisteredOperationName::parseAssembly decl (#121730) 2025-01-06 07:12:59 -05:00
Maksim Levental
9ce8f4b70b
[mlir] DCE friend Dialect::registerDialect (#121728) 2025-01-06 07:12:07 -05:00
Matthias Springer
599c739905
[mlir][GPU] Add NVVM-specific cf.assert lowering (#120431)
This commit adds an NVIDIA-specific lowering of `cf.assert` to
`__assertfail`.

Note: `getUniqueFormatGlobalName`, `getOrCreateFormatStringConstant` and
`getOrDefineFunction` are moved to `GPUOpsLowering.h`, so that they can
be reused.
2025-01-06 12:00:11 +01:00
Oleksandr "Alex" Zinenko
f6bfbc8777
[mlir] flush output in transform.print (#121382)
Print operations are often used for debugging, immediately before the
compiler aborts. In such cases, it is sometimes possible that the output
isn't fully produced yet. Make sure it is by explicitly flushing the
output.
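
The effect of buffering can be reproduced with Python's standard `io` module. This is only an analogy, not the MLIR implementation: `print_op` is a hypothetical helper, and the `BytesIO` object stands in for the real output device.

```python
import io


def print_op(out: io.BufferedWriter, text: bytes, flush: bool) -> None:
    """Write text to a buffered stream, optionally flushing right away."""
    out.write(text)
    if flush:
        out.flush()


def demo():
    sink = io.BytesIO()                # stands in for the terminal or file
    out = io.BufferedWriter(sink)      # buffered stream in front of the sink
    print_op(out, b"before abort\n", flush=False)
    unflushed = sink.getvalue()        # nothing has reached the sink yet
    print_op(out, b"flushed\n", flush=True)
    flushed = sink.getvalue()          # both writes are now visible
    return unflushed, flushed
```

If the process aborted right after the first call, the buffered text would never reach the sink, which is the situation the explicit flush avoids.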
2025-01-06 10:47:40 +01:00
Matthias Springer
5f7568a32c
[mlir][Transforms] Fix mapping in findOrBuildReplacementValue (#121644)
Fixes two minor issues in `findOrBuildReplacementValue`:
* Remove a redundant `mapping.map`.
* Map `repl` instead of `value`. We used to overwrite an existing
mapping, which could introduce extra materializations.

Note: We generally do not want to overwrite mappings, but create a chain
of mappings. There are still a few more places where a mapping is
overwritten. Once those are fixed, I will put an assertion into
`ConversionValueMapping::map`.
2025-01-06 08:55:18 +01:00