This PR adds patterns to distribute the load/store/prefetch nd ops with
offsets from workgroup to subgroup IR. This PR is part of the transition
to move offsets from create_nd to the load/store/prefetch nd ops.
Create_nd PR: #152351
addLocalFloorDiv currently returns void and requires the caller to know
that the newly added local variable is in a particular index. This
commit returns the index of the newly added variable so that callers
need not tie themselves to this implementation detail.
I found one relevant callsite demonstrating this and updated it. I am
using this API out of tree and wanted to make our out-of-tree code a bit
more resilient to upstream changes.
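The benefit of returning the index can be sketched with a toy model. This is not the MLIR API: `ToyRelation` and `add_local_floor_div` are hypothetical names, and the variable list is an assumed storage layout used only for illustration.

```python
class ToyRelation:
    """Toy stand-in for a relation that keeps its variables in one flat list."""

    def __init__(self, num_dims):
        self.vars = [f"d{i}" for i in range(num_dims)]

    def add_local_floor_div(self, dividend, divisor):
        # Append a new local variable q = floor(dividend / divisor) and
        # return its index, so the caller never has to guess where it landed.
        self.vars.append(f"({dividend}) floordiv {divisor}")
        return len(self.vars) - 1


rel = ToyRelation(num_dims=2)
q_index = rel.add_local_floor_div("d0 + d1", 4)
print(q_index, rel.vars[q_index])
```

A caller that uses `q_index` keeps working even if the toy class later changes where it places new locals.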
Add a debugging flag to the dialect conversion to dump the
materialization kind. This flag is useful to find out whether a missing
materialization rule is for source or target materializations.
Also add missing test coverage for the `buildMaterializations` flag.
This PR introduces two new ops in omp dialect, omp.target_allocmem and
omp.target_freemem.
omp.target_allocmem: Allocates heap memory on the device. Will be lowered to
an omp_target_alloc call in LLVM.
omp.target_freemem: Deallocates heap memory on the device. Will be lowered
to an omp_target_free call in LLVM.
Example:
%1 = omp.target_allocmem %device : i32, i64
omp.target_freemem %device, %1 : i32, i64
The work in this PR is copied from and inspired by @ivanradanov's commits
from the coexecute implementation:
[Add fir omp target alloc and free
ops](be860ac8ba)
[Lower omp_target_{alloc,free} to
llvm](6e2d584dc9)
An operand of the nested yield op can be null and hasn't been verified
yet when processing the enclosing operation. Using `getResultTypes()`
will dereference this null Value and crash in the verifier.
This is in preparation of a follow-up change to stop traversing
unreachable blocks.
This is not NFC because of a subtlety of the early_inc. On a test case
like:
```
scf.if %cond {
"test.move_after_parent_op"() ({
"test.any_attr_of_i32_str"() {attr = 0 : i32} : () -> ()
}) : () -> ()
}
```
We recursively traverse the nested regions, and process an op when the
region is done (post-order).
We need to pre-increment the iterator before processing an operation in
case it gets deleted. However, we can do this before or after processing
the nested region. This implementation does the latter.
Operations like:
%add = arith.addi %add, %add : i64
are legal in unreachable code. Unfortunately many patterns would be
unsafe to apply on such IR and can lead to crashes or infinite loops. To
avoid this we can remove unreachable blocks before attempting to apply
patterns.
We may also have to do this whenever the CFG is changed by a pattern;
this is left for future work.
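A minimal sketch of the idea, assuming the CFG is given as a plain dictionary from block name to successor names (the function names and the dictionary layout are hypothetical, not the driver's actual data structures):

```python
def reachable_blocks(entry, successors):
    """Return the set of blocks reachable from `entry` via a worklist walk."""
    seen = {entry}
    worklist = [entry]
    while worklist:
        block = worklist.pop()
        for succ in successors.get(block, ()):
            if succ not in seen:
                seen.add(succ)
                worklist.append(succ)
    return seen


def erase_unreachable(entry, successors):
    """Drop every block that cannot be reached from the entry block."""
    live = reachable_blocks(entry, successors)
    return {b: list(succs) for b, succs in successors.items() if b in live}


# "bb2" branches to itself and to "bb1", but nothing branches to "bb2".
cfg = {"entry": ["bb1"], "bb1": [], "bb2": ["bb2", "bb1"]}
pruned = erase_unreachable("entry", cfg)
print(sorted(pruned))
```

Only the blocks that survive the pruning would then be handed to the pattern applicator.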
Fixes #153732
Retry landing https://github.com/llvm/llvm-project/pull/153373
## Major changes from previous attempt
- remove the test in CAPI because no existing tests in CAPI deal with
sanitizer exemptions
- update `mlir/docs/Dialects/GPU.md` to reflect the new behavior: load
GPU binary in global ctors, instead of loading them at call site.
- skip the test on AArch64 since we have an issue with initialization there
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
Add support for 1:N type conversions to the `ControlFlowToLLVM` lowering
patterns. Not applicable to `cf.switch` and `cf.assert`.
---------
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
Add support for 1:N type conversions to the `FuncToLLVM` lowering
patterns. This commit does not change the lowering of any types (such as
`MemRefType`). It just sets up the infrastructure, such that 1:N type
conversions can be used during `FuncToLLVM`.
Note: When the converted result types of a `func.func` have more than 1
type, then the results are wrapped in an `llvm.struct`. That's because
`llvm.func` does not support multiple result values. This "wrapping" was
already implemented for cases where the original `func.func` has
multiple results. With 1:N conversions, even a single result can now
expand to multiple converted results, triggering the same wrapping
mechanism.
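The wrapping rule in the note can be sketched as follows. This is a toy model over type strings, not the actual `FuncToLLVM` code: `pack_result_types`, `toy_convert`, and the `!toy.pair` type are hypothetical and exist only to illustrate a 1:N conversion.

```python
def pack_result_types(result_types, convert):
    """Flatten converted result types; wrap them in a struct if more than one."""
    converted = [t for ty in result_types for t in convert(ty)]
    if not converted:
        return "!llvm.void"
    if len(converted) == 1:
        return converted[0]
    return "!llvm.struct<(" + ", ".join(converted) + ")>"


def toy_convert(ty):
    # Hypothetical 1:N rule: one "!toy.pair" expands to two converted types.
    return ["i32", "i64"] if ty == "!toy.pair" else [ty]


print(pack_result_types(["f32"], toy_convert))
print(pack_result_types(["!toy.pair"], toy_convert))
```

In this sketch a single `!toy.pair` result takes the same struct path as a function that had two results to begin with.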
The test cases are exercised with both the old and the new no-rollback
conversion driver.
Setting LLVM_LIT_ARGS to include --quiet and then running check-mlir in
a standard checkout will otherwise cause test failures here because
LLVM_LIT_ARGS gets propagated into this project.
Similar to `IntegerRelation::addLocalFloorDiv`, this adds a utility
`IntegerRelation::addLocalModulo` that adds a local variable equal to an
affine function of the variables modulo some constant modulus. The
function returns the absolute index of the new var in the relation.
This is computed by first finding the floordiv `q = exprs // modulus`
and then computing the remainder `result = exprs - q * modulus`.
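The two-step computation can be checked numerically with a small sketch. It evaluates the affine expression at concrete values instead of adding constraints to a relation; `floordiv_then_mod` and its parameters are hypothetical names used only for illustration.

```python
def floordiv_then_mod(coeffs, constant, values, modulus):
    """Evaluate an affine expression, then derive the modulo via a floordiv."""
    expr = sum(c * v for c, v in zip(coeffs, values)) + constant
    q = expr // modulus           # step 1: floor division
    result = expr - q * modulus   # step 2: remainder from the quotient
    return q, result


# 2*4 + 3*5 + 1 = 24, and 24 = 3*7 + 3.
print(floordiv_then_mod([2, 3], 1, [4, 5], 7))
# A negative expression: -5 = (-2)*3 + 1, so the remainder stays non-negative.
print(floordiv_then_mod([1], 0, [-5], 3))
```

Because the quotient is a floor division, the remainder always lies in `[0, modulus)` for a positive modulus, including for negative expression values.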
Signed-off-by: Asra Ali <asraa@google.com>
This MR adds a `verifier` for the `emitc.get_field` op.
- The `verifier` checks that the `emitc.get_field` operation is nested
inside an `emitc.class` op.
- Additionally, appropriate tests for erroneous cases were added for
class-related operations in `invalid_ops.mlir`.
Currently, the modifier is printed as an address, which is neither
readable nor useful. This PR adds readable printing for it.
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There are two ops:
1. nvvm.griddepcontrol.wait
2. nvvm.griddepcontrol.launch_dependents
Both relate to Grid Dependent Launch (programmatic dependent launch in
CUDA) and express the same concept. This PR unifies both ops into a
single one.
* Add `requiresArgsAndResultsAttr` to `LLVM_OneResultIntrOp`
* Add `args_attrs` to `llvm.intr.masked.{expandload,compressstore}`
The LLVM intrinsics
[`llvm.intr.masked.expandload`](https://llvm.org/docs/LangRef.html#llvm-masked-expandload-intrinsics)
and
[`llvm.intr.masked.compressstore`](https://llvm.org/docs/LangRef.html#llvm-masked-compressstore-intrinsics)
both allow an optional align parameter attribute to be set which
defaults to one.
Inlining the documentation below for
[`llvm.intr.masked.expandload`'s](https://llvm.org/docs/LangRef.html#id1522) and
[`llvm.intr.masked.compressstore`'s](https://llvm.org/docs/LangRef.html#id1522)
arguments respectively:
> The `align` parameter attribute can be provided for the first
argument. The pointer alignment defaults to 1.
> The `align` parameter attribute can be provided for the second
argument. The pointer alignment defaults to 1.
Prior to this PR, the default behaviour of a conversion pattern that
receives operands of a 1:N type conversion is to abort the compilation.
This was historically useful when 1:N type conversion support was merged
into the dialect conversion, as it allowed us to easily find patterns
that should be capable of handling 1:N type conversions but weren't.
However, this behaviour has the disadvantage of being non-composable:
while the pattern in question cannot handle the 1:N type conversion,
another pattern in the set might, but it never gets the chance because
compilation is aborted.
This PR fixes this behaviour by failing to match instead of aborting,
giving other patterns the chance to legalize the op. The implementation
uses a reusable function called `dispatchTo1To1` so that derived
conversion patterns can also implement the behaviour.
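The composability argument can be sketched with a toy driver. Nothing here is the dialect conversion API: `legalize`, `one_to_one_only`, and `one_to_n_aware` are hypothetical, and returning `None` stands in for a match failure.

```python
def one_to_one_only(op, operands):
    """Hypothetical pattern that only understands 1:1 converted operands."""
    if any(len(values) != 1 for values in operands):
        return None  # fail to match instead of aborting the compilation
    return f"{op}.lowered({', '.join(values[0] for values in operands)})"


def one_to_n_aware(op, operands):
    """Hypothetical pattern that accepts any number of values per operand."""
    flat = [value for values in operands for value in values]
    return f"{op}.lowered({', '.join(flat)})"


def legalize(op, operands, patterns):
    """Try each pattern in order; the first one that matches wins."""
    for pattern in patterns:
        result = pattern(op, operands)
        if result is not None:
            return result
    raise RuntimeError(f"failed to legalize {op}")


patterns = [one_to_one_only, one_to_n_aware]
# The second operand was converted 1:2, so the first pattern declines.
print(legalize("foo", [["a"], ["b0", "b1"]], patterns))
```

Had the first pattern aborted instead of declining, the second pattern would never have run.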
First step in introducing the wasm-import target to mlir-translate.
This is the first PR to introduce the pass. With this PR, there is very
little support for the actual WebAssembly language; it is mostly there
to introduce the skeleton of the importer. A follow-up will add support
for a wider range of operators. The work was split to make it easier to
review, since it is a good chunk of work.
---------
Co-authored-by: Luc Forget <dev@alias.lforget.fr>
Co-authored-by: Ferdinand Lemaire <ferdinand.lemaire@woven-planet.global>
Co-authored-by: Jessica Paquette <jessica.paquette@woven-planet.global>
Co-authored-by: Luc Forget <luc.forget@woven.toyota>
SPIRVImageInterfaces.h.inc uses some types, e.g. mlir::TypedValue,
without including the necessary headers. This is fine most of the time,
but we did run into a weird case where bazel fails to compile
//mlir:SPIRVImageInterfaces on clang19 for ChromiumOS when parse_headers
(see [1]) is specified.
[1]: https://bazel.build/docs/bazel-and-cpp#toolchain-features
Like we did for the 'private' clause, this adds an easier-to-use helper
function to add the 'firstprivate' clause + recipe to the Parallel and
Serial ops.
Lower transfer_read/transfer_write to load_gather/store_scatter in
case the target uArch doesn't support load_nd/store_nd. The high-level
steps:
1. compute strides;
2. compute offsets;
3. collapseMemrefTo1D;
4. create the load_gather or store_scatter op.
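The first two steps can be illustrated with a small sketch that computes the linear offsets a gather would use once the memref is viewed as 1D. It assumes a statically shaped, contiguous row-major memref and an identity access pattern. The real lowering may also handle dynamic strides, masks, and permutation maps, and `row_major_strides` and `gather_offsets` are hypothetical helper names.

```python
from itertools import product


def row_major_strides(shape):
    """Strides, in elements, of a contiguous row-major buffer."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def gather_offsets(shape, start, tile):
    """Linear offsets into the collapsed 1D buffer for a tile at `start`."""
    strides = row_major_strides(shape)
    base = sum(s * st for s, st in zip(start, strides))
    return [
        base + sum(i * st for i, st in zip(idx, strides))
        for idx in product(*(range(t) for t in tile))
    ]


# A 2x3 tile read at position (1, 2) of a 4x8 buffer.
print(row_major_strides([4, 8]))
print(gather_offsets([4, 8], [1, 2], [2, 3]))
```

Each offset indexes the collapsed 1D buffer, which is the form a gather or scatter op consumes.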
Split the function into two: one that copies a single unranked
descriptor and one that copies multiple unranked descriptors. This is in
preparation of adding 1:N support to the Func->LLVM lowering patterns.
Both `linalg.map` and `linalg.reduce` are sometimes printed in short
form incorrectly, resulting in a round-trip output with different
semantics. This patch adds additional `yield` operand checks to ensure
that all criteria for short-form printing are satisfied. Updated/added
comments and renamed the `findPayloadOp` function to `canUseShortForm`,
which more accurately reflects its purpose. A couple of new lit tests
check for the proper use of long form when short-form conditions are not
met.
Fixes #117528
Ref: https://discourse.llvm.org/t/mlir-project-maintainers/87189
See also:
* #151721
* #150945
Compared to the original proposal, one change is included:
* The `ub` dialect has @Hardcode84 as maintainer.
Please approve to validate your nomination; let's keep new nominations
for follow-up PRs.
Rename `computeSizes` to `computeSize` and make it compute just a single
size. This is in preparation of adding 1:N support to the Func->LLVM
lowering patterns.
This PR builds upon the previous #146531 and enables scalable
vectorization for `batch_mmt4d` as well.
---------
Signed-off-by: Ege Beysel <beyselege@gmail.com>
This header uses GPUModuleOp but does not directly include the header that declares it:
`error: no type named 'GPUModuleOp' in namespace 'mlir::gpu'; did you
mean 'ModuleOp'?`
Needed for #148286