Adds the `#llvm.target<triple = $TRIPLE, chip = $CHIP, features =
$FEATURES>` attribute along with a `-llvm-target-to-data-layout`
pass to derive an MLIR data layout from the LLVM data layout string
(using the existing `DataLayoutImporter`). The attribute implements the
relevant DLTI interfaces to expose the `triple`, `chip` (AKA `cpu`), and
`features` on `#llvm.target`, as well as the full
`DataLayoutSpecInterface`. The pass combines the generated
`#dlti.dl_spec` with an existing `dl_spec` in case one is already
present, e.g. a `dl_spec` that specifies the size of the `index` type.
Adds a `TargetAttrInterface` which can be implemented by all attributes
representing LLVM targets.
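As a rough usage sketch (the name of the module-level attribute and the
exact `features` encoding are illustrative assumptions, not taken from
this patch):
```
// Hypothetical example: attach an LLVM target to a module. Running
// -llvm-target-to-data-layout would then derive a #dlti.dl_spec from
// the target's LLVM data layout string.
module attributes {llvm.target = #llvm.target<
    triple = "x86_64-unknown-linux-gnu",
    chip = "skylake",
    features = "+avx2">} {
}
```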
Similar to the Draft PR https://github.com/llvm/llvm-project/pull/78073.
RFC on which this PR is based:
https://discourse.llvm.org/t/mandatory-data-layout-in-the-llvm-dialect/85875
I have seen misuse of the `hasEffect` API in downstream projects: users
sometimes think that `hasEffect == false` indicates that the operation
does not have a certain memory effect. That's not necessarily the case:
when the op does not implement the `MemoryEffectsOpInterface`, it is
unknown whether it has the specified effect, so "false" can also mean
"maybe".
This commit clarifies the semantics in the documentation, adds
`hasUnknownEffects` and `mightHaveEffect` convenience functions, and
simplifies a few call sites.
Exposes the `tensor.extract_slice` reshaping logic in
`BubbleUpExpandShapeThroughExtractSlice` and
`BubbleUpCollapseShapeThroughExtractSlice` through two corresponding
utility functions. These compute the offsets/sizes/strides of an extract
slice after either collapsing or expanding.
This should also make it easier to implement the two other bubbling
cases: (1) the `collapse_shape` is a consumer or (2) the `expand_shape`
is a consumer.
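A sketch of the expand-shape case that these utilities back (shapes
chosen for illustration):
```
// Before: slicing the expanded tensor.
%e = tensor.expand_shape %src [[0, 1]] output_shape [4, 8]
    : tensor<32xf32> into tensor<4x8xf32>
%s = tensor.extract_slice %e[2, 0] [2, 8] [1, 1]
    : tensor<4x8xf32> to tensor<2x8xf32>
// After: slice first, then expand; the utility computes the linearized
// offset (2 * 8 = 16) and size (2 * 8 = 16) of the new slice.
%s2 = tensor.extract_slice %src[16] [16] [1]
    : tensor<32xf32> to tensor<16xf32>
%e2 = tensor.expand_shape %s2 [[0, 1]] output_shape [2, 8]
    : tensor<16xf32> into tensor<2x8xf32>
```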
---------
Signed-off-by: Ian Wood <ianwood@u.northwestern.edu>
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:
```
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};
```
We only have 30 instances that rely on this "redirection". Since the
redirection doesn't improve readability, this patch replaces SmallSet
with SmallPtrSet for pointer element types.
I'm planning to remove the redirection eventually.
## Description
This change introduces a new canonicalization pattern for the MLIR
Vector dialect that optimizes chains of insertions. The optimization
identifies when a vector is **completely** initialized through a series
of vector.insert operations and replaces the entire chain with a
single `vector.from_elements` operation.
Please be aware that the new pattern **doesn't** work for poison
vectors where only **some** elements are set, as MLIR doesn't currently
support partial poison vectors.
**New Pattern: InsertChainFullyInitialized**
* Detects chains of vector.insert operations.
* Validates that all insertions are at static positions, and all
intermediate insertions have only one use.
* Ensures the entire vector is **completely** initialized.
* Replaces the entire chain with a single `vector.from_elements`
operation.
**Refactored Helper Function**
* Extracted `calculateInsertPosition` from
`foldDenseElementsAttrDestInsertOp` to avoid code duplication.
## Example
```
// Before:
%v1 = vector.insert %c10, %v0[0] : i64 into vector<2xi64>
%v2 = vector.insert %c20, %v1[1] : i64 into vector<2xi64>
// After:
%v2 = vector.from_elements %c10, %c20 : vector<2xi64>
```
It also works for multidimensional vectors.
```
// Before:
%v1 = vector.insert %cv0, %v0[0] : vector<3xi64> into vector<2x3xi64>
%v2 = vector.insert %cv1, %v1[1] : vector<3xi64> into vector<2x3xi64>
// After:
%0:3 = vector.to_elements %cv0 : vector<3xi64>
%1:3 = vector.to_elements %cv1 : vector<3xi64>
%v2 = vector.from_elements %0#0, %0#1, %0#2, %1#0, %1#1, %1#2 : vector<2x3xi64>
```
---------
Co-authored-by: Yang Bai <yangb@nvidia.com>
Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>
In the FoldArithToVectorOuterProduct pattern, a static cast to vector
type causes an assertion failure when a scalar type is encountered. It
seems the author meant to use a dyn_cast instead.
This NFC patch handles it by using dyn_cast.
This PR adds a pattern to distribute the load/store/prefetch nd ops
with offsets from workgroup IR to subgroup IR. This PR is part of the
transition to move offsets from create_nd to the load/store/prefetch nd
ops.
Create_nd PR: #152351
addLocalFloorDiv currently returns void and requires the caller to know
that the newly added local variable is at a particular index. This
commit returns the index of the newly added variable so that callers
need not tie themselves to this implementation detail.
I found one relevant callsite demonstrating this and updated it. I am
using this API out of tree and wanted to make our out-of-tree code a bit
more resilient to upstream changes.
This PR introduces two new ops in the omp dialect: omp.target_allocmem
and omp.target_freemem.
- omp.target_allocmem: allocates heap memory on the device; will be
lowered to an omp_target_alloc call in LLVM.
- omp.target_freemem: deallocates heap memory on the device; will be
lowered to an omp_target_free call in LLVM.
Example:
```
%1 = omp.target_allocmem %device : i32, i64
omp.target_freemem %device, %1 : i32, i64
```
The work in this PR is copied from / inspired by @ivanradanov's commits
from the coexecute implementation:
[Add fir omp target alloc and free ops](be860ac8ba)
[Lower omp_target_{alloc,free} to llvm](6e2d584dc9)
An operand of the nested yield op can be null and has not yet been
verified at the point where the enclosing operation is processed. Using
`getResultTypes()` would dereference this null Value and crash in the
verifier.
This MR adds a verifier for the `emitc.get_field` op.
- The verifier checks that the `emitc.get_field` operation is nested
inside an `emitc.class` op.
- Additionally, tests for the erroneous cases were added for
class-related operations in `invalid_ops.mlir`.
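A rough sketch of the nesting that the verifier enforces (the op
syntax is approximated and the names are made up for illustration):
```
emitc.class @modelClass {
  emitc.field @buffer : !emitc.array<1xf32>
  emitc.func @execute() {
    // OK: emitc.get_field is nested inside an emitc.class.
    %0 = emitc.get_field @buffer : !emitc.array<1xf32>
    emitc.return
  }
}
// Error: an emitc.get_field outside of an emitc.class is now rejected.
```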
* Add `requiresArgsAndResultsAttr` to `LLVM_OneResultIntrOp`
* Add `args_attrs` to `llvm.intr.masked.{expandload,compressstore}`
The LLVM intrinsics
[`llvm.intr.masked.expandload`](https://llvm.org/docs/LangRef.html#llvm-masked-expandload-intrinsics)
and
[`llvm.intr.masked.compressstore`](https://llvm.org/docs/LangRef.html#llvm-masked-compressstore-intrinsics)
both allow an optional `align` parameter attribute, which defaults to 1.
Quoting the LangRef documentation for
[`llvm.intr.masked.expandload`'s](https://llvm.org/docs/LangRef.html#id1522)
and
[`llvm.intr.masked.compressstore`'s](https://llvm.org/docs/LangRef.html#id1522)
arguments, respectively:
> The `align` parameter attribute can be provided for the first
argument. The pointer alignment defaults to 1.
> The `align` parameter attribute can be provided for the second
argument. The pointer alignment defaults to 1.
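A minimal sketch of what this enables, assuming the `args_attrs`
spelling from the bullets above and the usual `llvm.align`
parameter-attribute convention (not copied from the patch's tests):
```
// Hypothetical: the pointer argument of the expandload carries an
// align parameter attribute instead of the default alignment of 1.
%r = "llvm.intr.masked.expandload"(%ptr, %mask, %passthru)
    {args_attrs = [{llvm.align = 4 : i64}, {}, {}]}
    : (!llvm.ptr, vector<8xi1>, vector<8xf32>) -> vector<8xf32>
```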
Like we did for the 'private' clause, this adds an easier-to-use helper
function to add the 'firstprivate' clause + recipe to the Parallel and
Serial ops.
Lowering transfer_read/transfer_write to load_gather/store_scatter in
case the target uArch doesn't support load_nd/store_nd. The high-level
steps:
1. Compute the strides.
2. Compute the offsets.
3. Collapse the memref to 1D.
4. Create the load_gather or store_scatter op.
Both `linalg.map` and `linalg.reduce` are sometimes printed in short
form incorrectly, resulting in a round-trip output with different
semantics. This patch adds additional `yield` operand checks to ensure
that all criteria for short-form printing are satisfied. It also
updates/adds comments and renames the `findPayloadOp` function to
`canUseShortForm`, which more accurately reflects its purpose. A couple
of new lit tests check that the long form is used when the short-form
conditions are not met.
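As an illustration (a sketch, not a test from this patch), the payload
below yields a block argument rather than the payload op's result, so
printing it in short form as `linalg.map { arith.addf }` would change
its semantics and the printer must keep the long form:
```
%res = linalg.map ins(%a, %b : tensor<8xf32>, tensor<8xf32>)
                  outs(%init : tensor<8xf32>)
  (%x: f32, %y: f32) {
    %0 = arith.addf %x, %y : f32
    // Yields %x, not %0: short-form printing would be incorrect.
    linalg.yield %x : f32
  }
```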
Fixes #117528
This PR builds upon #146531 and enables scalable
vectorization for `batch_mmt4d` as well.
---------
Signed-off-by: Ege Beysel <beyselege@gmail.com>
As part of XeVM dialect upstreaming, this covers the remaining parts
required for XeVM dialect integration and testing.
It has two high-level components:
- XeVM target and serialization support
- XeVM dialect integration tests using the Level Zero runtime
Co-Authored-by: Artem Kroviakov <artem.kroviakov@intel.com>
This commit improves the `allowPatternRollback` flag handling in the
dialect conversion driver. Previously, this flag was merely used to
detect cases that are incompatible with the new One-Shot Dialect
Conversion driver. This commit implements the driver itself: when the
flag is set to "false", all IR changes are materialized immediately,
bypassing the `IRRewrite` and `ConversionValueMapping` infrastructure.
A few selected test cases now run with both the old and the new driver.
RFC:
https://discourse.llvm.org/t/rfc-a-new-one-shot-dialect-conversion-driver/79083
This PR fixes a crash in `GpuKernelOutliningPass` that occurred when
encountering a symbol that was not a `FlatSymbolRefAttr`, enabling
outlining of nested `gpu.launch` operations. Fixes #149318.
Previously, the NVVM dialect's ldmatrix operation could only generate a
limited subset of the available NVVM ldmatrix intrinsics. In particular,
the new intrinsics introduced with Blackwell are not accessible through
the NVVM ops. This commit extends the ldmatrix operation to support all
available ldmatrix intrinsics.
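For reference, a minimal sketch of the op in its pre-existing form (the
Blackwell variants are reached through additional attributes that this
sketch does not attempt to reproduce):
```
// Load four 8x8 tiles in row-major layout from shared memory; this
// lowers to the matching llvm.nvvm.ldmatrix intrinsic.
%l = nvvm.ldmatrix %ptr {num = 4 : i32, layout = #nvvm.mma_layout<row>}
    : (!llvm.ptr<3>) -> !llvm.struct<(i32, i32, i32, i32)>
```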
This PR adds a pattern to distribute the create_nd_desc op without
offsets from workgroup (Wg) IR to subgroup (Sg) IR.
The round-robin distribution logic (which involves offset calculation)
will now happen in the load/store/prefetch nd ops instead of create_nd.
Add the missing xor AtomicRMWKind enum case in arith. Also add support
for xor to memref.atomic_rmw so the change can be tested.
This does NOT add it for all users of the enum (e.g. Affine, Vector).
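A sketch of the memref.atomic_rmw side of the change (assuming the new
enum case is spelled `xori`, in line with the existing `andi`/`ori`
cases):
```
// Atomically XOR %val into %m[%i], returning the old value.
%old = memref.atomic_rmw xori %val, %m[%i] : (i32, memref<16xi32>) -> i32
```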
Fold `broadcast(shape_cast(x))` into `broadcast(x)` if the type of x is
compatible with broadcast's result type and the shape_cast only adds or
removes leading unit dimensions.
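For example (shapes are illustrative):
```
// Before: the shape_cast only prepends a unit dimension.
%0 = vector.shape_cast %x : vector<4xf32> to vector<1x4xf32>
%1 = vector.broadcast %0 : vector<1x4xf32> to vector<2x1x4xf32>
// After: broadcast directly from %x.
%1 = vector.broadcast %x : vector<4xf32> to vector<2x1x4xf32>
```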
---------
Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>
Co-authored-by: James Newling <james.newling@gmail.com>
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
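In MLIR's LLVM dialect, the change would look roughly as follows (a
sketch; the size-less form of the assembly is assumed here):
```
// Before: explicit size in bytes.
llvm.intr.lifetime.start 8, %alloca : !llvm.ptr
// After: the size is implied by the underlying alloca.
llvm.intr.lifetime.start %alloca : !llvm.ptr
```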
In the case of mixed-precision inputs, the inputs are generally cast to
match the output type, thereby introducing arith.extFOp/extIOp
instructions. Folding such patterns into vector.contract is desirable
for hardware with mixed-precision ISA support.
This patch adds optional folding of the mixed-precision pattern into
vector.contract, which can be enabled using the
`fold_type_extensions_into_contract` attribute.
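A sketch of the fold on a simple dot-product-shaped contraction (shapes
and maps chosen for illustration):
```
// Before: operands are extended to f32 ahead of the contraction.
%le = arith.extf %lhs : vector<4xf16> to vector<4xf32>
%re = arith.extf %rhs : vector<4xf16> to vector<4xf32>
%r = vector.contract {
       indexing_maps = [affine_map<(d0) -> (d0)>,
                        affine_map<(d0) -> (d0)>,
                        affine_map<(d0) -> ()>],
       iterator_types = ["reduction"], kind = #vector.kind<add>}
     %le, %re, %acc : vector<4xf32>, vector<4xf32> into f32
// After: a mixed-precision contraction consumes the f16 operands
// directly.
%r = vector.contract {
       indexing_maps = [affine_map<(d0) -> (d0)>,
                        affine_map<(d0) -> (d0)>,
                        affine_map<(d0) -> ()>],
       iterator_types = ["reduction"], kind = #vector.kind<add>}
     %lhs, %rhs, %acc : vector<4xf16>, vector<4xf16> into f32
```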
`gpu::LaunchOp` is updated in the following way:
- Change the attribute type of the kernel function and module from
`SymbolRefAttr` to `FlatSymbolRefAttr` to avoid nested symbol
references.
- Rename variables from camel case (kernelFunc, kernelModule) to lower
case (function, module) and update the syntax.
- `LaunchOp::build` now supports passing `module` and `function`
attributes.