Although XeVM is an LLVM extension dialect, the SPIR-V backend relies on [function
calls](https://llvm.org/docs/SPIRVUsage.html#instructions-as-function-calls)
instead of defining LLVM intrinsics to represent SPIR-V instructions.
The convert-xevm-to-llvm pass lowers xevm ops to function declarations and
calls that follow this naming convention.
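For illustration, a hedged sketch of what the lowering produces; the builtin name, mangling, and signature here are hypothetical stand-ins, not an actual builtin:
```
// Hypothetical mangled builtin standing in for a real SPIR-V instruction;
// the pass emits a declaration plus a call instead of an LLVM intrinsic.
llvm.func @_Z7builtinPU3AS1fi(!llvm.ptr<1>, i32)

llvm.func @lowered(%ptr: !llvm.ptr<1>, %v: i32) {
  llvm.call @_Z7builtinPU3AS1fi(%ptr, %v) : (!llvm.ptr<1>, i32) -> ()
  llvm.return
}
```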
In the future, most of this pass should be replaced with llvmBuilder
and handled as part of the translation to LLVM IR instead.
---------
Co-authored-by: Artem Kroviakov <artem.kroviakov@intel.com>
The MappableType OpenACC type interface is a richer interface that
allows the OpenACC dialect to better interact with a source
dialect, FIR in this case. The fir.box and fir.class types already
implement this interface. Now the same is being done for the other
FIR types that represent variables.
One additional notable change is that fir.array no longer implements
this interface. This is because MappableType is primarily intended for
variables, and FIR variables of this type have associated storage, so a
pointer-like type (fir.ref/fir.heap/fir.pointer) holds the array type.
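For example (a minimal sketch of a local array variable), the array type sits behind a pointer-like type rather than being a variable type on its own:
```
// The alloca yields !fir.ref<!fir.array<10xi32>>; the pointer-like
// fir.ref is the variable's type, not the fir.array it holds.
%arr = fir.alloca !fir.array<10xi32>
```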
The end goal of promoting these FIR types to MappableType is that we
will soon implement the ability to generate recipes outside of the frontend
via this interface.
This patch adds a better maskedload/maskedstore lowering in the amdgpu
backend for loads that are either fully masked or fully unmasked. For
these cases, we can either generate an out-of-bounds buffer load with no if
condition, or a normal load guarded by an if condition (when the
fat_raw_buffer address space is not used).
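A minimal sketch of the fully-unmasked case, using a statically all-true mask (the patterns the pass actually matches may be more general):
```
// The mask is known to be all-true, so no per-lane masking is needed.
%mask = vector.constant_mask [4] : vector<4xi1>
%v = vector.maskedload %base[%i], %mask, %pass : memref<8xf32>, vector<4xi1>, vector<4xf32> into vector<4xf32>
==>
// A plain load (or an OOB buffer load when fat_raw_buffer is used).
%v2 = vector.load %base[%i] : memref<8xf32>, vector<4xf32>
```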
This is a fix for https://github.com/llvm/llvm-project/pull/136102, which
missed scoping for `DeclareFuncOps`.
In scenarios with multiple function declarations, the `valueMapper`
wasn't updated, and later uses of values in other functions still used
the names assigned in prior functions.
This is visible in the reproducer in
https://github.com/iree-org/iree/issues/21303: although the counter for
variable enumeration was reset, as can be seen for the local variables, the
function arguments were still mapped to old names. Because of this stale
mapping, the counter was never increased, and the local variables conflicted
with the arguments.
This fix adds proper scoping for declarations and a test-case to cover
the scenario with multiple `DeclareFuncOps`.
Add the supporting OpenMP dialect operations, types, and interfaces for
modelling OpenMP canonical loops and their transformations:
MLIR Operations:
* omp.newcli
* omp.canonical_loop
MLIR Types:
* !omp.cli
MLIR Interfaces:
* LoopTransformationInterface
As a first loop transformation, to be able to exercise these new operations in
follow-up PRs (#144785):
* omp.unroll_heuristic
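A rough, illustrative sketch of how these pieces are meant to compose (the assembly format shown here is approximate, not necessarily the ops' final syntax):
```
// A CLI handle names the canonical loop so a transformation can refer to it.
%cli = omp.newcli
omp.canonical_loop(%cli) %iv : i32 in range(%tc) {
  // loop body
  omp.terminator
}
// Apply heuristic unrolling to the loop identified by %cli.
omp.unroll_heuristic(%cli)
```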
This PR adds a new transformation that turns sequences of `vector.to_elements` and `vector.from_elements` into a binary tree of `vector.shuffle` operations.
(Related RFC:
https://discourse.llvm.org/t/rfc-adding-vector-to-elements-op-to-the-vector-dialect/86779).
Example:
```
%0:4 = vector.to_elements %a : vector<4xf32>
%1:4 = vector.to_elements %b : vector<4xf32>
%2:4 = vector.to_elements %c : vector<4xf32>
%3 = vector.from_elements %0#0, %0#1, %0#2, %0#3,
                          %1#0, %1#1, %1#2, %1#3,
                          %2#0, %2#1, %2#2, %2#3 : vector<12xf32>
==>
%0 = vector.shuffle %a, %b [0, 1, 2, 3, 4, 5, 6, 7] : vector<4xf32>, vector<4xf32>
%1 = vector.shuffle %c, %c [0, 1, 2, 3, -1, -1, -1, -1] : vector<4xf32>, vector<4xf32>
%2 = vector.shuffle %0, %1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] : vector<8xf32>, vector<8xf32>
```
The algorithm leverages the structured extraction/insertion information
of `vector.to_elements` and `vector.from_elements` operations and builds
a set of intervals to determine the vector length that should be used at
each level of the tree to combine the level inputs in pairs.
There are a few improvements that can be implemented in the future, such
as shuffle mask compression to avoid unnecessarily large vector lengths
with poison values, but I decided to keep things "simpler" and spend
more time documenting the different steps of the algorithm so that
people can follow along.
Since `scf::tileUsingSCF` is the core method used for tiling the root
operation within `scf::tileConsumersAndFuseProducersUsingSCF`, the
latter can fuse into any tiled loop generated using `scf::tileUsingSCF`.
This patch adds a test for tiling a root operation using
`ReductionTilingStrategy::PartialReductionOuterParallelStrategy` and
fusing producers with it.
Since this strategy generates a rank-reducing extract slice,
`tensor::replaceExtractSliceWithTiledProducer`, the core method
used for the fusion, was extended to handle rank-reducing slices.
Also fix a small bug in the computation of the reduction induction
variable, which needs to use `floorDiv` instead of `ceilDiv`.
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
SPIRV_AnyTensorArm is a recently added composite type. This patch adds
it to the list of composite types represented by SPIRV_Composite. This was
missing from a previous
[patch](https://github.com/llvm/llvm-project/pull/144667) where
SPIRV_AnyTensorArm was introduced.
Signed-off-by: Mohammadreza Ameri Mahabadian <mohammadreza.amerimahabadian@arm.com>
This change only applies to functions that can reasonably be expected to
use SVE registers.
Modifying the vector length in the middle of a function might cause
incorrect stack deallocation if there are callee-saved SVE registers, or
incorrect access to SVE stack slots.
Addresses (non-issue) https://github.com/llvm/llvm-project/issues/143670
When collapsing the dimensions of a linalg op, we check whether its memref
operands are guaranteed to be collapsible. However, we currently assume that
the matching indexing map is the identity map.
This commit changes that behavior and checks whether the memref is
collapsible on the transformed dimensions.
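A hypothetical example: with a transposed indexing map, the iteration dimensions being collapsed do not line up with each memref's own dimensions, so the contiguity check must be performed on the mapped (transformed) dimensions:
```
#trans = affine_map<(d0, d1, d2) -> (d1, d2, d0)>
#id    = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
// Collapsing iteration dims (d1, d2) touches dims 0 and 1 of %in but
// dims 1 and 2 of %out; each memref is checked on its own dimensions.
func.func @transpose_copy(%in: memref<8x16x4xf32>, %out: memref<4x8x16xf32>) {
  linalg.generic {indexing_maps = [#trans, #id],
                  iterator_types = ["parallel", "parallel", "parallel"]}
      ins(%in : memref<8x16x4xf32>) outs(%out : memref<4x8x16xf32>) {
  ^bb0(%a: f32, %b: f32):
    linalg.yield %a : f32
  }
  return
}
```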
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, as removing them would likely cause platform- or
compiler-specific build failures.
The patch introduced changes to add address spaces to a wider array of MLIR/LLVM values;
however, it was missing an address space cast that exists in our downstream implementation
and is required for declare target to work correctly.
Cooperative matrix operands are only supported for the `add/sub/mul/div`
binary arithmetic ops, but currently all binary arithmetic ops, including
`mod/rem`, accept cooperative matrix operands. This change fixes that
behaviour.
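A short sketch of the intended behaviour (the type parameters are illustrative):
```
// add/sub/mul/div continue to accept cooperative matrix operands...
%sum = spirv.FAdd %a, %b : !spirv.coopmatrix<8x8xf32, Subgroup, MatrixAcc>
// ...while mod/rem ops such as spirv.FRem now reject them.
```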
When the result of an insert op is used by another insert op, and the
subsequent insert op inserts at the same position as the previous one,
replace the destination of the subsequent insert op with the destination
of the previous one. This is valid because the previous insert op does
not affect the final result: its value is fully overwritten by the
subsequent insert.
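A minimal sketch, assuming vector.insert (the same reasoning applies to other insert-like ops):
```
// %0 writes position [1], which %1 fully overwrites, so %1 may read
// from %dest directly and %0 becomes dead if it has no other uses.
%0 = vector.insert %a, %dest[1] : f32 into vector<4xf32>
%1 = vector.insert %b, %0[1] : f32 into vector<4xf32>
==>
%2 = vector.insert %b, %dest[1] : f32 into vector<4xf32>
```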
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>
simd as a leaf in composite constructs is currently ignored with a
warning; however, it behaves as expected when the construct does not
contain a reduction. Enable it for those non-reduction constructs.
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
A new transform op to represent that an attribute is to be chosen from a
set of alternatives and that this choice is made available as a
`!transform.param`. When a `selected` argument is provided, the op's
`apply()` semantics are that of simply making the selected attribute
available as the result. When `selected` is not provided, `apply()`
complains that nothing has resolved the non-determinism the op
represents.
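A purely hypothetical usage sketch (the op name and assembly format here are invented for illustration; see the op's documentation for the real syntax):
```
// Pick one attribute out of a set of alternatives; with `selected`
// present, apply() simply yields it as a !transform.param.
%tile_size = transform.choose_attr options = [4 : i64, 8 : i64]
    selected = 8 : i64 -> !transform.param<i64>
```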
Changed the naming of loop induction variables to follow natural naming (i,
j, k, ...). This helps readability and makes it easier to locate referenced
positions.
Created new scopes to represent the different behavior at function and loop
level, while still enabling the re-use of value names between different
functions (as before). Removed unused scoping at other levels.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, as removing them would likely cause platform- or
compiler-specific build failures.
The motivation is to avoid having to negate `isDynamic*` checks, avoid
double negations, and allow for `ShapedType::isStaticDim` to be used in
ADT functions without having to wrap it in a lambda performing the
negation.
Also add the new functions to the C and Python bindings.
This revision adds DeviceMaskingAttrInterface and extends
DeviceMappingArrayAttr to accept a union of DeviceMappingAttrInterface
and DeviceMaskingAttrInterface.
Support is added to GPUTransformOps to take advantage of this
information and lower to block/warpgroup/warp/thread specialization when
mapped to linear ids.
The revision also connects to scf::ForallOp and uses the new attribute
to implement warp specialization.
The implementation is in the form of a GPUMappingMaskAttr, which can be
additionally passed to the scf.forall.mapping attribute to specify a
mask on compute resources that should be active.
In this first implementation, the mask is a bitfield that specifies, for
each processing unit, whether it is active.
In the future, we may want to implement this as a symbol to refer to
dynamically defined values.
Extending op semantics with an operand is deemed too intrusive at this
time.
---------
Co-authored-by: Oleksandr "Alex" Zinenko <git@ozinenko.com>
This reverts commit 7a6435bec59010e4bb2e1e52a9ba840ed152b4ce.
The reverted commit causes UBSan failures when the sentinel pointers are
upcast using static_cast<>, which checks alignment.
This revision adds a new attribute for mapping `scf.forall` to linear
lane ids.
Example:
```
// %arg2 and %arg3 map to lanes [0, 6) and are turned into expressions
// involving threadIdx.x/y by the map_nested_forall_to_threads
// transformation. This results in an if (linear_thread_id < 6) conditional.
scf.forall (%arg2, %arg3) in (2, 3) {
  ...
} {mapping = [#gpu.lane<linear_dim_0>, #gpu.lane<linear_dim_1>]}
```
---------
Co-authored-by: Oleksandr "Alex" Zinenko <git@ozinenko.com>
RFC:
https://discourse.llvm.org/t/rfc-deprecate-linalg-elemwise-unary-and-elemwise-binary/87144
Remove the two operations and fix the tests by:
* Cleaning simple operation tests of the old ops
* Replacing `linalg.elemwise_{u|bi}nary` with `linalg.{exp|add}` in
transform tests
* Switching some of the tests to `linalg.elementwise` instead, to
broaden test coverage
* Surgically removing the `elemwise_*` part in the Python tests
* Updating the MLIR transform examples (text and tests) to use
`linalg.elementwise` instead
Nothing else changed.
DenseMapInfo for pointers currently uses empty/tombstone values that are
aligned (by assuming a very conservative alignment). However, this means
that we have to work with larger immediates.
This patch proposes to use the values -1 and -2 instead, without caring
about pointer alignment. (Non-roundtrip) integer to pointer casts are
implementation-defined in C++, but the general implementer consensus
(including Clang) is that raw pointers do not carry alignment
requirements, only memory accesses do.
We already have lots of places that rely on this using variations on
`reinterpret_cast<T*>(-1)`, so it seems odd to insist on properly
aligned pointers in this one place.
It is necessary to adjust a few other places after this change, which
currently assume that `DenseMapInfo<void *>` returns a highly-aligned
pointer.
This is a small improvement for both compile-time and clang binary size.
In the process of adding support for Aligned, I noticed that the
support for `MakePointerAvailable` and `MakePointerVisible` is
incomplete, as the operation does not accept a scope nor check for
`NonPrivatePointer`. This PR does not address that, but the relevant issue
has been created: #145485.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, as removing them would likely cause platform- or
compiler-specific build failures.