This analysis currently just crashes when applied to a graph region that
has a use-def cycle. This PR fixes that by keeping track of the
operations the DFS has already visited when following use-def edges and
stopping once we visit an operation again.
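A minimal sketch of the idea (helper and names hypothetical, not the patch's actual code): remember every operation the walk has seen, and stop as soon as one recurs.
```c++
#include "mlir/IR/Operation.h"
#include "llvm/ADT/DenseSet.h"

// Follow use-def edges depth-first; the visited set guarantees termination
// even when the region contains use-def cycles.
static void visitDefs(mlir::Operation *op,
                      llvm::DenseSet<mlir::Operation *> &visited) {
  if (!visited.insert(op).second)
    return; // Already seen this op: we closed a cycle, so stop here.
  for (mlir::Value operand : op->getOperands())
    if (mlir::Operation *def = operand.getDefiningOp())
      visitDefs(def, visited);
}
```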
This patch introduces two new ops to the SPIR-V dialect:
- `spirv.EXT.ConstantCompositeReplicate`
- `spirv.EXT.SpecConstantCompositeReplicate`
These ops represent composite constants and specialization constants,
respectively, constructed by replicating a single splat constant across
all elements. They correspond to `SPV_EXT_replicated_composites`
extension instructions:
- `OpConstantCompositeReplicatedEXT`
- `OpSpecConstantCompositeReplicatedEXT`
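For illustration, a replicated composite constant might be written as follows (a sketch only; the assembly syntax is approximated from the op name and may differ from the patch):
```
// A vector constant whose four elements all equal the splat value 1 : i32.
%0 = spirv.EXT.ConstantCompositeReplicate [1 : i32] : vector<4xi32>
```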
No transformation to these new ops has been introduced in this patch.
This approach was chosen per the discussion on the RFC:
https://discourse.llvm.org/t/rfc-basic-support-for-spv-ext-replicated-composites-in-mlir-spir-v-compile-time-constant-lowering-only/86987
---------
Signed-off-by: Mohammadreza Ameri Mahabadian <mohammadreza.amerimahabadian@arm.com>
Reduction support: https://github.com/llvm/llvm-project/pull/146671
IF clause support is fixed in this PR.
The problem with the IF clause in composite constructs was that wsloop
and simd both operate on the same CanonicalLoopInfo structure, with the
simd construct processed first, followed by the wsloop. Previously, the
IF clause generated code like
```
if (cond) {
  while (...) {
    simd_loop_body;
  }
} else {
  while (...) {
    nonsimd_loop_body;
  }
}
```
The problem is that this invalidates the CanonicalLoopInfo structure
that the wsloop still needs to process. To avoid this, this patch
preserves the original loop and moves the IF clause inside the loop:
```
while (...) {
  if (cond) {
    simd_loop_body;
  } else {
    non_simd_loop_body;
  }
}
```
On the simple examples I tried, LLVM was able to hoist the if condition
out of the loop at -O3.
The disadvantage is that we cannot add the llvm.loop.vectorize.enable
attribute on either the SIMD or non-SIMD loop because they share a loop
back edge. There is no way of solving this without keeping the old
design of two different loops, which cannot be represented using only
one CanonicalLoopInfo structure. I don't think the presence or absence
of this attribute makes much difference: in my testing it is the
llvm.loop.parallel_access metadata that makes the difference to
vectorization. LLVM will vectorize if legal, whether or not this
attribute is present in the TRUE branch. In the FALSE branch this means
the loop might be vectorized even when the condition is false, but I
think this is still standards-compliant: OpenMP 6.0 says that a false
if clause should be treated as if the SIMDLEN clause were one, and the
SIMDLEN clause is defined as a "hint". For the same reason, the SIMDLEN
and SAFELEN clauses are silently ignored when SIMD IF is used.
I think it is better to implement SIMD IF and ignore SIMDLEN, SAFELEN,
and some vectorization-encouragement metadata when combined with IF than
to ignore IF, because IF can have correctness consequences whereas the
rest are optimization hints. For example, the user might use the IF
clause to disable SIMD programmatically when it is known to be unsafe to
vectorize the loop. In that case it is not at all safe to add the
parallel access or SAFELEN metadata.
Reapply attempt for: https://github.com/llvm/llvm-project/pull/148291
Fix for the build failure reported in:
https://lab.llvm.org/buildbot/#/builders/116/builds/15477
-----
This crash is caused by a mismatch between the distributed type returned
by `getDistributedType` and the intended distributed type for forOp
results.
Solution diff:
20c2cf6766
Example:
```
func.func @warp_scf_for_broadcasted_result(%arg0: index) -> vector<1xf32> {
  %c128 = arith.constant 128 : index
  %c1 = arith.constant 1 : index
  %c0 = arith.constant 0 : index
  %2 = gpu.warp_execute_on_lane_0(%arg0)[32] -> (vector<1xf32>) {
    %ini = "some_def"() : () -> (vector<1xf32>)
    %0 = scf.for %arg3 = %c0 to %c128 step %c1 iter_args(%arg4 = %ini) -> (vector<1xf32>) {
      %1 = "some_op"(%arg4) : (vector<1xf32>) -> (vector<1xf32>)
      scf.yield %1 : vector<1xf32>
    }
    gpu.yield %0 : vector<1xf32>
  }
  return %2 : vector<1xf32>
}
```
In this case the distributed type for the forOp result is
`vector<1xf32>` (the result is not distributed; it is instead broadcast
to all lanes). However, `getDistributedType` will return a NULL type
here. Therefore, if the distributed type can be recovered from the
warpOp, we should always do that first before falling back to
`getDistributedType`.
This enables memref.load/store + vector.load/store support for sub-byte
float types. Since the memref types don't matter for loads/stores, we
still use the same integer types of equivalent width, with a few extra
bitcasts needed around certain operations.
There is no direct change needed for vector.load/store support; the
tests added for them verify that float types are supported as well.
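For instance, a hedged sketch of the idea (types chosen for illustration, not taken from the patch): the sub-byte float access goes through the equal-width integer type plus a bitcast.
```
// Load the 4-bit pattern as an integer, then bitcast it to the sub-byte
// float type of the same width.
%i = memref.load %mem[%idx] : memref<8xi4>
%f = arith.bitcast %i : i4 to f4E2M1FN
```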
Currently `builder.create<...>` does not meaningfully hint at or show
the various builders an op supports (arg names/types) because [`create`
forwards the args to
`build`](887222e352/mlir/include/mlir/IR/Builders.h (L503)).
To improve QoL, this PR adds static create methods to the ops themselves
like
```c++
static arith::ConstantIntOp create(OpBuilder& builder, Location location, int64_t value, unsigned width);
```
Now if one types `arith::ConstantIntOp::create(builder, ...` instead of
`builder.create<arith::ConstantIntOp>(...`, auto-complete/hints will pop
up.
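For example, an illustrative usage based on the signature above:
```c++
// Before: `create` forwards its arguments to `build`, so the IDE cannot
// surface the builder's parameter names and types.
auto c1 = builder.create<arith::ConstantIntOp>(loc, /*value=*/42, /*width=*/32);

// After: the static create method has a concrete signature, so
// auto-complete shows the argument names and types.
auto c2 = arith::ConstantIntOp::create(builder, loc, /*value=*/42, /*width=*/32);
```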
See
https://discourse.llvm.org/t/rfc-building-mlir-operation-observed-caveats-and-proposed-solution/87204/13
for more info.
Add the missing `spirv::ImageFetchOp` operation to the SPIR-V MLIR
dialect ODS, with appropriate testing, including negative testing of the
verifiers.
Signed-off-by: Jack Frankland <jack.frankland@arm.com>
This commit removes the verifier check that rejected negative branch
weights. The check was too strict because weights are interpreted as
unsigned integers.
This showed up when running the verifier on LLVM dialect modules that
were imported from LLVM IR.
This test pass is meant to test various affine fusion utilities as
opposed to being a pass to perform valid fusion. Rename an option to
avoid confusion.
Fixes: https://github.com/llvm/llvm-project/issues/132172
Although XeVM is an LLVM extension dialect, the SPIR-V backend relies
on [function
calls](https://llvm.org/docs/SPIRVUsage.html#instructions-as-function-calls)
instead of defining LLVM intrinsics to represent SPIR-V instructions.
The convert-xevm-to-llvm pass lowers xevm ops to function declarations
and calls using the above naming convention.
In the future, most of the pass should be replaced with llvmBuilder and
handled as part of the translation to LLVM instead.
---------
Co-authored-by: Artem Kroviakov <artem.kroviakov@intel.com>
Propagating vector.extract when a dynamic position is present can cause
dominance issues and needs better handling. For now, disable propagation
if there is a dynamic position present.
Support for translating the operations introduced in #144785 to LLVM IR.
In order to keep the lowering simple,
`OpenMPIRBuilder::unrollLoopHeuristic` is applied when encountering the
`omp.unroll_heuristic` op. As a result, the operation that unrolling is
applied to (`omp.canonical_loop`) must have been emitted before, even
though logically there is no such requirement.
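A rough sketch of that translation step (the lookup helper and accessor names are hypothetical; `unrollLoopHeuristic` is the OpenMPIRBuilder entry point named above):
```c++
// When translation reaches omp.unroll_heuristic, the CanonicalLoopInfo for
// the target omp.canonical_loop must already exist; look it up and let the
// OpenMPIRBuilder apply heuristic unrolling to it.
llvm::CanonicalLoopInfo *loopInfo =
    lookupTranslatedLoop(unrollOp.getApplyee()); // hypothetical helper/accessor
ompBuilder.unrollLoopHeuristic(debugLoc, loopInfo);
```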
Eventually, all transformations on a loop must be applied directly
after emitting `omp.canonical_loop`, i.e. future transformations must be
looked up when encountering `omp.canonical_loop` itself. This is because
many OpenMPIRBuilder methods (e.g. `createParallel`) expect all the
region code to be emitted within a callback. In the case of
`createParallel`, the region code is outlined into a new function.
Therefore, making the operation order a formal requirement would not
make the implementation any easier.
There is a `hasStorageCustomConstructor` flag that allows one to
provide a custom attribute/type construction implementation.
Unfortunately, the flag does not work properly: the generated C++
produces a method with an *empty body* instead of only a declaration.
This commit removes the Pure trait from the clock, clock64, and
globaltimer ops by creating an NVVM_NCSpecialRegisterOp class to
represent ops which return non-constant values. This prevents the CSE
pass from optimizing away redundant uses of them.
We have this pattern of code in OMPIRBuilder for many functions that are
used in reduction operations.
```
// Create the reduction helper and start emitting into its entry block.
Function *LtGRFunc = Function::Create(/*...*/);
BasicBlock *EntryBlock = BasicBlock::Create(Ctx, "entry", LtGRFunc);
Builder.SetInsertPoint(EntryBlock);
```
The insertion point is moved to the new function, but the debug
location is not updated. This means the reduction function will use a
debug location that points to another function. The problem is usually
hidden because these functions get inlined, but the potential for
failure exists.
This patch resets the debug location when the insertion point is moved
to a new function. Some `InsertPointGuard`s have been added to make sure
we restore the debug location correctly when we are done with the
reduction function.
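A minimal sketch of the pattern (standard IRBuilder APIs; not the patch's exact code):
```c++
// InsertPointGuard saves and restores both the insertion point and the
// current debug location when it goes out of scope.
llvm::IRBuilderBase::InsertPointGuard Guard(Builder);
Builder.SetInsertPoint(EntryBlock);
// Clear the debug location so the new function does not inherit a location
// pointing into the caller.
Builder.SetCurrentDebugLocation(llvm::DebugLoc());
```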
This patch adds a better maskedload/maskedstore lowering on the amdgpu
backend for loads which are either fully masked or fully unmasked. For
these cases, we can either generate an out-of-bounds buffer load with no
if condition, or a normal load guarded by an if condition (when there is
no fat_raw_buffer address space).
This is a fix for https://github.com/llvm/llvm-project/pull/136102. It
missed scoping for `DeclareFuncOps`.
In scenarios with multiple function declarations, the `valueMapper`
wasn't updated and later uses of values in other functions still used
the assigned names in prior functions.
This is visible in the reproducer in
https://github.com/iree-org/iree/issues/21303: although the counter for
variable enumeration was reset, as can be seen for the local vars, the
function arguments were mapped to old names. Due to this mapping, the
counter was never increased, and the local variables conflicted with the
arguments.
This fix adds proper scoping for declarations and a test-case to cover
the scenario with multiple `DeclareFuncOps`.
Add the supporting OpenMP Dialect operations, types, and interfaces for
modelling the following:
MLIR Operations:
* omp.newcli
* omp.canonical_loop
MLIR Types:
* !omp.cli
MLIR Interfaces:
* LoopTransformationInterface
As a first loop transformation that will be able to use these new
operations in follow-up PRs (#144785):
* omp.unroll_heuristic
This PR adds a new transformation that turns sequences of `vector.to_elements` and `vector.from_elements` into a binary tree of `vector.shuffle` operations.
(Related RFC:
https://discourse.llvm.org/t/rfc-adding-vector-to-elements-op-to-the-vector-dialect/86779).
Example:
```
%0:4 = vector.to_elements %a : vector<4xf32>
%1:4 = vector.to_elements %b : vector<4xf32>
%2:4 = vector.to_elements %c : vector<4xf32>
%3 = vector.from_elements %0#0, %0#1, %0#2, %0#3,
                          %1#0, %1#1, %1#2, %1#3,
                          %2#0, %2#1, %2#2, %2#3 : vector<12xf32>

==>

%0 = vector.shuffle %a, %b [0, 1, 2, 3, 4, 5, 6, 7] : vector<4xf32>, vector<4xf32>
%1 = vector.shuffle %c, %c [0, 1, 2, 3, -1, -1, -1, -1] : vector<4xf32>, vector<4xf32>
%2 = vector.shuffle %0, %1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] : vector<8xf32>, vector<8xf32>
```
The algorithm leverages the structured extraction/insertion information
of `vector.to_elements` and `vector.from_elements` operations and builds
a set of intervals to determine the vector length that should be used at
each level of the tree to combine the level inputs in pairs.
There are a few improvements that can be implemented in the future, such
as shuffle mask compression to avoid unnecessarily large vector lengths
with poison values, but I decided to keep things "simpler" and spend
more time documenting the different steps of the algorithm so that
people can follow along.
Since `scf::tileUsingSCF` is the core method used for tiling the root
operation within `scf::tileConsumersAndFuseProducersUsingSCF`, the
latter can fuse into any tiled loop generated using `scf::tileUsingSCF`.
This patch adds a test for tiling a root operation using
`ReductionTilingStrategy::PartialReductionOuterParallelStrategy` and
fusing producers with it.
Since this strategy generates a rank-reducing extract slice,
`tensor::replaceExtractSliceWithTiledProducer`, the core method used for
the fusion, was extended to handle rank-reducing slices.
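For reference, an illustrative rank-reducing slice (not from the patch): the unit dimension is dropped from the result type.
```
// Extracts a 1x4 slice and drops the unit dimension, yielding tensor<4xf32>.
%s = tensor.extract_slice %t[0, %i] [1, 4] [1, 1]
    : tensor<8x16xf32> to tensor<4xf32>
```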
Also fix a small bug in the computation of the reduction induction
variable (which needs to use `floorDiv` instead of `ceilDiv`).
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
This change only applies to functions that can reasonably be expected
to use SVE registers.
Modifying vector length in the middle of a function might cause
incorrect stack deallocation if there are callee-saved SVE registers or
incorrect access to SVE stack slots.
Addresses (non-issue) https://github.com/llvm/llvm-project/issues/143670
When collapsing linalg dimensions, we check whether the op's memref
operands are guaranteed to be collapsible. However, we currently assume
that the matching indexing map is the identity map.
This commit modifies this behavior and checks if the memref is
collapsible on the transformed dimensions.
Cooperative matrix operands are only supported for `add/sub/mul/div`
binary arithmetic ops, but currently all binary arithmetic ops accept
cooperative matrix operands, including `mod/rem`. This change fixes this
behaviour.
When the result of an insert op is used by another insert op, and the
subsequent insert op inserts at the same location as the previous one,
replace the dest of the subsequent insert op with the dest of the
previous insert op. This works because the previous insert op does not
affect the result of the subsequent insert op.
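An illustrative instance of the fold, using `vector.insert` (not taken from the patch):
```
// %0 writes position [0] and %1 overwrites the same position, so %1 can
// use %dest directly.
%0 = vector.insert %a, %dest[0] : f32 into vector<4xf32>
%1 = vector.insert %b, %0[0] : f32 into vector<4xf32>
// folds to:
%1 = vector.insert %b, %dest[0] : f32 into vector<4xf32>
```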
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>
Despite currently being ignored with a warning, simd as a leaf in
composite constructs behaves as expected when the construct does not
contain a reduction. Enable it for those non-reduction constructs.
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
A new transform op to represent that an attribute is to be chosen from a
set of alternatives and that this choice is made available as a
`!transform.param`. When a `selected` argument is provided, the op's
`apply()` semantics is that of just making this selected attribute
available as the result. When `selected` is not provided, `apply()`
complains that nothing has resolved the non-determinism that the op is
representing.
Changed the naming of loop induction variables to follow natural naming
(i, j, k, ...). This helps readability and makes it easier to locate the
positions referred to.
Created new scopes to represent different behavior at function and loop
level, to still enable re-using value names between different functions
(as before). Removed unused scoping at other levels.
The motivation is to avoid having to negate `isDynamic*` checks, avoid
double negations, and allow `ShapedType::isStaticDim` to be used in ADT
functions without having to wrap it in a lambda performing the
negation.
Also add the new functions to C and Python bindings.
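For example, an illustrative before/after (assuming an `ArrayRef<int64_t> shape`):
```c++
// Before: the static-shape check needs a negation wrapped in a lambda.
bool allStatic = llvm::all_of(
    shape, [](int64_t dim) { return !mlir::ShapedType::isDynamic(dim); });

// After: the predicate can be passed to ADT helpers directly.
bool allStaticNew = llvm::all_of(shape, mlir::ShapedType::isStaticDim);
```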