At the moment, loads introduced by multiply fusion may be placed after
an object's lifetime has been terminated by lifetime.end. This introduces
reads of dead objects.
To avoid this, first collect all lifetime.end calls in the function.
During fusion, we deal with any lifetime.end calls that may alias any of
the loads.
Such lifetime.end calls are either moved when possible (both the
lifetime.end and the store are in the same block) or deleted.
PR: https://github.com/llvm/llvm-project/pull/84914
Row and column arguments for matrix_transpose indicate the shape of the
operand. When hoisting the transpose to the result of the add, the add
operates on the original operand's shape, and so does the hoisted
transpose.
This patch also adds an assert that the shapes of the original add and
the transpose match, and that the shape of the new add matches the
cached shape for it.
The assert could potentially be moved to
updateShapeAndReplaceAllUsesWith.
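The shape bookkeeping for hoisting a transpose over an add can be sketched in standalone C++. This is only an illustration under stated assumptions: `Matrix`, `transpose`, `add`, and `equal` are hypothetical helpers written for this example, not LLVM types or functions, and the matrix is stored column-major.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical column-major matrix used only for illustration (not an LLVM type).
struct Matrix {
  std::size_t rows, cols;
  std::vector<int> data; // element (r, c) is stored at data[c * rows + r]

  Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0) {}
  int &at(std::size_t r, std::size_t c) { return data[c * rows + r]; }
  int at(std::size_t r, std::size_t c) const { return data[c * rows + r]; }
};

// The transpose is described by the shape of its operand (rows x cols);
// the result has the swapped shape (cols x rows).
Matrix transpose(const Matrix &m) {
  Matrix t(m.cols, m.rows);
  for (std::size_t c = 0; c < m.cols; ++c)
    for (std::size_t r = 0; r < m.rows; ++r)
      t.at(c, r) = m.at(r, c);
  return t;
}

// Element-wise add; both operands must have the same shape.
Matrix add(const Matrix &a, const Matrix &b) {
  assert(a.rows == b.rows && a.cols == b.cols && "shape mismatch in add");
  Matrix sum(a.rows, a.cols);
  for (std::size_t i = 0; i < sum.data.size(); ++i)
    sum.data[i] = a.data[i] + b.data[i];
  return sum;
}

// Two matrices are equal if both shape and contents agree.
bool equal(const Matrix &a, const Matrix &b) {
  return a.rows == b.rows && a.cols == b.cols && a.data == b.data;
}
```

With 2x3 operands `a` and `b`, `add(transpose(a), transpose(b))` adds two 3x2 values, whereas the hoisted form `transpose(add(a, b))` adds two 2x3 values and transposes a 2x3 operand. Both forms produce the same 3x2 result.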
Generalize the logic used to convert column-vector ops to row-vectors to
support converting chains of operations.
A potential next step is to further generalize this to convert
column-vector ops to row-vector ops in general, not just for operands of
dot products. Dot-product handling would then be driven by the general
conversion, rather than the other way around.
PR: https://github.com/llvm/llvm-project/pull/72647
This patch adds #include "llvm/ADT/SmallSet.h" to a couple of files
that are relying on transitive includes of SmallSet.h. It in turn
unblocks the removal of unnecessary includes of llvm/ADT/SmallSet.h in
several other files.
These values don't propagate to the output; they are always replaced with a subsequent shuffle
or insertelement.
Tested equivalence with Alive2, e.g., https://alive2.llvm.org/ce/z/fj4s78.
Partial progress towards removing in-tree uses of `getPointerTo()`,
by employing the following options:
* Drop the call entirely if the sole purpose of it is to support
a no-op bitcast (remove the no-op bitcast as well).
* Replace with `PointerType::get()`/`PointerType::getUnqual()`.
Also, remove no-op function `EmitBitCastOfLValueToProperType()`.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D154392
The dot product lowering will use the left operand as a row vector.
If the operand is a binary op, convert it to operate on a row vector
instead of a column vector.
Depends on D148428.
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D148429
Extend dot-product handling to skip transposes of the first operand. As
this is a vector, the conversion between column and row vector via the
transpose isn't needed.
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D148428
At the moment, lower-matrix-intrinsics accepts mismatches between
shapes for operations. See shape-verification.ll for an example where
@llvm.matrix.column.major.load specifies 6x1 and then the use
(@llvm.matrix.multiply) specifies the operand to have shape 1x6.
This patch adds verification for shapes to check if shapes match.
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D147438
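A minimal sketch of the kind of check described above, in standalone C++. `Shape`, `elementCount`, and `shapesMatch` are hypothetical names invented for this illustration and do not correspond to the pass's actual data structures.

```cpp
#include <cassert>

// Hypothetical shape record (not the pass's own type).
struct Shape {
  unsigned rows;
  unsigned cols;
};

// Number of elements covered by a shape.
unsigned elementCount(Shape s) { return s.rows * s.cols; }

// Shapes match only if both dimensions agree; an equal element count
// alone is not sufficient (6x1 and 1x6 both hold 6 elements).
bool shapesMatch(Shape producedAs, Shape usedAs) {
  return producedAs.rows == usedAs.rows && producedAs.cols == usedAs.cols;
}
```

For the example in the commit message, a value produced as 6x1 and consumed as 1x6 has the same element count in both views, which is why the dimensions themselves have to be compared.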
The current code did not properly account for integer matrixes. Check
if the operands are floating point or integer matrixes and use FAdd/Add
accordingly.
This is already done for other cases, like multiplies.
Fixes #62281.
Perform dot-product lowering before instruction fusion to avoid a crash in
the newly added test. Also update lowerDotProduct to properly mark the
optimized matmul as fused.
Limit dot-product lowering to column-major matrixes for now. This
simplifies the code and reasoning for upcoming planned improvements.
Support for row-major matrixes can be added later as an extension.
Add a special case to matrix lowering for dot products. Normal matrix lowering is optimized for either row-major or column-major layout, which results in many `shufflevector` instructions being generated for one vector. We work around this in our special case. We can also use vector-reduce adds instead of sequential adds to sum the result of the element-wise multiplication, which takes advantage of SIMD instructions.
Reviewed By: fhahn, thegameg
Differential Revision: https://reviews.llvm.org/D131125
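The two-step structure of the dot-product special case (element-wise multiply, then a single reduction) can be sketched in standalone C++. This is an analogy only: `dotProduct` is a hypothetical helper, the code does not emit or model LLVM IR, and integers are used so that the order of the additions cannot change the result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: dot product of a 1xN row vector and an Nx1 column
// vector, both given as flat arrays of N elements.
int dotProduct(const std::vector<int> &row, const std::vector<int> &col) {
  assert(row.size() == col.size() && "operands must have the same length");

  // Step 1: element-wise multiplication of the two flat vectors.
  std::vector<int> products(row.size());
  std::transform(row.begin(), row.end(), col.begin(), products.begin(),
                 std::multiplies<int>());

  // Step 2: one reduction over the products, standing in for a
  // vector-reduce add.
  return std::accumulate(products.begin(), products.end(), 0);
}
```

Because both operands are treated as flat vectors, no per-element shuffling between row and column layouts is needed before the multiply.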
First, sink the transposes to the operands to simplify redundant
ones. Then, lift them to reduce the number of realized transposes.
```
(A + B)^T -> A^T + B^T -> (A + B)^T
```
See tests for more examples.
Differential Revision: https://reviews.llvm.org/D133657
Interestingly, MathExtras.h doesn't use any <cmath> declarations, so move
the include out of that header and add it where needed.
No functional change intended, but there's no longer a transitive include
from MathExtras.h to cmath.
If one of the operands is a transposed splat, the transpose can be
removed.
This is useful to simplify when transposes are distributed to operands
of a matmul:
* k^T -> k
* (A * k)^T -> A^T * k
Differential Revision: https://reviews.llvm.org/D130177
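The two splat identities listed above can be checked with a small standalone C++ sketch. It assumes column-major storage and reads `A * k` as scaling every element of A by the scalar k. `transposeColumnMajor` and `scale` are hypothetical helpers written for this example, not LLVM functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Transpose a rows x cols matrix stored column-major in a flat array.
// The result is a cols x rows matrix, also stored column-major.
std::vector<int> transposeColumnMajor(const std::vector<int> &in,
                                      std::size_t rows, std::size_t cols) {
  assert(in.size() == rows * cols && "data does not match the given shape");
  std::vector<int> out(in.size());
  for (std::size_t c = 0; c < cols; ++c)
    for (std::size_t r = 0; r < rows; ++r)
      out[r * cols + c] = in[c * rows + r]; // (r, c) moves to (c, r)
  return out;
}

// Multiply every element by the scalar k (the splat operand).
std::vector<int> scale(std::vector<int> m, int k) {
  for (int &x : m)
    x *= k;
  return m;
}
```

Transposing a splat only changes its shape, not its data, so the transpose can be dropped. Scaling by k commutes with the transpose, so the transpose can be moved onto A alone.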
If an instruction at the beginning of a block is erased, this may
trigger a crash due to dereferencing an invalid iterator.
Check if II is at the end before dereferencing it.
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D127736
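The general pattern behind this fix, checking an iterator against the end of its container before dereferencing it after an erase, can be sketched with `std::list`. This is an analogy and not the pass's code: `eraseNegatives` is a hypothetical function, and the list stands in for a basic block's instruction list.

```cpp
#include <cassert>
#include <list>

// Erase every negative value from the list. For each erased value, add the
// element that follows it (if there is one) to the returned sum.
int eraseNegatives(std::list<int> &values) {
  int sumOfFollowers = 0;
  for (auto it = values.begin(); it != values.end();) {
    if (*it < 0) {
      // erase() returns the iterator after the erased element, which may
      // be end() when the erased element was the last one.
      it = values.erase(it);
      // Guard: only dereference when the iterator is not at the end.
      if (it != values.end())
        sumOfFollowers += *it;
    } else {
      ++it;
    }
  }
  return sumOfFollowers;
}
```

Without the `it != values.end()` guard, erasing the last element would leave the code dereferencing `end()`, which is undefined behavior.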
When creating an alloca to copy a matrix due to memory conflicts, those
allocas used to use VectorTypes, which forced them to have huge
alignments for large vectors.
This patch updates LowerMatrixIntrinsics to use a corresponding array
type, like Clang already does, to get more manageable alignments.
Reviewed By: anemet, thegameg
Differential Revision: https://reviews.llvm.org/D118239
getNumberOfRegisters takes a ClassID as its argument. It shouldn't be passed a bool. Assuming the bool meant vector or not, we should call getRegisterClassForType first.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D116903
dyn_cast<> can return null - use cast<> instead to assert the cast is valid before dereferencing the casted pointer.
Fixes static-analyzer null dereference warning.
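The difference between a cast that may return null and one that asserts validity can be illustrated in standalone C++. This sketch uses `dynamic_cast` as a stand-in, since LLVM's own casting utilities do not rely on C++ RTTI. `Value`, `Instruction`, `Constant`, and `checkedCast` are hypothetical names defined only for this example.

```cpp
#include <cassert>

// Hypothetical class hierarchy for illustration (not LLVM's classes).
struct Value {
  virtual ~Value() = default;
};
struct Instruction : Value {
  int opcode = 7;
};
struct Constant : Value {};

// Asserting cast: the caller claims the dynamic type is To, and the
// assertion documents and checks that claim before the pointer is used.
template <typename To, typename From> To *checkedCast(From *v) {
  To *result = dynamic_cast<To *>(v);
  assert(result && "checkedCast<> argument of incompatible type");
  return result;
}
```

A plain `dynamic_cast<Instruction *>(v)` yields `nullptr` for a `Constant`, so dereferencing its result unchecked is what a static analyzer flags. `checkedCast<Instruction>(v)` instead makes the assumption explicit at the cast site.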
This reverts commit fd4808887ee47f3ec8a030e9211169ef4fb094c3.
This patch causes gcc to issue a lot of warnings like:
warning: base class ‘class llvm::MCParsedAsmOperand’ should be
explicitly initialized in the copy constructor [-Wextra]
Added '-print-pipeline-passes' printing of parameters for those passes
declared with the *_WITH_PARAMS macros in PassRegistry.def.
Note that it only prints the parameters declared inside *_WITH_PARAMS, as
in a few cases there appear to be additional parameters that are not parsable.
The following passes are now covered (i.e. all of those with *_WITH_PARAMS in
PassRegistry.def):
LoopExtractorPass - loop-extract
HWAddressSanitizerPass - hwsan
EarlyCSEPass - early-cse
EntryExitInstrumenterPass - ee-instrument
LowerMatrixIntrinsicsPass - lower-matrix-intrinsics
LoopUnrollPass - loop-unroll
AddressSanitizerPass - asan
MemorySanitizerPass - msan
SimplifyCFGPass - simplifycfg
LoopVectorizePass - loop-vectorize
MergedLoadStoreMotionPass - mldst-motion
GVN - gvn
StackLifetimePrinterPass - print<stack-lifetime>
SimpleLoopUnswitchPass - simple-loop-unswitch
Differential Revision: https://reviews.llvm.org/D109310
This reverts the revert 28c04794df74ad3c38155a244729d1f8d57b9400.
The failing MLIR test that caused the revert should be fixed in this
version.
Also includes a PPC test fix previously in 1f87c7c478a6.