105 Commits

Author SHA1 Message Date
Florian Hahn
98e50881e9
[Matrix] Refine cost estimate for dot-product.
Adjust lowerDotProduct cost estimate to include the cost benefits of:
 * emitting a wide load
 * emitting a wide multiply.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D147330
2023-04-14 11:35:01 +01:00
Florian Hahn
e6ab86a887
[Matrix] Fix IsSupported check in lowerDotProduct.
The check incorrectly checks the RHS while LHS is transformed later.
Update to check LHS, which fixes a crash in the newly added test cases.
2023-04-13 19:00:30 +01:00
Florian Hahn
78148eba49
[Matrix] Fix crash during dot product lowering.
Perform dot-product lowering before instruction fusion to avoid crash in
newly added test. Also update lowerDotProduct to properly mark optimized
matmul as fused.
2023-04-12 15:08:39 +01:00
Florian Hahn
04681243b4
[Matrix] Limit dot lowering to column major matrixes.
Limit to dot product lowering to column major matrixes for now. This
simplifies the code and reasoning for upcoming planned improvements.
Support for row-major matrixes can be added later as extension.
2023-04-05 15:49:06 +01:00
Kazu Hirata
52dd9deb15 [Scalar] Use SmallPtrSet::contains (NFC) 2023-03-31 23:50:17 -07:00
Vir Narula
e7281c6f61
[Matrix] Add special case dot product lowering
Add special case to matrix lowering for dot products. Normal matrix lowering if optimized for either row-major or column-major, which results in many `shufflevector` instructions being generated for one vector. We work around this in our special case. We can also use vector-reduce adds instead of sequential adds to sum the result of the element-wise multiplication, which takes advantage of SIMD instructions.

Reviewed By: fhahn, thegameg

Differential Revision: https://reviews.llvm.org/D131125
2023-03-31 12:40:20 +01:00
Liren Peng
529ee9750b [NFC] Use single quotes for single char output during printPipline
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144365
2023-02-22 02:35:13 +00:00
Francis Visoiu Mistrih
da09b35334
[Matrix] Optimize matrix transposes around additions
First, sink the transposes to the operands to simplify redudant
ones. Then, lift them to reduce the number of realized transposes.

```
(A + B)^T -> A^T + B^T -> (A + B)^T
```

See tests for more examples.

Differential Revision: https://reviews.llvm.org/D133657
2023-01-11 15:21:59 -08:00
Guillaume Chatelet
8fd5558b29 [NFC] Use TypeSize::geFixedValue() instead of TypeSize::getFixedSize()
This change is one of a series to implement the discussion from
https://reviews.llvm.org/D141134.
2023-01-11 16:49:38 +00:00
serge-sans-paille
16544cbe64 [iwyu] Move <cmath> out of llvm/Support/MathExtras.h
Interestingly, MathExtras.h doesn't use <cmath> declaration, so move it out of
that header and include it when needed.

No functional change intended, but there's no longer a transitive include
fromMathExtras.h to cmath.
2022-09-28 20:49:01 +02:00
Francis Visoiu Mistrih
c5b10f348e [Matrix] Use print instead of dump for matrix-print-after-transpose-opt
We should be able to use this option even if LLVM_ENABLE_DUMP is not on.

(should fix the bots too)
2022-09-02 16:12:21 -07:00
Francis Visoiu Mistrih
81bdb4068d [Matrix] Simplify matmuls with scalars
If one of the operands is a transposed splat, the transpose can be
removed.

This is useful to simplify when transposes are distributed to operands
of a matmul:

* k^T -> k
* (A * k)^t -> A^t * k

Differential Revision: https://reviews.llvm.org/D130177
2022-09-02 15:50:25 -07:00
Kazu Hirata
50724716cd [Transforms] Qualify auto in range-based for loops (NFC)
Identified with readability-qualified-auto.
2022-08-14 12:51:58 -07:00
Francis Visoiu Mistrih
bfd3883e83 [Matrix] Refactor transpose distribution. NFC
Use a function to distribute transposes. Preparation for future patches.
2022-07-28 17:30:00 -07:00
Francis Visoiu Mistrih
448a094d3e [Matrix] Add assert to catch extracted vectors with poison elements
Assert when the extracted vector is wider than the row/column.

Differential Revision: https://reviews.llvm.org/D130173
2022-07-26 11:07:02 -07:00
Francis Visoiu Mistrih
2c6e8b4636 [Matrix] Refactor tiled loops in a struct. NFC
The three loops have the same structure: index, header, latch.
2022-07-26 11:02:22 -07:00
Nuno Lopes
022bd92c78 [LowerMatrixMultiplication] Switch dummy values from undef to poison [NFC] 2022-07-03 12:32:19 +01:00
Nuno Lopes
7c4f45f87a Revert [LowerMatrixMultiplication] Switch dummy values from undef to poison [NFC]
This reverts commits 47e6f98f84ac3 and 3e701bcd2a6aee2
2022-07-01 23:53:41 +01:00
Nuno Lopes
47e6f98f84 [LowerMatrixMultiplication] Switch dummy values from undef to poison [NFC] 2022-07-01 23:31:31 +01:00
Florian Hahn
7c0089d735
[Matrix] Check if iterator is at beginning of BB in optimizeTranspose.
If an instruction at the beginning of a block is erased,  this may
trigger crash due to dereferencing an invalid iterator.

Check if II is at the end before dereferencing it.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D127736
2022-06-14 21:37:02 +01:00
Kazu Hirata
8daf23d364 [Scalar] Use llvm::make_early_inc_range (NFC) 2022-06-05 23:53:18 -07:00
serge-sans-paille
59630917d6 Cleanup includes: Transform/Scalar
Estimated impact on preprocessor output line:
before: 1062981579
after:  1062494547

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120817
2022-03-03 07:56:34 +01:00
serge-sans-paille
a494ae43be Cleanup includes: TransformsUtils
Estimation on the impact on preprocessor output:
before: 1065307662
after:  1064800684

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120741
2022-03-01 21:00:07 +01:00
Nikita Popov
cdc0573f75 [MatrixBuilder] Remove unnecessary IRBuilder template (NFC)
IRBuilderBase exists specifically to avoid the need for this.
2022-02-07 16:42:38 +01:00
Florian Hahn
b339bbdb19
[Matrix] Use ArrayType for allocas instead of VectorType.
When creating an alloca to copy a matrix due to memory conflicts, those
allocas used to use VectorTypes, which forced them to have huge
alignments for large vectors.

This patch updates LowerMatrixIntrinsics to use a corresponding array
type, like Clang already does, to get more manageable alignments.

Reviewed By: anemet, thegameg

Differential Revision: https://reviews.llvm.org/D118239
2022-01-28 10:47:52 +00:00
Craig Topper
38b30eb2b2 [LowerMatrixIntrinsics] Call getRegisterClassForType before getNumberOfRegisters.
getNumberOfRegisters takes a ClassID as it's argument. It shouldn't be passed a bool. Assuming the bool meant vector or not, we should call getRegisterClassForType first.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D116903
2022-01-10 15:32:13 -08:00
Kazu Hirata
b932bdf59f [llvm] Remove redundant member initialization (NFC)
Identified with readability-redundant-member-init.
2022-01-07 17:45:09 -08:00
Simon Pilgrim
5e7912d80f [LowerMatrixIntrinsics] writeFnName - don't dereference a dyn_cast<>. NFC.
dyn_cast<> can return null - use cast<> instead to assert the cast is valid before dereferencing the casted pointer.

Fixes static-analyzer null dereference warning.
2022-01-06 17:09:32 +00:00
Kazu Hirata
e5947760c2 Revert "[llvm] Remove redundant member initialization (NFC)"
This reverts commit fd4808887ee47f3ec8a030e9211169ef4fb094c3.

This patch causes gcc to issue a lot of warnings like:

  warning: base class ‘class llvm::MCParsedAsmOperand’ should be
  explicitly initialized in the copy constructor [-Wextra]
2022-01-03 11:28:47 -08:00
Kazu Hirata
fd4808887e [llvm] Remove redundant member initialization (NFC)
Identified with readability-redundant-member-init.
2022-01-01 16:18:18 -08:00
Markus Lavin
1ac209ed76 [NPM] Added -print-pipeline-passes print params for a few passes.
Added '-print-pipeline-passes' printing of parameters for those passes
declared with *_WITH_PARAMS macro in PassRegistry.def.

Note that it only prints the parameters declared inside *_WITH_PARAMS as
in a few cases there appear to be additional parameters not parsable.

The following passes are now covered (i.e. all of those with *_WITH_PARAMS in
PassRegistry.def).

LoopExtractorPass - loop-extract
HWAddressSanitizerPass - hwsan
EarlyCSEPass - early-cse
EntryExitInstrumenterPass - ee-instrument
LowerMatrixIntrinsicsPass - lower-matrix-intrinsics
LoopUnrollPass - loop-unroll
AddressSanitizerPass - asan
MemorySanitizerPass - msan
SimplifyCFGPass - simplifycfg
LoopVectorizePass - loop-vectorize
MergedLoadStoreMotionPass - mldst-motion
GVN - gvn
StackLifetimePrinterPass - print<stack-lifetime>
SimpleLoopUnswitchPass - simple-loop-unswitch

Differential Revision: https://reviews.llvm.org/D109310
2021-09-15 08:34:04 +02:00
Kazu Hirata
8e86c0e4f4 [Scalar] Use make_early_inc_range (NFC) 2021-09-12 08:17:18 -07:00
Florian Hahn
f999312872
Recommit "[Matrix] Overload stride arg in matrix.columnwise.load/store."
This reverts the revert 28c04794df74ad3c38155a244729d1f8d57b9400.

The failing MLIR test that caused the revert should be fixed  in this
version.

Also includes a PPC test fix previously in 1f87c7c478a6.
2021-08-12 18:31:57 +01:00
Mehdi Amini
28c04794df Revert "[Matrix] Overload stride arg in matrix.columnwise.load/store."
This reverts commit a1ef81de35a4bac6d3b22e9d7186d880124d7a55.

Broke the MLIR buildbot.
2021-08-12 11:57:19 +00:00
Florian Hahn
a1ef81de35
[Matrix] Overload stride arg in matrix.columnwise.load/store.
This patch adjusts the intrinsics definition of
llvm.matrix.column.major.load and llvm.matrix.column.major.store to
allow overloading the type of the stride. The bitwidth of the stride is
used to perform the offset computation.

This fixes a crash when using __builtin_matrix_column_major_load or
__builtin_matrix_column_major_store on 32 bit platforms. The stride argument
of the builtins are defined as `size_t`, which is 32 bits wide on 32 bit
platforms.

Note that we still perform offset computations with 64 bit width on 32
bit platforms for accesses that do not take a user-specified stride.
This can be fixed separately.

Fixes PR51304.

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D107349
2021-08-12 10:45:25 +01:00
Adam Nemet
d87d3615f7 [Matrix] Fix shape for factored transpose
The shape of the input is C x R.

Differential Revision: https://reviews.llvm.org/D106722
2021-07-27 11:36:13 -07:00
Adam Nemet
bf7eb48454 [Matrix] RAUW should only replace an instruction in ShapeMap if supportsShapeInfo
As an instruction is replaced in optimizeTransposes RAUW will replace it in
the ShapeMap (ShapeMap is ValueMap so that uses are updated).  In
finalizeLowering however we skip updating uses if they are in the ShapeMap
since they will be lowered separately at which point we pick up the lowered
operands.

In the testcase what happened was that since we replaced the doubled-transpose
with the shuffle, it ended up in the ShapeMap.  As we lowered the
columnwise-load the use in the shuffle was not updated.  Then as we removed
the original columnwise-load we changed that to an undef.  I.e. we ended up
with:

```
%shuf = shufflevector <8 x double> undef, <8 x double> poison, <6 x i32>
                                   ^^^^^
                                  <i32 0, i32 1, i32 2, i32 4, i32 5, i32 6>
```

Besides the fix itself, I have fortified this last bit.  As we change uses to
undef when removing instruction we track the undefed instruction to make sure
we eventually remove those too.  This would have caught the issue at compile
time.

Differential Revision: https://reviews.llvm.org/D106714
2021-07-27 11:36:13 -07:00
Fangrui Song
3b181568db [Matrix] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off build after D106457. NFC 2021-07-22 11:33:02 -07:00
Adam Nemet
ce5b1320a7 [Matrix] Fix miscompile for NT matmul if the transpose has other use
We should only add the fake lowering entry for the matrix remark if the
transpose is not lowered on its own.  `MapVector::insert` is used to insert
the entry during proper lowering which does not overwrite the fake entry in
the map.

We actually had test coverage for this but the reference output code was
wrong; it was storing undef rather than the transposed column.

Also add an assert that would have caught this.

Differential Revision: https://reviews.llvm.org/D106457
2021-07-22 10:45:56 -07:00
Florian Hahn
a3ca578eb9
[Matrix] Fix crash during fusion if the same load is re-used.
This patch fixes a crash when the same load is used for both operands of
a fuseable multiply.
2021-07-02 14:00:17 +01:00
Florian Hahn
7655061cc6
[Matrix] Hoist address computation before multiply to enable fusion.
If the store address does not dominate the matrix multiply, try to hoist
address computation instructions without side-effects and/or memory
reads before the multiply, to allow fusion.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D105193
2021-07-02 09:52:11 +01:00
Jeroen Dobbelaere
bb8ce25e88 Intrinsic::getName: require a Module argument
Ensure that we provide a `Module` when checking if a rename of an intrinsic is necessary.

This fixes the issue that was detected by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32288
(as mentioned by @fhahn), after committing D91250.

Note that the `LLVMIntrinsicCopyOverloadedName` is being deprecated in favor of `LLVMIntrinsicCopyOverloadedName2`.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99173
2021-06-14 14:52:29 +02:00
Adam Nemet
e0efebb8eb [Matrix] In transpose opts, handle a^t * a^t
Without the fix the testcase crashes because we remove the same instruction
twice.

Differential Revision: https://reviews.llvm.org/D104127
2021-06-11 09:29:43 -07:00
Adam Nemet
ffde966cd9 [Matrix] Fix transpose-multiply folding if transpose has multiple uses
Don't add it to FusedInsts in this case.

Differential Revision: https://reviews.llvm.org/D103627
2021-06-04 10:55:03 -07:00
Hamza Mahfooz
83235b07e3
[Matrix] Preserve existing fast-math flags during lowering
This patch makes it so, floating-point instructions created in
LowerMatrixIntrinsics retain fast-math flags from instructions that are
higher up the chain.

Fixes https://bugs.llvm.org/show_bug.cgi?id=49738

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D103233
2021-06-03 15:29:31 +01:00
Benjamin Kramer
d2d4f16806 [Matrix] Use LLVM_DEBUG for a debug flag
dump() doesn't exist in release builds.

ld.lld: error: undefined symbol: llvm::Value::dump() const
>>> referenced by LowerMatrixIntrinsics.cpp
>>>               LowerMatrixIntrinsics.o:((anonymous namespace)::LowerMatrixIntrinsics::Visit())
2021-05-25 21:10:19 +02:00
Adam Nemet
dfd1bbd00a [Matrix] Factor and distribute transposes across multiplies
Now that we can fold some transposes into multiplies (CM: A * B^t and RM:
A^t * B), we want to move them around to create the optimal expressions:

* fold away double transposes while still using them to assert the shape
* sink transposes hoping they cancel out
* lift transposes when both operands are transposed

This also modifies the matrix remarks to include the number of exposed
transposes (i.e. transposes that we couldn't fold into a multiply).

The adjustment to the test remarks-inlining is a bit subtle: I am changing the
double transpose to a single transpose so that we don't remove it completely.
More importantly this changes some of the total instruction count, most
notable stores because we can no longer use a vector store.

Differential Revision: https://reviews.llvm.org/D102733
2021-05-25 11:12:20 -07:00
Florian Hahn
a6de8d95db
[Matrix] Bail out early if there are no matrix intrinsics.
If there are no matrix intrinsics in a function, we can directly bail
out, as there's nothing left to do.

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D102931
2021-05-22 11:37:25 +01:00
Florian Hahn
a0ce6439ca
[Matrix] Remove unused matrix-propagate-shape option.
The option was used during the initial bringup, but it does not add any
value at this point. Remove it.

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D102930
2021-05-21 19:01:54 +01:00
Adam Nemet
fcffd087c6 [Matrix] Fold the transpose into the matmul operand used to fetch scalars
For column-major this is:
  A * B^t
whereas for row-major:
  A^t * B

Differential Revision: https://reviews.llvm.org/D101762
2021-05-17 17:40:46 -07:00