99 Commits

Author SHA1 Message Date
Nikita Popov
a5f3415533 [InstCombine] Replace non-demanded undef vector with poison
If an operand (esp to shufflevector or insertelement) is not
demanded, canonicalize it from undef to poison.
2023-12-18 16:12:37 +01:00
Nikita Popov
d0605e21af [InstCombine] Canonicalize splat shuffles to use poison operand
If the splat shuffle is represented using an undef RHS, replace it
with poison.
2023-12-18 15:57:49 +01:00
Dmitriy Smirnov
e13bed4c5f [PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP
This patch tries to canonicalise add + gep to gep + gep.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D155688
2023-10-06 12:29:06 +01:00
David Green
2a859b2014 [AArch64] Change the cost of vector insert/extract to 2
The cost of vector instructions has always been high under AArch64, in order to
add a high cost for inserts/extracts, shuffles and scalarization. This is a
conservative approach to limit the scope of unusual SLP vectorization where the
codegen ends up being quite poor, but has always been higher than the correct
costs would be for any specific core.

This relaxes that, reducing the vector insert/extract cost from 3 to 2. It is a
generalization of D142359 to all AArch64 cpus. The ScalarizationOverhead is
also overridden for integer vector at the same time, to remove the effect of
lane 0 being considered free for integer vectors (something that should only be
true for float when scalarizing).

The lower insert/extract cost will reduce the cost of insert, extracts,
shuffling and scalarization. The adjustments of ScalarizationOverhead will
increase the cost on integer, especially for small vectors. The end result will
be lower cost for float and long-integer types, some higher cost for some
smaller vectors. This, along with the raw insert/extract cost being lower, will
generally mean more vectorization from the Loop and SLP vectorizer.

We may end up regretting this, as that vectorization is not always profitable.
In all the benchmarking I have done this is generally an improvement in the
overall performance, and I've attempted to address the places where it wasn't
with other costmodel adjustments.

Differential Revision: https://reviews.llvm.org/D155459
2023-07-28 21:26:50 +01:00
Nikita Popov
bc39a7a5e4 [LowerMatrixIntrinsics] Fix test expectations (NFC)
Some of the test expectations were incorrectly changed in
23c21759458014fc4d7cbea45b6fbe7349a0a4fd. Regenerate the tests.
2023-07-18 11:21:11 +02:00
Nuno Lopes
23c2175945 [LowerMatrixIntrinsics] Use poison instead of undef as placeholder [NFC]
These values don't propagate to the output; they are always replaced with a subsequent shuffle
or insertelement.
Tested equivalence with Alive2, e.g., https://alive2.llvm.org/ce/z/fj4s78.
2023-07-18 09:54:41 +01:00
Florian Hahn
c10a7772bd
[Matrix] Convert binop operand of dot product to a row vector.
The dot product lowering will use the left operand as row vector.
If the operand is a binary op, convert it to operate on a row vector
instead of a column vector.

Depends on D148428.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D148429
2023-06-07 20:45:08 +01:00
Florian Hahn
ebbcbb2af5
[Matrix] Remove redundant transpose with dot product lowering.
Extend dot-product handling to skip transposes of the first operand. As
this is a vector, the conversion between column and row vector via the
transpose isn't needed.
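
A minimal sketch of why the transpose is redundant for a vector, assuming matrices are flattened in column-major order. The helper `flatten_column_major` is hypothetical and only models element order, not the pass itself:

```python
def flatten_column_major(m):
    """Flatten a matrix given as a list of rows into column-major element order."""
    rows, cols = len(m), len(m[0])
    return [m[r][c] for c in range(cols) for r in range(rows)]


row_vec = [[1, 2, 3, 4]]        # 1 x 4 row vector
col_vec = [[1], [2], [3], [4]]  # 4 x 1 column vector, the transpose of row_vec

# Both shapes flatten to the same sequence of elements, so converting
# between them moves no data.
assert flatten_column_major(row_vec) == flatten_column_major(col_vec)
```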

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D148428
2023-05-14 22:07:38 +01:00
Florian Hahn
0e8717f711
[Matrix] Add shape verification.
At the moment, lower-matrix-intrinsics accepts mis-matches between
shapes for operations. See shape-verification.ll for an example where
@llvm.matrix.column.major.load specifies 6x1 and then the use
(@llvm.matrix.multiply) specifies the operand to have 1x6.

This patch adds verification for shapes to check if shapes match.
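
As a rough illustration of the kind of check described here, the following sketch compares the shape recorded for a value with the shape its user expects. `ShapeMismatch` and `verify_operand_shape` are hypothetical names and do not correspond to the pass's actual implementation:

```python
class ShapeMismatch(ValueError):
    """Raised when a definition and its use disagree about a matrix shape."""


def verify_operand_shape(defined_shape, expected_shape):
    """Return the shape if both sides agree, otherwise raise ShapeMismatch."""
    if defined_shape != expected_shape:
        raise ShapeMismatch(
            f"operand defined as {defined_shape[0]}x{defined_shape[1]} "
            f"but used as {expected_shape[0]}x{expected_shape[1]}"
        )
    return expected_shape


# The case from the message: a load producing 6x1 feeding a multiply
# that declares the operand as 1x6.
try:
    verify_operand_shape((6, 1), (1, 6))
except ShapeMismatch as err:
    print("rejected:", err)
```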

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D147438
2023-05-13 09:41:27 +01:00
ManuelJBrito
8b56da5e9f [IR] Change shufflevector undef mask to poison
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.

Differential Revision: https://reviews.llvm.org/D149210
2023-04-27 14:41:10 +01:00
Florian Hahn
f10153fe91
[Matrix] Handle integer types when distributing transposes across adds.
The current code did not properly account for integer matrixes. Check
if the operands are floating point or integer matrixes and use FAdd/Add
accordingly.

This is already done for other cases, like multiplies.

Fixes #62281.
2023-04-21 16:35:11 +01:00
Florian Hahn
a25b962a7f
[Matrix] Split off transpose + dot product tests. 2023-04-15 14:06:47 +01:00
Florian Hahn
98e50881e9
[Matrix] Refine cost estimate for dot-product.
Adjust lowerDotProduct cost estimate to include the cost benefits of:
 * emitting a wide load
 * emitting a wide multiply.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D147330
2023-04-14 11:35:01 +01:00
Florian Hahn
677b0d33e3
[Matrix] Add dot product tests with builtin loads with variable strides
Extra tests for D147330.
2023-04-14 10:40:47 +01:00
Florian Hahn
e6ab86a887
[Matrix] Fix IsSupported check in lowerDotProduct.
The check incorrectly checks the RHS while LHS is transformed later.
Update to check LHS, which fixes a crash in the newly added test cases.
2023-04-13 19:00:30 +01:00
Florian Hahn
78148eba49
[Matrix] Fix crash during dot product lowering.
Perform dot-product lowering before instruction fusion to avoid crash in
newly added test. Also update lowerDotProduct to properly mark optimized
matmul as fused.
2023-04-12 15:08:39 +01:00
Florian Hahn
04681243b4
[Matrix] Limit dot lowering to column major matrixes.
Limit dot product lowering to column-major matrixes for now. This
simplifies the code and reasoning for upcoming planned improvements.
Support for row-major matrixes can be added later as extension.
2023-04-05 15:49:06 +01:00
Florian Hahn
17fc38889a
[Matrix] Add dotproduct tests with row-major default layout. 2023-04-05 15:19:11 +01:00
Florian Hahn
2f21659ee9
[Matrix] Add test variants where 2nd operand of dotprod is add/sub. 2023-04-05 15:04:05 +01:00
Florian Hahn
c0dbe85790
[Matrix] Fix shapes in dot product tests.
The shape arguments for the @llvm.matrix.column.major.load were
incorrect. Flip them so they are in sync with the shape of the
multiplications.
2023-04-03 12:50:05 +01:00
Vir Narula
e7281c6f61
[Matrix] Add special case dot product lowering
Add special case to matrix lowering for dot products. Normal matrix lowering is optimized for either row-major or column-major, which results in many `shufflevector` instructions being generated for one vector. We work around this in our special case. We can also use vector-reduce adds instead of sequential adds to sum the result of the element-wise multiplication, which takes advantage of SIMD instructions.
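
The difference between sequential adds and a reduction can be sketched as follows. This only models the shape of the computation on plain Python lists; the function names are hypothetical and nothing here reflects the IR the pass actually emits:

```python
from functools import reduce
import operator


def dot_sequential(a, b):
    """Multiply and accumulate one element at a time (a chain of scalar adds)."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def dot_reduce(a, b):
    """One element-wise multiply over the whole vector, then a single add reduction."""
    products = [x * y for x, y in zip(a, b)]
    return reduce(operator.add, products, 0)


lhs = [1, 2, 3, 4]
rhs = [5, 6, 7, 8]
assert dot_sequential(lhs, rhs) == dot_reduce(lhs, rhs)
```

For integers both forms give the same result. For floating point, a reduction may sum in a different order than the sequential chain, so the results can differ in rounding.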

Reviewed By: fhahn, thegameg

Differential Revision: https://reviews.llvm.org/D131125
2023-03-31 12:40:20 +01:00
Florian Hahn
16a008bbde
[Matrix] Update most dot tests using vXi64 to vXi32.
Update dot-product-int.ll tests to use mostly i32 instead of i64;
there's no mul.2d instruction, so vector versions of v2i64 cannot be
lowered efficiently.
2023-03-31 12:32:41 +01:00
Florian Hahn
22ebb49b9f
[Matrix] Extend test coverage for dot product lowering.
Extra tests:
* result is used by instruction
* constant vector operands
* multiply fed by other math instructions
* extra test with larger stride
2023-03-25 21:30:20 +00:00
Florian Hahn
18353d221d
[Matrix] Split up dot product tests into integer and float variants.
To avoid the individual files getting too big with further additions.
2023-03-25 21:23:01 +00:00
Jannik Silvanus
a4753f5dc0 [IR] Avoid creation of GEPs into vectors (in one place)
The method DataLayout::getGEPIndexForOffset(Type *&ElemTy, APInt &Offset)
generates GEP indices for a given byte-based offset.
This makes it possible to generate "natural" GEPs using the given type
structure if the byte offset happens to match a nested element object.

With opaque pointers and a general move towards byte-based GEPs [1],
this function may be questionable in the future.

This patch avoids creation of GEPs into vectors in routines that use
DataLayout::getGEPIndexForOffset by not returning indices in that case.

The reason is that A) GEPs into vectors have been discouraged for a long
time [2], and B) that GEPs into vectors are currently broken if the element
type is overaligned [1]. This is also demonstrated by a lit test where
previously InstCombine replaced valid loads by poison. Note that
the result of InstCombine on that test is *still* invalid, because
padding bytes are assumed.
Moreover, GEPs into vectors may be outright forbidden in the future [1].

[1]: https://discourse.llvm.org/t/67497
[2]: https://llvm.org/docs/GetElementPtr.html

The test case is new. It will be precommitted if this patch is accepted.

Differential Revision: https://reviews.llvm.org/D142146
2023-01-23 13:25:39 +01:00
Francis Visoiu Mistrih
da09b35334
[Matrix] Optimize matrix transposes around additions
First, sink the transposes to the operands to simplify redundant
ones. Then, lift them to reduce the number of realized transposes.

```
(A + B)^T -> A^T + B^T -> (A + B)^T
```

See tests for more examples.
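
The identity behind the rewrite can be checked with a small sketch on list-of-rows matrices. The helpers `transpose` and `add` are hypothetical and unrelated to the pass's code:

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


A = [[1, 2, 3], [4, 5, 6]]
B = [[10, 20, 30], [40, 50, 60]]

sunk = add(transpose(A), transpose(B))  # A^T + B^T: two transposes
lifted = transpose(add(A, B))           # (A + B)^T: one transpose

assert sunk == lifted
```

Both forms compute the same matrix, so the lifted form is preferable whenever the operand transposes cannot be simplified away, since it realizes one transpose instead of two.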

Differential Revision: https://reviews.llvm.org/D133657
2023-01-11 15:21:59 -08:00
Paul Walker
eae26b6640 [IRBuilder] Use canonical i64 type for insertelement index used by vector splats.
Instcombine prefers this canonical form (see getPreferredVectorIndex),
as does IRBuilder when passing the index as an integer so we may as
well use the preferred form from creation.

NOTE: All test changes are mechanical with nothing else expected
beyond a change of index type from i32 to i64.

Differential Revision: https://reviews.llvm.org/D140983
2023-01-11 14:08:06 +00:00
Matt Arsenault
256d5ad3e8 LowerMatrixIntrinsics: Convert tests to opaque pointers
store-align-volatile.ll needed manually updated check lines for a
-NEXT check after a deleted bitcast.

Also avoided breaking the example C++ comment in remarks-inlining.ll
2022-11-27 21:42:25 -05:00
Nikita Popov
304f1d59ca [IR] Switch everything to use memory attribute
This switches everything to use the memory attribute proposed in
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly
attributes are dropped. The readnone, readonly and writeonly attributes
are restricted to parameters only.

The old attributes are auto-upgraded both in bitcode and IR.
The bitcode upgrade is a policy requirement that has to be retained
indefinitely. The IR upgrade is mainly there so it's not necessary
to update all tests using memory attributes in this patch, which
is already large enough. We could drop that part after migrating
tests, or retain it longer term, to make it easier to import IR
from older LLVM versions.

High-level Function/CallBase APIs like doesNotAccessMemory() or
setDoesNotAccessMemory() are mapped transparently to the memory
attribute. Code that directly manipulates attributes (e.g. via
AttributeList) on the other hand needs to switch to working with
the memory attribute instead.

Differential Revision: https://reviews.llvm.org/D135780
2022-11-04 10:21:38 +01:00
Arthur Eubanks
c384b20b55 [opt] Remove temporary legacy pass name translations
And update corresponding tests.
2022-10-07 11:09:46 -07:00
Francis Visoiu Mistrih
0fcc99ade4 [Matrix] Add tests for addition transpose optimizations
Tests before transpose optimizations around additions.

Differential Revision: https://reviews.llvm.org/D133656
2022-09-26 13:27:03 -07:00
Francis Visoiu Mistrih
81bdb4068d [Matrix] Simplify matmuls with scalars
If one of the operands is a transposed splat, the transpose can be
removed.

This is useful to simplify when transposes are distributed to operands
of a matmul:

* k^T -> k
* (A * k)^T -> A^T * k
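
Both rules can be sanity-checked with a short sketch, assuming `k` stands for a scalar splatted across a matrix and `A * k` scales every element of `A`. The helpers `splat`, `scale` and `transpose` are hypothetical:

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def splat(rows, cols, k):
    """Build a rows x cols matrix whose elements are all k."""
    return [[k] * cols for _ in range(rows)]


def scale(m, k):
    """Multiply every element of m by the scalar k."""
    return [[x * k for x in row] for row in m]


A = [[1, 2, 3], [4, 5, 6]]
k = 7

# k^T -> k: transposing a splat only swaps its shape, every element is still k.
assert transpose(splat(2, 3, k)) == splat(3, 2, k)

# (A * k)^T -> A^T * k: scaling commutes with transposition.
assert transpose(scale(A, k)) == scale(transpose(A), k)
```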

Differential Revision: https://reviews.llvm.org/D130177
2022-09-02 15:50:25 -07:00
Vir Narula
625877b0ef
[Matrix] Add tests dot product with varied strides
Add more tests with varied strides. Changes to lowering upcoming in https://reviews.llvm.org/D131125

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D131444
2022-08-11 19:09:21 +01:00
Nuno Lopes
022bd92c78 [LowerMatrixMultiplication] Switch dummy values from undef to poison [NFC] 2022-07-03 12:32:19 +01:00
Nuno Lopes
7c4f45f87a Revert [LowerMatrixMultiplication] Switch dummy values from undef to poison [NFC]
This reverts commits 47e6f98f84ac3 and 3e701bcd2a6aee2
2022-07-01 23:53:41 +01:00
Nuno Lopes
3e701bcd2a attempt to fix aarch64 build bot 2022-07-01 23:43:48 +01:00
Nuno Lopes
47e6f98f84 [LowerMatrixMultiplication] Switch dummy values from undef to poison [NFC] 2022-07-01 23:31:31 +01:00
Florian Hahn
7c0089d735
[Matrix] Check if iterator is at beginning of BB in optimizeTranspose.
If an instruction at the beginning of a block is erased, this may
trigger a crash due to dereferencing an invalid iterator.

Check if II is at the end before dereferencing it.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D127736
2022-06-14 21:37:02 +01:00
Vir Narula
210c851327
[Matrix] Add dot product tests
LLVM LIT tests for our upcoming dot product lowering change

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D126942
2022-06-03 20:02:42 +01:00
Johannes Doerfert
a81fff8afd Reapply "[Intrinsics] Add nocallback to the default intrinsic attributes"
This reverts commit c5f789050daab25aad6770790987e2b7c0395936 and
reapplies 7aea3ea8c3b33c9bb338d5d6c0e4832be1d09ac3 with additional test
changes.
2022-03-25 09:36:50 -05:00
Andrew Wei
0af3e6a22d [InstCombine] Sink instructions with multiple users in a successor block.
This patch tries to sink instructions when they are only used in a successor block.

This is a further enhancement patch based on Anna's commit:
D109700, which allows sinking an instruction having multiple uses in a single user.

In this patch, sink instructions with multiple users in a single successor block will be supported.
It could fix a known issue from rust:
  https://github.com/rust-lang/rust/issues/51346#issuecomment-394443610

Reviewed By: nikic, reames

Differential Revision: https://reviews.llvm.org/D121585
2022-03-18 11:53:45 +08:00
Arthur Eubanks
dec9be85cc [test][LowerMatrixIntrinsics] Use new PM RUN lines 2022-03-08 13:39:18 -08:00
Florian Hahn
b339bbdb19
[Matrix] Use ArrayType for allocas instead of VectorType.
When creating an alloca to copy a matrix due to memory conflicts, those
allocas used to use VectorTypes, which forced them to have huge
alignments for large vectors.

This patch updates LowerMatrixIntrinsics to use a corresponding array
type, like Clang already does, to get more manageable alignments.

Reviewed By: anemet, thegameg

Differential Revision: https://reviews.llvm.org/D118239
2022-01-28 10:47:52 +00:00
Nikita Popov
80110aafa0 [Tests] Fix incorrect noalias metadata
Mostly this fixes cases where !noalias or !alias.scope were passed
a scope rather than a scope list. In some cases I opted to drop
the metadata entirely instead, because it is not really relevant
to the test.
2021-09-18 20:51:00 +02:00
Bjorn Pettersson
d52f506192 [NewPM] Use parameterized syntax for a couple of more passes
A couple of passes that are parameterized in new-PM used different
pass names (in cmd line interface) while using the same pass class
name. This patch updates the PassRegistry to model pass parameters
more properly using PASS_WITH_PARAMS.

Reason for the change is to ensure that we have a 1-1 mapping
between class name and pass name (when disregarding the params).
With a 1-1 mapping it is more obvious which pass name to use in
options such as -debug-only, -print-after etc.

The opt -passes syntax is changed for the following passes:
  early-cse-memssa => early-cse<memssa>
  post-inline-ee-instrument => ee-instrument<post-inline>
  loop-extract-single => loop-extract<single>
  lower-matrix-intrinsics-minimal => lower-matrix-intrinsics<minimal>

This patch is not updating pass names in docs/Passes.rst. Not quite
sure what the status is for that document (e.g. when it comes to
listing pass parameters). It is only loop-extract-single that is
mentioned in Passes.rst today, out of the passes mentioned above.

Differential Revision: https://reviews.llvm.org/D108362
2021-08-20 14:59:21 +02:00
Florian Hahn
f999312872
Recommit "[Matrix] Overload stride arg in matrix.columnwise.load/store."
This reverts the revert 28c04794df74ad3c38155a244729d1f8d57b9400.

The failing MLIR test that caused the revert should be fixed in this
version.

Also includes a PPC test fix previously in 1f87c7c478a6.
2021-08-12 18:31:57 +01:00
Mehdi Amini
28c04794df Revert "[Matrix] Overload stride arg in matrix.columnwise.load/store."
This reverts commit a1ef81de35a4bac6d3b22e9d7186d880124d7a55.

Broke the MLIR buildbot.
2021-08-12 11:57:19 +00:00
Florian Hahn
a1ef81de35
[Matrix] Overload stride arg in matrix.columnwise.load/store.
This patch adjusts the intrinsics definition of
llvm.matrix.column.major.load and llvm.matrix.column.major.store to
allow overloading the type of the stride. The bitwidth of the stride is
used to perform the offset computation.

This fixes a crash when using __builtin_matrix_column_major_load or
__builtin_matrix_column_major_store on 32 bit platforms. The stride argument
of the builtins is defined as `size_t`, which is 32 bits wide on 32 bit
platforms.

Note that we still perform offset computations with 64 bit width on 32
bit platforms for accesses that do not take a user-specified stride.
This can be fixed separately.
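
For readers unfamiliar with the stride argument, here is a minimal sketch of how a stride locates columns in a column-major load, assuming column `c` starts `c * stride` elements past the base. The function `column_major_load` is hypothetical, works on a flat Python list, and does not model the bit width of the offset arithmetic that this patch changes:

```python
def column_major_load(buf, rows, cols, stride):
    """Read a rows x cols matrix from buf; column c starts at buf[c * stride]."""
    if stride < rows:
        raise ValueError("stride must be at least the number of rows")
    return [buf[c * stride : c * stride + rows] for c in range(cols)]


# A buffer holding 3 columns of 2 elements each, padded to a stride of 4.
buf = list(range(12))
columns = column_major_load(buf, rows=2, cols=3, stride=4)
print(columns)
```

The result is a list of columns; the elements between `rows` and `stride` in each column slot are skipped.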

Fixes PR51304.

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D107349
2021-08-12 10:45:25 +01:00
Adam Nemet
d87d3615f7 [Matrix] Fix shape for factored transpose
The shape of the input is C x R.

Differential Revision: https://reviews.llvm.org/D106722
2021-07-27 11:36:13 -07:00
Adam Nemet
bf7eb48454 [Matrix] RAUW should only replace an instruction in ShapeMap if supportsShapeInfo
As an instruction is replaced in optimizeTransposes RAUW will replace it in
the ShapeMap (ShapeMap is ValueMap so that uses are updated).  In
finalizeLowering however we skip updating uses if they are in the ShapeMap
since they will be lowered separately at which point we pick up the lowered
operands.

In the testcase what happened was that since we replaced the doubled-transpose
with the shuffle, it ended up in the ShapeMap.  As we lowered the
columnwise-load the use in the shuffle was not updated.  Then as we removed
the original columnwise-load we changed that to an undef.  I.e. we ended up
with:

```
%shuf = shufflevector <8 x double> undef, <8 x double> poison, <6 x i32>
                                   ^^^^^
                                  <i32 0, i32 1, i32 2, i32 4, i32 5, i32 6>
```

Besides the fix itself, I have fortified this last bit.  As we change uses to
undef when removing an instruction, we track the undefed instruction to make
sure we eventually remove those too.  This would have caught the issue at
compile time.

Differential Revision: https://reviews.llvm.org/D106714
2021-07-27 11:36:13 -07:00