106 Commits

Author SHA1 Message Date
Florian Hahn
e77378cc14
[Matrix] Adjust lifetime.ends during multiply fusion. (#84914)
At the moment, loads introduced by multiply fusion may be placed after
an objects lifetime has been terminated by lifetime.end. This introduces
reads to dead objects.

To avoid this, first collect all lifetime.end calls in the function.
During fusion, we deal with any lifetime.end calls that may alias any of
the loads.

Such lifetime.end calls are either moved when possible (both the
lifetime.end and the store are in the same block) or deleted.

PR: https://github.com/llvm/llvm-project/pull/84914
2024-03-16 20:41:36 +01:00
Florian Hahn
d96d917f38
[Matrix] Add tests showing mis-compile with lifetime.end and fusion.
Add a set of tests showing miscompiles due to multiply fusion
introducing loads to dead objects after lifetime.end.
2024-03-12 13:33:55 +00:00
Florian Hahn
dbe4143f23
[Matrix] Fix dimensions when hoisting transpose across add. (#81507)
Row and column arguments for matrix_transpose indicate the shape of the
operand. When hoisting the transpose to the result of the add, the add
operates on the original operand's shape, and so does the hoisted
transpose.

This patch also adds an assert that the shape for the original add and
the transpose match, as well as the shape of the new add matches the
cached shape for it.

The assert could potentially be moved to
updateShapeAndReplaceAllUsesWith.
2024-02-12 18:45:13 +00:00
Florian Hahn
673e5e34b4
[Matrix] Add dedicated tests for transpose lifting.
Add extra test coverage for transpose lifting using
-matrix-print-after-transpose-opt.

The added tests show a mis-compile.
2024-02-12 16:19:31 +00:00
Alexey Bataev
7bc079c852
[TTI]Fallback to SingleSrcPermute shuffle kind, if no direct estimation for
extract subvector.

Many targets do not have cost for extractsubvector shuffle kind, but
have the costs for single source permute. If there are no costs
estimation for extractsubvector, better to switchto single source
permute for better cost estimation.

Reviewers: RKSimon, davemgreen, arsenm

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/79837
2024-02-12 07:09:49 -05:00
Florian Hahn
f89fe08d77
[Matrix] Convert column-vector ops feeding dot product to row-vectors. (#72647)
Generalize the logic used to convert column-vector ops to row-vectors to
support converting chains of operations.

A potential next step is to further generalize this to convert
column-vector ops to row-vector ops in general, not just for operands of
dot products. Dot-product handling would then be driven by the general
conversion, rather than the other way around.

PR: https://github.com/llvm/llvm-project/pull/72647
2024-02-06 13:47:31 +00:00
Nikita Popov
90ba33099c
[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882)
This patch canonicalizes getelementptr instructions with constant
indices to use the `i8` source element type. This makes it easier for
optimizations to recognize that two GEPs are identical, because they
don't need to see past many different ways to express the same offset.

This is a first step towards
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699.
This is limited to constant GEPs only for now, as they have a clear
canonical form, while we're not yet sure how exactly to deal with
variable indices.

The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives
two representative examples of the kind of optimization improvement we
expect from this change. In the first test SimplifyCFG can now realize
that all switch branches are actually the same. In the second test it
can convert it into simple arithmetic. These are representative of
common optimization failures we see in Rust.

Fixes https://github.com/llvm/llvm-project/issues/69841.
2024-01-24 15:25:29 +01:00
Nikita Popov
a5f3415533 [InstCombine] Replace non-demanded undef vector with poison
If an operand (esp to shufflevector or insertelement) is not
demanded, canonicalize it from undef to poison.
2023-12-18 16:12:37 +01:00
Nikita Popov
d0605e21af [InstCombine] Canonicalize splat shuffles to use poison operand
If the splat shuffle is represented using an undef RHS, replace it
with poison.
2023-12-18 15:57:49 +01:00
Dmitriy Smirnov
e13bed4c5f [PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP
This patch tries to canonicalise add + gep to gep + gep.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D155688
2023-10-06 12:29:06 +01:00
David Green
2a859b2014 [AArch64] Change the cost of vector insert/extract to 2
The cost of vector instructions has always been high under AArch64, in order to
add a high cost for inserts/extracts, shuffles and scalarization. This is a
conservative approach to limit the scope of unusual SLP vectorization where the
codegen ends up being quite poor, but has always been higher than the correct
costs would be for any specific core.

This relaxes that, reducing the vector insert/extract cost from 3 to 2. It is a
generalization of D142359 to all AArch64 cpus. The ScalarizationOverhead is
also overridden for integer vector at the same time, to remove the effect of
lane 0 being considered free for integer vectors (something that should only be
true for float when scalarizing).

The lower insert/extract cost will reduce the cost of insert, extracts,
shuffling and scalarization. The adjustments of ScalaizationOverhead will
increase the cost on integer, especially for small vectors. The end result will
be lower cost for float and long-integer types, some higher cost for some
smaller vectors. This, along with the raw insert/extract cost being lower, will
generally mean more vectorization from the Loop and SLP vectorizer.

We may end up regretting this, as that vectorization is not always profitable.
In all the benchmarking I have done this is generally an improvement in the
overall performance, and I've attempted to address the places where it wasn't
with other costmodel adjustments.

Differential Revision: https://reviews.llvm.org/D155459
2023-07-28 21:26:50 +01:00
Nikita Popov
bc39a7a5e4 [LowerMatrixIntrinsics] Fix test expectations (NFC)
Some of the test expectation were incorrectly changed in
23c21759458014fc4d7cbea45b6fbe7349a0a4fd. Regenerate the tests.
2023-07-18 11:21:11 +02:00
Nuno Lopes
23c2175945 [LowerMatrixIntrinsics] Use poison instead of undef as placeholder [NFC]
These values don't propagate to the output; they are always replaced with a subsequent shuffle
or insertelement.
Tested equivalence with Alive2, e.g., https://alive2.llvm.org/ce/z/fj4s78.
2023-07-18 09:54:41 +01:00
Florian Hahn
c10a7772bd
[Matrix] Convert binop operand of dot product to a row vector.
The dot product lowering will use the left operand as row vector.
If the operand is a binary op, convert it to operate on a row vector
instead of a column vector.

Depends on D148428.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D148429
2023-06-07 20:45:08 +01:00
Florian Hahn
ebbcbb2af5
[Matrix] Remove redundant transpose with dot product lowering.
Extend dot-product handling to skip transposes of the first operand. As
this is a vector, the conversion between column and row vector via the
transpose isn't needed.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D148428
2023-05-14 22:07:38 +01:00
Florian Hahn
0e8717f711
[Matrix] Add shape verification.
At the moment, lower-matrix-intrinsics accepts mis-matches between
shapes for operations. See shape-verification.ll for an example where
@llvm.matrix.column.major.load specifies 6x1 and then the use
(@llvm.matrix.multiply) specifies the operand to have 1x6.

This patch adds verification for shapes to check if shapes match.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D147438
2023-05-13 09:41:27 +01:00
ManuelJBrito
8b56da5e9f [IR] Change shufflevector undef mask to poison
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.

Differential Revision: https://reviews.llvm.org/D149210
2023-04-27 14:41:10 +01:00
Florian Hahn
f10153fe91
[Matrix] Handle integer types when distributing transposes across adds.
The current code did not properly account for integer matrixes. Check
if the operands are floating point or integer matrixes and use FAdd/Add
accordingly.

This is already done for other cases, like multiplies.

Fixes #62281.
2023-04-21 16:35:11 +01:00
Florian Hahn
a25b962a7f
[Matrix] Split off transpose + dot product tests. 2023-04-15 14:06:47 +01:00
Florian Hahn
98e50881e9
[Matrix] Refine cost estimate for dot-product.
Adjust lowerDotProduct cost estimate to include the cost benefits of:
 * emitting a wide load
 * emitting a wide multiply.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D147330
2023-04-14 11:35:01 +01:00
Florian Hahn
677b0d33e3
[Matrix] Add dot product tests with builtin loads with variable strides
Extra tests for D147330.
2023-04-14 10:40:47 +01:00
Florian Hahn
e6ab86a887
[Matrix] Fix IsSupported check in lowerDotProduct.
The check incorrectly checks the RHS while LHS is transformed later.
Update to check LHS, which fixes a crash in the newly added test cases.
2023-04-13 19:00:30 +01:00
Florian Hahn
78148eba49
[Matrix] Fix crash during dot product lowering.
Perform dot-product lowering before instruction fusion to avoid crash in
newly added test. Also update lowerDotProduct to properly mark optimized
matmul as fused.
2023-04-12 15:08:39 +01:00
Florian Hahn
04681243b4
[Matrix] Limit dot lowering to column major matrixes.
Limit to dot product lowering to column major matrixes for now. This
simplifies the code and reasoning for upcoming planned improvements.
Support for row-major matrixes can be added later as extension.
2023-04-05 15:49:06 +01:00
Florian Hahn
17fc38889a
[Matrix] Add dotproduct tests with row-major default layout. 2023-04-05 15:19:11 +01:00
Florian Hahn
2f21659ee9
[Matrix] Add test variants where 2nd operand of dotprod is add/sub. 2023-04-05 15:04:05 +01:00
Florian Hahn
c0dbe85790
[Matrix] Fix shapes in dot product tests.
The shape arguments for the @llvm.matrix.column.major.load where
incorrect. Flip them so they are in sync with the shape of the
multiplications.
2023-04-03 12:50:05 +01:00
Vir Narula
e7281c6f61
[Matrix] Add special case dot product lowering
Add special case to matrix lowering for dot products. Normal matrix lowering if optimized for either row-major or column-major, which results in many `shufflevector` instructions being generated for one vector. We work around this in our special case. We can also use vector-reduce adds instead of sequential adds to sum the result of the element-wise multiplication, which takes advantage of SIMD instructions.

Reviewed By: fhahn, thegameg

Differential Revision: https://reviews.llvm.org/D131125
2023-03-31 12:40:20 +01:00
Florian Hahn
16a008bbde
[Matrix] Update most dot tests using vXi64 to vXi32.
Update dot-product-int.ll tests to use mostly i32 instead of i64;
there's no mul.2d instruction, so vector versions of v2i64 cannot be
lowered efficiently.
2023-03-31 12:32:41 +01:00
Florian Hahn
22ebb49b9f
[Matrix] Extend test coverage for dot product lowering.
Extra tests:
* result is used by instruction
* constant vector operands
* multiply fed by other math instructions
* extra test with larger stride
2023-03-25 21:30:20 +00:00
Florian Hahn
18353d221d
[Matrix] Split up dot product tests into integer and float variants.
To avoid the individual files getting too big with further additions.
2023-03-25 21:23:01 +00:00
Jannik Silvanus
a4753f5dc0 [IR] Avoid creation of GEPs into vectors (in one place)
The method DataLayout::getGEPIndexForOffset(Type *&ElemTy, APInt &Offset)
allows to generate GEP indices for a given byte-based offset.
This allows to generate "natural" GEPs using the given type structure
if the byte offset happens to match a nested element object.

With opaque pointers and a general move towards byte-based GEPs [1],
this function may be questionable in the future.

This patch avoids creation of GEPs into vectors in routines that use
DataLayout::getGEPIndexForOffset by not returning indices in that case.

The reason is that A) GEPs into vectors have been discouraged for a long
time [2], and B) that GEPs into vectors are currently broken if the element
type is overaligned [1]. This is also demonstrated by a lit test where
previously InstCombine replaced valid loads by poison. Note that
the result of InstCombine on that test is *still* invalid, because
padding bytes are assumed.
Moreover, GEPs into vectors may be outright forbidden in the future [1].

[1]: https://discourse.llvm.org/t/67497
[2]: https://llvm.org/docs/GetElementPtr.html

The test case is new. It will be precommitted if this patch is accepted.

Differential Revision: https://reviews.llvm.org/D142146
2023-01-23 13:25:39 +01:00
Francis Visoiu Mistrih
da09b35334
[Matrix] Optimize matrix transposes around additions
First, sink the transposes to the operands to simplify redudant
ones. Then, lift them to reduce the number of realized transposes.

```
(A + B)^T -> A^T + B^T -> (A + B)^T
```

See tests for more examples.

Differential Revision: https://reviews.llvm.org/D133657
2023-01-11 15:21:59 -08:00
Paul Walker
eae26b6640 [IRBuilder] Use canonical i64 type for insertelement index used by vector splats.
Instcombine prefers this canonical form (see getPreferredVectorIndex),
as does IRBuilder when passing the index as an integer so we may as
well use the prefered form from creation.

NOTE: All test changes are mechanical with nothing else expected
beyond a change of index type from i32 to i64.

Differential Revision: https://reviews.llvm.org/D140983
2023-01-11 14:08:06 +00:00
Matt Arsenault
256d5ad3e8 LowerMatrixIntrinsics: Convert tests to opaque pointers
store-align-volatile.ll needed manually updated check lines for a
-NEXT check after a deleted bitcast.

Also avoided breaking the example C++ comment in remarks-inlining.ll
2022-11-27 21:42:25 -05:00
Nikita Popov
304f1d59ca [IR] Switch everything to use memory attribute
This switches everything to use the memory attribute proposed in
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly
attributes are dropped. The readnone, readonly and writeonly attributes
are restricted to parameters only.

The old attributes are auto-upgraded both in bitcode and IR.
The bitcode upgrade is a policy requirement that has to be retained
indefinitely. The IR upgrade is mainly there so it's not necessary
to update all tests using memory attributes in this patch, which
is already large enough. We could drop that part after migrating
tests, or retain it longer term, to make it easier to import IR
from older LLVM versions.

High-level Function/CallBase APIs like doesNotAccessMemory() or
setDoesNotAccessMemory() are mapped transparently to the memory
attribute. Code that directly manipulates attributes (e.g. via
AttributeList) on the other hand needs to switch to working with
the memory attribute instead.

Differential Revision: https://reviews.llvm.org/D135780
2022-11-04 10:21:38 +01:00
Arthur Eubanks
c384b20b55 [opt] Remove temporary legacy pass name translations
And update corresponding tests.
2022-10-07 11:09:46 -07:00
Francis Visoiu Mistrih
0fcc99ade4 [Matrix] Add tests for addition transpose optimizations
Tests before transpose optimizations around additions.

Differential Revision: https://reviews.llvm.org/D133656
2022-09-26 13:27:03 -07:00
Francis Visoiu Mistrih
81bdb4068d [Matrix] Simplify matmuls with scalars
If one of the operands is a transposed splat, the transpose can be
removed.

This is useful to simplify when transposes are distributed to operands
of a matmul:

* k^T -> k
* (A * k)^t -> A^t * k

Differential Revision: https://reviews.llvm.org/D130177
2022-09-02 15:50:25 -07:00
Vir Narula
625877b0ef
[Matrix] Add tests dot product with varied strides
Add more tests with varied strides. Changes to lowering upcoming in https://reviews.llvm.org/D131125

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D131444
2022-08-11 19:09:21 +01:00
Nuno Lopes
022bd92c78 [LowerMatrixMultiplication] Switch dummy values from undef to poison [NFC] 2022-07-03 12:32:19 +01:00
Nuno Lopes
7c4f45f87a Revert [LowerMatrixMultiplication] Switch dummy values from undef to poison [NFC]
This reverts commits 47e6f98f84ac3 and 3e701bcd2a6aee2
2022-07-01 23:53:41 +01:00
Nuno Lopes
3e701bcd2a attempt to fix aarch64 build bot 2022-07-01 23:43:48 +01:00
Nuno Lopes
47e6f98f84 [LowerMatrixMultiplication] Switch dummy values from undef to poison [NFC] 2022-07-01 23:31:31 +01:00
Florian Hahn
7c0089d735
[Matrix] Check if iterator is at beginning of BB in optimizeTranspose.
If an instruction at the beginning of a block is erased,  this may
trigger crash due to dereferencing an invalid iterator.

Check if II is at the end before dereferencing it.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D127736
2022-06-14 21:37:02 +01:00
Vir Narula
210c851327
[Matrix] Add dot product tests
LLVM LIT tests for our upcoming dot product lowering change

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D126942
2022-06-03 20:02:42 +01:00
Johannes Doerfert
a81fff8afd Reapply "[Intrinsics] Add nocallback to the default intrinsic attributes"
This reverts commit c5f789050daab25aad6770790987e2b7c0395936 and
reapplies 7aea3ea8c3b33c9bb338d5d6c0e4832be1d09ac3 with additional test
changes.
2022-03-25 09:36:50 -05:00
Andrew Wei
0af3e6a22d [InstCombine] Sink instructions with multiple users in a successor block.
This patch tries to sink instructions when they are only used in a successor block.

This is a further enhancement patch based on Anna's commit:
D109700, which allows sinking an instruction having multiple uses in a single user.

In this patch, sink instructions with multiple users in a single successor block will be supported.
It could fix a known issue from rust:
  https://github.com/rust-lang/rust/issues/51346#issuecomment-394443610

Reviewed By: nikic, reames

Differential Revision: https://reviews.llvm.org/D121585
2022-03-18 11:53:45 +08:00
Arthur Eubanks
dec9be85cc [test][LowerMatrixIntrinsics] Use new PM RUN lines 2022-03-08 13:39:18 -08:00
Florian Hahn
b339bbdb19
[Matrix] Use ArrayType for allocas instead of VectorType.
When creating an alloca to copy a matrix due to memory conflicts, those
allocas used to use VectorTypes, which forced them to have huge
alignments for large vectors.

This patch updates LowerMatrixIntrinsics to use a corresponding array
type, like Clang already does, to get more manageable alignments.

Reviewed By: anemet, thegameg

Differential Revision: https://reviews.llvm.org/D118239
2022-01-28 10:47:52 +00:00