142 Commits

Author SHA1 Message Date
Florian Hahn
8e61085291
[Matrix] Place allocas in function entry. (#190032)
Create allocas for temporary matrices in the function entry. Limit the
lifetime via lifetime.start & lifetime.end. This avoids dynamic allocas.

Improvement suggested in
https://github.com/llvm/llvm-project/pull/188721.

PR: https://github.com/llvm/llvm-project/pull/190032
2026-04-02 17:36:13 +00:00
Florian Hahn
ba6041bfbd
[Matrix] Handle load/store with different AS in getNonAliasingPointer. (#188721)
If a load and a store have different address spaces, we cannot create a
runtime check. Instead, always copy the data to an alloca matching the
store address space.

Fixes https://github.com/llvm/llvm-project/issues/185236.

PR: https://github.com/llvm/llvm-project/pull/188721
2026-03-28 20:52:54 +00:00
Florian Hahn
b164e7c610
[Matrix] Handle -fuse-matrix-tile-size=0 as tiling disabled. (#188861)
Treat -fuse-matrix-tile-size=0 as disabling tiling, as tile-size of 0
doesn't really make sense.

Fixes https://github.com/llvm/llvm-project/issues/185153

PR: https://github.com/llvm/llvm-project/pull/188861
2026-03-27 10:14:20 +00:00
Nikita Popov
6ecbc0c96e
[InstCombine] Canonicalize GEP source element types (#180745)
Canonicalize GEP source element types from `%T` to `[sizeof(%T) x i8]`.

This is intended to flush out any remaining places that rely on GEP
element types, as part of the `ptradd` migration. The impact of this
change is expected to be fairly minimal (we might enable a few more
hoist/sink style folds that depend on equal GEP types).
2026-03-06 14:48:01 +00:00
Nikita Popov
f4a29d9278
[LowerMatrixIntrinsics] Avoid use of ptrtoint (#182289)
The ptrtoint result here is used in icmp. However, icmp can already
directly work with pointers, so there's no need to perform the cast.

(I originally wanted to switch this to ptrtoaddr, but that's not really
necessary when we can directly compare on pointers.)
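
As an illustration only, the kind of overlap test such a pointer comparison feeds can be sketched in plain C++. This is not the pass's code: `rangesOverlap` and its parameters are hypothetical names, and `std::less` is used because it gives a total order over pointers without any cast to an integer.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical helper (not an LLVM API): returns true if the byte ranges
// [LoadBegin, LoadBegin + LoadSize) and [StoreBegin, StoreBegin + StoreSize)
// overlap. The pointers are compared directly, with no integer cast.
bool rangesOverlap(const char *LoadBegin, std::size_t LoadSize,
                   const char *StoreBegin, std::size_t StoreSize) {
  std::less<const char *> Less; // total order over pointers
  return Less(LoadBegin, StoreBegin + StoreSize) &&
         Less(StoreBegin, LoadBegin + LoadSize);
}
```

Two adjacent ranges that only touch at a boundary are reported as non-overlapping, because both comparisons are strict.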
2026-02-19 16:23:58 +01:00
Aiden Grossman
7cbf453018
[ProfCheck][Matrix] Add profile data where relevant
This patch tackles two cases:
1. Checks around aliasing/overlapping ranges. This is runtime dependent
   on the pointer values passed in, which we have no way of knowing
   without additional profiling.
2. Loop backedges. For these we also have an associated trip count, so
   we set up the branch weights to represent this.

Tests updated/profcheck-xfail.txt updated.

Reviewers: alanzhao1, fhahn, mtrofin, snehasish

Pull Request: https://github.com/llvm/llvm-project/pull/181292
2026-02-17 18:51:08 -08:00
Aiden Grossman
293acb5d32
[ProfCheck][Matrix] Propagate profile information for selects
LowerMatrixIntrinsics creates new selects in the process of lowering
matrix intrinsics. The condition of such selects remains the same as
before. Because of this, we can directly propagate the profile
information for all selects on scalar conditions.

Reviewers: mtrofin, snehasish, fhahn, alanzhao1

Pull Request: https://github.com/llvm/llvm-project/pull/181248
2026-02-17 18:42:19 -08:00
Florian Hahn
54177e95d1
[Matrix] Use tiled loops automatically for large kernels. (#179325)
Update LowerMatrixIntrinsics to use tiled loops automatically for
larger matrices. The fully unrolled codegen creates a huge amount of
code, which performs noticeably worse than the tiled loop nest variant.

We now try to estimate the number of instructions needed for the
multiply, and if it is too large, tiled loops are used. The current
threshold is anything roughly larger than a 6x6x6 double multiply.

Eventually I think we want to only generate tiled loops. This patch is a
first step, trying to opt in for cases where we know it is beneficial.
Checked on AArch64, but should help on other architectures similarly,
and also drastically reduce binary size + compile time.

PR: https://github.com/llvm/llvm-project/pull/179325
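
For readers unfamiliar with the term, a tiled loop nest for a column-major matrix multiply can be sketched in plain C++ as below. This is a minimal illustration of the loop structure only, not the IR the pass emits: `tiledMultiply` and all parameter names are hypothetical, and the `std::min` clamping of partial tiles is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch: C (R x N) = A (R x K) * B (K x N), all column-major
// and stored as flat vectors. Tile must be at least 1.
std::vector<double> tiledMultiply(const std::vector<double> &A,
                                  const std::vector<double> &B,
                                  std::size_t R, std::size_t K, std::size_t N,
                                  std::size_t Tile) {
  std::vector<double> C(R * N, 0.0);
  for (std::size_t J0 = 0; J0 < N; J0 += Tile)     // column tiles of C
    for (std::size_t I0 = 0; I0 < R; I0 += Tile)   // row tiles of C
      for (std::size_t K0 = 0; K0 < K; K0 += Tile) // tiles along the inner dimension
        // Multiply one tile of A with one tile of B and accumulate into C.
        for (std::size_t J = J0; J < std::min(J0 + Tile, N); ++J)
          for (std::size_t KK = K0; KK < std::min(K0 + Tile, K); ++KK)
            for (std::size_t I = I0; I < std::min(I0 + Tile, R); ++I)
              C[J * R + I] += A[KK * R + I] * B[J * K + KK];
  return C;
}
```

The three outer loops stay the same size regardless of the matrix dimensions, which is why a loop nest keeps code size bounded where full unrolling grows with R * K * N.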
2026-02-11 15:36:34 +00:00
Florian Hahn
0c07203e07
[Matrix] Add test where pointer phi currently blocks tiling.
Add a test with a phi node currently unnecessarily preventing tiling.
2026-02-02 22:38:45 +00:00
Florian Hahn
acb23124ee
[Matrix] Update test to make sure tiled loops can be used. (NFC)
The 6x6x6 and 7x7x7 matrix multiply used previously could not
use tiled loop codegen. Update to 8x8x8 and forced tile-size of 2.
2026-02-01 21:47:17 +00:00
Florian Hahn
dd9fe97cc7
[Matrix] Add test showing excessive matrix codegen.
Add tests showing excessive codegen for small number of matrix ops.
2026-01-30 18:35:10 +00:00
Vladislav Dzhidzhoev
e2a2c03eef
[DebugInfo] Add Verifier check for incorrectly-scoped retainedNodes (#166855)
These checks ensure that retained nodes of a DISubprogram belong to the
subprogram.

Tests with incorrect IR are fixed. We should not have variables of one subprogram present in retained nodes of other subprograms.

Also, the interface for accessing DISubprogram's retained nodes is slightly
refactored. `DISubprogram::visitRetainedNodes` and
`DISubprogram::forEachRetainedNode` are added to avoid repeating checks
like
```
if (const auto *LV = dyn_cast<DILocalVariable>(N))
  ...
else if (const auto *L = dyn_cast<DILabel>(N))
  ...
else if (const auto *IE = dyn_cast<DIImportedEntity>(N))
  ...
```
2025-11-10 13:13:49 +01:00
Adam Nemet
783b050f88
[LMI] Support non-power-of-2 types for the matmul remainder (#163987)
In the inner loop of matmul, instead of continuously halving the HW
vector register width, I just use the remainder vector directly if it's
legal.

We don't have in-tree targets that have this so I opted for adding a
hidden flag to simulate this for testing purposes:
-matrix-split-matmul-remainder-over-threshold

The tests are the vectorization-friendly 3x3x1 matrix-vector and 1x3x3
vector-matrix multiplies for CM, RM respectively.
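
The difference between the two strategies can be sketched as below. This is a simplified model, not the pass's code: `chunkWidths` and `UseRemainderDirectly` are hypothetical names, and the boolean stands in for the "remainder vector is legal" condition, which the sketch does not model.

```cpp
#include <vector>

// Hypothetical sketch: split NumElts elements into vector operations of at
// most VF lanes (VF must be at least 1). When the last block does not fit,
// either halve the block width until it fits, or emit the remainder as one
// block.
std::vector<unsigned> chunkWidths(unsigned NumElts, unsigned VF,
                                  bool UseRemainderDirectly) {
  std::vector<unsigned> Widths;
  unsigned I = 0;
  while (I < NumElts) {
    unsigned Block = VF;
    if (I + Block > NumElts) {
      if (UseRemainderDirectly)
        Block = NumElts - I; // one non-power-of-2 block
      else
        while (I + Block > NumElts)
          Block /= 2;        // keep halving until the block fits
    }
    Widths.push_back(Block);
    I += Block;
  }
  return Widths;
}
```

For three elements and a four-lane register, halving yields blocks of width 2 and 1, while the direct strategy yields a single block of width 3.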
2025-10-17 18:42:30 +00:00
Nathan Corbyn
b00c4ff4b9
[Matrix][IR] Cap stride bitwidth at 64 (#163729)
a1ef81d added overloads for `llvm.matrix.column.major.store` and
`llvm.matrix.column.major.load` that allow strides to occupy an
arbitrary bitwidth. This change wasn't reflected in the verifier,
causing an assertion to trip when given strides overflowing 64-bit. This
patch explicitly caps the bitwidth at 64, repairing the crash and
avoiding future complexity dealing with strides that overflow 64 bits.

PR: https://github.com/llvm/llvm-project/pull/163729
2025-10-17 12:54:28 +01:00
Nathan Corbyn
625aa09fc3
[Matrix] Use data layout index type for lowering matrix intrinsics (#162646)
To properly support the matrix intrinsics on, e.g., 32-bit platforms
(without the need to emit `libc` calls), the `LowerMatrixIntrinsics` pass
should generate code that performs strided index calculations using the
same pointer bit-width as the matrix pointers, as determined by the data
layout. This patch updates the `LowerMatrixIntrinsics` transform to make
this the case.

PR: https://github.com/llvm/llvm-project/pull/162646
2025-10-13 12:40:27 +01:00
Nikita Popov
c23b4fbdbb
[IR] Remove size argument from lifetime intrinsics (#150248)
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.

This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
2025-08-08 11:09:34 +02:00
Nikita Popov
92c55a315e
[IR] Only allow lifetime.start/end on allocas (#149310)
lifetime.start and lifetime.end are primarily intended for use on
allocas, to enable stack coloring and other liveness optimizations. This
is necessary because all (static) allocas are hoisted into the entry
block, so lifetime markers are the only way to convey the actual
lifetimes.

However, lifetime.start and lifetime.end are currently *allowed* to be
used on non-alloca pointers. We don't actually do this in practice, but
just the mere fact that this is possible breaks the core purpose of the
lifetime markers, which is stack coloring of allocas. Stack coloring can
only work correctly if all lifetime markers for an alloca are
analyzable.

* If a lifetime marker may operate on multiple allocas via a select/phi,
we don't know which lifetime actually starts/ends and handle it
incorrectly (https://github.com/llvm/llvm-project/issues/104776).
* Stack coloring operates on the assumption that all lifetime markers
are visible, and not, for example, hidden behind a function call or
escaped pointer. It's not possible to change this, as part of the
purpose of lifetime markers is that they work even in the presence of
escaped pointers, where simple use analysis is insufficient.

I don't think there is any way to have coherent semantics for lifetime
markers on allocas, while also permitting them on arbitrary pointer
values.

This PR restricts lifetimes to operate on allocas only. As a followup, I
will also drop the size argument, which is superfluous if we always
operate on an alloca. (This change also renders various code handling
lifetime markers on non-alloca dead. I plan to clean up that kind of
code after dropping the size argument as well.)

In practice, I've only found a few places that currently produce
lifetimes on non-allocas:

* CoroEarly replaces the promise alloca with the result of an intrinsic,
which will later be replaced back with an alloca. I think this is the
only place where there is some legitimate loss of functionality, but I
don't think this is particularly important (I don't think we'd expect
the promise in a coroutine to admit useful lifetime optimization.)
* SafeStack moves unsafe allocas onto a separate frame. We can safely
drop lifetimes here, as SafeStack performs its own stack coloring.
* Similar for AddressSanitizer, it also moves allocas into separate
memory.
* LSR sometimes replaces the lifetime argument with a GEP chain of the
alloca (where the offsets ultimately cancel out). This is just
unnecessary. (Fixed separately in
https://github.com/llvm/llvm-project/pull/149492.)
* InferAddrSpaces sometimes makes lifetimes operate on an addrspacecast
of an alloca. I don't think this is necessary.
2025-07-21 15:04:50 +02:00
Jon Roelofs
0fa373c77d
[Matrix] Propagate shape information through PHI insts (#141681)
... and split them as we lower them, avoiding several shuffles in the
process.
2025-06-18 09:00:48 -07:00
Jon Roelofs
56548e1d9b
[Matrix] Fix a crash in VisitSelectInst due to iteration length mismatch
2025-06-12 09:27:06 -07:00
Jon Roelofs
ca5b71a455
[Matrix] Propagate shape information through Select insts (#141876)
2025-06-12 07:52:25 -07:00
Jon Roelofs
44b928e0d5
[Matrix] Propagate shape information through cast insts (#141869)
2025-06-10 12:15:41 -07:00
Jon Roelofs
68bb005ae0
[Matrix] Add -debug-only prints when matrices get flattened (#142078)
This is a potential source of overhead, which we might be able to alleviate in some cases. For example, static element extracts, or shuffles that pluck out a specific row. Since these diagnostics are highly specific to the pass itself and not immediately actionable for compiler users, these prints don't make a whole lot of sense as Remarks.
2025-06-10 09:44:53 -07:00
Jon Roelofs
274f5a817b
[Matrix] Propagate shape information through (f)abs insts (#141704)
2025-06-09 12:52:43 -07:00
Florian Hahn
370d01765c
[Matrix] Use shape info for StoreInst directly. (#142664)
ShapeInfo for the store operand may be dropped, e.g. because the operand
got folded by transpose optimizations to another instruction w/o shape
info. This was exposed by the assertion added in
https://github.com/llvm/llvm-project/pull/142416.

This updates VisitStore to use the shape-info directly from the
instruction, which is in line with the other Visit* functions and
ensures that we won't lose shape info.

PR: https://github.com/llvm/llvm-project/pull/142664
2025-06-04 09:15:57 +01:00
Jon Roelofs
1f1c725b68
[Matrix] Propagate shape information through all binops (#141705)
They all have vector variants, so the obvious "find and replace" does
the trick.
2025-05-28 11:20:06 -07:00
Jon Roelofs
e838b8b7e7
[Matrix] Fix a miscompile due to an incorrect double-transpose fold (#135397)
Transposes are only inverses of each other when they have matching
shapes.

rdar://145592582
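
A small stand-alone C++ model shows why the shapes matter. It is not the pass's code: `transpose` is a hypothetical helper that, like the intrinsic, takes the shape of its operand and works on a column-major matrix flattened into a vector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: transpose a column-major Rows x Cols matrix stored as
// a flat vector. The result is a column-major Cols x Rows matrix.
std::vector<int> transpose(const std::vector<int> &M, std::size_t Rows,
                           std::size_t Cols) {
  std::vector<int> T(M.size());
  for (std::size_t C = 0; C < Cols; ++C)
    for (std::size_t R = 0; R < Rows; ++R)
      T[R * Cols + C] = M[C * Rows + R]; // element (R, C) becomes (C, R)
  return T;
}
```

With `M = {0, 1, 2, 3, 4, 5}` treated as 2x3, `transpose(transpose(M, 2, 3), 3, 2)` gives back `M`, because the second call is told the 3x2 shape the first call produced. `transpose(transpose(M, 2, 3), 2, 3)` instead yields `{0, 4, 3, 2, 1, 5}`, so folding that pair away would change the result.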
2025-04-12 07:31:13 -07:00
Florian Hahn
48441cb8a2
[Matrix] Properly set Changed status when optimizing transposes.
Currently Changed is not updated properly when transposes are optimized,
causing missing analysis invalidation. Update optimizeTransposes to
indicate if changes have been made.
2025-04-06 17:36:56 +01:00
Nikita Popov
29441e4f5f
[IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.
2025-01-29 16:56:47 +01:00
Simon Pilgrim
1bb784a748
[LowerMatrixIntrinsics] multiply-minimal.ll - use -passes="..." to allow DOS to correctly evaluate the RUN command
Necessary for running update_test_checks.py on windows
2025-01-27 16:05:29 +00:00
Nikita Popov
f7685af4a5
[InstCombine] Move gep of phi fold into separate function
This makes sure that an early return during this fold doesn't end
up skipping later gep folds.
2024-12-05 15:20:56 +01:00
Florian Hahn
ffb1c21bd4
[Matrix] Fix crash in liftTranspose when instructions are folded.
Builder.Create(F)Add may constant fold the inputs and return a constant
instead of an instruction. Account for that instead of crashing.
2024-12-05 12:57:54 +00:00
Florian Hahn
7b6e0d9fc3
[Matrix] Use DenseMap for ShapeMap instead of ValueMap. (#118282)
ValueMap automatically updates entries with the new value if they have
been RAUW. This can cause instructions that are expected not to have
shape info to be added to the map (e.g. shufflevector as in the added
test case).

This leads to incorrect results. Originally it was used for transpose
optimizations, but they now all use updateShapeAndReplaceAllUsesWith,
which takes care of updating the shape info as needed.

This fixes a crash in the newly added test cases.

PR: https://github.com/llvm/llvm-project/pull/118282
2024-12-04 14:51:31 +00:00
Florian Hahn
12cefcc7ec
[Matrix] Skip already fused instructions before trying to fuse multiply.
lowerDotProduct called above may already lower a matrix multiply and
mark it as processed by adding it to FusedInsts. Don't try to process it
again in LowerMatrixMultiplyFused; check whether it is in FusedInsts first.

Without this change, we trigger an assertion when trying to erase the
same original matrix multiply twice.
2024-11-28 16:11:40 +00:00
Paul Walker
38fffa630e
[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548)
2024-11-06 11:53:33 +00:00
sbite0138
05d3f5ed91
[LowerMatrixIntrinsics] Fix type suffix for matrix.multiply.* (#100940)
Based on the [proposal
PDF](https://llvm.org/devmtg/2020-09/slides/Hahn-Matrix_Support_in_LLVM_and_Clang.pdf)
and the test code under
[llvm/test/Transforms/LowerMatrixIntrinsics](https://github.com/llvm/llvm-project/tree/main/llvm/test/Transforms/LowerMatrixIntrinsics),
the suffix for the `@llvm.matrix.multiply.*` intrinsic should be {output
matrix type}.{input matrix 1 type}.{input matrix 2 type} (e.g.,
`@llvm.matrix.multiply.v4i32.v4i32.v4i32`).

This PR corrects the places where these suffixes do not follow the
aforementioned format.
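
To make the naming scheme concrete, the suffix order can be written as a tiny string helper. `matrixMultiplyName` is a hypothetical function for illustration, not an LLVM API, and it performs no validation of the type strings.

```cpp
#include <string>

// Hypothetical helper: build the intrinsic name from the flattened vector
// types in the order {output}.{input 1}.{input 2}.
std::string matrixMultiplyName(const std::string &OutTy,
                               const std::string &LhsTy,
                               const std::string &RhsTy) {
  return "llvm.matrix.multiply." + OutTy + "." + LhsTy + "." + RhsTy;
}
```

Under this scheme a 2x3 by 3x2 multiply of doubles, which produces a 2x2 result, would be spelled with `v4f64` first and the two `v6f64` operand types after it.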
2024-08-01 12:47:35 +02:00
David Green
352a836176
[InstCombine] Canonicalize non-i8 gep of mul to i8 (#96606)
This is a small canonicalization for `gep i32, p, (mul x, C)` -> `gep
i8, p, (mul x, C*4)`, so that the mul can combine both of the constant
multiplications, and we take a small step towards canonicalizing more
geps to i8.

It currently doesn't attempt to check for multiple uses on the mul, but
that should be possible if it sounds better. Let me know what you think
of the idea in general.
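
The equivalence behind the fold is plain byte-offset arithmetic, sketched here in C++. The function names are hypothetical, the 4 is the size of `i32` in bytes, and overflow and wrapping flags are deliberately ignored.

```cpp
#include <cstdint>

// Byte offset of `gep i32, p, (mul x, C)`: the index is scaled by the
// 4-byte element size.
constexpr std::int64_t gepI32Offset(std::int64_t X, std::int64_t C) {
  return (X * C) * 4;
}

// Byte offset of `gep i8, p, (mul x, C*4)`: the element size is 1, so the
// scale is folded into the multiplication constant.
constexpr std::int64_t gepI8Offset(std::int64_t X, std::int64_t C) {
  return X * (C * 4);
}

static_assert(gepI32Offset(7, 3) == gepI8Offset(7, 3),
              "both forms address the same byte");
```

Both forms compute the same address; the `i8` form simply exposes a single constant multiply.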
2024-06-26 14:25:54 +01:00
Florian Hahn
e77378cc14
[Matrix] Adjust lifetime.ends during multiply fusion. (#84914)
At the moment, loads introduced by multiply fusion may be placed after
an object's lifetime has been terminated by lifetime.end. This introduces
reads of dead objects.

To avoid this, first collect all lifetime.end calls in the function.
During fusion, we deal with any lifetime.end calls that may alias any of
the loads.

Such lifetime.end calls are either moved when possible (both the
lifetime.end and the store are in the same block) or deleted.

PR: https://github.com/llvm/llvm-project/pull/84914
2024-03-16 20:41:36 +01:00
Florian Hahn
d96d917f38
[Matrix] Add tests showing mis-compile with lifetime.end and fusion.
Add a set of tests showing miscompiles due to multiply fusion
introducing loads to dead objects after lifetime.end.
2024-03-12 13:33:55 +00:00
Florian Hahn
dbe4143f23
[Matrix] Fix dimensions when hoisting transpose across add. (#81507)
Row and column arguments for matrix_transpose indicate the shape of the
operand. When hoisting the transpose to the result of the add, the add
operates on the original operand's shape, and so does the hoisted
transpose.

This patch also adds an assert that the shape for the original add and
the transpose match, as well as the shape of the new add matches the
cached shape for it.

The assert could potentially be moved to
updateShapeAndReplaceAllUsesWith.
2024-02-12 18:45:13 +00:00
Florian Hahn
673e5e34b4
[Matrix] Add dedicated tests for transpose lifting.
Add extra test coverage for transpose lifting using
-matrix-print-after-transpose-opt.

The added tests show a mis-compile.
2024-02-12 16:19:31 +00:00
Alexey Bataev
7bc079c852
[TTI] Fallback to SingleSrcPermute shuffle kind, if no direct estimation for
extract subvector.

Many targets do not have a cost for the extractsubvector shuffle kind, but
do have costs for single source permute. If there is no cost
estimation for extractsubvector, it is better to switch to single source
permute for better cost estimation.

Reviewers: RKSimon, davemgreen, arsenm

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/79837
2024-02-12 07:09:49 -05:00
Florian Hahn
f89fe08d77
[Matrix] Convert column-vector ops feeding dot product to row-vectors. (#72647)
Generalize the logic used to convert column-vector ops to row-vectors to
support converting chains of operations.

A potential next step is to further generalize this to convert
column-vector ops to row-vector ops in general, not just for operands of
dot products. Dot-product handling would then be driven by the general
conversion, rather than the other way around.

PR: https://github.com/llvm/llvm-project/pull/72647
2024-02-06 13:47:31 +00:00
Nikita Popov
90ba33099c
[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882)
This patch canonicalizes getelementptr instructions with constant
indices to use the `i8` source element type. This makes it easier for
optimizations to recognize that two GEPs are identical, because they
don't need to see past many different ways to express the same offset.

This is a first step towards
https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699.
This is limited to constant GEPs only for now, as they have a clear
canonical form, while we're not yet sure how exactly to deal with
variable indices.

The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives
two representative examples of the kind of optimization improvement we
expect from this change. In the first test SimplifyCFG can now realize
that all switch branches are actually the same. In the second test it
can convert it into simple arithmetic. These are representative of
common optimization failures we see in Rust.

Fixes https://github.com/llvm/llvm-project/issues/69841.
2024-01-24 15:25:29 +01:00
Nikita Popov
a5f3415533
[InstCombine] Replace non-demanded undef vector with poison
If an operand (esp to shufflevector or insertelement) is not
demanded, canonicalize it from undef to poison.
2023-12-18 16:12:37 +01:00
Nikita Popov
d0605e21af
[InstCombine] Canonicalize splat shuffles to use poison operand
If the splat shuffle is represented using an undef RHS, replace it
with poison.
2023-12-18 15:57:49 +01:00
Dmitriy Smirnov
e13bed4c5f
[PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP
This patch tries to canonicalise add + gep to gep + gep.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D155688
2023-10-06 12:29:06 +01:00
David Green
2a859b2014
[AArch64] Change the cost of vector insert/extract to 2
The cost of vector instructions has always been high under AArch64, in order to
add a high cost for inserts/extracts, shuffles and scalarization. This is a
conservative approach to limit the scope of unusual SLP vectorization where the
codegen ends up being quite poor, but has always been higher than the correct
costs would be for any specific core.

This relaxes that, reducing the vector insert/extract cost from 3 to 2. It is a
generalization of D142359 to all AArch64 cpus. The ScalarizationOverhead is
also overridden for integer vector at the same time, to remove the effect of
lane 0 being considered free for integer vectors (something that should only be
true for float when scalarizing).

The lower insert/extract cost will reduce the cost of insert, extracts,
shuffling and scalarization. The adjustments of ScalarizationOverhead will
increase the cost on integer, especially for small vectors. The end result will
be lower cost for float and long-integer types, some higher cost for some
smaller vectors. This, along with the raw insert/extract cost being lower, will
generally mean more vectorization from the Loop and SLP vectorizer.

We may end up regretting this, as that vectorization is not always profitable.
In all the benchmarking I have done this is generally an improvement in the
overall performance, and I've attempted to address the places where it wasn't
with other costmodel adjustments.

Differential Revision: https://reviews.llvm.org/D155459
2023-07-28 21:26:50 +01:00
Nikita Popov
bc39a7a5e4
[LowerMatrixIntrinsics] Fix test expectations (NFC)
Some of the test expectations were incorrectly changed in
23c21759458014fc4d7cbea45b6fbe7349a0a4fd. Regenerate the tests.
2023-07-18 11:21:11 +02:00
Nuno Lopes
23c2175945
[LowerMatrixIntrinsics] Use poison instead of undef as placeholder [NFC]
These values don't propagate to the output; they are always replaced with a subsequent shuffle
or insertelement.
Tested equivalence with Alive2, e.g., https://alive2.llvm.org/ce/z/fj4s78.
2023-07-18 09:54:41 +01:00
Florian Hahn
c10a7772bd
[Matrix] Convert binop operand of dot product to a row vector.
The dot product lowering will use the left operand as a row vector.
If the operand is a binary op, convert it to operate on a row vector
instead of a column vector.

Depends on D148428.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D148429
2023-06-07 20:45:08 +01:00