llvm-project

Author	SHA1	Message	Date
Florian Hahn	8e61085291	[Matrix] Place allocas in function entry. (#190032) Create allocas for temporary matrices in the function entry. Limit their lifetime via lifetime.start & lifetime.end. This avoids dynamic allocas. Improvement suggested in https://github.com/llvm/llvm-project/pull/188721. PR: https://github.com/llvm/llvm-project/pull/190032	2026-04-02 17:36:13 +00:00
Florian Hahn	ba6041bfbd	[Matrix] Handle load/store with different AS in getNonAliasingPointer. (#188721) If a load and a store have different address spaces, we cannot create a runtime check. Instead, always copy the data to an alloca in the store's address space. Fixes https://github.com/llvm/llvm-project/issues/185236. PR: https://github.com/llvm/llvm-project/pull/188721	2026-03-28 20:52:54 +00:00
Florian Hahn	b164e7c610	[Matrix] Handle -fuse-matrix-tile-size=0 as tiling disabled. (#188861) Treat -fuse-matrix-tile-size=0 as disabling tiling, as a tile size of 0 doesn't really make sense. Fixes https://github.com/llvm/llvm-project/issues/185153. PR: https://github.com/llvm/llvm-project/pull/188861	2026-03-27 10:14:20 +00:00
Nikita Popov	6ecbc0c96e	[InstCombine] Canonicalize GEP source element types (#180745) Canonicalize GEP source element types from `%T` to `[sizeof(%T) x i8]`. This is intended to flush out any remaining places that rely on GEP element types, as part of the `ptradd` migration. The impact of this change is expected to be fairly minimal (we might enable a few more hoist/sink-style folds that depend on equal GEP types).	2026-03-06 14:48:01 +00:00
Nikita Popov	f4a29d9278	[LowerMatrixIntrinsics] Avoid use of ptrtoint (#182289) The ptrtoint result here is used in an icmp. However, icmp can already work directly with pointers, so there's no need to perform the cast. (I originally wanted to switch this to ptrtoaddr, but that's not really necessary when we can compare pointers directly.)	2026-02-19 16:23:58 +01:00
Aiden Grossman	7cbf453018	[ProfCheck][Matrix] Add profile data where relevant This patch tackles two cases: 1. Checks around aliasing/overlapping ranges. This is runtime dependent on the pointer values passed in, which we have no way of knowing without additional profiling. 2. Loop backedges. For these we also have an associated trip count, so we set up the branch weights to represent this. Tests updated/profcheck-xfail.txt updated. Reviewers: alanzhao1, fhahn, mtrofin, snehasish Pull Request: https://github.com/llvm/llvm-project/pull/181292	2026-02-17 18:51:08 -08:00
Aiden Grossman	293acb5d32	[ProfCheck][Matrix] Propagate profile information for selects LowerMatrixIntrinsics creates new selects in the process of lowering matrix intrinsics. The condition of such selects remains the same as before. Because of this, we can directly propagate the profile information for all selects on scalar conditions. Reviewers: mtrofin, snehasish, fhahn, alanzhao1 Pull Request: https://github.com/llvm/llvm-project/pull/181248	2026-02-17 18:42:19 -08:00
Florian Hahn	54177e95d1	[Matrix] Use tiled loops automatically for large kernels. (#179325) Update LowerMatrixIntrinsics to use tiled loops automatically for larger matrices. The fully unrolled codegen creates a huge amount of code, which performs noticeably worse than the tiled loop nest variant. We now try to estimate the number of instructions needed for the multiply, and if it is too large, tiled loops are used. The current threshold is roughly anything larger than a 6x6x6 double multiply. Eventually I think we want to generate only tiled loops. This patch is a first step, opting in for cases where we know it is beneficial. Checked on AArch64, but it should help other architectures similarly, and also drastically reduce binary size and compile time. PR: https://github.com/llvm/llvm-project/pull/179325	2026-02-11 15:36:34 +00:00
Florian Hahn	0c07203e07	[Matrix] Add test where pointer phi currently blocks tiling. Add a test with a phi node currently unnecessarily preventing tiling.	2026-02-02 22:38:45 +00:00
Florian Hahn	acb23124ee	[Matrix] Update test to make sure tiled loops can be used. (NFC) The 6x6x6 and 7x7x7 matrix multiplies used previously could not use tiled loop codegen. Update to 8x8x8 with a forced tile size of 2.	2026-02-01 21:47:17 +00:00
Florian Hahn	dd9fe97cc7	[Matrix] Add test showing excessive matrix codegen. Add tests showing excessive codegen for a small number of matrix ops.	2026-01-30 18:35:10 +00:00
Vladislav Dzhidzhoev	e2a2c03eef	[DebugInfo] Add Verifier check for incorrectly-scoped retainedNodes (#166855) These checks ensure that retained nodes of a DISubprogram belong to that subprogram. Tests with incorrect IR are fixed. We should not have variables of one subprogram present in the retained nodes of other subprograms. Also, the interface for accessing a DISubprogram's retained nodes is slightly refactored. `DISubprogram::visitRetainedNodes` and `DISubprogram::forEachRetainedNode` are added to avoid repeating checks like ``` if (const auto LV = dyn_cast<DILocalVariable>(N)) ... else if (const auto L = dyn_cast<DILabel>(N)) ... else if (const auto *IE = dyn_cast<DIImportedEntity>(N)) ... ```	2025-11-10 13:13:49 +01:00
Adam Nemet	783b050f88	[LMI] Support non-power-of-2 types for the matmul remainder (#163987) In the inner loop of matmul, instead of continuously halving the HW vector register width, I just use the remainder vector directly if it's legal. We don't have in-tree targets that support this, so I opted to add a hidden flag to simulate it for testing purposes: -matrix-split-matmul-remainder-over-threshold. The tests are the vectorization-friendly 3x3x1 matrix-vector and 1x3x3 vector-matrix multiplies for column-major (CM) and row-major (RM) layouts, respectively.	2025-10-17 18:42:30 +00:00
Nathan Corbyn	b00c4ff4b9	[Matrix][IR] Cap stride bitwidth at 64 (#163729) a1ef81d added overloads for `llvm.matrix.column.major.store` and `llvm.matrix.column.major.load` that allow strides to occupy an arbitrary bitwidth. This change wasn't reflected in the verifier, causing an assertion to trip when given strides wider than 64 bits. This patch explicitly caps the bitwidth at 64, repairing the crash and avoiding future complexity dealing with strides that overflow 64 bits. PR: https://github.com/llvm/llvm-project/pull/163729	2025-10-17 12:54:28 +01:00
Nathan Corbyn	625aa09fc3	[Matrix] Use data layout index type for lowering matrix intrinsics (#162646) To properly support the matrix intrinsics on, e.g., 32-bit platforms (without the need to emit `libc` calls), the `LowerMatrixIntrinsics` pass should generate code that performs strided index calculations using the same pointer bit-width as the matrix pointers, as determined by the data layout. This patch updates the `LowerMatrixIntrinsics` transform accordingly. PR: https://github.com/llvm/llvm-project/pull/162646	2025-10-13 12:40:27 +01:00
Nikita Popov	c23b4fbdbb	[IR] Remove size argument from lifetime intrinsics (#150248) Now that #149310 has restricted lifetime intrinsics to only work on allocas, we can also drop the explicit size argument. Instead, the size is implied by the alloca. This removes the ability to only mark a prefix of an alloca alive/dead. We never used that capability, so we should remove the need to handle that possibility everywhere (though many key places, including stack coloring, did not actually respect this).	2025-08-08 11:09:34 +02:00
Nikita Popov	92c55a315e	[IR] Only allow lifetime.start/end on allocas (#149310) lifetime.start and lifetime.end are primarily intended for use on allocas, to enable stack coloring and other liveness optimizations. This is necessary because all (static) allocas are hoisted into the entry block, so lifetime markers are the only way to convey the actual lifetimes. However, lifetime.start and lifetime.end are currently allowed to be used on non-alloca pointers. We don't actually do this in practice, but just the mere fact that this is possible breaks the core purpose of the lifetime markers, which is stack coloring of allocas. Stack coloring can only work correctly if all lifetime markers for an alloca are analyzable. * If a lifetime marker may operate on multiple allocas via a select/phi, we don't know which lifetime actually starts/ends and handle it incorrectly (https://github.com/llvm/llvm-project/issues/104776). * Stack coloring operates on the assumption that all lifetime markers are visible, and not, for example, hidden behind a function call or escaped pointer. It's not possible to change this, as part of the purpose of lifetime markers is that they work even in the presence of escaped pointers, where simple use analysis is insufficient. I don't think there is any way to have coherent semantics for lifetime markers on allocas, while also permitting them on arbitrary pointer values. This PR restricts lifetimes to operate on allocas only. As a followup, I will also drop the size argument, which is superfluous if we always operate on an alloca. (This change also renders various code handling lifetime markers on non-alloca dead. I plan to clean up that kind of code after dropping the size argument as well.) In practice, I've only found a few places that currently produce lifetimes on non-allocas: * CoroEarly replaces the promise alloca with the result of an intrinsic, which will later be replaced back with an alloca. I think this is the only place where there is some legitimate loss of functionality, but I don't think this is particularly important (I don't think we'd expect the promise in a coroutine to admit useful lifetime optimization.) * SafeStack moves unsafe allocas onto a separate frame. We can safely drop lifetimes here, as SafeStack performs its own stack coloring. * Similar for AddressSanitizer, it also moves allocas into separate memory. * LSR sometimes replaces the lifetime argument with a GEP chain of the alloca (where the offsets ultimately cancel out). This is just unnecessary. (Fixed separately in https://github.com/llvm/llvm-project/pull/149492.) * InferAddrSpaces sometimes makes lifetimes operate on an addrspacecast of an alloca. I don't think this is necessary.	2025-07-21 15:04:50 +02:00
Jon Roelofs	0fa373c77d	[Matrix] Propagate shape information through PHI insts (#141681) ... and split them as we lower them, avoiding several shuffles in the process.	2025-06-18 09:00:48 -07:00
Jon Roelofs	56548e1d9b	[Matrix] Fix a crash in VisitSelectInst due to iteration length mismatch	2025-06-12 09:27:06 -07:00
Jon Roelofs	ca5b71a455	[Matrix] Propagate shape information through Select insts (#141876)	2025-06-12 07:52:25 -07:00
Jon Roelofs	44b928e0d5	[Matrix] Propagate shape information through cast insts (#141869)	2025-06-10 12:15:41 -07:00
Jon Roelofs	68bb005ae0	[Matrix] Add -debug-only prints when matrices get flattened (#142078) This is a potential source of overhead, which we might be able to alleviate in some cases, for example static element extracts or shuffles that pluck out a specific row. Since these diagnostics are highly specific to the pass itself and not immediately actionable for compiler users, these prints don't make a whole lot of sense as Remarks.	2025-06-10 09:44:53 -07:00
Jon Roelofs	274f5a817b	[Matrix] Propagate shape information through (f)abs insts (#141704)	2025-06-09 12:52:43 -07:00
Florian Hahn	370d01765c	[Matrix] Use shape info for StoreInst directly. (#142664) ShapeInfo for the store operand may be dropped, e.g. because the operand got folded by transpose optimizations to another instruction without shape info. This was exposed by the assertion added in https://github.com/llvm/llvm-project/pull/142416. This updates VisitStore to use the shape info directly from the instruction, which is in line with the other Visit* functions and ensures that we won't lose shape info. PR: https://github.com/llvm/llvm-project/pull/142664	2025-06-04 09:15:57 +01:00
Jon Roelofs	1f1c725b68	[Matrix] Propagate shape information through all binops (#141705) They all have vector variants, so the obvious "find and replace" does the trick.	2025-05-28 11:20:06 -07:00
Jon Roelofs	e838b8b7e7	[Matrix] Fix a miscompile due to an incorrect double-transpose fold (#135397) Transposes are only inverses of each other when they have matching shapes. rdar://145592582	2025-04-12 07:31:13 -07:00
Florian Hahn	48441cb8a2	[Matrix] Properly set Changed status when optimizing transposes. Currently Changed is not updated properly when transposes are optimized, causing missing analysis invalidation. Update optimizeTransposes to indicate if changes have been made.	2025-04-06 17:36:56 +01:00
Nikita Popov	29441e4f5f	[IR] Convert from nocapture to captures(none) (#123181) This PR removes the old `nocapture` attribute, replacing it with the new `captures` attribute introduced in #116990. This change is intended to be essentially NFC, replacing existing uses of `nocapture` with `captures(none)` without adding any new analysis capabilities. Making use of non-`none` values is left for a followup. Some notes: * `nocapture` will be upgraded to `captures(none)` by the bitcode reader. * `nocapture` will also be upgraded by the textual IR reader. This is to make it easier to use old IR files and somewhat reduce the test churn in this PR. * Helper APIs like `doesNotCapture()` will check for `captures(none)`. * MLIR import will convert `captures(none)` into an `llvm.nocapture` attribute. The representation in the LLVM IR dialect should be updated separately.	2025-01-29 16:56:47 +01:00
Simon Pilgrim	1bb784a748	[LowerMatrixIntrinsics] multiply-minimal.ll - use -passes="..." to allow DOS to correctly evaluate the RUN command Necessary for running update_test_checks.py on Windows	2025-01-27 16:05:29 +00:00
Nikita Popov	f7685af4a5	[InstCombine] Move gep of phi fold into separate function This makes sure that an early return during this fold doesn't end up skipping later gep folds.	2024-12-05 15:20:56 +01:00
Florian Hahn	ffb1c21bd4	[Matrix] Fix crash in liftTranspose when instructions are folded. Builder.Create(F)Add may constant-fold the inputs and return a constant instead of an instruction. Account for that instead of crashing.	2024-12-05 12:57:54 +00:00
Florian Hahn	7b6e0d9fc3	[Matrix] Use DenseMap for ShapeMap instead of ValueMap. (#118282) ValueMap automatically updates entries with the new value if they have been RAUW'd. This can cause instructions that are expected not to have shape info to be added to the map (e.g. shufflevector as in the added test case). This leads to incorrect results. Originally it was used for transpose optimizations, but they now all use updateShapeAndReplaceAllUsesWith, which takes care of updating the shape info as needed. This fixes a crash in the newly added test cases. PR: https://github.com/llvm/llvm-project/pull/118282	2024-12-04 14:51:31 +00:00
Florian Hahn	12cefcc7ec	[Matrix] Skip already fused instructions before trying to fuse multiply. lowerDotProduct called above may already lower a matrix multiply and mark it as processed by adding it to FusedInsts. Don't try to process it again in LowerMatrixMultiplyFused; check whether it is already in FusedInsts. Without this change, we trigger an assertion when trying to erase the same original matrix multiply twice.	2024-11-28 16:11:40 +00:00
Paul Walker	38fffa630e	[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548)	2024-11-06 11:53:33 +00:00
sbite0138	05d3f5ed91	[LowerMatrixIntrinsics] Fix type suffix for matrix.multiply.* (#100940) Based on the [proposal PDF](https://llvm.org/devmtg/2020-09/slides/Hahn-Matrix_Support_in_LLVM_and_Clang.pdf) and the test code under [llvm/test/Transforms/LowerMatrixIntrinsics](https://github.com/llvm/llvm-project/tree/main/llvm/test/Transforms/LowerMatrixIntrinsics), the suffix for the `@llvm.matrix.multiply.*` intrinsic should be {output matrix type}.{input matrix 1 type}.{input matrix 2 type} (e.g., `@llvm.matrix.multiply.v4i32.v4i32.v4i32`). This PR corrects the places where these suffixes do not follow the aforementioned format.	2024-08-01 12:47:35 +02:00
David Green	352a836176	[InstCombine] Canonicalize non-i8 gep of mul to i8 (#96606) This is a small canonicalization for `gep i32, p, (mul x, C)` -> `gep i8, p, (mul x, C*4)`, so that the mul can combine both of the constant multiplications, and we take a small step towards canonicalizing more geps to i8. It currently doesn't attempt to check for multiple uses on the mul, but that should be possible if it sounds better. Let me know what you think of the idea in general.	2024-06-26 14:25:54 +01:00
Florian Hahn	e77378cc14	[Matrix] Adjust lifetime.ends during multiply fusion. (#84914) At the moment, loads introduced by multiply fusion may be placed after an object's lifetime has been terminated by lifetime.end. This introduces reads of dead objects. To avoid this, first collect all lifetime.end calls in the function. During fusion, we deal with any lifetime.end calls that may alias any of the loads. Such lifetime.end calls are either moved when possible (both the lifetime.end and the store are in the same block) or deleted. PR: https://github.com/llvm/llvm-project/pull/84914	2024-03-16 20:41:36 +01:00
Florian Hahn	d96d917f38	[Matrix] Add tests showing mis-compile with lifetime.end and fusion. Add a set of tests showing miscompiles due to multiply fusion introducing loads to dead objects after lifetime.end.	2024-03-12 13:33:55 +00:00
Florian Hahn	dbe4143f23	[Matrix] Fix dimensions when hoisting transpose across add. (#81507) Row and column arguments for matrix_transpose indicate the shape of the operand. When hoisting the transpose to the result of the add, the add operates on the original operand's shape, and so does the hoisted transpose. This patch also adds asserts that the shapes of the original add and the transpose match, and that the shape of the new add matches its cached shape. The asserts could potentially be moved to updateShapeAndReplaceAllUsesWith.	2024-02-12 18:45:13 +00:00
Florian Hahn	673e5e34b4	[Matrix] Add dedicated tests for transpose lifting. Add extra test coverage for transpose lifting using -matrix-print-after-transpose-opt. The added tests show a mis-compile.	2024-02-12 16:19:31 +00:00
Alexey Bataev	7bc079c852	[TTI] Fallback to SingleSrcPermute shuffle kind, if no direct estimation for extract subvector. Many targets do not have costs for the extractsubvector shuffle kind, but do have costs for single-source permutes. If there is no cost estimate for extractsubvector, it is better to switch to single-source permute for a better cost estimate. Reviewers: RKSimon, davemgreen, arsenm Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/79837	2024-02-12 07:09:49 -05:00
Florian Hahn	f89fe08d77	[Matrix] Convert column-vector ops feeding dot product to row-vectors. (#72647) Generalize the logic used to convert column-vector ops to row-vectors to support converting chains of operations. A potential next step is to further generalize this to convert column-vector ops to row-vector ops in general, not just for operands of dot products. Dot-product handling would then be driven by the general conversion, rather than the other way around. PR: https://github.com/llvm/llvm-project/pull/72647	2024-02-06 13:47:31 +00:00
Nikita Popov	90ba33099c	[InstCombine] Canonicalize constant GEPs to i8 source element type (#68882) This patch canonicalizes getelementptr instructions with constant indices to use the `i8` source element type. This makes it easier for optimizations to recognize that two GEPs are identical, because they don't need to see past many different ways to express the same offset. This is a first step towards https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699. This is limited to constant GEPs only for now, as they have a clear canonical form, while we're not yet sure how exactly to deal with variable indices. The test llvm/test/Transforms/PhaseOrdering/switch_with_geps.ll gives two representative examples of the kind of optimization improvement we expect from this change. In the first test SimplifyCFG can now realize that all switch branches are actually the same. In the second test it can convert it into simple arithmetic. These are representative of common optimization failures we see in Rust. Fixes https://github.com/llvm/llvm-project/issues/69841.	2024-01-24 15:25:29 +01:00
Nikita Popov	a5f3415533	[InstCombine] Replace non-demanded undef vector with poison If an operand (esp to shufflevector or insertelement) is not demanded, canonicalize it from undef to poison.	2023-12-18 16:12:37 +01:00
Nikita Popov	d0605e21af	[InstCombine] Canonicalize splat shuffles to use poison operand If the splat shuffle is represented using an undef RHS, replace it with poison.	2023-12-18 15:57:49 +01:00
Dmitriy Smirnov	e13bed4c5f	[PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP This patch tries to canonicalise add + gep to gep + gep. Co-authored-by: Paul Walker <paul.walker@arm.com> Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D155688	2023-10-06 12:29:06 +01:00
David Green	2a859b2014	[AArch64] Change the cost of vector insert/extract to 2 The cost of vector instructions has always been high under AArch64, in order to add a high cost for inserts/extracts, shuffles and scalarization. This is a conservative approach to limit the scope of unusual SLP vectorization where the codegen ends up being quite poor, but has always been higher than the correct costs would be for any specific core. This relaxes that, reducing the vector insert/extract cost from 3 to 2. It is a generalization of D142359 to all AArch64 cpus. The ScalarizationOverhead is also overridden for integer vectors at the same time, to remove the effect of lane 0 being considered free for integer vectors (something that should only be true for float when scalarizing). The lower insert/extract cost will reduce the cost of inserts, extracts, shuffling and scalarization. The adjustments of ScalarizationOverhead will increase the cost on integer, especially for small vectors. The end result will be lower cost for float and long-integer types, some higher cost for some smaller vectors. This, along with the raw insert/extract cost being lower, will generally mean more vectorization from the Loop and SLP vectorizer. We may end up regretting this, as that vectorization is not always profitable. In all the benchmarking I have done this is generally an improvement in the overall performance, and I've attempted to address the places where it wasn't with other costmodel adjustments. Differential Revision: https://reviews.llvm.org/D155459	2023-07-28 21:26:50 +01:00
Nikita Popov	bc39a7a5e4	[LowerMatrixIntrinsics] Fix test expectations (NFC) Some of the test expectations were incorrectly changed in 23c21759458014fc4d7cbea45b6fbe7349a0a4fd. Regenerate the tests.	2023-07-18 11:21:11 +02:00
Nuno Lopes	23c2175945	[LowerMatrixIntrinsics] Use poison instead of undef as placeholder [NFC] These values don't propagate to the output; they are always replaced with a subsequent shuffle or insertelement. Tested equivalence with Alive2, e.g., https://alive2.llvm.org/ce/z/fj4s78.	2023-07-18 09:54:41 +01:00
Florian Hahn	c10a7772bd	[Matrix] Convert binop operand of dot product to a row vector. The dot product lowering will use the left operand as row vector. If the operand is a binary op, convert it to operate on a row vector instead of a column vector. Depends on D148428. Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D148429	2023-06-07 20:45:08 +01:00


142 Commits
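Commits e838b8b7e7 and dbe4143f23 hinge on the fact that the `llvm.matrix.*` intrinsics store a matrix as a flat vector in column-major order and take its shape from explicit row/column arguments, so a 2x3 and a 3x2 matrix of the same element type share one vector type. The Python snippet below is a minimal, hypothetical model of that layout, not code from `LowerMatrixIntrinsics`; `transpose` and `can_fold_double_transpose` are illustrative names. It sketches why folding `transpose(transpose(A))` back to `A` is only valid when the outer transpose reads the inner result with its actual, swapped shape.

```python
# Hypothetical model of the column-major layout used by the llvm.matrix.*
# intrinsics; illustrative only, not the LowerMatrixIntrinsics C++ code.
# Element (r, c) of a rows x cols matrix lives at flat index c * rows + r.


def transpose(flat, rows, cols):
    """Transpose a rows x cols column-major matrix into a cols x rows one."""
    assert len(flat) == rows * cols, "shape must cover the whole vector"
    out = [None] * (rows * cols)
    for c in range(cols):
        for r in range(rows):
            # (r, c) becomes (c, r) in the cols x rows result, whose
            # column-major index is r * cols + c.
            out[r * cols + c] = flat[c * rows + r]
    return out


def can_fold_double_transpose(outer_rows, outer_cols, inner_rows, inner_cols):
    """transpose(transpose(A, inner_rows, inner_cols), outer_rows, outer_cols)
    equals A only if the outer transpose reads the inner result with its
    actual shape, i.e. inner_cols x inner_rows."""
    return outer_rows == inner_cols and outer_cols == inner_rows


if __name__ == "__main__":
    a = list(range(6))         # a 2x3 matrix in column-major order
    t = transpose(a, 2, 3)     # 3x2 result: [0, 2, 4, 1, 3, 5]
    print(transpose(t, 3, 2))  # matching shape: [0, 1, 2, 3, 4, 5]
    print(transpose(t, 2, 3))  # mismatched shape: [0, 4, 3, 2, 1, 5]
    print(can_fold_double_transpose(3, 2, 2, 3))  # True
    print(can_fold_double_transpose(2, 3, 2, 3))  # False
```

Both reinterpretations of `t` cover the same six elements, so the vector type alone cannot tell them apart; only the row/column arguments decide whether the fold is sound.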