llvm-project

Author	SHA1	Message	Date
michaelselehov	3645cef1ef	[AMDGPU] LiveRegOptimizer: consider i8/i16 binops on SDWA (#155800) The PHI-node part was merged with PR #160909. Extend `isOpLegal` to treat 8/16-bit vector add/sub/and/or/xor as profitable on SDWA targets (stores and intrinsics remain profitable). This repacks loop-carried values to i32 across BBs and restores SDWA lowering instead of scattered lshr/lshl/or sequences. Testing: - Local: `check-llvm-codegen-amdgpu` is green (4314/4320 passed, 6 XFAIL). - Additional: validated in AMD internal CI.	2025-12-15 12:04:33 -05:00
michaelselehov	617854f819	[AMDGPU] LRO: allow same-BB non-lookthrough users for PHI (#160909) Loop headers frequently consume the loop-carried value in the header block via non-lookthrough ops (e.g. byte-wise vector binops). LiveRegOptimizer’s same-BB filter currently prunes these users, so the loop-carried PHI is not coerced to i32 and the intended packed form is lost. Relax the filter: when the def is a PHI, allow same-BB non-lookthrough users. Also fix the check to look at the user (CII) rather than the def (II) so the walk does not terminate prematurely.	2025-09-30 00:57:07 +09:00
Ivan Kosarev	faca8c9ed4	[AMDGPU][NFC] Only include CodeGenPassBuilder.h where needed. (#154769) Saves around 125-210 MB of compilation memory usage per source for roughly one third of our backend sources, ~60 MB on average.	2025-08-22 10:05:06 +01:00
Jim M. R. Teichgräber	5d54a576fe	[AMDGPU] AMDGPULateCodeGenPrepare Legacy PM: replace `setPreservesAll()` with `setPreservesCFG()` (#148167) This PR depends on #148165; the first commit (90f1d0a881a21a8b4f192622d798c290770fda63) belongs to that PR. The changes are distinct, so separate PRs seemed like the best option. I don't have commit access, so I couldn't use user-branches to mark the dependency. Because AMDGPULateCodeGenPrepare actually performs changes that invalidate Uniformity Analysis, use `setPreservesCFG()` to mark this, instead of `setPreservesAll()`, which wrongly includes preserving Uniformity Analysis. Note that before #148165, this would still have preserved Uniformity Analysis, hence the dependency. In addition, `amdgpu/llc-pipeline.ll` needs to be changed when both changes are in effect, but those changes would make the test fail if the PRs weren't based on one another. Note on why this hasn't caused issues so far: it just so happens that AMDGPULateCodeGenPrepare is always immediately followed by AMDGPUUnifyDivergentExitNodes, which does invalidate most analyses, including Uniformity. And because UnifyDivergentExitNodes only looks at terminators, and LateCGP seemingly does not replace uniform values with divergent values, or divergent values with uniform values, and it only inserts new values that are not looked at by UnifyDivergentExitNodes, this bug remained hidden. --- I ran `git-clang-format` on my changes. I tested them using the `check-llvm` target; no unexpected failures occurred after I made the change to `amdgpu/llc-pipeline.ll`.	2025-08-12 19:40:02 +09:00
Pierre van Houtryve	ed87f0afba	[AMDGPU] Visit all PHIs in each call to optimizeLiveType (#147522) Make the Visited set a local variable, otherwise we can reject a PHI (those that do not have a zeroinitializer constant) but mark it as visited, and the rest of the function thinks the PHI is ok when it isn't. This is a bit crude but it's the only fix that consistently worked in my testing. Fixes SWDEV-541767	2025-07-10 09:29:48 +02:00
Ramkumar Ramachandra	b40e4ceaa6	[ValueTracking] Make Depth last default arg (NFC) (#142384) Having a finite Depth (or recursion limit) for computeKnownBits is very limiting, but is currently a load-bearing necessity, as all KnownBits are recomputed on each call and there is no caching. As a prerequisite for an effort to remove the recursion limit altogether, either using a clever caching technique, or writing an easily invalidable KnownBits analysis, make the Depth argument in APIs in ValueTracking uniformly the last argument with a default value. This would aid in removing the argument when the time comes, as many callers that currently pass 0 explicitly are now updated to omit the argument altogether.	2025-06-03 17:12:24 +01:00
Kazu Hirata	510c8a23e6	[llvm] Use llvm::find_if (NFC) (#139654)	2025-05-12 22:58:30 -07:00
Jay Foad	886f1199f0	[AMDGPU] Use variadic isa<>. NFC. (#137016)	2025-04-24 08:19:09 +01:00
Jeffrey Byrnes	bf12954715	[AMDGPU] Whitelist all intrinsics (#130150) For code maintainability -- this may result in cases where we are applying the optimization where it is not profitable, but those are likely to be rare.	2025-03-06 15:39:11 -08:00
choikwa	45759fe5b4	[AMDGPU] Filter candidates of LiveRegOptimizer for profitable cases (#124624) It is known that vectors whose elements fit in i16 will be split and scalarized in SelectionDAG's type legalizer (see SIISelLowering::getPreferredVectorAction). LRO attempts to undo the scalarizing of vectors across basic block boundaries and shoehorn Values into VGPRs. LRO is beneficial for operations that natively work on illegal vector types, since it prevents flip-flopping between unpacked and packed forms. If we know that operations on a vector will be split and scalarized, then we don't want to shoehorn them back into packed VGPRs. Operations that we know to work natively on illegal vector types usually come in the form of intrinsics (MFMA, DOT8), buffer stores, shuffles, and PHI nodes, to name a few.	2025-03-05 18:44:48 -05:00
Kazu Hirata	66c31f5d02	[AMDGPU] Avoid repeated hash lookups (NFC) (#126401) This patch just cleans up the "if" condition. Further cleanups are left to subsequent patches.	2025-02-08 23:17:06 -08:00
Shilei Tian	f15da5fb78	[AMDGPU] Fix an invalid cast in `AMDGPULateCodeGenPrepare::visitLoadInst` (#122494) Fixes: SWDEV-507695	2025-01-12 23:40:25 -05:00
Jay Foad	f9f7c42ca6	[AMDGPU] Refine AMDGPULateCodeGenPrepare class. NFC. (#118792) Use references instead of pointers for most state and initialize it all in the constructor, and similarly for the LiveRegOptimizer class.	2024-12-05 14:05:51 +00:00
Jay Foad	3923e0451a	[AMDGPU] Preserve all analyses if nothing changed (#117994)	2024-11-28 14:33:05 +00:00
Kazu Hirata	be187369a0	[AMDGPU] Remove unused includes (NFC) (#116154) Identified with misc-include-cleaner.	2024-11-13 21:10:03 -08:00
Kazu Hirata	0cb80c4f00	[AMDGPU] Avoid repeated hash lookups (NFC) (#113409)	2024-10-22 23:02:34 -07:00
Jay Foad	8d13e7b8c3	[AMDGPU] Qualify auto. NFC. (#110878) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)	2024-10-03 13:07:54 +01:00
Kazu Hirata	d7db094340	[AMDGPU] Avoid repeated hash lookups (NFC) (#109506)	2024-09-21 00:02:19 -07:00
Matt Arsenault	05b75e006b	AMDGPU/NewPM: Port AMDGPULateCodeGenPrepare to new pass manager (#102806)	2024-08-12 15:09:12 +04:00
Jeffrey Byrnes	03936534b5	[AMDGPU] Protect against null entries in ValMap Change-Id: Icbda7c3fecf38679d06006986e5e17cb1f1b8749	2024-07-22 16:50:54 -07:00
Jeffrey Byrnes	6e68b75e66	[AMDGPU] Reland: Do not use original PHIs in coercion chains Change-Id: I579b5c69a85997f168ed35354b326524b6f84ef7	2024-07-19 09:02:28 -07:00
Jay Foad	1612e4a351	Revert "[AMDGPU] Do not use original PHIs in coercion chains (#98063)" This reverts commit dc8ea046a516c3bdd0ece306f406c9ea833d4dac. It generated broken IR as described here: https://github.com/llvm/llvm-project/pull/98063#issuecomment-2225259451	2024-07-15 15:15:29 +01:00
Jeffrey Byrnes	dc8ea046a5	[AMDGPU] Do not use original PHIs in coercion chains (#98063) It's possible that we are unable to coerce all the incoming values of a PHINode (A). Thus, we are unable to coerce the PHINode. In this situation, we previously would add the PHINode back to the ValMap. This would cause a problem if PHINode (B) was a user of A. In this scenario, if B has been coerced, we would hit an assert regarding the incompatible type between the PHINode and its incoming value. Deleting non-coerced PHINodes from the map, and propagating the removal to users, resolves the issue.	2024-07-10 11:32:45 -07:00
Jeffrey Byrnes	5da7179cb3	[AMDGPU] Reland: Add IR LiveReg type-based optimization	2024-07-03 09:26:19 -07:00
Vitaly Buka	3e53c97d33	Revert "[AMDGPU] Add IR LiveReg type-based optimization" (#97138) Part of #66838. https://lab.llvm.org/buildbot/#/builders/52/builds/404 https://lab.llvm.org/buildbot/#/builders/55/builds/358 https://lab.llvm.org/buildbot/#/builders/164/builds/518 This reverts commit ded956440739ae326a99cbaef18ce4362e972679.	2024-06-28 23:18:26 -07:00
Jeffrey Byrnes	ded9564407	[AMDGPU] Add IR LiveReg type-based optimization Change-Id: Ia0d11b79b8302e79247fe193ccabc0dad2d359a0	2024-06-28 15:01:39 -07:00
Jay Foad	89226ecbb9	[AMDGPU] Do not widen scalar loads on GFX12 (#78724) GFX12 has subword scalar loads so there is no need to do this.	2024-01-19 15:30:07 +00:00
Jay Foad	4a77414660	[AMDGPU] CodeGen for GFX12 8/16-bit SMEM loads (#77633)	2024-01-17 10:28:03 +00:00
Matt Arsenault	3e16167c14	AMDGPU: Use getTypeStoreSizeInBits	2023-04-29 10:35:06 -04:00
Matt Arsenault	4202ad5d94	AMDGPU: Don't create a pointer bitcast in AMDGPULateCodeGenPrepare	2023-04-29 10:34:21 -04:00
pvanhout	036431e31e	[AMDGPU] Use UniformityAnalysis in LateCodeGenPrepare Reviewed By: foad Differential Revision: https://reviews.llvm.org/D145366	2023-03-06 13:35:57 +01:00
Kazu Hirata	4bef0304e1	[AArch64, AMDGPU] Use make_early_inc_range (NFC)	2021-11-03 09:22:51 -07:00
Nikita Popov	357756ecf6	[OpaquePtr] Remove uses of CreateConstGEP1_64() without element type Remove uses of to-be-deprecated API.	2021-07-17 16:43:20 +02:00
Matt Arsenault	a15ed701ab	AMDGPU: Fix assert on constant load from addrspacecasted pointer This was trying to create a bitcast between different address spaces.	2021-05-11 20:12:20 -04:00
Nikita Popov	46354bac76	[OpaquePtrs] Remove some uses of type-less CreateLoad APIs (NFC) Explicitly pass loaded type when creating loads, in preparation for the deprecation of these APIs. There are still a couple of uses left.	2021-03-11 14:40:57 +01:00
dfukalov	6a87e9b08b	[NFC][AMDGPU] Reduce include files dependency. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813	2021-01-07 22:22:05 +03:00
Michael Liao	46c3d5cb05	[amdgpu] Add the late codegen preparation pass. Summary: - Teach that pass to widen naturally aligned but not DWORD aligned sub-DWORD loads. Reviewers: rampitec, arsenm Subscribers: Tags: #llvm Differential Revision: https://reviews.llvm.org/D80364	2020-10-27 14:07:59 -04:00

37 Commits