37 Commits

Author SHA1 Message Date
michaelselehov
3645cef1ef
[AMDGPU] LiveRegOptimizer: consider i8/i16 binops on SDWA (#155800)
PHI-node part was merged with PR#160909.

Extend `isOpLegal` to treat 8/16-bit vector add/sub/and/or/xor as
profitable on SDWA targets (stores and intrinsics remain profitable).
This repacks loop-carried values to i32 across BBs and restores SDWA
lowering instead of scattered lshr/lshl/or sequences.

Testing:
- Local: `check-llvm-codegen-amdgpu` is green (4314/4320 passed, 6
XFAIL).
- Additional: validated in AMD internal CI
2025-12-15 12:04:33 -05:00
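The profitability rule described in this message can be sketched as a small standalone model. This is only an illustration under stated assumptions: `OpKind`, `isProfitableModel`, and the `HasSDWA` flag are hypothetical names, and the real `isOpLegal` inspects LLVM instructions rather than an enum.

```cpp
#include <cassert>

// Hypothetical operation kinds; the real pass looks at llvm::Instruction.
enum class OpKind { Add, Sub, And, Or, Xor, Mul, Store, Intrinsic };

// Model of the rule in the commit message (an assumption, not the LLVM code):
// stores and intrinsics stay profitable, and 8/16-bit vector
// add/sub/and/or/xor become profitable only when the target has SDWA.
inline bool isProfitableModel(OpKind Op, unsigned ElemBits, bool HasSDWA) {
  if (Op == OpKind::Store || Op == OpKind::Intrinsic)
    return true;
  if (!HasSDWA || (ElemBits != 8 && ElemBits != 16))
    return false;
  switch (Op) {
  case OpKind::Add:
  case OpKind::Sub:
  case OpKind::And:
  case OpKind::Or:
  case OpKind::Xor:
    return true;
  default:
    return false;
  }
}
```

In this model a `v4i8` add is a candidate for repacking on an SDWA target and is left alone otherwise.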
michaelselehov
617854f819
[AMDGPU] LRO: allow same-BB non-lookthrough users for PHI (#160909)
Loop headers frequently consume the loop-carried value in the header
block via non-lookthrough ops (e.g. byte-wise vector binops).
LiveRegOptimizer’s same-BB filter currently prunes these users, so the
loop-carried PHI is not coerced to i32 and the intended packed form is
lost.

Relax the filter: when the def is a PHI, allow same-BB non-lookthrough
users. Also fix the check to look at the user (CII) rather than the def
(II) so the walk does not terminate prematurely.
2025-09-30 00:57:07 +09:00
Ivan Kosarev
faca8c9ed4
[AMDGPU][NFC] Only include CodeGenPassBuilder.h where needed. (#154769)
Saves around 125-210 MB of compilation memory usage per source for
roughly one third of our backend sources, ~60 MB on average.
2025-08-22 10:05:06 +01:00
Jim M. R. Teichgräber
5d54a576fe
[AMDGPU] AMDGPULateCodeGenPrepare Legacy PM: replace setPreservesAll() with setPreservesCFG() (#148167)
This PR depends on #148165; the first commit
(90f1d0a881a21a8b4f192622d798c290770fda63) belongs to that PR. The
changes are distinct, so separate PRs seemed like the best option. I
don't have commit access, so I couldn't use user-branches to mark the
dependency.

AMDGPULateCodeGenPrepare actually performs changes that invalidate
Uniformity Analysis, so use `setPreservesCFG()` to mark this instead of
`setPreservesAll()`, which wrongly includes preserving Uniformity
Analysis.

Note that before #148165, this would still have preserved Uniformity
Analysis, hence the dependency. In addition, `amdgpu/llc-pipeline.ll`
needs to be changed when both changes are in effect, but those changes
would make the test fail if the PRs weren't based on one another.

Note on why this hasn't caused issues so far:
It just so happens that AMDGPULateCodeGenPrepare is always immediately
followed by AMDGPUUnifyDivergentExitNodes, which *does* invalidate most
analyses, including Uniformity. And because UnifyDivergentExitNodes only
looks at terminators, and LateCGP seemingly does not replace uniform
values with divergent values, or divergent values with uniform values,
and it only *inserts new values that are not looked at by
UnifyDivergentExitNodes*, this bug remained hidden.

---

I ran `git-clang-format` on my changes. I tested them using the
`check-llvm` target; no unexpected failures occurred after I made the
change to `amdgpu/llc-pipeline.ll`.
2025-08-12 19:40:02 +09:00
Pierre van Houtryve
ed87f0afba
[AMDGPU] Visit all PHIs in each call to optimizeLiveType (#147522)
Make the Visited set a local variable. Otherwise we can reject a PHI
(one that does not have a zeroinitializer constant) but still mark it
as visited, and the rest of the function thinks the PHI is ok when it
isn't.
This is a bit crude but it's the only fix that consistently worked in my
testing.

Fixes SWDEV-541767
2025-07-10 09:29:48 +02:00
Ramkumar Ramachandra
b40e4ceaa6
[ValueTracking] Make Depth last default arg (NFC) (#142384)
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either using a
clever caching technique, or writing an easily-invalidable KnownBits
analysis, make the Depth argument in APIs in ValueTracking uniformly the
last argument with a default value. This would aid in removing the
argument when the time comes, as many callers that currently pass 0
explicitly are now updated to omit the argument altogether.
2025-06-03 17:12:24 +01:00
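The effect of moving a recursion-depth parameter to the end can be shown with a toy function. This is a sketch only: `Node`, `MaxDepth`, `collectBitsBefore`, and `collectBitsAfter` are hypothetical and do not mirror the actual ValueTracking signatures.

```cpp
#include <cassert>

// Hypothetical operand chain standing in for an IR value.
struct Node {
  unsigned Bits;
  const Node *Operand; // nullptr for a leaf
};

constexpr unsigned MaxDepth = 2; // hypothetical recursion limit

// Before: Depth sits in the middle, so every top-level caller has to
// spell out the initial depth of 0, even when using the other defaults.
inline unsigned collectBitsBefore(const Node *N, unsigned Depth,
                                  bool IncludeSelf = true) {
  if (!N || Depth > MaxDepth)
    return 0;
  unsigned Self = IncludeSelf ? N->Bits : 0;
  return Self | collectBitsBefore(N->Operand, Depth + 1, true);
}

// After: Depth is the last argument with a default, so top-level callers
// omit it and only the recursive call passes it explicitly.
inline unsigned collectBitsAfter(const Node *N, bool IncludeSelf = true,
                                 unsigned Depth = 0) {
  if (!N || Depth > MaxDepth)
    return 0;
  unsigned Self = IncludeSelf ? N->Bits : 0;
  return Self | collectBitsAfter(N->Operand, true, Depth + 1);
}
```

With the second form, dropping the parameter later touches only the function itself and its recursive call, not the callers that rely on the default.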
Kazu Hirata
510c8a23e6
[llvm] Use llvm::find_if (NFC) (#139654) 2025-05-12 22:58:30 -07:00
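For readers unfamiliar with range-based wrappers of this kind, the idea can be sketched with the standard library alone. The `sketch::find_if` helper and `firstNegative` below are hypothetical stand-ins, not the LLVM implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

namespace sketch {
// Range-based wrapper: takes the whole range instead of a begin/end pair.
template <typename Range, typename Pred>
auto find_if(Range &&R, Pred P) -> decltype(std::begin(R)) {
  return std::find_if(std::begin(R), std::end(R), P);
}
} // namespace sketch

// Returns the first negative element, or 0 if there is none.
inline int firstNegative(const std::vector<int> &V) {
  auto It = sketch::find_if(V, [](int X) { return X < 0; });
  return It == V.end() ? 0 : *It;
}
```

The call site names the container once, which removes the chance of pairing `begin()` and `end()` from different ranges.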
Jay Foad
886f1199f0
[AMDGPU] Use variadic isa<>. NFC. (#137016) 2025-04-24 08:19:09 +01:00
Jeffrey Byrnes
bf12954715
[AMDGPU] Whitelist all intrinsics (#130150)
For code maintainability -- this may result in cases where we are
applying the optimization where it is not profitable, but those are
likely to be rare.
2025-03-06 15:39:11 -08:00
choikwa
45759fe5b4
[AMDGPU] Filter candidates of LiveRegOptimizer for profitable cases (#124624)
It is known that a vector whose elements fit in i16 will be split and
scalarized in SelectionDAG's type legalizer
(see SIISelLowering::getPreferredVectorAction).

LRO attempts to undo the scalarizing of vectors across basic block
boundaries and shoehorn Values into VGPRs. LRO is beneficial for
operations that natively work on illegal vector types, since it prevents
flip-flopping between unpacked and packed forms. If we know that
operations on a vector will be split and scalarized, then we don't want
to shoehorn them back into packed VGPRs.

Operations that we know to work natively on illegal vector types usually
come in the form of intrinsics (MFMA, DOT8), buffer stores, shuffles, and
phi nodes, to name a few.
2025-03-05 18:44:48 -05:00
Kazu Hirata
66c31f5d02
[AMDGPU] Avoid repeated hash lookups (NFC) (#126401)
This patch just cleans up the "if" condition.  Further cleanups are
left to subsequent patches.
2025-02-08 23:17:06 -08:00
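The general pattern behind this kind of cleanup can be sketched with `std::unordered_map`. This is an assumed, simplified example; `getOrComputeBefore` and `getOrComputeAfter` are hypothetical names and the actual patch works on LLVM containers.

```cpp
#include <cassert>
#include <unordered_map>

// Before: the key is hashed up to three times (find, operator[], operator[]).
inline int getOrComputeBefore(std::unordered_map<int, int> &Cache, int Key) {
  if (Cache.find(Key) == Cache.end())
    Cache[Key] = Key * Key;
  return Cache[Key];
}

// After: a single insert() both looks the key up and reserves the slot.
inline int getOrComputeAfter(std::unordered_map<int, int> &Cache, int Key) {
  auto Res = Cache.insert({Key, 0}); // Res.second is true on first insertion
  if (Res.second)
    Res.first->second = Key * Key;
  return Res.first->second;
}
```

Both functions return the same values; the second simply reuses the iterator from the first lookup.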
Shilei Tian
f15da5fb78
[AMDGPU] Fix an invalid cast in AMDGPULateCodeGenPrepare::visitLoadInst (#122494)
Fixes: SWDEV-507695
2025-01-12 23:40:25 -05:00
Jay Foad
f9f7c42ca6
[AMDGPU] Refine AMDGPULateCodeGenPrepare class. NFC. (#118792)
Use references instead of pointers for most state and initialize it all
in the constructor, and similarly for the LiveRegOptimizer class.
2024-12-05 14:05:51 +00:00
Jay Foad
3923e0451a
[AMDGPU] Preserve all analyses if nothing changed (#117994) 2024-11-28 14:33:05 +00:00
Kazu Hirata
be187369a0
[AMDGPU] Remove unused includes (NFC) (#116154)
Identified with misc-include-cleaner.
2024-11-13 21:10:03 -08:00
Kazu Hirata
0cb80c4f00
[AMDGPU] Avoid repeated hash lookups (NFC) (#113409) 2024-10-22 23:02:34 -07:00
Jay Foad
8d13e7b8c3
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)
2024-10-03 13:07:54 +01:00
Kazu Hirata
d7db094340
[AMDGPU] Avoid repeated hash lookups (NFC) (#109506) 2024-09-21 00:02:19 -07:00
Matt Arsenault
05b75e006b
AMDGPU/NewPM: Port AMDGPULateCodeGenPrepare to new pass manager (#102806) 2024-08-12 15:09:12 +04:00
Jeffrey Byrnes
03936534b5 [AMDGPU] Protect against null entries in ValMap
Change-Id: Icbda7c3fecf38679d06006986e5e17cb1f1b8749
2024-07-22 16:50:54 -07:00
Jeffrey Byrnes
6e68b75e66 [AMDGPU] Reland: Do not use original PHIs in coercion chains
Change-Id: I579b5c69a85997f168ed35354b326524b6f84ef7
2024-07-19 09:02:28 -07:00
Jay Foad
1612e4a351 Revert "[AMDGPU] Do not use original PHIs in coercion chains (#98063)"
This reverts commit dc8ea046a516c3bdd0ece306f406c9ea833d4dac.

It generated broken IR as described here:
https://github.com/llvm/llvm-project/pull/98063#issuecomment-2225259451
2024-07-15 15:15:29 +01:00
Jeffrey Byrnes
dc8ea046a5
[AMDGPU] Do not use original PHIs in coercion chains (#98063)
It's possible that we are unable to coerce all the incoming values of a
PHINode (A). Thus, we are unable to coerce the PHINode. In this
situation, we previously would add the PHINode back to the ValMap. This
would cause a problem if PHINode (B) was a user of A. In this scenario,
if B has been coerced, we would hit an assert regarding the incompatible
type between the PHINode and its incoming value.

Deleting non-coerced PHINodes from the map, and propagating the removal
to users, resolves the issue.
2024-07-10 11:32:45 -07:00
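The "delete and propagate to users" step can be modeled as a worklist over a small graph. The sketch below is an assumption-laden illustration: PHIs are plain integer ids, `Users` and `Coerced` are hypothetical containers, and `dropUncoerced` is not the function used in the pass.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Users[P] lists the PHIs that take P as an incoming value.
// Coerced holds the PHIs that currently have a coerced replacement.
// Removes Failed and, transitively, every coerced PHI that depends on a
// removed one, so no coerced PHI is left with a non-coerced PHI input.
inline void dropUncoerced(int Failed,
                          const std::map<int, std::vector<int>> &Users,
                          std::set<int> &Coerced) {
  std::vector<int> Worklist{Failed};
  while (!Worklist.empty()) {
    int P = Worklist.back();
    Worklist.pop_back();
    Coerced.erase(P);
    auto It = Users.find(P);
    if (It == Users.end())
      continue;
    for (int U : It->second)
      if (Coerced.count(U)) // only still-coerced users need removing
        Worklist.push_back(U);
  }
}
```

In the A/B scenario from the message, failing to coerce A also removes B from the coerced set, so B keeps its original type and matches its incoming value.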
Jeffrey Byrnes
5da7179cb3 [AMDGPU] Reland: Add IR LiveReg type-based optimization 2024-07-03 09:26:19 -07:00
Vitaly Buka
3e53c97d33
Revert "[AMDGPU] Add IR LiveReg type-based optimization" (#97138)
Part of #66838.

https://lab.llvm.org/buildbot/#/builders/52/builds/404
https://lab.llvm.org/buildbot/#/builders/55/builds/358
https://lab.llvm.org/buildbot/#/builders/164/builds/518

This reverts commit ded956440739ae326a99cbaef18ce4362e972679.
2024-06-28 23:18:26 -07:00
Jeffrey Byrnes
ded9564407 [AMDGPU] Add IR LiveReg type-based optimization
Change-Id: Ia0d11b79b8302e79247fe193ccabc0dad2d359a0
2024-06-28 15:01:39 -07:00
Jay Foad
89226ecbb9
[AMDGPU] Do not widen scalar loads on GFX12 (#78724)
GFX12 has subword scalar loads so there is no need to do this.
2024-01-19 15:30:07 +00:00
Jay Foad
4a77414660
[AMDGPU] CodeGen for GFX12 8/16-bit SMEM loads (#77633) 2024-01-17 10:28:03 +00:00
Matt Arsenault
3e16167c14 AMDGPU: Use getTypeStoreSizeInBits 2023-04-29 10:35:06 -04:00
Matt Arsenault
4202ad5d94 AMDGPU: Don't create a pointer bitcast in AMDGPULateCodeGenPrepare 2023-04-29 10:34:21 -04:00
pvanhout
036431e31e [AMDGPU] Use UniformityAnalysis in LateCodeGenPrepare
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D145366
2023-03-06 13:35:57 +01:00
Kazu Hirata
4bef0304e1 [AArch64, AMDGPU] Use make_early_inc_range (NFC) 2021-11-03 09:22:51 -07:00
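The early-increment idiom that such a range adaptor packages can be sketched by hand with `std::list`. `eraseEvens` is a hypothetical example, not code from the patch.

```cpp
#include <cassert>
#include <list>

// Removes every even element while iterating. The iterator is advanced
// before the current element is processed, so erasing the current element
// does not invalidate the loop iterator.
inline void eraseEvens(std::list<int> &Values) {
  for (auto It = Values.begin(), End = Values.end(); It != End;) {
    auto Cur = It++; // step first, then work on the saved position
    if (*Cur % 2 == 0)
      Values.erase(Cur);
  }
}
```

A range adaptor lets the same loop be written as a range-based `for`, with the "step first" part hidden inside the iterator.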
Nikita Popov
357756ecf6 [OpaquePtr] Remove uses of CreateConstGEP1_64() without element type
Remove uses of to-be-deprecated API.
2021-07-17 16:43:20 +02:00
Matt Arsenault
a15ed701ab AMDGPU: Fix assert on constant load from addrspacecasted pointer
This was trying to create a bitcast between different address spaces.
2021-05-11 20:12:20 -04:00
Nikita Popov
46354bac76 [OpaquePtrs] Remove some uses of type-less CreateLoad APIs (NFC)
Explicitly pass loaded type when creating loads, in preparation
for the deprecation of these APIs.

There are still a couple of uses left.
2021-03-11 14:40:57 +01:00
dfukalov
6a87e9b08b [NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813
2021-01-07 22:22:05 +03:00
Michael Liao
46c3d5cb05 [amdgpu] Add the late codegen preparation pass.
Summary:
- Teach that pass to widen naturally aligned but not DWORD aligned
  sub-DWORD loads.

Reviewers: rampitec, arsenm

Subscribers:

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80364
2020-10-27 14:07:59 -04:00
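The arithmetic behind widening a sub-DWORD load can be sketched on a byte array. This is a model under stated assumptions: little-endian byte order, a hypothetical `loadAlignedDword` helper, and plain integers instead of IR; it is not the pass's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical little-endian memory model: reads the 32-bit word that
// starts at a DWORD-aligned byte offset.
inline uint32_t loadAlignedDword(const std::vector<uint8_t> &Mem,
                                 size_t Offset) {
  assert(Offset % 4 == 0 && Offset + 4 <= Mem.size());
  return uint32_t(Mem[Offset]) | uint32_t(Mem[Offset + 1]) << 8 |
         uint32_t(Mem[Offset + 2]) << 16 | uint32_t(Mem[Offset + 3]) << 24;
}

// Emulates a naturally aligned 16-bit load with a DWORD load: round the
// address down to a multiple of 4, shift the wanted bytes to the bottom,
// then truncate. Natural alignment guarantees the value does not straddle
// two DWORDs.
inline uint16_t widenedLoad16(const std::vector<uint8_t> &Mem, size_t Offset) {
  assert(Offset % 2 == 0);
  size_t Base = Offset & ~size_t(3);
  unsigned ShiftBits = unsigned(Offset - Base) * 8;
  return uint16_t(loadAlignedDword(Mem, Base) >> ShiftBits);
}

// Same idea for a single byte; any byte offset is naturally aligned.
inline uint8_t widenedLoad8(const std::vector<uint8_t> &Mem, size_t Offset) {
  size_t Base = Offset & ~size_t(3);
  unsigned ShiftBits = unsigned(Offset - Base) * 8;
  return uint8_t(loadAlignedDword(Mem, Base) >> ShiftBits);
}
```

For a load that is already DWORD aligned the shift is zero, so only the truncation remains.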