1744 Commits

Author SHA1 Message Date
Diana Picus
24405f070f
[AMDGPU] Add intrinsic exposing s_alloc_vgpr (#163951)
Make it possible to use `s_alloc_vgpr` at the IR level. This is a huge
footgun and use for anything other than compiler internal purposes is
heavily discouraged. The calling code must make sure that it does not
allocate fewer VGPRs than necessary - the intrinsic is NOT a request to
the backend to limit the number of VGPRs it uses (in essence it's not so
different from what we do with the dynamic VGPR flags of the
`amdgcn.cs.chain` intrinsic, it just makes it possible to use this
functionality in other scenarios).
2026-02-10 09:28:31 +01:00
Pierre van Houtryve
b79ba02479
[AMDGPU][GFX12.5] Reimplement monitor load as an atomic operation (#177343)
Load monitor operations make more sense as atomic operations, as
non-atomic operations cannot be used for inter-thread communication
without additional synchronization.
The previous built-in made it work because one could just override the
CPol bits, but that bypasses the memory model and forces the user to learn
about ISA bit encodings.

Making load monitor an atomic operation has a couple of advantages.
First, the memory model foundation for it is stronger. We just lean on the
existing rules for atomic operations. Second, the CPol bits are abstracted away
from the user, which avoids leaking ISA details into the API.

This patch also adds supporting memory model and intrinsics
documentation to AMDGPUUsage.

Solves SWDEV-516398.
2026-02-09 09:57:27 +01:00
paperchalice
c53acf0443
[SelectionDAGBuilder] Remove NoNaNsFPMath uses (#169904)
Replaced by checking fast-math flags or value tracking results.
2026-02-09 09:48:07 +08:00
Nicolai Hähnle
3e1e86ef1f
[AMDGPU] Return two MMOs for load-to-lds and store-from-lds intrinsics (#175845)
Accurately represent both the load and the store part of those intrinsics.

The test changes seem to be mostly fairly insignificant changes caused
by subtly different scheduler behavior.
2026-02-04 12:29:49 -08:00
Diana Picus
9022f47ca4
[AMDGPU] Implement llvm.sponentry (#176357)
In some of our use cases, the GPU runtime stores some data at the top of
the stack. It figures out where it's safe to store it by using the PAL
metadata generated by the backend, which includes the total stack size.
However, the metadata does not include the space reserved at the bottom
of the stack for the trap handler when CWSR is enabled in dynamic VGPR
mode. This space is reserved dynamically based on whether or not the
code is running on the compute queue. Therefore, the runtime needs a way
to take that into account.

Add support for `llvm.sponentry`, which should return the base of the
stack, skipping over any reserved areas. This allows us to keep this
computation in one place rather than duplicate it between the backend
and the runtime.

The implementation for functions that set up their own stack uses a
pseudo that is expanded to the same code sequence as that used in the
prolog to set up the stack in the first place.

In callable functions, we generate a fixed stack object and use that
instead, similar to the Arm/AArch64 approach. This wastes some stack
space but that's not a problem for now because we're not planning to
use this in callable functions yet.
2026-02-03 15:02:07 +01:00
Nicolai Hähnle
6f0b873f1c
[CodeGen] Refactor targets to override the new getTgtMemIntrinsic overload (NFC) (#175844)
This is a fairly mechanical change. Instead of returning true/false,
we either keep the Infos vector empty or push one entry.
2026-02-02 17:40:02 -08:00
Aaditya
4ded7e0733
[AMDGPU] Add wave reduce intrinsics for double types - 2 (#170812)
Supported Ops: `add`, `sub`
2026-01-30 18:13:25 +05:30
Aaditya
4238693e09
[AMDGPU] Add wave reduce intrinsics for double types - 1 (#170811)
Supported Ops: `min`, `max`
2026-01-30 10:12:44 +01:00
Carl Ritson
447f1e43bb
[AMDGPU] Implement llvm.fptosi.sat and llvm.fptoui.sat (#174726)
Certain graphics APIs explicitly want the semantics of saturated
conversions, particularly w.r.t. edge cases like NaN. The underlying
hardware instructions (v_cvt_*) provide the expected behaviour so
llvm.fptosi.sat and llvm.fptoui.sat can be implemented directly.

Limitations:
- conversion to i64 is not handled (default expansion is used)
- v_cvt_u16_f16 and v_cvt_i16_f16 are not utilized (future work)
- scalar float is untested/unoptimized (future work)
2026-01-30 17:07:40 +09:00
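The saturating semantics referred to in the entry above (#174726) can be illustrated with a small host-side reference model. This is a minimal sketch of the rules the LLVM Language Reference gives for `llvm.fptosi.sat` and `llvm.fptoui.sat` on an `f32` to `i32` conversion; the function names `fptosiSatI32` and `fptouiSatI32` are hypothetical and this is not the backend's lowering code.

```cpp
#include <cstdint>
#include <limits>

// Saturating float -> i32 conversion: NaN gives 0, out-of-range inputs clamp
// to the nearest representable integer, in-range inputs truncate toward zero.
constexpr int32_t fptosiSatI32(float x) {
  if (x != x) // only NaN compares unequal to itself
    return 0;
  if (x <= -2147483648.0f)
    return std::numeric_limits<int32_t>::min();
  if (x >= 2147483648.0f)
    return std::numeric_limits<int32_t>::max();
  return static_cast<int32_t>(x); // in range, so the cast is well defined
}

// Unsigned counterpart: NaN and negative inputs give 0, large inputs clamp.
constexpr uint32_t fptouiSatI32(float x) {
  if (x != x)
    return 0;
  if (x <= 0.0f)
    return 0;
  if (x >= 4294967296.0f)
    return std::numeric_limits<uint32_t>::max();
  return static_cast<uint32_t>(x);
}
```

A plain C++ cast has undefined behavior for NaN and out-of-range inputs, which is why the edge cases are spelled out explicitly here.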
Kewen Meng
120b482375
Revert "[AMDGPU] Replace AMDGPUISD::FFBH_I32 with ISD::CTLS" (#178837)
Revert to unblock buildbot:
https://lab.llvm.org/buildbot/#/builders/206/builds/12769
2026-01-29 21:19:15 -08:00
Dmitry Sidorov
65925b0405
[AMDGPU] Replace AMDGPUISD::FFBH_I32 with ISD::CTLS (#178420)
Per CDNA4 ISA:
V_FFBH_I32
Count the number of leading bits that are the same as the sign bit of a
vector input and store the result into a vector register. Store -1 if
all input bits are the same.

which matches CTLS semantics.

Addresses: https://github.com/llvm/llvm-project/issues/177635
2026-01-30 01:36:28 +01:00
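The ISA text quoted in the entry above (#178420) can be read as the following reference function. This is a sketch of a literal reading of that description only, assuming the count includes the sign bit itself; the name `ffbhI32` is hypothetical, and whether `ISD::CTLS` uses the same counting convention is the commit's claim rather than something this sketch establishes.

```cpp
#include <cstdint>

// Count the leading bits that equal the sign bit, scanning from bit 31 down.
// Returns -1 when all 32 bits are the same (the input is 0 or -1).
constexpr int32_t ffbhI32(int32_t value) {
  const uint32_t bits = static_cast<uint32_t>(value);
  const uint32_t sign = bits >> 31;
  int32_t count = 0;
  for (int pos = 31; pos >= 0; --pos) {
    if (((bits >> pos) & 1u) != sign)
      return count; // first bit that differs from the sign bit
    ++count;
  }
  return -1; // every bit matched the sign bit
}
```

Under this reading, an input of `1` yields 31 and an input of `0x40000000` yields 1.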
Carl Ritson
12c13e0009
[AMDGPU][GFX1250] Implement offset handling in s.buffer.load (#178389)
Divergent path of s.buffer.load must handle 32b offset extension
behaviour on GFX1250.
Tests in llvm.amdgcn.s.buffer.load.ll are rewritten to avoid using
export instructions not available on GFX1250.
2026-01-29 18:00:48 +09:00
macurtis-amd
5d018e93fe
AMDGPU: Perform zero/any extend combine into permute (#177370)
Increases opportunities to generate permutes.
Motivated by sub-optimal code generation in a CK kernel.
2026-01-28 10:47:22 -06:00
Mariusz Sikora
3c0f5045e1
[AMDGPU] Add FeatureGFX13 and SMEM encoding for gfx13 (#177567)
For now, the list of features is based on gfx12 and gfx1250.

---------

Co-authored-by: Jay Foad <jay.foad@amd.com>
2026-01-26 14:16:36 +01:00
Shilei Tian
786a20710d
[NFCI][AMDGPU] Use GET_SUBTARGETINFO_MACRO in GCNSubtarget.h and R600Subtarget.h (#177402)
We can finally get rid of the manually defined boolean variables, like
other targets. Even though most of them are now defined by macros, we
still need to add the entries.
2026-01-25 09:38:42 -05:00
Matt Arsenault
98b55bcdec
AMDGPU: Move f16 legality configuration to SITargetLowering (#177629)
f16 is never legal for R600 so this should not be in the common
base class.
2026-01-23 18:36:26 +00:00
Sam Elliott
7184229fea
[NFC][MI] Tidy Up RegState enum use (2/2) (#177090)
This change makes `RegState` into an enum class, with bitwise operators.
It also:
- Updates declarations of flag variables/arguments/returns from
`unsigned` to `RegState`.
- Updates empty RegState initializers from 0 to `{}`.

If this is causing problems in downstream code:
- Adopt the `RegState getXXXRegState(bool)` functions instead of using a
ternary operator such as `bool ? RegState::XXX : 0`.
- Adopt the `bool hasRegState(RegState, RegState)` function instead of
using a bitwise check of the flags.
2026-01-23 00:19:03 -08:00
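The migration advice in the entry above (#177090) can be sketched with a stand-alone enum class. Everything here lives in a hypothetical `sketch` namespace: the flag values, the choice of `Kill` as the example flag, and the exact behavior of `hasRegState` are assumptions for illustration, not LLVM's actual definitions.

```cpp
#include <cstdint>

namespace sketch {

// Illustrative flag values only; not LLVM's real encoding.
enum class RegState : uint32_t {
  Define = 1u << 0,
  Implicit = 1u << 1,
  Kill = 1u << 2,
};

// A scoped enum has no implicit integer conversions, so the bitwise
// operators must be provided explicitly.
constexpr RegState operator|(RegState a, RegState b) {
  return static_cast<RegState>(static_cast<uint32_t>(a) |
                               static_cast<uint32_t>(b));
}
constexpr RegState operator&(RegState a, RegState b) {
  return static_cast<RegState>(static_cast<uint32_t>(a) &
                               static_cast<uint32_t>(b));
}

// Stands in for `isKill ? RegState::Kill : 0`, which stops compiling once
// RegState is an enum class because 0 is not a RegState.
constexpr RegState getKillRegState(bool isKill) {
  return isKill ? RegState::Kill : RegState{};
}

// Stands in for a raw bitwise test such as `(flags & Kill) != 0`.
constexpr bool hasRegState(RegState flags, RegState test) {
  return (flags & test) != RegState{};
}

} // namespace sketch
```

The empty initializer `RegState{}` plays the role that a literal `0` played when the flags were plain `unsigned` values.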
Matt Arsenault
3c40eadfca
AMDGPU: Avoid introducing illegal fminnum_ieee/fmaxnum_ieee (#177418)
Avoid introducing fminnum_ieee/fmaxnum_ieee on f16 if f16
is not legal. This avoids regressing minimum/maximum cases
in a future commit.
2026-01-22 21:48:51 +01:00
Jameson Nash
d10b2b566a
[NFCI] replace getValueType with new getGlobalSize query (#177186)
Returns uint64_t to simplify callers. The goal is to eventually replace
getValueType with this query, which should return the known minimum
reference-able size, as provided (instead of a Type) during create.
Additionally the common isSized query would be replaced with an
isExactKnownSize query to test if that size is an exact definition.
2026-01-22 13:55:53 -05:00
Matt Arsenault
056e5a32c8
AMDGPU: Change ABI of 16-bit scalar values for gfx6/gfx7 (#175795)
Keep bf16/f16 values encoded as the low half of a 32-bit register,
instead of promoting to float. This avoids unwanted FP effects
from the fpext/fptrunc which should not be implied by just
passing an argument. This also fixes ABI divergence between
SelectionDAG and GlobalISel.

I've wanted to make this change for ages, and failed the last
few times. The main complication was the hack to return
shader integer types in SGPRs, which now needs to inspect
the underlying IR type.
2026-01-22 18:34:06 +00:00
Shilei Tian
4b1cfc5d7c
[NFCI][AMDGPU] Final touch before moving to GET_SUBTARGETINFO_MACRO (#177401)
2026-01-22 17:33:17 +00:00
Matt Arsenault
a97f5ec95f
AMDGPU: Change ABI of 16-bit element vectors on gfx6/7 (#175781)
Fix ABI on old subtargets to match new subtargets, packing
16-bit element subvectors into 32-bit registers. Previously
this would be scalarized and promoted to i32/float.

Note this only changes the vector cases. Scalar i16/half are
still promoted to i32/float for now. I've unsuccessfully tried
to make that switch in the past, so leave that for later.

This will help with removal of softPromoteHalfType.
2026-01-22 17:24:29 +01:00
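The packing described in the entry above (#175781) amounts to placing two 16-bit elements in one 32-bit register value instead of widening each element to 32 bits. The sketch below shows only that bit layout, assuming element 0 occupies the low half; the helper names `packV2x16` and `unpackElt` are hypothetical and do not correspond to backend functions.

```cpp
#include <cstdint>

// Pack two 16-bit elements into a single 32-bit register value.
// Assumption: element 0 goes in bits [15:0], element 1 in bits [31:16].
constexpr uint32_t packV2x16(uint16_t elt0, uint16_t elt1) {
  return static_cast<uint32_t>(elt0) | (static_cast<uint32_t>(elt1) << 16);
}

// Extract element 0 or 1 from a packed register value.
constexpr uint16_t unpackElt(uint32_t reg, unsigned index) {
  return static_cast<uint16_t>(reg >> (16u * (index & 1u)));
}
```

The element bits are carried through unchanged, so no conversion takes place when a value is passed.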
Shilei Tian
02d34a76f7
[NFCI][AMDGPU] Remove more redundant code from GCNSubtarget.h (#177297)
We are getting pretty close to using `GET_SUBTARGETINFO_MACRO` in the
header with this cleanup.
2026-01-22 09:07:15 -05:00
Shilei Tian
1843a7fe9f
[NFCI][AMDGPU] Use X-macro to reduce boilerplate in GCNSubtarget.h (#176844)
`GCNSubtarget.h` contained a large amount of repetitive code following
the pattern `bool HasXXX = false;` for member declarations and `bool
hasXXX() const { return HasXXX; }` for getters. This boilerplate made
the file unnecessarily long and harder to maintain.

This patch introduces an X-macro pattern `GCN_SUBTARGET_HAS_FEATURE`
that consolidates 135 simple subtarget features into a single list. The
macro is expanded twice: once in the protected section to generate
member variable declarations, and once in the public section to generate
the corresponding getter methods. This reduces the file by approximately
600 lines while preserving the exact same API and functionality.
Features with complex getter logic or inconsistent naming conventions
are left as manual implementations for future improvement.

Ideally, these could be generated by TableGen using
`GET_SUBTARGETINFO_MACRO`, similar to the X86 backend. However,
`AMDGPU.td` has several issues that prevent direct adoption: duplicate
field names (e.g., `DumpCode` is set by both `FeatureDumpCode` and
`FeatureDumpCodeLower`), and inconsistent naming conventions where many
features don't have the `Has` prefix (e.g., `FlatAddressSpace`,
`GFX10Insts`, `FP64`). Fixing these issues would require renaming fields
in `AMDGPU.td` and updating all references, which is left for future
work.
2026-01-21 15:29:09 -05:00
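The X-macro pattern described in the entry above (#176844) can be sketched in a few lines. The class name `SketchSubtarget`, the macro names, and the feature names `Alpha` and `Beta` are all hypothetical; only the `bool HasXXX = false;` member and `hasXXX()` getter shapes come from the commit message.

```cpp
// One list of simple features; each entry is expanded by whatever macro is
// passed in as X.
#define SKETCH_SUBTARGET_HAS_FEATURE(X) \
  X(Alpha)                              \
  X(Beta)

class SketchSubtarget {
protected:
  // First expansion: one `bool HasXXX = false;` member per feature.
#define SKETCH_DECL_MEMBER(Name) bool Has##Name = false;
  SKETCH_SUBTARGET_HAS_FEATURE(SKETCH_DECL_MEMBER)
#undef SKETCH_DECL_MEMBER

public:
  // Second expansion: one `hasXXX()` getter per feature.
#define SKETCH_DECL_GETTER(Name) \
  constexpr bool has##Name() const { return Has##Name; }
  SKETCH_SUBTARGET_HAS_FEATURE(SKETCH_DECL_GETTER)
#undef SKETCH_DECL_GETTER
};
```

Adding a feature then means adding a single `X(Name)` line, and the member and getter cannot drift apart.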
Matt Arsenault
9bd0db7ad5
AMDGPU: Handle FP in integer in argument lowering (#175835)
This avoids an assertion when softPromoteHalfType is
enabled.
2026-01-20 20:20:52 +00:00
Brox Chen
dd83ead9a5
[AMDGPU][True16] extractEltcheap check 16bit in true16 mode (#171762)
2026-01-20 09:45:05 -05:00
Frederik Harwath
5fec9fb3cf
[AMDGPU] Enable ISD::{FSIN,FCOS} custom lowering to work on v2f16 (#176382)
Currently ISD::FSIN and ISD::FCOS of type MVT::v2f16 are legalized by
first expanding and then using a custom lowering on the resulting f16
instructions. This ordering prevents using packed math variants of the
instructions introduced by the legalization (e.g. the multiplication) and
makes it difficult to deal with the resulting IR in peephole
optimizations (e.g. si-peephole-sdwa).

Change the legalization action for ISD::FSIN and ISD::FCOS of type
MVT::v2f16 to Custom and change the custom trig lowering to deal
with vectors.
2026-01-20 07:35:54 +01:00
Hongyu Chen
007f1af30e
[AMDGPU] Use APInt in performSetCCCombine (#176564)
Fixes #176559.
2026-01-20 09:14:25 +08:00
Akshay Deodhar
3860147a7f
[NFC][TargetLowering] Make shouldExpandAtomicRMWInIR and shouldExpandAtomicCmpXchgInIR take a const Instruction pointer (#176073)
Splits out change from https://github.com/llvm/llvm-project/pull/176015

Changes shouldExpandAtomicRMWInIR to take a constant argument: This is
to allow some other TargetLowering constant-argument functions to call
it. This change touches several backends. An alternative solution
exists, but to me, this seems the "right" way.
2026-01-15 14:22:57 -08:00
Frederik Harwath
4e00719777
[AMDGPU] Remove unnecessary AddPromotedToType use from SIIselLowering (NFC) (#175994)
2026-01-14 19:38:25 +01:00
Matt Arsenault
2e0e4f6cb3
AMDGPU: Directly use v2bf16 as register type for bf16 vectors. (#175761)
Previously we were casting v2bf16 to i32, unlike the f16 case. Simplify
this by using the natural vector type. This is probably a leftover from
before v2bf16 was treated as legal. This is preparation for fixing a
miscompile in GlobalISel.
2026-01-13 17:48:38 +01:00
Shilei Tian
5a63367b15
Reapply "[AMDGPU] Rework the clamp support for WMMA instructions" (#174674) (#174697)
This reverts commit 0b2f3cfb72a76fa90f3ec2a234caabe0d0712590.
2026-01-07 06:12:19 +00:00
dyung
0b2f3cfb72
Revert "[AMDGPU] Rework the clamp support for WMMA instructions" (#174674)
Reverts llvm/llvm-project#174310

This change is causing 2 cross-project-test failures on
https://lab.llvm.org/buildbot/#/builders/174/builds/29695
2026-01-07 01:18:23 +00:00
Shilei Tian
ccca3b8c67
[AMDGPU] Rework the clamp support for WMMA instructions (#174310)
Fixes #166989.
2026-01-06 15:46:40 -05:00
saxlungs
c262893f4b
Reland "[AMDGPU] Add new llvm.amdgcn.wave.shuffle intrinsic (#167372)" (#174614)
This change adds a new intrinsic for AMDGPU that implements a wave
shuffle, allowing arbitrary swizzling between lanes using an index. In
the initial version of this commit, there was an issue in one of the
tests added that returned a signal, causing testing to fail when
combined with another recent change to 'not'.

For context on the initial commit see #167372

---------

Signed-off-by: Domenic Nutile <domenic.nutile@gmail.com>
Co-authored-by: Jay Foad <jay.foad@gmail.com>
2026-01-06 15:02:08 -05:00
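The "arbitrary swizzling between lanes using an index" in the entry above (#174614) can be modeled on the host as a per-lane gather. This is a sketch under stated assumptions: every lane is treated as active, and out-of-range indices wrap modulo the wave size, which is a choice made for this illustration and not documented behavior of the intrinsic. The name `waveShuffle` is hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Each lane reads the value held by the lane its index selects.
template <std::size_t WaveSize>
constexpr std::array<uint32_t, WaveSize>
waveShuffle(const std::array<uint32_t, WaveSize> &values,
            const std::array<uint32_t, WaveSize> &indices) {
  std::array<uint32_t, WaveSize> result{};
  for (std::size_t lane = 0; lane < WaveSize; ++lane) {
    // Assumption for this sketch: wrap out-of-range indices.
    result[lane] = values[indices[lane] % WaveSize];
  }
  return result;
}
```

With a four-lane toy wave, values `{10, 20, 30, 40}` and indices `{3, 2, 1, 0}` produce the reversed sequence.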
Joe Nash
4bca00d56b
Revert "[AMDGPU] Add new llvm.amdgcn.wave.shuffle intrinsic" (#174501)
Reverts llvm/llvm-project#167372
2026-01-05 17:52:28 -05:00
saxlungs
b9fbc19017
[AMDGPU] Add new llvm.amdgcn.wave.shuffle intrinsic (#167372)
This intrinsic will be useful for implementing the
OpGroupNonUniformShuffle operation in the SPIR-V reference

---------

Signed-off-by: Domenic Nutile <domenic.nutile@gmail.com>
Co-authored-by: Jay Foad <jay.foad@gmail.com>
2026-01-05 17:15:58 -05:00
Matt Arsenault
9ad39dd116
AMDGPU: Avoid crashing on statepoint-like pseudoinstructions (#170657)
At the moment the MIR tests are somewhat redundant. The waitcnt
one is needed to ensure we actually have a load, given we are
currently just emitting an error on ExternalSymbol. The asm printer
one is more redundant for the moment, since it's stressed by the IR
test. However I am planning to change the error path for the IR test,
so it will soon not be redundant.
2025-12-29 19:08:08 +01:00
Islam Imad
7ceecfad40
[CodeGen] Fix EVT::changeVectorElementType assertion on simple-to-extended fallback (#173413)
Fixes #171608
2025-12-28 18:51:18 +00:00
Jay Foad
35c2dbd481
[AMDGPU] Remove trivially true predicates from GCNSubtarget. NFC. (#172830)
2025-12-18 11:05:34 +00:00
Matt Arsenault
68aea8e202
AMDGPU: Avoid introducing unnecessary fabs in fast fdiv lowering (#172553)
If the sign bit of the denominator is known 0, do not emit the fabs.
Also, extend this to handle min/max with fabs inputs.

I originally tried to do this as the general combine on fabs, but
it proved to be too much trouble at this time. This is mostly
complexity introduced by expanding the various min/maxes into
canonicalizes, and then not being able to assume the sign bit
of canonicalize (fabs x) without nnan.

This defends against future code size regressions in the atan2 and
atan2pi library functions.
2025-12-17 00:22:12 +01:00
Juan Manuel Martinez Caamaño
c13bf9eb26
Reapply "[AMDGPU][SDAG] Add missing cases for SI_INDIRECT_SRC/DST (#170323)" (#171838)
A buildbot failed for the original patch.

https://github.com/llvm/llvm-project/pull/171835 addresses the issue
raised by the buildbot.
After the fix is merged, the original patch is reapplied without any
change.
2025-12-15 09:05:00 +01:00
Matt Arsenault
2af693bbec
AMDGPU: Fix selection failure on bf16 inverse sqrt (#172044)
On !hasBF16TransInsts targets, an illegal rsq would form
and fail to select.
2025-12-12 18:10:08 +01:00
Juan Manuel Martinez Caamaño
c02978867e
Revert "[AMDGPU][SDAG] Add missing cases for SI_INDIRECT_SRC/DST (#170323)" (#171787)
```
Step 7 (test-check-all) failure: Test just built components: check-all completed (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/insert_vector_dynelt.ll' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 2
/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -mtriple=amdgcn -mcpu=fiji < /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll | /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck -enable-var-scope -check-prefixes=GCN /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll
# executed command: /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -mtriple=amdgcn -mcpu=fiji
# executed command: /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck -enable-var-scope -check-prefixes=GCN /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll
# RUN: at line 3
/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -O0 -mtriple=amdgcn -mcpu=fiji < /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll | /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck --check-prefixes=GCN-O0 /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll
# executed command: /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -O0 -mtriple=amdgcn -mcpu=fiji
# .---command stderr------------
# |
# | # After Instruction Selection
# | # Machine code for function insert_dyn_i32_6: IsSSA, TracksLiveness
# | Function Live Ins: $sgpr16 in %8, $sgpr17 in %9, $sgpr18 in %10, $sgpr19 in %11, $sgpr20 in %12, $sgpr21 in %13, $vgpr0 in %14, $vgpr1 in %15
# |
# | bb.0 (%ir-block.0):
# |   successors: %bb.1(0x80000000); %bb.1(100.00%)
# |   liveins: $sgpr16, $sgpr17, $sgpr18, $sgpr19, $sgpr20, $sgpr21, $vgpr0, $vgpr1
# |   %15:vgpr_32 = COPY $vgpr1
# |   %14:vgpr_32 = COPY $vgpr0
# |   %13:sgpr_32 = COPY $sgpr21
# |   %12:sgpr_32 = COPY $sgpr20
# |   %11:sgpr_32 = COPY $sgpr19
# |   %10:sgpr_32 = COPY $sgpr18
# |   %9:sgpr_32 = COPY $sgpr17
# |   %8:sgpr_32 = COPY $sgpr16
# |   %17:sgpr_192 = REG_SEQUENCE %8:sgpr_32, %subreg.sub0, %9:sgpr_32, %subreg.sub1, %10:sgpr_32, %subreg.sub2, %11:sgpr_32, %subreg.sub3, %12:sgpr_32, %subreg.sub4, %13:sgpr_32, %subreg.sub5
# |   %16:sgpr_192 = COPY %17:sgpr_192
# |   %19:vreg_192 = COPY %17:sgpr_192
# |   %28:sreg_64_xexec = IMPLICIT_DEF
# |   %27:sreg_64_xexec = S_MOV_B64 $exec
# |
# | bb.1:
# | ; predecessors: %bb.1, %bb.0
# |   successors: %bb.1(0x40000000), %bb.3(0x40000000); %bb.1(50.00%), %bb.3(50.00%)
# |
# |   %26:vreg_192 = PHI %19:vreg_192, %bb.0, %18:vreg_192, %bb.1
# |   %29:sreg_64 = PHI %28:sreg_64_xexec, %bb.0, %30:sreg_64, %bb.1
# |   %31:sreg_32_xm0 = V_READFIRSTLANE_B32 %14:vgpr_32, implicit $exec
# |   %32:sreg_64 = V_CMP_EQ_U32_e64 %31:sreg_32_xm0, %14:vgpr_32, implicit $exec
# |   %30:sreg_64 = S_AND_SAVEEXEC_B64 killed %32:sreg_64, implicit-def $exec, implicit-def $scc, implicit $exec
# |   $m0 = COPY killed %31:sreg_32_xm0
# |   %18:vreg_192 = V_INDIRECT_REG_WRITE_MOVREL_B32_V8 %26:vreg_192(tied-def 0), %15:vgpr_32, 3, implicit $m0, implicit $exec
# |   $exec = S_XOR_B64_term $exec, %30:sreg_64, implicit-def $scc
# |   S_CBRANCH_EXECNZ %bb.1, implicit $exec
# |
# | bb.3:
```

This reverts commit 15df9e701f1f1194a25e6123612cc735ad392ae4.
2025-12-11 10:08:20 +00:00
Juan Manuel Martinez Caamaño
15df9e701f
[AMDGPU][SDAG] Add missing cases for SI_INDIRECT_SRC/DST (#170323)
Before this patch, `insertelement/extractelement` with dynamic indices
would fail to select with `-O0` for vector 32-bit element types with
sizes 3, 5, 6 and 7, which did not map to a `SI_INDIRECT_SRC/DST`
pattern.

Other "weird" sizes bigger than 8 (like 13) are properly handled already.

To solve this issue we add the missing patterns for the problematic sizes.

Solves SWDEV-568862
2025-12-11 09:17:43 +01:00
Jay Foad
6ae0b9f586
[AMDGPU] Implement codegen for GFX11+ V_CVT_PK_[IU]16_F32 (#168719)
2025-12-10 22:26:59 +00:00
Mirko Brkušanin
5759a3a779
[AMDGPU] Add s_wakeup_barrier instruction for gfx1250 (#170501)
2025-12-10 09:45:13 +01:00
anjenner
27651133e2
AMDGPU: Drop and upgrade llvm.amdgcn.atomic.csub/cond.sub to atomicrmw (#105553)
These both perform conditional subtraction, returning the minuend and
zero respectively, if the difference is negative.
2025-12-09 23:13:33 +00:00
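The two conditional-subtraction behaviors mentioned in the entry above (#105553) correspond to the update rules the LLVM Language Reference gives for `atomicrmw usub_cond` and `atomicrmw usub_sat`. The sketch below assumes those are the upgrade targets (the message does not name them) and takes no position on which intrinsic maps to which operation. The helper names are hypothetical, and the compare-exchange loop is a host-side emulation, not what the backend emits.

```cpp
#include <atomic>
#include <cstdint>

// usub_cond: subtract only if it does not underflow, else keep the old value.
constexpr uint32_t usubCond(uint32_t old, uint32_t val) {
  return old >= val ? old - val : old;
}

// usub_sat: subtract, clamping the stored result at zero on underflow.
constexpr uint32_t usubSat(uint32_t old, uint32_t val) {
  return old >= val ? old - val : 0u;
}

// Emulates an atomicrmw with a compare-exchange loop. Like atomicrmw, it
// returns the value that was in memory before the update.
template <typename Update>
uint32_t atomicUpdate(std::atomic<uint32_t> &mem, uint32_t val, Update update) {
  uint32_t old = mem.load();
  // On failure compare_exchange_weak reloads `old`, so the new value is
  // recomputed from the fresh contents on the next iteration.
  while (!mem.compare_exchange_weak(old, update(old, val))) {
  }
  return old;
}
```

For example, `atomicUpdate(counter, 5, usubSat)` on a counter holding 3 would store 0 and return 3, while passing `usubCond` would leave the counter at 3.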
Shilei Tian
3ccd67295b
[AMDGPU] Fix a crash when a bool variable is used in inline asm (#171004)
Fixes SWDEV-570184.
2025-12-08 14:44:21 -05:00
Dark Steve
cc19f420b9
[AMDGPU][NPM] Port AMDGPUArgumentUsageInfo to NPM (#170886)
Port AMDGPUArgumentUsageInfo analysis to the NPM to fix suboptimal code
generation when NPM is enabled by default.

Previously, DAG.getPass() returned nullptr when using the NPM, causing the
argument usage info to be unavailable during ISel. This resulted in
fallback to FixedABIFunctionInfo which assumes all implicit arguments
are needed, generating unnecessary register setup code for entry
functions.

Fixes LLVM::CodeGen/AMDGPU/cc-entry.ll

Changes:
- Split AMDGPUArgumentUsageInfo into a data class and NPM analysis
wrapper
- Update SIISelLowering to use DAG.getMFAM() for NPM path
- Add RequireAnalysisPass in addPreISel() to ensure analysis
availability

This follows the same pattern used for PhysicalRegisterUsageInfo.
2025-12-08 20:38:00 +05:30