100 Commits

Author SHA1 Message Date
Shilei Tian
1f6e13a161 Revert "[AMDGPU][Attributor] Infer inreg attribute in AMDGPUAttributor (#146720)"
This reverts commit 84ab301554f8b8b16b94263a57b091b07e9204f2 because it breaks
several AMDGPU test bots.
2025-08-18 22:59:52 -04:00
Shilei Tian
84ab301554
[AMDGPU][Attributor] Infer inreg attribute in AMDGPUAttributor (#146720)
This patch introduces `AAAMDGPUUniformArgument`, which can infer the `inreg`
function argument attribute. The idea is that, for a function argument, if the
corresponding call site arguments are always uniform, we can mark it as `inreg`
and thus pass it via an SGPR.

In addition, this AA is also able to propagate the inreg attribute if feasible.
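The inference rule above can be sketched as a simple predicate (a hypothetical model for illustration only; the struct and function names are made up and the actual patch works on LLVM's Attributor abstractions, not these types):

```cpp
#include <vector>

// Hypothetical model, not the actual Attributor code: an argument is a
// candidate for `inreg` if the value passed at every call site is uniform.
struct CallSiteArg {
  bool IsUniform; // whether the value passed at this call site is uniform
};

bool canMarkInReg(const std::vector<CallSiteArg> &CallSiteArgs) {
  for (const CallSiteArg &A : CallSiteArgs)
    if (!A.IsUniform)
      return false; // one divergent caller blocks the inference
  return true; // all call sites pass uniform values -> pass via SGPR
}
```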
2025-08-18 22:01:47 -04:00
Matt Arsenault
e19743bd6c
AMDGPU: Remove unused TargetPassConfig include from attributor (#150892) 2025-07-28 20:58:25 +09:00
Juan Manuel Martinez Caamaño
862b9ea805
[AMDGPU] Remove AAInstanceInfo from the AMDGPUAttributor (#150232)
Related to compile-time issue SWDEV-543240 and functional issue
SWDEV-544256
2025-07-24 17:12:04 +02:00
Kazu Hirata
7c83d66719
[llvm] Remove unused includes (NFC) (#148768)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-07-14 22:19:14 -07:00
Shoreshen
181b014c06
Attributor: Infer noalias.addrspace metadata for memory instructions (#136553)
Add noalias.addrspace metadata for store, load and atomic instruction in
AMDGPU backend.
2025-07-08 09:50:31 +08:00
Matt Arsenault
2dcf436340
AMDGPU: Remove legacy pass manager version of AMDGPUAttributor (#145262) 2025-06-23 16:49:46 +09:00
Kazu Hirata
92cebab210
[IPO] Teach AbstractAttribute::getName to return StringRef (NFC) (#141313)
This patch addresses clang-tidy's readability-const-return-type by
dropping const from the return type while switching to StringRef at
the same time because these functions just return string constants.
2025-05-23 23:58:49 -07:00
Kazu Hirata
d9a1f8a3a8 [AMDGPU] Fix a warning
This patch fixes:

  llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp:1139:17: error: unused
  variable 'Func' [-Werror,-Wunused-variable]
2025-05-17 00:27:11 -07:00
Shilei Tian
578741b5e8
[AMDGPU][Attributor] Rework update of AAAMDWavesPerEU (#123995)
Currently, we use `AAAMDWavesPerEU` to iteratively update values based
on attributes from the associated function, potentially propagating
user-annotated values, along with `AAAMDFlatWorkGroupSize`. However,
since the value calculated from the flat workgroup size always dominates
the user annotation (i.e., the attribute), running `AAAMDWavesPerEU`
iteratively is unnecessary if no user-annotated value exists.

This PR completely rewrites how the `amdgpu-waves-per-eu` attribute is
handled in `AMDGPUAttributor`. The key changes are as follows:

- `AAAMDFlatWorkGroupSize` remains unchanged.
- `AAAMDWavesPerEU` now only propagates user-annotated values.
- A new function is added to check and update `amdgpu-waves-per-eu`
  based on the following rules:
  - No waves per EU, no flat workgroup size: assume a flat workgroup
    size of `1,1024` and compute waves per EU based on it.
  - No waves per EU, flat workgroup size exists: use the provided flat
    workgroup size to compute waves per EU.
  - Waves per EU exists, no flat workgroup size: this is a tricky case.
    In this PR, we assume a flat workgroup size of `1,1024`, but this
    can be adjusted if a different approach is preferred. Alternatively,
    we could directly use the user-annotated value.
  - Both waves per EU and flat workgroup size exist: if there is a
    conflict, the value derived from the flat workgroup size takes
    precedence over waves per EU.

This PR also updates the logic for merging two waves-per-EU pairs. The
current implementation, which uses `clampStateAndIndicateChange` to
compute a union, might not be ideal. From the perspective of ensuring
proper resource allocation, if one pair specifies a minimum of 2 waves
per EU and another specifies a minimum of 4, we should guarantee that 4
waves per EU can be supported, as failing to do so could result in
excessive resource allocation per wave. A similar principle applies to
the upper bound. Thus, the PR merges two pairs, `lo_a,up_a` and
`lo_b,up_b`, as `max(lo_a, lo_b), max(up_a, up_b)`. This ensures that
resource allocation adheres to the stricter constraints from both
inputs.
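The merge rule described above can be sketched as follows (a minimal standalone illustration; the `WavesPerEU` type and `mergeWavesPerEU` name are hypothetical, not taken from the patch):

```cpp
#include <algorithm>
#include <utility>

// Each pair is (lower bound, upper bound) on waves per EU.
using WavesPerEU = std::pair<unsigned, unsigned>;

// Take the stricter (larger) value on both ends, so the merged range
// respects the resource-allocation constraints implied by both inputs.
WavesPerEU mergeWavesPerEU(const WavesPerEU &A, const WavesPerEU &B) {
  return {std::max(A.first, B.first), std::max(A.second, B.second)};
}
```

For instance, merging `2,8` with `4,10` yields `4,10`: anything less than 4 waves per EU could over-allocate resources per wave for the second constraint.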

Fix #123092.
2025-05-17 01:01:09 -04:00
Austin Kerbow
2c9a46cce3
[AMDGPU] Move kernarg preload logic to separate pass (#130434)
Moves kernarg preload logic to its own module pass. Cloned function
declarations are removed when preloading hidden arguments. The inreg
attribute is now added in this pass instead of AMDGPUAttributor. The
rest of the logic is copied from AMDGPULowerKernelArguments, which now
only checks whether an argument is marked inreg to avoid replacing
direct uses of preloaded arguments. This change requires test updates to
remove inreg from lit tests with kernels that don't actually want
preloading.
2025-05-11 21:18:11 -07:00
Shilei Tian
d6dbe7799e
[AMDGPU][Attributor] Add ThinOrFullLTOPhase as an argument (#123994) 2025-05-02 11:33:56 -04:00
Lucas Ramirez
e377dc4d38
[AMDGPU] Max. WG size-induced occupancy limits max. waves/EU (#137807)
The default maximum waves/EU returned by the family of
`AMDGPUSubtarget::getWavesPerEU` is currently the maximum number of
waves/EU supported by the subtarget (only a valid occupancy range in
"amdgpu-waves-per-eu" may lower that maximum). This ignores maximum
achievable occupancy imposed by flat workgroup size and LDS usage,
resulting in situations where `AMDGPUSubtarget::getWavesPerEU` produces
a maximum higher than the one from
`AMDGPUSubtarget::getOccupancyWithWorkGroupSizes`.

This limits the waves/EU range's maximum to the maximum achievable
occupancy derived from flat workgroup sizes and LDS usage. This only has
an impact on functions which restrict flat workgroup size with
"amdgpu-flat-work-group-size", since the default range of flat workgroup
sizes achieves the maximum number of waves/EU supported by the
subtarget.

Improvements to the handling of "amdgpu-waves-per-eu" are left for a
follow up PR (e.g., I think the attribute should be able to lower the
full range of waves/EU produced by these methods).
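The clamping described above can be illustrated with a minimal sketch (the helper name is hypothetical, not the actual `AMDGPUSubtarget` API): the waves/EU maximum is capped by the occupancy achievable under the flat workgroup size and LDS limits.

```cpp
#include <algorithm>

// Hypothetical sketch: the reported maximum waves/EU is the subtarget's
// hardware maximum, capped by the maximum occupancy achievable given the
// flat workgroup sizes and LDS usage.
unsigned clampMaxWavesPerEU(unsigned SubtargetMaxWavesPerEU,
                            unsigned OccupancyWithWorkGroupSizes) {
  return std::min(SubtargetMaxWavesPerEU, OccupancyWithWorkGroupSizes);
}
```

With the default flat workgroup size range the achievable occupancy equals the subtarget maximum, which is why only functions restricting "amdgpu-flat-work-group-size" are affected.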
2025-05-01 13:22:23 +02:00
Rahul Joshi
a3754ade63
[NFC][LLVM][AMDGPU] Cleanup pass initialization for AMDGPU (#134410)
- Remove calls to pass initialization from pass constructors.
- https://github.com/llvm/llvm-project/issues/111767
2025-04-07 17:27:50 -07:00
Matt Arsenault
428e3a27c3
AMDGPU: Fix attributor not handling all trap intrinsics (#131758) 2025-03-19 10:17:28 +07:00
Matt Arsenault
a216358ce7
AMDGPU: Replace amdgpu-no-agpr with amdgpu-agpr-alloc (#129893)
This performs the minimal replacement of amdgpu-no-agpr with
amdgpu-agpr-alloc=0. Most of the test diffs are due to the new
attribute sorting later alphabetically.

We could do better by trying to perform range merging in the attributor,
and trying to pick non-0 values.
2025-03-06 09:17:51 +07:00
Matt Arsenault
44201679c6
AMDGPU: Fix mishandling of search for constantexpr addrspacecasts (#120346) 2024-12-20 07:37:19 +07:00
Shilei Tian
f4037277bb
[AMDGPU][Attributor] Make AAAMDWavesPerEU honor existing attribute (#114438) 2024-12-11 16:50:06 -05:00
Shilei Tian
7dbd6cd294
[AMDGPU][Attributor] Make AAAMDFlatWorkGroupSize honor existing attribute (#114357)
If a function has `amdgpu-flat-work-group-size`, honor it in `initialize` by
taking its value directly; otherwise, use the default range as a starting
point. We will no longer manipulate the known range, which can cause issues
because the known range is a "throttle" on the assumed range: the assumed
range can't be widened properly in `updateImpl` if the known range is not
set properly for whatever reason. Another benefit of not touching the known
range is that, if we indicate a pessimistic state, it also invalidates the
AA such that `manifest` will not be called. Since we honor the attribute,
we don't want, and will not add, any half-baked attribute to a function.
2024-12-11 16:47:51 -05:00
Shilei Tian
04269ea0e4
[AMDGPU] Re-enable closed-world assumption as an opt-in feature (#115371)
Although the ABI (if one exists) doesn’t explicitly prohibit
cross-code-object function calls—particularly since our loader can
handle them—such calls are not actually allowed in any of the officially
supported programming models. However, this limitation has some nuances.
For instance, the loader can handle cross-code-object global variables,
which complicates the situation further.

Given this complexity, assuming a closed-world model at link time isn’t
always safe. To address this, this PR introduces an option that enables
this assumption, providing end users the flexibility to enable it for
improved compiler optimizations. However, it is the user’s
responsibility to ensure they do not violate this assumption.
2024-12-10 15:57:41 -05:00
Jun Wang
41ed16c3b3
Reapply "[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (#94647)" (#118907)
This reverts commit 1ef9410a96c1d9669a6feaf03fcab8d0a4a13bd5.

This fixes the test file attributor-flatscratchinit-globalisel.ll.
2024-12-09 16:44:48 -08:00
Matt Arsenault
664a226bf6
AMDGPU: Propagate amdgpu-max-num-workgroups attribute (#113018)
I'm not sure what the interpretation of 0 is supposed to be;
AMDGPUUsage doesn't say.
2024-12-09 09:57:27 -06:00
Philip Reames
1ef9410a96 Revert "[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (#94647)"
This reverts commit e6aec2c12095cc7debd1a8004c8535eef41f4c36.  Commit breaks "ninja check-llvm" on x86 host.
2024-12-04 15:37:25 -08:00
Jun Wang
e6aec2c120
[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (#94647)
The AMDGPUAnnotateKernelFeatures pass infers the "amdgpu-calls" and
"amdgpu-stack-objects" attributes, which are used to infer whether we
need to initialize flat scratch. This is, however, not precise. Instead,
we should use AMDGPUAttributor and infer amdgpu-no-flat-scratch-init on
kernels. Refer to https://github.com/llvm/llvm-project/issues/63586 .
2024-12-04 14:10:15 -08:00
Shilei Tian
9234ae1bbe [NFC] clang-format -i llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp 2024-10-31 11:44:15 -04:00
Shilei Tian
9a7519fdb3 Revert "[NFC][AMDGPU][Attributor] Exit earlier if entry CC (#114177)"
This reverts commit 922a0d3dfe2db7a2ef50e8cef4537fa94a7b95bb.
2024-10-30 00:53:43 -04:00
Shilei Tian
922a0d3dfe
[NFC][AMDGPU][Attributor] Exit earlier if entry CC (#114177)
Avoid calling TTI or other stuff unnecessarily
2024-10-30 00:42:44 -04:00
Shilei Tian
3de5dbb111
[AMDGPU][Attributor] Check the validity of a dependent AA before using its value (#114165)
Even though the Attributor framework will invalidate all its dependent
AAs after the current iteration, a dependent AA can still use the worst
state of a depending AA if it doesn't check the state of the depending
AA in the current iteration.
2024-10-29 23:43:45 -04:00
Shilei Tian
0446b403b0
[NFC][AMDGPU][Attributor] Only iterate over filtered functions when creating AAs (#108417) 2024-09-12 13:41:15 -04:00
Shilei Tian
ce2e38653f
[Attributor] Add support for atomic operations in AAAddressSpace (#106927) 2024-09-06 12:45:16 -04:00
Shilei Tian
84ed3c29e8
Revert "[AMDGPU][LTO] Assume closed world after linking (#105845)" (#106889)
We can't assume closed world even in the full LTO post-link stage. It is
only true if we are building a "GPU executable". However, AMDGPU does
support "dynamic library", and I'm not aware of any approach to tell
whether it is a relocatable link when we create the pass. For now, let's
revert the patch as it is currently breaking things. We can re-enable it
once we can handle it correctly.
2024-09-01 09:32:08 -04:00
Shilei Tian
d880f5a4c9
[AMDGPU][Attributor] Remove uniformity check in the indirect call specialization callback (#106177)
This patch removes the conservative uniformity check in the indirect
call specialization callback, as whether the function pointer is uniform
doesn't matter too much. Instead, we add an argument to control
specialization.
2024-08-27 12:27:17 -04:00
Anshil Gandhi
033e225d90
Revert "Revert "[AMDGPU][LTO] Assume closed world after linking (#105845)" (#106000)" (#106001)
This reverts commit 4b6c064dd124c70ff163411dff120c6174e0e022.

Add a requirement for an amdgpu target in the test.
2024-08-25 17:23:36 -04:00
Anshil Gandhi
4b6c064dd1
Revert "[AMDGPU][LTO] Assume closed world after linking (#105845)" (#106000)
This reverts commit 33f3ebc86e7d3afcb65c551feba5bbc2421b42ed.
2024-08-25 14:56:39 -04:00
Anshil Gandhi
33f3ebc86e
[AMDGPU][LTO] Assume closed world after linking (#105845) 2024-08-25 14:06:29 -04:00
Shilei Tian
1ca9fe6db3 Reapply "[Attributor][AMDGPU] Enable AAIndirectCallInfo for AMDAttributor (#100952)"
This reverts commit 36467bfe89f231458eafda3edb916c028f1f0619.
2024-08-14 17:16:47 -04:00
Shilei Tian
786c409234
[AMDGPU][Attributor] Add a pass parameter closed-world for AMDGPUAttributor pass (#101760) 2024-08-09 22:12:09 -04:00
Shilei Tian
36467bfe89 Revert "Reapply "[Attributor][AMDGPU] Enable AAIndirectCallInfo for AMDAttributor (#100952)""
This reverts commit 7a68449a82ab1c1ab005caa72c1d986ca5deca36.

https://lab.llvm.org/buildbot/#/builders/123/builds/3205
2024-08-07 09:22:48 -04:00
Shilei Tian
7a68449a82 Reapply "[Attributor][AMDGPU] Enable AAIndirectCallInfo for AMDAttributor (#100952)"
This reverts commit 874cd100a076f3b98aaae09f90ef224682501538.
2024-08-06 22:46:32 -04:00
Shilei Tian
a0afcbfb5d
[AMDGPU] Enable AAAddressSpace in AMDGPUAttributor (#101593) 2024-08-06 15:27:18 -04:00
Shilei Tian
874cd100a0 Revert "[Attributor][AMDGPU] Enable AAIndirectCallInfo for AMDAttributor (#100952)"
This reverts commit ab819d7cf86932e4a47b5bf6aadea9d714a313a9.
2024-08-02 18:31:21 -04:00
Shilei Tian
ab819d7cf8
[Attributor][AMDGPU] Enable AAIndirectCallInfo for AMDAttributor (#100952) 2024-08-02 17:23:18 -04:00
Shilei Tian
423aec6573
[NFC][AMDGPU] Reformat code for creating AA (#101591) 2024-08-02 08:58:10 -04:00
Jay Foad
74b87b02d2 [AMDGPU] Fix and add namespace closing comments. NFC. 2024-07-16 16:56:31 +01:00
Kazu Hirata
c18bcd0a57
[Target] Use StringRef::operator== instead of StringRef::equals (NFC) (#91072) (#91138)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of
  38 under llvm/ in terms of their usage.

- The elimination of StringRef::equals brings StringRef closer to
  std::string_view, which has operator== but not equals.

- S == "foo" is more readable than S.equals("foo"), especially for
  !Long.Expression.equals("str") vs Long.Expression != "str".
2024-05-05 13:43:10 -07:00
Matt Arsenault
b6b703b2df
AMDGPU: Infer no-agpr usage in AMDGPUAttributor (#85948)
SIMachineFunctionInfo has a scan of the function body for inline asm
which may use AGPRs, or callees in SIMachineFunctionInfo. Move this
into the attributor, so it actually works interprocedurally.

Could probably avoid most of the test churn if this bothered to avoid
adding this on subtargets without AGPRs. We should also probably
try to delete the MIR scan in usesAGPRs, but it seems to be trickier
to eliminate.
2024-03-21 14:24:06 +05:30
Emma Pilkington
bc82cfb38d
[AMDGPU] Add an asm directive to track code_object_version (#76267)
Named '.amdhsa_code_object_version'. This directive sets the
e_ident[ABIVERSION] in the ELF header, and should be used as the assumed
COV for the rest of the asm file.

This commit also weakens the --amdhsa-code-object-version CL flag.
Previously, the CL flag took precedence over the IR flag. Now the IR
flag/asm directive take precedence over the CL flag. This is implemented
by merging a few COV-checking functions in AMDGPUBaseInfo.h.
2024-01-21 11:54:47 -05:00
Matt Arsenault
d34a10a47d
AMDGPU: Port AMDGPUAttributor to new pass manager (#71349) 2023-11-07 15:40:40 +09:00
Austin Kerbow
60a227c464 [AMDGPU] Use inreg for hint to preload kernel arguments
This patch is the first in a series that adds support for pre-loading
kernel arguments into SGPRs. The command-line argument
'amdgpu-kernarg-preload-count' is used to specify the number of
arguments, sequentially from the first, that we should attempt to
preload; the default is 0.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D156852
2023-09-19 15:13:38 -07:00
Johannes Doerfert
d015018cb7 [AMDGPUAttributor][FIX] No endless recursion for recursive initializers
Fixes: https://github.com/llvm/llvm-project/issues/63956
2023-07-19 10:27:01 -07:00