84 Commits

Author SHA1 Message Date
Matt Arsenault
44201679c6
AMDGPU: Fix mishandling of search for constantexpr addrspacecasts (#120346) 2024-12-20 07:37:19 +07:00
Shilei Tian
f4037277bb
[AMDGPU][Attributor] Make AAAMDWavesPerEU honor existing attribute (#114438) 2024-12-11 16:50:06 -05:00
Shilei Tian
7dbd6cd294
[AMDGPU][Attributor] Make AAAMDFlatWorkGroupSize honor existing attribute (#114357)
If a function has `amdgpu-flat-work-group-size`, honor it in `initialize` by
taking its value directly; otherwise, use the default range as a starting
point. We no longer manipulate the known range, which can cause issues
because the known range is a "throttle" on the assumed range: the
assumed range can't be widened properly in `updateImpl` if the known range is
not set properly for whatever reason. Another benefit of not touching the known
range is that, if we indicate a pessimistic state, it also invalidates the AA so
that `manifest` will not be called. Since we honor the attribute, we do not want
to, and will not, add any half-baked attribute to a function.
2024-12-11 16:47:51 -05:00
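The "throttle" behavior described in #114357 can be illustrated with a minimal sketch. This is a simplified interval model, not the Attributor's actual state class: `Range`, `RangeState`, `unionAssumed`, and `initialize` below are hypothetical names chosen for illustration, and the clamping rule is an assumption about how a known range bounds an assumed one.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical inclusive interval [lo, hi]; not an LLVM type.
struct Range {
  unsigned lo;
  unsigned hi;
};

// Simplified model: the assumed range may grow, but only inside the known range.
struct RangeState {
  Range known;
  Range assumed;

  // Widen the assumed range to cover r, then clamp it to the known range.
  void unionAssumed(Range r) {
    assert(r.lo <= r.hi && "expected a non-empty range");
    Range widened{std::min(assumed.lo, r.lo), std::max(assumed.hi, r.hi)};
    assumed = {std::max(widened.lo, known.lo), std::min(widened.hi, known.hi)};
  }
};

// Start from the attribute value if present, otherwise from the default range.
// The known range is left at the default, so it never blocks later widening.
RangeState initialize(bool hasAttr, Range attr, Range defaultRange) {
  Range start = hasAttr ? attr : defaultRange;
  return RangeState{defaultRange, start};
}
```

With the known range left at the default, a later `unionAssumed({32, 256})` on a state started from `{64, 128}` yields `{32, 256}`. If the known range had instead been narrowed to `{64, 128}`, the same call would be clamped back to `{64, 128}`, which is the failure mode the commit message describes.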
Shilei Tian
04269ea0e4
[AMDGPU] Re-enable closed-world assumption as an opt-in feature (#115371)
Although the ABI (if one exists) doesn’t explicitly prohibit
cross-code-object function calls—particularly since our loader can
handle them—such calls are not actually allowed in any of the officially
supported programming models. However, this limitation has some nuances.
For instance, the loader can handle cross-code-object global variables,
which complicates the situation further.

Given this complexity, assuming a closed-world model at link time isn’t
always safe. To address this, this PR introduces an option that enables
this assumption, providing end users the flexibility to enable it for
improved compiler optimizations. However, it is the user’s
responsibility to ensure they do not violate this assumption.
2024-12-10 15:57:41 -05:00
Jun Wang
41ed16c3b3
Reapply "[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (#94647)" (#118907)
This reverts commit 1ef9410a96c1d9669a6feaf03fcab8d0a4a13bd5.

This fixes the test file attributor-flatscratchinit-globalisel.ll.
2024-12-09 16:44:48 -08:00
Matt Arsenault
664a226bf6
AMDGPU: Propagate amdgpu-max-num-workgroups attribute (#113018)
I'm not sure what the interpretation of 0 is supposed to be,
AMDGPUUsage doesn't say.
2024-12-09 09:57:27 -06:00
Philip Reames
1ef9410a96 Revert "[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (#94647)"
This reverts commit e6aec2c12095cc7debd1a8004c8535eef41f4c36. Commit breaks "ninja check-llvm" on x86 host.
2024-12-04 15:37:25 -08:00
Jun Wang
e6aec2c120
[AMDGPU] Infer amdgpu-no-flat-scratch-init attribute in AMDGPUAttributor (#94647)
The AMDGPUAnnotateKernelFeatures pass infers the "amdgpu-calls" and
"amdgpu-stack-objects" attributes, which are used to infer whether we
need to initialize flat scratch. This is, however, not precise. Instead,
we should use AMDGPUAttributor and infer amdgpu-no-flat-scratch-init on
kernels. Refer to https://github.com/llvm/llvm-project/issues/63586 .
2024-12-04 14:10:15 -08:00
Shilei Tian
9234ae1bbe [NFC] clang-format -i llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp 2024-10-31 11:44:15 -04:00
Shilei Tian
9a7519fdb3 Revert "[NFC][AMDGPU][Attributor] Exit earlier if entry CC (#114177)"
This reverts commit 922a0d3dfe2db7a2ef50e8cef4537fa94a7b95bb.
2024-10-30 00:53:43 -04:00
Shilei Tian
922a0d3dfe
[NFC][AMDGPU][Attributor] Exit earlier if entry CC (#114177)
Avoid calling TTI or other stuff unnecessarily
2024-10-30 00:42:44 -04:00
Shilei Tian
3de5dbb111
[AMDGPU][Attributor] Check the validity of a dependent AA before using its value (#114165)
Even though the Attributor framework will invalidate all its dependent
AAs after the current iteration, a dependent AA can still use the worst
state of a depending AA if it doesn't check the state of the depending
AA in the current iteration.
2024-10-29 23:43:45 -04:00
Shilei Tian
0446b403b0
[NFC][AMDGPU][Attributor] Only iterate over filtered functions when creating AAs (#108417) 2024-09-12 13:41:15 -04:00
Shilei Tian
ce2e38653f
[Attributor] Add support for atomic operations in AAAddressSpace (#106927) 2024-09-06 12:45:16 -04:00
Shilei Tian
84ed3c29e8
Revert "[AMDGPU][LTO] Assume closed world after linking (#105845)" (#106889)
We can't assume a closed world even in the full LTO post-link stage. It is only
true if we are building a "GPU executable". However, AMDGPU does support
"dynamic library". I'm not aware of any approach to tell if it is a relocatable
link when we create the pass. For now let's revert the patch as it is currently
breaking things. We can re-enable it once we can handle it correctly.
2024-09-01 09:32:08 -04:00
Shilei Tian
d880f5a4c9
[AMDGPU][Attributor] Remove uniformity check in the indirect call specialization callback (#106177)
This patch removes the conservative uniformity check in the indirect call
specialization callback, as whether the function pointer is uniform doesn't
matter too much. Instead, we add an argument to control specialization.
2024-08-27 12:27:17 -04:00
Anshil Gandhi
033e225d90
Revert "Revert "[AMDGPU][LTO] Assume closed world after linking (#105845)" (#106000)" (#106001)
This reverts commit 4b6c064dd124c70ff163411dff120c6174e0e022.

Add a requirement for an amdgpu target in the test.
2024-08-25 17:23:36 -04:00
Anshil Gandhi
4b6c064dd1
Revert "[AMDGPU][LTO] Assume closed world after linking (#105845)" (#106000)
This reverts commit 33f3ebc86e7d3afcb65c551feba5bbc2421b42ed.
2024-08-25 14:56:39 -04:00
Anshil Gandhi
33f3ebc86e
[AMDGPU][LTO] Assume closed world after linking (#105845) 2024-08-25 14:06:29 -04:00
Shilei Tian
1ca9fe6db3 Reapply "[Attributor][AMDGPU] Enable AAIndirectCallInfo for AMDAttributor (#100952)"
This reverts commit 36467bfe89f231458eafda3edb916c028f1f0619.
2024-08-14 17:16:47 -04:00
Shilei Tian
786c409234
[AMDGPU][Attributor] Add a pass parameter closed-world for AMDGPUAttributor pass (#101760) 2024-08-09 22:12:09 -04:00
Shilei Tian
36467bfe89 Revert "Reapply "[Attributor][AMDGPU] Enable AAIndirectCallInfo for AMDAttributor (#100952)""
This reverts commit 7a68449a82ab1c1ab005caa72c1d986ca5deca36.

https://lab.llvm.org/buildbot/#/builders/123/builds/3205
2024-08-07 09:22:48 -04:00
Shilei Tian
7a68449a82 Reapply "[Attributor][AMDGPU] Enable AAIndirectCallInfo for AMDAttributor (#100952)"
This reverts commit 874cd100a076f3b98aaae09f90ef224682501538.
2024-08-06 22:46:32 -04:00
Shilei Tian
a0afcbfb5d
[AMDGPU] Enable AAAddressSpace in AMDGPUAttributor (#101593) 2024-08-06 15:27:18 -04:00
Shilei Tian
874cd100a0 Revert "[Attributor][AMDGPU] Enable AAIndirectCallInfo for AMDAttributor (#100952)"
This reverts commit ab819d7cf86932e4a47b5bf6aadea9d714a313a9.
2024-08-02 18:31:21 -04:00
Shilei Tian
ab819d7cf8
[Attributor][AMDGPU] Enable AAIndirectCallInfo for AMDAttributor (#100952) 2024-08-02 17:23:18 -04:00
Shilei Tian
423aec6573
[NFC][AMDGPU] Reformat code for creating AA (#101591) 2024-08-02 08:58:10 -04:00
Jay Foad
74b87b02d2 [AMDGPU] Fix and add namespace closing comments. NFC. 2024-07-16 16:56:31 +01:00
Kazu Hirata
c18bcd0a57
[Target] Use StringRef::operator== instead of StringRef::equals (NFC) (#91072) (#91138)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of
  38 under llvm/ in terms of their usage.

- The elimination of StringRef::equals brings StringRef closer to
  std::string_view, which has operator== but not equals.

- S == "foo" is more readable than S.equals("foo"), especially for
  !Long.Expression.equals("str") vs Long.Expression != "str".
2024-05-05 13:43:10 -07:00
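The readability argument in #91138 can be seen in a short sketch. To keep it self-contained, `std::string_view` stands in for `llvm::StringRef` (the commit message itself draws that comparison); the `Token` type and the function names are hypothetical.

```cpp
#include <string_view>

// Hypothetical holder type; std::string_view stands in for llvm::StringRef.
struct Token {
  std::string_view Text;
};

// Reads as a plain comparison, in the style of S == "foo".
bool isFoo(const Token &T) { return T.Text == "foo"; }

// The negated form is where the operator helps most: compare
// !T.Text.equals("foo") (the removed spelling) with T.Text != "foo".
bool isNotFoo(const Token &T) { return T.Text != "foo"; }
```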
Matt Arsenault
b6b703b2df
AMDGPU: Infer no-agpr usage in AMDGPUAttributor (#85948)
SIMachineFunctionInfo has a scan of the function body for inline asm
which may use AGPRs, or callees in SIMachineFunctionInfo. Move this
into the attributor, so it actually works interprocedurally.

Could probably avoid most of the test churn if this bothered to avoid
adding this on subtargets without AGPRs. We should also probably
try to delete the MIR scan in usesAGPRs but it seems to be trickier
to eliminate.
2024-03-21 14:24:06 +05:30
Emma Pilkington
bc82cfb38d
[AMDGPU] Add an asm directive to track code_object_version (#76267)
Named '.amdhsa_code_object_version'. This directive sets the
e_ident[ABIVERSION] in the ELF header, and should be used as the assumed
COV for the rest of the asm file.

This commit also weakens the --amdhsa-code-object-version CL flag.
Previously, the CL flag took precedence over the IR flag. Now the IR
flag/asm directive take precedence over the CL flag. This is implemented
by merging a few COV-checking functions in AMDGPUBaseInfo.h.
2024-01-21 11:54:47 -05:00
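The precedence change in #76267 amounts to a small resolution order, sketched below. The function and parameter names are hypothetical, and the final fallback to a compiler default is an assumption borrowed from the earlier module-flag commit rather than something this commit states.

```cpp
#include <optional>

// Hypothetical helper: pick the code object version to use.
// Order per the commit message: the IR module flag or asm directive wins
// over the command-line flag. Falling back to a compiler default when
// neither is present is an assumption made for this sketch.
unsigned resolveCodeObjectVersion(std::optional<unsigned> fromModuleOrDirective,
                                  std::optional<unsigned> fromCommandLine,
                                  unsigned compilerDefault) {
  if (fromModuleOrDirective)
    return *fromModuleOrDirective;
  if (fromCommandLine)
    return *fromCommandLine;
  return compilerDefault;
}
```

Before this commit the first two checks would have been in the opposite order, with the command-line flag taking precedence.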
Matt Arsenault
d34a10a47d
AMDGPU: Port AMDGPUAttributor to new pass manager (#71349) 2023-11-07 15:40:40 +09:00
Austin Kerbow
60a227c464 [AMDGPU] Use inreg for hint to preload kernel arguments
This patch is the first in a series that adds support for pre-loading
kernel arguments into SGPRs. The command-line argument
'amdgpu-kernarg-preload-count' is used to specify the number of
arguments sequentially from the first that we should attempt to preload;
the default is 0.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D156852
2023-09-19 15:13:38 -07:00
Johannes Doerfert
d015018cb7 [AMDGPUAttributor][FIX] No endless recursion for recursive initializers
Fixes: https://github.com/llvm/llvm-project/issues/63956
2023-07-19 10:27:01 -07:00
Johannes Doerfert
02a4fcec6b [Attributor] Port AANonNull to the isImpliedByIR interface
AANonNull is now the first AA that is always queried via the new APIs
and not created manually. Others will follow shortly to avoid trivial
AAs whenever possible.

This commit introduced some helper logic that will make it simpler to
port the next one. It also untangles AADereferenceable and AANonNull
such that the former does not keep a handle on the latter. Finally,
we stop deducing `nonnull` for `undef`, which was incorrect.
2023-07-09 16:04:19 -07:00
Johannes Doerfert
f086f383d8 [Attributor][NFCI] Move attribute collection and manifest to Attributor
Before, we checked and manifested attributes right in the IR. This was
bad as we modified the IR before the manifest stage. Now we can
add/remove/inspect attributes w/o going to the IR (except for the
initial query).
2023-07-03 11:57:30 -07:00
Johannes Doerfert
d33bca840a [Attributor] Introduce helpers to judge AAs prior to creation
This is a partial cleanup to centralize the initialization and update
decisions for AAs, lifting the burden and boilerplate from users and
making it harder to accidentally perform unsound deductions.

The two static helpers show how we can lift the decisions to generate an
AA into the Attributor, avoiding trivial AAs that just cost us compile
time and maintenance code (to check for pre-conditions).
2023-06-29 12:32:45 -07:00
Johannes Doerfert
c11d22c151 [AAAMDAttributes] AAPointerInfo depends on AAUnderlyingObjects 2023-06-29 09:18:35 -07:00
Johannes Doerfert
6adf352782 [AMDGPUAttributor][NFC] Make the debug output meaningful 2023-06-29 09:18:35 -07:00
Johannes Doerfert
e9fc399db3 [Attributor][NFCI] Use pointers to pass around AAs
This will make it easier to create less trivial AAs in the future as we
can simply return `nullptr` rather than an AA with an invalid state.
2023-06-23 17:21:20 -07:00
Matt Arsenault
7ed1aec3b3 AMDGPU: Remove unnecessary Attributor overrides 2023-06-16 15:04:08 -04:00
Matt Arsenault
b9c6d9e6c3 AMDGPU: Propagate amdgpu-waves-per-eu with attributor
This will do a value range merging down the callgraph, unlike the
current pass which can only propagate values to undecorated functions
from a kernel.

This one is a bit weird due to the interaction with the implied range
from amdgpu-flat-workgroup-size. At the default group range of 1,1024,
the minimum implied bound is 4, so this ends up introducing the
attribute on undecorated functions. We could probably simplify this by
ignoring it and propagating the raw values. The subtarget interaction
and the interaction with amdgpu-flat-workgroup-size only really clamp
invalid values (plus the lower bound doesn't seem to do anything as
far as I can tell anyway).
2023-06-16 15:04:08 -04:00
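One way to picture "value range merging down the callgraph" is the sketch below. It assumes merging means taking the hull of all callers' ranges for each undecorated callee and iterating to a fixed point; that reading, the string-keyed maps, and the names `hull` and `propagate` are all illustrative assumptions, not the attributor's implementation. It also ignores the subtarget and amdgpu-flat-workgroup-size interactions mentioned above.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Range = std::pair<unsigned, unsigned>; // inclusive [min, max]

// Smallest range covering both inputs.
Range hull(Range a, Range b) {
  return {std::min(a.first, b.first), std::max(a.second, b.second)};
}

// decorated: functions that already carry a range (for example kernels).
// calls: caller name -> callee names.
// Returns a range for every function reachable from a decorated one.
std::map<std::string, Range>
propagate(const std::map<std::string, Range> &decorated,
          const std::map<std::string, std::vector<std::string>> &calls) {
  std::map<std::string, Range> result = decorated;
  bool changed = true;
  while (changed) { // ranges only grow, so this terminates
    changed = false;
    for (const auto &[caller, callees] : calls) {
      auto it = result.find(caller);
      if (it == result.end())
        continue; // caller has no range yet
      const Range callerRange = it->second;
      for (const std::string &callee : callees) {
        if (decorated.count(callee))
          continue; // an existing attribute is kept as-is
        auto [pos, inserted] = result.emplace(callee, callerRange);
        if (inserted) {
          changed = true;
          continue;
        }
        Range merged = hull(pos->second, callerRange);
        if (merged != pos->second) {
          pos->second = merged;
          changed = true;
        }
      }
    }
  }
  return result;
}
```

In this model a helper called from two kernels with ranges `{2, 4}` and `{6, 8}` ends up with `{2, 8}`, and anything that helper calls inherits the same range. A pass that only copies a kernel's value to undecorated functions cannot express that merge.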
Changpeng Fang
7ca3444fba AMDGPU: Use module flag to get code object version at IR level follow-up
Summary:
  This is part of the leftover work for https://reviews.llvm.org/D143138.
In this work, we pass code object version as an argument to initialize target ID
and use it for targetID dump.

Reviewers: arsenm

Differential Revision
  https://reviews.llvm.org/D143293
2023-02-10 11:16:38 -08:00
Changpeng Fang
54cf69c9d5 AMDGPU: Use module flag to get code object version at IR level
Summary:
  This patch introduces a mechanism to check the code object version from the module flag. This avoids checking from the command line.
In case the module flag is missing, we use the current default code object version supported in the compiler.

For tools whose inputs are not IR, we may need another approach (a directive, for example) to check the code
object version. That will be in a separate patch later.

For the LIT test updates, we directly add the module flag if there is only a single code object version associated with all checks in one file.
In case of multiple code object versions in one file, we use the "sed" method to "clone" the checks to achieve the goal.

Reviewer: arsenm

Differential Revision:
  https://reviews.llvm.org/D14313
2023-02-02 18:57:26 -08:00
Matt Arsenault
4d4894ab92 Partially reapply "AMDGPU: Invert handling of enqueued block detection"
This mostly reverts commit 270e96f435596449002fc89962595497481c8770.

Keep the attributor related changes around, but functionally restore
the old behavior as a workaround. Device enqueue goes back to not
working at -O0 with this version.
2023-01-12 15:02:16 -05:00
Matt Arsenault
270e96f435 Revert "AMDGPU: Invert handling of enqueued block detection"
This reverts commit 47288cc977fa31c44cc92b4e65044a5b75c2597e.

The runtime is having trouble with this at -O0 when the inputs are
always enabled.
2023-01-07 21:48:07 -05:00
Matt Arsenault
47288cc977 AMDGPU: Invert handling of enqueued block detection
Invert the sense of the attribute and let the attributor figure this
out like everything else. If needed we can have the not-OpenCL
languages set amdgpu-no-default-queue and amdgpu-no-completion-action
up front so they never have to pay the cost.

There are also so many of these now, the offset use API should
probably consider all of them at once. Maybe they should merge into
one attribute with used fields. Having separate functions for each
field in AMDGPUBaseInfo is also not the greatest API (might as well
fix this when the patch to get the object version from the module
lands).
2023-01-06 21:16:08 -05:00
Sameer Sahasrabuddhe
9c1b82599d [AAPointerInfo] handle multiple offsets in PHI
Previously reverted in 8b446ea2ba39e406bcf940ea35d6efb4bb9afe95

Reapplying because this commit is NOT DEPENDENT on the reverted commit
fc21f2d7bae2e0be630470cc7ca9323ed5859892, which broke the ASAN buildbot.
See https://reviews.llvm.org/rGfc21f2d7bae2e0be630470cc7ca9323ed5859892 for
more information.

The arguments to a PHI may represent a recurrence by eventually using the output
of the PHI itself. This is now handled by checking for cycles in the control
flow. If a PHI is not in a recurrence, it is now able to report multiple offsets
instead of conservatively reporting unknown.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D138991
2022-12-18 10:51:20 +05:30
Mitch Phillips
525d6c54b5 Revert "[AAPointerInfo] handle multiple offsets in PHI"
This reverts commit 88db516af69619d4326edea37e52fc7321c33bb5.

Reason: This change is dependent on a commit that needs to be rolled
back because it broke the ASan buildbot. See
https://reviews.llvm.org/rGfc21f2d7bae2e0be630470cc7ca9323ed5859892 for
more information.
2022-12-16 17:55:48 -08:00
Mitch Phillips
7928a6387f Revert "Revert "[AAPointerInfo] handle multiple offsets in PHI""
This reverts commit 12696d302d146ffe616eecab3feceba9d29be2db.

Reason: This change is dependent on a commit that needs to be rolled
back because it broke the ASan buildbot. See
https://reviews.llvm.org/rGfc21f2d7bae2e0be630470cc7ca9323ed5859892 for
more information.
2022-12-16 17:55:38 -08:00