53 Commits

Author SHA1 Message Date
Matt Arsenault
d34a10a47d
AMDGPU: Port AMDGPUAttributor to new pass manager (#71349) 2023-11-07 15:40:40 +09:00
Austin Kerbow
60a227c464 [AMDGPU] Use inreg for hint to preload kernel arguments
This patch is the first in a series that adds support for pre-loading
kernel arguments into SGPRs. The command-line argument
'amdgpu-kernarg-preload-count' is used to specify the number of
arguments sequentially from the first that we should attempt to preload,
the default is 0.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D156852
2023-09-19 15:13:38 -07:00
Johannes Doerfert
d015018cb7 [AMDGPUAttributor][FIX] No endless recursion for recursive initializers
Fixes: https://github.com/llvm/llvm-project/issues/63956
2023-07-19 10:27:01 -07:00
Johannes Doerfert
02a4fcec6b [Attributor] Port AANonNull to the isImpliedByIR interface
AANonNull is now the first AA that is always queried via the new APIs
and not created manually. Others will follow shortly to avoid trivial
AAs whenever possible.

This commit introduced some helper logic that will make it simpler to
port the next one. It also untangles AADereferenceable and AANonNull
such that the former does not keep a handle on the latter. Finally,
we stop deducing `nonnull` for `undef`, which was incorrect.
2023-07-09 16:04:19 -07:00
Johannes Doerfert
f086f383d8 [Attributor][NFCI] Move attribute collection and manifest to Attributor
Before, we checked and manifested attributes right in the IR. This was
bad as we modified the IR before the manifest stage. Now we can
add/remove/inspect attributes w/o going to the IR (except for the
initial query).
2023-07-03 11:57:30 -07:00
Johannes Doerfert
d33bca840a [Attributor] Introduce helpers to judge AAs prior to creation
This is a partial cleanup to centralize the initialization and update
decisions for AAs. Lifting the burdon and boilerplate on users and
making it harder to accidentally perform unsound deductions.

The two static helpers show how we can lift the decisions to generate an
AA into the Attributor, avoiding trivial AAs that just cost us compile
time and maintenance code (to check for pre-conditions).
2023-06-29 12:32:45 -07:00
Johannes Doerfert
c11d22c151 [AAAMDAttributes] AAPointerInfo depends on AAUnderlyingObjects 2023-06-29 09:18:35 -07:00
Johannes Doerfert
6adf352782 [AMDGPUAttributor][NFC] Make the debug output meaningful 2023-06-29 09:18:35 -07:00
Johannes Doerfert
e9fc399db3 [Attributor][NFCI] Use pointers to pass around AAs
This will make it easier to create less trivial AAs in the future as we
can simply return `nullptr` rather than an AA with in invalid state.
2023-06-23 17:21:20 -07:00
Matt Arsenault
7ed1aec3b3 AMDGPU: Remove unnecessary Attributor overrides 2023-06-16 15:04:08 -04:00
Matt Arsenault
b9c6d9e6c3 AMDGPU: Propagate amdgpu-waves-per-eu with attributor
This will do a value range merging down the callgraph, unlike the
current pass which can only propagate values to undecorated functions
from a kernel.

This one is a bit weird due to the interaction with the implied range
from amdgpu-flat-workgroup-size. At the default group range of 1,1024,
the minimum implied bounds is 4 so this ends up introducing the
attribute on undecorated functions. We could probably simplify this by
ignoring it and propagating the raw values. The subtarget interaction
and the interaction with amdgpu-flat-workgroup-size only really clamp
invalid values (plus the lower bound doesn't seem to do anything as
far as I can tell anyway).
2023-06-16 15:04:08 -04:00
Changpeng Fang
7ca3444fba AMDGPU: Use module flag to get code object version at IR level folow-up
Summary:
  This is part of the leftover work for https://reviews.llvm.org/D143138.
In this work, we pass code object version as an argument to initialize target ID
and use it for targetID dump.

Reviewers: arsenm

Differential Revision
  https://reviews.llvm.org/D143293
2023-02-10 11:16:38 -08:00
Changpeng Fang
54cf69c9d5 AMDGPU: Use module flag to get code object version at IR level
Summary:
  This patch introduces a mechanism to check the code object version from the module flag, This avoids checking from command line.
In case the module flag is missing, we use the current default code object version supported in the compiler.

For tools whose inputs are not IR, we may need other approach (directive, for example) to check the code
object version, That will be in a separate patch later.

For LIT tests update, we directly add module flag if there is only a single code object version associated with all checks in one file.
In cause of multiple code object version in one file, we use the "sed" method to "clone" the checks to achieve the goal.

Reviewer: arsenm

Differential Revision:
  https://reviews.llvm.org/D14313
2023-02-02 18:57:26 -08:00
Matt Arsenault
4d4894ab92 Partially reapply "AMDGPU: Invert handling of enqueued block detection"
This mostly reverts commit 270e96f435596449002fc89962595497481c8770.

Keep the attributor related changes around, but functionally restore
the old behavior as a workaround. Device enqueue goes back to not
working at -O0 with this version.
2023-01-12 15:02:16 -05:00
Matt Arsenault
270e96f435 Revert "AMDGPU: Invert handling of enqueued block detection"
This reverts commit 47288cc977fa31c44cc92b4e65044a5b75c2597e.

The runtime is having trouble with this at -O0 when the inputs are
always enabled.
2023-01-07 21:48:07 -05:00
Matt Arsenault
47288cc977 AMDGPU: Invert handling of enqueued block detection
Invert the sense of the attribute and let the attributor figure this
out like everything else. If needed we can have the not-OpenCL
languages set amdgpu-no-default-queue and amdgpu-no-completion-action
up front so they never have to pay the cost.

There are also so many of these now, the offset use API should
probably consider all of them at once. Maybe they should merge into
one attribute with used fields. Having separate functions for each
field in AMDGPUBaseInfo is also not the greatest API (might as well
fix this when the patch to get the object version from the module
lands).
2023-01-06 21:16:08 -05:00
Sameer Sahasrabuddhe
9c1b82599d [AAPointerInfo] handle multiple offsets in PHI
Previously reverted in 8b446ea2ba39e406bcf940ea35d6efb4bb9afe95

Reapplying because this commit is NOT DEPENDENT on the reverted commit
fc21f2d7bae2e0be630470cc7ca9323ed5859892, which broke the ASAN buildbot.
See https://reviews.llvm.org/rGfc21f2d7bae2e0be630470cc7ca9323ed5859892 for
more information.

The arguments to a PHI may represent a recurrence by eventually using the output
of the PHI itself. This is now handled by checking for cycles in the control
flow. If a PHI is not in a recurrence, it is now able to report multiple offsets
instead of conservatively reporting unknown.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D138991
2022-12-18 10:51:20 +05:30
Mitch Phillips
525d6c54b5 Revert "[AAPointerInfo] handle multiple offsets in PHI"
This reverts commit 88db516af69619d4326edea37e52fc7321c33bb5.

Reason: This change is dependent on a commit that needs to be rolled
back because it broke the ASan buildbot. See
https://reviews.llvm.org/rGfc21f2d7bae2e0be630470cc7ca9323ed5859892 for
more information.
2022-12-16 17:55:48 -08:00
Mitch Phillips
7928a6387f Revert "Revert "[AAPointerInfo] handle multiple offsets in PHI""
This reverts commit 12696d302d146ffe616eecab3feceba9d29be2db.

Reason: This change is dependent on a commit that needs to be rolled
back because it broke the ASan buildbot. See
https://reviews.llvm.org/rGfc21f2d7bae2e0be630470cc7ca9323ed5859892 for
more information.
2022-12-16 17:55:38 -08:00
Mitch Phillips
8b446ea2ba Revert "[AAPointerInfo] handle multiple offsets in PHI"
This reverts commit 179ed8871101cd197e0a719a3629cd5077b1a999.

Reason: This change is dependent on a commit that needs to be rolled
back because it broke the ASan buildbot. See
https://reviews.llvm.org/rGfc21f2d7bae2e0be630470cc7ca9323ed5859892 for
more information.
2022-12-16 17:54:44 -08:00
Sameer Sahasrabuddhe
179ed88711 [AAPointerInfo] handle multiple offsets in PHI
Previously reverted in 12696d302d146ffe616eecab3feceba9d29be2db

The arguments to a PHI may represent a recurrence by eventually using the output
of the PHI itself. This is now handled by checking for cycles in the control
flow. If a PHI is not in a recurrence, it is now able to report multiple offsets
instead of conservatively reporting unknown.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D138991
2022-12-15 12:23:50 +05:30
Sameer Sahasrabuddhe
12696d302d Revert "[AAPointerInfo] handle multiple offsets in PHI"
This reverts commit 88db516af69619d4326edea37e52fc7321c33bb5.
2022-12-15 10:14:39 +05:30
Sameer Sahasrabuddhe
88db516af6 [AAPointerInfo] handle multiple offsets in PHI
The arguments to a PHI may represent a recurrence by eventually using the output
of the PHI itself. This is now handled by checking for cycles in the control
flow. If a PHI is not in a recurrence, it is now able to report multiple offsets
instead of conservatively reporting unknown.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D138991
2022-12-15 08:48:38 +05:30
Sameer Sahasrabuddhe
6a2305484e [AAPointerInfo] track multiple constant offsets for each use
An expression of the form `gep(base, select(pred, const1, const2))` can result
in a set of offsets instead of just one. PointerInfo can now track these sets
instead of conservatively modeling them as Unknown. In general, AAPointerInfo
now uses AAPotentialConstantValues to examine the operands of the GEP.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D138646
2022-12-13 22:27:25 +05:30
Sameer Sahasrabuddhe
2fdeb27790 Revert "[AAPointerInfo] track multiple constant offsets for each use"
Assertion fired in openmp-offload-amdgpu-runtime:
https://lab.llvm.org/buildbot/#/builders/193/builds/23177

This reverts commit c2a0baad1fbb21fe111fef83ec93c2d7923b9b0c.
2022-12-12 15:39:18 +05:30
Sameer Sahasrabuddhe
c2a0baad1f [AAPointerInfo] track multiple constant offsets for each use
An expression of the form `gep(base, select(pred, const1, const2))` can result
in a set of offsets instead of just one. PointerInfo can now track these sets
instead of conservatively modeling them as Unknown. In general, AAPointerInfo
now uses AAPotentialConstantValues to examine the operands of the GEP.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D138646
2022-12-12 13:36:45 +05:30
Sameer Sahasrabuddhe
5ac633d2de [NFC][AAPointerInfo] rename OffsetAndSize to RangeTy
This is in preparation for future changes that introduce an actual list of
ranges per Access, to be called a RangeList.

Differential Revision: https://reviews.llvm.org/D138644
2022-12-08 12:14:28 +05:30
Johannes Doerfert
477e8e10f0 [Attributor] Teach AAPointerInfo to look into aggregates
If we have a constant aggregate, e.g., as an initializer, we usually
failed to extract the proper value/type from it. This patch provides the
size and offset information necessary to extract the right part of the
constant.
2022-10-05 06:19:47 -07:00
Johannes Doerfert
c922cac868 Revert "[Attributor] AAPointerInfo should allow "harmless" uses"
Revert "[Attributor] Teach AAPointerInfo to look into aggregates"

This reverts commit 844f6c5d03d58e7ac0c6b838e4a7834ac575ab9b and
4ed0a88cd8a77370073feb270d77a9e8b27bd68c as they broke the buildbots
that run openmp/libomptarget/test/offloading/bug49021.cpp.
2022-09-11 21:37:54 -07:00
Johannes Doerfert
4ed0a88cd8 [Attributor] Teach AAPointerInfo to look into aggregates
If we have a constant aggregate, e.g., as an initializer, we usually
failed to extract the proper value/type from it. This patch provides the
size and offset information necessary to extract the right part of the
constant.
2022-09-11 20:16:11 -07:00
Johannes Doerfert
bf789b1957 [Attributor] Replace AAValueSimplify with AAPotentialValues
For the longest time we used `AAValueSimplify` and
`genericValueTraversal` to determine "potential values". This was
problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9
  locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification
  only as an afterthought. `genericValueTraversal` did offer an option
  but `AAValueSimplify` did not. Thus, we might end up with "too much"
  simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with
  problems like the infinite recursion bug (#54981) as well as code
  duplication.

This patch introduces `AAPotentialValues` and replaces the
`AAValueSimplify` uses with it. `genericValueTraversal` is folded into
`AAPotentialValues` as are the instruction simplifications performed in
`AAValueSimplify` before. We further distinguish "intra" and "inter"
procedural simplification now.

`AAValueSimplify` was not deleted as we haven't ported the
re-materialization of instructions yet. There are other differences over
the former handling, e.g., we may not fold trivially foldable
instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2`
but if an operand would be simplified to `i32 1` we would fold it still.

We are also even more aware of function/SCC boundaries in CGSCC passes,
which is good even if some tests look like they regress.

Fixes: https://github.com/llvm/llvm-project/issues/54981

Note: A previous version was flawed and consequently reverted in
      6555558a80589d1c5a1154b92cc3af9495f8f86c.
2022-07-19 16:24:42 -05:00
Jon Chesterfield
3a20597776 [amdgpu] Implement lds kernel id intrinsic
Implement an intrinsic for use lowering LDS variables to different
addresses from different kernels. This will allow kernels that cannot
reach an LDS variable to avoid wasting space for it.

There are a number of implicit arguments accessed by intrinsic already
so this implementation closely follows the existing handling. It is slightly
novel in that this SGPR is written by the kernel prologue.

It is necessary in the general case to put variables at different addresses
such that they can be compactly allocated and thus necessary for an
indirect function call to have some means of determining where a
given variable was allocated. Claiming an arbitrary SGPR into which
an integer can be written by the kernel, in this implementation based
on metadata associated with that kernel, which is then passed on to
indirect call sites is sufficient to determine the variable address.

The intent is to emit a __const array of LDS addresses and index into it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D125060
2022-07-19 17:46:19 +01:00
Johannes Doerfert
3be3b40188 [Attributor][NFCI] Introduce AttributorConfig to bundle all options
Instead of lengthy constructors we can now set the members of a
read-only struct before the Attributor is created. Should make it
clearer what is configurable and also help introducing new options in
the future. This actually added IsModulePass and avoids deduction
through the Function set size. No functional change was intended.
2022-04-15 18:17:19 -05:00
Changpeng Fang
8edaf25986 AMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionally
Summary:
  Introduce a new function attribute, amdgpu-no-multigrid-sync-arg, which is default.
We use implicitarg_ptr + offset to check whether the multigrid synchronization
pointer is used. If yes, we remove this attribute and also remove
amdgpu-no-implicitarg-ptr. We generate metadata for the hidden_multigrid_sync_arg
only when the amdgpu-no-multigrid-sync-arg attribute is removed from the function.

Reviewers: arsenm, sameerds, b-sumner and foad

Differential Revision: https://reviews.llvm.org/D123548
2022-04-12 12:36:30 -07:00
Changpeng Fang
dd5895cc39 AMDGPU: Use the implicit kernargs for code object version 5
Summary:
  Specifically, for trap handling, for targets that do not support getDoorbellID,
we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1].
To get aperture bases when targets do not have aperture registers, we load
private_base or shared_base directly from the implicit kernarg. In clang, we use
implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}.

Reviewers: arsenm, sameerds, yaxunl

Differential Revision: https://reviews.llvm.org/D120265
2022-03-17 14:12:36 -07:00
Changpeng Fang
0f20a35b9e AMDGPU: Set up User SGPRs for queue_ptr only when necessary
Summary:
  In general, we need queue_ptr for aperture bases and trap handling,
and user SGPRs have to be set up to hold queue_ptr. In current implementation,
user SGPRs are set up unnecessarily for some cases. If the target has aperture
registers, queue_ptr is not needed to reference aperture bases. For trap
handling, if target suppots getDoorbellID, queue_ptr is also not necessary.
Futher, code object version 5 introduces new kernel ABI which passes queue_ptr
as an implicit kernel argument, so user SGPRs are no longer necessary for
queue_ptr. Based on the trap handling document:
https://llvm.org/docs/AMDGPUUsage.html#amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table,
llvm.debugtrap does not need queue_ptr, we remove queue_ptr suport for llvm.debugtrap
in the backend.

Reviewers: sameerds, arsenm

Fixes: SWDEV-307189

Differential Revision: https://reviews.llvm.org/D119762
2022-03-09 10:14:05 -08:00
Changpeng Fang
ca62b1db9f [AMDGPU][NFC]: Emit metadata for hidden_heap_v1 kernarg
Summary:
  Emit metadata for hidden_heap_v1 kernarg

Reviewers:
  sameerds, b-sumner

Fixes:
  SWDEV-307188

Differential Revision:
  https://reviews.llvm.org/D119027
2022-02-25 10:45:35 -08:00
Sameer Sahasrabuddhe
d8f99bb6e0 [AMDGPU] replace hostcall module flag with function attribute
The module flag to indicate use of hostcall is insufficient to catch
all cases where hostcall might be in use by a kernel. This is now
replaced by a function attribute that gets propagated to top-level
kernel functions via their respective call-graph.

If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the
default behaviour is to emit kernel metadata indicating that the
kernel uses the hostcall buffer pointer passed as an implicit
argument.

The attribute may be placed explicitly by the user, or inferred by the
AMDGPU attributor by examining the call-graph. The attribute is
inferred only if the function is not being sanitized, and the
implictarg_ptr does not result in a load of any byte in the hostcall
pointer argument.

Reviewed By: jdoerfert, arsenm, kpyzhov

Differential Revision: https://reviews.llvm.org/D119216
2022-02-11 22:51:56 +05:30
Sameer Sahasrabuddhe
c6a6b57902 [AMDGPU] [NFC] Fix incorrect use of bitwise operator.
Differential Revision: https://reviews.llvm.org/D119308
2022-02-08 22:12:54 -05:00
Sameer Sahasrabuddhe
02a2e46ff0 [AMDGPU] [NFC] refactor the AMDGPU attributor
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D119087
2022-02-07 21:45:32 -05:00
Matt Arsenault
4132dc917e AMDGPU: Return result from indicatePessimisticFixpoint
I don't think this fixes anything.
2021-12-16 11:26:30 -05:00
Matt Arsenault
6bcf1f9181 AMDGPU: Indicate pessimistic fixpoint for entry functions
There aren't going to be any callers for these, so avoid running
through the machinery to look at the callers.
2021-12-11 11:42:34 -05:00
Matt Arsenault
0eebe2e36c AMDGPU: Sanitized functions require implicit arguments
Do not infer no-amdgpu-implicitarg-ptr for sanitized functions. If a
function is explicitly marked amdgpu-no-implicitarg-ptr and
sanitize_address, infer that it is required.
2021-12-02 17:55:43 -05:00
Benjamin Kramer
9b8b16457c Put implementation details into anonymous namespaces. NFCI. 2021-11-07 15:18:30 +01:00
Matt Arsenault
ec57b37551 AMDGPU: Use attributor to propagate amdgpu-flat-work-group-size
This can merge the acceptable ranges based on the call graph, rather
than the simple application of the attribute. Remove the handling from
the old pass.
2021-10-22 16:23:50 -04:00
Matt Arsenault
f12174204c AMDGPU: Rename attributor class for uniform-work-group-size
This isn't really an AMDGPU specific attribute and could be moved to
generic code. It's also important to include the word uniform in the
name.
2021-09-14 19:49:08 -04:00
Matt Arsenault
088cc63640 AMDGPU: Invert AMDGPUAttributor
Switch to using BitIntegerState for each of the inputs, and invert
their meanings.

This now diverges more from the old AMDGPUAnnotateKernelFeatures, but
this isn't used yet anyway.
2021-08-26 21:32:13 -04:00
Matt Arsenault
46d82e7357 AMDGPU: Restrict attributor transforms
We only really want this to add the custom attributes. Theoretically
the regular transforms were already run at this point. Touching
undefined behavior breaks a lot of tests when this is enabled by
default, many of which are expecting to test handling of undef
operations.
2021-08-26 21:08:51 -04:00
Matt Arsenault
cf32d61a05 AMDGPU: Remove hacky attribute deduction from AMDGPUAttributor
amdgpu-calls and amdgpu-stack-objects don't really belong as
attributes, and are currently a hacky way of passing an analysis into
the DAG. These don't really belong in the IR, and don't really fit in
with the other attributes. Remove these to facilitate inverting the
pass.

I don't exactly understand the indirect call test changes. These tests
are using calls which are trivially replacable with a direct call, so
I'm not sure what the point is.
2021-08-26 20:31:14 -04:00
Matt Arsenault
98d7aa435f AMDGPU: Stop inferring use of llvm.amdgcn.kernarg.segment.ptr
We no longer use this intrinsic outside of the backend and no longer
support using it outside of kernels.
2021-08-26 20:30:03 -04:00