19 Commits

Author SHA1 Message Date
Matt Arsenault
bbc7b30fbf AMDGPU: Remove invalid testcase for enqueue kernel
The call didn't have the right calling convention, but calls to
kernels are supposed to be illegal anyway.
2023-04-26 17:25:30 -04:00
Matt Arsenault
99d4c722e3 AMDGPU: Really invert handling of enqueued block detection
Remove the broken call graph analysis in the block enqueue lowering
pass. The previous iteration was reverted due to a runtime bug when
the completion action was unconditionally enabled.
2023-04-20 06:58:24 -04:00
Matt Arsenault
4fc07e1849 AMDGPU: Use constant and externally_initialized for block handle
The runtime initializes this.
2023-01-10 20:35:49 -05:00
Matt Arsenault
0cd3a39e95 AMDGPU: Fix opaque pointer handling for enqueued blocks, again
2023-01-10 20:35:48 -05:00
Matt Arsenault
270e96f435 Revert "AMDGPU: Invert handling of enqueued block detection"
This reverts commit 47288cc977fa31c44cc92b4e65044a5b75c2597e.

The runtime is having trouble with this at -O0 when the inputs are
always enabled.
2023-01-07 21:48:07 -05:00
Matt Arsenault
47554a0c73 AMDGPU: Use more accurate IR type for block handle
The device library uses this as a struct with a pointer sized integer
and 2 ints.
2023-01-06 21:23:28 -05:00
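As a rough illustration of the layout described in 47554a0c73 (a pointer-sized integer followed by two ints), the handle can be pictured as the C++ struct below. The struct and field names are hypothetical; only the field kinds come from the commit message, and the 16-byte size (matching fb17bf60dd) assumes a 64-bit target.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical picture of the block handle, not the device library's
// actual declaration. Field names are invented for illustration.
struct BlockHandleSketch {
  std::uint64_t PtrSizedField; // pointer-sized integer (assumes 64-bit pointers)
  std::int32_t IntField0;      // first of the two ints
  std::int32_t IntField1;      // second of the two ints
};

// 8 + 4 + 4 bytes with no padding under the assumptions above.
static_assert(sizeof(BlockHandleSketch) == 16,
              "sketch assumes a 16-byte handle");
```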
Matt Arsenault
47288cc977 AMDGPU: Invert handling of enqueued block detection
Invert the sense of the attribute and let the attributor figure this
out like everything else. If needed we can have the not-OpenCL
languages set amdgpu-no-default-queue and amdgpu-no-completion-action
up front so they never have to pay the cost.

There are also so many of these now, the offset use API should
probably consider all of them at once. Maybe they should merge into
one attribute with used fields. Having separate functions for each
field in AMDGPUBaseInfo is also not the greatest API (might as well
fix this when the patch to get the object version from the module
lands).
2023-01-06 21:16:08 -05:00
Matt Arsenault
0416883dc1 AMDGPU: Fix enqueue block lowering for opaque pointers
This was looking for a specific constant cast of the function, when
the type doesn't matter. This doesn't bother trying to handle typed
pointers; it will just assert.

Things probably don't work completely correctly if the block kernel
address is captured somewhere else, but that wouldn't work before
either. The uses should really be loads out of the handle, and the
handle initializer should contain the kernel address.
2023-01-06 21:15:39 -05:00
Matt Arsenault
4ce5400a3f AMDGPU: Convert enqueue-kernel.ll to opaque pointers
This demonstrates the pass is broken with them; the follow-up change
will fix it.
2023-01-06 21:15:39 -05:00
Matt Arsenault
1f93517b25 AMDGPU: Switch enqueue kernel test to generated checks
2023-01-05 11:39:23 -05:00
Matt Arsenault
a74ea40cb6 AMDGPU: Remove unnecessary metadata from test
The pass isn't doing anything with it, and the line wrapping is
confusing update_test_checks.
2022-11-29 11:12:08 -05:00
Matt Arsenault
06c192d454 OpaquePtr: Bulk update tests to use typed byval
Upgrade of the IR text tests should be the only thing blocking making
typed byval mandatory. Partially done through regex and partially
manual.
2020-11-20 14:00:46 -05:00
Yaxun Liu
fb17bf60dd [AMDGPU] Change enqueue kernel handle type
Currently the handle type is a global pointer, which holds 8 bytes.
We need a larger type that holds 16 bytes, so change it
to [2 x i64].

Differential Revision: https://reviews.llvm.org/D48094

llvm-svn: 334625
2018-06-13 17:31:51 +00:00
Yaxun Liu
9381ae9791 [AMDGPU] Fix lowering enqueue_kernel
Two issues were fixed:

- The runtime has difficulty allocating memory for an external symbol of a
  kernel and setting the address of the external symbol, so make the runtime
  handle of an enqueued kernel an ordinary global variable. The runtime only
  needs to store the address of the loaded kernel to the handle, and this
  approach has been verified to work.

- Handle the situation where __enqueue_kernel* gets inlined, so that
  the enqueued kernel may be used through a constant expr instead
  of an instruction.

Differential Revision: https://reviews.llvm.org/D45187

llvm-svn: 329815
2018-04-11 14:46:15 +00:00
Yaxun Liu
a99e7d8e44 [AMDGPU] Fix lowering enqueue kernel when kernel has no name
Since the enqueued kernels have internal linkage, their names may be dropped.
In this case, give them unique names __amdgpu_enqueued_kernel or
__amdgpu_enqueued_kernel.n where n is a sequential number starting from 1.

Differential Revision: https://reviews.llvm.org/D44322

llvm-svn: 327291
2018-03-12 16:34:06 +00:00
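The naming scheme described in a99e7d8e44 can be sketched in plain C++ as below. This is a stand-alone illustration, not the pass's code: the class name is hypothetical, and it assumes the first unnamed kernel gets the bare name and later ones get the .n suffix with n starting from 1.

```cpp
#include <string>

// Hypothetical helper that hands out names for enqueued kernels whose
// names were dropped. Assumption: the first one is given the bare name,
// later ones get ".1", ".2", and so on.
class EnqueuedKernelNamerSketch {
  unsigned NumNamed = 0; // how many names have been handed out so far

public:
  std::string next() {
    std::string Name = "__amdgpu_enqueued_kernel";
    if (NumNamed != 0)
      Name += "." + std::to_string(NumNamed);
    ++NumNamed;
    return Name;
  }
};
```

Calling next() three times on one instance would yield the bare name followed by the .1 and .2 variants.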
Yaxun Liu
46439e8d4a [AMDGPU] Fix lowering OpenCL enqueue_kernel
One addrspacecast disappeared in clang emitted IR for
block invoke function due to adoption of the new
addr space mapping.

Differential Revision: https://reviews.llvm.org/D43785

llvm-svn: 326806
2018-03-06 16:04:39 +00:00
Rafael Espindola
e4b0231c63 Make internal/private GVs implicitly dso_local.
While updating clang tests for having clang set dso_local I noticed
that:

- There are *a lot* of tests to update.
- Many of the updates are redundant.

They are redundant because a GV is "obviously dso_local". This patch
starts formalizing that a bit by requiring that internal and private
GVs be dso_local too. Since they all are, we don't have to print
dso_local to the textual representation, making it a bit more compact
and easier to read.

llvm-svn: 322317
2018-01-11 22:15:05 +00:00
Yaxun Liu
c928f2a6d4 [AMDGPU] Emit metadata for hidden arguments for kernel enqueue
Identifies kernels that perform device-side kernel enqueues and emits
metadata for the associated hidden kernel arguments. Such kernels are
marked with calls-enqueue-kernel function attribute by
AMDGPUOpenCLEnqueueKernelLowering pass and later on
hidden kernel arguments metadata HiddenDefaultQueue and
HiddenCompletionAction are emitted for them.

Differential Revision: https://reviews.llvm.org/D39255

llvm-svn: 316907
2017-10-30 14:30:28 +00:00
Yaxun Liu
de4b88d9a1 [AMDGPU] Lower enqueued blocks and generate runtime metadata
This patch adds a post-linking pass which replaces the function pointer of an
enqueued block kernel with a global variable (runtime handle) and adds a
runtime-handle attribute to the enqueued block kernel.

In LLVM CodeGen the runtime-handle metadata will be translated to
RuntimeHandle metadata in code object. Runtime allocates a global buffer
for each kernel with RuntimeHandle metadata and saves the kernel address
required for the AQL packet into the buffer. __enqueue_kernel function
in device library knows that the invoke function pointer in the block
literal is actually runtime handle and loads the kernel address from it
and puts it into AQL packet for dispatching.

This cannot be done in FE since FE cannot create a unique global variable
with external linkage across LLVM modules. The global variable with internal
linkage does not work since optimization passes will try to replace loads
of the global variable with its initialization value.

Differential Revision: https://reviews.llvm.org/D38610

llvm-svn: 315352
2017-10-10 19:39:48 +00:00
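The handle indirection described in de4b88d9a1 can be modeled on the host in a few lines of C++. Everything here is a simplified, hypothetical stand-in (the struct layouts, the function names, and the use of a plain integer for the kernel address are assumptions); it only shows the idea that the block literal's invoke pointer actually refers to a runtime handle, which the runtime fills in and the enqueue code reads.

```cpp
#include <cstdint>

// Hypothetical runtime handle: a buffer holding the kernel address
// needed for the AQL packet. The real layout is not shown in the log.
struct RuntimeHandleSketch {
  std::uint64_t KernelAddress;
};

// Simplified block literal: after lowering, the invoke pointer refers
// to the runtime handle rather than to the kernel function itself.
struct BlockLiteralSketch {
  const void *Invoke;
};

// Stand-in for the runtime: store the loaded kernel's address in the handle.
inline void storeKernelAddress(RuntimeHandleSketch &Handle,
                               std::uint64_t LoadedKernelAddress) {
  Handle.KernelAddress = LoadedKernelAddress;
}

// Stand-in for the device library side: treat the invoke pointer as a
// handle and load the kernel address to place in the dispatch packet.
inline std::uint64_t loadKernelAddress(const BlockLiteralSketch &Block) {
  const auto *Handle = static_cast<const RuntimeHandleSketch *>(Block.Invoke);
  return Handle->KernelAddress;
}
```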