llvm-project/llvm/lib/Target/AMDGPU/AMDGPUExportKernelRuntimeHandles.cpp
Matt Arsenault 0d2c55cb96
AMDGPU: Move enqueued block handling into clang (#128519)
The previous implementation wasn't maintaining a faithful IR
representation of how this really works. The value returned by
createEnqueuedBlockKernel wasn't actually used as a function, and was
hacked up later to be a pointer to the runtime handle global
variable. In reality, the enqueued block is a struct where the first
field is a pointer to the kernel descriptor, not the kernel itself. We
were also relying on passing around a reference to a global using a
string attribute containing its name. It's better to base this on a
proper IR symbol reference during final emission.

This now avoids using a function attribute on kernels and avoids using
the additional "runtime-handle" attribute to populate the final
metadata. Instead, associate the runtime handle reference to the
kernel with the !associated global metadata. We can then get a final,
correctly mangled name at the end.

I couldn't figure out how to get rename-with-external-symbol behavior
using a combination of comdats and aliases, so this leaves an IR pass to
externalize the runtime handles for codegen. If anything breaks, it's
most likely this pass, so eliminating it is left for a later step. A
special section name enables this behavior. This also means it's
possible to declare enqueuable kernels in source without going through
the dedicated block syntax or other dedicated compiler support.

We could move towards initializing the runtime handle in the
compiler/linker. I have a working patch where the linker sets up the
first field of the handle, avoiding the need to export the block
kernel symbol for the runtime. We would need new relocations to get
the private and group sizes, but that would avoid the runtime's
special case handling that requires the device_enqueue_symbol metadata
field.

https://reviews.llvm.org/D141700
2025-03-10 19:54:04 +07:00

111 lines
3.4 KiB
C++

//===- AMDGPUExportKernelRuntimeHandles.cpp - Lower enqueued block --------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// \file
//
// Give any globals used for OpenCL block enqueue runtime handles external
// linkage so the runtime may access them. These should behave like internal
// functions for purposes of linking, but need to have an external symbol in the
// final object for the runtime to access them.
//
// TODO: This could be replaced with a new linkage type or global object
// metadata that produces an external symbol in the final object, but allows
// rename on IR linking. Alternatively if we can rely on
// GlobalValue::getGlobalIdentifier we can just make these external symbols to
// begin with.
//
//===----------------------------------------------------------------------===//

#include "AMDGPUExportKernelRuntimeHandles.h"
#include "AMDGPU.h"
#include "llvm/IR/Module.h"
#include "llvm/Pass.h"

#define DEBUG_TYPE "amdgpu-export-kernel-runtime-handles"

using namespace llvm;

namespace {

/// Legacy pass manager wrapper that externalizes enqueued block runtime
/// handles.
class AMDGPUExportKernelRuntimeHandlesLegacy : public ModulePass {
public:
  static char ID;

  explicit AMDGPUExportKernelRuntimeHandlesLegacy() : ModulePass(ID) {}

private:
  bool runOnModule(Module &M) override;
};

} // end anonymous namespace

char AMDGPUExportKernelRuntimeHandlesLegacy::ID = 0;

char &llvm::AMDGPUExportKernelRuntimeHandlesLegacyID =
    AMDGPUExportKernelRuntimeHandlesLegacy::ID;

INITIALIZE_PASS(AMDGPUExportKernelRuntimeHandlesLegacy, DEBUG_TYPE,
                "Externalize enqueued block runtime handles", false, false)

ModulePass *llvm::createAMDGPUExportKernelRuntimeHandlesLegacyPass() {
  return new AMDGPUExportKernelRuntimeHandlesLegacy();
}

static bool exportKernelRuntimeHandles(Module &M) {
  bool Changed = false;

  const StringLiteral HandleSectionName(".amdgpu.kernel.runtime.handle");

  // Give every global placed in the runtime handle section an external,
  // non-dso_local symbol so the runtime can find it.
  for (GlobalVariable &GV : M.globals()) {
    if (GV.getSection() == HandleSectionName) {
      GV.setLinkage(GlobalValue::ExternalLinkage);
      GV.setDSOLocal(false);
      Changed = true;
    }
  }

  if (!Changed)
    return false;

  // Export each kernel whose !associated metadata refers to a runtime handle.
  // FIXME: We shouldn't really need to export the kernel address. We can
  // initialize the runtime handle with the kernel descriptor.
  for (Function &F : M) {
    if (F.getCallingConv() != CallingConv::AMDGPU_KERNEL)
      continue;

    const MDNode *Associated = F.getMetadata(LLVMContext::MD_associated);
    if (!Associated)
      continue;

    auto *VM = cast<ValueAsMetadata>(Associated->getOperand(0));
    auto *Handle = dyn_cast<GlobalObject>(VM->getValue());
    if (Handle && Handle->getSection() == HandleSectionName) {
      F.setLinkage(GlobalValue::ExternalLinkage);
      F.setVisibility(GlobalValue::ProtectedVisibility);
    }
  }

  return Changed;
}

bool AMDGPUExportKernelRuntimeHandlesLegacy::runOnModule(Module &M) {
  return exportKernelRuntimeHandles(M);
}

PreservedAnalyses
AMDGPUExportKernelRuntimeHandlesPass::run(Module &M,
                                          ModuleAnalysisManager &MAM) {
  if (!exportKernelRuntimeHandles(M))
    return PreservedAnalyses::all();

  PreservedAnalyses PA;
  PA.preserveSet<AllAnalysesOn<Function>>();
  return PA;
}
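
The two-phase logic of exportKernelRuntimeHandles can be tried out without an LLVM build using a small standalone model. The sketch below is only an illustration under stated assumptions: ToyGlobal, ToyFunction, ToyModule, exportHandles, makeExampleModule and checkExample are hypothetical names invented here, not LLVM types or APIs, and an index into the globals list stands in for the !associated metadata reference.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for IR objects. None of these are LLVM types.
enum class Linkage { Internal, External };
enum class Visibility { Default, Protected };

struct ToyGlobal {
  std::string Name;
  std::string Section;
  Linkage Link = Linkage::Internal;
  bool DSOLocal = true;
};

struct ToyFunction {
  std::string Name;
  bool IsKernel = false;     // models the AMDGPU_KERNEL calling convention
  int AssociatedGlobal = -1; // models !associated: index into Globals, or -1
  Linkage Link = Linkage::Internal;
  Visibility Vis = Visibility::Default;
};

struct ToyModule {
  std::vector<ToyGlobal> Globals;
  std::vector<ToyFunction> Functions;
};

static const std::string HandleSectionName = ".amdgpu.kernel.runtime.handle";

// Mirrors the structure of the pass: first export handles found by section
// name, then export kernels associated with such a handle.
bool exportHandles(ToyModule &M) {
  bool Changed = false;
  for (ToyGlobal &GV : M.Globals) {
    if (GV.Section == HandleSectionName) {
      GV.Link = Linkage::External;
      GV.DSOLocal = false;
      Changed = true;
    }
  }
  if (!Changed)
    return false; // no handles, so no kernel needs exporting

  for (ToyFunction &F : M.Functions) {
    if (!F.IsKernel || F.AssociatedGlobal < 0)
      continue;
    const ToyGlobal &Handle =
        M.Globals[static_cast<std::size_t>(F.AssociatedGlobal)];
    if (Handle.Section == HandleSectionName) {
      F.Link = Linkage::External;
      F.Vis = Visibility::Protected;
    }
  }
  return true;
}

// Builds a module with one runtime handle, one unrelated global, one block
// kernel associated with the handle, and one ordinary helper function.
ToyModule makeExampleModule() {
  ToyModule M;
  ToyGlobal Handle;
  Handle.Name = "block.runtime.handle";
  Handle.Section = HandleSectionName;
  ToyGlobal Other;
  Other.Name = "unrelated";
  M.Globals.push_back(Handle);
  M.Globals.push_back(Other);

  ToyFunction BlockKernel;
  BlockKernel.Name = "block_kernel";
  BlockKernel.IsKernel = true;
  BlockKernel.AssociatedGlobal = 0;
  ToyFunction Helper;
  Helper.Name = "helper";
  M.Functions.push_back(BlockKernel);
  M.Functions.push_back(Helper);
  return M;
}

// Runs the model on the example module and checks the resulting state.
void checkExample() {
  ToyModule M = makeExampleModule();
  bool Changed = exportHandles(M);
  assert(Changed);
  assert(M.Globals[0].Link == Linkage::External && !M.Globals[0].DSOLocal);
  assert(M.Functions[0].Vis == Visibility::Protected);
  assert(M.Functions[1].Link == Linkage::Internal);
}
```

Calling checkExample() from a main function exercises the model. The early return matches the pass: when no global sits in the handle section, the function loop is skipped and the module is reported as unchanged.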