llvm-project/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.h
Diana Picus 20d8398825
[AMDGPU] ISel & PEI for whole wave functions (#145858)
Whole wave functions are functions that will run with a full EXEC mask.
They will not be invoked directly, but instead will be launched by way
of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in
a future patch). These functions are meant as an alternative to the
`llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics.

Whole wave functions will set EXEC to -1 in the prologue and restore the
original value of EXEC in the epilogue. They must have a special first
argument, `i1 %active`, that is going to be mapped to EXEC. They may
have either the default calling convention or amdgpu_gfx. The inactive
lanes need to be preserved for all registers used, active lanes only for
the CSRs.
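
As a rough illustration of the prologue/epilogue contract described above, here is a minimal host-side sketch. It is a toy model, not the code the backend emits: `ToyWave`, `toyWholeWaveFunc`, the wave32 size, and the single register `V0` are all assumptions made up for this example.

```cpp
#include <array>
#include <cstdint>

// Assumption: a 32-lane wave; the real wave size depends on the subtarget.
constexpr unsigned WaveSize = 32;
using VGPR = std::array<int, WaveSize>; // one value per lane

// Hypothetical toy state: an EXEC mask plus a single vector register.
struct ToyWave {
  uint32_t Exec; // bit i set means lane i is active
  VGPR V0;
};

// Hypothetical whole wave function operating on the toy state.
void toyWholeWaveFunc(ToyWave &W) {
  // Prologue: remember the incoming EXEC (what `%active` stands for),
  // save the register we are about to clobber, then enable every lane.
  const uint32_t Active = W.Exec;
  const VGPR Saved = W.V0;
  W.Exec = ~0u;

  // Body: with EXEC = -1 this runs in all lanes, active or not.
  for (unsigned Lane = 0; Lane < WaveSize; ++Lane)
    if (W.Exec & (1u << Lane))
      W.V0[Lane] += 100;

  // Epilogue: put back the originally inactive lanes of the register,
  // then restore the original EXEC mask.
  for (unsigned Lane = 0; Lane < WaveSize; ++Lane)
    if (!(Active & (1u << Lane)))
      W.V0[Lane] = Saved[Lane];
  W.Exec = Active;
}
```

In this model only the lanes that were active on entry observe the new value of `V0`; the lanes that were inactive come back unchanged, which mirrors the requirement that inactive lanes be preserved for every register used.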

At the IR level, arguments to a whole wave function (other than
`%active`) contain poison in their inactive lanes. Likewise, the return
value for the inactive lanes is poison.

This patch contains the following work:
* 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN
  used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return
  a SReg_1 representing `%active`, which needs to be passed into
  SI_WHOLE_WAVE_FUNC_RETURN.
* SelectionDAG support for generating these 2 new pseudos and the
  special handling of %active. Since the return may be in a different
  basic block, it's difficult to add the virtual reg for %active to
  SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF
  which is later replaced via a custom inserter.
* Expansion of the 2 pseudos during prolog/epilog insertion. PEI also
  marks any used VGPRs as WWM registers, which are then spilled and
  restored with the usual logic.

Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic
and a lot of optimization work (especially in order to reduce spills
around function calls).

---------

Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
Co-authored-by: Shilei Tian <i@tianshilei.me>
2025-07-21 10:39:09 +02:00


//===- lib/Target/AMDGPU/AMDGPUCallLowering.h - Call lowering -*- C++ -*---===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
///
/// \file
/// This file describes how to lower LLVM calls to machine code calls.
///
//===----------------------------------------------------------------------===//

#ifndef LLVM_LIB_TARGET_AMDGPU_AMDGPUCALLLOWERING_H
#define LLVM_LIB_TARGET_AMDGPU_AMDGPUCALLLOWERING_H

#include "llvm/CodeGen/GlobalISel/CallLowering.h"

namespace llvm {

class AMDGPUTargetLowering;
class GCNSubtarget;
class MachineInstrBuilder;
class SIMachineFunctionInfo;

class AMDGPUCallLowering final : public CallLowering {
  void lowerParameterPtr(Register DstReg, MachineIRBuilder &B,
                         uint64_t Offset) const;

  void lowerParameter(MachineIRBuilder &B, ArgInfo &AI, uint64_t Offset,
                      Align Alignment) const;

  bool canLowerReturn(MachineFunction &MF, CallingConv::ID CallConv,
                      SmallVectorImpl<BaseArgInfo> &Outs,
                      bool IsVarArg) const override;

  bool lowerReturnVal(MachineIRBuilder &B, const Value *Val,
                      ArrayRef<Register> VRegs, MachineInstrBuilder &Ret) const;

  void addOriginalExecToReturn(MachineFunction &MF,
                               MachineInstrBuilder &Ret) const;

public:
  AMDGPUCallLowering(const AMDGPUTargetLowering &TLI);

  bool lowerReturn(MachineIRBuilder &B, const Value *Val,
                   ArrayRef<Register> VRegs,
                   FunctionLoweringInfo &FLI) const override;

  bool lowerFormalArgumentsKernel(MachineIRBuilder &B, const Function &F,
                                  ArrayRef<ArrayRef<Register>> VRegs) const;

  bool lowerFormalArguments(MachineIRBuilder &B, const Function &F,
                            ArrayRef<ArrayRef<Register>> VRegs,
                            FunctionLoweringInfo &FLI) const override;

  bool passSpecialInputs(MachineIRBuilder &MIRBuilder,
                         CCState &CCInfo,
                         SmallVectorImpl<std::pair<MCRegister, Register>> &ArgRegs,
                         CallLoweringInfo &Info) const;

  bool
  doCallerAndCalleePassArgsTheSameWay(CallLoweringInfo &Info,
                                      MachineFunction &MF,
                                      SmallVectorImpl<ArgInfo> &InArgs) const;

  bool
  areCalleeOutgoingArgsTailCallable(CallLoweringInfo &Info, MachineFunction &MF,
                                    SmallVectorImpl<ArgInfo> &OutArgs) const;

  /// Returns true if the call can be lowered as a tail call.
  bool
  isEligibleForTailCallOptimization(MachineIRBuilder &MIRBuilder,
                                    CallLoweringInfo &Info,
                                    SmallVectorImpl<ArgInfo> &InArgs,
                                    SmallVectorImpl<ArgInfo> &OutArgs) const;

  void handleImplicitCallArguments(
      MachineIRBuilder &MIRBuilder, MachineInstrBuilder &CallInst,
      const GCNSubtarget &ST, const SIMachineFunctionInfo &MFI,
      CallingConv::ID CalleeCC,
      ArrayRef<std::pair<MCRegister, Register>> ImplicitArgRegs) const;

  bool lowerTailCall(MachineIRBuilder &MIRBuilder, CallLoweringInfo &Info,
                     SmallVectorImpl<ArgInfo> &OutArgs) const;
  bool lowerChainCall(MachineIRBuilder &MIRBuilder,
                      CallLoweringInfo &Info) const;
  bool lowerCall(MachineIRBuilder &MIRBuilder,
                 CallLoweringInfo &Info) const override;

  static CCAssignFn *CCAssignFnForCall(CallingConv::ID CC, bool IsVarArg);
  static CCAssignFn *CCAssignFnForReturn(CallingConv::ID CC, bool IsVarArg);
};
} // End of namespace llvm;
#endif