llvm-project

Author	SHA1	Message	Date
Jon Chesterfield	0507448d82	[amdgpu] Implement dynamic LDS accesses from non-kernel functions The premise here is to allow non-kernel functions to locate external LDS variables without using LDS or extra magic SGPRs to do so. 1/ First it crawls the callgraph to work out which external LDS variables are reachable from a given kernel 2/ Then it creates a new `extern char[0]` variable for each kernel, which will alias all the other extern LDS variables because that's the documented behaviour of these variables 3/ The address of that variable is written to a lookup table. The global variable is tagged with metadata to track what address it was allocated at by codegen 4/ The assembler builds the lookup table using the metadata 5/ Any non-kernel functions use the same magic intrinsic used by table lookups of non-dynamic LDS variables to find the address to use Heavy overlap with the code paths taken for other lowering, in particular the same intrinsic is used to pass the dynamic scope information through the same sgpr as for table lookups of static LDS. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144233	2023-04-04 20:06:34 +01:00
Jon Chesterfield	75c7019b7e	[amdgpu] Fix broken error detection in LDS lowering std::optional<uint32_t> can be compared to uint32_t without warning, but does not compare to the value within the optional. It needs to be prefixed *. Wconversion does not warn about this. ``` bool bug(uint32_t Offset, std::optional<uint32_t> Expect) { return (Offset != Expect); } bool deref(uint32_t Offset, std::optional<uint32_t> Expect) { return (Offset != *Expect); } ``` Both compile without warnings. Wrote the former, intended the latter. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D146775	2023-03-30 13:42:38 +01:00
Jon Chesterfield	d3dda422bf	[amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD Post ISel, LDS variables are absolute values. Representing them as such is simpler than the frame recalculation currently used to build assembler tables from their addresses. This is a precursor to lowering dynamic/external LDS accesses from non-kernel functions. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144221	2023-03-12 13:47:48 +00:00
Matt Arsenault	69e75ae695	CodeGen: Don't lazily construct MachineFunctionInfo This fixes what I consider to be an API flaw I've tripped over multiple times. The point this is constructed isn't well defined, so depending on where this is first called, you can conclude different information based on the MachineFunction. For example, the AMDGPU implementation inspected the MachineFrameInfo on construction for the stack objects and if the frame has calls. This kind of worked in SelectionDAG which visited all allocas up front, but broke in GlobalISel which hasn't visited any of the IR when arguments are lowered. I've run into similar problems before with the MIR parser and trying to make use of other MachineFunction fields, so I think it's best to just categorically disallow dependency on the MachineFunction state in the constructor and to always construct this at the same time as the MachineFunction itself. A missing feature I still could use is a way to access a custom analysis pass on the IR here.	2022-12-21 10:49:32 -05:00
Jay Foad	6443c0ee02	[AMDGPU] Stop using make_pair and make_tuple. NFC. C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple. Differential Revision: https://reviews.llvm.org/D139828	2022-12-14 13:22:26 +00:00
Fangrui Song	67819a72c6	[CodeGen] llvm::Optional => std::optional	2022-12-13 09:06:36 +00:00
Jon Chesterfield	d77ae7f251	[amdgpu] Reimplement LDS lowering Renames the current lowering scheme to "module" and introduces two new ones, "kernel" and "table", plus a "hybrid" that chooses between those three on a per-variable basis. Unit tests are set up to pass with the default lowering of "module" or "hybrid" with this patch defaulting to "module", which will be a less dramatic codegen change relative to the current. This reflects the sparsity of test coverage for the table lowering method. Hybrid is better than module in every respect and will be default in a subsequent patch. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D139433	2022-12-07 22:02:54 +00:00
Nico Weber	a862d09a92	Revert "[amdgpu] Reimplement LDS lowering" This reverts commit 982017240d7f25a8a6969b8b73dc51f9ac5b93ed. Breaks check-llvm, see https://reviews.llvm.org/D139433#3974862	2022-12-06 12:01:36 -05:00
Jon Chesterfield	982017240d	[amdgpu] Reimplement LDS lowering Renames the current lowering scheme to "module" and introduces two new ones, "kernel" and "table", plus a "hybrid" that chooses between those three on a per-variable basis. Unit tests are set up to pass with the default lowering of "module" or "hybrid" with this patch defaulting to "module", which will be a less dramatic codegen change relative to the current. This reflects the sparsity of test coverage for the table lowering method. Hybrid is better than module in every respect and will be default in a subsequent patch. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D139433	2022-12-06 16:28:15 +00:00
Stanislav Mekhanoshin	5a3fe9a039	[AMDGPU] Move SIModeRegisterDefaults to SI MFI It does not belong to a general AMDGPU MFI. Differential Revision: https://reviews.llvm.org/D134666	2022-09-28 13:13:40 -07:00
Jon Chesterfield	80ba432821	[amdgpu][nfc] Allocate kernel-specific LDS struct deterministically A kernel may have an associated struct for laying out LDS variables. This patch puts that instance, if present, at a deterministic address by allocating it at the same time as the module scope instance. This is relatively likely to be where the instance was allocated anyway (~NFC) but will allow later patches to calculate where a given field can be found, which means a function which is only reachable from a single kernel will be able to access a LDS variable with zero overhead. That will be particularly helpful for applications that instantiate a function template containing LDS variables once per kernel. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D127052	2022-09-28 14:55:16 +01:00
Vitaly Buka	20a80d60a8	Revert "[AMDGPU] Move SIModeRegisterDefaults to SI MFI" Break msan bots. Details in D134666. This reverts commit 0ce96e06ee0226938e723bd0c8e16e3d2d51f203.	2022-09-26 22:22:09 -07:00
Stanislav Mekhanoshin	0ce96e06ee	[AMDGPU] Move SIModeRegisterDefaults to SI MFI It does not belong to a general AMDGPU MFI. Differential Revision: https://reviews.llvm.org/D134666	2022-09-26 13:20:24 -07:00
Jon Chesterfield	3a20597776	[amdgpu] Implement lds kernel id intrinsic Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS variable to avoid wasting space for it. There are a number of implicit arguments accessed by intrinsic already so this implementation closely follows the existing handling. It is slightly novel in that this SGPR is written by the kernel prologue. It is necessary in the general case to put variables at different addresses such that they can be compactly allocated and thus necessary for an indirect function call to have some means of determining where a given variable was allocated. Claiming an arbitrary SGPR into which an integer can be written by the kernel, in this implementation based on metadata associated with that kernel, which is then passed on to indirect call sites is sufficient to determine the variable address. The intent is to emit a __const array of LDS addresses and index into it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D125060	2022-07-19 17:46:19 +01:00
Jon Chesterfield	bc78c09952	[amdgpu] Elide module lds allocation in kernels with no callees Introduces a string attribute, amdgpu-requires-module-lds, to allow eliding the module.lds block from kernels. Will allocate the block as before if the attribute is missing or has its default value of true. Patch uses the new attribute to detect the simplest possible instance of this, where a kernel makes no calls and thus cannot call any functions that use LDS. Tests updated to match, coverage was already good. The interesting case is in lower-module-lds-offsets, where annotating the kernel allows the backend to pick a different (in this case better) variable ordering than previously. A later patch will avoid moving kernel variables into module.lds when the kernel can have this attribute, allowing optimal ordering and locally unused variable elimination. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D122091	2022-05-04 22:42:07 +01:00
serge-sans-paille	7030654296	[iwyu] Handle regressions in libLLVM header include Running iwyu-diff on LLVM codebase since fa5a4e1b95c8f37796 detected a few regressions, fixing them. Differential Revision: https://reviews.llvm.org/D124847	2022-05-04 08:32:38 +02:00
Matt Arsenault	1900b6c77b	AMDGPU: Add assert for GDS globals	2022-04-19 22:28:11 -04:00
Matt Arsenault	b5ec131267	AMDGPU: Fix allocating GDS globals to LDS offsets These don't seem to be very well used or tested, but try to make the behavior a bit more consistent with LDS globals. I'm not sure what the definition for amdgpu-gds-size is supposed to mean. For now I assumed it's allocating a static size at the beginning of the allocation, and any known globals are allocated after it.	2022-04-19 22:14:48 -04:00
Jon Chesterfield	bcbd4cf1f2	Revert "[amdgpu][nfc] Pass function instead of module to allocateModuleLDSGlobal" Reconsidered, better to handle per-function state in the constructor as before. This reverts commit 98e474c1b3210d90e313457bf6a6e39a7edb4d2b.	2022-03-20 00:58:26 +00:00
Jon Chesterfield	98e474c1b3	[amdgpu][nfc] Pass function instead of module to allocateModuleLDSGlobal	2022-03-19 16:42:17 +00:00
Kazu Hirata	f3a344d212	[Target] Remove redundant member initialization (NFC) Identified with readability-redundant-member-init.	2022-01-06 22:01:44 -08:00
Stanislav Mekhanoshin	748db5bfac	[AMDGPU] Fix module LDS selection Accesses to global module LDS variable start from null, but kernel also thinks its variables start address is null. Fixed by not using a null as an address. Differential Revision: https://reviews.llvm.org/D102882	2021-05-20 15:59:01 -07:00
Serge Guelton	d6de1e1a71	Normalize interaction with boolean attributes Such attributes can either be unset, or set to "true" or "false" (as string). Throughout the codebase, this led to inelegant checks ranging from if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true") to if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true") Introduce a getValueAsBool that normalizes the check, with the following behavior: no attributes or attribute set to "false" => return false attribute set to "true" => return true Differential Revision: https://reviews.llvm.org/D99299	2021-04-17 08:17:33 +02:00
Jon Chesterfield	13e49dcee4	[amdgpu] Implement lower function LDS pass Local variables are allocated at kernel launch. This pass collects global variables that are used from non-kernel functions, moves them into a new struct type, and allocates an instance of that type in every kernel. Uses are then replaced with a constantexpr offset. Prior to this pass, accesses from a function are compiled to trap. With this pass, most such accesses are removed before reaching codegen. The trap logic is left unchanged by this pass. It is still reachable for the cases this pass misses, notably the extern shared construct from hip and variables marked constant which survive the optimizer. This is of interest to the openmp project because the deviceRTL runtime library uses cuda shared variables from functions that cannot be inlined. Trunk llvm therefore cannot compile some openmp kernels for amdgpu. In addition to the unit tests attached, this patch applied to ROCm llvm with fixed-abi enabled and the function pointer hashing scheme deleted passes the openmp suite. This lowering will use more LDS than strictly necessary. It is intended to be a functionally correct fallback for cases that are difficult to target from future optimisation passes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D94648	2021-03-15 15:24:01 +00:00
dfukalov	6a87e9b08b	[NFC][AMDGPU] Reduce include files dependency. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813	2021-01-07 22:22:05 +03:00
Sebastian Neubauer	5733167f54	[AMDGPU] Mark amdgpu_gfx functions as module entry function - Allows lds allocations - Writes resource usage into COMPUTE_PGM_RSRC1 registers in PAL metadata Differential Revision: https://reviews.llvm.org/D92946	2020-12-14 10:43:39 +01:00
Michael Liao	5257a60ee0	[amdgpu] Add codegen support for HIP dynamic shared memory. Summary: - HIP uses an unsized extern array `extern __shared__ T s[]` to declare the dynamic shared memory, whose size is not known at compile time. Reviewers: arsenm, yaxunl, kpyzhov, b-sumner Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82496	2020-08-20 21:29:18 -04:00
Guillaume Chatelet	52911428ef	[Alignment][NFC] Migrate AMDGPU backend to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82743	2020-06-29 11:56:06 +00:00
Eli Friedman	a2caa3b614	Remove GlobalValue::getAlignment(). This function is deceptive at best: it doesn't return what you'd expect. If you have an arbitrary GlobalValue and you want to determine the alignment of that pointer, Value::getPointerAlignment() returns the correct value. If you want the actual declared alignment of a function or variable, GlobalObject::getAlignment() returns that. This patch switches all the users of GlobalValue::getAlignment to an appropriate alternative. Differential Revision: https://reviews.llvm.org/D80368	2020-06-23 19:13:42 -07:00
Matt Arsenault	61813b8069	AMDGPU: Use member initializers in MFI	2020-05-19 18:11:34 -04:00
Matt Arsenault	5660bb6bc9	AMDGPU: Remove denormal subtarget features Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.	2020-04-02 17:17:12 -04:00
Matt Arsenault	db0ed3e429	AMDGPU: Refactor treatment of denormal mode Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the subtarget, and be moved into a separate function attribute. This patch is still NFC. The denormal mode remains as a subtarget feature for now, but make the necessary changes to switch to using an attribute.	2019-11-19 19:55:43 +05:30
Guillaume Chatelet	b65fa48305	[Alignment] Migrate Attribute::getWith(Stack)Alignment Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, jdoerfert Reviewed By: courbet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D68792 llvm-svn: 374884	2019-10-15 12:56:24 +00:00
Matt Arsenault	e7e23e3e91	AMDGPU: Make AMDGPUPerfHintAnalysis an SCC pass Add a string attribute instead of directly setting MachineFunctionInfo. This avoids trying to get the analysis in the MachineFunctionInfo in a way that doesn't work with the new pass manager. This will also avoid re-visiting the call graph for every single function. llvm-svn: 365241	2019-07-05 20:26:13 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Matt Arsenault	4bec7d4261	Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering" Reverts r337079 with fix for msan error. llvm-svn: 337535	2018-07-20 09:05:08 +00:00
Evgeniy Stepanov	1971ba097d	Revert "AMDGPU: Fix handling of alignment padding in DAG argument lowering" This reverts commit r337021. WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7 #1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3 #2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3 #3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18 #4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23 #5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5 #6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const, llvm::raw_ostream&, char const) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5 #7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3 #8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26 [...] Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv' #0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192 llvm-svn: 337079	2018-07-14 01:20:53 +00:00
Matt Arsenault	de95077780	AMDGPU: Fix handling of alignment padding in DAG argument lowering This was completely broken if there was ever a struct argument, as this information is thrown away during the argument analysis. The offsets as passed in to LowerFormalArguments are not useful, as they partially depend on the legalized result register type, and they don't consider the alignment in the first place. Ignore the Ins array, and instead figure out from the raw IR type what we need to do. This seems to fix the padding computation if the DAG lowering is forced (and stops breaking arguments following padded arguments if the arguments were only partially lowered in the IR) llvm-svn: 337021	2018-07-13 16:40:25 +00:00
Matt Arsenault	75e7192ba3	AMDGPU: Remove MFI::ABIArgOffset We have too many mechanisms for tracking the various offsets used for kernel arguments, so remove one. There's still a lot of confusion with these because there are two different "implicit" argument areas located at the beginning and end of the kernarg segment. Additionally, the offset was determined based on the memory size of the split element types. This would break in a future commit where v3i32 is decomposed into separate i32 pieces. llvm-svn: 335830	2018-06-28 10:18:55 +00:00
Stanislav Mekhanoshin	1c538423dc	[AMDGPU] Add perf hints to functions This is an adoption of the HSAIL perfhint pass. Two types of hints are produced: 1. Function is memory bound. 2. Kernel can use wave limiter. Currently these hints are used in the scheduler. If a function is suspected to be memory bound we allow occupancy to decrease to 4 waves in the course of scheduling. Differential Revision: https://reviews.llvm.org/D46992 llvm-svn: 333289	2018-05-25 17:25:12 +00:00
Matthias Braun	f1caa2833f	MachineFunction: Return reference from getFunction(); NFC The Function can never be nullptr so we can return a reference. llvm-svn: 320884	2017-12-15 22:22:58 +00:00
Matt Arsenault	2b1f9aa577	AMDGPU: Start defining a calling convention Partially implement callee-side for arguments and return values. byval doesn't work properly, and most likely sret or other on-stack return values don't either. llvm-svn: 303308	2017-05-17 21:56:25 +00:00
Marek Olsak	a302a736ec	AMDGPU: Add AMDGPU_HS calling convention Reviewers: arsenm, nhaehnle Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D32644 llvm-svn: 301930	2017-05-02 15:41:10 +00:00
Matt Arsenault	1074cb5420	AMDGPU: Rename isKernel What we really want to do is distinguish functions that may be called by other functions, and graphics shaders are not called kernels. llvm-svn: 299140	2017-03-30 23:58:04 +00:00
Matt Arsenault	3cb390498e	AMDGPU: Fold omod into instructions llvm-svn: 296372	2017-02-27 19:35:42 +00:00
Matt Arsenault	52ef4019fd	AMDGPU: Make AMDGPUMachineFunction fields private ABIArgOffset is a problem because properly setting the KernArgSize requires that the reserved area before the real kernel arguments be correctly aligned, which requires fixing clover. llvm-svn: 276766	2016-07-26 16:45:58 +00:00
Nikolay Haustov	beb24f5b20	Resubmit r268719 - AMDGPU/SI: Add amdgpu_kernel calling convention. Part 2. This was reverted in r268740 because of problems with corresponding Clang change. Clang change was updated and resubmitted in r274220. Check calling convention in AMDGPUMachineFunction::isKernel This will be used for AMDGPU_HSA_KERNEL symbol type in output ELF. Also, in the future unused non-kernels may be optimized. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D19917 llvm-svn: 274341	2016-07-01 10:00:58 +00:00
Matt Arsenault	e935f05a94	AMDGPU: Fix kernel argument alignment impacting stack size Don't use AllocateStack because kernel arguments have nothing to do with the stack. The ensureMaxAlignment call was still changing the stack alignment. llvm-svn: 273080	2016-06-18 05:15:53 +00:00
Nikolay Haustov	6eb050ea4e	Revert "AMDGPU/SI: Add amdgpu_kernel calling convention. Part 2." This reverts commit 47486d52454d60cdf6becc0b2efe533c73794380. It broke calling OpenCL kernel from another kernel. llvm-svn: 268739	2016-05-06 14:59:04 +00:00
Nikolay Haustov	dc1bb79b92	AMDGPU/SI: Add amdgpu_kernel calling convention. Part 2. Summary: Check calling convention in AMDGPUMachineFunction::isKernel This will be used for AMDGPU_HSA_KERNEL symbol type in output ELF. Also, in the future unused non-kernels may be optimized. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D19917 llvm-svn: 268719	2016-05-06 09:23:13 +00:00

54 Commits