Introduces a string attribute, amdgpu-requires-module-lds, to allow
eliding the module.lds block from kernels. Will allocate the block as before
if the attribute is missing or has its default value of true.
Patch uses the new attribute to detect the simplest possible instance of this,
where a kernel makes no calls and thus cannot call any functions that use LDS.
Tests updated to match, coverage was already good. Interesting cases is in
lower-module-lds-offsets where annotating the kernel allows the backend to pick
a different (in this case better) variable ordering than previously. A later
patch will avoid moving kernel variables into module.lds when the kernel can
have this attribute, allowing optimal ordering and locally unused variable
elimination.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D122091
Currently the superalign option only increases the alignment of
variables that are moved into the module.lds block. Change that to all LDS
variables. Also only increase the alignment once, instead of once per function.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D115488
When adding alias.scope and noalias metadata to a memcpy function,
the alias.scope and noalias metadata from the operands are merged.
The rule for merging alias.scope is to take the intersection of
the domains and the union of the scopes within those domains.
The rule for merging noalias is to take the intersection.
The bug is that AMDGPULowerModuleLDS was using concatenation for
both alias.scope and noalias. For example, when f1 and f2 are added
to the LDS structure and there is a memcpy(f2, f1, sizeof(f1)).
Then, concatenation creates noalias metadata for the memcpy that
includes both {f1, f2}. That means that the memcpy is assumed
not to alias a prior load of f2, which enables the optimizer to
remove a load of f2 that occurs after mempcy.
The function MDNode::getmostGenericAliasScope defines the semantics
for alias.scope. There is a function, combineMetadata in Local.cpp,
that uses intersect for noalias.
Differential Revision: https://reviews.llvm.org/D110049
Alias analysis is unable to disambiguate accesses to the structure
fields without it unlike distinct variables. As a result we cannot
combine ds_read and ds_write operations in a case of any store in
between which always considered clobbering.
Differential Revision: https://reviews.llvm.org/D108315
The main motivation behind pointer replacement of LDS use within non-kernel
functions is - to *avoid* subsequent LDS lowering pass from directly packing
LDS (assume large LDS) into a struct type which would otherwise cause allocating
huge memory for struct instance within every kernel.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D103225
This allows to lower an LDS variable into a kernel structure
even if there is a constant expression used from different
kernels.
Differential Revision: https://reviews.llvm.org/D103655
Before packing LDS globals into a sorted structure, make sure that
their alignment is properly updated based on their size. This will make
sure that the members of sorted structure are properly aligned, and
hence it will further reduce the probability of unaligned LDS access.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D103261
Before packing LDS globals into a sorted structure, make sure that
their alignment is properly updated based on their size. This will make
sure that the members of sorted structure are properly aligned, and
hence it will further reduce the probability of unaligned LDS access.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D103261
I do not see any practical difference but technically
used.* variables are internal and a call to getGlobalVariable
misses true as a second argument. NFC as far as I can tell.
Differential Revision: https://reviews.llvm.org/D102884
Accesses to global module LDS variable start from null,
but kernel also thinks its variables start address is
null. Fixed by not using a null as an address.
Differential Revision: https://reviews.llvm.org/D102882
Move some utility functions which are used within LDS lowering pass to a separate utils
file so that other LDS related passes can make use of them when required.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D100526
[amdgpu] Implement lower function LDS pass
Local variables are allocated at kernel launch. This pass collects global
variables that are used from non-kernel functions, moves them into a new struct
type, and allocates an instance of that type in every kernel. Uses are then
replaced with a constantexpr offset.
Prior to this pass, accesses from a function are compiled to trap. With this
pass, most such accesses are removed before reaching codegen. The trap logic
is left unchanged by this pass. It is still reachable for the cases this pass
misses, notably the extern shared construct from hip and variables marked
constant which survive the optimizer.
This is of interest to the openmp project because the deviceRTL runtime library
uses cuda shared variables from functions that cannot be inlined. Trunk llvm
therefore cannot compile some openmp kernels for amdgpu. In addition to the
unit tests attached, this patch applied to ROCm llvm with fixed-abi enabled
and the function pointer hashing scheme deleted passes the openmp suite.
This lowering will use more LDS than strictly necessary. It is intended to be
a functionally correct fallback for cases that are difficult to target from
future optimisation passes.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D94648