61 Commits

Author SHA1 Message Date
Jon Chesterfield
e17c1bb494 [amdgpu][nfc] Update comments on LDS lowering 2023-04-11 10:48:19 +01:00
Jon Chesterfield
0507448d82 [amdgpu] Implement dynamic LDS accesses from non-kernel functions
The premise here is to allow non-kernel functions to locate external LDS variables without using LDS or extra magic SGPRs to do so.

1/ First it crawls the callgraph to work out which external LDS variables are reachable from a given kernel
2/ Then it creates a new `extern char[0]` variable for each kernel, which will alias all the other extern LDS variables because that's the documented behaviour of these variables
3/ The address of that variable is written to a lookup table. The global variable is tagged with metadata to track what address it was allocated at by codegen
4/ The assembler builds the lookup table using the metadata
5/ Any non-kernel functions use the same magic intrinsic used by table lookups of non-dynamic LDS variables to find the address to use

Heavy overlap with the code paths taken for other lowering, in particular the same intrinsic is used to pass the dynamic scope information through the same sgpr as for table lookups of static LDS.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144233
2023-04-04 20:06:34 +01:00
Jon Chesterfield
62951784f0 [amdgpu][nfc] Refactor prior to D144233 to remove noise from diff 2023-04-03 16:47:01 +01:00
Jon Chesterfield
78e6818049 [amdgpu][nfc] clang-format AMDGPULowerModuleLDS for easier merging 2023-03-22 01:49:53 +00:00
Jon Chesterfield
d70e7ea0d1 [amdgpu][nfc] Extract more functions in LowerModuleLDS, mark more methods static 2023-03-22 01:25:28 +00:00
Jon Chesterfield
e8ad2a051c [amdgpu][nfc] Comment and extract two functions in LowerModuleLDS 2023-03-21 23:39:20 +00:00
Jon Chesterfield
d3dda422bf [amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD
Post ISel, LDS variables are absolute values. Representing them as
such is simpler than the frame recalculation currently used to build assembler
tables from their addresses.

This is a precursor to lowering dynamic/external LDS accesses from non-kernel
functions.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144221
2023-03-12 13:47:48 +00:00
Nikita Popov
576060fb41 [ReplaceConstant] Extract code for expanding users of constant (NFC)
AMDGPU implements some handy code for expanding all constexpr
users of LDS globals. Extract the core logic into ReplaceConstant,
so that it can be reused elsewhere.
2023-03-03 16:09:06 +01:00
Jon Chesterfield
bf579a7049 [amdgpu] Change LDS lowering default to hybrid
Postponed from D139433 until the bug fixed by D139874 could be resolved.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D141852
2023-02-24 15:20:12 +00:00
Guillaume Chatelet
87b6b347fc Revert D141134 "[NFC] Only expose getXXXSize functions in TypeSize"
The patch should be discussed further.

This reverts commit dd56e1c92b0e6e6be249f2d2dd40894e0417223f.
2023-01-06 15:27:50 +00:00
Guillaume Chatelet
dd56e1c92b [NFC] Only expose getXXXSize functions in TypeSize
Currently 'TypeSize' exposes two functions that serve the same purpose:
 - getFixedSize / getFixedValue
 - getKnownMinSize / getKnownMinValue

source : bf82070ea4/llvm/include/llvm/Support/TypeSize.h (L337-L338)

This patch offers to remove one of the two and stick to a single function in the code base.

Differential Revision: https://reviews.llvm.org/D141134
2023-01-06 15:24:52 +00:00
Matt Arsenault
6fed2c90d3 AMDGPU: Diagnose which LDS global failed to lower
Also lowercase the message to start since that seems to be the
prevailing convention for error messages.
2023-01-03 09:31:07 -05:00
Kazu Hirata
7e937d08e1 Don't include StringSwitch (NFC)
These files do not use llvm::StringSwitch.
2022-12-14 21:50:34 -08:00
Matt Arsenault
aa9bdd50c2 llvm-reduce: Fix invalid reductions with llvm.used
Fixes issue 59413.
2022-12-14 15:06:22 -05:00
Jon Chesterfield
d77ae7f251 [amdgpu] Reimplement LDS lowering
Renames the current lowering scheme to "module" and introduces two new
ones, "kernel" and "table", plus a "hybrid" that chooses between those three
on a per-variable basis.

Unit tests are set up to pass with the default lowering of "module" or "hybrid"
with this patch defaulting to "module", which will be a less dramatic codegen
change relative to the current. This reflects the sparsity of test coverage for
the table lowering method. Hybrid is better than module in every respect and
will be default in a subsequent patch.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D139433
2022-12-07 22:02:54 +00:00
Nico Weber
a862d09a92 Revert "[amdgpu] Reimplement LDS lowering"
This reverts commit 982017240d7f25a8a6969b8b73dc51f9ac5b93ed.
Breaks check-llvm, see https://reviews.llvm.org/D139433#3974862
2022-12-06 12:01:36 -05:00
Jon Chesterfield
982017240d [amdgpu] Reimplement LDS lowering
Renames the current lowering scheme to "module" and introduces two new
ones, "kernel" and "table", plus a "hybrid" that chooses between those three
on a per-variable basis.

Unit tests are set up to pass with the default lowering of "module" or "hybrid"
with this patch defaulting to "module", which will be a less dramatic codegen
change relative to the current. This reflects the sparsity of test coverage for
the table lowering method. Hybrid is better than module in every respect and
will be default in a subsequent patch.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D139433
2022-12-06 16:28:15 +00:00
Jon Chesterfield
56202c51d4 Revert "[amdgpu][lds] Use the same isKernel predicate consistently"
Looks like this composed poorly with a nominally independent patch, will fix
This reverts commit 0ba0398517778514eb44cb7ba9bf9d4d20a856e0.
2022-11-09 16:54:20 +00:00
Jon Chesterfield
0ba0398517 [amdgpu][lds] Use the same isKernel predicate consistently
isKernelCC != isKernel(F->getCallingConv())
There's a test case (lower-kernel-lds.ll) that explicitly skips amdgpu_ps
so this change picks the isKernel predicate that continues to skip that
calling convention.

isKernel returns true for AMDGPU_KERNEL and SPIR_KERNEL. isKernelCC also
returns true for other calling conventions.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D136599
2022-11-09 16:45:05 +00:00
Jon Chesterfield
e9a2aa6355 [amdgpu][lds] Use a consistent order of fields in generated structs
Avoids spurious and confusing test failures on changing implementation.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D136598
2022-11-09 15:57:41 +00:00
Jon Chesterfield
35f2584ef9 [amdgpu] Error, instead of miscompile, anonymous kernels using lds
The association between kernel and struct is done by symbol name.
This doesn't work robustly for anonymous kernels as shown by the modified
test case.

An alternative association between function and struct can be constructed
if necessary, probably though metadata, but on the basis that we currently
miscompile anonymous kernels and that they are difficult to construct from
application code and difficult to call from the runtime, this patch makes
it a fatal error for now.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D134741
2022-09-28 16:30:04 +01:00
Jon Chesterfield
cdb9738963 [amdgpu] Expand all ConstantExpr users of LDS variables in instructions
Bug noted in D112717 can be sidestepped with this change.

Expanding all ConstantExpr involved with LDS up front makes the variable specialisation simpler. Excludes ConstantExpr that don't access LDS to avoid disturbing codegen elsewhere.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D133422
2022-09-14 07:55:46 +01:00
Jon Chesterfield
23f6c8d635 [amdgpu] Always, instead of mostly, remove unused LDS symbols
Currently LDS variables are removed by the lower module pass
if they have a use which is caught by the replace with struct control flow.
This makes tests brittle to changes to that control flow which induces
noise when trying to improve lowering. Some tests already check that
variables are removed, while others checked that they are not removed.

LDS variables are not (currently) externally accessible, and if that
changes the machinery which makes them externally accessible will look
like a use. This change therefore breaks no applications.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D133028
2022-09-07 18:28:16 +01:00
Kazu Hirata
fedc59734a [llvm] Use range-based for loops (NFC) 2022-09-03 11:17:40 -07:00
Jon Chesterfield
a28bbd00c6 [amdgpu][nfc] Factor predicate out of findLDSVariablesToLower 2022-08-31 15:44:51 +01:00
Dmitri Gribenko
b435da027d [amdgpu][nfc] Fix build with a certan Clang version
It errors out in the Bazel CI:

AMDGPULowerModuleLDSPass.cpp:384:12: error: chosen constructor is
explicit in copy-initialization
    return {SGV, std::move(Map)};

Reviewed By: rupprecht

Differential Revision: https://reviews.llvm.org/D130623
2022-07-27 17:29:36 +02:00
Jon Chesterfield
3ccd88f209 [amdgpu][nfc] Separate processUsedLDS into independent pieces, rename it 2022-07-27 01:55:43 +01:00
Jon Chesterfield
9981afdd42 [amdgpu][nfc] Extract kernel annotation from processUsedLDS 2022-07-27 01:38:41 +01:00
Jon Chesterfield
923b90bddb [amdgpu][nfc] Separate LDS struct creation from RAUW 2022-07-26 20:59:17 +01:00
Jon Chesterfield
26dcc7e64a [amdgpu][nfc] Skip operations on padding fields in LDS struct 2022-07-26 18:31:02 +01:00
Jon Chesterfield
2224bbcd74 [nfc][amdgpu] LDS. Move selection logic up the stack. 2022-07-19 17:20:19 +01:00
Jon Chesterfield
eda2bcad02 [nfc][amdgpu] Remove dead variable and function 2022-07-15 23:56:43 +01:00
Jon Chesterfield
bc78c09952 [amdgpu] Elide module lds allocation in kernels with no callees
Introduces a string attribute, amdgpu-requires-module-lds, to allow
eliding the module.lds block from kernels. Will allocate the block as before
if the attribute is missing or has its default value of true.

Patch uses the new attribute to detect the simplest possible instance of this,
where a kernel makes no calls and thus cannot call any functions that use LDS.

Tests updated to match, coverage was already good. Interesting cases is in
lower-module-lds-offsets where annotating the kernel allows the backend to pick
a different (in this case better) variable ordering than previously. A later
patch will avoid moving kernel variables into module.lds when the kernel can
have this attribute, allowing optimal ordering and locally unused variable
elimination.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D122091
2022-05-04 22:42:07 +01:00
Sebastian Neubauer
6527b2a4d5 [AMDGPU][NFC] Fix typos
Fix some typos in the amdgpu backend.

Differential Revision: https://reviews.llvm.org/D119235
2022-02-18 15:05:21 +01:00
Stanislav Mekhanoshin
c7eb846345 [AMDGPU] Merge AMDGPULDSUtils into AMDGPUMemoryUtils
Differential Revision: https://reviews.llvm.org/D119502
2022-02-11 10:32:24 -08:00
Jon Chesterfield
24b28db8cc [amdgpu] Increase alignment of all LDS variables
Currently the superalign option only increases the alignment of
variables that are moved into the module.lds block. Change that to all LDS
variables. Also only increase the alignment once, instead of once per function.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D115488
2021-12-12 19:30:32 +00:00
Kazu Hirata
d395befa65 [llvm] Use range-based for loops (NFC) 2021-12-11 11:29:12 -08:00
Jon Chesterfield
86caf517bf Revert "[amdgpu][nfc] Delete dead code in LowerModuleLDS"
This reverts commit 7b9ab06d10a6a989f76e6c5ecf89d906f838fe7d.
Said code is better removed as part of a larger change.
2021-12-11 00:31:51 +00:00
Jon Chesterfield
7b9ab06d10 [amdgpu][nfc] Delete dead code in LowerModuleLDS 2021-12-10 00:43:46 +00:00
Jon Chesterfield
04b2f6ea8a [amdgpu][nfc] Drop dead PtrSet, fix a comment 2021-12-09 17:05:20 +00:00
Jon Chesterfield
f0e3b39a5d [amdgpu][nfc] Move non-shared code out of LDSUtils 2021-12-08 16:23:03 +00:00
Brendon Cahoon
cbdf624bb8 [AMDGPU] Correctly merge alias.scope and noalias metadata for memops
When adding alias.scope and noalias metadata to a memcpy function,
the alias.scope and noalias metadata from the operands are merged.
The rule for merging alias.scope is to take the intersection of
the domains and the union of the scopes within those domains.
The rule for merging noalias is to take the intersection.

The bug is that AMDGPULowerModuleLDS was using concatenation for
both alias.scope and noalias. For example, when f1 and f2 are added
to the LDS structure and there is a memcpy(f2, f1, sizeof(f1)).
Then, concatenation creates noalias metadata for the memcpy that
includes both {f1, f2}. That means that the memcpy is assumed
not to alias a prior load of f2, which enables the optimizer to
remove a load of f2 that occurs after mempcy.

The function MDNode::getmostGenericAliasScope defines the semantics
for alias.scope. There is a function, combineMetadata in Local.cpp,
that uses intersect for noalias.

Differential Revision: https://reviews.llvm.org/D110049
2021-09-21 13:02:01 -05:00
Jacob Lambert
dc6e8dfdfe [AMDGPU][NFC] Correct typos in lib/Target/AMDGPU/AMDGPU*.cpp files. Test commit for new contributor. 2021-09-20 14:48:50 -07:00
Matt Arsenault
ce51c5d4a9 AMDGPU: Fix crashing on kernel declarations when lowering LDS
This was trying to insert the used marker into a declaration.
2021-08-26 19:01:10 -04:00
Stanislav Mekhanoshin
8d7d89b081 [AMDGPU] Add alias.scope metadata to lowered LDS struct
Alias analysis is unable to disambiguate accesses to the structure
fields without it unlike distinct variables. As a result we cannot
combine ds_read and ds_write operations in a case of any store in
between which always considered clobbering.

Differential Revision: https://reviews.llvm.org/D108315
2021-08-19 11:40:30 -07:00
Stanislav Mekhanoshin
9dc2636623 [AMDGPU] Disable LDS lowering for GFX shaders
Apparently these need external LDS symbols to remain.

Fixes: SC1-3279

Differential Revision: https://reviews.llvm.org/D106288
2021-07-20 02:55:25 -07:00
Stanislav Mekhanoshin
d274d64ef4 [AMDGPU] Check for pointer operand while refining LDS align
Also skips the propagation if alignment is 1.

Differential Revision: https://reviews.llvm.org/D104796
2021-06-23 12:27:55 -07:00
Stanislav Mekhanoshin
2b43209ee3 [AMDGPU] Propagate LDS align into to instructions
Differential Revision: https://reviews.llvm.org/D104316
2021-06-23 00:57:16 -07:00
Stanislav Mekhanoshin
d797a7f8da [AMDGPU] Use performOptimizedStructLayout for LDS sort
This gives better packing.

Differential Revision: https://reviews.llvm.org/D104331
2021-06-22 09:58:10 -07:00
hsmahesha
80fd5fa526 [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.
The main motivation behind pointer replacement of LDS use within non-kernel
functions is - to *avoid* subsequent LDS lowering pass from directly packing
LDS (assume large LDS) into a struct type which would otherwise cause allocating
huge memory for struct instance within every kernel.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D103225
2021-06-21 11:51:49 +05:30