292 Commits

Author SHA1 Message Date
Austin Kerbow
c4d89203f3
[AMDGPU] Support preloading hidden kernel arguments (#98861)
Adds hidden kernel arguments to the function signature and marks them
inreg if they should be preloaded into user SGPRs. The normal kernarg
preloading logic then takes over with some additional checks for the
correct implicitarg_ptr alignment.

Special care is needed so that metadata for the hidden arguments is not
added twice when generating the code object.
2024-10-06 17:44:33 -07:00
Jay Foad
8d13e7b8c3
[AMDGPU] Qualify auto. NFC. (#110878)
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find
lib/Target/AMDGPU/ -type f)
2024-10-03 13:07:54 +01:00
Jay Foad
a6bae5cb37
[AMDGPU] Split GCNSubtarget into its own file. NFC. (#105525) 2024-08-21 19:11:02 +01:00
Jay Foad
63fae3ed65
[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298) 2024-07-17 21:11:00 +01:00
Jay Foad
f10a78b7e4 [AMDGPU] clang-tidy: use std::make_unique. NFC. 2024-07-17 07:58:09 +01:00
Jay Foad
fdb669b0dc [AMDGPU] clang-format: pass Triple by value and std::move it. NFC. 2024-07-16 15:52:58 +01:00
Stanislav Mekhanoshin
b132dd41eb
[AMDGPU] Remove wavefrontsize feature from GFX10+ (#98400)
Processor definition shall not include a default feature which may be
switched off by a different wave size. This allows not to write
-mattr=-wavefrontsize32,+wavefrontsize64 in tests.
2024-07-16 01:02:25 -07:00
Craig Topper
c97a8e4bcf [AMDGPU] Remove unneed static_cast from GCNSubtarget constructor. NFC
RegBankInfo is a std::unique_ptr<AMDGPURegisterBankInfo> so we don't
need the cast.
2024-07-10 12:54:17 -07:00
Nikita Popov
9df71d7673
[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919)
Similar to https://github.com/llvm/llvm-project/pull/96902, this adds
`getDataLayout()` helpers to Function and GlobalValue, replacing the
current `getParent()->getDataLayout()` pattern.
2024-06-28 08:36:49 +02:00
Nicolai Hähnle
7e9b49f6b8
AMDGPU: Add plumbing for private segment size argument (#96445)
The actual size of scratch/private is determined at dispatch time, so
add more plumbing to request it. Will be used in subsequent change.
2024-06-25 16:20:51 +02:00
Andreas Jonson
cc19374afa
[AMDGPU] Swap range metadata to attribute for workitem id. (#94871)
Swap out range metadata to range attribute for calls to be able to
deprecate range metadata on calls in the future.
2024-06-09 10:29:50 -04:00
Janek van Oirschot
d86b68afd7
MCExpr-ify SIProgramInfo (#88257)
Convert members in SIProgramInfo affected by variables provided by AMDGPUResourceUsageAnalysis into MCExprs.
2024-05-09 13:02:32 +01:00
Jay Foad
6eb9e214b3
RFC: [AMDGPU] Check subtarget features for consistency (#86957)
Implement GCNSubtarget::checkSubtargetFeatures as a canonical place to
check subtarget features for consistency and diagnose any
inconsistencies. To start with, the implementation just checks that
either wavefrontsize32 or wavefrontsize64 is selected.

checkSubtargetFeatures is called at the start of instruction selection.
This is pretty arbitrary. It is just a convenient point at which we have
access to the subtarget that we're going to use for codegenning a
particular function.
2024-05-09 11:37:28 +01:00
David Green
b24af43fdf
[AArch64] Improve scheduling latency into Bundles (#86310)
By default the scheduling info of instructions into a BUNDLE are given a
latency of 0 as they operate on the implicit register of the bundle.
This modifies that for AArch64 so that the latency is adjusted to use
the latency from the instruction in the bundle instead. This essentially
assumes that the bundled instructions are executed in a single cycle,
which for AArch64 is probably OK considering they are mostly used for
MOVPFX bundles, where this can help create slightly better scheduling
especially for in-order cores.
2024-04-12 10:57:01 +01:00
Jay Foad
9c58f3a234
[AMDGPU] Fix implicit $vcc operands after parsing MIR (#87781)
MIParser checks that implicit operands match the instruction definition,
so they have to be $vcc even in wave32 mode. Use the mirFileLoaded hook
to fix them after MIParser's checks, converting them to $vcc_lo which is
what that rest of CodeGen expects.

This is all just extending the fixImplicitOperands hack which was
introduced with GFX10, but at least it makes it possible to write a MIR
test which creates the same instructions that normal CodeGen would
generate.
2024-04-09 09:10:45 +01:00
Jun Wang
c4e517f59c
[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035)
A new function attribute named amdgpu_num_work_groups is added. This
attribute, which consists of three integers, allows programmers to let
the compiler know the number of workgroups to be launched in each of the
three dimensions and do optimizations based on that information.

---------

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2024-03-12 10:30:39 -07:00
Emma Pilkington
bc82cfb38d
[AMDGPU] Add an asm directive to track code_object_version (#76267)
Named '.amdhsa_code_object_version'. This directive sets the
e_ident[ABIVERSION] in the ELF header, and should be used as the assumed
COV for the rest of the asm file.

This commit also weakens the --amdhsa-code-object-version CL flag.
Previously, the CL flag took precedence over the IR flag. Now the IR
flag/asm directive take precedence over the CL flag. This is implemented
by merging a few COV-checking functions in AMDGPUBaseInfo.h.
2024-01-21 11:54:47 -05:00
Mariusz Sikora
a97028ac51
[AMDGPU] Update VOP instructions for GFX12 (#74853)
Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>
2023-12-12 11:38:24 +01:00
Mirko Brkušanin
f5868cb6a6
[AMDGPU][MC] Add GFX12 VIMAGE and VSAMPLE encodings (#74062) 2023-12-04 13:04:42 +01:00
Austin Kerbow
0455596e1e [AMDGPU] Add DAG ISel support for preloaded kernel arguments
This patch adds the DAG isel changes for kernel argument preloading.
These changes are not usable with older firmware but subsequent patches
in the series will make the codegen backwards compatible. This patch
should only be submitted alongside that subsequent patch.

Preloading here begins from the start of the kernel arguments until the
amount of arguments indicated by the CL flag
amdgpu-kernarg-preload-count.

Aggregates and arguments passed by-ref are not supported.

Special care for the alignment of the kernarg segment is needed as well
as consideration of the alignment of addressable SGPR tuples when we
cannot directly use misaligned large tuples that the arguments are
loaded to.

Reviewed By: bcahoon

Differential Revision: https://reviews.llvm.org/D158579
2023-09-25 09:32:59 -07:00
Ivan Kosarev
bea56b0bc0 [AMDGPU] Have a subtarget feature to control use of real True16 instructions.
Real True16 instructions are as they are defined in the ISA. Fake True16
instructions are identical to real ones except that they take 32-bit
registers as operands and always use their low halves.

Reviewed By: Joe_Nash

Differential Revision: https://reviews.llvm.org/D156100
2023-09-22 10:47:13 +01:00
Simon Pilgrim
47a9cd0343 [AMDGPU] Remove constexpr from getNumUserSGPRForField/getMaxNumPreloadedSGPRs to appease older gcc builds
Older versions of gcc wouldn't accept the constexpr getNumUserSGPRForField (introduced in D159439 / 343be5132e2831d85) as it couldn't treat the llvm_unreachable call as constexpr
2023-09-13 12:19:28 +01:00
Austin Kerbow
343be5132e [AMDGPU] Add utilities to track number of user SGPRs. NFC.
Factor out and unify some common code that calculates and tracks the
number of user SGRPs.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D159439
2023-09-12 08:52:30 -07:00
Matt Arsenault
53fb907df4 AMDGPU: Special case uniformity info for single lane workgroups
Constructors/destructors and OpenMP make use of single lane groups
in some cases.
2023-06-28 07:25:48 -04:00
Matt Arsenault
b9c6d9e6c3 AMDGPU: Propagate amdgpu-waves-per-eu with attributor
This will do a value range merging down the callgraph, unlike the
current pass which can only propagate values to undecorated functions
from a kernel.

This one is a bit weird due to the interaction with the implied range
from amdgpu-flat-workgroup-size. At the default group range of 1,1024,
the minimum implied bounds is 4 so this ends up introducing the
attribute on undecorated functions. We could probably simplify this by
ignoring it and propagating the raw values. The subtarget interaction
and the interaction with amdgpu-flat-workgroup-size only really clamp
invalid values (plus the lower bound doesn't seem to do anything as
far as I can tell anyway).
2023-06-16 15:04:08 -04:00
pvanhout
ecbd37d5a3 [AMDGPU] Port no-hsa-graphic-shaders.ll to code object V4
Split from D146023

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D152432
2023-06-09 09:07:53 +02:00
Matt Arsenault
3d0350b762 AMDGPU: Add MF independent version of getImplicitParameterOffset 2023-06-07 08:26:31 -04:00
Changpeng Fang
7ca3444fba AMDGPU: Use module flag to get code object version at IR level folow-up
Summary:
  This is part of the leftover work for https://reviews.llvm.org/D143138.
In this work, we pass code object version as an argument to initialize target ID
and use it for targetID dump.

Reviewers: arsenm

Differential Revision
  https://reviews.llvm.org/D143293
2023-02-10 11:16:38 -08:00
Changpeng Fang
54cf69c9d5 AMDGPU: Use module flag to get code object version at IR level
Summary:
  This patch introduces a mechanism to check the code object version from the module flag, This avoids checking from command line.
In case the module flag is missing, we use the current default code object version supported in the compiler.

For tools whose inputs are not IR, we may need other approach (directive, for example) to check the code
object version, That will be in a separate patch later.

For LIT tests update, we directly add module flag if there is only a single code object version associated with all checks in one file.
In cause of multiple code object version in one file, we use the "sed" method to "clone" the checks to achieve the goal.

Reviewer: arsenm

Differential Revision:
  https://reviews.llvm.org/D14313
2023-02-02 18:57:26 -08:00
Nicolai Hähnle
10cef708a7 AMDGPU: Clean up LDS-related occupancy calculations
Occupancy is expressed as waves per SIMD. This means that we need to
take into account the number of SIMDs per "CU" or, to be more precise,
the number of SIMDs over which a workgroup may be distributed.

getOccupancyWithLocalMemSize was wrong because it didn't take SIMDs
into account at all.

At the same time, we need to take into account that WGP mode offers
access to a larger total amount of LDS, since this can affect how
non-power-of-two LDS allocations are rounded. To make this work
consistently, we distinguish between (available) local memory size and
addressable local memory size (which is always limited by 64kB on
gfx10+, even with WGP mode).

This change results in a massive amount of test churn. A lot of it is
caused by the fact that the default work group size is 1024, which means
that (due to rounding effects) the default occupancy on older hardware
is 8 instead of 10, which affects scheduling via register pressure
estimates. I've adjusted most tests by just running the UTC tools, but
in some cases I manually changed the work group size to 32 or 64 to make
sure that work group size chunkiness has no effect.

Differential Revision: https://reviews.llvm.org/D139468
2023-01-23 21:43:06 +01:00
Nicolai Hähnle
84610a82a1 AMDGPU: Add AMDGPUSubtarget::getEUsPerCU()
We will use this for more accurate occupancy computations. Note that
IsaInfo takes WGP mode vs. CU mode into account on gfx10+.

Differential Revision: https://reviews.llvm.org/D139467
2023-01-23 21:43:05 +01:00
Matt Arsenault
c16a58b36c Attributes: Add function getter to parse integer string attributes
The most common case for string attributes parses them as integers. We
don't have a convenient way to do this, and as a result we have
inconsistent missing attribute and invalid attribute handling
scattered around. We also have inconsistent radix usage to
getAsInteger; some places use the default 0 and others use base 10.

Update a few of the uses, but there are quite a lot of these.
2022-12-14 13:12:35 -05:00
Jay Foad
6443c0ee02 [AMDGPU] Stop using make_pair and make_tuple. NFC.
C++17 allows us to call constructors pair and tuple instead of helper
functions make_pair and make_tuple.

Differential Revision: https://reviews.llvm.org/D139828
2022-12-14 13:22:26 +00:00
Valery Pykhtin
d09d834bb9 [AMDGPU] Fix GCNSubtarget::getMinNumVGPRs, add unit test to check consistency between GCNSubtarget's getMinNumVGPRs, getMaxNumVGPRs and getOccupancyWithNumVGPRs.
```
  /// \returns Minimum number of VGPRs that meets given number of waves per
  /// execution unit requirement supported by the subtarget.
  unsigned getMinNumVGPRs(unsigned WavesPerEU) const;

  /// \returns Maximum number of VGPRs that meets given number of waves per
  /// execution unit requirement supported by the subtarget.
  unsigned getMaxNumVGPRs(unsigned WavesPerEU) const;

  /// Return the maximum number of waves per SIMD for kernels using \p VGPRs
  /// VGPRs
  unsigned getOccupancyWithNumVGPRs(unsigned VGPRs) const;
```

While working on RP tracking issues I noticed that getMinNumVGPRs return incorrect
values: the problem is large VGPR granule sizes on GFX10+ architectures. Some of the
occupancies aren't reachable because require the same amount of VGPR granules as others.
For example 19 waves occupancy on gfx1010 require the same amount of granules as 20 waves
so the resultng occupancy would be 20.

SGPRs have the same issue and even have inconsistency between getMaxNumSGPRs and getOccupancyWithNumSGPRs.
It will be addressed in the next patch.

Legend:
  # MinVGPR and MaxVGPR are values returned by getMinNumVGPRs and getMaxNumVGPRs for a given Occ.
  # (ONumber) is the value returned by getOccupancyWithNumVGPRs for a given MinVGPR or MaxVGPR.
  # R means range problem: MinVGPR should be less than MaxVGPR and both should refer to the same occupancy.

Unit test output without the fix:
```
./build/unittests/Target/AMDGPU/AMDGPUTests --gtest_filter=AMDGPU.TestVGPRLimitsPerOccupancy --print-cpu-reg-limits

 gfx90a gfx940:
Occ    MinVGPR        MaxVGPR
  8        0 (O8)     64  (O8)
  7       65 (O7)     72  (O7)
  6       73 (O6)     80  (O6)
  5       81 (O5)     96  (O5)
  4       97 (O4)     128 (O4)
  3      129 (O3)     168 (O3)
  2      169 (O2)     256 (O2)
  1      257 (O1)     512 (O1)

 gfx600 gfx600 gfx601 gfx601 gfx601 gfx602 gfx602 gfx602 gfx700 gfx700 gfx701 gfx701 gfx702 gfx703 gfx703 gfx703 gfx704 gfx704 gfx705 gfx801 gfx801 gfx802 gfx802 gfx802 gfx803 gfx803 gfx803 gfx803 gfx805 gfx805 gfx810 gfx810 gfx900 gfx902 gfx904 gfx906 gfx908 gfx909 gfx90c:
Occ    MinVGPR        MaxVGPR
 10        0 (O10)    24  (O10)
  9       25 (O9)     28  (O9)
  8       29 (O8)     32  (O8)
  7       33 (O7)     36  (O7)
  6       37 (O6)     40  (O6)
  5       41 (O5)     48  (O5)
  4       49 (O4)     64  (O4)
  3       65 (O3)     84  (O3)
  2       85 (O2)     128 (O2)
  1      129 (O1)     256 (O1)

 gfx1030w64 gfx1031w64 gfx1032w64 gfx1033w64 gfx1034w64 gfx1035w64 gfx1036w64 gfx1102w64 gfx1103w64:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    32  (O16)
 15       33 (O12) R  32  (O16)
 14       33 (O12) R  32  (O16)
 13       33 (O12) R  32  (O16)
 12       33 (O12)    40  (O12)
 11       41 (O10) R  40  (O12)
 10       41 (O10)    48  (O10)
  9       49 (O9)     56  (O9)
  8       57 (O8)     64  (O8)
  7       65 (O7)     72  (O7)
  6       73 (O6)     80  (O6)
  5       81 (O5)     96  (O5)
  4       97 (O4)     128 (O4)
  3      129 (O3)     168 (O3)
  2      169 (O2)     256 (O2)
  1      256 (O2) R   256 (O2)

 gfx1100w64 gfx1101w64:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    48  (O16)
 15       49 (O12) R  48  (O16)
 14       49 (O12) R  48  (O16)
 13       49 (O12) R  48  (O16)
 12       49 (O12)    60  (O12)
 11       61 (O10) R  60  (O12)
 10       61 (O10)    72  (O10)
  9       73 (O9)     84  (O9)
  8       85 (O8)     96  (O8)
  7       97 (O7)     108 (O7)
  6      109 (O6)     120 (O6)
  5      121 (O5)     144 (O5)
  4      145 (O4)     192 (O4)
  3      193 (O3)     252 (O3)
  2      253 (O2)     256 (O2)
  1      256 (O2) R   256 (O2)

 gfx1030w32 gfx1031w32 gfx1032w32 gfx1033w32 gfx1034w32 gfx1035w32 gfx1036w32 gfx1102w32 gfx1103w32:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    64  (O16)
 15       65 (O12) R  64  (O16)
 14       65 (O12) R  64  (O16)
 13       65 (O12) R  64  (O16)
 12       65 (O12)    80  (O12)
 11       81 (O10) R  80  (O12)
 10       81 (O10)    96  (O10)
  9       97 (O9)     112 (O9)
  8      113 (O8)     128 (O8)
  7      129 (O7)     144 (O7)
  6      145 (O6)     160 (O6)
  5      161 (O5)     192 (O5)
  4      193 (O4)     256 (O4)
  3      256 (O4) R   256 (O4)
  2      256 (O4) R   256 (O4)
  1      256 (O4) R   256 (O4)

 gfx1100w32 gfx1101w32:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    96  (O16)
 15       97 (O12) R  96  (O16)
 14       97 (O12) R  96  (O16)
 13       97 (O12) R  96  (O16)
 12       97 (O12)    120 (O12)
 11      121 (O10) R  120 (O12)
 10      121 (O10)    144 (O10)
  9      145 (O9)     168 (O9)
  8      169 (O8)     192 (O8)
  7      193 (O7)     216 (O7)
  6      217 (O6)     240 (O6)
  5      241 (O5)     256 (O5)
  4      256 (O5) R   256 (O5)
  3      256 (O5) R   256 (O5)
  2      256 (O5) R   256 (O5)
  1      256 (O5) R   256 (O5)

 gfx1010w64 gfx1011w64 gfx1012w64 gfx1013w64:
Occ    MinVGPR        MaxVGPR
 20        0 (O20)    24  (O20)
 19       25 (O18) R  24  (O20)
 18       25 (O18)    28  (O18)
 17       29 (O16) R  28  (O18)
 16       29 (O16)    32  (O16)
 15       33 (O14) R  32  (O16)
 14       33 (O14)    36  (O14)
 13       37 (O12) R  36  (O14)
 12       37 (O12)    40  (O12)
 11       41 (O11)    44  (O11)
 10       45 (O10)    48  (O10)
  9       49 (O9)     56  (O9)
  8       57 (O8)     64  (O8)
  7       65 (O7)     72  (O7)
  6       73 (O6)     84  (O6)
  5       85 (O5)     100 (O5)
  4      101 (O4)     128 (O4)
  3      129 (O3)     168 (O3)
  2      169 (O2)     256 (O2)
  1      256 (O2) R   256 (O2)

 gfx1010w32 gfx1011w32 gfx1012w32 gfx1013w32:
Occ    MinVGPR        MaxVGPR
 20        0 (O20)    48  (O20)
 19       49 (O18) R  48  (O20)
 18       49 (O18)    56  (O18)
 17       57 (O16) R  56  (O18)
 16       57 (O16)    64  (O16)
 15       65 (O14) R  64  (O16)
 14       65 (O14)    72  (O14)
 13       73 (O12) R  72  (O14)
 12       73 (O12)    80  (O12)
 11       81 (O11)    88  (O11)
 10       89 (O10)    96  (O10)
  9       97 (O9)     112 (O9)
  8      113 (O8)     128 (O8)
  7      129 (O7)     144 (O7)
  6      145 (O6)     168 (O6)
  5      169 (O5)     200 (O5)
  4      201 (O4)     256 (O4)
  3      256 (O4) R   256 (O4)
  2      256 (O4) R   256 (O4)
  1      256 (O4) R   256 (O4)
```

After the fix:
```
 gfx90a gfx940:
Occ    MinVGPR        MaxVGPR
  8        0 (O8)     64  (O8)
  7       65 (O7)     72  (O7)
  6       73 (O6)     80  (O6)
  5       81 (O5)     96  (O5)
  4       97 (O4)     128 (O4)
  3      129 (O3)     168 (O3)
  2      169 (O2)     256 (O2)
  1      257 (O1)     512 (O1)

 gfx600 gfx600 gfx601 gfx601 gfx601 gfx602 gfx602 gfx602 gfx700 gfx700 gfx701 gfx701 gfx702 gfx703 gfx703 gfx703 gfx704 gfx704 gfx705 gfx801 gfx801 gfx802 gfx802 gfx802 gfx803 gfx803 gfx803 gfx803 gfx805 gfx805 gfx810 gfx810 gfx900 gfx902 gfx904 gfx906 gfx908 gfx909 gfx90c:
Occ    MinVGPR        MaxVGPR
 10        0 (O10)    24  (O10)
  9       25 (O9)     28  (O9)
  8       29 (O8)     32  (O8)
  7       33 (O7)     36  (O7)
  6       37 (O6)     40  (O6)
  5       41 (O5)     48  (O5)
  4       49 (O4)     64  (O4)
  3       65 (O3)     84  (O3)
  2       85 (O2)     128 (O2)
  1      129 (O1)     256 (O1)

 gfx1030w64 gfx1031w64 gfx1032w64 gfx1033w64 gfx1034w64 gfx1035w64 gfx1036w64 gfx1102w64 gfx1103w64:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    32  (O16)
 15        0 (O16)    32  (O16)
 14        0 (O16)    32  (O16)
 13        0 (O16)    32  (O16)
 12       33 (O12)    40  (O12)
 11       33 (O12)    40  (O12)
 10       41 (O10)    48  (O10)
  9       49 (O9)     56  (O9)
  8       57 (O8)     64  (O8)
  7       65 (O7)     72  (O7)
  6       73 (O6)     80  (O6)
  5       81 (O5)     96  (O5)
  4       97 (O4)     128 (O4)
  3      129 (O3)     168 (O3)
  2      169 (O2)     256 (O2)
  1      169 (O2)     256 (O2)

 gfx1100w64 gfx1101w64:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    48  (O16)
 15        0 (O16)    48  (O16)
 14        0 (O16)    48  (O16)
 13        0 (O16)    48  (O16)
 12       49 (O12)    60  (O12)
 11       49 (O12)    60  (O12)
 10       61 (O10)    72  (O10)
  9       73 (O9)     84  (O9)
  8       85 (O8)     96  (O8)
  7       97 (O7)     108 (O7)
  6      109 (O6)     120 (O6)
  5      121 (O5)     144 (O5)
  4      145 (O4)     192 (O4)
  3      193 (O3)     252 (O3)
  2      253 (O2)     256 (O2)
  1      253 (O2)     256 (O2)

 gfx1030w32 gfx1031w32 gfx1032w32 gfx1033w32 gfx1034w32 gfx1035w32 gfx1036w32 gfx1102w32 gfx1103w32:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    64  (O16)
 15        0 (O16)    64  (O16)
 14        0 (O16)    64  (O16)
 13        0 (O16)    64  (O16)
 12       65 (O12)    80  (O12)
 11       65 (O12)    80  (O12)
 10       81 (O10)    96  (O10)
  9       97 (O9)     112 (O9)
  8      113 (O8)     128 (O8)
  7      129 (O7)     144 (O7)
  6      145 (O6)     160 (O6)
  5      161 (O5)     192 (O5)
  4      193 (O4)     256 (O4)
  3      193 (O4)     256 (O4)
  2      193 (O4)     256 (O4)
  1      193 (O4)     256 (O4)

 gfx1100w32 gfx1101w32:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    96  (O16)
 15        0 (O16)    96  (O16)
 14        0 (O16)    96  (O16)
 13        0 (O16)    96  (O16)
 12       97 (O12)    120 (O12)
 11       97 (O12)    120 (O12)
 10      121 (O10)    144 (O10)
  9      145 (O9)     168 (O9)
  8      169 (O8)     192 (O8)
  7      193 (O7)     216 (O7)
  6      217 (O6)     240 (O6)
  5      241 (O5)     256 (O5)
  4      241 (O5)     256 (O5)
  3      241 (O5)     256 (O5)
  2      241 (O5)     256 (O5)
  1      241 (O5)     256 (O5)

 gfx1010w64 gfx1011w64 gfx1012w64 gfx1013w64:
Occ    MinVGPR        MaxVGPR
 20        0 (O20)    24  (O20)
 19        0 (O20)    24  (O20)
 18       25 (O18)    28  (O18)
 17       25 (O18)    28  (O18)
 16       29 (O16)    32  (O16)
 15       29 (O16)    32  (O16)
 14       33 (O14)    36  (O14)
 13       33 (O14)    36  (O14)
 12       37 (O12)    40  (O12)
 11       41 (O11)    44  (O11)
 10       45 (O10)    48  (O10)
  9       49 (O9)     56  (O9)
  8       57 (O8)     64  (O8)
  7       65 (O7)     72  (O7)
  6       73 (O6)     84  (O6)
  5       85 (O5)     100 (O5)
  4      101 (O4)     128 (O4)
  3      129 (O3)     168 (O3)
  2      169 (O2)     256 (O2)
  1      169 (O2)     256 (O2)

 gfx1010w32 gfx1011w32 gfx1012w32 gfx1013w32:
Occ    MinVGPR        MaxVGPR
 20        0 (O20)    48  (O20)
 19        0 (O20)    48  (O20)
 18       49 (O18)    56  (O18)
 17       49 (O18)    56  (O18)
 16       57 (O16)    64  (O16)
 15       57 (O16)    64  (O16)
 14       65 (O14)    72  (O14)
 13       65 (O14)    72  (O14)
 12       73 (O12)    80  (O12)
 11       81 (O11)    88  (O11)
 10       89 (O10)    96  (O10)
  9       97 (O9)     112 (O9)
  8      113 (O8)     128 (O8)
  7      129 (O7)     144 (O7)
  6      145 (O6)     168 (O6)
  5      169 (O5)     200 (O5)
  4      201 (O4)     256 (O4)
  3      201 (O4)     256 (O4)
  2      201 (O4)     256 (O4)
  1      201 (O4)     256 (O4)
```

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D138443
2022-12-06 09:14:49 +01:00
Kazu Hirata
20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:06 -08:00
Carl Ritson
266b5dbc5d [AMDGPU] Add MIMG NSA threshold configuration attribute
Make MIMG NSA minimum addresses threshold an attribute that can
be set on a function or configured via command line.
This enables frontend tuning which allows increased NSA usage
where beneficial.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D134780
2022-09-28 20:03:18 +09:00
Fangrui Song
de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Jon Chesterfield
3a20597776 [amdgpu] Implement lds kernel id intrinsic
Implement an intrinsic for use lowering LDS variables to different
addresses from different kernels. This will allow kernels that cannot
reach an LDS variable to avoid wasting space for it.

There are a number of implicit arguments accessed by intrinsic already
so this implementation closely follows the existing handling. It is slightly
novel in that this SGPR is written by the kernel prologue.

It is necessary in the general case to put variables at different addresses
such that they can be compactly allocated and thus necessary for an
indirect function call to have some means of determining where a
given variable was allocated. Claiming an arbitrary SGPR into which
an integer can be written by the kernel, in this implementation based
on metadata associated with that kernel, which is then passed on to
indirect call sites is sufficient to determine the variable address.

The intent is to emit a __const array of LDS addresses and index into it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D125060
2022-07-19 17:46:19 +01:00
jeff
8a12f20ef7 [AMDGPU] Update the mechanism used to check for cycles and add eges in power-sched mutation 2022-07-14 16:24:13 -07:00
Austin Kerbow
6817031d0b [AMDGPU] Disable FillMFMAShadowMutation by default
Disable amdgpu mfma power sched.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D129172
2022-07-07 09:34:45 -07:00
Guillaume Chatelet
f1255186c7 [NFC][Alignment] Remove max functions between Align and MaybeAlign
`llvm::max(Align, MaybeAlign)` and `llvm::max(MaybeAlign, Align)` are
not used often enough to be required. They also make the code more opaque.

Differential Revision: https://reviews.llvm.org/D128121
2022-06-20 08:37:48 +00:00
Joe Nash
3732cd59be [AMDGPU] gfx11 vop3 and inherited vop instructions
This patch includes MC layer support for VOP3 encoded instructions and generic VOP support
classes.
Some VOP1 and VOP2 instructions which share an encoding with gfx10 and are using
the AssemblerPredicate = isGFX10Plus are also enabled. That predicate
will be changed to isGFX10Only in a later patch.

Patch 15/N for upstreaming of AMDGPU gfx11 architecture.

Depends on D126468

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D126475
2022-06-02 14:03:02 -04:00
Jay Foad
8a53b25ed5 [AMDGPU] Use default member initializers in Subtarget classes
Use default member initializers in AMDGPUSubtarget and subclasses. This
is to guard against adding a new feature boolean in AMDGPUSubtarget.h
but forgetting to initialize it to false in AMDGPUSubtarget.cpp.

This was mostly autogenerated by:
clang-tidy -checks=-*,cppcoreguidelines-prefer-member-initializer,modernize-use-default-member-init -header-filter=Subtarget -fix lib/Target/AMDGPU/*Subtarget.cpp

Differential Revision: https://reviews.llvm.org/D123613
2022-04-12 16:42:30 +01:00
Changpeng Fang
6733590db2 AMDGPU: Set implicit kernarg size to be of 256 bytes for code object version 5
Summary:
  If implicitarg_ptr intrinsic is not used, set implicit kernarg size to 0, otherwise
set it to 256 bytes for code object version 5 (and beyond).

Reviewers: arsenm

Differential Revision: https://reviews.llvm.org/D123262
2022-04-07 08:35:23 -07:00
Austin Kerbow
0c0636f782 [AMDGPU] Fix uninitialized value after 8d0c34fd4f 2022-03-07 11:32:01 -08:00
Stanislav Mekhanoshin
35ec58d8c0 [AMDGPU] gfx940 removes all image instructions
Differential Revision: https://reviews.llvm.org/D120763
2022-03-02 13:55:26 -08:00
Stanislav Mekhanoshin
2e2e64df4a [AMDGPU] Add gfx940 target
This is target definition only.

Differential Revision: https://reviews.llvm.org/D120688
2022-03-02 13:54:48 -08:00
Sebastian Neubauer
a5d4f82b73 [AMDGPU] Make enable-flat-scratch a subtarget feature
Use a subtarget feature instead of a command line argument to reduce
global state.
We want to enable flat scratch for graphics in some cases and this
doesn't work well with command line options.

Differential Revision: https://reviews.llvm.org/D119425
2022-02-11 18:23:07 +01:00
Stanislav Mekhanoshin
4e077c0a0b [AMDGPU] Remove feature register-banking
Since RegBankReassign pass was removed this feature is not
use for anything.

Differential Revision: https://reviews.llvm.org/D118195
2022-01-26 08:39:17 -08:00
Matt Arsenault
0b1140e883 AMDGPU: Correct getMaxNumSGPR treatment of flat_scratch
This was approximating the entry point logic for flat_scratch_init,
which is not really the point. We need to account for whether we need
to reserve the SGPR pair used for flat_scratch, not whether we needed
the initialization kernel argument. If this was an arbitrary function,
we would end up over-reporting the number of potentially free
SGPRs. The logic for architected flat scratch also only applies to the
initialization in the kernel, not the reserved registers at the end.

Avoids compile failures in a future patch from allocating more SGPRs
than the subtarget supports.
2022-01-17 10:04:42 -05:00