332 Commits

Author SHA1 Message Date
Ruiling, Song
45e425e355
AMDGPU: Teach isArgPassedInSGPR() about cs_chain* calling convention (#67086)
This cs_chain and cs_chain_preserve use InReg attribute to indicate
argument passed through SGPR.
2023-09-22 22:24:17 +08:00
Pierre van Houtryve
fe2f67e4ba
[AMDGPU] Remove Code Object V2 (#65715)
Code Object V2 has been deprecated for more than a year now. We can
safely remove it from LLVM.

- [clang] Remove support for the `-mcode-object-version=2` option.
- [lld] Remove/refactor tests that were still using COV2
- [llvm] Update AMDGPUUsage.rst
- Code Object V2 docs are left for informational purposes because those
code objects may still be supported by the runtime/loaders for a while.
- [AMDGPU] Remove COV2 emission capabilities.
- [AMDGPU] Remove `MetadataStreamerYamlV2` which was only used by COV2
- [AMDGPU] Update all tests that were still using COV2 - They are either
deleted or ported directly to code object v4 (as v3 is also planned to
be removed soon).
2023-09-21 12:00:45 +02:00
Austin Kerbow
69447d6afe [AMDGPU] Add ASM and MC updates for preloading kernargs
Add assembler directives for preloading kernel arguments that correspond
to new fields in the kernel descriptor for the length and offset of
arguments that will be placed in SGPRs prior to kernel launch. Alignment
of the arguments in SGPRs is equivalent to the kernarg segment when
accessed via the kernarg_segment_ptr. Kernarg SGPRs are allocated
directly after other user SGPRs.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D159459
2023-09-19 15:45:16 -07:00
Saiyedul Islam
466a8149b3
Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)" (#66060)
This reverts commit 0a8d17e79b02a92814a2a788d79df1f54d70ec3e.
2023-09-12 15:13:59 +05:30
Saiyedul Islam
0a8d17e79b
[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410)
Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Reviewed By: arsenm, jhuber6

Github PR: #65410

Differential Revision: https://reviews.llvm.org/D129818
2023-09-12 13:53:31 +05:30
Stanislav Mekhanoshin
cfe9a134bb [AMDGPU] Rename 64BitDPP feature and fix the checks
Names '64BitDPP' and especially 'DPP64' were found misleading, and
DPP64 can easily be mixed with DPP16 and DPP8 while these are
different concepts. DPP16 and DPP8 refers to lanes where DPP64
refers to the operand size.

In fact the essential part here is that these instructions are
executed on the DP ALU, so rename the feature accordingly.

I have also found a bug in a check for these instructions, which is
fixed here and a common utility function is now used.

Differential Revision: https://reviews.llvm.org/D158465
2023-08-22 11:00:10 -07:00
Diana Picus
26dc284498 [AMDGPU] ISel for amdgpu_cs_chain[_preserve] functions
Lower formal arguments and returns for functions with the
`amdgpu_cs_chain` and `amdgpu_cs_chain_preserve` calling conventions:

* Put `inreg` arguments into SGPRs, starting at s0, and other arguments
into VGPRs, starting at v8. No arguments should end up on the stack, if
we don't have enough registers we should error out.

* Lower the return (which is always void) as an S_ENDPGM.

* Set the ScratchRSrc register to s48:51, as described in the docs.

* Set the SP to s32, matching amdgpu_gfx. This might be revisited in a
future patch.

Differential Revision: https://reviews.llvm.org/D153517
2023-08-21 11:16:17 +02:00
Mirko Brkusanin
de82fde22d AMDGPU/Uniformity/GlobalISel: G_AMDGPU atomics are always divergent
Patch by: Acim Maravic

Differential Revision: https://reviews.llvm.org/D157091
2023-08-18 18:23:40 +02:00
Jay Foad
e61ca23289 [AMDGPU] Add and use SIInstrFlags::GWS. NFC.
This reduces the number of places where we have to check for a list of
DS_GWS_* opcodes.

Differential Revision: https://reviews.llvm.org/D157099
2023-08-07 12:05:14 +01:00
Reid Kleckner
f86c81b2a8 [AMDGPU] Avoid CodeGen dependencies from AMDGPU/Utils and MCTargetDesc
This required two substantial changes:
1. Moving a `getRegBitWidth(TargetRegisterClass)` overload out of Utils
   and into CodeGen
2. Passing the string function name to AMDGPUPALMetadata instead of the
   MachineFunction

Other changes are minor or updates to accommodate the first two.

See issue #64166 for more information on the layering issue.

Differential Revision: https://reviews.llvm.org/D156486
2023-07-27 15:19:24 -07:00
Jon Chesterfield
a222951148 [amdgpu][nfc] Use unsigned for getIntegerPairAttribute to match the only call sites 2023-07-15 20:42:13 +01:00
Stephen Thomas
8aedad0fa0 [AMDGPU] Add functions for composing and decomposing S_WAIT_DEPCTR operands
Add functions AMDGPU::DepCtr::encodeField*() and AMDGPU::DepCtr::decodeField*()
for each of vm_vsrc, va_vdst and sa_sdst. These are now used in
AMDGPUInsertDelayAlu and GCNHazardRecognizer so as to make working with
S_WAITCNT_DEPCTR operands easier and more readable.

Differential Revision: https://reviews.llvm.org/D154424
2023-07-04 11:02:12 +01:00
Stanislav Mekhanoshin
e2903abc15 [AMDGPU] Remove integer division in VOPD checks
There is no way any compiler can simplify this division, while
the check is done rather often.

Differential Revision: https://reviews.llvm.org/D152613
2023-06-12 15:01:53 -07:00
Ivan Kosarev
4e312abdfd [AMDGPU][NFC] Add a getRegBitWidth() helper for TargetRegisterClass operands.
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D152257
2023-06-07 11:41:11 +01:00
Jay Foad
dcb834843e [AMDGPU] Split SIModeRegisterDefaults out of AMDGPUBaseInfo. NFC.
This is only used by CodeGen. Moving it out of AMDGPUBaseInfo simplifies
future changes to make some of it depend on the subtarget.

Differential Revision: https://reviews.llvm.org/D144650
2023-02-23 16:38:15 +00:00
Mirko Brkusanin
b3dc0e69cf [AMDGPU][MC][GFX11] Add Partial NSA format for image sample instructions
Image sample instructions that need more than 5 VGPRs for VAddr can use
partial NSA for NSA encoding format. VGPRs that can not fit into the
encoding are sequential after the last one.
This patch adds assembly and disassembly parts.

Differential Revision: https://reviews.llvm.org/D144033
2023-02-23 13:33:34 +01:00
Jay Foad
c9f4df57ca [AMDGPU] Move splitMUBUFOffset from AMDGPUBaseInfo to SIInstrInfo
Moving this out of AMDGPUBaseInfo enforces that AMDGPUBaseInfo should
not be calling into GCNSubtarget.

Differential Revision: https://reviews.llvm.org/D144564
2023-02-22 16:19:05 +00:00
Fangrui Song
432caca39a Simplify with hasFeature. NFC 2023-02-17 18:22:24 -08:00
Kazu Hirata
64dad4ba9a Use llvm::bit_cast (NFC) 2023-02-14 01:22:12 -08:00
Janek van Oirschot
e3515ba381 Reapply "[AMDGPU] Modify adjustInliningThreshold to also consider the cost of passing function arguments through the stack"
Reapplies 142c28ffa1323e9a8d53200a22c80d5d778e0d0f as part of D140242 which got reverted due to amdgpu openmp test failures.

This diff fixes said failures by eliding most of `adjustInliningThresholdUsingCallee` for indirect calls as the callee function is unavailable for indirect calls.

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D143498
2023-02-13 12:17:43 +00:00
Changpeng Fang
7ca3444fba AMDGPU: Use module flag to get code object version at IR level folow-up
Summary:
  This is part of the leftover work for https://reviews.llvm.org/D143138.
In this work, we pass code object version as an argument to initialize target ID
and use it for targetID dump.

Reviewers: arsenm

Differential Revision
  https://reviews.llvm.org/D143293
2023-02-10 11:16:38 -08:00
Archibald Elliott
8e3d7cf5de [NFC][TargetParser] Remove llvm/Support/TargetParser.h 2023-02-07 11:08:21 +00:00
Changpeng Fang
54cf69c9d5 AMDGPU: Use module flag to get code object version at IR level
Summary:
  This patch introduces a mechanism to check the code object version from the module flag, This avoids checking from command line.
In case the module flag is missing, we use the current default code object version supported in the compiler.

For tools whose inputs are not IR, we may need other approach (directive, for example) to check the code
object version, That will be in a separate patch later.

For LIT tests update, we directly add module flag if there is only a single code object version associated with all checks in one file.
In cause of multiple code object version in one file, we use the "sed" method to "clone" the checks to achieve the goal.

Reviewer: arsenm

Differential Revision:
  https://reviews.llvm.org/D14313
2023-02-02 18:57:26 -08:00
Yashwant Singh
422d379de2 [AMDGPU] Use tablegen to list uniform intrinsics
Right now we do opcode wise matching to identify uniform/non-divergent
AMDGPU intrinsics. It is duplicated at 2 places once at IR level uniformity analysis
and at MIR level. Moving them to single tablegen table for consistency and adding
and API rapper to access them.

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D142961
2023-01-31 17:44:40 +05:30
Nicolai Hähnle
10cef708a7 AMDGPU: Clean up LDS-related occupancy calculations
Occupancy is expressed as waves per SIMD. This means that we need to
take into account the number of SIMDs per "CU" or, to be more precise,
the number of SIMDs over which a workgroup may be distributed.

getOccupancyWithLocalMemSize was wrong because it didn't take SIMDs
into account at all.

At the same time, we need to take into account that WGP mode offers
access to a larger total amount of LDS, since this can affect how
non-power-of-two LDS allocations are rounded. To make this work
consistently, we distinguish between (available) local memory size and
addressable local memory size (which is always limited by 64kB on
gfx10+, even with WGP mode).

This change results in a massive amount of test churn. A lot of it is
caused by the fact that the default work group size is 1024, which means
that (due to rounding effects) the default occupancy on older hardware
is 8 instead of 10, which affects scheduling via register pressure
estimates. I've adjusted most tests by just running the UTC tools, but
in some cases I manually changed the work group size to 32 or 64 to make
sure that work group size chunkiness has no effect.

Differential Revision: https://reviews.llvm.org/D139468
2023-01-23 21:43:06 +01:00
Jay Foad
768aed1378 [MC] Make more use of MCInstrDesc::operands. NFC.
Change MCInstrDesc::operands to return an ArrayRef so we can easily use
it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end.
A future patch will remove opInfo_begin and opInfo_end.

Also use it instead of raw access to the OpInfo pointer. A future patch
will remove this pointer.

Differential Revision: https://reviews.llvm.org/D142213
2023-01-23 11:31:41 +00:00
Matt Arsenault
4d4894ab92 Partially reapply "AMDGPU: Invert handling of enqueued block detection"
This mostly reverts commit 270e96f435596449002fc89962595497481c8770.

Keep the attributor related changes around, but functionally restore
the old behavior as a workaround. Device enqueue goes back to not
working at -O0 with this version.
2023-01-12 15:02:16 -05:00
Jay Foad
f460c66581 [AMDGPU] Simplify getNumFlatOffsetBits. NFC.
Previously we considered this field to be either N-bit unsigned or
N+1-bit signed, depending on the instruction. I think it's conceptually
simpler to say that the field is always N+1-bit signed, but some
instructions do not allow negative values.

Differential Revision: https://reviews.llvm.org/D140883
2023-01-12 10:40:36 +00:00
Ivan Kosarev
2d945ef864 [AMDGPU][NFC] Rename GFX10A16 operands.
They do not seem to be GFX10-specific anymore. Also renames the
corresponding feature.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D141069
2023-01-09 17:18:46 +00:00
Matt Arsenault
270e96f435 Revert "AMDGPU: Invert handling of enqueued block detection"
This reverts commit 47288cc977fa31c44cc92b4e65044a5b75c2597e.

The runtime is having trouble with this at -O0 when the inputs are
always enabled.
2023-01-07 21:48:07 -05:00
Matt Arsenault
47288cc977 AMDGPU: Invert handling of enqueued block detection
Invert the sense of the attribute and let the attributor figure this
out like everything else. If needed we can have the not-OpenCL
languages set amdgpu-no-default-queue and amdgpu-no-completion-action
up front so they never have to pay the cost.

There are also so many of these now, the offset use API should
probably consider all of them at once. Maybe they should merge into
one attribute with used fields. Having separate functions for each
field in AMDGPUBaseInfo is also not the greatest API (might as well
fix this when the patch to get the object version from the module
lands).
2023-01-06 21:16:08 -05:00
Matt Arsenault
4463badf46 AMDGPU: Use DenormalMode type in FP mode tracking
This simplies a future patch. The MIR handling should be fixed. We're
still printing these in custom MachineFunctionInfo as bools (plus the
inverted meaning is hard to follow).
2022-12-21 20:35:48 -05:00
Matt Arsenault
c16a58b36c Attributes: Add function getter to parse integer string attributes
The most common case for string attributes parses them as integers. We
don't have a convenient way to do this, and as a result we have
inconsistent missing attribute and invalid attribute handling
scattered around. We also have inconsistent radix usage to
getAsInteger; some places use the default 0 and others use base 10.

Update a few of the uses, but there are quite a lot of these.
2022-12-14 13:12:35 -05:00
Fangrui Song
67819a72c6 [CodeGen] llvm::Optional => std::optional 2022-12-13 09:06:36 +00:00
Petar Avramovic
cc6b10d1ee AMDGPU: Check if operand RC contains register used when printing
Disassembler can successfully decode sgpr register when only vgpr
registers are valid for the operand (e.g. VReg_* and VISrc_* operands).
In InstPrinter, detect when operand register class does not contain
register that is being printed. Does not result in an error.
Intended use is for disassembler tests.

Differential Revision: https://reviews.llvm.org/D139646
2022-12-09 17:55:57 +01:00
Valery Pykhtin
d09d834bb9 [AMDGPU] Fix GCNSubtarget::getMinNumVGPRs, add unit test to check consistency between GCNSubtarget's getMinNumVGPRs, getMaxNumVGPRs and getOccupancyWithNumVGPRs.
```
  /// \returns Minimum number of VGPRs that meets given number of waves per
  /// execution unit requirement supported by the subtarget.
  unsigned getMinNumVGPRs(unsigned WavesPerEU) const;

  /// \returns Maximum number of VGPRs that meets given number of waves per
  /// execution unit requirement supported by the subtarget.
  unsigned getMaxNumVGPRs(unsigned WavesPerEU) const;

  /// Return the maximum number of waves per SIMD for kernels using \p VGPRs
  /// VGPRs
  unsigned getOccupancyWithNumVGPRs(unsigned VGPRs) const;
```

While working on RP tracking issues I noticed that getMinNumVGPRs return incorrect
values: the problem is large VGPR granule sizes on GFX10+ architectures. Some of the
occupancies aren't reachable because require the same amount of VGPR granules as others.
For example 19 waves occupancy on gfx1010 require the same amount of granules as 20 waves
so the resultng occupancy would be 20.

SGPRs have the same issue and even have inconsistency between getMaxNumSGPRs and getOccupancyWithNumSGPRs.
It will be addressed in the next patch.

Legend:
  # MinVGPR and MaxVGPR are values returned by getMinNumVGPRs and getMaxNumVGPRs for a given Occ.
  # (ONumber) is the value returned by getOccupancyWithNumVGPRs for a given MinVGPR or MaxVGPR.
  # R means range problem: MinVGPR should be less than MaxVGPR and both should refer to the same occupancy.

Unit test output without the fix:
```
./build/unittests/Target/AMDGPU/AMDGPUTests --gtest_filter=AMDGPU.TestVGPRLimitsPerOccupancy --print-cpu-reg-limits

 gfx90a gfx940:
Occ    MinVGPR        MaxVGPR
  8        0 (O8)     64  (O8)
  7       65 (O7)     72  (O7)
  6       73 (O6)     80  (O6)
  5       81 (O5)     96  (O5)
  4       97 (O4)     128 (O4)
  3      129 (O3)     168 (O3)
  2      169 (O2)     256 (O2)
  1      257 (O1)     512 (O1)

 gfx600 gfx600 gfx601 gfx601 gfx601 gfx602 gfx602 gfx602 gfx700 gfx700 gfx701 gfx701 gfx702 gfx703 gfx703 gfx703 gfx704 gfx704 gfx705 gfx801 gfx801 gfx802 gfx802 gfx802 gfx803 gfx803 gfx803 gfx803 gfx805 gfx805 gfx810 gfx810 gfx900 gfx902 gfx904 gfx906 gfx908 gfx909 gfx90c:
Occ    MinVGPR        MaxVGPR
 10        0 (O10)    24  (O10)
  9       25 (O9)     28  (O9)
  8       29 (O8)     32  (O8)
  7       33 (O7)     36  (O7)
  6       37 (O6)     40  (O6)
  5       41 (O5)     48  (O5)
  4       49 (O4)     64  (O4)
  3       65 (O3)     84  (O3)
  2       85 (O2)     128 (O2)
  1      129 (O1)     256 (O1)

 gfx1030w64 gfx1031w64 gfx1032w64 gfx1033w64 gfx1034w64 gfx1035w64 gfx1036w64 gfx1102w64 gfx1103w64:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    32  (O16)
 15       33 (O12) R  32  (O16)
 14       33 (O12) R  32  (O16)
 13       33 (O12) R  32  (O16)
 12       33 (O12)    40  (O12)
 11       41 (O10) R  40  (O12)
 10       41 (O10)    48  (O10)
  9       49 (O9)     56  (O9)
  8       57 (O8)     64  (O8)
  7       65 (O7)     72  (O7)
  6       73 (O6)     80  (O6)
  5       81 (O5)     96  (O5)
  4       97 (O4)     128 (O4)
  3      129 (O3)     168 (O3)
  2      169 (O2)     256 (O2)
  1      256 (O2) R   256 (O2)

 gfx1100w64 gfx1101w64:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    48  (O16)
 15       49 (O12) R  48  (O16)
 14       49 (O12) R  48  (O16)
 13       49 (O12) R  48  (O16)
 12       49 (O12)    60  (O12)
 11       61 (O10) R  60  (O12)
 10       61 (O10)    72  (O10)
  9       73 (O9)     84  (O9)
  8       85 (O8)     96  (O8)
  7       97 (O7)     108 (O7)
  6      109 (O6)     120 (O6)
  5      121 (O5)     144 (O5)
  4      145 (O4)     192 (O4)
  3      193 (O3)     252 (O3)
  2      253 (O2)     256 (O2)
  1      256 (O2) R   256 (O2)

 gfx1030w32 gfx1031w32 gfx1032w32 gfx1033w32 gfx1034w32 gfx1035w32 gfx1036w32 gfx1102w32 gfx1103w32:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    64  (O16)
 15       65 (O12) R  64  (O16)
 14       65 (O12) R  64  (O16)
 13       65 (O12) R  64  (O16)
 12       65 (O12)    80  (O12)
 11       81 (O10) R  80  (O12)
 10       81 (O10)    96  (O10)
  9       97 (O9)     112 (O9)
  8      113 (O8)     128 (O8)
  7      129 (O7)     144 (O7)
  6      145 (O6)     160 (O6)
  5      161 (O5)     192 (O5)
  4      193 (O4)     256 (O4)
  3      256 (O4) R   256 (O4)
  2      256 (O4) R   256 (O4)
  1      256 (O4) R   256 (O4)

 gfx1100w32 gfx1101w32:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    96  (O16)
 15       97 (O12) R  96  (O16)
 14       97 (O12) R  96  (O16)
 13       97 (O12) R  96  (O16)
 12       97 (O12)    120 (O12)
 11      121 (O10) R  120 (O12)
 10      121 (O10)    144 (O10)
  9      145 (O9)     168 (O9)
  8      169 (O8)     192 (O8)
  7      193 (O7)     216 (O7)
  6      217 (O6)     240 (O6)
  5      241 (O5)     256 (O5)
  4      256 (O5) R   256 (O5)
  3      256 (O5) R   256 (O5)
  2      256 (O5) R   256 (O5)
  1      256 (O5) R   256 (O5)

 gfx1010w64 gfx1011w64 gfx1012w64 gfx1013w64:
Occ    MinVGPR        MaxVGPR
 20        0 (O20)    24  (O20)
 19       25 (O18) R  24  (O20)
 18       25 (O18)    28  (O18)
 17       29 (O16) R  28  (O18)
 16       29 (O16)    32  (O16)
 15       33 (O14) R  32  (O16)
 14       33 (O14)    36  (O14)
 13       37 (O12) R  36  (O14)
 12       37 (O12)    40  (O12)
 11       41 (O11)    44  (O11)
 10       45 (O10)    48  (O10)
  9       49 (O9)     56  (O9)
  8       57 (O8)     64  (O8)
  7       65 (O7)     72  (O7)
  6       73 (O6)     84  (O6)
  5       85 (O5)     100 (O5)
  4      101 (O4)     128 (O4)
  3      129 (O3)     168 (O3)
  2      169 (O2)     256 (O2)
  1      256 (O2) R   256 (O2)

 gfx1010w32 gfx1011w32 gfx1012w32 gfx1013w32:
Occ    MinVGPR        MaxVGPR
 20        0 (O20)    48  (O20)
 19       49 (O18) R  48  (O20)
 18       49 (O18)    56  (O18)
 17       57 (O16) R  56  (O18)
 16       57 (O16)    64  (O16)
 15       65 (O14) R  64  (O16)
 14       65 (O14)    72  (O14)
 13       73 (O12) R  72  (O14)
 12       73 (O12)    80  (O12)
 11       81 (O11)    88  (O11)
 10       89 (O10)    96  (O10)
  9       97 (O9)     112 (O9)
  8      113 (O8)     128 (O8)
  7      129 (O7)     144 (O7)
  6      145 (O6)     168 (O6)
  5      169 (O5)     200 (O5)
  4      201 (O4)     256 (O4)
  3      256 (O4) R   256 (O4)
  2      256 (O4) R   256 (O4)
  1      256 (O4) R   256 (O4)
```

After the fix:
```
 gfx90a gfx940:
Occ    MinVGPR        MaxVGPR
  8        0 (O8)     64  (O8)
  7       65 (O7)     72  (O7)
  6       73 (O6)     80  (O6)
  5       81 (O5)     96  (O5)
  4       97 (O4)     128 (O4)
  3      129 (O3)     168 (O3)
  2      169 (O2)     256 (O2)
  1      257 (O1)     512 (O1)

 gfx600 gfx600 gfx601 gfx601 gfx601 gfx602 gfx602 gfx602 gfx700 gfx700 gfx701 gfx701 gfx702 gfx703 gfx703 gfx703 gfx704 gfx704 gfx705 gfx801 gfx801 gfx802 gfx802 gfx802 gfx803 gfx803 gfx803 gfx803 gfx805 gfx805 gfx810 gfx810 gfx900 gfx902 gfx904 gfx906 gfx908 gfx909 gfx90c:
Occ    MinVGPR        MaxVGPR
 10        0 (O10)    24  (O10)
  9       25 (O9)     28  (O9)
  8       29 (O8)     32  (O8)
  7       33 (O7)     36  (O7)
  6       37 (O6)     40  (O6)
  5       41 (O5)     48  (O5)
  4       49 (O4)     64  (O4)
  3       65 (O3)     84  (O3)
  2       85 (O2)     128 (O2)
  1      129 (O1)     256 (O1)

 gfx1030w64 gfx1031w64 gfx1032w64 gfx1033w64 gfx1034w64 gfx1035w64 gfx1036w64 gfx1102w64 gfx1103w64:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    32  (O16)
 15        0 (O16)    32  (O16)
 14        0 (O16)    32  (O16)
 13        0 (O16)    32  (O16)
 12       33 (O12)    40  (O12)
 11       33 (O12)    40  (O12)
 10       41 (O10)    48  (O10)
  9       49 (O9)     56  (O9)
  8       57 (O8)     64  (O8)
  7       65 (O7)     72  (O7)
  6       73 (O6)     80  (O6)
  5       81 (O5)     96  (O5)
  4       97 (O4)     128 (O4)
  3      129 (O3)     168 (O3)
  2      169 (O2)     256 (O2)
  1      169 (O2)     256 (O2)

 gfx1100w64 gfx1101w64:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    48  (O16)
 15        0 (O16)    48  (O16)
 14        0 (O16)    48  (O16)
 13        0 (O16)    48  (O16)
 12       49 (O12)    60  (O12)
 11       49 (O12)    60  (O12)
 10       61 (O10)    72  (O10)
  9       73 (O9)     84  (O9)
  8       85 (O8)     96  (O8)
  7       97 (O7)     108 (O7)
  6      109 (O6)     120 (O6)
  5      121 (O5)     144 (O5)
  4      145 (O4)     192 (O4)
  3      193 (O3)     252 (O3)
  2      253 (O2)     256 (O2)
  1      253 (O2)     256 (O2)

 gfx1030w32 gfx1031w32 gfx1032w32 gfx1033w32 gfx1034w32 gfx1035w32 gfx1036w32 gfx1102w32 gfx1103w32:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    64  (O16)
 15        0 (O16)    64  (O16)
 14        0 (O16)    64  (O16)
 13        0 (O16)    64  (O16)
 12       65 (O12)    80  (O12)
 11       65 (O12)    80  (O12)
 10       81 (O10)    96  (O10)
  9       97 (O9)     112 (O9)
  8      113 (O8)     128 (O8)
  7      129 (O7)     144 (O7)
  6      145 (O6)     160 (O6)
  5      161 (O5)     192 (O5)
  4      193 (O4)     256 (O4)
  3      193 (O4)     256 (O4)
  2      193 (O4)     256 (O4)
  1      193 (O4)     256 (O4)

 gfx1100w32 gfx1101w32:
Occ    MinVGPR        MaxVGPR
 16        0 (O16)    96  (O16)
 15        0 (O16)    96  (O16)
 14        0 (O16)    96  (O16)
 13        0 (O16)    96  (O16)
 12       97 (O12)    120 (O12)
 11       97 (O12)    120 (O12)
 10      121 (O10)    144 (O10)
  9      145 (O9)     168 (O9)
  8      169 (O8)     192 (O8)
  7      193 (O7)     216 (O7)
  6      217 (O6)     240 (O6)
  5      241 (O5)     256 (O5)
  4      241 (O5)     256 (O5)
  3      241 (O5)     256 (O5)
  2      241 (O5)     256 (O5)
  1      241 (O5)     256 (O5)

 gfx1010w64 gfx1011w64 gfx1012w64 gfx1013w64:
Occ    MinVGPR        MaxVGPR
 20        0 (O20)    24  (O20)
 19        0 (O20)    24  (O20)
 18       25 (O18)    28  (O18)
 17       25 (O18)    28  (O18)
 16       29 (O16)    32  (O16)
 15       29 (O16)    32  (O16)
 14       33 (O14)    36  (O14)
 13       33 (O14)    36  (O14)
 12       37 (O12)    40  (O12)
 11       41 (O11)    44  (O11)
 10       45 (O10)    48  (O10)
  9       49 (O9)     56  (O9)
  8       57 (O8)     64  (O8)
  7       65 (O7)     72  (O7)
  6       73 (O6)     84  (O6)
  5       85 (O5)     100 (O5)
  4      101 (O4)     128 (O4)
  3      129 (O3)     168 (O3)
  2      169 (O2)     256 (O2)
  1      169 (O2)     256 (O2)

 gfx1010w32 gfx1011w32 gfx1012w32 gfx1013w32:
Occ    MinVGPR        MaxVGPR
 20        0 (O20)    48  (O20)
 19        0 (O20)    48  (O20)
 18       49 (O18)    56  (O18)
 17       49 (O18)    56  (O18)
 16       57 (O16)    64  (O16)
 15       57 (O16)    64  (O16)
 14       65 (O14)    72  (O14)
 13       65 (O14)    72  (O14)
 12       73 (O12)    80  (O12)
 11       81 (O11)    88  (O11)
 10       89 (O10)    96  (O10)
  9       97 (O9)     112 (O9)
  8      113 (O8)     128 (O8)
  7      129 (O7)     144 (O7)
  6      145 (O6)     168 (O6)
  5      169 (O5)     200 (O5)
  4      201 (O4)     256 (O4)
  3      201 (O4)     256 (O4)
  2      201 (O4)     256 (O4)
  1      201 (O4)     256 (O4)
```

Reviewed By: #amdgpu, arsenm

Differential Revision: https://reviews.llvm.org/D138443
2022-12-06 09:14:49 +01:00
Dmitry Preobrazhensky
453eb9eb42 [AMDGPU][MC] Correct handling of mandatory literals
Differential Revision: https://reviews.llvm.org/D138661
2022-12-05 16:23:47 +03:00
Kazu Hirata
20cde15415 [Target] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 20:36:06 -08:00
Ron Lieberman
ca856fff1c Revert "enable code-object-version=5"
very sorry wrong repo.

This reverts commit d882ba7aeac4b496dccd1b10cb58bd691786b691.
2022-11-29 15:21:09 -06:00
Ron Lieberman
d882ba7aea enable code-object-version=5 2022-11-29 15:11:57 -06:00
Mateja Marjanovic
595a08847a [AMDGPU] Add support for new LLVM vector types
Add VReg, AReg and SReg on AMDGPU for bit widths: 288, 320, 352 and 384.

Differential Revision: https://reviews.llvm.org/D138205
2022-11-29 17:02:04 +01:00
Dmitry Preobrazhensky
9b8eb5fa8e [AMDGPU][MC][GFX11] Correct op_sel handling for permlane*16
Differential Revision: https://reviews.llvm.org/D137969
2022-11-29 18:45:22 +03:00
Dmitry Preobrazhensky
869fc7eabd [AMDGPU][MC][MI100+] Enable VOP3 variants of dot2c/dot4c/dot8c opcodes
Differential Revision: https://reviews.llvm.org/D138494
2022-11-29 17:38:18 +03:00
Kazu Hirata
aad2d272bf [Utils] Use std::optional in AMDGPUBaseInfo.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-25 22:43:00 -08:00
Dmitry Preobrazhensky
96155bf44b [AMDGPU][GFX11][NFC] Refactor VOPD operands handling (part 2)
Rename interface functions and operands to make code clearer.

Differential Revision: https://reviews.llvm.org/D138133
2022-11-18 14:15:05 +03:00
Dmitry Preobrazhensky
e468b1b740 [AMDGPU][GFX11] Refactor VOPD operands handling
Differential Revision: https://reviews.llvm.org/D137952
2022-11-16 16:29:12 +03:00
Pierre van Houtryve
7425077e31 [AMDGPU] Add & use hasNamedOperand, NFC
In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1.
This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually.

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D137540
2022-11-08 07:57:21 +00:00
Joe Nash
01b8140d3a [AMDGPU] Fix delay alu for VOPD with src2acc
V_FMAC_F32 and V_DOT2C_F32_F16 have a dummy src2 operand tied to vdst to
inform passes that the instructions read the dst operand. The VOPD
versions of these instructions lacked the dummy operand, which was a
problem for inserting s_delay_alu.
Introduce the dummy src2 operand on the VOPD versions, and fix the VOPD operand
tracking logic to account for it.

Reviewed By: dp

Differential Revision: https://reviews.llvm.org/D136629
2022-10-25 13:11:17 -04:00
Dmitry Preobrazhensky
fd7b0eeaf6 [AMDGPU][MC][GFX11] Add VOPD VGPR bank access validation
Differential Revision: https://reviews.llvm.org/D134960
2022-10-07 15:52:59 +03:00
Jay Foad
ddfa0f62d8 [AMDGPU] Add GFX11 feature for subtargets with more VGPRs
The full complement of physical VGPRs for GFX11 is 50% more than GFX10.
Some subtargets have this, others stay the same as GFX10. This affects
occupancy calculations.

Differential Revision: https://reviews.llvm.org/D134522
2022-09-23 20:18:23 +01:00