llvm-project

Author	SHA1	Message	Date
Stanislav Mekhanoshin	cfe9a134bb	[AMDGPU] Rename 64BitDPP feature and fix the checks Names '64BitDPP' and especially 'DPP64' were found misleading, and DPP64 can easily be confused with DPP16 and DPP8 although these are different concepts. DPP16 and DPP8 refer to lanes, whereas DPP64 refers to the operand size. In fact the essential part here is that these instructions are executed on the DP ALU, so rename the feature accordingly. I have also found a bug in a check for these instructions, which is fixed here and a common utility function is now used. Differential Revision: https://reviews.llvm.org/D158465	2023-08-22 11:00:10 -07:00
Diana Picus	26dc284498	[AMDGPU] ISel for amdgpu_cs_chain[_preserve] functions Lower formal arguments and returns for functions with the `amdgpu_cs_chain` and `amdgpu_cs_chain_preserve` calling conventions: * Put `inreg` arguments into SGPRs, starting at s0, and other arguments into VGPRs, starting at v8. No arguments should end up on the stack, if we don't have enough registers we should error out. * Lower the return (which is always void) as an S_ENDPGM. * Set the ScratchRSrc register to s48:51, as described in the docs. * Set the SP to s32, matching amdgpu_gfx. This might be revisited in a future patch. Differential Revision: https://reviews.llvm.org/D153517	2023-08-21 11:16:17 +02:00
Mirko Brkusanin	de82fde22d	AMDGPU/Uniformity/GlobalISel: G_AMDGPU atomics are always divergent Patch by: Acim Maravic Differential Revision: https://reviews.llvm.org/D157091	2023-08-18 18:23:40 +02:00
Jay Foad	e61ca23289	[AMDGPU] Add and use SIInstrFlags::GWS. NFC. This reduces the number of places where we have to check for a list of DS_GWS_* opcodes. Differential Revision: https://reviews.llvm.org/D157099	2023-08-07 12:05:14 +01:00
Reid Kleckner	f86c81b2a8	[AMDGPU] Avoid CodeGen dependencies from AMDGPU/Utils and MCTargetDesc This required two substantial changes: 1. Moving a `getRegBitWidth(TargetRegisterClass)` overload out of Utils and into CodeGen 2. Passing the string function name to AMDGPUPALMetadata instead of the MachineFunction Other changes are minor or updates to accommodate the first two. See issue #64166 for more information on the layering issue. Differential Revision: https://reviews.llvm.org/D156486	2023-07-27 15:19:24 -07:00
Jon Chesterfield	a222951148	[amdgpu][nfc] Use unsigned for getIntegerPairAttribute to match the only call sites	2023-07-15 20:42:13 +01:00
Stephen Thomas	8aedad0fa0	[AMDGPU] Add functions for composing and decomposing S_WAIT_DEPCTR operands Add functions AMDGPU::DepCtr::encodeField() and AMDGPU::DepCtr::decodeField() for each of vm_vsrc, va_vdst and sa_sdst. These are now used in AMDGPUInsertDelayAlu and GCNHazardRecognizer so as to make working with S_WAITCNT_DEPCTR operands easier and more readable. Differential Revision: https://reviews.llvm.org/D154424	2023-07-04 11:02:12 +01:00
Stanislav Mekhanoshin	e2903abc15	[AMDGPU] Remove integer division in VOPD checks There is no way any compiler can simplify this division, while the check is done rather often. Differential Revision: https://reviews.llvm.org/D152613	2023-06-12 15:01:53 -07:00
Ivan Kosarev	4e312abdfd	[AMDGPU][NFC] Add a getRegBitWidth() helper for TargetRegisterClass operands. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D152257	2023-06-07 11:41:11 +01:00
Jay Foad	dcb834843e	[AMDGPU] Split SIModeRegisterDefaults out of AMDGPUBaseInfo. NFC. This is only used by CodeGen. Moving it out of AMDGPUBaseInfo simplifies future changes to make some of it depend on the subtarget. Differential Revision: https://reviews.llvm.org/D144650	2023-02-23 16:38:15 +00:00
Mirko Brkusanin	b3dc0e69cf	[AMDGPU][MC][GFX11] Add Partial NSA format for image sample instructions Image sample instructions that need more than 5 VGPRs for VAddr can use partial NSA for NSA encoding format. VGPRs that can not fit into the encoding are sequential after the last one. This patch adds assembly and disassembly parts. Differential Revision: https://reviews.llvm.org/D144033	2023-02-23 13:33:34 +01:00
Jay Foad	c9f4df57ca	[AMDGPU] Move splitMUBUFOffset from AMDGPUBaseInfo to SIInstrInfo Moving this out of AMDGPUBaseInfo enforces that AMDGPUBaseInfo should not be calling into GCNSubtarget. Differential Revision: https://reviews.llvm.org/D144564	2023-02-22 16:19:05 +00:00
Fangrui Song	432caca39a	Simplify with hasFeature. NFC	2023-02-17 18:22:24 -08:00
Kazu Hirata	64dad4ba9a	Use llvm::bit_cast (NFC)	2023-02-14 01:22:12 -08:00
Janek van Oirschot	e3515ba381	Reapply "[AMDGPU] Modify adjustInliningThreshold to also consider the cost of passing function arguments through the stack" Reapplies 142c28ffa1323e9a8d53200a22c80d5d778e0d0f as part of D140242 which got reverted due to amdgpu openmp test failures. This diff fixes said failures by eliding most of `adjustInliningThresholdUsingCallee` for indirect calls as the callee function is unavailable for indirect calls. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D143498	2023-02-13 12:17:43 +00:00
Changpeng Fang	7ca3444fba	AMDGPU: Use module flag to get code object version at IR level follow-up Summary: This is part of the leftover work for https://reviews.llvm.org/D143138. In this work, we pass code object version as an argument to initialize target ID and use it for targetID dump. Reviewers: arsenm Differential Revision https://reviews.llvm.org/D143293	2023-02-10 11:16:38 -08:00
Archibald Elliott	8e3d7cf5de	[NFC][TargetParser] Remove llvm/Support/TargetParser.h	2023-02-07 11:08:21 +00:00
Changpeng Fang	54cf69c9d5	AMDGPU: Use module flag to get code object version at IR level Summary: This patch introduces a mechanism to check the code object version from the module flag. This avoids checking from the command line. In case the module flag is missing, we use the current default code object version supported in the compiler. For tools whose inputs are not IR, we may need another approach (a directive, for example) to check the code object version. That will be in a separate patch later. For the LIT tests update, we directly add the module flag if there is only a single code object version associated with all checks in one file. In case of multiple code object versions in one file, we use the "sed" method to "clone" the checks to achieve the goal. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D14313	2023-02-02 18:57:26 -08:00
Yashwant Singh	422d379de2	[AMDGPU] Use tablegen to list uniform intrinsics Right now we do opcode-wise matching to identify uniform/non-divergent AMDGPU intrinsics. This is duplicated in two places: once in the IR-level uniformity analysis and once at the MIR level. Move them to a single tablegen table for consistency and add an API wrapper to access them. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D142961	2023-01-31 17:44:40 +05:30
Nicolai Hähnle	10cef708a7	AMDGPU: Clean up LDS-related occupancy calculations Occupancy is expressed as waves per SIMD. This means that we need to take into account the number of SIMDs per "CU" or, to be more precise, the number of SIMDs over which a workgroup may be distributed. getOccupancyWithLocalMemSize was wrong because it didn't take SIMDs into account at all. At the same time, we need to take into account that WGP mode offers access to a larger total amount of LDS, since this can affect how non-power-of-two LDS allocations are rounded. To make this work consistently, we distinguish between (available) local memory size and addressable local memory size (which is always limited by 64kB on gfx10+, even with WGP mode). This change results in a massive amount of test churn. A lot of it is caused by the fact that the default work group size is 1024, which means that (due to rounding effects) the default occupancy on older hardware is 8 instead of 10, which affects scheduling via register pressure estimates. I've adjusted most tests by just running the UTC tools, but in some cases I manually changed the work group size to 32 or 64 to make sure that work group size chunkiness has no effect. Differential Revision: https://reviews.llvm.org/D139468	2023-01-23 21:43:06 +01:00
Jay Foad	768aed1378	[MC] Make more use of MCInstrDesc::operands. NFC. Change MCInstrDesc::operands to return an ArrayRef so we can easily use it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end. A future patch will remove opInfo_begin and opInfo_end. Also use it instead of raw access to the OpInfo pointer. A future patch will remove this pointer. Differential Revision: https://reviews.llvm.org/D142213	2023-01-23 11:31:41 +00:00
Matt Arsenault	4d4894ab92	Partially reapply "AMDGPU: Invert handling of enqueued block detection" This mostly reverts commit 270e96f435596449002fc89962595497481c8770. Keep the attributor related changes around, but functionally restore the old behavior as a workaround. Device enqueue goes back to not working at -O0 with this version.	2023-01-12 15:02:16 -05:00
Jay Foad	f460c66581	[AMDGPU] Simplify getNumFlatOffsetBits. NFC. Previously we considered this field to be either N-bit unsigned or N+1-bit signed, depending on the instruction. I think it's conceptually simpler to say that the field is always N+1-bit signed, but some instructions do not allow negative values. Differential Revision: https://reviews.llvm.org/D140883	2023-01-12 10:40:36 +00:00
Ivan Kosarev	2d945ef864	[AMDGPU][NFC] Rename GFX10A16 operands. They do not seem to be GFX10-specific anymore. Also renames the corresponding feature. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D141069	2023-01-09 17:18:46 +00:00
Matt Arsenault	270e96f435	Revert "AMDGPU: Invert handling of enqueued block detection" This reverts commit 47288cc977fa31c44cc92b4e65044a5b75c2597e. The runtime is having trouble with this at -O0 when the inputs are always enabled.	2023-01-07 21:48:07 -05:00
Matt Arsenault	47288cc977	AMDGPU: Invert handling of enqueued block detection Invert the sense of the attribute and let the attributor figure this out like everything else. If needed we can have the not-OpenCL languages set amdgpu-no-default-queue and amdgpu-no-completion-action up front so they never have to pay the cost. There are also so many of these now, the offset use API should probably consider all of them at once. Maybe they should merge into one attribute with used fields. Having separate functions for each field in AMDGPUBaseInfo is also not the greatest API (might as well fix this when the patch to get the object version from the module lands).	2023-01-06 21:16:08 -05:00
Matt Arsenault	4463badf46	AMDGPU: Use DenormalMode type in FP mode tracking This simplifies a future patch. The MIR handling should be fixed. We're still printing these in custom MachineFunctionInfo as bools (plus the inverted meaning is hard to follow).	2022-12-21 20:35:48 -05:00
Matt Arsenault	c16a58b36c	Attributes: Add function getter to parse integer string attributes The most common case for string attributes parses them as integers. We don't have a convenient way to do this, and as a result we have inconsistent missing attribute and invalid attribute handling scattered around. We also have inconsistent radix usage to getAsInteger; some places use the default 0 and others use base 10. Update a few of the uses, but there are quite a lot of these.	2022-12-14 13:12:35 -05:00
Fangrui Song	67819a72c6	[CodeGen] llvm::Optional => std::optional	2022-12-13 09:06:36 +00:00
Petar Avramovic	cc6b10d1ee	AMDGPU: Check if operand RC contains register used when printing Disassembler can successfully decode sgpr register when only vgpr registers are valid for the operand (e.g. VReg_* and VISrc_* operands). In InstPrinter, detect when operand register class does not contain register that is being printed. Does not result in an error. Intended use is for disassembler tests. Differential Revision: https://reviews.llvm.org/D139646	2022-12-09 17:55:57 +01:00
Valery Pykhtin	d09d834bb9	[AMDGPU] Fix GCNSubtarget::getMinNumVGPRs, add unit test to check consistency between GCNSubtarget's getMinNumVGPRs, getMaxNumVGPRs and getOccupancyWithNumVGPRs.
```
/// \returns Minimum number of VGPRs that meets given number of waves per
/// execution unit requirement supported by the subtarget.
unsigned getMinNumVGPRs(unsigned WavesPerEU) const;

/// \returns Maximum number of VGPRs that meets given number of waves per
/// execution unit requirement supported by the subtarget.
unsigned getMaxNumVGPRs(unsigned WavesPerEU) const;

/// Return the maximum number of waves per SIMD for kernels using \p VGPRs
/// VGPRs
unsigned getOccupancyWithNumVGPRs(unsigned VGPRs) const;
```
While working on RP tracking issues I noticed that getMinNumVGPRs returns incorrect values: the problem is large VGPR granule sizes on GFX10+ architectures. Some of the occupancies aren't reachable because they require the same number of VGPR granules as others. For example, an occupancy of 19 waves on gfx1010 requires the same number of granules as 20 waves, so the resulting occupancy would be 20. SGPRs have the same issue and even have an inconsistency between getMaxNumSGPRs and getOccupancyWithNumSGPRs. It will be addressed in the next patch.

Legend:
# MinVGPR and MaxVGPR are values returned by getMinNumVGPRs and getMaxNumVGPRs for a given Occ.
# (ONumber) is the value returned by getOccupancyWithNumVGPRs for a given MinVGPR or MaxVGPR.
# R means range problem: MinVGPR should be less than MaxVGPR and both should refer to the same occupancy.

Unit test output without the fix:
```
./build/unittests/Target/AMDGPU/AMDGPUTests --gtest_filter=AMDGPU.TestVGPRLimitsPerOccupancy --print-cpu-reg-limits

gfx90a gfx940:
Occ MinVGPR MaxVGPR
8 0 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 80 (O6)
5 81 (O5) 96 (O5)
4 97 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 257 (O1) 512 (O1)

gfx600 gfx600 gfx601 gfx601 gfx601 gfx602 gfx602 gfx602 gfx700 gfx700 gfx701 gfx701 gfx702 gfx703 gfx703 gfx703 gfx704 gfx704 gfx705 gfx801 gfx801 gfx802 gfx802 gfx802 gfx803 gfx803 gfx803 gfx803 gfx805 gfx805 gfx810 gfx810 gfx900 gfx902 gfx904 gfx906 gfx908 gfx909 gfx90c:
Occ MinVGPR MaxVGPR
10 0 (O10) 24 (O10)
9 25 (O9) 28 (O9)
8 29 (O8) 32 (O8)
7 33 (O7) 36 (O7)
6 37 (O6) 40 (O6)
5 41 (O5) 48 (O5)
4 49 (O4) 64 (O4)
3 65 (O3) 84 (O3)
2 85 (O2) 128 (O2)
1 129 (O1) 256 (O1)

gfx1030w64 gfx1031w64 gfx1032w64 gfx1033w64 gfx1034w64 gfx1035w64 gfx1036w64 gfx1102w64 gfx1103w64:
Occ MinVGPR MaxVGPR
16 0 (O16) 32 (O16)
15 33 (O12) R 32 (O16)
14 33 (O12) R 32 (O16)
13 33 (O12) R 32 (O16)
12 33 (O12) 40 (O12)
11 41 (O10) R 40 (O12)
10 41 (O10) 48 (O10)
9 49 (O9) 56 (O9)
8 57 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 80 (O6)
5 81 (O5) 96 (O5)
4 97 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 256 (O2) R 256 (O2)

gfx1100w64 gfx1101w64:
Occ MinVGPR MaxVGPR
16 0 (O16) 48 (O16)
15 49 (O12) R 48 (O16)
14 49 (O12) R 48 (O16)
13 49 (O12) R 48 (O16)
12 49 (O12) 60 (O12)
11 61 (O10) R 60 (O12)
10 61 (O10) 72 (O10)
9 73 (O9) 84 (O9)
8 85 (O8) 96 (O8)
7 97 (O7) 108 (O7)
6 109 (O6) 120 (O6)
5 121 (O5) 144 (O5)
4 145 (O4) 192 (O4)
3 193 (O3) 252 (O3)
2 253 (O2) 256 (O2)
1 256 (O2) R 256 (O2)

gfx1030w32 gfx1031w32 gfx1032w32 gfx1033w32 gfx1034w32 gfx1035w32 gfx1036w32 gfx1102w32 gfx1103w32:
Occ MinVGPR MaxVGPR
16 0 (O16) 64 (O16)
15 65 (O12) R 64 (O16)
14 65 (O12) R 64 (O16)
13 65 (O12) R 64 (O16)
12 65 (O12) 80 (O12)
11 81 (O10) R 80 (O12)
10 81 (O10) 96 (O10)
9 97 (O9) 112 (O9)
8 113 (O8) 128 (O8)
7 129 (O7) 144 (O7)
6 145 (O6) 160 (O6)
5 161 (O5) 192 (O5)
4 193 (O4) 256 (O4)
3 256 (O4) R 256 (O4)
2 256 (O4) R 256 (O4)
1 256 (O4) R 256 (O4)

gfx1100w32 gfx1101w32:
Occ MinVGPR MaxVGPR
16 0 (O16) 96 (O16)
15 97 (O12) R 96 (O16)
14 97 (O12) R 96 (O16)
13 97 (O12) R 96 (O16)
12 97 (O12) 120 (O12)
11 121 (O10) R 120 (O12)
10 121 (O10) 144 (O10)
9 145 (O9) 168 (O9)
8 169 (O8) 192 (O8)
7 193 (O7) 216 (O7)
6 217 (O6) 240 (O6)
5 241 (O5) 256 (O5)
4 256 (O5) R 256 (O5)
3 256 (O5) R 256 (O5)
2 256 (O5) R 256 (O5)
1 256 (O5) R 256 (O5)

gfx1010w64 gfx1011w64 gfx1012w64 gfx1013w64:
Occ MinVGPR MaxVGPR
20 0 (O20) 24 (O20)
19 25 (O18) R 24 (O20)
18 25 (O18) 28 (O18)
17 29 (O16) R 28 (O18)
16 29 (O16) 32 (O16)
15 33 (O14) R 32 (O16)
14 33 (O14) 36 (O14)
13 37 (O12) R 36 (O14)
12 37 (O12) 40 (O12)
11 41 (O11) 44 (O11)
10 45 (O10) 48 (O10)
9 49 (O9) 56 (O9)
8 57 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 84 (O6)
5 85 (O5) 100 (O5)
4 101 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 256 (O2) R 256 (O2)

gfx1010w32 gfx1011w32 gfx1012w32 gfx1013w32:
Occ MinVGPR MaxVGPR
20 0 (O20) 48 (O20)
19 49 (O18) R 48 (O20)
18 49 (O18) 56 (O18)
17 57 (O16) R 56 (O18)
16 57 (O16) 64 (O16)
15 65 (O14) R 64 (O16)
14 65 (O14) 72 (O14)
13 73 (O12) R 72 (O14)
12 73 (O12) 80 (O12)
11 81 (O11) 88 (O11)
10 89 (O10) 96 (O10)
9 97 (O9) 112 (O9)
8 113 (O8) 128 (O8)
7 129 (O7) 144 (O7)
6 145 (O6) 168 (O6)
5 169 (O5) 200 (O5)
4 201 (O4) 256 (O4)
3 256 (O4) R 256 (O4)
2 256 (O4) R 256 (O4)
1 256 (O4) R 256 (O4)
```
After the fix:
```
gfx90a gfx940:
Occ MinVGPR MaxVGPR
8 0 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 80 (O6)
5 81 (O5) 96 (O5)
4 97 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 257 (O1) 512 (O1)

gfx600 gfx600 gfx601 gfx601 gfx601 gfx602 gfx602 gfx602 gfx700 gfx700 gfx701 gfx701 gfx702 gfx703 gfx703 gfx703 gfx704 gfx704 gfx705 gfx801 gfx801 gfx802 gfx802 gfx802 gfx803 gfx803 gfx803 gfx803 gfx805 gfx805 gfx810 gfx810 gfx900 gfx902 gfx904 gfx906 gfx908 gfx909 gfx90c:
Occ MinVGPR MaxVGPR
10 0 (O10) 24 (O10)
9 25 (O9) 28 (O9)
8 29 (O8) 32 (O8)
7 33 (O7) 36 (O7)
6 37 (O6) 40 (O6)
5 41 (O5) 48 (O5)
4 49 (O4) 64 (O4)
3 65 (O3) 84 (O3)
2 85 (O2) 128 (O2)
1 129 (O1) 256 (O1)

gfx1030w64 gfx1031w64 gfx1032w64 gfx1033w64 gfx1034w64 gfx1035w64 gfx1036w64 gfx1102w64 gfx1103w64:
Occ MinVGPR MaxVGPR
16 0 (O16) 32 (O16)
15 0 (O16) 32 (O16)
14 0 (O16) 32 (O16)
13 0 (O16) 32 (O16)
12 33 (O12) 40 (O12)
11 33 (O12) 40 (O12)
10 41 (O10) 48 (O10)
9 49 (O9) 56 (O9)
8 57 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 80 (O6)
5 81 (O5) 96 (O5)
4 97 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 169 (O2) 256 (O2)

gfx1100w64 gfx1101w64:
Occ MinVGPR MaxVGPR
16 0 (O16) 48 (O16)
15 0 (O16) 48 (O16)
14 0 (O16) 48 (O16)
13 0 (O16) 48 (O16)
12 49 (O12) 60 (O12)
11 49 (O12) 60 (O12)
10 61 (O10) 72 (O10)
9 73 (O9) 84 (O9)
8 85 (O8) 96 (O8)
7 97 (O7) 108 (O7)
6 109 (O6) 120 (O6)
5 121 (O5) 144 (O5)
4 145 (O4) 192 (O4)
3 193 (O3) 252 (O3)
2 253 (O2) 256 (O2)
1 253 (O2) 256 (O2)

gfx1030w32 gfx1031w32 gfx1032w32 gfx1033w32 gfx1034w32 gfx1035w32 gfx1036w32 gfx1102w32 gfx1103w32:
Occ MinVGPR MaxVGPR
16 0 (O16) 64 (O16)
15 0 (O16) 64 (O16)
14 0 (O16) 64 (O16)
13 0 (O16) 64 (O16)
12 65 (O12) 80 (O12)
11 65 (O12) 80 (O12)
10 81 (O10) 96 (O10)
9 97 (O9) 112 (O9)
8 113 (O8) 128 (O8)
7 129 (O7) 144 (O7)
6 145 (O6) 160 (O6)
5 161 (O5) 192 (O5)
4 193 (O4) 256 (O4)
3 193 (O4) 256 (O4)
2 193 (O4) 256 (O4)
1 193 (O4) 256 (O4)

gfx1100w32 gfx1101w32:
Occ MinVGPR MaxVGPR
16 0 (O16) 96 (O16)
15 0 (O16) 96 (O16)
14 0 (O16) 96 (O16)
13 0 (O16) 96 (O16)
12 97 (O12) 120 (O12)
11 97 (O12) 120 (O12)
10 121 (O10) 144 (O10)
9 145 (O9) 168 (O9)
8 169 (O8) 192 (O8)
7 193 (O7) 216 (O7)
6 217 (O6) 240 (O6)
5 241 (O5) 256 (O5)
4 241 (O5) 256 (O5)
3 241 (O5) 256 (O5)
2 241 (O5) 256 (O5)
1 241 (O5) 256 (O5)

gfx1010w64 gfx1011w64 gfx1012w64 gfx1013w64:
Occ MinVGPR MaxVGPR
20 0 (O20) 24 (O20)
19 0 (O20) 24 (O20)
18 25 (O18) 28 (O18)
17 25 (O18) 28 (O18)
16 29 (O16) 32 (O16)
15 29 (O16) 32 (O16)
14 33 (O14) 36 (O14)
13 33 (O14) 36 (O14)
12 37 (O12) 40 (O12)
11 41 (O11) 44 (O11)
10 45 (O10) 48 (O10)
9 49 (O9) 56 (O9)
8 57 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 84 (O6)
5 85 (O5) 100 (O5)
4 101 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 169 (O2) 256 (O2)

gfx1010w32 gfx1011w32 gfx1012w32 gfx1013w32:
Occ MinVGPR MaxVGPR
20 0 (O20) 48 (O20)
19 0 (O20) 48 (O20)
18 49 (O18) 56 (O18)
17 49 (O18) 56 (O18)
16 57 (O16) 64 (O16)
15 57 (O16) 64 (O16)
14 65 (O14) 72 (O14)
13 65 (O14) 72 (O14)
12 73 (O12) 80 (O12)
11 81 (O11) 88 (O11)
10 89 (O10) 96 (O10)
9 97 (O9) 112 (O9)
8 113 (O8) 128 (O8)
7 129 (O7) 144 (O7)
6 145 (O6) 168 (O6)
5 169 (O5) 200 (O5)
4 201 (O4) 256 (O4)
3 201 (O4) 256 (O4)
2 201 (O4) 256 (O4)
1 201 (O4) 256 (O4)
```
Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D138443	2022-12-06 09:14:49 +01:00
Dmitry Preobrazhensky	453eb9eb42	[AMDGPU][MC] Correct handling of mandatory literals Differential Revision: https://reviews.llvm.org/D138661	2022-12-05 16:23:47 +03:00
Kazu Hirata	20cde15415	[Target] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-02 20:36:06 -08:00
Ron Lieberman	ca856fff1c	Revert "enable code-object-version=5" very sorry wrong repo. This reverts commit d882ba7aeac4b496dccd1b10cb58bd691786b691.	2022-11-29 15:21:09 -06:00
Ron Lieberman	d882ba7aea	enable code-object-version=5	2022-11-29 15:11:57 -06:00
Mateja Marjanovic	595a08847a	[AMDGPU] Add support for new LLVM vector types Add VReg, AReg and SReg on AMDGPU for bit widths: 288, 320, 352 and 384. Differential Revision: https://reviews.llvm.org/D138205	2022-11-29 17:02:04 +01:00
Dmitry Preobrazhensky	9b8eb5fa8e	[AMDGPU][MC][GFX11] Correct op_sel handling for permlane*16 Differential Revision: https://reviews.llvm.org/D137969	2022-11-29 18:45:22 +03:00
Dmitry Preobrazhensky	869fc7eabd	[AMDGPU][MC][MI100+] Enable VOP3 variants of dot2c/dot4c/dot8c opcodes Differential Revision: https://reviews.llvm.org/D138494	2022-11-29 17:38:18 +03:00
Kazu Hirata	aad2d272bf	[Utils] Use std::optional in AMDGPUBaseInfo.cpp (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-11-25 22:43:00 -08:00
Dmitry Preobrazhensky	96155bf44b	[AMDGPU][GFX11][NFC] Refactor VOPD operands handling (part 2) Rename interface functions and operands to make code clearer. Differential Revision: https://reviews.llvm.org/D138133	2022-11-18 14:15:05 +03:00
Dmitry Preobrazhensky	e468b1b740	[AMDGPU][GFX11] Refactor VOPD operands handling Differential Revision: https://reviews.llvm.org/D137952	2022-11-16 16:29:12 +03:00
Pierre van Houtryve	7425077e31	[AMDGPU] Add & use `hasNamedOperand`, NFC In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1. This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all cases I could find with regexes and manually. Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D137540	2022-11-08 07:57:21 +00:00
Joe Nash	01b8140d3a	[AMDGPU] Fix delay alu for VOPD with src2acc V_FMAC_F32 and V_DOT2C_F32_F16 have a dummy src2 operand tied to vdst to inform passes that the instructions read the dst operand. The VOPD versions of these instructions lacked the dummy operand, which was a problem for inserting s_delay_alu. Introduce the dummy src2 operand on the VOPD versions, and fix the VOPD operand tracking logic to account for it. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D136629	2022-10-25 13:11:17 -04:00
Dmitry Preobrazhensky	fd7b0eeaf6	[AMDGPU][MC][GFX11] Add VOPD VGPR bank access validation Differential Revision: https://reviews.llvm.org/D134960	2022-10-07 15:52:59 +03:00
Jay Foad	ddfa0f62d8	[AMDGPU] Add GFX11 feature for subtargets with more VGPRs The full complement of physical VGPRs for GFX11 is 50% more than GFX10. Some subtargets have this, others stay the same as GFX10. This affects occupancy calculations. Differential Revision: https://reviews.llvm.org/D134522	2022-09-23 20:18:23 +01:00
Joe Nash	b982ba2a6e	[AMDGPU][GFX11] Use VGPR_32_Lo128 for VOP1,2,C Due to the encoding changes in GFX11, we had a hack in place that disables the use of VGPRs above 128. This patch removes the need for that hack. We introduce a new register class VGPR_32_Lo128 which is used for 16-bit operands of VOP1, VOP2, and VOPC instructions. This register class only has the low 128 VGPRs, but is otherwise identical to VGPR_32. Therefore, 16-bit VOP1, VOP2, and VOPC instructions are correctly limited to use the first 128 VGPRs, while the other instructions can freely use all 256. We introduce new pseudo-instructions used on GFX11 which have the suffix t16 (True 16) to use the VGPR_32_Lo128 register class. Reviewed By: foad, rampitec, #amdgpu Differential Revision: https://reviews.llvm.org/D133723	2022-09-20 09:56:28 -04:00
Dmitry Preobrazhensky	c89e60bf1f	[AMDGPU][MC][GFX11] Add VOPD literals validation Differential Revision: https://reviews.llvm.org/D133864	2022-09-15 16:29:53 +03:00
Dmitry Preobrazhensky	a80116efec	[AMDGPU][MC][GFX11] Add a helper function for identification of VOPD instructions Differential Revision: https://reviews.llvm.org/D133608	2022-09-13 12:41:39 +03:00
Kazu Hirata	7094ab4ee7	[llvm] Modernize bool literals (NFC) Identified with modernize-use-bool-literals.	2022-07-17 18:08:51 -07:00
Joe Nash	d1af09ad96	[AMDGPU] gfx11 Generate VOPD Instructions We form VOPD instructions in the GCNCreateVOPD pass by combining back-to-back component instructions. There are strict register constraints for creating a legal VOPD, namely that the matching operands (e.g. src0x and src0y, src1x and src1y) must be in different register banks. We add a PostRA scheduler mutation to put possible VOPD components back-to-back. Depends on D128442, D128270 Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D128656	2022-07-05 09:18:19 -04:00
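
The granule effect described in commit d09d834bb9 above can be illustrated with a small standalone model. This is only a sketch, not the code from GCNSubtarget: the four constants are assumptions picked so that the results match the gfx1030 wave64 rows in that commit's "After the fix" table, and the function names are hypothetical lower-case stand-ins for getOccupancyWithNumVGPRs, getMaxNumVGPRs and getMinNumVGPRs.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified model. All constants are assumptions chosen to
// reproduce the gfx1030 wave64 rows quoted in the commit message.
constexpr unsigned TotalVGPRs = 512;       // per-SIMD register budget (assumed)
constexpr unsigned AddressableVGPRs = 256; // per-wave cap (assumed)
constexpr unsigned Granule = 8;            // allocation granule (assumed)
constexpr unsigned MaxWaves = 16;          // maximum waves per SIMD (assumed)

constexpr unsigned alignDown(unsigned V, unsigned A) { return V / A * A; }
constexpr unsigned alignUp(unsigned V, unsigned A) { return (V + A - 1) / A * A; }

// Waves per SIMD that fit when each wave uses VGPRs registers. The request
// is rounded up to a whole granule before dividing the budget.
unsigned occupancyWithNumVGPRs(unsigned VGPRs) {
  unsigned Rounded = alignUp(std::max(VGPRs, 1u), Granule);
  return std::min(MaxWaves, TotalVGPRs / Rounded);
}

// Largest per-wave register count that still allows Waves waves.
unsigned maxNumVGPRs(unsigned Waves) {
  assert(Waves >= 1 && Waves <= MaxWaves);
  return std::min(alignDown(TotalVGPRs / Waves, Granule), AddressableVGPRs);
}

// Smallest register count in the same occupancy bucket as maxNumVGPRs(Waves).
// The bucket is identified by the occupancy actually reached, so requests
// that are unreachable because of the granule (e.g. 15 waves) share the
// range of the next reachable occupancy.
unsigned minNumVGPRs(unsigned Waves) {
  unsigned Reached = occupancyWithNumVGPRs(maxNumVGPRs(Waves));
  if (Reached >= MaxWaves)
    return 0;
  return maxNumVGPRs(Reached + 1) + 1;
}
```

With these assumed constants, a request for 15 waves yields a maximum of 32 VGPRs, which already allows 16 waves, so the minimum collapses to 0. A request for 11 waves lands in the 12-wave bucket with the range 33 to 40, the same values the table lists for gfx1030 wave64.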


277 Commits