llvm-project

Author	SHA1	Message	Date
Jay Foad	9c58f3a234	[AMDGPU] Fix implicit $vcc operands after parsing MIR (#87781) MIParser checks that implicit operands match the instruction definition, so they have to be $vcc even in wave32 mode. Use the mirFileLoaded hook to fix them after MIParser's checks, converting them to $vcc_lo, which is what the rest of CodeGen expects. This is all just extending the fixImplicitOperands hack, which was introduced with GFX10, but at least it makes it possible to write a MIR test which creates the same instructions that normal CodeGen would generate.	2024-04-09 09:10:45 +01:00
Jun Wang	c4e517f59c	[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035) A new function attribute named amdgpu_num_work_groups is added. This attribute, which consists of three integers, allows programmers to let the compiler know the number of workgroups to be launched in each of the three dimensions and do optimizations based on that information. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2024-03-12 10:30:39 -07:00
Emma Pilkington	bc82cfb38d	[AMDGPU] Add an asm directive to track code_object_version (#76267) Named '.amdhsa_code_object_version'. This directive sets the e_ident[ABIVERSION] in the ELF header, and should be used as the assumed COV for the rest of the asm file. This commit also weakens the --amdhsa-code-object-version CL flag. Previously, the CL flag took precedence over the IR flag. Now the IR flag/asm directive take precedence over the CL flag. This is implemented by merging a few COV-checking functions in AMDGPUBaseInfo.h.	2024-01-21 11:54:47 -05:00
Mariusz Sikora	a97028ac51	[AMDGPU] Update VOP instructions for GFX12 (#74853) Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>	2023-12-12 11:38:24 +01:00
Mirko Brkušanin	f5868cb6a6	[AMDGPU][MC] Add GFX12 VIMAGE and VSAMPLE encodings (#74062)	2023-12-04 13:04:42 +01:00
Austin Kerbow	0455596e1e	[AMDGPU] Add DAG ISel support for preloaded kernel arguments This patch adds the DAG isel changes for kernel argument preloading. These changes are not usable with older firmware but subsequent patches in the series will make the codegen backwards compatible. This patch should only be submitted alongside that subsequent patch. Preloading here begins from the start of the kernel arguments until the amount of arguments indicated by the CL flag amdgpu-kernarg-preload-count. Aggregates and arguments passed by-ref are not supported. Special care for the alignment of the kernarg segment is needed as well as consideration of the alignment of addressable SGPR tuples when we cannot directly use misaligned large tuples that the arguments are loaded to. Reviewed By: bcahoon Differential Revision: https://reviews.llvm.org/D158579	2023-09-25 09:32:59 -07:00
Ivan Kosarev	bea56b0bc0	[AMDGPU] Have a subtarget feature to control use of real True16 instructions. Real True16 instructions are as they are defined in the ISA. Fake True16 instructions are identical to real ones except that they take 32-bit registers as operands and always use their low halves. Reviewed By: Joe_Nash Differential Revision: https://reviews.llvm.org/D156100	2023-09-22 10:47:13 +01:00
Simon Pilgrim	47a9cd0343	[AMDGPU] Remove constexpr from getNumUserSGPRForField/getMaxNumPreloadedSGPRs to appease older gcc builds Older versions of gcc wouldn't accept the constexpr getNumUserSGPRForField (introduced in D159439 / 343be5132e2831d85) as it couldn't treat the llvm_unreachable call as constexpr	2023-09-13 12:19:28 +01:00
Austin Kerbow	343be5132e	[AMDGPU] Add utilities to track number of user SGPRs. NFC. Factor out and unify some common code that calculates and tracks the number of user SGPRs. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D159439	2023-09-12 08:52:30 -07:00
Matt Arsenault	53fb907df4	AMDGPU: Special case uniformity info for single lane workgroups Constructors/destructors and OpenMP make use of single lane groups in some cases.	2023-06-28 07:25:48 -04:00
Matt Arsenault	b9c6d9e6c3	AMDGPU: Propagate amdgpu-waves-per-eu with attributor This will do a value range merging down the callgraph, unlike the current pass which can only propagate values to undecorated functions from a kernel. This one is a bit weird due to the interaction with the implied range from amdgpu-flat-workgroup-size. At the default group range of 1,1024, the minimum implied bounds is 4 so this ends up introducing the attribute on undecorated functions. We could probably simplify this by ignoring it and propagating the raw values. The subtarget interaction and the interaction with amdgpu-flat-workgroup-size only really clamp invalid values (plus the lower bound doesn't seem to do anything as far as I can tell anyway).	2023-06-16 15:04:08 -04:00
pvanhout	ecbd37d5a3	[AMDGPU] Port no-hsa-graphic-shaders.ll to code object V4 Split from D146023 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D152432	2023-06-09 09:07:53 +02:00
Matt Arsenault	3d0350b762	AMDGPU: Add MF independent version of getImplicitParameterOffset	2023-06-07 08:26:31 -04:00
Changpeng Fang	7ca3444fba	AMDGPU: Use module flag to get code object version at IR level follow-up Summary: This is part of the leftover work for https://reviews.llvm.org/D143138. In this work, we pass code object version as an argument to initialize target ID and use it for targetID dump. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D143293	2023-02-10 11:16:38 -08:00
Changpeng Fang	54cf69c9d5	AMDGPU: Use module flag to get code object version at IR level Summary: This patch introduces a mechanism to check the code object version from the module flag. This avoids checking from the command line. In case the module flag is missing, we use the current default code object version supported in the compiler. For tools whose inputs are not IR, we may need another approach (a directive, for example) to check the code object version. That will be in a separate patch later. For LIT tests update, we directly add the module flag if there is only a single code object version associated with all checks in one file. In case of multiple code object versions in one file, we use the "sed" method to "clone" the checks to achieve the goal. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D14313	2023-02-02 18:57:26 -08:00
Nicolai Hähnle	10cef708a7	AMDGPU: Clean up LDS-related occupancy calculations Occupancy is expressed as waves per SIMD. This means that we need to take into account the number of SIMDs per "CU" or, to be more precise, the number of SIMDs over which a workgroup may be distributed. getOccupancyWithLocalMemSize was wrong because it didn't take SIMDs into account at all. At the same time, we need to take into account that WGP mode offers access to a larger total amount of LDS, since this can affect how non-power-of-two LDS allocations are rounded. To make this work consistently, we distinguish between (available) local memory size and addressable local memory size (which is always limited by 64kB on gfx10+, even with WGP mode). This change results in a massive amount of test churn. A lot of it is caused by the fact that the default work group size is 1024, which means that (due to rounding effects) the default occupancy on older hardware is 8 instead of 10, which affects scheduling via register pressure estimates. I've adjusted most tests by just running the UTC tools, but in some cases I manually changed the work group size to 32 or 64 to make sure that work group size chunkiness has no effect. Differential Revision: https://reviews.llvm.org/D139468	2023-01-23 21:43:06 +01:00
Nicolai Hähnle	84610a82a1	AMDGPU: Add AMDGPUSubtarget::getEUsPerCU() We will use this for more accurate occupancy computations. Note that IsaInfo takes WGP mode vs. CU mode into account on gfx10+. Differential Revision: https://reviews.llvm.org/D139467	2023-01-23 21:43:05 +01:00
Matt Arsenault	c16a58b36c	Attributes: Add function getter to parse integer string attributes The most common case for string attributes parses them as integers. We don't have a convenient way to do this, and as a result we have inconsistent missing attribute and invalid attribute handling scattered around. We also have inconsistent radix usage to getAsInteger; some places use the default 0 and others use base 10. Update a few of the uses, but there are quite a lot of these.	2022-12-14 13:12:35 -05:00
Jay Foad	6443c0ee02	[AMDGPU] Stop using make_pair and make_tuple. NFC. C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple. Differential Revision: https://reviews.llvm.org/D139828	2022-12-14 13:22:26 +00:00
Valery Pykhtin	d09d834bb9	[AMDGPU] Fix GCNSubtarget::getMinNumVGPRs, add unit test to check consistency between GCNSubtarget's getMinNumVGPRs, getMaxNumVGPRs and getOccupancyWithNumVGPRs.
```
/// \returns Minimum number of VGPRs that meets given number of waves per
/// execution unit requirement supported by the subtarget.
unsigned getMinNumVGPRs(unsigned WavesPerEU) const;

/// \returns Maximum number of VGPRs that meets given number of waves per
/// execution unit requirement supported by the subtarget.
unsigned getMaxNumVGPRs(unsigned WavesPerEU) const;

/// Return the maximum number of waves per SIMD for kernels using \p VGPRs
/// VGPRs
unsigned getOccupancyWithNumVGPRs(unsigned VGPRs) const;
```
While working on RP tracking issues I noticed that getMinNumVGPRs returns incorrect values: the problem is the large VGPR granule sizes on GFX10+ architectures. Some of the occupancies aren't reachable because they require the same number of VGPR granules as others. For example, an occupancy of 19 waves on gfx1010 requires the same number of granules as 20 waves, so the resulting occupancy would be 20. SGPRs have the same issue and even have an inconsistency between getMaxNumSGPRs and getOccupancyWithNumSGPRs. It will be addressed in the next patch.
Legend:
# MinVGPR and MaxVGPR are values returned by getMinNumVGPRs and getMaxNumVGPRs for a given Occ.
# (ONumber) is the value returned by getOccupancyWithNumVGPRs for a given MinVGPR or MaxVGPR.
# R means range problem: MinVGPR should be less than MaxVGPR and both should refer to the same occupancy.
Unit test output without the fix:
```
./build/unittests/Target/AMDGPU/AMDGPUTests --gtest_filter=AMDGPU.TestVGPRLimitsPerOccupancy --print-cpu-reg-limits

gfx90a gfx940:
Occ MinVGPR MaxVGPR
8 0 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 80 (O6)
5 81 (O5) 96 (O5)
4 97 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 257 (O1) 512 (O1)

gfx600 gfx600 gfx601 gfx601 gfx601 gfx602 gfx602 gfx602 gfx700 gfx700 gfx701 gfx701 gfx702 gfx703 gfx703 gfx703 gfx704 gfx704 gfx705 gfx801 gfx801 gfx802 gfx802 gfx802 gfx803 gfx803 gfx803 gfx803 gfx805 gfx805 gfx810 gfx810 gfx900 gfx902 gfx904 gfx906 gfx908 gfx909 gfx90c:
Occ MinVGPR MaxVGPR
10 0 (O10) 24 (O10)
9 25 (O9) 28 (O9)
8 29 (O8) 32 (O8)
7 33 (O7) 36 (O7)
6 37 (O6) 40 (O6)
5 41 (O5) 48 (O5)
4 49 (O4) 64 (O4)
3 65 (O3) 84 (O3)
2 85 (O2) 128 (O2)
1 129 (O1) 256 (O1)

gfx1030w64 gfx1031w64 gfx1032w64 gfx1033w64 gfx1034w64 gfx1035w64 gfx1036w64 gfx1102w64 gfx1103w64:
Occ MinVGPR MaxVGPR
16 0 (O16) 32 (O16)
15 33 (O12) R 32 (O16)
14 33 (O12) R 32 (O16)
13 33 (O12) R 32 (O16)
12 33 (O12) 40 (O12)
11 41 (O10) R 40 (O12)
10 41 (O10) 48 (O10)
9 49 (O9) 56 (O9)
8 57 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 80 (O6)
5 81 (O5) 96 (O5)
4 97 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 256 (O2) R 256 (O2)

gfx1100w64 gfx1101w64:
Occ MinVGPR MaxVGPR
16 0 (O16) 48 (O16)
15 49 (O12) R 48 (O16)
14 49 (O12) R 48 (O16)
13 49 (O12) R 48 (O16)
12 49 (O12) 60 (O12)
11 61 (O10) R 60 (O12)
10 61 (O10) 72 (O10)
9 73 (O9) 84 (O9)
8 85 (O8) 96 (O8)
7 97 (O7) 108 (O7)
6 109 (O6) 120 (O6)
5 121 (O5) 144 (O5)
4 145 (O4) 192 (O4)
3 193 (O3) 252 (O3)
2 253 (O2) 256 (O2)
1 256 (O2) R 256 (O2)

gfx1030w32 gfx1031w32 gfx1032w32 gfx1033w32 gfx1034w32 gfx1035w32 gfx1036w32 gfx1102w32 gfx1103w32:
Occ MinVGPR MaxVGPR
16 0 (O16) 64 (O16)
15 65 (O12) R 64 (O16)
14 65 (O12) R 64 (O16)
13 65 (O12) R 64 (O16)
12 65 (O12) 80 (O12)
11 81 (O10) R 80 (O12)
10 81 (O10) 96 (O10)
9 97 (O9) 112 (O9)
8 113 (O8) 128 (O8)
7 129 (O7) 144 (O7)
6 145 (O6) 160 (O6)
5 161 (O5) 192 (O5)
4 193 (O4) 256 (O4)
3 256 (O4) R 256 (O4)
2 256 (O4) R 256 (O4)
1 256 (O4) R 256 (O4)

gfx1100w32 gfx1101w32:
Occ MinVGPR MaxVGPR
16 0 (O16) 96 (O16)
15 97 (O12) R 96 (O16)
14 97 (O12) R 96 (O16)
13 97 (O12) R 96 (O16)
12 97 (O12) 120 (O12)
11 121 (O10) R 120 (O12)
10 121 (O10) 144 (O10)
9 145 (O9) 168 (O9)
8 169 (O8) 192 (O8)
7 193 (O7) 216 (O7)
6 217 (O6) 240 (O6)
5 241 (O5) 256 (O5)
4 256 (O5) R 256 (O5)
3 256 (O5) R 256 (O5)
2 256 (O5) R 256 (O5)
1 256 (O5) R 256 (O5)

gfx1010w64 gfx1011w64 gfx1012w64 gfx1013w64:
Occ MinVGPR MaxVGPR
20 0 (O20) 24 (O20)
19 25 (O18) R 24 (O20)
18 25 (O18) 28 (O18)
17 29 (O16) R 28 (O18)
16 29 (O16) 32 (O16)
15 33 (O14) R 32 (O16)
14 33 (O14) 36 (O14)
13 37 (O12) R 36 (O14)
12 37 (O12) 40 (O12)
11 41 (O11) 44 (O11)
10 45 (O10) 48 (O10)
9 49 (O9) 56 (O9)
8 57 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 84 (O6)
5 85 (O5) 100 (O5)
4 101 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 256 (O2) R 256 (O2)

gfx1010w32 gfx1011w32 gfx1012w32 gfx1013w32:
Occ MinVGPR MaxVGPR
20 0 (O20) 48 (O20)
19 49 (O18) R 48 (O20)
18 49 (O18) 56 (O18)
17 57 (O16) R 56 (O18)
16 57 (O16) 64 (O16)
15 65 (O14) R 64 (O16)
14 65 (O14) 72 (O14)
13 73 (O12) R 72 (O14)
12 73 (O12) 80 (O12)
11 81 (O11) 88 (O11)
10 89 (O10) 96 (O10)
9 97 (O9) 112 (O9)
8 113 (O8) 128 (O8)
7 129 (O7) 144 (O7)
6 145 (O6) 168 (O6)
5 169 (O5) 200 (O5)
4 201 (O4) 256 (O4)
3 256 (O4) R 256 (O4)
2 256 (O4) R 256 (O4)
1 256 (O4) R 256 (O4)
```
After the fix:
```
gfx90a gfx940:
Occ MinVGPR MaxVGPR
8 0 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 80 (O6)
5 81 (O5) 96 (O5)
4 97 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 257 (O1) 512 (O1)

gfx600 gfx600 gfx601 gfx601 gfx601 gfx602 gfx602 gfx602 gfx700 gfx700 gfx701 gfx701 gfx702 gfx703 gfx703 gfx703 gfx704 gfx704 gfx705 gfx801 gfx801 gfx802 gfx802 gfx802 gfx803 gfx803 gfx803 gfx803 gfx805 gfx805 gfx810 gfx810 gfx900 gfx902 gfx904 gfx906 gfx908 gfx909 gfx90c:
Occ MinVGPR MaxVGPR
10 0 (O10) 24 (O10)
9 25 (O9) 28 (O9)
8 29 (O8) 32 (O8)
7 33 (O7) 36 (O7)
6 37 (O6) 40 (O6)
5 41 (O5) 48 (O5)
4 49 (O4) 64 (O4)
3 65 (O3) 84 (O3)
2 85 (O2) 128 (O2)
1 129 (O1) 256 (O1)

gfx1030w64 gfx1031w64 gfx1032w64 gfx1033w64 gfx1034w64 gfx1035w64 gfx1036w64 gfx1102w64 gfx1103w64:
Occ MinVGPR MaxVGPR
16 0 (O16) 32 (O16)
15 0 (O16) 32 (O16)
14 0 (O16) 32 (O16)
13 0 (O16) 32 (O16)
12 33 (O12) 40 (O12)
11 33 (O12) 40 (O12)
10 41 (O10) 48 (O10)
9 49 (O9) 56 (O9)
8 57 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 80 (O6)
5 81 (O5) 96 (O5)
4 97 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 169 (O2) 256 (O2)

gfx1100w64 gfx1101w64:
Occ MinVGPR MaxVGPR
16 0 (O16) 48 (O16)
15 0 (O16) 48 (O16)
14 0 (O16) 48 (O16)
13 0 (O16) 48 (O16)
12 49 (O12) 60 (O12)
11 49 (O12) 60 (O12)
10 61 (O10) 72 (O10)
9 73 (O9) 84 (O9)
8 85 (O8) 96 (O8)
7 97 (O7) 108 (O7)
6 109 (O6) 120 (O6)
5 121 (O5) 144 (O5)
4 145 (O4) 192 (O4)
3 193 (O3) 252 (O3)
2 253 (O2) 256 (O2)
1 253 (O2) 256 (O2)

gfx1030w32 gfx1031w32 gfx1032w32 gfx1033w32 gfx1034w32 gfx1035w32 gfx1036w32 gfx1102w32 gfx1103w32:
Occ MinVGPR MaxVGPR
16 0 (O16) 64 (O16)
15 0 (O16) 64 (O16)
14 0 (O16) 64 (O16)
13 0 (O16) 64 (O16)
12 65 (O12) 80 (O12)
11 65 (O12) 80 (O12)
10 81 (O10) 96 (O10)
9 97 (O9) 112 (O9)
8 113 (O8) 128 (O8)
7 129 (O7) 144 (O7)
6 145 (O6) 160 (O6)
5 161 (O5) 192 (O5)
4 193 (O4) 256 (O4)
3 193 (O4) 256 (O4)
2 193 (O4) 256 (O4)
1 193 (O4) 256 (O4)

gfx1100w32 gfx1101w32:
Occ MinVGPR MaxVGPR
16 0 (O16) 96 (O16)
15 0 (O16) 96 (O16)
14 0 (O16) 96 (O16)
13 0 (O16) 96 (O16)
12 97 (O12) 120 (O12)
11 97 (O12) 120 (O12)
10 121 (O10) 144 (O10)
9 145 (O9) 168 (O9)
8 169 (O8) 192 (O8)
7 193 (O7) 216 (O7)
6 217 (O6) 240 (O6)
5 241 (O5) 256 (O5)
4 241 (O5) 256 (O5)
3 241 (O5) 256 (O5)
2 241 (O5) 256 (O5)
1 241 (O5) 256 (O5)

gfx1010w64 gfx1011w64 gfx1012w64 gfx1013w64:
Occ MinVGPR MaxVGPR
20 0 (O20) 24 (O20)
19 0 (O20) 24 (O20)
18 25 (O18) 28 (O18)
17 25 (O18) 28 (O18)
16 29 (O16) 32 (O16)
15 29 (O16) 32 (O16)
14 33 (O14) 36 (O14)
13 33 (O14) 36 (O14)
12 37 (O12) 40 (O12)
11 41 (O11) 44 (O11)
10 45 (O10) 48 (O10)
9 49 (O9) 56 (O9)
8 57 (O8) 64 (O8)
7 65 (O7) 72 (O7)
6 73 (O6) 84 (O6)
5 85 (O5) 100 (O5)
4 101 (O4) 128 (O4)
3 129 (O3) 168 (O3)
2 169 (O2) 256 (O2)
1 169 (O2) 256 (O2)

gfx1010w32 gfx1011w32 gfx1012w32 gfx1013w32:
Occ MinVGPR MaxVGPR
20 0 (O20) 48 (O20)
19 0 (O20) 48 (O20)
18 49 (O18) 56 (O18)
17 49 (O18) 56 (O18)
16 57 (O16) 64 (O16)
15 57 (O16) 64 (O16)
14 65 (O14) 72 (O14)
13 65 (O14) 72 (O14)
12 73 (O12) 80 (O12)
11 81 (O11) 88 (O11)
10 89 (O10) 96 (O10)
9 97 (O9) 112 (O9)
8 113 (O8) 128 (O8)
7 129 (O7) 144 (O7)
6 145 (O6) 168 (O6)
5 169 (O5) 200 (O5)
4 201 (O4) 256 (O4)
3 201 (O4) 256 (O4)
2 201 (O4) 256 (O4)
1 201 (O4) 256 (O4)
```
Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D138443	2022-12-06 09:14:49 +01:00
Kazu Hirata	20cde15415	[Target] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-02 20:36:06 -08:00
Carl Ritson	266b5dbc5d	[AMDGPU] Add MIMG NSA threshold configuration attribute Make MIMG NSA minimum addresses threshold an attribute that can be set on a function or configured via command line. This enables frontend tuning which allows increased NSA usage where beneficial. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D134780	2022-09-28 20:03:18 +09:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Jon Chesterfield	3a20597776	[amdgpu] Implement lds kernel id intrinsic Implement an intrinsic for use in lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS variable to avoid wasting space for it. There are a number of implicit arguments accessed by intrinsics already, so this implementation closely follows the existing handling. It is slightly novel in that this SGPR is written by the kernel prologue. It is necessary in the general case to put variables at different addresses such that they can be compactly allocated and thus necessary for an indirect function call to have some means of determining where a given variable was allocated. Claiming an arbitrary SGPR into which an integer can be written by the kernel, in this implementation based on metadata associated with that kernel, which is then passed on to indirect call sites is sufficient to determine the variable address. The intent is to emit a __const array of LDS addresses and index into it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D125060	2022-07-19 17:46:19 +01:00
jeff	8a12f20ef7	[AMDGPU] Update the mechanism used to check for cycles and add edges in power-sched mutation	2022-07-14 16:24:13 -07:00
Austin Kerbow	6817031d0b	[AMDGPU] Disable FillMFMAShadowMutation by default Disable amdgpu mfma power sched. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D129172	2022-07-07 09:34:45 -07:00
Guillaume Chatelet	f1255186c7	[NFC][Alignment] Remove max functions between Align and MaybeAlign `llvm::max(Align, MaybeAlign)` and `llvm::max(MaybeAlign, Align)` are not used often enough to be required. They also make the code more opaque. Differential Revision: https://reviews.llvm.org/D128121	2022-06-20 08:37:48 +00:00
Joe Nash	3732cd59be	[AMDGPU] gfx11 vop3 and inherited vop instructions This patch includes MC layer support for VOP3 encoded instructions and generic VOP support classes. Some VOP1 and VOP2 instructions which share an encoding with gfx10 and are using the AssemblerPredicate = isGFX10Plus are also enabled. That predicate will be changed to isGFX10Only in a later patch. Patch 15/N for upstreaming of AMDGPU gfx11 architecture. Depends on D126468 Reviewed By: dp Differential Revision: https://reviews.llvm.org/D126475	2022-06-02 14:03:02 -04:00
Jay Foad	8a53b25ed5	[AMDGPU] Use default member initializers in Subtarget classes Use default member initializers in AMDGPUSubtarget and subclasses. This is to guard against adding a new feature boolean in AMDGPUSubtarget.h but forgetting to initialize it to false in AMDGPUSubtarget.cpp. This was mostly autogenerated by: clang-tidy -checks=-,cppcoreguidelines-prefer-member-initializer,modernize-use-default-member-init -header-filter=Subtarget -fix lib/Target/AMDGPU/Subtarget.cpp Differential Revision: https://reviews.llvm.org/D123613	2022-04-12 16:42:30 +01:00
Changpeng Fang	6733590db2	AMDGPU: Set implicit kernarg size to be of 256 bytes for code object version 5 Summary: If implicitarg_ptr intrinsic is not used, set implicit kernarg size to 0, otherwise set it to 256 bytes for code object version 5 (and beyond). Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D123262	2022-04-07 08:35:23 -07:00
Austin Kerbow	0c0636f782	[AMDGPU] Fix uninitialized value after 8d0c34fd4f	2022-03-07 11:32:01 -08:00
Stanislav Mekhanoshin	35ec58d8c0	[AMDGPU] gfx940 removes all image instructions Differential Revision: https://reviews.llvm.org/D120763	2022-03-02 13:55:26 -08:00
Stanislav Mekhanoshin	2e2e64df4a	[AMDGPU] Add gfx940 target This is target definition only. Differential Revision: https://reviews.llvm.org/D120688	2022-03-02 13:54:48 -08:00
Sebastian Neubauer	a5d4f82b73	[AMDGPU] Make enable-flat-scratch a subtarget feature Use a subtarget feature instead of a command line argument to reduce global state. We want to enable flat scratch for graphics in some cases and this doesn't work well with command line options. Differential Revision: https://reviews.llvm.org/D119425	2022-02-11 18:23:07 +01:00
Stanislav Mekhanoshin	4e077c0a0b	[AMDGPU] Remove feature register-banking Since RegBankReassign pass was removed this feature is not used for anything. Differential Revision: https://reviews.llvm.org/D118195	2022-01-26 08:39:17 -08:00
Matt Arsenault	0b1140e883	AMDGPU: Correct getMaxNumSGPR treatment of flat_scratch This was approximating the entry point logic for flat_scratch_init, which is not really the point. We need to account for whether we need to reserve the SGPR pair used for flat_scratch, not whether we needed the initialization kernel argument. If this was an arbitrary function, we would end up over-reporting the number of potentially free SGPRs. The logic for architected flat scratch also only applies to the initialization in the kernel, not the reserved registers at the end. Avoids compile failures in a future patch from allocating more SGPRs than the subtarget supports.	2022-01-17 10:04:42 -05:00
Jay Foad	4db7422771	[AMDGPU] Improve zeroesHigh16BitsOfDest for GFX9 legacy opcodes Pseudos like V_MAD_U16 and V_FMA_F16 map down to what GFX9 calls v_mad_legacy_u16 and v_fma_legacy_f16, which are documented to have the same zeroing behaviour as on GFX8. Differential Revision: https://reviews.llvm.org/D115729	2021-12-15 13:14:48 +00:00
Matt Arsenault	2959e082e1	AMDGPU: Assume all amdhsa kernarg passed implicit arguments by default Previously we would require adding an attribute to kernels to enable the inputs passed in the kernarg segment, accessed by llvm.amdgcn.implicitarg.ptr. This violates the principle of being correct by default. Some OpenMP testcases were broken recently since it wasn't correctly setting this attribute, and no known frontends are setting this to anything other than the maximum. Most of the test changes are from load widening of argument loads since there are now more implied dereferenceable bytes.	2021-12-04 10:38:25 -05:00
Matt Arsenault	ae0ba7dedd	AMDGPU: Optimize out implicit kernarg argument allocation if unused We already annotate whether llvm.amdgcn.implicitarg.ptr is known to be unused. Start using it to avoid allocating the implicit arguments if unneeded.	2021-12-04 10:38:25 -05:00
Matt Arsenault	90ff148719	AMDGPU: Account for implicit argument alignment for kernarg segment If a kernel had no formal arguments but did have the implicit arguments, we were reporting a required kernarg alignment of 4. For some reason we require an 8-byte alignment for this, even though there's no real advantage and I don't see where this is documented in the ABI. The code object header code also claims the minimum alignment is 16, which is what I thought you always got at runtime anyway so I don't know why this matters.	2021-11-09 17:48:37 -05:00
Matt Arsenault	8d4b74ac3f	AMDGPU: Don't consider whether amdgpu-flat-work-group-size was set It should be semantically identical if it was set to the same value as the default. Also improve the documentation.	2021-10-22 16:23:50 -04:00
Jay Foad	3f34f75a68	[AMDGPU] Fix latency for implicit vcc_lo operands on GFX10 wave32 As described in the comment, the way we change vcc to vcc_lo in these operands confuses addPhysRegDataDeps into treating them as implicit pseudo operands. Fix this by setting the correct latency from the SchedModel after addPhysRegDataDeps wrongly set it to 0. Differential Revision: https://reviews.llvm.org/D112317	2021-10-22 20:03:29 +01:00
Kazu Hirata	6fe949c4ed	[Target, Transforms] Use StringRef::contains (NFC)	2021-10-22 08:52:33 -07:00
Stanislav Mekhanoshin	082e22f3d7	[AMDGPU] Always reserve flat scratch SGPR for architected flat scratch With architected flat scratch it becomes read-only. We must always reserve an SGPR pair for it even if we do not use scratch at all, since an attempt to write to SGPRs mapped to FLAT_SCRATCH results in a memory violation. This is not needed on GFX10+ with architected flat scratch, though, since special SGPRs do not carve space out of the normal SGPRs. Differential Revision: https://reviews.llvm.org/D110376	2021-09-24 09:46:31 -07:00
Matt Arsenault	ec55dcedce	AMDGPU: Refactor getWavesPerEU to separate flat workgroup size query Add an overload to pass the flat workgroup range in separately. This will allow the attributor to use the assumed value for amdgpu-flat-workgroup-sizes when inferring amdgpu-waves-per-eu.	2021-09-21 22:57:17 -04:00
Jacob Lambert	dc6e8dfdfe	[AMDGPU][NFC] Correct typos in lib/Target/AMDGPU/AMDGPU*.cpp files. Test commit for new contributor.	2021-09-20 14:48:50 -07:00
Joe Nash	3ce1b9631a	[AMDGPU] Switch PostRA sched to MachineSched Use GCNHazardRecognizer in postra sched. Updated tests for the new schedules. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D109536 Change-Id: Ia86ba2ae168f12fb34b4d8efdab491f84d936cde	2021-09-14 15:11:27 -04:00
Daniil Fukalov	48958d02d2	[NFC][AMDGPU] Reduce includes dependencies. 1. Split out some parts of the R600 target into separate modules/headers. 2. Reduced some include lists in headers. 3. Found and fixed an issue where the overrides `GCNTargetMachine::getSubtargetImpl()` and `R600TargetMachine::getSubtargetImpl()` had a different return type than the base class. 4. Minor forward declarations cleanup. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D108596	2021-08-25 12:01:55 +03:00
Carl Ritson	7d4baf25aa	[AMDGPU] Add maximum NSA size limit ISA feature Add maximum NSA size limit as an ISA feature. Use this to reduce NSA usage on GFX10.1 to avoid stability issues with 4 and 5 dwords NSA instructions. Maintain use of longer NSA instructions on GFX10.3. Note: this also contains some minor fixes for GlobalISel which did not work correctly with non-NSA form instructions on GFX10. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D103348	2021-07-23 16:16:06 +09:00
David Stuttard	b8173c3178	[AMDGPU] Stop mulhi from doing 24 bit mul for uniform values Added support to check if architecture supports s_mulhi which is used as part of the decision whether or not to use valu 24 bit mul (if the mulhi gets transformed to a valu op anyway, then may as well use it). This is an extension of the work in D97063 Differential Revision: https://reviews.llvm.org/D103321 Change-Id: I80b1323de640a52623d69ac005a97d06a5d42a14	2021-07-05 10:33:23 +01:00
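
The VGPR granule problem described in d09d834bb9 (D138443) can be reproduced with a small model. The sketch below is a reconstruction, not LLVM's code. The struct and function names are hypothetical. The gfx1010w64 parameters used with it (512 VGPRs per SIMD, a 4-VGPR granule, at most 20 waves per EU, 256 addressable VGPRs) are inferred from that commit's tables rather than read from the subtarget definitions. `minNumVGPRsNaive` reproduces the "without the fix" numbers. `minNumVGPRsFixed` first snaps the requested occupancy to the one actually reached and only then computes the lower bound.

```cpp
#include <algorithm>
#include <cstdio>

// Hypothetical subtarget description (illustrative names, not LLVM's API).
struct VGPRParams {
  unsigned TotalVGPRs;       // VGPRs in one SIMD's register file
  unsigned Granule;          // allocation granule, in VGPRs
  unsigned MaxWaves;         // maximum waves per EU (SIMD)
  unsigned AddressableVGPRs; // maximum VGPRs a single wave can address
};

static unsigned alignDown(unsigned V, unsigned A) { return V / A * A; }
static unsigned alignUp(unsigned V, unsigned A) { return (V + A - 1) / A * A; }

// Waves per EU reachable when each wave uses NumVGPRs registers. The request
// is rounded up to whole granules; 0 is treated as one granule.
unsigned occupancyWithNumVGPRs(const VGPRParams &P, unsigned NumVGPRs) {
  unsigned Allocated = alignUp(std::max(NumVGPRs, 1u), P.Granule);
  return std::min(P.MaxWaves, P.TotalVGPRs / Allocated);
}

// Largest VGPR count that still allows WavesPerEU (>= 1) waves.
unsigned maxNumVGPRs(const VGPRParams &P, unsigned WavesPerEU) {
  return std::min(P.AddressableVGPRs,
                  alignDown(P.TotalVGPRs / WavesPerEU, P.Granule));
}

// Reconstruction of the pre-fix numbers: one more than the maximum for the
// next occupancy. This breaks when WavesPerEU + 1 needs as many granules as
// WavesPerEU, giving MinVGPR > MaxVGPR (the "R" rows).
unsigned minNumVGPRsNaive(const VGPRParams &P, unsigned WavesPerEU) {
  if (WavesPerEU >= P.MaxWaves)
    return 0;
  return std::min(P.AddressableVGPRs, maxNumVGPRs(P, WavesPerEU + 1) + 1);
}

// Fixed variant: snap WavesPerEU to the occupancy that maxNumVGPRs actually
// reaches, then take one more than the maximum of the next occupancy.
unsigned minNumVGPRsFixed(const VGPRParams &P, unsigned WavesPerEU) {
  unsigned Reached = occupancyWithNumVGPRs(P, maxNumVGPRs(P, WavesPerEU));
  if (Reached >= P.MaxWaves)
    return 0;
  return maxNumVGPRs(P, Reached + 1) + 1;
}

// Prints rows in the "Occ MinVGPR MaxVGPR" layout of the unit test output.
void printTable(const VGPRParams &P, bool Fixed) {
  std::printf("Occ MinVGPR MaxVGPR\n");
  for (unsigned W = P.MaxWaves; W >= 1; --W) {
    unsigned Min = Fixed ? minNumVGPRsFixed(P, W) : minNumVGPRsNaive(P, W);
    unsigned Max = maxNumVGPRs(P, W);
    std::printf("%u %u (O%u) %u (O%u)\n", W, Min,
                occupancyWithNumVGPRs(P, Min), Max,
                occupancyWithNumVGPRs(P, Max));
  }
}
```

With `VGPRParams{512, 4, 20, 256}`, `printTable(P, false)` should give the gfx1010w64 numbers from the "without the fix" output, minus the R markers. `printTable(P, true)` should give the "After the fix" numbers. The fixed variant guarantees that MinVGPR never exceeds MaxVGPR and that both map to the same occupancy, which is the consistency the unit test checks.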

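Commit 10cef708a7 (D139468) notes a rounding effect on older hardware. With the default work group size of 1024, the default occupancy is 8 instead of 10. The arithmetic can be illustrated with a simplified model in which only whole workgroups are resident on a CU and their waves are spread over the CU's SIMDs. This is an assumption-laden sketch, not LLVM's `getOccupancyWithLocalMemSize`. The names are hypothetical, and the example parameters stand in for "older hardware": wave64, 4 SIMDs per CU, 10 waves per SIMD, 64 KiB of LDS per CU. Under these assumptions, a 1024-lane workgroup is 16 waves. The CU's 40 wave slots then hold only two such workgroups, so occupancy is 32 / 4 = 8.

```cpp
#include <algorithm>

// Simplified occupancy model (an assumption for illustration only).
struct CUParams {
  unsigned WaveSize;      // lanes per wave
  unsigned EUsPerCU;      // SIMDs a workgroup may be distributed over
  unsigned MaxWavesPerEU; // wave slots per SIMD
  unsigned LDSPerCU;      // bytes of LDS shared by the CU's workgroups
};

static unsigned divideCeil(unsigned N, unsigned D) { return (N + D - 1) / D; }

// Waves per SIMD when only whole workgroups can be resident.
// WGSize must be >= 1; LDSPerWG == 0 means the kernel uses no LDS.
unsigned occupancyForWorkGroup(const CUParams &P, unsigned WGSize,
                               unsigned LDSPerWG) {
  unsigned WavesPerWG = divideCeil(WGSize, P.WaveSize);
  // Whole workgroups that fit in the CU's wave slots.
  unsigned WGsPerCU = P.MaxWavesPerEU * P.EUsPerCU / WavesPerWG;
  // Whole workgroups whose LDS allocations fit in the CU at the same time.
  if (LDSPerWG != 0)
    WGsPerCU = std::min(WGsPerCU, P.LDSPerCU / LDSPerWG);
  // Spread the resident waves over the SIMDs, rounding up.
  unsigned Waves = divideCeil(WGsPerCU * WavesPerWG, P.EUsPerCU);
  return std::min(P.MaxWavesPerEU, Waves);
}
```

With `CUParams{64, 4, 10, 65536}`, a 1024-lane workgroup yields 8, while 64- or 256-lane workgroups without LDS reach the full 10. This is the chunkiness the commit avoids in tests by setting the work group size to 32 or 64.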
278 Commits