llvm-project

Author	SHA1	Message	Date
Austin Kerbow	c4d89203f3	[AMDGPU] Support preloading hidden kernel arguments (#98861 ) Adds hidden kernel arguments to the function signature and marks them inreg if they should be preloaded into user SGPRs. The normal kernarg preloading logic then takes over with some additional checks for the correct implicitarg_ptr alignment. Special care is needed so that metadata for the hidden arguments is not added twice when generating the code object.	2024-10-06 17:44:33 -07:00
Jay Foad	8d13e7b8c3	[AMDGPU] Qualify auto. NFC. (#110878 ) Generated automatically with: $ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)	2024-10-03 13:07:54 +01:00
Jay Foad	a6bae5cb37	[AMDGPU] Split GCNSubtarget into its own file. NFC. (#105525 )	2024-08-21 19:11:02 +01:00
Jay Foad	63fae3ed65	[AMDGPU] clang-tidy: no else after return etc. NFC. (#99298 )	2024-07-17 21:11:00 +01:00
Jay Foad	f10a78b7e4	[AMDGPU] clang-tidy: use std::make_unique. NFC.	2024-07-17 07:58:09 +01:00
Jay Foad	fdb669b0dc	[AMDGPU] clang-format: pass Triple by value and std::move it. NFC.	2024-07-16 15:52:58 +01:00
Stanislav Mekhanoshin	b132dd41eb	[AMDGPU] Remove wavefrontsize feature from GFX10+ (#98400 ) Processor definition shall not include a default feature which may be switched off by a different wave size. This allows not to write -mattr=-wavefrontsize32,+wavefrontsize64 in tests.	2024-07-16 01:02:25 -07:00
Craig Topper	c97a8e4bcf	[AMDGPU] Remove unneed static_cast from GCNSubtarget constructor. NFC RegBankInfo is a std::unique_ptr<AMDGPURegisterBankInfo> so we don't need the cast.	2024-07-10 12:54:17 -07:00
Nikita Popov	9df71d7673	[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919 ) Similar to https://github.com/llvm/llvm-project/pull/96902, this adds `getDataLayout()` helpers to Function and GlobalValue, replacing the current `getParent()->getDataLayout()` pattern.	2024-06-28 08:36:49 +02:00
Nicolai Hähnle	7e9b49f6b8	AMDGPU: Add plumbing for private segment size argument (#96445 ) The actual size of scratch/private is determined at dispatch time, so add more plumbing to request it. Will be used in subsequent change.	2024-06-25 16:20:51 +02:00
Andreas Jonson	cc19374afa	[AMDGPU] Swap range metadata to attribute for workitem id. (#94871 ) Swap out range metadata to range attribute for calls to be able to deprecate range metadata on calls in the future.	2024-06-09 10:29:50 -04:00
Janek van Oirschot	d86b68afd7	MCExpr-ify SIProgramInfo (#88257 ) Convert members in SIProgramInfo affected by variables provided by AMDGPUResourceUsageAnalysis into MCExprs.	2024-05-09 13:02:32 +01:00
Jay Foad	6eb9e214b3	RFC: [AMDGPU] Check subtarget features for consistency (#86957 ) Implement GCNSubtarget::checkSubtargetFeatures as a canonical place to check subtarget features for consistency and diagnose any inconsistencies. To start with, the implementation just checks that either wavefrontsize32 or wavefrontsize64 is selected. checkSubtargetFeatures is called at the start of instruction selection. This is pretty arbitrary. It is just a convenient point at which we have access to the subtarget that we're going to use for codegenning a particular function.	2024-05-09 11:37:28 +01:00
David Green	b24af43fdf	[AArch64] Improve scheduling latency into Bundles (#86310 ) By default the scheduling info of instructions into a BUNDLE are given a latency of 0 as they operate on the implicit register of the bundle. This modifies that for AArch64 so that the latency is adjusted to use the latency from the instruction in the bundle instead. This essentially assumes that the bundled instructions are executed in a single cycle, which for AArch64 is probably OK considering they are mostly used for MOVPFX bundles, where this can help create slightly better scheduling especially for in-order cores.	2024-04-12 10:57:01 +01:00
Jay Foad	9c58f3a234	[AMDGPU] Fix implicit $vcc operands after parsing MIR (#87781 ) MIParser checks that implicit operands match the instruction definition, so they have to be $vcc even in wave32 mode. Use the mirFileLoaded hook to fix them after MIParser's checks, converting them to $vcc_lo which is what that rest of CodeGen expects. This is all just extending the fixImplicitOperands hack which was introduced with GFX10, but at least it makes it possible to write a MIR test which creates the same instructions that normal CodeGen would generate.	2024-04-09 09:10:45 +01:00
Jun Wang	c4e517f59c	[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035 ) A new function attribute named amdgpu_num_work_groups is added. This attribute, which consists of three integers, allows programmers to let the compiler know the number of workgroups to be launched in each of the three dimensions and do optimizations based on that information. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2024-03-12 10:30:39 -07:00
Emma Pilkington	bc82cfb38d	[AMDGPU] Add an asm directive to track code_object_version (#76267 ) Named '.amdhsa_code_object_version'. This directive sets the e_ident[ABIVERSION] in the ELF header, and should be used as the assumed COV for the rest of the asm file. This commit also weakens the --amdhsa-code-object-version CL flag. Previously, the CL flag took precedence over the IR flag. Now the IR flag/asm directive take precedence over the CL flag. This is implemented by merging a few COV-checking functions in AMDGPUBaseInfo.h.	2024-01-21 11:54:47 -05:00
Mariusz Sikora	a97028ac51	[AMDGPU] Update VOP instructions for GFX12 (#74853 ) Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>	2023-12-12 11:38:24 +01:00
Mirko Brkušanin	f5868cb6a6	[AMDGPU][MC] Add GFX12 VIMAGE and VSAMPLE encodings (#74062 )	2023-12-04 13:04:42 +01:00
Austin Kerbow	0455596e1e	[AMDGPU] Add DAG ISel support for preloaded kernel arguments This patch adds the DAG isel changes for kernel argument preloading. These changes are not usable with older firmware but subsequent patches in the series will make the codegen backwards compatible. This patch should only be submitted alongside that subsequent patch. Preloading here begins from the start of the kernel arguments until the amount of arguments indicated by the CL flag amdgpu-kernarg-preload-count. Aggregates and arguments passed by-ref are not supported. Special care for the alignment of the kernarg segment is needed as well as consideration of the alignment of addressable SGPR tuples when we cannot directly use misaligned large tuples that the arguments are loaded to. Reviewed By: bcahoon Differential Revision: https://reviews.llvm.org/D158579	2023-09-25 09:32:59 -07:00
Ivan Kosarev	bea56b0bc0	[AMDGPU] Have a subtarget feature to control use of real True16 instructions. Real True16 instructions are as they are defined in the ISA. Fake True16 instructions are identical to real ones except that they take 32-bit registers as operands and always use their low halves. Reviewed By: Joe_Nash Differential Revision: https://reviews.llvm.org/D156100	2023-09-22 10:47:13 +01:00
Simon Pilgrim	47a9cd0343	[AMDGPU] Remove constexpr from getNumUserSGPRForField/getMaxNumPreloadedSGPRs to appease older gcc builds Older versions of gcc wouldn't accept the constexpr getNumUserSGPRForField (introduced in D159439 / 343be5132e2831d85) as it couldn't treat the llvm_unreachable call as constexpr	2023-09-13 12:19:28 +01:00
Austin Kerbow	343be5132e	[AMDGPU] Add utilities to track number of user SGPRs. NFC. Factor out and unify some common code that calculates and tracks the number of user SGRPs. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D159439	2023-09-12 08:52:30 -07:00
Matt Arsenault	53fb907df4	AMDGPU: Special case uniformity info for single lane workgroups Constructors/destructors and OpenMP make use of single lane groups in some cases.	2023-06-28 07:25:48 -04:00
Matt Arsenault	b9c6d9e6c3	AMDGPU: Propagate amdgpu-waves-per-eu with attributor This will do a value range merging down the callgraph, unlike the current pass which can only propagate values to undecorated functions from a kernel. This one is a bit weird due to the interaction with the implied range from amdgpu-flat-workgroup-size. At the default group range of 1,1024, the minimum implied bounds is 4 so this ends up introducing the attribute on undecorated functions. We could probably simplify this by ignoring it and propagating the raw values. The subtarget interaction and the interaction with amdgpu-flat-workgroup-size only really clamp invalid values (plus the lower bound doesn't seem to do anything as far as I can tell anyway).	2023-06-16 15:04:08 -04:00
pvanhout	ecbd37d5a3	[AMDGPU] Port no-hsa-graphic-shaders.ll to code object V4 Split from D146023 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D152432	2023-06-09 09:07:53 +02:00
Matt Arsenault	3d0350b762	AMDGPU: Add MF independent version of getImplicitParameterOffset	2023-06-07 08:26:31 -04:00
Changpeng Fang	7ca3444fba	AMDGPU: Use module flag to get code object version at IR level folow-up Summary: This is part of the leftover work for https://reviews.llvm.org/D143138. In this work, we pass code object version as an argument to initialize target ID and use it for targetID dump. Reviewers: arsenm Differential Revision https://reviews.llvm.org/D143293	2023-02-10 11:16:38 -08:00
Changpeng Fang	54cf69c9d5	AMDGPU: Use module flag to get code object version at IR level Summary: This patch introduces a mechanism to check the code object version from the module flag, This avoids checking from command line. In case the module flag is missing, we use the current default code object version supported in the compiler. For tools whose inputs are not IR, we may need other approach (directive, for example) to check the code object version, That will be in a separate patch later. For LIT tests update, we directly add module flag if there is only a single code object version associated with all checks in one file. In cause of multiple code object version in one file, we use the "sed" method to "clone" the checks to achieve the goal. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D14313	2023-02-02 18:57:26 -08:00
Nicolai Hähnle	10cef708a7	AMDGPU: Clean up LDS-related occupancy calculations Occupancy is expressed as waves per SIMD. This means that we need to take into account the number of SIMDs per "CU" or, to be more precise, the number of SIMDs over which a workgroup may be distributed. getOccupancyWithLocalMemSize was wrong because it didn't take SIMDs into account at all. At the same time, we need to take into account that WGP mode offers access to a larger total amount of LDS, since this can affect how non-power-of-two LDS allocations are rounded. To make this work consistently, we distinguish between (available) local memory size and addressable local memory size (which is always limited by 64kB on gfx10+, even with WGP mode). This change results in a massive amount of test churn. A lot of it is caused by the fact that the default work group size is 1024, which means that (due to rounding effects) the default occupancy on older hardware is 8 instead of 10, which affects scheduling via register pressure estimates. I've adjusted most tests by just running the UTC tools, but in some cases I manually changed the work group size to 32 or 64 to make sure that work group size chunkiness has no effect. Differential Revision: https://reviews.llvm.org/D139468	2023-01-23 21:43:06 +01:00
Nicolai Hähnle	84610a82a1	AMDGPU: Add AMDGPUSubtarget::getEUsPerCU() We will use this for more accurate occupancy computations. Note that IsaInfo takes WGP mode vs. CU mode into account on gfx10+. Differential Revision: https://reviews.llvm.org/D139467	2023-01-23 21:43:05 +01:00
Matt Arsenault	c16a58b36c	Attributes: Add function getter to parse integer string attributes The most common case for string attributes parses them as integers. We don't have a convenient way to do this, and as a result we have inconsistent missing attribute and invalid attribute handling scattered around. We also have inconsistent radix usage to getAsInteger; some places use the default 0 and others use base 10. Update a few of the uses, but there are quite a lot of these.	2022-12-14 13:12:35 -05:00
Jay Foad	6443c0ee02	[AMDGPU] Stop using make_pair and make_tuple. NFC. C++17 allows us to call constructors pair and tuple instead of helper functions make_pair and make_tuple. Differential Revision: https://reviews.llvm.org/D139828	2022-12-14 13:22:26 +00:00
Valery Pykhtin	d09d834bb9	[AMDGPU] Fix GCNSubtarget::getMinNumVGPRs, add unit test to check consistency between GCNSubtarget's getMinNumVGPRs, getMaxNumVGPRs and getOccupancyWithNumVGPRs. ``` /// \returns Minimum number of VGPRs that meets given number of waves per /// execution unit requirement supported by the subtarget. unsigned getMinNumVGPRs(unsigned WavesPerEU) const; /// \returns Maximum number of VGPRs that meets given number of waves per /// execution unit requirement supported by the subtarget. unsigned getMaxNumVGPRs(unsigned WavesPerEU) const; /// Return the maximum number of waves per SIMD for kernels using \p VGPRs /// VGPRs unsigned getOccupancyWithNumVGPRs(unsigned VGPRs) const; ``` While working on RP tracking issues I noticed that getMinNumVGPRs return incorrect values: the problem is large VGPR granule sizes on GFX10+ architectures. Some of the occupancies aren't reachable because require the same amount of VGPR granules as others. For example 19 waves occupancy on gfx1010 require the same amount of granules as 20 waves so the resultng occupancy would be 20. SGPRs have the same issue and even have inconsistency between getMaxNumSGPRs and getOccupancyWithNumSGPRs. It will be addressed in the next patch. Legend: # MinVGPR and MaxVGPR are values returned by getMinNumVGPRs and getMaxNumVGPRs for a given Occ. # (ONumber) is the value returned by getOccupancyWithNumVGPRs for a given MinVGPR or MaxVGPR. # R means range problem: MinVGPR should be less than MaxVGPR and both should refer to the same occupancy. Unit test output without the fix: ``` ./build/unittests/Target/AMDGPU/AMDGPUTests --gtest_filter=AMDGPU.TestVGPRLimitsPerOccupancy --print-cpu-reg-limits gfx90a gfx940: Occ MinVGPR MaxVGPR 8 0 (O8) 64 (O8) 7 65 (O7) 72 (O7) 6 73 (O6) 80 (O6) 5 81 (O5) 96 (O5) 4 97 (O4) 128 (O4) 3 129 (O3) 168 (O3) 2 169 (O2) 256 (O2) 1 257 (O1) 512 (O1) gfx600 gfx600 gfx601 gfx601 gfx601 gfx602 gfx602 gfx602 gfx700 gfx700 gfx701 gfx701 gfx702 gfx703 gfx703 gfx703 gfx704 gfx704 gfx705 gfx801 gfx801 gfx802 gfx802 gfx802 gfx803 gfx803 gfx803 gfx803 gfx805 gfx805 gfx810 gfx810 gfx900 gfx902 gfx904 gfx906 gfx908 gfx909 gfx90c: Occ MinVGPR MaxVGPR 10 0 (O10) 24 (O10) 9 25 (O9) 28 (O9) 8 29 (O8) 32 (O8) 7 33 (O7) 36 (O7) 6 37 (O6) 40 (O6) 5 41 (O5) 48 (O5) 4 49 (O4) 64 (O4) 3 65 (O3) 84 (O3) 2 85 (O2) 128 (O2) 1 129 (O1) 256 (O1) gfx1030w64 gfx1031w64 gfx1032w64 gfx1033w64 gfx1034w64 gfx1035w64 gfx1036w64 gfx1102w64 gfx1103w64: Occ MinVGPR MaxVGPR 16 0 (O16) 32 (O16) 15 33 (O12) R 32 (O16) 14 33 (O12) R 32 (O16) 13 33 (O12) R 32 (O16) 12 33 (O12) 40 (O12) 11 41 (O10) R 40 (O12) 10 41 (O10) 48 (O10) 9 49 (O9) 56 (O9) 8 57 (O8) 64 (O8) 7 65 (O7) 72 (O7) 6 73 (O6) 80 (O6) 5 81 (O5) 96 (O5) 4 97 (O4) 128 (O4) 3 129 (O3) 168 (O3) 2 169 (O2) 256 (O2) 1 256 (O2) R 256 (O2) gfx1100w64 gfx1101w64: Occ MinVGPR MaxVGPR 16 0 (O16) 48 (O16) 15 49 (O12) R 48 (O16) 14 49 (O12) R 48 (O16) 13 49 (O12) R 48 (O16) 12 49 (O12) 60 (O12) 11 61 (O10) R 60 (O12) 10 61 (O10) 72 (O10) 9 73 (O9) 84 (O9) 8 85 (O8) 96 (O8) 7 97 (O7) 108 (O7) 6 109 (O6) 120 (O6) 5 121 (O5) 144 (O5) 4 145 (O4) 192 (O4) 3 193 (O3) 252 (O3) 2 253 (O2) 256 (O2) 1 256 (O2) R 256 (O2) gfx1030w32 gfx1031w32 gfx1032w32 gfx1033w32 gfx1034w32 gfx1035w32 gfx1036w32 gfx1102w32 gfx1103w32: Occ MinVGPR MaxVGPR 16 0 (O16) 64 (O16) 15 65 (O12) R 64 (O16) 14 65 (O12) R 64 (O16) 13 65 (O12) R 64 (O16) 12 65 (O12) 80 (O12) 11 81 (O10) R 80 (O12) 10 81 (O10) 96 (O10) 9 97 (O9) 112 (O9) 8 113 (O8) 128 (O8) 7 129 (O7) 144 (O7) 6 145 (O6) 160 (O6) 5 161 (O5) 192 (O5) 4 193 (O4) 256 (O4) 3 256 (O4) R 256 (O4) 2 256 (O4) R 256 (O4) 1 256 (O4) R 256 (O4) gfx1100w32 gfx1101w32: Occ MinVGPR MaxVGPR 16 0 (O16) 96 (O16) 15 97 (O12) R 96 (O16) 14 97 (O12) R 96 (O16) 13 97 (O12) R 96 (O16) 12 97 (O12) 120 (O12) 11 121 (O10) R 120 (O12) 10 121 (O10) 144 (O10) 9 145 (O9) 168 (O9) 8 169 (O8) 192 (O8) 7 193 (O7) 216 (O7) 6 217 (O6) 240 (O6) 5 241 (O5) 256 (O5) 4 256 (O5) R 256 (O5) 3 256 (O5) R 256 (O5) 2 256 (O5) R 256 (O5) 1 256 (O5) R 256 (O5) gfx1010w64 gfx1011w64 gfx1012w64 gfx1013w64: Occ MinVGPR MaxVGPR 20 0 (O20) 24 (O20) 19 25 (O18) R 24 (O20) 18 25 (O18) 28 (O18) 17 29 (O16) R 28 (O18) 16 29 (O16) 32 (O16) 15 33 (O14) R 32 (O16) 14 33 (O14) 36 (O14) 13 37 (O12) R 36 (O14) 12 37 (O12) 40 (O12) 11 41 (O11) 44 (O11) 10 45 (O10) 48 (O10) 9 49 (O9) 56 (O9) 8 57 (O8) 64 (O8) 7 65 (O7) 72 (O7) 6 73 (O6) 84 (O6) 5 85 (O5) 100 (O5) 4 101 (O4) 128 (O4) 3 129 (O3) 168 (O3) 2 169 (O2) 256 (O2) 1 256 (O2) R 256 (O2) gfx1010w32 gfx1011w32 gfx1012w32 gfx1013w32: Occ MinVGPR MaxVGPR 20 0 (O20) 48 (O20) 19 49 (O18) R 48 (O20) 18 49 (O18) 56 (O18) 17 57 (O16) R 56 (O18) 16 57 (O16) 64 (O16) 15 65 (O14) R 64 (O16) 14 65 (O14) 72 (O14) 13 73 (O12) R 72 (O14) 12 73 (O12) 80 (O12) 11 81 (O11) 88 (O11) 10 89 (O10) 96 (O10) 9 97 (O9) 112 (O9) 8 113 (O8) 128 (O8) 7 129 (O7) 144 (O7) 6 145 (O6) 168 (O6) 5 169 (O5) 200 (O5) 4 201 (O4) 256 (O4) 3 256 (O4) R 256 (O4) 2 256 (O4) R 256 (O4) 1 256 (O4) R 256 (O4) ``` After the fix: ``` gfx90a gfx940: Occ MinVGPR MaxVGPR 8 0 (O8) 64 (O8) 7 65 (O7) 72 (O7) 6 73 (O6) 80 (O6) 5 81 (O5) 96 (O5) 4 97 (O4) 128 (O4) 3 129 (O3) 168 (O3) 2 169 (O2) 256 (O2) 1 257 (O1) 512 (O1) gfx600 gfx600 gfx601 gfx601 gfx601 gfx602 gfx602 gfx602 gfx700 gfx700 gfx701 gfx701 gfx702 gfx703 gfx703 gfx703 gfx704 gfx704 gfx705 gfx801 gfx801 gfx802 gfx802 gfx802 gfx803 gfx803 gfx803 gfx803 gfx805 gfx805 gfx810 gfx810 gfx900 gfx902 gfx904 gfx906 gfx908 gfx909 gfx90c: Occ MinVGPR MaxVGPR 10 0 (O10) 24 (O10) 9 25 (O9) 28 (O9) 8 29 (O8) 32 (O8) 7 33 (O7) 36 (O7) 6 37 (O6) 40 (O6) 5 41 (O5) 48 (O5) 4 49 (O4) 64 (O4) 3 65 (O3) 84 (O3) 2 85 (O2) 128 (O2) 1 129 (O1) 256 (O1) gfx1030w64 gfx1031w64 gfx1032w64 gfx1033w64 gfx1034w64 gfx1035w64 gfx1036w64 gfx1102w64 gfx1103w64: Occ MinVGPR MaxVGPR 16 0 (O16) 32 (O16) 15 0 (O16) 32 (O16) 14 0 (O16) 32 (O16) 13 0 (O16) 32 (O16) 12 33 (O12) 40 (O12) 11 33 (O12) 40 (O12) 10 41 (O10) 48 (O10) 9 49 (O9) 56 (O9) 8 57 (O8) 64 (O8) 7 65 (O7) 72 (O7) 6 73 (O6) 80 (O6) 5 81 (O5) 96 (O5) 4 97 (O4) 128 (O4) 3 129 (O3) 168 (O3) 2 169 (O2) 256 (O2) 1 169 (O2) 256 (O2) gfx1100w64 gfx1101w64: Occ MinVGPR MaxVGPR 16 0 (O16) 48 (O16) 15 0 (O16) 48 (O16) 14 0 (O16) 48 (O16) 13 0 (O16) 48 (O16) 12 49 (O12) 60 (O12) 11 49 (O12) 60 (O12) 10 61 (O10) 72 (O10) 9 73 (O9) 84 (O9) 8 85 (O8) 96 (O8) 7 97 (O7) 108 (O7) 6 109 (O6) 120 (O6) 5 121 (O5) 144 (O5) 4 145 (O4) 192 (O4) 3 193 (O3) 252 (O3) 2 253 (O2) 256 (O2) 1 253 (O2) 256 (O2) gfx1030w32 gfx1031w32 gfx1032w32 gfx1033w32 gfx1034w32 gfx1035w32 gfx1036w32 gfx1102w32 gfx1103w32: Occ MinVGPR MaxVGPR 16 0 (O16) 64 (O16) 15 0 (O16) 64 (O16) 14 0 (O16) 64 (O16) 13 0 (O16) 64 (O16) 12 65 (O12) 80 (O12) 11 65 (O12) 80 (O12) 10 81 (O10) 96 (O10) 9 97 (O9) 112 (O9) 8 113 (O8) 128 (O8) 7 129 (O7) 144 (O7) 6 145 (O6) 160 (O6) 5 161 (O5) 192 (O5) 4 193 (O4) 256 (O4) 3 193 (O4) 256 (O4) 2 193 (O4) 256 (O4) 1 193 (O4) 256 (O4) gfx1100w32 gfx1101w32: Occ MinVGPR MaxVGPR 16 0 (O16) 96 (O16) 15 0 (O16) 96 (O16) 14 0 (O16) 96 (O16) 13 0 (O16) 96 (O16) 12 97 (O12) 120 (O12) 11 97 (O12) 120 (O12) 10 121 (O10) 144 (O10) 9 145 (O9) 168 (O9) 8 169 (O8) 192 (O8) 7 193 (O7) 216 (O7) 6 217 (O6) 240 (O6) 5 241 (O5) 256 (O5) 4 241 (O5) 256 (O5) 3 241 (O5) 256 (O5) 2 241 (O5) 256 (O5) 1 241 (O5) 256 (O5) gfx1010w64 gfx1011w64 gfx1012w64 gfx1013w64: Occ MinVGPR MaxVGPR 20 0 (O20) 24 (O20) 19 0 (O20) 24 (O20) 18 25 (O18) 28 (O18) 17 25 (O18) 28 (O18) 16 29 (O16) 32 (O16) 15 29 (O16) 32 (O16) 14 33 (O14) 36 (O14) 13 33 (O14) 36 (O14) 12 37 (O12) 40 (O12) 11 41 (O11) 44 (O11) 10 45 (O10) 48 (O10) 9 49 (O9) 56 (O9) 8 57 (O8) 64 (O8) 7 65 (O7) 72 (O7) 6 73 (O6) 84 (O6) 5 85 (O5) 100 (O5) 4 101 (O4) 128 (O4) 3 129 (O3) 168 (O3) 2 169 (O2) 256 (O2) 1 169 (O2) 256 (O2) gfx1010w32 gfx1011w32 gfx1012w32 gfx1013w32: Occ MinVGPR MaxVGPR 20 0 (O20) 48 (O20) 19 0 (O20) 48 (O20) 18 49 (O18) 56 (O18) 17 49 (O18) 56 (O18) 16 57 (O16) 64 (O16) 15 57 (O16) 64 (O16) 14 65 (O14) 72 (O14) 13 65 (O14) 72 (O14) 12 73 (O12) 80 (O12) 11 81 (O11) 88 (O11) 10 89 (O10) 96 (O10) 9 97 (O9) 112 (O9) 8 113 (O8) 128 (O8) 7 129 (O7) 144 (O7) 6 145 (O6) 168 (O6) 5 169 (O5) 200 (O5) 4 201 (O4) 256 (O4) 3 201 (O4) 256 (O4) 2 201 (O4) 256 (O4) 1 201 (O4) 256 (O4) ``` Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D138443	2022-12-06 09:14:49 +01:00
Kazu Hirata	20cde15415	[Target] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-02 20:36:06 -08:00
Carl Ritson	266b5dbc5d	[AMDGPU] Add MIMG NSA threshold configuration attribute Make MIMG NSA minimum addresses threshold an attribute that can be set on a function or configured via command line. This enables frontend tuning which allows increased NSA usage where beneficial. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D134780	2022-09-28 20:03:18 +09:00
Fangrui Song	de9d80c1c5	[llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC With C++17 there is no Clang pedantic warning or MSVC C5051.	2022-08-08 11:24:15 -07:00
Jon Chesterfield	3a20597776	[amdgpu] Implement lds kernel id intrinsic Implement an intrinsic for use lowering LDS variables to different addresses from different kernels. This will allow kernels that cannot reach an LDS variable to avoid wasting space for it. There are a number of implicit arguments accessed by intrinsic already so this implementation closely follows the existing handling. It is slightly novel in that this SGPR is written by the kernel prologue. It is necessary in the general case to put variables at different addresses such that they can be compactly allocated and thus necessary for an indirect function call to have some means of determining where a given variable was allocated. Claiming an arbitrary SGPR into which an integer can be written by the kernel, in this implementation based on metadata associated with that kernel, which is then passed on to indirect call sites is sufficient to determine the variable address. The intent is to emit a __const array of LDS addresses and index into it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D125060	2022-07-19 17:46:19 +01:00
jeff	8a12f20ef7	[AMDGPU] Update the mechanism used to check for cycles and add eges in power-sched mutation	2022-07-14 16:24:13 -07:00
Austin Kerbow	6817031d0b	[AMDGPU] Disable FillMFMAShadowMutation by default Disable amdgpu mfma power sched. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D129172	2022-07-07 09:34:45 -07:00
Guillaume Chatelet	f1255186c7	[NFC][Alignment] Remove max functions between Align and MaybeAlign `llvm::max(Align, MaybeAlign)` and `llvm::max(MaybeAlign, Align)` are not used often enough to be required. They also make the code more opaque. Differential Revision: https://reviews.llvm.org/D128121	2022-06-20 08:37:48 +00:00
Joe Nash	3732cd59be	[AMDGPU] gfx11 vop3 and inherited vop instructions This patch includes MC layer support for VOP3 encoded instructions and generic VOP support classes. Some VOP1 and VOP2 instructions which share an encoding with gfx10 and are using the AssemblerPredicate = isGFX10Plus are also enabled. That predicate will be changed to isGFX10Only in a later patch. Patch 15/N for upstreaming of AMDGPU gfx11 architecture. Depends on D126468 Reviewed By: dp Differential Revision: https://reviews.llvm.org/D126475	2022-06-02 14:03:02 -04:00
Jay Foad	8a53b25ed5	[AMDGPU] Use default member initializers in Subtarget classes Use default member initializers in AMDGPUSubtarget and subclasses. This is to guard against adding a new feature boolean in AMDGPUSubtarget.h but forgetting to initialize it to false in AMDGPUSubtarget.cpp. This was mostly autogenerated by: clang-tidy -checks=-,cppcoreguidelines-prefer-member-initializer,modernize-use-default-member-init -header-filter=Subtarget -fix lib/Target/AMDGPU/Subtarget.cpp Differential Revision: https://reviews.llvm.org/D123613	2022-04-12 16:42:30 +01:00
Changpeng Fang	6733590db2	AMDGPU: Set implicit kernarg size to be of 256 bytes for code object version 5 Summary: If implicitarg_ptr intrinsic is not used, set implicit kernarg size to 0, otherwise set it to 256 bytes for code object version 5 (and beyond). Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D123262	2022-04-07 08:35:23 -07:00
Austin Kerbow	0c0636f782	[AMDGPU] Fix uninitialized value after 8d0c34fd4f	2022-03-07 11:32:01 -08:00
Stanislav Mekhanoshin	35ec58d8c0	[AMDGPU] gfx940 removes all image instructions Differential Revision: https://reviews.llvm.org/D120763	2022-03-02 13:55:26 -08:00
Stanislav Mekhanoshin	2e2e64df4a	[AMDGPU] Add gfx940 target This is target definition only. Differential Revision: https://reviews.llvm.org/D120688	2022-03-02 13:54:48 -08:00
Sebastian Neubauer	a5d4f82b73	[AMDGPU] Make enable-flat-scratch a subtarget feature Use a subtarget feature instead of a command line argument to reduce global state. We want to enable flat scratch for graphics in some cases and this doesn't work well with command line options. Differential Revision: https://reviews.llvm.org/D119425	2022-02-11 18:23:07 +01:00
Stanislav Mekhanoshin	4e077c0a0b	[AMDGPU] Remove feature register-banking Since RegBankReassign pass was removed this feature is not use for anything. Differential Revision: https://reviews.llvm.org/D118195	2022-01-26 08:39:17 -08:00
Matt Arsenault	0b1140e883	AMDGPU: Correct getMaxNumSGPR treatment of flat_scratch This was approximating the entry point logic for flat_scratch_init, which is not really the point. We need to account for whether we need to reserve the SGPR pair used for flat_scratch, not whether we needed the initialization kernel argument. If this was an arbitrary function, we would end up over-reporting the number of potentially free SGPRs. The logic for architected flat scratch also only applies to the initialization in the kernel, not the reserved registers at the end. Avoids compile failures in a future patch from allocating more SGPRs than the subtarget supports.	2022-01-17 10:04:42 -05:00

1 2 3 4 5 ...

292 Commits