llvm-project

Author	SHA1	Message	Date
Mariusz Sikora	966416b9e8	[AMDGPU][GFX12] Add new v_permlane16 variants (#75475 )	2023-12-15 10:14:38 +01:00
Mirko Brkušanin	47615ddc84	[AMDGPU][MC] Add GFX12 VFLAT, VSCRATCH and VGLOBAL encodings (#75193 )	2023-12-14 14:22:04 +01:00
Piotr Sobczak	fac093dd08	[AMDGPU] Update IEEE and DX10_CLAMP for GFX12 (#75030 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-13 13:52:40 +01:00
Mariusz Sikora	a97028ac51	[AMDGPU] Update VOP instructions for GFX12 (#74853 ) Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>	2023-12-12 11:38:24 +01:00
Kazu Hirata	586ecdf205	[llvm] Use StringRef::{starts,ends}_with (NFC) (#74956 ) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-11 21:01:36 -08:00
Mariusz Sikora	19c9f9c0bf	[AMDGPU] GFX12: Add s_prefetch_inst/data instructions (#74448 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-06 10:01:29 +01:00
Mirko Brkušanin	f5868cb6a6	[AMDGPU][MC] Add GFX12 VIMAGE and VSAMPLE encodings (#74062 )	2023-12-04 13:04:42 +01:00
Stanislav Mekhanoshin	87d884b5c8	[AMDGPU] Fix folding of v2i16/v2f16 splat imms (#72709 ) We can use inline constants with packed 16-bit operands, but these should use op_sel. Currently splat of inlinable constants is considered legal, which is not really true if we fail to fold it with op_sel and drop the high half. It may be legal as a literal but not as inline constant, but then usual literal checks must be performed. This patch makes these splat literals illegal but adds additional logic to the operand folding to keep current folds. This logic is somewhat heavy though. This has fixed constant bus violation in the fdot2 test.	2023-11-28 09:07:26 -08:00
Jay Foad	cf1e0c0b07	[AMDGPU] Define new targets gfx1200 and gfx1201 (#73133 ) Define target names and ELF numbers for new GFX12 targets gfx1200 and gfx1201. For now they behave identically to GFX11.	2023-11-23 16:44:05 +00:00
Diana	61332cb047	[AMDGPU] Emit backend_stack_size PAL metadata (#72509 ) For chain functions, PAL uses a `backend_stack_size` metadata item, which at the moment has the same meaning as `stack_frame_size_in_bytes`. We emit both for now in order to simplify coordination with PAL. The new item must be emitted in the `shader_functions` section, just as the metadata for other module entry functions. For simplicity, we mark chain functions as module entry functions and emit the same metadata for all of them.	2023-11-20 10:01:13 +01:00
Jay Foad	cc5c2efe4a	[AMDGPU] Fix and use isSISrcInlinableOperand. NFC. (#72101 )	2023-11-13 11:29:56 +00:00
Jun Wang	54470176af	[AMDGPU] Add inreg support for SGPR arguments (#67182 ) Function parameters marked with inreg are supposed to be allocated to SGPRs. However, for compute functions, this is ignored and function parameters are allocated to VGPRs. This fix modifies CC_AMDGPU_Func in AMDGPUCallingConv.td to use SGPRs if input arg is marked inreg. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2023-11-08 11:35:52 -08:00
Pierre van Houtryve	4428b01faa	Reland: [AMDGPU] Remove Code Object V3 (#67118 ) V3 has been deprecated for a while as well, so it can safely be removed like V2 was removed. - [Clang] Set minimum code object version to 4 - [lld] Fix tests using code object v3 - Remove code object V3 from the AMDGPU backend, and delete or port v3 tests to v4. - Update docs to make it clear V3 can no longer be emitted.	2023-11-07 12:23:03 +01:00
Ivan Kosarev	d1e3d32088	[AMDGPU][NFCI] Decouple actual register encodings from HWEncoding values. (#69452 ) The HWEncoding values currently form a strange mix of actual register codes for some subtargets and types of operands and informational flags. This patch removes the dependency allowing arbitrary changes in the structure of HWEncoding values without breaking register encodings. Such changes, in turn, would make it possible to speed up and simplify getAVOperandEncoding() testing for AGPRs as well as other functions dealing with register codes downstream. They would also allow to maintain the same format of HWEncoding values across our downstream code bases, thus simplifying merging in mainline changes.	2023-10-25 13:24:50 +01:00
Konstantin Zhuravlyov	6cfb64276d	AMDGPU: Minor updates to program resource registers (#69525 ) - Be explicit about which program resource register is supported by which target - RSRC1 - FP16_OVFL is GFX9+ - WGP_MODE is GFX10+ - MEM_ORDERED is GFX10+ - FWD_PROGRESS is GFX10+ - RSRC3 - INST_PREF_SIZE is GFX11+ - TRAP_ON_START is GFX11+ - TRAP_ON_END is GFX11+ - IMAGE_OP is GFX11+ - Do not emit GFX11+ fields when disassembling GFX10 code objects - Tighten enforcement of reserved bits in disassembler --------- Co-authored-by: Konstantin Zhuravlyov <kzhuravl@amd.com>	2023-10-19 12:40:19 -04:00
pvanhout	868abf0961	Revert "[AMDGPU] Remove Code Object V3 (#67118 )" This reverts commit 544d91280c26fd5f7acd70eac4d667863562f4cc.	2023-10-18 12:55:36 +02:00
Pierre van Houtryve	544d91280c	[AMDGPU] Remove Code Object V3 (#67118 ) V3 has been deprecated for a while as well, so it can safely be removed like V2 was removed. - [Clang] Set minimum code object version to 4 - [lld] Fix tests using code object v3 - Remove code object V3 from the AMDGPU backend, and delete or port v3 tests to v4. - Update docs to make it clear V3 can no longer be emitted.	2023-10-16 08:21:48 +02:00
Stanislav Mekhanoshin	ab6c3d5034	[AMDGPU] Change the representation of double literals in operands (#68740 ) A 64-bit literal can be used as a 32-bit zero or sign extended operand. In case of double zeroes are added to the low 32 bits. Currently asm parser stores only high 32 bits of a double into an operand. To support codegen as requested by the https://github.com/llvm/llvm-project/issues/67781 we need to change the representation to store a full 64-bit value so that codegen can simply add immediates to an instruction. There is some code to support compatibility with existing tests and asm kernels. We allow to use short hex strings to represent only a high 32 bit of a double value as a valid literal.	2023-10-12 14:45:45 -07:00
Mirko Brkušanin	2cd2445c21	[AMDGPU] Src1 of VOP3 DPP instructions can be SGPR on supported subtargets (#67461 ) In order to avoid duplicating every dpp pseudo opcode that has src1, we allow it for all opcodes and add manual checks on subtargets that do not support it.	2023-09-29 11:54:49 +02:00
Stanislav Mekhanoshin	2024bfec9c	[AMDGPU] Remove int types from isSISrcFPOperand (#67401 ) This is NFCI, I don't believe there are any instructions using packed types in the ins dag, only in patterns, and the affected function is only used in the asm parser. However, int types shall not be reported as fp types. This may be usesul if we create an asm syntax for packed fp literals which we currently don't. If/when we do it that shall affect if we accept FP modifiers on these types or not. Say we could create a syntax like v2(-lit1, \|lit2\|) that would matter then.	2023-09-26 01:38:09 -07:00
Ivan Kosarev	9310baa596	[AMDGPU][NFC] Add True16 operand definitions. Reviewed By: Joe_Nash Differential Revision: https://reviews.llvm.org/D156103	2023-09-25 16:48:46 +01:00
Ruiling, Song	45e425e355	AMDGPU: Teach isArgPassedInSGPR() about cs_chain* calling convention (#67086 ) This cs_chain and cs_chain_preserve use InReg attribute to indicate argument passed through SGPR.	2023-09-22 22:24:17 +08:00
Pierre van Houtryve	fe2f67e4ba	[AMDGPU] Remove Code Object V2 (#65715 ) Code Object V2 has been deprecated for more than a year now. We can safely remove it from LLVM. - [clang] Remove support for the `-mcode-object-version=2` option. - [lld] Remove/refactor tests that were still using COV2 - [llvm] Update AMDGPUUsage.rst - Code Object V2 docs are left for informational purposes because those code objects may still be supported by the runtime/loaders for a while. - [AMDGPU] Remove COV2 emission capabilities. - [AMDGPU] Remove `MetadataStreamerYamlV2` which was only used by COV2 - [AMDGPU] Update all tests that were still using COV2 - They are either deleted or ported directly to code object v4 (as v3 is also planned to be removed soon).	2023-09-21 12:00:45 +02:00
Austin Kerbow	69447d6afe	[AMDGPU] Add ASM and MC updates for preloading kernargs Add assembler directives for preloading kernel arguments that correspond to new fields in the kernel descriptor for the length and offset of arguments that will be placed in SGPRs prior to kernel launch. Alignment of the arguments in SGPRs is equivalent to the kernarg segment when accessed via the kernarg_segment_ptr. Kernarg SGPRs are allocated directly after other user SGPRs. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D159459	2023-09-19 15:45:16 -07:00
Saiyedul Islam	466a8149b3	Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410 )" (#66060 ) This reverts commit 0a8d17e79b02a92814a2a788d79df1f54d70ec3e.	2023-09-12 15:13:59 +05:30
Saiyedul Islam	0a8d17e79b	[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410 ) Also update LIT tests and docs. For more details, see https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata Reviewed By: arsenm, jhuber6 Github PR: #65410 Differential Revision: https://reviews.llvm.org/D129818	2023-09-12 13:53:31 +05:30
Stanislav Mekhanoshin	cfe9a134bb	[AMDGPU] Rename 64BitDPP feature and fix the checks Names '64BitDPP' and especially 'DPP64' were found misleading, and DPP64 can easily be mixed with DPP16 and DPP8 while these are different concepts. DPP16 and DPP8 refers to lanes where DPP64 refers to the operand size. In fact the essential part here is that these instructions are executed on the DP ALU, so rename the feature accordingly. I have also found a bug in a check for these instructions, which is fixed here and a common utility function is now used. Differential Revision: https://reviews.llvm.org/D158465	2023-08-22 11:00:10 -07:00
Diana Picus	26dc284498	[AMDGPU] ISel for amdgpu_cs_chain[_preserve] functions Lower formal arguments and returns for functions with the `amdgpu_cs_chain` and `amdgpu_cs_chain_preserve` calling conventions: * Put `inreg` arguments into SGPRs, starting at s0, and other arguments into VGPRs, starting at v8. No arguments should end up on the stack, if we don't have enough registers we should error out. * Lower the return (which is always void) as an S_ENDPGM. * Set the ScratchRSrc register to s48:51, as described in the docs. * Set the SP to s32, matching amdgpu_gfx. This might be revisited in a future patch. Differential Revision: https://reviews.llvm.org/D153517	2023-08-21 11:16:17 +02:00
Mirko Brkusanin	de82fde22d	AMDGPU/Uniformity/GlobalISel: G_AMDGPU atomics are always divergent Patch by: Acim Maravic Differential Revision: https://reviews.llvm.org/D157091	2023-08-18 18:23:40 +02:00
Jay Foad	e61ca23289	[AMDGPU] Add and use SIInstrFlags::GWS. NFC. This reduces the number of places where we have to check for a list of DS_GWS_* opcodes. Differential Revision: https://reviews.llvm.org/D157099	2023-08-07 12:05:14 +01:00
Reid Kleckner	f86c81b2a8	[AMDGPU] Avoid CodeGen dependencies from AMDGPU/Utils and MCTargetDesc This required two substantial changes: 1. Moving a `getRegBitWidth(TargetRegisterClass)` overload out of Utils and into CodeGen 2. Passing the string function name to AMDGPUPALMetadata instead of the MachineFunction Other changes are minor or updates to accommodate the first two. See issue #64166 for more information on the layering issue. Differential Revision: https://reviews.llvm.org/D156486	2023-07-27 15:19:24 -07:00
Jon Chesterfield	a222951148	[amdgpu][nfc] Use unsigned for getIntegerPairAttribute to match the only call sites	2023-07-15 20:42:13 +01:00
Stephen Thomas	8aedad0fa0	[AMDGPU] Add functions for composing and decomposing S_WAIT_DEPCTR operands Add functions AMDGPU::DepCtr::encodeField() and AMDGPU::DepCtr::decodeField() for each of vm_vsrc, va_vdst and sa_sdst. These are now used in AMDGPUInsertDelayAlu and GCNHazardRecognizer so as to make working with S_WAITCNT_DEPCTR operands easier and more readable. Differential Revision: https://reviews.llvm.org/D154424	2023-07-04 11:02:12 +01:00
Stanislav Mekhanoshin	e2903abc15	[AMDGPU] Remove integer division in VOPD checks There is no way any compiler can simplify this division, while the check is done rather often. Differential Revision: https://reviews.llvm.org/D152613	2023-06-12 15:01:53 -07:00
Ivan Kosarev	4e312abdfd	[AMDGPU][NFC] Add a getRegBitWidth() helper for TargetRegisterClass operands. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D152257	2023-06-07 11:41:11 +01:00
Jay Foad	dcb834843e	[AMDGPU] Split SIModeRegisterDefaults out of AMDGPUBaseInfo. NFC. This is only used by CodeGen. Moving it out of AMDGPUBaseInfo simplifies future changes to make some of it depend on the subtarget. Differential Revision: https://reviews.llvm.org/D144650	2023-02-23 16:38:15 +00:00
Mirko Brkusanin	b3dc0e69cf	[AMDGPU][MC][GFX11] Add Partial NSA format for image sample instructions Image sample instructions that need more than 5 VGPRs for VAddr can use partial NSA for NSA encoding format. VGPRs that can not fit into the encoding are sequential after the last one. This patch adds assembly and disassembly parts. Differential Revision: https://reviews.llvm.org/D144033	2023-02-23 13:33:34 +01:00
Jay Foad	c9f4df57ca	[AMDGPU] Move splitMUBUFOffset from AMDGPUBaseInfo to SIInstrInfo Moving this out of AMDGPUBaseInfo enforces that AMDGPUBaseInfo should not be calling into GCNSubtarget. Differential Revision: https://reviews.llvm.org/D144564	2023-02-22 16:19:05 +00:00
Fangrui Song	432caca39a	Simplify with hasFeature. NFC	2023-02-17 18:22:24 -08:00
Kazu Hirata	64dad4ba9a	Use llvm::bit_cast (NFC)	2023-02-14 01:22:12 -08:00
Janek van Oirschot	e3515ba381	Reapply "[AMDGPU] Modify adjustInliningThreshold to also consider the cost of passing function arguments through the stack" Reapplies 142c28ffa1323e9a8d53200a22c80d5d778e0d0f as part of D140242 which got reverted due to amdgpu openmp test failures. This diff fixes said failures by eliding most of `adjustInliningThresholdUsingCallee` for indirect calls as the callee function is unavailable for indirect calls. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D143498	2023-02-13 12:17:43 +00:00
Changpeng Fang	7ca3444fba	AMDGPU: Use module flag to get code object version at IR level folow-up Summary: This is part of the leftover work for https://reviews.llvm.org/D143138. In this work, we pass code object version as an argument to initialize target ID and use it for targetID dump. Reviewers: arsenm Differential Revision https://reviews.llvm.org/D143293	2023-02-10 11:16:38 -08:00
Archibald Elliott	8e3d7cf5de	[NFC][TargetParser] Remove llvm/Support/TargetParser.h	2023-02-07 11:08:21 +00:00
Changpeng Fang	54cf69c9d5	AMDGPU: Use module flag to get code object version at IR level Summary: This patch introduces a mechanism to check the code object version from the module flag, This avoids checking from command line. In case the module flag is missing, we use the current default code object version supported in the compiler. For tools whose inputs are not IR, we may need other approach (directive, for example) to check the code object version, That will be in a separate patch later. For LIT tests update, we directly add module flag if there is only a single code object version associated with all checks in one file. In cause of multiple code object version in one file, we use the "sed" method to "clone" the checks to achieve the goal. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D14313	2023-02-02 18:57:26 -08:00
Yashwant Singh	422d379de2	[AMDGPU] Use tablegen to list uniform intrinsics Right now we do opcode wise matching to identify uniform/non-divergent AMDGPU intrinsics. It is duplicated at 2 places once at IR level uniformity analysis and at MIR level. Moving them to single tablegen table for consistency and adding and API rapper to access them. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D142961	2023-01-31 17:44:40 +05:30
Nicolai Hähnle	10cef708a7	AMDGPU: Clean up LDS-related occupancy calculations Occupancy is expressed as waves per SIMD. This means that we need to take into account the number of SIMDs per "CU" or, to be more precise, the number of SIMDs over which a workgroup may be distributed. getOccupancyWithLocalMemSize was wrong because it didn't take SIMDs into account at all. At the same time, we need to take into account that WGP mode offers access to a larger total amount of LDS, since this can affect how non-power-of-two LDS allocations are rounded. To make this work consistently, we distinguish between (available) local memory size and addressable local memory size (which is always limited by 64kB on gfx10+, even with WGP mode). This change results in a massive amount of test churn. A lot of it is caused by the fact that the default work group size is 1024, which means that (due to rounding effects) the default occupancy on older hardware is 8 instead of 10, which affects scheduling via register pressure estimates. I've adjusted most tests by just running the UTC tools, but in some cases I manually changed the work group size to 32 or 64 to make sure that work group size chunkiness has no effect. Differential Revision: https://reviews.llvm.org/D139468	2023-01-23 21:43:06 +01:00
Jay Foad	768aed1378	[MC] Make more use of MCInstrDesc::operands. NFC. Change MCInstrDesc::operands to return an ArrayRef so we can easily use it everywhere instead of the (IMHO ugly) opInfo_begin and opInfo_end. A future patch will remove opInfo_begin and opInfo_end. Also use it instead of raw access to the OpInfo pointer. A future patch will remove this pointer. Differential Revision: https://reviews.llvm.org/D142213	2023-01-23 11:31:41 +00:00
Matt Arsenault	4d4894ab92	Partially reapply "AMDGPU: Invert handling of enqueued block detection" This mostly reverts commit 270e96f435596449002fc89962595497481c8770. Keep the attributor related changes around, but functionally restore the old behavior as a workaround. Device enqueue goes back to not working at -O0 with this version.	2023-01-12 15:02:16 -05:00
Jay Foad	f460c66581	[AMDGPU] Simplify getNumFlatOffsetBits. NFC. Previously we considered this field to be either N-bit unsigned or N+1-bit signed, depending on the instruction. I think it's conceptually simpler to say that the field is always N+1-bit signed, but some instructions do not allow negative values. Differential Revision: https://reviews.llvm.org/D140883	2023-01-12 10:40:36 +00:00
Ivan Kosarev	2d945ef864	[AMDGPU][NFC] Rename GFX10A16 operands. They do not seem to be GFX10-specific anymore. Also renames the corresponding feature. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D141069	2023-01-09 17:18:46 +00:00

1 2 3 4 5 ...

303 Commits