llvm-project

Author	SHA1	Message	Date
Jay Foad	70fc970378	[AMDGPU] Move architected SGPR implementation into isel (#79120 )	2024-01-24 15:06:20 +00:00
Mirko Brkušanin	7fdf608cef	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795 ) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>	2024-01-24 13:43:07 +01:00
Emma Pilkington	bc82cfb38d	[AMDGPU] Add an asm directive to track code_object_version (#76267 ) Named '.amdhsa_code_object_version'. This directive sets the e_ident[ABIVERSION] in the ELF header, and should be used as the assumed COV for the rest of the asm file. This commit also weakens the --amdhsa-code-object-version CL flag. Previously, the CL flag took precedence over the IR flag. Now the IR flag/asm directive take precedence over the CL flag. This is implemented by merging a few COV-checking functions in AMDGPUBaseInfo.h.	2024-01-21 11:54:47 -05:00
Mariusz Sikora	3e6589f21c	[AMDGPU][GFX12] Add 16 bit atomic fadd instructions (#75917 ) - image_atomic_pk_add_f16 - image_atomic_pk_add_bf16 - ds_pk_add_bf16 - ds_pk_add_f16 - ds_pk_add_rtn_bf16 - ds_pk_add_rtn_f16 - flat_atomic_pk_add_f16 - flat_atomic_pk_add_bf16 - global_atomic_pk_add_f16 - global_atomic_pk_add_bf16 - buffer_atomic_pk_add_f16 - buffer_atomic_pk_add_bf16	2024-01-18 14:01:09 +01:00
Mariusz Sikora	c99da46fc1	[AMDGPU][GFX12] Add Atomic cond_sub_u32 (#76224 ) Co-authored-by: Vang Thao <Vang.Thao@amd.com>	2024-01-17 19:23:42 +01:00
Jay Foad	4a77414660	[AMDGPU] CodeGen for GFX12 8/16-bit SMEM loads (#77633 )	2024-01-17 10:28:03 +00:00
Jay Foad	daa4728dee	[AMDGPU] Add CodeGen support for GFX12 s_mul_u64 (#75825 )	2024-01-08 19:13:38 +00:00
David Green	d659bd1635	[GlobalISel][AArch64] Tail call libcalls. (#74929 ) This tries to allow libcalls to be tail called, using a similar method to DAG where the type is checked to make sure they match, and if so the backend, through lowerCall checks that the tailcall is valid for all arguments.	2024-01-03 07:59:36 +00:00
Mariusz Sikora	414d27419f	[AMDGPU] GFX12: select @llvm.prefetch intrinsic (#74576 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-15 17:15:55 +01:00
Jessica Del	32f9983c06	[AMDGPU] - Add address space for strided buffers (#74471 ) This is an experimental address space for strided buffers. These buffers can have structs as elements and a stride > 1. These pointers allow the indexed access in units of stride, i.e., they point at `buffer[index * stride]`. Thus, we can use the `idxen` modifier for buffer loads. We assign address space 9 to 192-bit buffer pointers which contain a 128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially, they are fat buffer pointers with an additional 32-bit index.	2023-12-15 15:49:25 +01:00
Mirko Brkušanin	5879162f7f	[AMDGPU] CodeGen for GFX12 VBUFFER instructions (#75492 )	2023-12-15 13:45:03 +01:00
Mirko Brkušanin	26b14aedb7	[AMDGPU] CodeGen for GFX12 VIMAGE and VSAMPLE instructions (#75488 )	2023-12-15 12:40:23 +01:00
Mirko Brkušanin	a278ac577e	[AMDGPU] CodeGen for SMEM instructions (#75579 )	2023-12-15 12:10:33 +01:00
Piotr Sobczak	6eec80133b	[AMDGPU] Min/max changes for GFX12 (#75214 ) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2023-12-13 14:18:10 +01:00
Jay Foad	8005ee6da3	[AMDGPU] CodeGen for GFX12 64-bit scalar add/sub (#75070 )	2023-12-12 17:41:40 +00:00
Matt Arsenault	db8b85ac58	AMDGPU: Support llvm.exp10 (#65860 )	2023-12-02 21:56:35 +07:00
Acim-Maravic	f3138524db	[AMDGPU] Generic lowering for rint and nearbyint (#69596 ) The are three different rounding intrinsics, that are brought down to same instruction. Co-authored-by: Acim Maravic <acim.maravic@amd.com>	2023-11-14 18:49:21 +01:00
Pierre van Houtryve	4428b01faa	Reland: [AMDGPU] Remove Code Object V3 (#67118 ) V3 has been deprecated for a while as well, so it can safely be removed like V2 was removed. - [Clang] Set minimum code object version to 4 - [lld] Fix tests using code object v3 - Remove code object V3 from the AMDGPU backend, and delete or port v3 tests to v4. - Update docs to make it clear V3 can no longer be emitted.	2023-11-07 12:23:03 +01:00
Jay Foad	86f2e09250	[AMDGPU] Tweak handling of GlobalAddress operands in SI_PC_ADD_REL_OFFSET (#70960 ) When SI_PC_ADD_REL_OFFSET is expanded to S_GETPC/S_ADD/S_ADDC, the GlobalAddress operands have to be adjusted by 4 or 12 bytes to account for the offset from the end of the S_GETPC instruction to the literal operands. Do this all in SIInstrInfo::expandPostRAPseudo instead of duplicating the adjustment code in both AMDGPULegalizerInfo and SITargetLowering. NFCI.	2023-11-01 19:48:30 +00:00
pvanhout	868abf0961	Revert "[AMDGPU] Remove Code Object V3 (#67118 )" This reverts commit 544d91280c26fd5f7acd70eac4d667863562f4cc.	2023-10-18 12:55:36 +02:00
Jay Foad	bff17f9f23	[AMDGPU] Remove support for no-return buffer atomic intrinsics. NFC. (#69326 ) Thsi removes some of the machinery added by D85268, which was unused since D87719 changed all buffer atomic intrinsics to return a value.	2023-10-17 14:47:32 +01:00
Pierre van Houtryve	544d91280c	[AMDGPU] Remove Code Object V3 (#67118 ) V3 has been deprecated for a while as well, so it can safely be removed like V2 was removed. - [Clang] Set minimum code object version to 4 - [lld] Fix tests using code object v3 - Remove code object V3 from the AMDGPU backend, and delete or port v3 tests to v4. - Update docs to make it clear V3 can no longer be emitted.	2023-10-16 08:21:48 +02:00
Jay Foad	b49a0dbaeb	[AMDGPU] Fix comments about afn and arcp in fast unsafe fdiv handling (#68982 )	2023-10-13 19:23:53 +01:00
Thomas Symalla	aa5158cd1e	[AMDGPU] Use absolute relocations when compiling for AMDPAL and Mesa3D (#67791 ) The primary ISA-independent justification for using PC-relative addressing is that it makes code position-independent and therefore allows sharing of .text pages between processes. When not sharing .text pages, we can use absolute relocations instead, which will possibly prevent a bubble introduced by s_getpc_b64. Co-authored-by: Thomas Symalla <thomas.symalla@amd.com>	2023-10-10 09:22:02 +02:00
Mirko Brkušanin	ecfdc23dd2	[AMDGPU] Select gfx1150 SALU Float instructions (#66885 )	2023-09-21 12:22:55 +02:00
Pierre van Houtryve	e9e3868707	[AMDGPU] Correctly restore FP mode in FDIV32 lowering (#66346 ) Addresses the FIXME for both DAGISel and GISel.	2023-09-15 08:11:01 +02:00
Pierre van Houtryve	3d0353793b	[AMDGPU] Fix `HasFP32Denormals` check in FDIV32 lowering (#66212 ) Fixes SWDEV-403219	2023-09-14 08:47:10 +02:00
Matt Arsenault	c48248d7f9	AMDGPU: Teach valueIsKnownNeverF32Denorm about frexp https://reviews.llvm.org/D158130	2023-09-12 23:23:10 +03:00
Matt Arsenault	72a7024add	AMDGPU: Correctly lower llvm.sqrt.f32 Make codegen emit correctly rounded sqrt by default. Emit the fast but only kind of fast expansion in AMDGPUCodeGenPrepare based on !fpmath, like the fdiv case. Hack around visitation ordering problems from AMDGPUCodeGenPrepare using forward iteration instead of a well behaved combiner. https://reviews.llvm.org/D158129	2023-09-12 23:22:54 +03:00
Stanislav Mekhanoshin	070c2570ad	[AMDGPU] Global ISel for packed fp32 instructions (#65803 )	2023-09-11 10:48:37 -07:00
Matt Arsenault	77c67436d9	LLT: Add some stub constructors for FP types This is to start documenting uses to ease a future migration to supporting different types with the same size. https://reviews.llvm.org/D150605	2023-09-03 08:33:19 -04:00
Martin	432eda5cc5	Remove dead assignments Dst is never used after creating the type making these assignments dead Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D158610	2023-08-24 12:30:27 +01:00
Matt Arsenault	c8eeee2be2	AMDGPU: Drop unsafe 1/sqrt -> rsq combine AMDGPUCodeGenPrepare implements a safer version of this that handles denormals correctly. https://reviews.llvm.org/D158032	2023-08-16 08:52:17 -04:00
Matt Arsenault	81b278e613	AMDGPU: Fix fast f32 exp2 Mirror of the previous log changes, OpenCL conformance doesn't like interpreting afn as ignore denormal handling but was previously hidden by flag dropping.	2023-08-15 10:48:46 -04:00
Matt Arsenault	4b7b4b9458	AMDGPU: Fix fast f32 log/log10 OpenCL conformance didn't like interpreting afn as ignore the denormal handling. https://reviews.llvm.org/D157940	2023-08-15 10:48:46 -04:00
Matt Arsenault	e09b3593ba	AMDGPU: Fix fast math log2 f32 Apparently afn doesn't allow you to drop the denormal handling according to OpenCL conformance. This was hidden by losing the flags during the library linking process. Fast log is still broken and needs more work. https://reviews.llvm.org/D157936	2023-08-15 10:48:46 -04:00
Matt Arsenault	1faa4797ca	AMDGPU: Handle unsafe exp.f32 with denormal handling I somehow missed this path when adding the new expansions. Saves a lot of instructions for afn + IEEE. https://reviews.llvm.org/D157867	2023-08-14 18:36:01 -04:00
Joe Nash	2fb4bfa5ba	[AMDGPU][True16] Fix ISel for A16 Image Instructions The 16-bit VAddr arguments to A16 image instructions are packed into legal VGPR_32 operands in AMDGPULegalizerInfo::legalizeImageIntrinsic on all subtargets. With True16, we also need to pack if the number of VAddr is one because VGPR_16 is not a legal argument to those Image instructions. No change to emitted code intended on subtargets pre-GFX11, and none on GFX11 until True16 is active. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D157426	2023-08-11 11:12:16 -04:00
Matt Arsenault	1030483561	AMDGPU/GlobalISel: Handle stacksave/stackrestore https://reviews.llvm.org/D156670	2023-08-11 10:25:01 -04:00
Mirko Brkusanin	fadf3e7f2b	[AMDGPU][GlobalISel] Update legalizer for G_ABS, G_SMIN, G_SMAX, G_UMIN, G_UMAX There is no need to increase the size of odd sized vectors if they are going to be scalarized by a different rule. Patch by: Acim Maravic Differential Revision: https://reviews.llvm.org/D155865	2023-08-02 12:18:18 +02:00
Sameer Sahasrabuddhe	d9847cde48	[GlobalISel] convergent intrinsics Introduced the convergent equivalent of the existing G_INTRINSIC opcodes: - G_INTRINSIC_CONVERGENT - G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS Out of the targets that currently have some support for GlobalISel, the patch assumes that the convergent intrinsics only relevant to SPIRV and AMDGPU. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D154766	2023-07-31 12:15:39 +05:30
Sameer Sahasrabuddhe	7c760b224b	Restore "[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR" Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic ID is the first non-def operand to the instruction. These are now represented as a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID() is now moved to this subclass GIntrinsic. Some target-defined instructions behave like GMIR intrinsics, and have an Intrinsic::ID operand. But they should not be recognized as generic intrinsics, and should not use GIntrinsic::getIntrinsicID(). Separated these out by introducing a new AMDGPU::getIntrinsicID(). Reviewed By: arsenm, Pierre-vh Differential Revision: https://reviews.llvm.org/D155556 This restores commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa. Originally reverted in d0f7850b01cf17e50a4f4b00e3b84dded94df6b8.	2023-07-27 14:49:17 +05:30
Sameer Sahasrabuddhe	d0f7850b01	Revert "[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR" This reverts commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa. The changes did not cover all occurrences of the deteleted function MachineInstr::getIntrinsicID().	2023-07-27 10:14:24 +05:30
Sameer Sahasrabuddhe	baa3386edb	[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic ID is the first non-def operand to the instruction. These are now represented as a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID() is now moved to this subclass GIntrinsic. Some target-defined instructions behave like GMIR intrinsics, and have an Intrinsic::ID operand. But they should not be recognized as generic intrinsics, and should not use GIntrinsic::getIntrinsicID(). Separated these out by introducing a new AMDGPU::getIntrinsicID(). Reviewed By: arsenm, Pierre-vh Differential Revision: https://reviews.llvm.org/D155556	2023-07-27 10:00:45 +05:30
Matt Arsenault	e3fd8f83a8	AMDGPU: Correctly expand f64 sqrt intrinsic rocm-device-libs and llpc were avoiding using f64 sqrt intrinsics in favor of their own expansions. Port the expansion into the backend. Both of these users should be updated to call the intrinsic instead. The library and llpc expansions are slightly different. llpc uses an ldexp to do the scale; the library uses a multiply. Use ldexp to do the scale instead of the multiply. I believe v_ldexp_f64 and v_mul_f64 are always the same number of cycles, but it's cheaper to materialize the 32-bit integer constant than the 64-bit double constant. The libraries have another fast version of sqrt which will be handled separately. I am tempted to do this in an IR expansion instead. In the IR we could take advantage of computeKnownFPClass to avoid the 0-or-inf argument check.	2023-07-25 07:54:11 -04:00
Matt Arsenault	0295513238	AMDGPU: Filter out contract flags when lowering exp It is unsafe to contract the fsub into the fmul. It also increases code size by duplicating a constant.	2023-07-20 18:14:24 -04:00
Matt Arsenault	bd2dca0f74	AMDGPU: Use hex floats instead of ugly bitcasting	2023-07-17 19:54:04 -04:00
Matt Arsenault	fbe4ff8149	AMDGPU: Partially fix not respecting dynamic denormal mode The most notable issue was producing v_mad_f32 in functions with the dynamic mode, since it just ignores the mode. fdiv lowering is still somewhat broken because it involves a mode switch and we need to query the original mode.	2023-07-11 15:14:52 -04:00
Matt Arsenault	5491666248	AMDGPU: Correctly lower llvm.exp.f32 The library expansion has too many paths for all the permutations of DAZ, unsafe and the 3 exp functions. It's easier to expand it in the backend when we know all of these things. The library currently misses the no-infinity check on the overflow, which this handles optimizing out. Some of the <3 x half> fast tests regress due to vector widening dropping flags which will be fixed separately. Apparently there is no exp10 intrinsic, but there should be. Adds some deadish code in preparation for adding one while I'm following along with the current library expansion.	2023-07-05 17:23:49 -04:00
Matt Arsenault	ed556a1ad5	AMDGPU: Correctly lower llvm.exp2.f32 Previously this did a fast math expansion only.	2023-07-05 17:23:48 -04:00

1 2 3 4 5 ...

648 Commits