llvm-project

Author	SHA1	Message	Date
pvanhout	868abf0961	Revert "[AMDGPU] Remove Code Object V3 (#67118)" This reverts commit 544d91280c26fd5f7acd70eac4d667863562f4cc.	2023-10-18 12:55:36 +02:00
Jay Foad	bff17f9f23	[AMDGPU] Remove support for no-return buffer atomic intrinsics. NFC. (#69326) This removes some of the machinery added by D85268, which was unused since D87719 changed all buffer atomic intrinsics to return a value.	2023-10-17 14:47:32 +01:00
Pierre van Houtryve	544d91280c	[AMDGPU] Remove Code Object V3 (#67118) V3 has been deprecated for a while as well, so it can safely be removed like V2 was removed. - [Clang] Set minimum code object version to 4 - [lld] Fix tests using code object v3 - Remove code object V3 from the AMDGPU backend, and delete or port v3 tests to v4. - Update docs to make it clear V3 can no longer be emitted.	2023-10-16 08:21:48 +02:00
Jay Foad	b49a0dbaeb	[AMDGPU] Fix comments about afn and arcp in fast unsafe fdiv handling (#68982)	2023-10-13 19:23:53 +01:00
Thomas Symalla	aa5158cd1e	[AMDGPU] Use absolute relocations when compiling for AMDPAL and Mesa3D (#67791) The primary ISA-independent justification for using PC-relative addressing is that it makes code position-independent and therefore allows sharing of .text pages between processes. When not sharing .text pages, we can use absolute relocations instead, which will possibly prevent a bubble introduced by s_getpc_b64. Co-authored-by: Thomas Symalla <thomas.symalla@amd.com>	2023-10-10 09:22:02 +02:00
Mirko Brkušanin	ecfdc23dd2	[AMDGPU] Select gfx1150 SALU Float instructions (#66885)	2023-09-21 12:22:55 +02:00
Pierre van Houtryve	e9e3868707	[AMDGPU] Correctly restore FP mode in FDIV32 lowering (#66346) Addresses the FIXME for both DAGISel and GISel.	2023-09-15 08:11:01 +02:00
Pierre van Houtryve	3d0353793b	[AMDGPU] Fix `HasFP32Denormals` check in FDIV32 lowering (#66212) Fixes SWDEV-403219	2023-09-14 08:47:10 +02:00
Matt Arsenault	c48248d7f9	AMDGPU: Teach valueIsKnownNeverF32Denorm about frexp https://reviews.llvm.org/D158130	2023-09-12 23:23:10 +03:00
Matt Arsenault	72a7024add	AMDGPU: Correctly lower llvm.sqrt.f32 Make codegen emit correctly rounded sqrt by default. Emit the fast but only kind of fast expansion in AMDGPUCodeGenPrepare based on !fpmath, like the fdiv case. Hack around visitation ordering problems from AMDGPUCodeGenPrepare using forward iteration instead of a well behaved combiner. https://reviews.llvm.org/D158129	2023-09-12 23:22:54 +03:00
Stanislav Mekhanoshin	070c2570ad	[AMDGPU] Global ISel for packed fp32 instructions (#65803)	2023-09-11 10:48:37 -07:00
Matt Arsenault	77c67436d9	LLT: Add some stub constructors for FP types This is to start documenting uses to ease a future migration to supporting different types with the same size. https://reviews.llvm.org/D150605	2023-09-03 08:33:19 -04:00
Martin	432eda5cc5	Remove dead assignments Dst is never used after creating the type making these assignments dead Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D158610	2023-08-24 12:30:27 +01:00
Matt Arsenault	c8eeee2be2	AMDGPU: Drop unsafe 1/sqrt -> rsq combine AMDGPUCodeGenPrepare implements a safer version of this that handles denormals correctly. https://reviews.llvm.org/D158032	2023-08-16 08:52:17 -04:00
Matt Arsenault	81b278e613	AMDGPU: Fix fast f32 exp2 Mirror of the previous log changes, OpenCL conformance doesn't like interpreting afn as ignore denormal handling but was previously hidden by flag dropping.	2023-08-15 10:48:46 -04:00
Matt Arsenault	4b7b4b9458	AMDGPU: Fix fast f32 log/log10 OpenCL conformance didn't like interpreting afn as ignore the denormal handling. https://reviews.llvm.org/D157940	2023-08-15 10:48:46 -04:00
Matt Arsenault	e09b3593ba	AMDGPU: Fix fast math log2 f32 Apparently afn doesn't allow you to drop the denormal handling according to OpenCL conformance. This was hidden by losing the flags during the library linking process. Fast log is still broken and needs more work. https://reviews.llvm.org/D157936	2023-08-15 10:48:46 -04:00
Matt Arsenault	1faa4797ca	AMDGPU: Handle unsafe exp.f32 with denormal handling I somehow missed this path when adding the new expansions. Saves a lot of instructions for afn + IEEE. https://reviews.llvm.org/D157867	2023-08-14 18:36:01 -04:00
Joe Nash	2fb4bfa5ba	[AMDGPU][True16] Fix ISel for A16 Image Instructions The 16-bit VAddr arguments to A16 image instructions are packed into legal VGPR_32 operands in AMDGPULegalizerInfo::legalizeImageIntrinsic on all subtargets. With True16, we also need to pack if the number of VAddr is one because VGPR_16 is not a legal argument to those Image instructions. No change to emitted code intended on subtargets pre-GFX11, and none on GFX11 until True16 is active. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D157426	2023-08-11 11:12:16 -04:00
Matt Arsenault	1030483561	AMDGPU/GlobalISel: Handle stacksave/stackrestore https://reviews.llvm.org/D156670	2023-08-11 10:25:01 -04:00
Mirko Brkusanin	fadf3e7f2b	[AMDGPU][GlobalISel] Update legalizer for G_ABS, G_SMIN, G_SMAX, G_UMIN, G_UMAX There is no need to increase the size of odd sized vectors if they are going to be scalarized by a different rule. Patch by: Acim Maravic Differential Revision: https://reviews.llvm.org/D155865	2023-08-02 12:18:18 +02:00
Sameer Sahasrabuddhe	d9847cde48	[GlobalISel] convergent intrinsics Introduced the convergent equivalent of the existing G_INTRINSIC opcodes: - G_INTRINSIC_CONVERGENT - G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS Out of the targets that currently have some support for GlobalISel, the patch assumes that the convergent intrinsics only relevant to SPIRV and AMDGPU. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D154766	2023-07-31 12:15:39 +05:30
Sameer Sahasrabuddhe	7c760b224b	Restore "[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR" Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic ID is the first non-def operand to the instruction. These are now represented as a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID() is now moved to this subclass GIntrinsic. Some target-defined instructions behave like GMIR intrinsics, and have an Intrinsic::ID operand. But they should not be recognized as generic intrinsics, and should not use GIntrinsic::getIntrinsicID(). Separated these out by introducing a new AMDGPU::getIntrinsicID(). Reviewed By: arsenm, Pierre-vh Differential Revision: https://reviews.llvm.org/D155556 This restores commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa. Originally reverted in d0f7850b01cf17e50a4f4b00e3b84dded94df6b8.	2023-07-27 14:49:17 +05:30
Sameer Sahasrabuddhe	d0f7850b01	Revert "[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR" This reverts commit baa3386edb11a2f9bcadda8cf58d56f3707c39fa. The changes did not cover all occurrences of the deleted function MachineInstr::getIntrinsicID().	2023-07-27 10:14:24 +05:30
Sameer Sahasrabuddhe	baa3386edb	[GlobalISel] GIntrinsic subclass to represent intrinsics in Generic Machine IR Some opcodes in generic MIR represent calls to intrinsics, where the intrinsic ID is the first non-def operand to the instruction. These are now represented as a subclass of GenericMachineInstr, and the method MachineInstr::getIntrinsicID() is now moved to this subclass GIntrinsic. Some target-defined instructions behave like GMIR intrinsics, and have an Intrinsic::ID operand. But they should not be recognized as generic intrinsics, and should not use GIntrinsic::getIntrinsicID(). Separated these out by introducing a new AMDGPU::getIntrinsicID(). Reviewed By: arsenm, Pierre-vh Differential Revision: https://reviews.llvm.org/D155556	2023-07-27 10:00:45 +05:30
Matt Arsenault	e3fd8f83a8	AMDGPU: Correctly expand f64 sqrt intrinsic rocm-device-libs and llpc were avoiding using f64 sqrt intrinsics in favor of their own expansions. Port the expansion into the backend. Both of these users should be updated to call the intrinsic instead. The library and llpc expansions are slightly different. llpc uses an ldexp to do the scale; the library uses a multiply. Use ldexp to do the scale instead of the multiply. I believe v_ldexp_f64 and v_mul_f64 are always the same number of cycles, but it's cheaper to materialize the 32-bit integer constant than the 64-bit double constant. The libraries have another fast version of sqrt which will be handled separately. I am tempted to do this in an IR expansion instead. In the IR we could take advantage of computeKnownFPClass to avoid the 0-or-inf argument check.	2023-07-25 07:54:11 -04:00
Matt Arsenault	0295513238	AMDGPU: Filter out contract flags when lowering exp It is unsafe to contract the fsub into the fmul. It also increases code size by duplicating a constant.	2023-07-20 18:14:24 -04:00
Matt Arsenault	bd2dca0f74	AMDGPU: Use hex floats instead of ugly bitcasting	2023-07-17 19:54:04 -04:00
Matt Arsenault	fbe4ff8149	AMDGPU: Partially fix not respecting dynamic denormal mode The most notable issue was producing v_mad_f32 in functions with the dynamic mode, since it just ignores the mode. fdiv lowering is still somewhat broken because it involves a mode switch and we need to query the original mode.	2023-07-11 15:14:52 -04:00
Matt Arsenault	5491666248	AMDGPU: Correctly lower llvm.exp.f32 The library expansion has too many paths for all the permutations of DAZ, unsafe and the 3 exp functions. It's easier to expand it in the backend when we know all of these things. The library currently misses the no-infinity check on the overflow, which this handles optimizing out. Some of the <3 x half> fast tests regress due to vector widening dropping flags which will be fixed separately. Apparently there is no exp10 intrinsic, but there should be. Adds some deadish code in preparation for adding one while I'm following along with the current library expansion.	2023-07-05 17:23:49 -04:00
Matt Arsenault	ed556a1ad5	AMDGPU: Correctly lower llvm.exp2.f32 Previously this did a fast math expansion only.	2023-07-05 17:23:48 -04:00
Matt Arsenault	9c82dc6a6b	AMDGPU: Always use v_rcp_f16 and v_rsq_f16 These inherited the fast math checks from f32, but the manual suggests these should be accurate enough for unconditional use. The definition of correctly rounded is 0.5ulp, but the manual says "0.51ulp". I've been a bit nervous about changing this as the OpenCL conformance test does not cover half. Brute force produces identical values compared to a reference host implementation for all values.	2023-07-05 16:53:01 -04:00
Matt Arsenault	4e15f378ee	AMDGPU: Correctly lower llvm.log.f32 and llvm.log10.f32 Previously we expanded these in a fast-math way and the device libraries were relying on this behavior. The libraries have a pending change to switch to the new target intrinsic. Unlike the library version, this takes advantage of no-infinities on the result overflow check.	2023-07-05 15:30:35 -04:00
Matt Arsenault	003b58f65b	IR: Add llvm.frexp intrinsic Add an intrinsic which returns the two pieces as multiple return values. Alternatively could introduce a pair of intrinsics to separately return the fractional and exponent parts. AMDGPU has native instructions to return the two halves, but could use some generic legalization and optimization handling. For example, we should be able to handle legalization of f16 on older targets, and for bf16. Additionally antique targets need a hardware workaround which would be better handled in the backend rather than in library code where it is now.	2023-06-28 14:50:16 -04:00
Matt Arsenault	89ccfa1b39	AMDGPU: Use correct lowering for llvm.log2.f32 We previously directly codegened to v_log_f32, which is broken for denormals. The lowering isn't complicated, you simply need to scale denormal inputs and adjust the result. Note log and log10 are still not accurate enough, and will be fixed separately.	2023-06-23 08:37:37 -04:00
Matt Arsenault	92ee60b66f	AMDGPU: Drop and upgrade llvm.amdgcn.atomic.inc/dec to atomicrmw	2023-06-21 21:20:26 -04:00
Matt Arsenault	d0923a7739	AMDGPU: Correct constants used in fast math log expansion The division between float constants was done with less precision. Performing the divide in double and truncating to float provides the same value as used in the library fast math expansion.	2023-06-12 21:11:41 -04:00
Matt Arsenault	4e4c351ae5	AMDGPU: Avoid endpgm in middle of block for fallback trap lowering. This was inserting an s_endpgm in the middle of the block when it has to be a terminator. Split the block and insert a branch to a new block with the trap if it's not in a terminator position. Fixes verifier error on LDS in function with no trap support (and other trap sources).	2023-06-09 21:04:38 -04:00
Matt Arsenault	eece6ba283	IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics AMDGPU has native instructions and target intrinsics for this, but these really should be subject to legalization and generic optimizations. This will enable legalization of f16->f32 on targets without f16 support. Implement a somewhat horrible inline expansion for targets without libcall support. This could be better if we could introduce control flow (GlobalISel version not yet implemented). Support for strictfp legalization is less complete but works for the simple cases.	2023-06-06 17:07:18 -04:00
Krzysztof Drewniak	23098bd454	[AMDGPU] Add intrinsic for converting global pointers to resources Define the function @llvm.amdgcn.make.buffer.rsrc, which takes a 64-bit pointer, the 16-bit stride/swizzling constant that replaces the high 16 bits of an address in a buffer resource, the 32-bit extent/number of elements, and the 32-bit flags (the latter two being the 3rd and 4th words of the resource), and combines them into a ptr addrspace(8). This intrinsic is lowered during the early phases of the backend. This intrinsic is needed so that alias analysis can correctly infer that a certain buffer resource points to the same memory as some global pointer. Previous methods of constructing buffer resources, which relied on ptrtoint, would not allow for such an inference. Depends on D148184 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D148957	2023-06-05 17:07:59 +00:00
Krzysztof Drewniak	ab37937812	[AMDGPU] Use resource base for buffer instruction MachineMemOperands 1. Remove the existing code that would encode the constant offsets (if there were any) on buffer intrinsic operations onto their `MachineMemOperand`s. As far as I can tell, this use of `offset` has no substantial impact on the generated code, especially since the same reasoning is performed by areMemAccessesTriviallyDisjoint(). 2. When a buffer resource intrinsic takes a pointer argument as the base resource/descriptor, place that memory argument in the value field of the MachineMemOperand attached to that intrinsic. This is more conservative than what would be produced by more typical LLVM code using GEP, as the Value (for alias analysis purposes) corresponding to accessing buffer[0] and buffer[1] is the same. However, the target-specific analysis of disjoint offsets covers a lot of the simple usecases. Despite this limitation, the new buffer intrinsics, combined with LLVM's existing pointer annotations, allow for non-trivial optimizations, as seen in the new tests, where marking two buffer descriptors "noalias" allows merging together loads and stores in a "load from A, modify loaded value, store to B" sequence, which would not be possible previously. Depends on D147547 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D148184	2023-06-05 17:06:57 +00:00
Krzysztof Drewniak	faa2c678aa	[AMDGPU] Add buffer intrinsics that take resources as pointers In order to enable the LLVM frontend to better analyze buffer operations (and to potentially enable more precise analyses on the backend), define versions of the raw and structured buffer intrinsics that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their rsrc arguments. The new intrinsics are named by replacing `buffer.` with `buffer.ptr`. One advantage to these intrinsic definitions is that, instead of specifying that a buffer load/store will read/write some memory, we can indicate that the memory read or written will be based on the pointer argument. This means that, for example, a read from a `noalias` buffer can be pulled out of a loop that is modifying a distinct buffer. In the future, we will define custom PseudoSourceValues that will allow us to package up the (buffer, index, offset) triples that buffer intrinsics contain and allow for more precise backend analysis. This work also enables creating address space 7, which represents manipulation of raw buffers using native LLVM load and store instructions. Where tests simply used a buffer intrinsic while testing some other code path (such as the tests for VGPR spills), they have been updated to use the new intrinsic form. Tests that are "about" buffer intrinsics (for instance, those that ensure that they codegen as expected) have been duplicated, either within existing files or into new ones. Depends on D145441 Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D147547	2023-06-05 16:59:07 +00:00
Matt Arsenault	2f5a116cf7	AMDGPU: Expand casted f16 fmed3 pattern to fmin/fmax on gfx8 If we have legal f16 instructions but no f16 med3, we can save one instruction by expanding out the min/max sequence compared to casting to f32 and casting back.	2023-05-23 08:48:25 +01:00
Mateja Marjanovic	cf76074a36	[AMDGPU][GlobalISel] Check exact width in get*ClassForBitWidth and widen if necessary Instead of checking if the given bitwidth is less or equal to a bitwidth of an existing RegClass, check if it has the exact same value. For LLVM vector types that don't have a corresponding Register Class, widen them during legalization. That goes for G_EXTRACT_VECTOR_ELT, G_INSERT_VECTOR_ELT and G_BUILD_VECTOR. Differential revision: https://reviews.llvm.org/D148096 Reviewers: foad, arsenm	2023-05-03 17:32:24 +02:00
Mateja Marjanovic	6175ec0bb6	Revert "[AMDGPU][GlobalISel] Widen the vector operand in G_BUILD/INSERT/EXTRACT_VECTOR" This reverts commit b25c7cafcbe1b52ea2d1ff5e5c2f13674b5f297d.	2023-05-03 17:28:01 +02:00
Mateja Marjanovic	b25c7cafcb	[AMDGPU][GlobalISel] Widen the vector operand in G_BUILD/INSERT/EXTRACT_VECTOR Widen the vector operand type in G_BUILD_VECTOR, G_INSERT_VECTOR_ELT, G_EXTRACT_VECTOR_ELT to the nearest larger RegClass.	2023-05-03 17:14:38 +02:00
Changpeng Fang	3bc1e084ee	AMDGPU: Created a subclass for the return address operand in the tail call return instruction Summary: This is to avoid using the callee saved registers for the return address of the tail call return instruction. Reviewers: arsenm, cdevadas Differential Revision: https://reviews.llvm.org/D147096	2023-04-10 10:53:33 -07:00
Jon Chesterfield	0507448d82	[amdgpu] Implement dynamic LDS accesses from non-kernel functions The premise here is to allow non-kernel functions to locate external LDS variables without using LDS or extra magic SGPRs to do so. 1/ First it crawls the callgraph to work out which external LDS variables are reachable from a given kernel 2/ Then it creates a new `extern char[0]` variable for each kernel, which will alias all the other extern LDS variables because that's the documented behaviour of these variables 3/ The address of that variable is written to a lookup table. The global variable is tagged with metadata to track what address it was allocated at by codegen 4/ The assembler builds the lookup table using the metadata 5/ Any non-kernel functions use the same magic intrinsic used by table lookups of non-dynamic LDS variables to find the address to use Heavy overlap with the code paths taken for other lowering, in particular the same intrinsic is used to pass the dynamic scope information through the same sgpr as for table lookups of static LDS. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144233	2023-04-04 20:06:34 +01:00
Mariusz Sikora	ea064ee2a3	[AMDGPU] Create Subtarget Features for some of 16 bits atomic fadd instructions Introducing Subtarget Features for instructions: - ds_pk_add_bf16 - ds_pk_add_f16 - ds_pk_add_rtn_bf16 - ds_pk_add_rtn_f16 - flat_atomic_pk_add_f16 - flat_atomic_pk_add_bf16 - global_atomic_pk_add_f16 - global_atomic_pk_add_bf16 - buffer_atomic_pk_add_f16 Differential Revision: https://reviews.llvm.org/D146701	2023-03-24 13:10:40 +01:00
Jay Foad	dcb834843e	[AMDGPU] Split SIModeRegisterDefaults out of AMDGPUBaseInfo. NFC. This is only used by CodeGen. Moving it out of AMDGPUBaseInfo simplifies future changes to make some of it depend on the subtarget. Differential Revision: https://reviews.llvm.org/D144650	2023-02-23 16:38:15 +00:00

629 Commits