llvm-project

Author	SHA1	Message	Date
Matt Arsenault	8e0fadda10	AMDGPU: Bulk update all GlobalISel tests to use opaque pointers	2022-11-28 11:51:36 -05:00
jeff	f4e6149d82	[AMDGPU] Use V_PERM to match buildvectors when inputs are not canonicalized (i.e. can't use V_PACK) If we can not prove that f16 operands of a buildvector are canonicalized, then we can not lower into a V_PACK. In this scenario, we would previously lower into some combination of and(sdwa), shr, or. This patch allows for matching into V_PERM instead. Change-Id: Ifa4a74fdb81ef44f22ba490c7fdf81ec8aebc945	2022-10-03 12:58:29 -07:00
Pierre van Houtryve	9a67a6b72a	[AMDGPU][GISel] Legalize V2S16 G_BUILD_VECTOR Preparation patch for D134354 to make V2S16 G_BUILD_VECTOR legal. Also removes RegBankInfo's scalarization of small BUILD_VECTORs, replacing it with InstructionSelector logic instead. This allows for V2S16 BUILD_VECTOR instructions to survive all the way to ISel so we can select FMA/MAD_MIX instructions in D134354. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134433	2022-09-30 14:04:53 +00:00
Joe Nash	d1af09ad96	[AMDGPU] gfx11 Generate VOPD Instructions We form VOPD instructions in the GCNCreateVOPD pass by combining back-to-back component instructions. There are strict register constraints for creating a legal VOPD, namely that the matching operands (e.g. src0x and src0y, src1x and src1y) must be in different register banks. We add a PostRA scheduler mutation to put possible VOPD components back-to-back. Depends on D128442, D128270 Reviewed By: #amdgpu, rampitec Differential Revision: https://reviews.llvm.org/D128656	2022-07-05 09:18:19 -04:00
Jay Foad	c155a944fb	[AMDGPU] GFX11 CodeGen support for MIMG instructions This includes: - New llvm.amdgcn.image.msaa.load.* intrinsics - NSA changes, because MIMG-NSA is now limited to 3 dwords - Split CD forms of IMAGE_SAMPLE instructions out into separate test files since they are no longer supported in GFX11 Differential Revision: https://reviews.llvm.org/D127837	2022-06-16 18:23:14 +01:00
Jay Foad	3eb2281bc0	[AMDGPU] Aggressively fold immediates in SIFoldOperands Previously SIFoldOperands::foldInstOperand would only fold a non-inlinable immediate into a single user, so as not to increase code size by adding the same 32-bit literal operand to many instructions. This patch removes that restriction, so that a non-inlinable immediate will be folded into any number of users. The rationale is: - It reduces the number of registers used for holding constant values, which might increase occupancy. (On the other hand, many of these registers are SGPRs which no longer affect occupancy on GFX10+.) - It reduces ALU stalls between the instruction that loads a constant into a register, and the instruction that uses it. - The above benefits are expected to outweigh any increase in code size. Differential Revision: https://reviews.llvm.org/D114643	2022-05-18 10:19:35 +01:00
Joe Nash	3ce1b9631a	[AMDGPU] Switch PostRA sched to MachineSched Use GCNHazardRecognizer in postra sched. Updated tests for the new schedules. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D109536 Change-Id: Ia86ba2ae168f12fb34b4d8efdab491f84d936cde	2021-09-14 15:11:27 -04:00
Jay Foad	c0b3c5a064	[AMDGPU][GlobalISel] Run SIAddImgInit This pass is required to get correct codegen for image instructions with the tfe or lwe bits set. Differential Revision: https://reviews.llvm.org/D95132	2021-01-21 15:54:54 +00:00
Jay Foad	d28624a209	[AMDGPU] Stop adding an implicit def of vcc_hi for wave32 This doesn't seem to be needed for anything. Differential Revision: https://reviews.llvm.org/D92400	2020-12-02 10:11:42 +00:00
Matt Arsenault	d2e52eec51	AMDGPU: Select global saddr mode from SGPR pointer Use the 64-bit SGPR base with a 0 offset, since it's 1 fewer instruction to materialize the 0 vs. the 64-bit copy.	2020-11-16 11:51:06 -05:00
Sebastian Neubauer	a343b9b032	Revert "[AMDGPU] Insert waitcnt after returning from call" This reverts commit ca907bfb57d8ad3ec3bcc2cff2abab7b1b933af6. According to michel.daenzer, > This completely broke the Mesa radeonsi driver on Navi 14. Xorg + > xterm come up with major corruption & psychedelic colours.	2020-09-23 17:16:39 +02:00
Sebastian Neubauer	ca907bfb57	[AMDGPU] Insert waitcnt after returning from call When memory operations are outstanding on function calls, either the caller or the callee can insert a waitcnt to ensure that all reads are finished. Calls need some time to be executed, so if the callee inserts the waitcnt, filling the instruction buffer and waiting for memory will be interleaved, hiding some latency. This comes at the cost of having a waitcnt inside functions that may not be needed as no memory operations are outstanding. For function calls, this is already implemented. The same principal applies to returns: If the caller inserts a waitcnt after the call, the callee does not have to wait and the return and memory operation can be run in parallel. This commit implements waiting in the caller after returning from a function call. Differential Revision: https://reviews.llvm.org/D87674	2020-09-23 12:17:59 +02:00
Jay Foad	590964c835	[AMDGPU] More accurate gfx10 latencies Differential Revision: https://reviews.llvm.org/D81012	2020-06-04 10:29:32 +01:00
Matt Arsenault	48eda37282	AMDGPU/GlobalISel: Start selecting image intrinsics Does not handled atomics yet.	2020-03-30 17:33:04 -04:00

14 Commits