llvm-project

Author	SHA1	Message	Date
Stanislav Mekhanoshin	bb1fe36977	[AMDGPU] Make v8i16/v8f16 legal Differential Revision: https://reviews.llvm.org/D117721	2022-01-24 11:51:08 -08:00
Stanislav Mekhanoshin	c27f8fb968	[AMDGPU] Remove cndmask from readsExecAsData Differential Revision: https://reviews.llvm.org/D117909	2022-01-24 11:24:47 -08:00
Matt Arsenault	18aabae8e2	AMDGPU: Fix assertion on fixed stack objects with VGPR->AGPR spills These have negative / out of bounds frame index values and would assert when trying to set the BitVector. Fixed stack objects can't be colored away so ignore them.	2022-01-24 09:45:41 -05:00
Matt Arsenault	99e8e17313	Reapply "Revert "GlobalISel: Add G_ASSERT_ALIGN hint instruction" This reverts commit a97e20a3a8a58be751f023e610758310d5664562.	2022-01-24 09:26:52 -05:00
Abinav Puthan Purayil	912af6b570	[AMDGPU][GlobalISel] Remove the post ':' part of vreg operands in fsh combine tests.	2022-01-24 16:30:40 +05:30
Jay Foad	aa50b93e7c	[AMDGPU][GlobalISel] Add more sign/zero/any-extension tests Add s1 to s16 cases, and for sgprs s1 to s64 and s32 to s64.	2022-01-24 10:16:51 +00:00
Jay Foad	906ebd5830	[AMDGPU][GlobalISel] Regenerate checks in inst-select-*ext.mir	2022-01-24 10:16:51 +00:00
Abinav Puthan Purayil	68b70d17d8	[GlobalISel] Fold or of shifts with constant amount to funnel shift. This change folds (or (shl x, C0), (lshr y, C1)) to funnel shift iff C0 and C1 are constants where C0 + C1 is the bit-width of the shift instructions. Differential Revision: https://reviews.llvm.org/D116529	2022-01-24 10:43:32 +05:30
David Green	b27e5459d5	[DAG] Convert truncstore(extend(x)) back to store(x) Pulled out of D106237, this folds truncstore(extend(x)) back to store(x) if the original store was legal. This can come up due to the order we fold nodes. A fold from X86 needs to be adjusted to prevent infinite loops, to have it pick the operand of a trunc more directly. Differential Revision: https://reviews.llvm.org/D117901	2022-01-22 13:20:36 +00:00
Sebastian Neubauer	ae2f9c8be8	[AMDGPU] Remove lz and nomip combine from codegen These combines have been moved into the IR combiner in D116042. Differential Revision: https://reviews.llvm.org/D116116	2022-01-21 12:09:08 +01:00
Stanislav Mekhanoshin	41ebd19681	[AMDGPU] Do not ignore exec use where exec is read as data Compares, v_cndmask_b32, and v_readfirstlane_b32 use EXEC in a way which modifies the result. This implicit EXEC use shall not be ignored for the purposes of instruction moves. Differential Revision: https://reviews.llvm.org/D117814	2022-01-20 14:05:22 -08:00
Nico Weber	9122b5072a	[llvm] Remove an old bot cleanup command	2022-01-20 15:02:35 -05:00
Stanislav Mekhanoshin	94a0660c14	[AMDGPU] Regenerate remat-vop.mir. NFC.	2022-01-20 11:15:42 -08:00
Matt Arsenault	237502c1a4	AMDGPU: Fix asm in test using wrong IR type for physical register	2022-01-20 12:56:53 -05:00
Matt Arsenault	064cea9c9a	AMDGPU/GlobalISel: Try to use s_and_b64 in ptrmask selection Avoids a test diff with SDAG.	2022-01-20 12:56:53 -05:00
Matt Arsenault	2d1f9aa27d	AMDGPU/GlobalISel: Regenerate test checks with -NEXT	2022-01-20 12:56:53 -05:00
Matt Arsenault	08549ba51e	AMDGPU/GlobalISel: Explicitly set -global-isel-abort in failure tests If the default mode is the fallback, this would fail since it would end up seeing the DAG failure message instead.	2022-01-20 12:56:53 -05:00
Matt Arsenault	2e49e0cfde	AMDGPU/GlobalISel: Directly diagnose return value use for FP atomics Emit an error if the return value is used on subtargets that do not support them. Previously we were falling back to the DAG on selection failure, where it would emit this error and then fail again.	2022-01-20 12:46:45 -05:00
Matt Arsenault	be7e938e27	AMDGPU/GlobalISel: Stop handling llvm.amdgcn.buffer.atomic.fadd This code is not structured to handle the legacy buffer intrinsics and was miscompiling them.	2022-01-20 12:12:06 -05:00
Matt Arsenault	8ff3c9e0be	AMDGPU/GlobalISel: Fix selection of gfx90a FP atomics The struct/raw forms for the buffer atomics now work as expected. However, we're incorrectly handling the legacy form (which we probably shouldn't handle at all). We also are not diagnosing the use of the return value on gfx908. These will be addressed separately.	2022-01-20 12:12:06 -05:00
Matt Arsenault	89c447e4e6	AMDGPU: Stop reserving 36-bytes before kernel arguments for amdpal This was inheriting the mesa behavior, and as far as I know nobody is using opencl kernels with amdpal. The isMesaKernel check was irrelevant because this property needs to be held for all functions.	2022-01-20 12:12:05 -05:00
Abinav Puthan Purayil	d8b690409d	[AMDGPU] Set MemoryVT for truncstores in tblgen. GlobalISelEmitter was skipping these patterns when its predicates were checked. This patch should allow us to select d16_hi stores in GlobalISel. Differential Revision: https://reviews.llvm.org/D117762	2022-01-20 19:05:12 +05:30
Jay Foad	847bb26820	[AMDGPU] Regenerate some MIR checks	2022-01-20 12:41:40 +00:00
Konstantina	c7b71acef2	[AMDGPU][NFC] Add autogenerated tests for vgpr-tuple-allocation.ll Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D117692	2022-01-19 15:06:16 -08:00
Yaxun (Sam) Liu	15f54dd5e4	AMDGPU: Account for usage of HIP-style dynamic LDS Disable promote alloca to LDS when HIP-style dynamic LDS is used, since the size is unknown at compile time. Patch by: Siu Chi Chan Reviewed by: Matt Arsenault, Yaxun Liu Differential Revision: https://reviews.llvm.org/D117494	2022-01-19 13:05:29 -05:00
Matt Arsenault	052503979e	AMDGPU/GlobalISel: Fix introducing f16 fmed3 for gfx8	2022-01-19 10:43:21 -05:00
Matt Arsenault	adab71711e	AMDGPU/GlobalISel: Fix legalize failure on i65 ctpop	2022-01-19 10:26:28 -05:00
Matt Arsenault	b965617ccc	GlobalISel: Fix assert on unmerge to different element of casted vector This was failing if a G_UNMERGE_VALUES produced a different element type than the cast result type.	2022-01-19 10:13:31 -05:00
Matt Arsenault	7f26a1027f	AMDGPU/GlobalISel: Introduce pseudo to copy sp in call sequences Arbitrary stack pointers are accessed using MUBUF instructions with the voffset field, which is interpreted as the swizzled address. We want to fold into the MUBUF form to use the SP in the SGPR offset, and previously we were special casing the interpretation of the pointer value if the access memory operand said it was relative to the stack pointer. 690f5b7a0128a210093e9b217932743ad35b5c5a removed this check, and moved the DAG path to special casing copies from SGPRs. This is not an entirely sound approach, since it's still changing the interpretation of pointer values based on the context. Introduce a new pseudo which corresponds to the wave-to-vector address transform. This way the memory instruction has consistent semantics where the incoming pointer is always interpreted as a vector address, and we're not obligated to optimize into the MUBUF offset-only addressing mode. The DAG should probably have an equivalent pseudo. This should fix some correctness issues, and folding this into addressing modes will be a future optimization patch.	2022-01-19 10:13:31 -05:00
Jay Foad	0bc14a0a98	[AMDGPU] Tweak some compares in wqm.ll test This prevents the compares from being optimized away when D86578 lands, which seems unintended. Also fixed some unused results.	2022-01-19 12:42:56 +00:00
Piotr Sobczak	8dfb417e67	[AMDGPU] Fix missing waitcnt issue Ignore out of order counters when merging brackets. The fact that there was a pending event in the old state does not guarantee that the waitcnt was generated, so we still need to conservatively re-process the block. The patch fixes a correctness issue where the block was not re-processed and the waitcnt not inserted in consequence. Differential Revision: https://reviews.llvm.org/D117544	2022-01-19 10:54:44 +01:00
Jay Foad	7af959673e	[AMDGPU] Tweak some compares in wave32.ll test This prevents the compares from being optimized away when D86578 lands, which seems unintended.	2022-01-19 08:14:06 +00:00
Matt Arsenault	e3dd47f987	AMDGPU: Fix using deprecated buffer intrinsics in test	2022-01-18 19:02:47 -05:00
Matt Arsenault	da72822763	GlobalISel: Fix CSEMIRBuilder mishandling constant folds of vectors This was ignoring the requested result register, resulting in a missing def when this happened in the IRTranslator. Fixes some crashes and verifier errors at -O0. Alternatively we could pass DstOps to the constant fold functions.	2022-01-18 17:21:02 -05:00
Matt Arsenault	42098c4a30	GlobalISel: Fix legalization error where CSE leaves behind dead defs If the conversion artifact introduced in the unmerge of cast of merge combine already existed in the function, this would introduce dead copies which kept the old casts around, neither of which were deleted, and would fail legalization. This would fail as follows: The G_UNMERGE_VALUES of the G_SEXT of the G_BUILD_VECTOR would introduce a G_SEXT for each of the scalars. Some of the required G_SEXTs already existed in the function, so CSE moves them up in the function and introduces a copy to the original result register. The introduced CSE copies are dead, since the original G_SEXTs were already directly used. These copies add a use to the illegal G_SEXTs, so they are not deleted. The artifact combiner does not see the defs that need to be updated, since it was hidden inside the CSE builder. I see 2 potential fixes, and opted for the mechanically simpler one, which is to just not insert the cast if the result operand isn't used. Alternatively, we could not insert the cast directly into the result register, and use replaceRegOrBuildCopy similar to the case where there is no conversion. I suspect this is a wider problem in the artifact combiner.	2022-01-18 17:04:40 -05:00
Matt Arsenault	82de129ab8	AMDGPU: Remove llvm.amdgcn.alignbit and handle bitcode upgrade to fshr	2022-01-18 14:08:36 -05:00
Matt Arsenault	de1600a1d9	AMDGPU: Avoid enabling kernel workitem IDs with reqd_work_group_size	2022-01-18 13:52:04 -05:00
Matt Arsenault	984451eafc	PostRAPseudos: Don't preserve kills on some implicit copy operands This fixes a verifier error I ran into at -O0. A subregister copy had an implicit kill of an overlapping superregister, which was partially redefined by the copy. The preserved implicit operand killed subregisters made live earlier in the sequence. AMDGPU already uses similar logic for whether to preserve the kill of the superregister on the final instruction if there's overlap.	2022-01-18 13:52:04 -05:00
Matt Arsenault	f5ff1cab43	AMDGPU/GlobalISel: Regenerate base test checks	2022-01-18 11:26:47 -05:00
Vang Thao	10ed1eca24	[MachineSink] Allow sinking of constant or ignorable physreg uses For AMDGPU, any use of the physical register EXEC prevents sinking even if it is not a real physical register read. Add check to see if a physical register use can be ignored for sinking. Also perform same constant and ignorable physical register check when considering sinking in loops. https://reviews.llvm.org/D116053	2022-01-18 14:17:40 +00:00
Christudasan Devadasan	56a5d78893	[AMDGPU] Disable optimizeEndCf at -O0 Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D116819	2022-01-18 02:48:52 -05:00
Carl Ritson	ed4d8fdafd	[AMDGPU] Autogenerate wqm.ll Switch wqm.ll to be autogenerated. Replace gfx6 and gfx8 targets with gfx9 (wave64) and gfx10 (wave32). Reviewed By: kmitropoulou Differential Revision: https://reviews.llvm.org/D117455	2022-01-18 16:04:40 +09:00
Matt Arsenault	dc2457c8cf	AMDGPU: Fix crashing on calls to C functions from graphics contexts If we had one of the shader calling conventions calling a default calling convention callee, this would crash when the caller did not have anything to pass to the workitem ID. This is illegal, but we still need to produce something sensible. llvm-reduce likes to replace calls to intrinsics with calls to null or undef, so this does appear and is helpful to avoid hard erroring. Pass undef in this case, as already happened for the other implicit arguments. It might make sense to define the behavior here and pass null for the pointers, and -1 for the workitem ID. We do have extra bits in the workitem ID, so this wouldn't conflict with a valid value.	2022-01-17 10:13:25 -05:00
Matt Arsenault	9392b40d4b	AMDGPU/GlobalISel: Fix selection of constant 32-bit addrspace loads Unfortunately the selection patterns still rely on the address space from the memory operand instead of using the pointer type. Add this address space to the list of cases supported by global-like loads. Alternatively we would have to adjust the address space of the memory operand to deviate from the underlying IR value, which looks ugly and is more work in the legalizer. This doesn't come up in the DAG path because it uses a different selection strategy where the cast is inserted during the addressing mode matching.	2022-01-17 10:06:33 -05:00
Matt Arsenault	c3a74183a5	AMDGPU/GlobalISel: Fix legalization failure for s65 shifts This was trying to clamp s65 down to s32, which wasn't handled so we need to promote all the way to s128 first. Having to order the legalization rules in just the right way is rather dissatisfying, but I'm not sure how smart the legalizer should be in trying to interpret the rules.	2022-01-17 10:04:41 -05:00
Matt Arsenault	d97fb55ff3	AMDGPU/GlobalISel: Add failing ABI lowering testcases	2022-01-17 09:38:35 -05:00
Matt Arsenault	e09f98a69a	AMDGPU: Fix LiveVariables error after optimizing VGPR ranges This was not removing the block from the live set depending on the specific depth first visit order. Fixes a verifier error in the OpenCL conformance tests.	2022-01-17 09:38:35 -05:00
Matt Arsenault	81004269e5	AMDGPU/GlobalISel: Fix test not matching test name This was testing an s48 load instead of an s64 load as intended.	2022-01-17 09:38:35 -05:00
Nikita Popov	873a7ee7e4	[MachineInstr] Don't include debug uses in bundle header (PR52817) Following the recommendation in https://github.com/llvm/llvm-project/issues/52817#issuecomment-1007635426, this excludes debug instructions when finalizing the bundle. As uses in debug instructions don't have effects, they will no longer be included in the BUNDLE header. Fixes https://github.com/llvm/llvm-project/issues/52817. Differential Revision: https://reviews.llvm.org/D116945	2022-01-17 10:43:21 +01:00
Craig Topper	454256ef4f	[AMDGPU] Correct the known bits calculation for MUL_I24. I'm not entirely sure, but based on how ComputeNumSignBits handles ISD::MUL, I believe this code was miscounting the number of sign bits. As an example of an incorrect result let's say that countMinSignBits returned 1 for the left hand side and 24 for the right hand side. LHSValBits would be 23 and RHSValBits would be 0 and the sum would be 23. This would cause the code to set 9 high bits as zero/one. Now suppose the real values are 0x800000 for the left hand side and 0xffffff for the right hand side. The product is 0x00800000 which has 8 sign bits not 9. The number of valid bits for the left and right operands is now the number of non-sign bits + 1. If the sum of the valid bits of the left and right sides exceeds 32, then the result may overflow and we can't say anything about the sign of the result. If the sum is 32 or less then it won't overflow and we know the result has at least 1 sign bit. For the previous example, the code will now calculate the left side valid bits as 24 and the right side as 1. The sum will be 25 and the sign bits will be 32 - 25 + 1 which is 8, the correct value. Differential Revision: https://reviews.llvm.org/D116469	2022-01-14 08:54:54 -08:00
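
The sign-bit arithmetic in commit 454256ef4f ([AMDGPU] Correct the known bits calculation for MUL_I24) can be checked by hand. The following is a minimal Python sketch of the worked example from that message, not the LLVM implementation: the helper names `to_signed`, `num_sign_bits`, `old_estimate` and `new_estimate` are hypothetical, and returning 1 in the overflow case is an assumption standing in for "we can't say anything about the sign".

```python
def to_signed(value, width):
    """Interpret the low `width` bits of value as a two's-complement integer."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def num_sign_bits(value, width):
    """Count the leading bits equal to the sign bit (sign bit included)."""
    value &= (1 << width) - 1
    sign = value >> (width - 1)
    count = 0
    for bit in range(width - 1, -1, -1):
        if ((value >> bit) & 1) != sign:
            break
        count += 1
    return count


def old_estimate(lhs_sign_bits, rhs_sign_bits):
    """Old formula from the message: valid bits = 24 - sign bits."""
    return 32 - ((24 - lhs_sign_bits) + (24 - rhs_sign_bits))


def new_estimate(lhs_sign_bits, rhs_sign_bits):
    """New formula from the message: valid bits = non-sign bits + 1."""
    total = (24 - lhs_sign_bits + 1) + (24 - rhs_sign_bits + 1)
    if total > 32:
        return 1  # assumption: only the trivial guarantee remains
    return 32 - total + 1


lhs, rhs = 0x800000, 0xFFFFFF  # 24-bit operands from the commit message
product = (to_signed(lhs, 24) * to_signed(rhs, 24)) & 0xFFFFFFFF
lhs_sb = num_sign_bits(lhs, 24)
rhs_sb = num_sign_bits(rhs, 24)
actual = num_sign_bits(product, 32)
```

With these operands the old formula claims one more sign bit than the 32-bit product actually has, while the new formula matches it.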

5146 Commits
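
Commit 68b70d17d8 ([GlobalISel] Fold or of shifts with constant amount to funnel shift) relies on the identity that (or (shl x, C0), (lshr y, C1)) equals a funnel shift when C0 + C1 is the bit width. The following is a minimal Python sketch of that identity on fixed-width integers, assuming the usual funnel-shift definition (shift the concatenation x:y and take one half). The function names are hypothetical, and the message does not say whether the combiner emits the left or the right form, so both are shown.

```python
def fshl(x, y, amt, width):
    """Funnel shift left: high `width` bits of (x:y) shifted left by amt."""
    mask = (1 << width) - 1
    amt %= width
    concat = ((x & mask) << width) | (y & mask)
    return (concat >> (width - amt)) & mask


def fshr(x, y, amt, width):
    """Funnel shift right: low `width` bits of (x:y) shifted right by amt."""
    mask = (1 << width) - 1
    amt %= width
    concat = ((x & mask) << width) | (y & mask)
    return (concat >> amt) & mask


def or_of_shifts(x, y, c0, c1, width):
    """The unfolded pattern: (or (shl x, c0), (lshr y, c1))."""
    mask = (1 << width) - 1
    return (((x & mask) << c0) | ((y & mask) >> c1)) & mask


def fold_matches(width):
    """Check the identity for every value pair and every split c0 + c1 == width."""
    for c0 in range(1, width):
        c1 = width - c0
        for x in range(1 << width):
            for y in range(1 << width):
                expected = or_of_shifts(x, y, c0, c1, width)
                if fshl(x, y, c0, width) != expected:
                    return False
                if fshr(x, y, c1, width) != expected:
                    return False
    return True
```

Calling `fold_matches(4)` checks the identity exhaustively for 4-bit values. The same argument applies at any width, since both sides reduce to `(x << C0) | (y >> C1)` truncated to the width.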