llvm-project

Author	SHA1	Message	Date
Stanislav Mekhanoshin	79606ee85c	[AMDGPU] Check atomics aliasing in the clobbering annotation MemorySSA considers any atomic a def to any operation it dominates just like a barrier or fence. That is correct from memory state perspective, but not required for the no-clobber metadata since we are not using it for reordering. Skip such atomics during the scan just like a barrier if it does not alias with the load. Differential Revision: https://reviews.llvm.org/D118661	2022-02-01 12:33:25 -08:00
Stanislav Mekhanoshin	c2b18a3cc5	[AMDGPU] Allow scalar loads after barrier Currently we cannot convert a vector load into scalar if there is dominating barrier or fence. It is considered a clobbering memory access to prevent memory operations reordering. While reordering is not possible the actual memory is not being clobbered by a barrier or fence and we can still use a scalar load for a uniform pointer. The solution is not to bail on a first clobbering access but traverse MemorySSA to the root excluding barriers and fences. Differential Revision: https://reviews.llvm.org/D118419	2022-02-01 11:43:17 -08:00
Fangrui Song	1494d064fa	[AMDGPU][test] Add dso_local to prevent preemptible alias resolution	2022-02-01 10:23:45 -08:00
Jay Foad	d2e5d3512b	[StructurizeCFG] Clean up some boolean not instructions In some cases StructurizeCFG inserts i1 xor instructions to invert predicates. Add a quick loop to clean these up afterwards if we can get away with modifying an existing compare instruction instead. (StructurizeCFG is generally run late in the pipeline so instcombine does not clean them up for us.) Differential Revision: https://reviews.llvm.org/D118623	2022-02-01 09:35:37 +00:00
Changpeng Fang	1194b9cdda	AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args Summary: Add code object v5 support (default is still v4) Generate metadata for implicit kernel args for the new ABI Set the metadata version to be 1.2 Reviewers: t-tye, b-sumner, arsenm, and bcahoon Fixes: SWDEV-307188, SWDEV-307189 Differential Revision: https://reviews.llvm.org/D118272	2022-01-31 18:07:47 -08:00
Jay Foad	8faad29634	Revert "[Local] invertCondition: try modifying an existing ICmpInst" This reverts commit a6b54ddaba2d5dc0f72dcc4591c92b9544eb0016. Apparently it is not safe to modify the condition even if it passes the hasOneUse test, because StructurizeCFG might have other references to the condition that are not manifest in the IR use-def chains.	2022-01-31 14:55:36 +00:00
Jay Foad	ae68b3a457	[AMDGPU] Add test for a problem with noclobber metadata If AMDGPUAnnotateUniformValues finds a load from a uniform pointer with no potentially clobbering stores between the kernel entry point and the load instruction, it adds noclobber metadata to the address. This is unsafe because it can get applied to other loads in the same which do have aliasing stores. Differential Revision: https://reviews.llvm.org/D118458	2022-01-31 11:09:34 +00:00
Jay Foad	a6b54ddaba	[Local] invertCondition: try modifying an existing ICmpInst This avoids various cases where StructurizeCFG would otherwise insert an xor i1 instruction, and since it generally runs late in the pipeline, instcombine does not clean up the xor-of-cmp pattern. Differential Revision: https://reviews.llvm.org/D118478	2022-01-31 10:44:17 +00:00
Matt Arsenault	33b45ee44b	AMDGPU: Handle addrspacecast of constant 32-bit to flat I accidentally made this work on the GlobalISel path, and there's no real reason not to handle this.	2022-01-27 11:01:44 -05:00
Matt Arsenault	d77c7c80d1	AMDGPU: Fix broken check lines in test	2022-01-27 11:01:44 -05:00
Matt Arsenault	aa88b65392	AMDGPU/GlobalISel: Fix assert on invalid cond code for llvm.amdgcn.icmp	2022-01-27 10:34:06 -05:00
Matt Arsenault	f482e86980	AMDGPU/GlobalISel: Fix flat_scratch_init handling for shaders I don't think this is actually defined for mesa, but this is what we were doing on the DAG path.	2022-01-27 10:20:52 -05:00
Jay Foad	185cb8e82c	[AMDGPU] SILoadStoreOptimizer: Allow merging across a swizzled access Swizzled accesses are not merged, but there is no particular reason not to merge two instructions if any of the intervening instructions happens to be a swizzled access. This moves the check for swizzled accesses out of checkAndPrepareMerge into collectMergeableInsts where I think it makes more sense. Differential Revision: https://reviews.llvm.org/D118267	2022-01-27 14:40:58 +00:00
Jay Foad	c5d2b97a69	[AMDGPU] Precommit test for swizzled store aliasing two loads	2022-01-27 12:13:06 +00:00
Jay Foad	15b11e00f0	[AMDGPU] Update MachineMemOperands syntax in commented out tests	2022-01-27 10:56:35 +00:00
Jay Foad	b30d9df457	[AMDGPU] Remove unused CI check lines	2022-01-27 10:53:42 +00:00
Jay Foad	3b259a6842	[AMDGPU] Remove unused GFX6 check lines	2022-01-27 10:48:32 +00:00
Stanislav Mekhanoshin	dbf278b984	[AMDGPU] Prevent aliasing of SrcC and Dst in MAI From the MAI spec: It’s ok that Src_C and vDst are the exact same VGPRs or Src_C and vDst are completely separated. The case that Src_C and vDst are overlapping should be avoided as new value could be written to accumulator input before it gets read. Note that this inevitably increases register pressure to the point where some programs will become uncompilable. This patch separates MAC and FMA versions of MFMA instructions using either tied dst and src2 or earlyclobber dst. Fixes: SWDEV-318900 Differential Revision: https://reviews.llvm.org/D117844	2022-01-26 14:48:20 -08:00
Matt Arsenault	045be6ff36	AMDGPU/GlobalISel: Fold wave address into mubuf addressing modes	2022-01-26 15:25:26 -05:00
Matt Arsenault	2d670de84c	GlobalISel: Avoid crash on asm with lying result types The physical register in the asm has the wrong type for the declared IR. It seems to work in the DAG by extracting the 4 elements that are defined in the IR from the register, but that isn't handled here. This doesn't seem to be a well tested path since other mismatched cases are crashing the DAG asm handling.	2022-01-26 15:23:59 -05:00
Matt Arsenault	09fc311af7	AMDGPU/GlobalISel: Mostly fix BFI patterns Most importantly, fixes constant bus errors in the 64-bit cases. It's surprising to me these were even passing the selection test using SReg_* sources. Also fixes pattern matching in the 32-bit cases, with simple operands. These patterns aren't working in a few cases, like with mixed SGPR inputs. The patterns aren't looking through the SGPR->VGPR copies like they need to. The vector cases also have some unmerges of build_vector which are obscuring the inputs.	2022-01-26 15:06:50 -05:00
Matt Arsenault	eb88e793ff	AMDGPU: Add some additional test coverage for BFI matching Try to stress constant bus restriction enforcement since some of these are broken for GlobalISel. Split the r600 test because some of these cases don't compile (and all the ones using return values are discarded).	2022-01-26 15:06:50 -05:00
Matt Arsenault	2f33396e4e	AMDGPU: Switch bfi pattern test to generated checks and add gfx10	2022-01-26 15:06:50 -05:00
Matt Arsenault	e6564f39c7	AMDGPU: Emit user sgpr count directives in text asm We were emitting these in the object file but not printing them.	2022-01-26 13:51:12 -05:00
Jay Foad	0b9ee8ec16	[AMDGPU] SILoadStoreOptimizer: Precommit tests for merging across a swizzled access	2022-01-26 17:35:17 +00:00
Konstantina	aa418b9133	[AMDGPU][SIWholeQuadMode] Use the right VCC register to activate the correct lanes. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D118096	2022-01-26 08:54:39 -08:00
Nikita Popov	a5e324e3e2	[AMDGPUHSAMetadataStreamer] Do not assume ABI alignment for pointers AMDGPUHSAMetadataStreamer currently assumes that pointer arguments without align attribute have ABI alignment of the pointee type. This is incompatible with opaque pointers, but also plain incorrect: Pointer arguments without explicit alignment have alignment 1. It is the responsibility of the frontend to add correct align annotations. Differential Revision: https://reviews.llvm.org/D118229	2022-01-26 15:45:14 +01:00
alex-t	5157f984ae	[AMDGPU] Enable divergence-driven XNOR selection Currently the not (xor_one_use) pattern is always selected to S_XNOR irrespective of the node divergence. This relies on a further custom selection pass which converts to VALU if necessary and replaces with V_NOT_B32 (V_XOR_B32) on those targets which have no V_XNOR. The current change enables the patterns which explicitly select the not (xor_one_use) to the appropriate form. We assume that xor (not) is already turned into the not (xor) by the combiner. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D116270	2022-01-26 15:33:10 +03:00
Sebastian Neubauer	4723f3cf03	[AMDGPU][GlobalISel] Combine unmerge of undef Fold (unmerge undef) -> undef, undef, ... Differential Revision: https://reviews.llvm.org/D118138	2022-01-26 12:30:36 +01:00
Sebastian Neubauer	6680466663	[AMDGPU][NFC] Pre-commit regenerated test	2022-01-26 12:30:33 +01:00
Benjamin Kramer	0776f6e04d	[LSV] Vectorize loads of vectors by turning it into a larger vector Use shufflevector to do the subvector extracts. This allows a lot more load merging on AMDGPU and also on NVPTX when <2 x half> is involved. Differential Revision: https://reviews.llvm.org/D117219	2022-01-26 11:38:41 +01:00
Stanislav Mekhanoshin	bb1fe36977	[AMDGPU] Make v8i16/v8f16 legal Differential Revision: https://reviews.llvm.org/D117721	2022-01-24 11:51:08 -08:00
Stanislav Mekhanoshin	c27f8fb968	[AMDGPU] Remove cndmask from readsExecAsData Differential Revision: https://reviews.llvm.org/D117909	2022-01-24 11:24:47 -08:00
Matt Arsenault	18aabae8e2	AMDGPU: Fix assertion on fixed stack objects with VGPR->AGPR spills These have negative / out of bounds frame index values and would assert when trying to set the BitVector. Fixed stack objects can't be colored away so ignore them.	2022-01-24 09:45:41 -05:00
Matt Arsenault	99e8e17313	Reapply "Revert "GlobalISel: Add G_ASSERT_ALIGN hint instruction" This reverts commit a97e20a3a8a58be751f023e610758310d5664562.	2022-01-24 09:26:52 -05:00
Abinav Puthan Purayil	912af6b570	[AMDGPU][GlobalISel] Remove the post ':' part of vreg operands in fsh combine tests.	2022-01-24 16:30:40 +05:30
Jay Foad	aa50b93e7c	[AMDGPU][GlobalISel] Add more sign/zero/any-extension tests Add s1 to s16 cases, and for sgprs s1 to s64 and s32 to s64.	2022-01-24 10:16:51 +00:00
Jay Foad	906ebd5830	[AMDGPU][GlobalISel] Regenerate checks in inst-select-*ext.mir	2022-01-24 10:16:51 +00:00
Abinav Puthan Purayil	68b70d17d8	[GlobalISel] Fold or of shifts with constant amount to funnel shift. This change folds (or (shl x, C0), (lshr y, C1)) to funnel shift iff C0 and C1 are constants where C0 + C1 is the bit-width of the shift instructions. Differential Revision: https://reviews.llvm.org/D116529	2022-01-24 10:43:32 +05:30
David Green	b27e5459d5	[DAG] Convert truncstore(extend(x)) back to store(x) Pulled out of D106237, this folds truncstore(extend(x)) back to store(x) if the original store was legal. This can come up due to the order we fold nodes. A fold from X86 needs to be adjusted to prevent infinite loops, to have it pick the operand of a trunc more directly. Differential Revision: https://reviews.llvm.org/D117901	2022-01-22 13:20:36 +00:00
Sebastian Neubauer	ae2f9c8be8	[AMDGPU] Remove lz and nomip combine from codegen These combines have been moved into the IR combiner in D116042. Differential Revision: https://reviews.llvm.org/D116116	2022-01-21 12:09:08 +01:00
Stanislav Mekhanoshin	41ebd19681	[AMDGPU] Do not ignore exec use where exec is read as data Compares, v_cndmask_b32, and v_readfirstlane_b32 use EXEC in a way which modifies the result. This implicit EXEC use shall not be ignored for the purposes of instruction moves. Differential Revision: https://reviews.llvm.org/D117814	2022-01-20 14:05:22 -08:00
Nico Weber	9122b5072a	[llvm] Remove an old bot cleanup command	2022-01-20 15:02:35 -05:00
Stanislav Mekhanoshin	94a0660c14	[AMDGPU] Regenerate remat-vop.mir. NFC.	2022-01-20 11:15:42 -08:00
Matt Arsenault	237502c1a4	AMDGPU: Fix asm in test using wrong IR type for physical register	2022-01-20 12:56:53 -05:00
Matt Arsenault	064cea9c9a	AMDGPU/GlobalISel: Try to use s_and_b64 in ptrmask selection Avoids a test diff with SDAG.	2022-01-20 12:56:53 -05:00
Matt Arsenault	2d1f9aa27d	AMDGPU/GlobalISel: Regenerate test checks with -NEXT	2022-01-20 12:56:53 -05:00
Matt Arsenault	08549ba51e	AMDGPU/GlobalISel: Explicitly set -global-isel-abort in failure tests If the default mode is the fallback, this would fail since it would end up seeing the DAG failure message instead.	2022-01-20 12:56:53 -05:00
Matt Arsenault	2e49e0cfde	AMDGPU/GlobalISel: Directly diagnose return value use for FP atomics Emit an error if the return value is used on subtargets that do not support them. Previously we were falling back to the DAG on selection failure, where it would emit this error and then fail again.	2022-01-20 12:46:45 -05:00
Matt Arsenault	be7e938e27	AMDGPU/GlobalISel: Stop handling llvm.amdgcn.buffer.atomic.fadd This code is not structured to handle the legacy buffer intrinsics and was miscompiling them.	2022-01-20 12:12:06 -05:00

5177 Commits