llvm-project

Author	SHA1	Message	Date
Matt Arsenault	8b8b491379	AMDGPU/GlobalISel: Fix assertions on invalid addrspacecasts Fixes some assert on invalid situations and starts directly emitting the error.	2022-02-04 17:28:49 -05:00
Matt Arsenault	f1f15d0285	AMDGPU: Fix failing test	2022-02-04 16:54:09 -05:00
Matt Arsenault	4622afa94c	AMDGPU: Convert AMDGPUResourceUsageAnalysis to a Module pass This is more precise in the face of indirect calls and aliases, still assuming the call target is defined somewhere in the current module. This sometimes changes the order the functions are printed, and also changes the point where context errors are printed relative to stdout. This also likely has negative consequences for compile time and memory usage.	2022-02-04 15:56:04 -05:00
Matt Arsenault	935abab65c	AMDGPU: Use module level register maximums for unknown callees Compute the theoretical register budget based on the IR function signature/attributes, and use the global maximum register budgets for unknown callees. This should fix the kernel reported register usage in the presence of indirect calls. The previous fix in 2b08f6af62afbf32e89a6a392dbafa92c62f7bdf was incorrect because it was only taking the maximum in the known call graph, and missing something that was either outside of it or codegened later. This fixes a second case I discovered where calls to aliases also did not work as expected. CallGraphAnalysis misses these, so functions called through aliases were not codegened ahead of callers as expected. CallGraphAnalysis should probably be fixed to understand this case, and there's likely a bug with IPRA here. This fixes numerous failures in the conformance test at -O0.	2022-02-04 15:56:03 -05:00
Jessica Paquette	9a61e731ff	[GlobalISel] Combine (G_ADDO x, 0) -> x + no carry out Similar to the G_MULO change. The code for checking if a constant is legal/pre-legalize is shared between these, and is kind of hairy. So, factor it out into a new function: `isConstantLegalOrBeforeLegalizer`. To make the refactoring clean, further refactor `isLegalOrBeforeLegalizer` into a wrapper for two functions: - `isPreLegalize` - `isLegal` This is a bit easier to read in general. https://godbolt.org/z/KW7oszP1o Differential Revision: https://reviews.llvm.org/D118655	2022-02-03 14:25:15 -08:00
Vang Thao	2ca194ff55	[AMDGPU] Fix scheduler live-ins with debug inst at start of block GCNDownwardRPTracker RPTracker.reset() skips debug instructions for NextMI so RPTracker.getNext() will never give the beginning of a sched region if it is a debug value. In this case we will never set the live-ins for that block. Add check to see if getNext also equals the MI after skipping debug instructions. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D118853	2022-02-03 12:41:32 -08:00
Thomas Symalla	476babcc1d	[AMDGPU] Introduce new ISel combine for trunc-slr patterns In some cases, when selecting a (trunc (slr)) pattern, the slr gets translated to a v_lshrrev_b32_e64 instruction whereas the truncation gets selected to a sequence of v_and_b32_e64 and v_cmp_eq_u32_e64. In the final ISA, this appears as selecting the nth-bit: v_lshrrev_b32_e32 v0, 2, v1 v_and_b32_e32 v0, 1, v0 v_cmp_eq_u32_e32 vcc_lo, 1, v0 However, when the value used in the right shift is known at compilation time, the whole sequence can be reduced to two VALUs when the constant operand in the v_and is adjusted to (1 << lshrrev_operand): v_and_b32_e32 v0, (1 << 2), v1 v_cmp_ne_u32_e32 vcc_lo, 0, v0 In the example above, the following pseudo-code: v0 = (v1 >> 2) v0 = v0 & 1 vcc_lo = (v0 == 1) would be translated to: v0 = v1 & 0b100 vcc_lo = (v0 == 0b100) which should yield an equivalent result. This is a little bit hard to test as one needs to force the SelectionDAG to contain the nodes before instruction selection, but the test sequence was roughly derived from a production shader. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D118461	2022-02-03 18:06:44 +01:00
Jay Foad	b9cf52bc3d	[AMDGPU] Simplify AMDGPUAnnotateUniformValues::visitLoadInst Always set uniform metadata on the pointer if it is an instruction, but otherwise do not bother to create a trivial getelementptr instruction, because AMDGPUInstrInfo::isUniformMMO can already detect that various non-instruction pointers are uniform. Most of the test case churn is from tests that used undef as a pointer, which AMDGPUInstrInfo::isUniformMMO treats as uniform. Differential Revision: https://reviews.llvm.org/D118909	2022-02-03 16:27:48 +00:00
Jay Foad	42fc05e09c	[AMDGPU] Tweak tests in noclobber-barrier.ll Tweak some of the tests to demonstrate AMDGPUAnnotateUniformValues::visitLoadInst inserting a trivial getelementptr instruction, just to have somewhere to put amdgpu.uniform metadata. NFC.	2022-02-03 16:10:51 +00:00
Thomas Symalla	78bf2e0a3f	[AMDGPU] Update two Codegen tests. (NFC) This change adds a new Codegen test with auto-generated checks and updates divergence-driven-trunc-to-i1.ll with auto-generated checks. This is in preparation to D118461 to visualize the Codegen changes.	2022-02-03 10:28:53 +01:00
Matt Arsenault	d6fdbbcace	AMDGPU: Add second emergency slot for SGPR to vmem for large frames In a future change, we will sometimes use a VGPR offset for doing spills to memory, in which case we need 2 free VGPRs to do the SGPR spill. In most cases we could spill the VGPR along with the SGPR being spilled, but we don't have any free lanes for SGPR_1024 in wave32 so we could still potentially need a second scavenging slot.	2022-02-02 19:05:05 -05:00
Matt Arsenault	a96dbb9035	CodeGen: Use asm register names in warning message This was using the ugly tablegenerated register enum names, which are really hideous for register tuples on AMDGPU. Use the prettier names which are recognized by the asm parser.	2022-02-02 14:20:12 -05:00
Matt Arsenault	245e25f9c3	AMDGPU: Implement isAsmClobberable Warn on inline assembly clobbering reserved registers. It should also warn on at least some reserved register defs, but that isn't happening right now. If you have a def and re-use of a register we reserve, the register coalescer will eliminate the intermediate virtual register. When the reserved reg def is introduced later by the backend, it will end up clobbering the value the register coalescer assumed was live through the range. There is also isInlineAsmReadOnlyReg, although I don't understand what the distinction really is. It's called in SelectionDAGBuilder, long before the set of reserved registers is frozen so I'm not sure how that can possibly work reliably. Unfortunately this is also using the ugly tablegenerated names for the registers.	2022-02-02 14:20:12 -05:00
Jay Foad	ddd3807e69	[AMDGPU] Use new target MMO flag MONoClobber This allows us to set the noclobber flag on (the MMO of) a load instruction instead of on the pointer. This fixes a bug where noclobber was being applied to all loads from the same pointer, even if some of them were clobbered. Differential Revision: https://reviews.llvm.org/D118775	2022-02-02 17:12:36 +00:00
Stanislav Mekhanoshin	79606ee85c	[AMDGPU] Check atomics aliasing in the clobbering annotation MemorySSA considers any atomic a def to any operation it dominates just like a barrier or fence. That is correct from memory state perspective, but not required for the no-clobber metadata since we are not using it for reordering. Skip such atomics during the scan just like a barrier if it does not alias with the load. Differential Revision: https://reviews.llvm.org/D118661	2022-02-01 12:33:25 -08:00
Stanislav Mekhanoshin	c2b18a3cc5	[AMDGPU] Allow scalar loads after barrier Currently we cannot convert a vector load into scalar if there is dominating barrier or fence. It is considered a clobbering memory access to prevent memory operations reordering. While reordering is not possible the actual memory is not being clobbered by a barrier or fence and we can still use a scalar load for a uniform pointer. The solution is not to bail on a first clobbering access but traverse MemorySSA to the root excluding barriers and fences. Differential Revision: https://reviews.llvm.org/D118419	2022-02-01 11:43:17 -08:00
Fangrui Song	1494d064fa	[AMDGPU][test] Add dso_local to prevent preemptible alias resolution	2022-02-01 10:23:45 -08:00
Jay Foad	d2e5d3512b	[StructurizeCFG] Clean up some boolean not instructions In some cases StructurizeCFG inserts i1 xor instructions to invert predicates. Add a quick loop to clean these up afterwards if we can get away with modifying an existing compare instruction instead. (StructurizeCFG is generally run late in the pipeline so instcombine does not clean them up for us.) Differential Revision: https://reviews.llvm.org/D118623	2022-02-01 09:35:37 +00:00
Changpeng Fang	1194b9cdda	AMDGPU {NFC}: Add code object v5 support and generate metadata for implicit kernel args Summary: Add code object v5 support (default is still v4) Generate metadata for implicit kernel args for the new ABI Set the metadata version to be 1.2 Reviewers: t-tye, b-sumner, arsenm, and bcahoon Fixes: SWDEV-307188, SWDEV-307189 Differential Revision: https://reviews.llvm.org/D118272	2022-01-31 18:07:47 -08:00
Jay Foad	8faad29634	Revert "[Local] invertCondition: try modifying an existing ICmpInst" This reverts commit a6b54ddaba2d5dc0f72dcc4591c92b9544eb0016. Apparently it is not safe to modify the condition even if it passes the hasOneUse test, because StructurizeCFG might have other references to the condition that are not manifest in the IR use-def chains.	2022-01-31 14:55:36 +00:00
Jay Foad	ae68b3a457	[AMDGPU] Add test for a problem with noclobber metadata If AMDGPUAnnotateUniformValues finds a load from a uniform pointer with no potentially clobbering stores between the kernel entry point and the load instruction, it adds noclobber metadata to the address. This is unsafe because it can get applied to other loads from the same address which do have aliasing stores. Differential Revision: https://reviews.llvm.org/D118458	2022-01-31 11:09:34 +00:00
Jay Foad	a6b54ddaba	[Local] invertCondition: try modifying an existing ICmpInst This avoids various cases where StructurizeCFG would otherwise insert an xor i1 instruction, and since it generally runs late in the pipeline, instcombine does not clean up the xor-of-cmp pattern. Differential Revision: https://reviews.llvm.org/D118478	2022-01-31 10:44:17 +00:00
Matt Arsenault	33b45ee44b	AMDGPU: Handle addrspacecast of constant 32-bit to flat I accidentally made this work on the GlobalISel path, and there's no real reason not to handle this.	2022-01-27 11:01:44 -05:00
Matt Arsenault	d77c7c80d1	AMDGPU: Fix broken check lines in test	2022-01-27 11:01:44 -05:00
Matt Arsenault	aa88b65392	AMDGPU/GlobalISel: Fix assert on invalid cond code for llvm.amdgcn.icmp	2022-01-27 10:34:06 -05:00
Matt Arsenault	f482e86980	AMDGPU/GlobalISel: Fix flat_scratch_init handling for shaders I don't think this is actually defined for mesa, but this is what we were doing on the DAG path.	2022-01-27 10:20:52 -05:00
Jay Foad	185cb8e82c	[AMDGPU] SILoadStoreOptimizer: Allow merging across a swizzled access Swizzled accesses are not merged, but there is no particular reason not to merge two instructions if any of the intervening instructions happens to be a swizzled access. This moves the check for swizzled accesses out of checkAndPrepareMerge into collectMergeableInsts where I think it makes more sense. Differential Revision: https://reviews.llvm.org/D118267	2022-01-27 14:40:58 +00:00
Jay Foad	c5d2b97a69	[AMDGPU] Precommit test for swizzled store aliasing two loads	2022-01-27 12:13:06 +00:00
Jay Foad	15b11e00f0	[AMDGPU] Update MachineMemOperands syntax in commented out tests	2022-01-27 10:56:35 +00:00
Jay Foad	b30d9df457	[AMDGPU] Remove unused CI check lines	2022-01-27 10:53:42 +00:00
Jay Foad	3b259a6842	[AMDGPU] Remove unused GFX6 check lines	2022-01-27 10:48:32 +00:00
Stanislav Mekhanoshin	dbf278b984	[AMDGPU] Prevent aliasing of SrcC and Dst in MAI From the MAI spec: It’s ok that Src_C and vDst are the exact same VGPRs or Src_C and vDst are completely separated. The case that Src_C and vDst are overlapping should be avoided as new value could be written to accumulator input before it gets read. Note that this inevitably increases register pressure to the point where some programs will become uncompilable. This patch separates MAC and FMA versions of MFMA instructions using either tied dst and src2 or earlyclobber dst. Fixes: SWDEV-318900 Differential Revision: https://reviews.llvm.org/D117844	2022-01-26 14:48:20 -08:00
Matt Arsenault	045be6ff36	AMDGPU/GlobalISel: Fold wave address into mubuf addressing modes	2022-01-26 15:25:26 -05:00
Matt Arsenault	2d670de84c	GlobalISel: Avoid crash on asm with lying result types The physical register in the asm has the wrong type for the declared IR. It seems to work in the DAG by extracting the 4 elements that are defined in the IR from the register, but that isn't handled here. This doesn't seem to be a well tested path since other mismatched cases are crashing the DAG asm handling.	2022-01-26 15:23:59 -05:00
Matt Arsenault	09fc311af7	AMDGPU/GlobalISel: Mostly fix BFI patterns Most importantly, fixes constant bus errors in the 64-bit cases. It's surprising to me these were even passing the selection test using SReg_* sources. Also fixes pattern matching in the 32-bit cases, with simple operands. These patterns aren't working in a few cases, like with mixed SGPR inputs. The patterns aren't looking through the SGPR->VGPR copies like they need to. The vector cases also have some unmerges of build_vector which are obscuring the inputs.	2022-01-26 15:06:50 -05:00
Matt Arsenault	eb88e793ff	AMDGPU: Add some additional test coverage for BFI matching Try to stress constant bus restriction enforcement since some of these are broken for GlobalISel. Split the r600 test because some of these cases don't compile (and all the ones using return values are discarded).	2022-01-26 15:06:50 -05:00
Matt Arsenault	2f33396e4e	AMDGPU: Switch bfi pattern test to generated checks and add gfx10	2022-01-26 15:06:50 -05:00
Matt Arsenault	e6564f39c7	AMDGPU: Emit user sgpr count directives in text asm We were emitting these in the object file but not printing them.	2022-01-26 13:51:12 -05:00
Jay Foad	0b9ee8ec16	[AMDGPU] SILoadStoreOptimizer: Precommit tests for merging across a swizzled access	2022-01-26 17:35:17 +00:00
Konstantina	aa418b9133	[AMDGPU][SIWholeQuadMode] Use the right VCC register to activate the correct lanes. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D118096	2022-01-26 08:54:39 -08:00
Nikita Popov	a5e324e3e2	[AMDGPUHSAMetadataStreamer] Do not assume ABI alignment for pointers AMDGPUHSAMetadataStreamer currently assumes that pointer arguments without align attribute have ABI alignment of the pointee type. This is incompatible with opaque pointers, but also plain incorrect: Pointer arguments without explicit alignment have alignment 1. It is the responsibility of the frontend to add correct align annotations. Differential Revision: https://reviews.llvm.org/D118229	2022-01-26 15:45:14 +01:00
alex-t	5157f984ae	[AMDGPU] Enable divergence-driven XNOR selection Currently the not (xor_one_use) pattern is always selected to S_XNOR irrespective of the node divergence. This relies on a further custom selection pass which converts to VALU if necessary and replaces with V_NOT_B32 ( V_XOR_B32) on those targets which have no V_XNOR. The current change enables the patterns which explicitly select the not (xor_one_use) to the appropriate form. We assume that xor (not) is already turned into the not (xor) by the combiner. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D116270	2022-01-26 15:33:10 +03:00
Sebastian Neubauer	4723f3cf03	[AMDGPU][GlobalISel] Combine unmerge of undef Fold (unmerge undef) -> undef, undef, ... Differential Revision: https://reviews.llvm.org/D118138	2022-01-26 12:30:36 +01:00
Sebastian Neubauer	6680466663	[AMDGPU][NFC] Pre-commit regenerated test	2022-01-26 12:30:33 +01:00
Benjamin Kramer	0776f6e04d	[LSV] Vectorize loads of vectors by turning it into a larger vector Use shufflevector to do the subvector extracts. This allows a lot more load merging on AMDGPU and also on NVPTX when <2 x half> is involved. Differential Revision: https://reviews.llvm.org/D117219	2022-01-26 11:38:41 +01:00
Stanislav Mekhanoshin	bb1fe36977	[AMDGPU] Make v8i16/v8f16 legal Differential Revision: https://reviews.llvm.org/D117721	2022-01-24 11:51:08 -08:00
Stanislav Mekhanoshin	c27f8fb968	[AMDGPU] Remove cndmask from readsExecAsData Differential Revision: https://reviews.llvm.org/D117909	2022-01-24 11:24:47 -08:00
Matt Arsenault	18aabae8e2	AMDGPU: Fix assertion on fixed stack objects with VGPR->AGPR spills These have negative / out of bounds frame index values and would assert when trying to set the BitVector. Fixed stack objects can't be colored away so ignore them.	2022-01-24 09:45:41 -05:00
Matt Arsenault	99e8e17313	Reapply "Revert "GlobalISel: Add G_ASSERT_ALIGN hint instruction" This reverts commit a97e20a3a8a58be751f023e610758310d5664562.	2022-01-24 09:26:52 -05:00
Abinav Puthan Purayil	912af6b570	[AMDGPU][GlobalISel] Remove the post ':' part of vreg operands in fsh combine tests.	2022-01-24 16:30:40 +05:30

5191 Commits
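
Commit 476babcc1d in the log above argues that testing bit n with a shift, a mask and a compare is equivalent to masking with (1 << n) and comparing against zero. The following is only an illustrative sketch of that arithmetic claim in Python, not code from the LLVM tree: the function names are hypothetical, and the 32-bit truncation is an assumption standing in for a 32-bit VGPR.

```python
MASK32 = 0xFFFFFFFF  # assumption: model a 32-bit register


def bit_set_shift_form(value: int, n: int) -> bool:
    """Three-step form: shift right by n, mask with 1, compare with 1."""
    value &= MASK32
    return ((value >> n) & 1) == 1


def bit_set_mask_form(value: int, n: int) -> bool:
    """Two-step form: mask with (1 << n), compare not-equal to 0."""
    value &= MASK32
    return (value & (1 << n)) != 0


def forms_agree(samples, shifts=range(32)) -> bool:
    """Check that both forms give the same answer for every sample and shift."""
    return all(
        bit_set_shift_form(v, n) == bit_set_mask_form(v, n)
        for v in samples
        for n in shifts
    )
```

Calling `forms_agree` on a handful of 32-bit values exercises every shift amount from 0 to 31; this is a spot check of the identity on sample inputs, not a proof and not a substitute for the codegen tests mentioned in the commit.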