llvm-project

Author	SHA1	Message	Date
Amara Emerson	b309bc04ee	[GlobalISel] Combine out-of-range shifts to undef. Differential Revision: https://reviews.llvm.org/D144303	2023-02-17 15:05:00 -08:00
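Context for b309bc04ee: in LLVM IR, a shift whose amount is greater than or equal to the bit width of its operand yields poison, which is what makes replacing it with undef legal. The following is a minimal C++17 sketch of the constant-amount check. It assumes a model in which `std::nullopt` stands for undef; `foldConstantShl` is a hypothetical helper, not the GlobalISel combiner API.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model: std::nullopt stands for "undef". A shift amount that is
// >= the bit width is out of range, so the whole shift may fold to undef.
std::optional<uint64_t> foldConstantShl(uint64_t value, uint64_t amount,
                                        unsigned bitWidth) {
  if (amount >= bitWidth)
    return std::nullopt;  // out-of-range shift: fold to undef
  // In range: ordinary constant folding, truncated to bitWidth bits.
  uint64_t mask = bitWidth == 64 ? ~0ULL : ((1ULL << bitWidth) - 1);
  return (value << amount) & mask;
}
```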
Nick Desaulniers	a3a84c9e25	[llvm] add CallBrPrepare pass to pipelines Capstone of https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8 Clang changes are still necessary to enable the use of outputs along indirect edges of asm goto statements. Link: https://github.com/llvm/llvm-project/issues/53562 Reviewed By: void Differential Revision: https://reviews.llvm.org/D140180	2023-02-16 17:58:34 -08:00
Jay Foad	8a17cd9905	AMDGPU: Add a regression test case for D143963	2023-02-16 17:11:32 +00:00
Jay Foad	8e5a41e827	Revert "AMDGPU: Override getNegatedExpression constant handling" This reverts commit 11c3cead23783e65fb30e673d62771352078ff05. It was causing infinite loops in the DAG combiner.	2023-02-16 17:11:32 +00:00
Jay Foad	9305b63d69	[AMDGPU] Add another G_UNMERGE_VALUES legalization test case	2023-02-16 16:45:35 +00:00
Florian Hahn	2ac85cd563	[AMDGPU] Regenerate check lines to enable updating for D144050.	2023-02-16 16:38:15 +00:00
Diana Picus	819dfc338b	[AMDGPU] Autogenerate checks for several tests. NFCI	2023-02-16 10:54:34 +01:00
Matt Arsenault	11c3cead23	AMDGPU: Override getNegatedExpression constant handling Ignore the multiple use heuristics of the default implementation, and report cost based on inline immediates. This is mostly interesting for -0 vs. 0. Gets a few small improvements. fneg_fadd_0_f16 is a small regression. We could probably avoid this if we handled folding fneg into div_fixup.	2023-02-15 05:21:00 -04:00
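Context for 11c3cead23 (later reverted by 8e5a41e827): AMDGPU can encode a few f32 constants as free inline immediates. +0.0 is one of them but -0.0 is not, so negating a zero constant can turn a free operand into a 32-bit literal. The sketch below assumes only the f32 inline values 0, ±0.5, ±1.0, ±2.0 and ±4.0. It deliberately ignores the integer inline constants and the 1/(2π) value that some subtargets also accept. `isInlineImmF32` and `negationCost` are hypothetical names.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Raw IEEE-754 bits of a float.
uint32_t bitsOf(float f) {
  uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

// Simplified model: only the classic f32 inline values are checked.
bool isInlineImmF32(float f) {
  static const std::array<float, 9> kInline = {0.0f, 0.5f,  -0.5f, 1.0f, -1.0f,
                                               2.0f, -2.0f, 4.0f,  -4.0f};
  for (float c : kInline)
    if (bitsOf(c) == bitsOf(f))  // compare bits so -0.0 does not match +0.0
      return true;
  return false;
}

// Hypothetical cost of negating a constant operand: 0 if the negated value is
// still an inline immediate, 1 if it would need a literal.
int negationCost(float f) { return isInlineImmF32(-f) ? 0 : 1; }
```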
Stanislav Mekhanoshin	12b4f9e2af	[AMDGPU] Do not apply schedule metric for regions with spilling D139710 has added a metric to increase the schedule's ILP while staying within the same occupancy. Do not bother to apply this metric to a region which is known to have spilling: it may cause spilling to reappear after the previous stage, and it does no good if we are already spilling anyway. It may also reduce compile time a bit for such regions. Fixes: SWDEV-377300 Differential Revision: https://reviews.llvm.org/D143934	2023-02-14 12:16:46 -08:00
Matt Arsenault	09dd4d870e	DAG: Remove hasBitPreservingFPLogic This doesn't make sense as an option. fneg and fabs are bit preserving by definition. If a target has fneg or fabs instructions that are not bit-preserving, it is incorrect to lower fneg/fabs to use them.	2023-02-14 10:25:24 -04:00
Matt Arsenault	f3c008ca77	DAG: Relax foldBitcastedFPLogic conditions Requiring a bitcast to exist was unhelpful. The most basic cases are always going to be a CopyFromReg or load, so they would need a new cast inserted. Don't require a bitcast if it's a free operation. I don't think this logic makes particularly much sense (it seems to be imparting special interpretation of bitcast), but this needs to be in sync with foldSignChangeInBitcast. We should also get rid of this hasBitPreservingFPLogic hook. fabs/fneg are bitpreserving or incorrectly implemented, so this should just be a regular legality check.	2023-02-14 07:59:10 -04:00
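Background for 09dd4d870e and f3c008ca77: fneg only flips the sign bit and fabs only clears it. Each is therefore equivalent to an integer XOR or AND on the bitcast value. The following is a minimal C++17 sketch for f32, using `memcpy` as the bitcast; the helper names are illustrative.

```cpp
#include <cstdint>
#include <cstring>

constexpr uint32_t kSignBit = 0x80000000u;

// Bitcasts between float and its 32-bit pattern.
uint32_t toBits(float f) { uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }
float fromBits(uint32_t u) { float f; std::memcpy(&f, &u, sizeof f); return f; }

// fneg flips only the sign bit; no other bit changes (NaN payloads included).
float fnegViaBits(float f) { return fromBits(toBits(f) ^ kSignBit); }

// fabs clears only the sign bit.
float fabsViaBits(float f) { return fromBits(toBits(f) & ~kSignBit); }
```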
Matt Arsenault	4f0eb57222	AMDGPU: Teach getNegatedExpression about rcp	2023-02-14 04:02:39 -04:00
Matt Arsenault	ce4b719f33	AMDGPU: Add test for getNegatedExpression with rcp	2023-02-14 04:02:39 -04:00
Matt Arsenault	0a669bd894	AMDGPU: Add additional tests for combiner infinite loop	2023-02-14 04:02:38 -04:00
pvanhout	04f6934589	[DAG] Handle build_vector with all undefs in reduceBuildVecTruncToBitCast While working on D143731 I hit a case where a build_vector with 2 undef operands could be generated (with one undef hidden behind a bitcast). That made `reduceBuildVecTruncToBitCast` crash because it seems to assume there is at least one good operand. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D143886	2023-02-14 08:52:28 +01:00
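The failure mode fixed in 04f6934589 is a search for a "good" operand that silently assumes one exists. Below is a hypothetical C++17 sketch. It assumes a build_vector is modeled as a list of optional constants, with `std::nullopt` meaning undef; `firstDefinedOperand` is not an LLVM function.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of a build_vector operand: std::nullopt stands for undef.
using Operand = std::optional<uint32_t>;

// Returns the index of the first defined operand, or std::nullopt when every
// operand is undef. Callers must bail out in that case instead of assuming a
// defined operand exists.
std::optional<std::size_t> firstDefinedOperand(const std::vector<Operand> &ops) {
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (ops[i].has_value())
      return i;
  return std::nullopt;
}
```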
Arthur Eubanks	7c6b46e87e	Revert "[DAGCombiner] handle more store value forwarding" This reverts commit f35a09daebd0a90daa536432e62a2476f708150d. Causes miscompiles, see D138899	2023-02-13 19:07:28 -08:00
Christudasan Devadasan	1c9e6238fe	[AMDGPU] Allow architected SGPRs for workgroup IDs Some subtargets use architected SGPRs for workgroup IDs instead of the regular SGPRs. This patch enables the support for the same and is guarded under the subtarget feature FeatureArchitectedSGPRs. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D143707	2023-02-13 22:11:35 +05:30
Changpeng Fang	7ca3444fba	AMDGPU: Use module flag to get code object version at IR level follow-up Summary: This is part of the leftover work for https://reviews.llvm.org/D143138. In this work, we pass code object version as an argument to initialize target ID and use it for targetID dump. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D143293	2023-02-10 11:16:38 -08:00
Pierre van Houtryve	70924673af	[RFC][GISel] Add a way to ignore COPY instructions in InstructionSelector RFC to add a way to ignore COPY instructions when pattern-matching MIR in GISel. - Add a new "GISelFlags" class to TableGen. Both `Pattern` and `PatFrags` defs can use it to alter matching behaviour. - Flags start at zero and are scoped: the setter returns a `SaveAndRestore` object so that when the current scope ends, the flags are restored to their previous values. This allows child patterns to modify the flags without affecting the parent pattern. - Child patterns always reuse the parent's pattern, but they can override its values. For more examples, see `GlobalISelEmitterFlags.td` tests. - [AMDGPU] Use the IgnoreCopies flag in BFI patterns, which are known to be bothered by cross-regbank copies. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D136234	2023-02-10 08:37:42 +01:00
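The scoping rule in 70924673af (a setter that returns a `SaveAndRestore` object) is the usual RAII save/restore pattern. The following is a minimal C++17 sketch. It assumes a single 32-bit flags word; `ScopedSave`, `Matcher` and the `IgnoreCopies` bit value are hypothetical stand-ins, not the TableGen-generated code.

```cpp
#include <cstdint>

// Hypothetical flag bit; only the name IgnoreCopies comes from the commit.
enum GISelFlagBits : uint32_t { IgnoreCopies = 1u << 0 };

// RAII guard in the spirit of llvm::SaveAndRestore: remembers the old value,
// installs a new one, and restores the old one when the scope ends.
template <typename T>
class ScopedSave {
public:
  ScopedSave(T &slot, T newValue) : slot_(slot), saved_(slot) { slot_ = newValue; }
  ~ScopedSave() { slot_ = saved_; }
  ScopedSave(const ScopedSave &) = delete;
  ScopedSave &operator=(const ScopedSave &) = delete;

private:
  T &slot_;
  T saved_;
};

struct Matcher {
  uint32_t flags = 0;  // flags start at zero
  // A child pattern starts from the current flags and may override them;
  // guaranteed copy elision (C++17) lets the guard be returned by value.
  ScopedSave<uint32_t> setFlags(uint32_t newFlags) {
    return ScopedSave<uint32_t>(flags, newFlags);
  }
};
```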
Pierre van Houtryve	d9a6fc82f5	[AMDGPU] Run unmerge combines post regbankselect RegBankSelect can insert G_UNMERGE_VALUES in a lot of places which left us with a lot of unmerge/merge pairs that could be simplified. These often got in the way of pattern matching and made codegen worse. This patch: - Makes the necessary changes to the merge/unmerge combines so they can run post RegBankSelect - Adds relevant unmerge combines to the list of RegBankSelect combines for AMDGPU - Updates some tablegen patterns that were missing explicit cross-regbank copies (V_BFI patterns were causing constant bus violations with this change). This seems to be mostly beneficial for code quality. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D142192	2023-02-10 08:34:23 +01:00
Andrew Savonichev	c65b4d64d4	[SelectionDAG] Do not second-guess alignment for alloca Alignment of an alloca in IR can be lower than the preferred alignment on purpose, but this override essentially treats the preferred alignment as the minimum alignment. The patch changes this behavior to always use the specified alignment. If alignment is not set explicitly in LLVM IR, it is set to DL.getPrefTypeAlign(Ty) in computeAllocaDefaultAlign. Tests are changed as well: explicit alignment is increased to match the preferred alignment if it changes output, or omitted when it is hard to determine the right value (e.g. for pointers, some structs, or weird types). Differential Revision: https://reviews.llvm.org/D135462	2023-02-09 18:45:20 +03:00
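A sketch of the behaviour change in c65b4d64d4. It assumes alignments are plain byte counts and models a missing explicit alignment as `std::nullopt`; in LLVM IR that alignment is filled in from `DL.getPrefTypeAlign(Ty)`. Both functions are hypothetical and only restate the commit message.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

// Old behaviour, as described: the preferred alignment acted as a minimum.
uint64_t oldAllocaAlign(std::optional<uint64_t> explicitAlign, uint64_t prefAlign) {
  return std::max(explicitAlign.value_or(prefAlign), prefAlign);
}

// New behaviour: the alignment written in the IR is used as-is; the preferred
// alignment only applies when none was specified.
uint64_t newAllocaAlign(std::optional<uint64_t> explicitAlign, uint64_t prefAlign) {
  return explicitAlign.value_or(prefAlign);
}
```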
Mirko Brkusanin	43924cbd29	[AMDGPU][GlobalISel] Fix selection of image sample g16 instructions Pre-GFX10 A16 modifier would imply G16. From GFX10 and onwards there are separate instructions for 16bit gradients. This fixes the condition for selecting G16 opcodes. Also stop adding G16 flag to instructions that do not use gradients for GFX10 onwards.	2023-02-09 16:26:55 +01:00
Stanislav Mekhanoshin	94def1b44e	[AMDGPU] Do not expand fp atomics on gfx940 FP atomics are safe on gfx940. This fixes regression after D131560. Fixes: SWDEV-380468 Differential Revision: https://reviews.llvm.org/D143603	2023-02-08 13:22:04 -08:00
Stanislav Mekhanoshin	3e9f2af27a	[AMDGPU] Update atomic tests. NFC. This is to precommit tests before future patch.	2023-02-08 12:55:34 -08:00
Jay Foad	805da0e298	[AMDGPU] Fix some LABEL check lines	2023-02-06 15:53:13 +00:00
Jay Foad	785cc4d71a	[AMDGPU] Fix DOS line endings in some tests	2023-02-06 15:53:13 +00:00
Ruiling Song	be3f4591af	AMDGPU: Mark control flow intrinsics non-duplicable This is used to help get simplified CFG for divergent regions as well as get better code generation in some cases. For example, with below IR:
```
define amdgpu_kernel void @test() {
bb:
  br label %bb1

bb1:
  %tmp = phi i32 [ 0, %bb ], [ %tmp5, %bb4 ]
  %tid = call i32 @llvm.amdgcn.workitem.id.x()
  %cnd = icmp eq i32 %tid, 0
  br i1 %cnd, label %bb4, label %bb2

bb2:
  %tmp3 = add nsw i32 %tmp, 1
  br label %bb4

bb4:
  %tmp5 = phi i32 [ %tmp3, %bb2 ], [ %tmp, %bb1 ]
  store volatile i32 %tmp5, ptr addrspace(1) undef
  br label %bb1
}
```
We got below assembly before the change:
```
  v_mov_b32_e32 v1, 0
  v_cmp_eq_u32_e32 vcc, 0, v0
  s_branch .LBB0_2
.LBB0_1:                                ; %bb4
                                        ;   in Loop: Header=BB0_2 Depth=1
  s_mov_b32 s2, -1
  s_mov_b32 s3, 0xf000
  buffer_store_dword v1, off, s[0:3], 0
  s_waitcnt vmcnt(0)
.LBB0_2:                                ; %bb
                                        ; =>This Inner Loop Header: Depth=1
  s_and_saveexec_b64 s[0:1], vcc
  s_xor_b64 s[0:1], exec, s[0:1]
  ; kill: def $sgpr0_sgpr1 killed $sgpr0_sgpr1 killed $exec
  s_cbranch_execnz .LBB0_1
; %bb.3:                                ; %bb2
                                        ;   in Loop: Header=BB0_2 Depth=1
  s_or_b64 exec, exec, s[0:1]
  s_waitcnt expcnt(0)
  v_add_i32_e64 v1, s[0:1], 1, v1
  s_branch .LBB0_1
```
After the change:
```
  s_mov_b32 s0, 0
  v_cmp_eq_u32_e32 vcc, 0, v0
  s_mov_b32 s2, -1
  s_mov_b32 s3, 0xf000
  v_mov_b32_e32 v0, s0
  s_branch .LBB0_2
.LBB0_1:                                ; %bb4
                                        ;   in Loop: Header=BB0_2 Depth=1
  buffer_store_dword v0, off, s[0:3], 0
  s_waitcnt vmcnt(0)
.LBB0_2:                                ; %bb1
                                        ; =>This Inner Loop Header: Depth=1
  s_and_saveexec_b64 s[0:1], vcc
  s_cbranch_execnz .LBB0_1
; %bb.3:                                ; %bb2
                                        ;   in Loop: Header=BB0_2 Depth=1
  s_or_b64 exec, exec, s[0:1]
  s_waitcnt expcnt(0)
  v_add_i32_e64 v0, s[0:1], 1, v0
  s_branch .LBB0_1
```
We are using one less VGPR, one less s_xor_, and better LICM with one additional branch after the change. Please note the experiment was done with reverting the workaround D139780, as it will stop the tail-duplication completely for this case.
Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D118250	2023-02-06 15:32:44 +08:00
Stanislav Mekhanoshin	dd0caa82de	[AMDGPU] Fix liveness in the SIOptimizeExecMaskingPreRA.cpp If a condition register def happens past the newly created use we do not properly update LIS. It has two problems: 1) We do not extend defining segment to the end of its block marking it a live-out (this is regression after https://reviews.llvm.org/rG09d38dd7704a52e8ad2d5f8f39aaeccf107f4c56) 2) We do not extend use segment to the beginning of the use block marking it a live-in. Fixes: SWDEV-379563 Differential Revision: https://reviews.llvm.org/D143302	2023-02-05 12:21:28 -08:00
Matt Arsenault	6ce86a7eff	AMDGPU: Ensure flat loads are broken into dword in functions We were assuming we could rely on the flat scratch init detection to imply if there are possible flat addressed stack objects, which doesn't work outside of a kernel. We should have a way to prove if a given flat access can't access the stack. We could use a not-stack parameter attribute to avoid these splits. Make the minimally correct change for GlobalISel; I'll address this better in my larger patch to rewrite load and store legalization. Fixes: SWDEV-218237	2023-02-05 05:25:15 -04:00
Matt Arsenault	d53699cc45	AMDGPU: Add some regression tests that infinite looped combiner Prevent a future patch from introducing an infinite combine loop.	2023-02-04 08:21:18 -04:00
Matt Arsenault	a7fad92ba8	AMDGPU: Add more tests to fneg modifier with casting tests	2023-02-03 07:28:47 -04:00
Changpeng Fang	54cf69c9d5	AMDGPU: Use module flag to get code object version at IR level Summary: This patch introduces a mechanism to check the code object version from the module flag. This avoids checking the command line. In case the module flag is missing, we use the current default code object version supported in the compiler. For tools whose inputs are not IR, we may need another approach (a directive, for example) to check the code object version. That will be in a separate patch later. For LIT tests update, we directly add module flag if there is only a single code object version associated with all checks in one file. In case of multiple code object versions in one file, we use the "sed" method to "clone" the checks to achieve the goal. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D14313	2023-02-02 18:57:26 -08:00
Matt Arsenault	0f8b3b97fd	AMDGPU: Add additional tests for is.fpclass legalization	2023-02-02 22:50:23 -04:00
Matt Arsenault	3a01b4a93d	AMDGPU: Regenerate test checks Use right prefix order to get merging. Also drop -verify-machineinstrs and add -amdgpu-enable-delay-alu=0	2023-02-02 22:50:23 -04:00
Matt Arsenault	36cfe26a52	AMDGPU: Try to unfold fneg source when matching legacy fmin/fmax This is NFC as it stands, since other combines will effectively prevent this from being reachable. This will avoid regressions in a future change which tries to make better use of select source modifiers. Didn't bother with the GlobalISel part for now, since the baseline combine doesn't seem to work on the existing test.	2023-02-02 22:50:23 -04:00
Chen Zheng	f35a09daeb	[DAGCombiner] handle more store value forwarding When lowering calls on target like PPC, some stack loads will be generated for by value parameters. Node CALLSEQ_START prevents such loads from being combined. Suggested by @RolandF, this patch removes the unnecessary loads for the byval parameter by extending ForwardStoreValueToDirectLoad Reviewed By: nemanjai, RolandF Differential Revision: https://reviews.llvm.org/D138899	2023-02-01 21:06:17 -05:00
Matt Arsenault	e9c49901a4	AMDGPU/GlobalISel: Add stub custom regbankselect pass Uniformity analysis needs to be the fundamental basis for regbank decisions. The considerations of the default pass are secondary, but potentially useful for some edge cases (e.g. selecting AGPRs when arbitrary loads and stores can directly use them). This needs to be a separate pass since it requires new analysis dependencies. Boilerplate to subclass the existing pass which does nothing different.	2023-01-30 16:18:20 -04:00
Matt Arsenault	68d4656722	AMDGPU: Don't insert pointer bitcasts for printf lowering Cleanup leftover typed pointer handling.	2023-01-28 21:49:10 -04:00
Matt Arsenault	9a22aeb91d	Attributes: Check declarations for dereferenceable bytes This will allow tablegen to start directly marking intrinsics as dereferenceable in a useful way. Not sure if callsites should override or use the max.	2023-01-28 20:40:23 -04:00
Matt Arsenault	d4299b3825	AMDGPU: Convert fcopysign tests to generated checks and add cases	2023-01-28 07:57:28 -04:00
Matt Arsenault	8e6406c2ce	AMDGPU: Add fneg and select test	2023-01-28 07:57:28 -04:00
Matt Arsenault	93ec3fa402	AMDGPU: Support atomicrmw uinc_wrap/udec_wrap For now keep the existing intrinsics working.	2023-01-27 22:17:16 -04:00
Matt Arsenault	d416f7e7f1	AMDGPU: Add fneg from integer tests	2023-01-25 22:38:53 -04:00
Matt Arsenault	881194d47c	AMDGPU: Remove redundant test implicit-arg-v5-opt.ll already covers this with more cases.	2023-01-25 20:14:04 -04:00
Matt Arsenault	5a4a8eb2b6	AMDGPU: Convert some tests to opaque pointers	2023-01-25 14:33:20 -04:00
Austin Kerbow	913837eaa3	[ScheduleDAG] Fix removing edges with weak deps In SUnit::removePred edges are removed from the Preds and Succs lists before updating the bookkeeping. This could result in incorrect values for NumPreds/SuccsLeft and cause WeakPreds/SuccsLeft to underflow, since the incorrect SDep will be used to update these values. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D142325	2023-01-25 10:05:50 -08:00
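A simplified illustration of the ordering bug fixed in 913837eaa3: the counters must be updated from the edge that is actually being removed, before that edge is erased from the list. `Dep` and `Node` are hypothetical stand-ins for `SDep` and `SUnit`, and only the predecessor side is modeled.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, heavily simplified stand-ins for SDep / SUnit.
struct Dep {
  int pred;   // id of the predecessor node
  bool weak;  // weak edges are counted separately
};

struct Node {
  std::vector<Dep> preds;
  unsigned numPredsLeft = 0;
  unsigned weakPredsLeft = 0;

  void addPred(const Dep &d) {
    preds.push_back(d);
    if (d.weak) ++weakPredsLeft; else ++numPredsLeft;
  }

  // Update the counters from the matching edge *before* erasing it. Reading
  // the edge through an iterator or reference after erase() is invalid and may
  // observe whichever edge shifted into that slot.
  bool removePred(const Dep &d) {
    auto it = std::find_if(preds.begin(), preds.end(), [&](const Dep &p) {
      return p.pred == d.pred && p.weak == d.weak;
    });
    if (it == preds.end())
      return false;
    if (it->weak) --weakPredsLeft; else --numPredsLeft;
    preds.erase(it);
    return true;
  }
};
```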
Matt Arsenault	778cf5431c	IR: Add atomicrmw uinc_wrap and udec_wrap These are essentially add/sub 1 with a clamping value. AMDGPU has instructions for these. CUDA/HIP expose these as atomicInc/atomicDec. Currently we use target intrinsics for these, but those do not carry the ordering and syncscope. Add these to atomicrmw so we can carry these and benefit from the regular legalization processes.	2023-01-24 17:55:11 -04:00
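The semantics described in 778cf5431c can be written as a C++17 compare-and-swap loop over `std::atomic`. This mirrors the general shape of a cmpxchg-loop expansion for targets without a native instruction, but it is a sketch rather than LLVM's actual expansion. The function names are hypothetical, and both functions return the old value, as atomicrmw does.

```cpp
#include <atomic>
#include <cstdint>

// uinc_wrap: new = (old >= limit) ? 0 : old + 1 (unsigned compare).
uint32_t atomicUIncWrap(std::atomic<uint32_t> &mem, uint32_t limit) {
  uint32_t old = mem.load();
  uint32_t next;
  do {
    next = (old >= limit) ? 0 : old + 1;
  } while (!mem.compare_exchange_weak(old, next));  // reloads old on failure
  return old;
}

// udec_wrap: new = (old == 0 || old > limit) ? limit : old - 1 (unsigned).
uint32_t atomicUDecWrap(std::atomic<uint32_t> &mem, uint32_t limit) {
  uint32_t old = mem.load();
  uint32_t next;
  do {
    next = (old == 0 || old > limit) ? limit : old - 1;
  } while (!mem.compare_exchange_weak(old, next));
  return old;
}
```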
Stanislav Mekhanoshin	2968380717	[AMDGPU] Add missing gfx11 tests in the directive-amdgcn-target.ll. NFC.	2023-01-24 09:58:02 -08:00
Yashwant Singh	2a832d0f09	[AMDGPU] Add missing physical register check in SIFoldOperands::tryFoldLoad tryFoldLoad() is not meant to work on physical registers; moreover, use_nodbg_instructions(reg) makes the compiler buggy when called with a physical reg. Fix for SWDEV-373493 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D141895	2023-01-24 14:24:41 +05:30
Nicolai Hähnle	10cef708a7	AMDGPU: Clean up LDS-related occupancy calculations Occupancy is expressed as waves per SIMD. This means that we need to take into account the number of SIMDs per "CU" or, to be more precise, the number of SIMDs over which a workgroup may be distributed. getOccupancyWithLocalMemSize was wrong because it didn't take SIMDs into account at all. At the same time, we need to take into account that WGP mode offers access to a larger total amount of LDS, since this can affect how non-power-of-two LDS allocations are rounded. To make this work consistently, we distinguish between (available) local memory size and addressable local memory size (which is always limited by 64kB on gfx10+, even with WGP mode). This change results in a massive amount of test churn. A lot of it is caused by the fact that the default work group size is 1024, which means that (due to rounding effects) the default occupancy on older hardware is 8 instead of 10, which affects scheduling via register pressure estimates. I've adjusted most tests by just running the UTC tools, but in some cases I manually changed the work group size to 32 or 64 to make sure that work group size chunkiness has no effect. Differential Revision: https://reviews.llvm.org/D139468	2023-01-23 21:43:06 +01:00
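A rough model of why, in the description of 10cef708a7, the default 1024-lane workgroup gives an occupancy of 8 rather than 10. Only whole workgroups can be resident, and their waves are spread over the SIMDs. The numbers (wave64, 4 SIMDs, 10 waves per SIMD, 64 KiB of LDS) and the `occupancy` function are illustrative assumptions, not the in-tree `getOccupancyWithLocalMemSize`.

```cpp
#include <algorithm>

// Illustrative hardware parameters (assumed, not read from any subtarget).
struct HwModel {
  unsigned maxWavesPerSIMD;  // wave slots per SIMD
  unsigned simdsPerCU;       // SIMDs a workgroup's waves are spread over
  unsigned waveSize;         // lanes per wave
  unsigned ldsBytesPerCU;    // local memory available to resident workgroups
};

// Waves per SIMD when only whole workgroups can be resident.
unsigned occupancy(const HwModel &hw, unsigned workGroupSize,
                   unsigned ldsPerWorkGroup) {
  unsigned wavesPerWG = (workGroupSize + hw.waveSize - 1) / hw.waveSize;
  // Workgroups limited by wave slots across all SIMDs.
  unsigned wgByWaves = hw.maxWavesPerSIMD * hw.simdsPerCU / wavesPerWG;
  // Workgroups limited by LDS (no limit if the kernel uses none).
  unsigned wgByLDS =
      ldsPerWorkGroup == 0 ? wgByWaves : hw.ldsBytesPerCU / ldsPerWorkGroup;
  unsigned workGroups = std::min(wgByWaves, wgByLDS);
  // Spread the resident waves over the SIMDs; rounding down is conservative.
  unsigned waves = workGroups * wavesPerWG / hw.simdsPerCU;
  return std::min(waves, hw.maxWavesPerSIMD);
}
```

With these numbers, a 1024-lane workgroup is 16 waves, so only two workgroups fit in the 40 wave slots. That gives 32 waves, or 8 per SIMD, while 64-lane workgroups reach the full 10.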


6171 Commits