llvm-project

Author	SHA1	Message	Date
Vikram	64fc892cda	[AMDGPU] Autogenerate carryout-selection.ll, uaddo.ll, usubo.ll (NFC) Differential Revision: https://reviews.llvm.org/D143987	2023-02-24 02:03:56 -05:00
Piotr Sobczak	ab174c57f4	[AMDGPU] Add more tests for buffer intrinsics Add more tests for buffer intrinsics with large voffsets.	2023-02-23 14:39:12 +01:00
Mirko Brkusanin	926746d22a	[AMDGPU][GFX11] Legalize and select partial NSA MIMG instructions If more registers are needed for VAddr than the NSA format allows, then the final register can act as a contiguous set of remaining addresses. Update legalizer to pack registers for this new format and allow instruction selection to use NSA encoding when number of addresses exceeds max size. Also update SIShrinkInstructions to handle partial NSA. Differential Revision: https://reviews.llvm.org/D144034	2023-02-23 13:33:34 +01:00
Diana Picus	da629d3381	[AMDGPU] Add GISel RUN lines to 2 existing tests. NFC This adds a bit of coverage for GlobalISel. Differential Revision: https://reviews.llvm.org/D144555	2023-02-23 09:46:54 +01:00
Piotr Sobczak	1b9b4f3bfa	[AMDGPU][NFC] Convert llvm.amdgcn tests to autogen	2023-02-23 08:21:12 +01:00
Konstantina Mitropoulou	944f429b21	[AMDGPU] Improve the lowering of raw_buffer_load_{i8,i16} and struct_buffer_load_{i8,i16} intrinsics Currently, raw_buffer_load_{i8,i16} and struct_buffer_load_{i8,i16} intrinsics are lowered as buffer_load_{u8,u16}. This patch combines buffer_load_{u8,u16} and sign extension instructions in order to generate buffer_load_{i8,i16} instructions. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D144313	2023-02-22 09:01:33 -08:00
Joe Nash	80a8e6805a	[AMDGPU] Don't set src mods on permlane16 v_permlane16_b32 and v_permlanex16_b32 should not set abs and neg src modifiers on any input, but they can set op_sel on src0 or src1 to represent fi or bc when desired. The ISel patterns were setting the src_modifier bits to -1, effectively setting abs and neg as well, whenever it was intended to set op_sel, due to an error in ISel. ISel should now correctly only set the op_sel bits. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D144519	2023-02-22 11:41:52 -05:00
Jessica Del	fc672b6a8b	[AMDGPU] Improved wide multiplies These checks show optimized instructions if an operand is known to be (partially) zero. Change-Id: Ie2f6d0d3ee9d5b279d1f4c1dd0787492e39cc77a Differential Revision: https://reviews.llvm.org/D140208	2023-02-22 16:39:06 +01:00
Jay Foad	e2eee902a4	[AMDGPU] Fix an assertion failure when folding into src2 of V_FMAC_F16 D139469 "[AMDGPU] Enable OMod on more VOP3 instructions" caused an assertion failure when trying to fold into src2 of V_FMAC_F16. It would temporarily convert the instruction to V_FMA_F16_gfx9 and add an opsel operand, but if the fold still failed then it would forget to remove the opsel operand. Differential Revision: https://reviews.llvm.org/D144558	2023-02-22 14:26:03 +00:00
Jessica Del	c9fd858172	[AMDGPU] MIR-Tests for Multiplication using KBA These tests show inefficient behavior that will be optimized by a later change. By using Known Bits Analysis, we can avoid unnecessary multiplications or additions with 0.	2023-02-21 14:47:56 +01:00
pvanhout	8e68c12045	[AMDGPU] Remove function with incompatible features Adds a new pass that removes functions if they use features that are not supported on the current GPU. This change is aimed at preventing crashes when building code at O0 that uses idioms such as `if (ISA_VERSION >= N) intrinsic_a(); else intrinsic_b();` where ISA_VERSION is not constexpr, and intrinsic_a is not selectable on older targets. This is a pattern that's used all over the ROCm device libs. The main motive behind this change is to allow code using ROCm device libs to be built at O0. Note: the feature checking logic is done ad-hoc in the pass. There is no other pass that needs (or will need in the foreseeable future) to do similar feature-checking logic so I did not see a need to generalize the feature checking logic yet. It can (and should probably) be generalized later and moved to a TargetInfo-like class or helper file. Reviewed By: arsenm, Joe_Nash Differential Revision: https://reviews.llvm.org/D139000	2023-02-21 10:42:39 +01:00
Jessica Del	959216f9b1	[AMDGPU] MIR-Tests for Multiplication using KBA These tests show inefficient behavior that will be optimized by a later change. By using Known Bits Analysis, we can avoid unnecessary multiplications or additions with 0.	2023-02-21 08:41:56 +01:00
Konstantina Mitropoulou	a0e258da19	[AMDGPU] Add tests for future commit Reviewed By: foad Differential Revision: https://reviews.llvm.org/D144312	2023-02-20 21:36:25 -08:00
Tiwari Abhinav Ashok Kumar	bfb1559fbe	[NFC] Fix missing colon in CHECK directives Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D144412	2023-02-21 00:13:04 +05:30
Matt Arsenault	c177051f60	AMDGPU: Restrict foldFreeOpFromSelect combine based on legal source mods Provides a small code size savings for some f32 cases.	2023-02-19 22:05:54 -04:00
Matt Arsenault	28d8889d27	AMDGPU: Teach fneg combines that select has source modifiers We do match source modifiers for f32 typed selects already, but the combiner code was never informed of this. A long time ago the documentation lied and stated that source modifiers don't work for v_cndmask_b32 when they in fact do. We had a bunch of code operating under the assumption that they don't support source modifiers, so we tried to move fnegs around to work around this. Gets a few small improvements here and there. The main hazard to watch out for is infinite loops in the combiner since we try to move fnegs up and down the DAG. For now, don't fold fneg directly into select. The generic combiner does this for a restricted set of cases when getNegatedExpression obviously shows an improvement for both operands. It turns out to be trickier to avoid infinite looping the combiner in conjunction with pulling out source modifiers, so leave this for a later commit.	2023-02-19 20:13:38 -04:00
Amara Emerson	b309bc04ee	[GlobalISel] Combine out-of-range shifts to undef. Differential Revision: https://reviews.llvm.org/D144303	2023-02-17 15:05:00 -08:00
Nick Desaulniers	a3a84c9e25	[llvm] add CallBrPrepare pass to pipelines Capstone of https://discourse.llvm.org/t/rfc-syncing-asm-goto-with-outputs-with-gcc/65453/8 Clang changes are still necessary to enable the use of outputs along indirect edges of asm goto statements. Link: https://github.com/llvm/llvm-project/issues/53562 Reviewed By: void Differential Revision: https://reviews.llvm.org/D140180	2023-02-16 17:58:34 -08:00
Jay Foad	8a17cd9905	AMDGPU: Add a regression test case for D143963	2023-02-16 17:11:32 +00:00
Jay Foad	8e5a41e827	Revert "AMDGPU: Override getNegatedExpression constant handling" This reverts commit 11c3cead23783e65fb30e673d62771352078ff05. It was causing infinite loops in the DAG combiner.	2023-02-16 17:11:32 +00:00
Jay Foad	9305b63d69	[AMDGPU] Add another G_UNMERGE_VALUES legalization test case	2023-02-16 16:45:35 +00:00
Florian Hahn	2ac85cd563	[AMDGPU] Regenerate check lines to enable updating for D144050.	2023-02-16 16:38:15 +00:00
Diana Picus	819dfc338b	[AMDGPU] Autogenerate checks for several tests. NFCI	2023-02-16 10:54:34 +01:00
Matt Arsenault	11c3cead23	AMDGPU: Override getNegatedExpression constant handling Ignore the multiple use heuristics of the default implementation, and report cost based on inline immediates. This is mostly interesting for -0 vs. 0. Gets a few small improvements. fneg_fadd_0_f16 is a small regression. We could probably avoid this if we handled folding fneg into div_fixup.	2023-02-15 05:21:00 -04:00
Stanislav Mekhanoshin	12b4f9e2af	[AMDGPU] Do not apply schedule metric for regions with spilling D139710 has added a metric to increase schedule's ILP while staying within the same occupancy. Do not bother to apply this metric to a region which is known to have spilling: it may cause spilling to reappear after the previous stage and will do no good if we are already spilling anyway. It may also reduce compile time a bit for such regions. Fixes: SWDEV-377300 Differential Revision: https://reviews.llvm.org/D143934	2023-02-14 12:16:46 -08:00
Matt Arsenault	09dd4d870e	DAG: Remove hasBitPreservingFPLogic This doesn't make sense as an option. fneg and fabs are bit preserving by definition. If a target has some fneg or fabs instruction that are not bitpreserving it's incorrect to lower fneg/fabs to use it.	2023-02-14 10:25:24 -04:00
Matt Arsenault	f3c008ca77	DAG: Relax foldBitcastedFPLogic conditions Requiring a bitcast to exist was unhelpful. The most basic cases are always going to be a CopyFromReg or load, so they would need a new cast inserted. Don't require a bitcast if it's a free operation. I don't think this logic makes particularly much sense (it seems to be imparting special interpretation of bitcast), but this needs to be in sync with foldSignChangeInBitcast. We should also get rid of this hasBitPreservingFPLogic hook. fabs/fneg are bitpreserving or incorrectly implemented, so this should just be a regular legality check.	2023-02-14 07:59:10 -04:00
Matt Arsenault	4f0eb57222	AMDGPU: Teach getNegatedExpression about rcp	2023-02-14 04:02:39 -04:00
Matt Arsenault	ce4b719f33	AMDGPU: Add test for getNegatedExpression with rcp	2023-02-14 04:02:39 -04:00
Matt Arsenault	0a669bd894	AMDGPU: Add additional tests for combiner infinite loop	2023-02-14 04:02:38 -04:00
pvanhout	04f6934589	[DAG] Handle build_vector with all undefs in reduceBuildVecTruncToBitCast While working on D143731 I hit a case where a build_vector with 2 undef operands could be generated (with one undef hidden behind a bitcast). That made `reduceBuildVecTruncToBitCast` crash because it seems to assume there is at least one good operand. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D143886	2023-02-14 08:52:28 +01:00
Arthur Eubanks	7c6b46e87e	Revert "[DAGCombiner] handle more store value forwarding" This reverts commit f35a09daebd0a90daa536432e62a2476f708150d. Causes miscompiles, see D138899	2023-02-13 19:07:28 -08:00
Christudasan Devadasan	1c9e6238fe	[AMDGPU] Allow architected SGPRs for workgroup IDs Some subtargets use architected SGPRs for workgroup IDs instead of the regular SGPRs. This patch enables the support for the same and is guarded under the subtarget feature FeatureArchitectedSGPRs. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D143707	2023-02-13 22:11:35 +05:30
Changpeng Fang	7ca3444fba	AMDGPU: Use module flag to get code object version at IR level follow-up Summary: This is part of the leftover work for https://reviews.llvm.org/D143138. In this work, we pass code object version as an argument to initialize target ID and use it for targetID dump. Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D143293	2023-02-10 11:16:38 -08:00
Pierre van Houtryve	70924673af	[RFC][GISel] Add a way to ignore COPY instructions in InstructionSelector RFC to add a way to ignore COPY instructions when pattern-matching MIR in GISel. - Add a new "GISelFlags" class to TableGen. Both `Pattern` and `PatFrags` defs can use it to alter matching behaviour. - Flags start at zero and are scoped: the setter returns a `SaveAndRestore` object so that when the current scope ends, the flags are restored to their previous values. This allows child patterns to modify the flags without affecting the parent pattern. - Child patterns always reuse the parent's pattern, but they can override its values. For more examples, see `GlobalISelEmitterFlags.td` tests. - [AMDGPU] Use the IgnoreCopies flag in BFI patterns, which are known to be bothered by cross-regbank copies. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D136234	2023-02-10 08:37:42 +01:00
Pierre van Houtryve	d9a6fc82f5	[AMDGPU] Run unmerge combines post regbankselect RegBankSelect can insert G_UNMERGE_VALUES in a lot of places which left us with a lot of unmerge/merge pairs that could be simplified. These often got in the way of pattern matching and made codegen worse. This patch: - Makes the necessary changes to the merge/unmerge combines so they can run post RegBankSelect - Adds relevant unmerge combines to the list of RegBankSelect combines for AMDGPU - Updates some tablegen patterns that were missing explicit cross-regbank copies (V_BFI patterns were causing constant bus violations with this change). This seems to be mostly beneficial for code quality. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D142192	2023-02-10 08:34:23 +01:00
Andrew Savonichev	c65b4d64d4	[SelectionDAG] Do not second-guess alignment for alloca Alignment of an alloca in IR can be lower than the preferred alignment on purpose, but this override essentially treats the preferred alignment as the minimum alignment. The patch changes this behavior to always use the specified alignment. If alignment is not set explicitly in LLVM IR, it is set to DL.getPrefTypeAlign(Ty) in computeAllocaDefaultAlign. Tests are changed as well: explicit alignment is increased to match the preferred alignment if it changes output, or omitted when it is hard to determine the right value (e.g. for pointers, some structs, or weird types). Differential Revision: https://reviews.llvm.org/D135462	2023-02-09 18:45:20 +03:00
Mirko Brkusanin	43924cbd29	[AMDGPU][GlobalISel] Fix selection of image sample g16 instructions Pre-GFX10 A16 modifier would imply G16. From GFX10 and onwards there are separate instructions for 16bit gradients. This fixes the condition for selecting G16 opcodes. Also stop adding G16 flag to instructions that do not use gradients for GFX10 onwards.	2023-02-09 16:26:55 +01:00
Stanislav Mekhanoshin	94def1b44e	[AMDGPU] Do not expand fp atomics on gfx940 FP atomics are safe on gfx940. This fixes regression after D131560. Fixes: SWDEV-380468 Differential Revision: https://reviews.llvm.org/D143603	2023-02-08 13:22:04 -08:00
Stanislav Mekhanoshin	3e9f2af27a	[AMDGPU] Update atomic tests. NFC. This is to precommit tests before future patch.	2023-02-08 12:55:34 -08:00
Jay Foad	805da0e298	[AMDGPU] Fix some LABEL check lines	2023-02-06 15:53:13 +00:00
Jay Foad	785cc4d71a	[AMDGPU] Fix DOS line endings in some tests	2023-02-06 15:53:13 +00:00
Ruiling Song	be3f4591af	AMDGPU: Mark control flow intrinsics non-duplicable This is used to help get simplified CFG for divergent regions as well as get better code generation in some cases. For example, with below IR: ``` define amdgpu_kernel void @test() { bb: br label %bb1 bb1: %tmp = phi i32 [ 0, %bb ], [ %tmp5, %bb4 ] %tid = call i32 @llvm.amdgcn.workitem.id.x() %cnd = icmp eq i32 %tid, 0 br i1 %cnd, label %bb4, label %bb2 bb2: %tmp3 = add nsw i32 %tmp, 1 br label %bb4 bb4: %tmp5 = phi i32 [ %tmp3, %bb2 ], [ %tmp, %bb1 ] store volatile i32 %tmp5, ptr addrspace(1) undef br label %bb1 } ``` We got below assembly before the change: ``` v_mov_b32_e32 v1, 0 v_cmp_eq_u32_e32 vcc, 0, v0 s_branch .LBB0_2 .LBB0_1: ; %bb4 ; in Loop: Header=BB0_2 Depth=1 s_mov_b32 s2, -1 s_mov_b32 s3, 0xf000 buffer_store_dword v1, off, s[0:3], 0 s_waitcnt vmcnt(0) .LBB0_2: ; %bb ; =>This Inner Loop Header: Depth=1 s_and_saveexec_b64 s[0:1], vcc s_xor_b64 s[0:1], exec, s[0:1] ; kill: def $sgpr0_sgpr1 killed $sgpr0_sgpr1 killed $exec s_cbranch_execnz .LBB0_1 ; %bb.3: ; %bb2 ; in Loop: Header=BB0_2 Depth=1 s_or_b64 exec, exec, s[0:1] s_waitcnt expcnt(0) v_add_i32_e64 v1, s[0:1], 1, v1 s_branch .LBB0_1 ``` After the change: ``` s_mov_b32 s0, 0 v_cmp_eq_u32_e32 vcc, 0, v0 s_mov_b32 s2, -1 s_mov_b32 s3, 0xf000 v_mov_b32_e32 v0, s0 s_branch .LBB0_2 .LBB0_1: ; %bb4 ; in Loop: Header=BB0_2 Depth=1 buffer_store_dword v0, off, s[0:3], 0 s_waitcnt vmcnt(0) .LBB0_2: ; %bb1 ; =>This Inner Loop Header: Depth=1 s_and_saveexec_b64 s[0:1], vcc s_cbranch_execnz .LBB0_1 ; %bb.3: ; %bb2 ; in Loop: Header=BB0_2 Depth=1 s_or_b64 exec, exec, s[0:1] s_waitcnt expcnt(0) v_add_i32_e64 v0, s[0:1], 1, v0 s_branch .LBB0_1 ``` We are using one less VGPR, one less s_xor_, and better LICM with one additional branch after the change. Please note the experiment was done with reverting the workaround D139780, as it will stop the tail-duplication completely for this case. 
Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D118250	2023-02-06 15:32:44 +08:00
Stanislav Mekhanoshin	dd0caa82de	[AMDGPU] Fix liveness in the SIOptimizeExecMaskingPreRA.cpp If a condition register def happens past the newly created use we do not properly update LIS. It has two problems: 1) We do not extend defining segment to the end of its block marking it a live-out (this is regression after https://reviews.llvm.org/rG09d38dd7704a52e8ad2d5f8f39aaeccf107f4c56) 2) We do not extend use segment to the beginning of the use block marking it a live-in. Fixes: SWDEV-379563 Differential Revision: https://reviews.llvm.org/D143302	2023-02-05 12:21:28 -08:00
Matt Arsenault	6ce86a7eff	AMDGPU: Ensure flat loads are broken into dword in functions We were assuming we could rely on the flat scratch init detection to imply if there are possible flat addressed stack objects, which doesn't work outside of a kernel. We should have a way to prove if a given flat access can't access the stack. We could use a not-stack parameter attribute to avoid these splits. Make the minimally correct change for GlobalISel; I'll address this better in my larger patch to rewrite load and store legalization. Fixes: SWDEV-218237	2023-02-05 05:25:15 -04:00
Matt Arsenault	d53699cc45	AMDGPU: Add some regression tests that infinite looped combiner Prevent a future patch from introducing an infinite combine loop.	2023-02-04 08:21:18 -04:00
Matt Arsenault	a7fad92ba8	AMDGPU: Add more tests to fneg modifier with casting tests	2023-02-03 07:28:47 -04:00
Changpeng Fang	54cf69c9d5	AMDGPU: Use module flag to get code object version at IR level Summary: This patch introduces a mechanism to check the code object version from the module flag. This avoids checking from command line. In case the module flag is missing, we use the current default code object version supported in the compiler. For tools whose inputs are not IR, we may need another approach (directive, for example) to check the code object version. That will be in a separate patch later. For LIT tests update, we directly add module flag if there is only a single code object version associated with all checks in one file. In case of multiple code object versions in one file, we use the "sed" method to "clone" the checks to achieve the goal. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D14313	2023-02-02 18:57:26 -08:00
Matt Arsenault	0f8b3b97fd	AMDGPU: Add additional tests for is.fpclass legalization	2023-02-02 22:50:23 -04:00
Matt Arsenault	3a01b4a93d	AMDGPU: Regenerate test checks Use right prefix order to get merging. Also drop -verify-machineinstrs and add -amdgpu-enable-delay-alu=0	2023-02-02 22:50:23 -04:00

6187 Commits