llvm-project

Author	SHA1	Message	Date
Yashwant Singh	14fb4040e2	[AMDGPU][test] precommiting tests for D136663 More tests for si-peephole-sdwa pass	2022-10-26 22:08:28 +05:30
Haohai Wen	21f23a37c6	[SelectionDAG] Clamp stack alignment for memset, memmove memcpy has clamped dst stack alignment to NaturalStackAlignment if hasStackRealignment is false. We should also clamp stack alignment for memset and memmove. If we don't clamp, SelectionDAG may first do tail call optimization, which requires no stack realignment. Then memmove and memset in the same function may be lowered to loads/stores with larger alignment, leading PEI to emit stack realignment code, which is incorrect. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D136456	2022-10-26 16:45:31 +08:00
Pierre van Houtryve	c1b2920c6e	[AMDGPU] Autogenerate llvm.amdgcn.fcmp.ll Prep commit for adding GISel run lines to that test. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D136591	2022-10-26 07:00:34 +00:00
Joe Nash	01b8140d3a	[AMDGPU] Fix delay alu for VOPD with src2acc V_FMAC_F32 and V_DOT2C_F32_F16 have a dummy src2 operand tied to vdst to inform passes that the instructions read the dst operand. The VOPD versions of these instructions lacked the dummy operand, which was a problem for inserting s_delay_alu. Introduce the dummy src2 operand on the VOPD versions, and fix the VOPD operand tracking logic to account for it. Reviewed By: dp Differential Revision: https://reviews.llvm.org/D136629	2022-10-25 13:11:17 -04:00
Thomas Symalla	1f23cf4e50	[NFC][AMDGPU] Pre-commit test for D136432 Nested BFI instruction with multiple uses.	2022-10-25 10:52:32 +02:00
Petar Avramovic	cbc378ecb8	GlobalISel: Artifact combine merge-like and unmerges into merge-like Recognize when sub-vectors have been split into elements which are used to build a large vector. This happens when instructions have different vector sizes available. For example, a few arithmetic instructions are required to process all elements of a larger vector that can be stored using one instruction. Differential Revision: https://reviews.llvm.org/D109242	2022-10-24 13:33:06 +02:00
Petar Avramovic	e6c778f861	GlobalISel: Artifact combine merge-like and unmerge into unmerge Recognize when the source could have been unmerged into pieces with DstTy without having to split the source into smaller elements and then merge the small elements into DstTy pieces. This happens when a vector was meant to be split into sub-vectors but there was a leftover. At this point the artifact combiner has already dealt with the leftover and we can continue to use sub-vectors. Differential Revision: https://reviews.llvm.org/D109241	2022-10-24 13:33:05 +02:00
Petar Avramovic	f1aa598046	GlobalISel: Artifact combine merge-like and unmerge into copy Recognize copy that is represented as split of a source register to elements that were reassembled to another register with the same type. Differential Revision: https://reviews.llvm.org/D109240	2022-10-24 13:33:05 +02:00
Petar Avramovic	51b98db487	GlobalISel: Precommit for artifact combine patches Differential Revision: https://reviews.llvm.org/D117655	2022-10-24 13:33:05 +02:00
Pierre van Houtryve	eccdedd6f7	[AMDGPU] Autogenerate icmp codegen test Switch to autogenerated tests so we can use the same test for GISel and DAGIsel. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D136446	2022-10-24 06:37:50 +00:00
Craig Topper	db25f51e37	Revert "[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))" This reverts commit e8b3ffa532b8ebac5dcdf17bb91b47817382c14d. The AMDGPU/mad_64_32.ll seems to fail on some of the build bots but passes locally. I'm really confused.	2022-10-22 22:50:43 -07:00
Craig Topper	e8b3ffa532	[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y)) (sra X, BW-1) is either 0 or -1. So the multiply is a conditional negate of Y. This pattern shows up when type legalizing wide multiplies involving a sign extended value. Fixes PR57549. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D133399	2022-10-22 21:51:45 -07:00
Pierre van Houtryve	ed5fe7f3a1	[AMDGPU][GISel] Re-enable some working tests These tests had been commented out but seem to not be crashing. Not sure if codegen is perfect in each of them, but even if it's not I think it's better to put a TODO to fix codegen than remove the test outright, unless codegen is plain wrong (then I'd still rather XFAIL rather than hide it) Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D136341	2022-10-21 06:39:40 +00:00
Pierre van Houtryve	824dd811be	[AMDGPU][DAG] Fix trunc/shift combine condition The condition needs to be different for right-shifts, else we may lose information in some cases. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D136059	2022-10-21 06:36:07 +00:00
Joe Nash	ad6698562c	[AMDGPU] V_LDEXP_F16 encoding fix and doc update. The amdgcn.ldexp.* intrinsics take an i32 value as src1. The V_LDEXP_F16 instruction considers src1 an f16 operand, and therefore src1 is implicitly truncated to 16 bits when lowering to that instruction from the intrinsic. This is unlikely to result in an error in practice because values that large are not useful. The operand class of src1 in the True16 version of the instruction has been corrected to encode correctly on GFX11. Reviewed By: foad, rampitec Differential Revision: https://reviews.llvm.org/D136195	2022-10-19 09:52:53 -04:00
Simon Pilgrim	9708d88017	Revert rG42230efccf8fe1185be5fa6c23dce0a8183d6ec9 "[DAG] Fold (sra (or (shl x, c1), (shl y, c2)), c1) -> (sext_inreg (or x, (shl y,c2-c1)) iff c2 >= c1" @foad was right - this isn't actually going to help with D136042 as much as hoped, we need a better AMDGPU-specific solution as other targets are likely to make use of it	2022-10-19 12:07:41 +01:00
Simon Pilgrim	42230efccf	[DAG] Fold (sra (or (shl x, c1), (shl y, c2)), c1) -> (sext_inreg (or x, (shl y,c2-c1)) iff c2 >= c1 Helps with some of the AMDGPU regressions identified in D136042 where we were losing signed BFE patterns after sinking shifts behind logic ops. Differential Revision: https://reviews.llvm.org/D136081	2022-10-19 11:18:49 +01:00
Jay Foad	11ceafd768	[AMDGPU] Add test case for a VOPD s_delay_alu insertion bug	2022-10-19 10:52:56 +01:00
Thomas Symalla	09fbdde42c	[NFC][AMDGPU] Add tests for dependent v_bfi instructions. This commit adds a few tests which are used to test the codegen of nested v_bfi instructions. These instruction sequences are being generated when using the canonical form for bitfieldInsert and having the sequences being transformed by SimplifyDemandedBits. This is a pre-commit for a change which enables the backend to lower these instruction sequences into v_bfi instructions.	2022-10-18 16:57:48 +02:00
Simon Pilgrim	efd0d66269	[AMDGPU] Add regression test cases reported on D136042	2022-10-17 14:54:27 +01:00
Simon Pilgrim	0aa9a7f8d9	[AMDGPU] Regenerate bfe-combine.ll and bfe-patterns.ll	2022-10-17 14:41:14 +01:00
Jay Foad	0c22f4f5fe	[AMDGPU] Common up some generated checks in fnearbyint.ll Also remove -mattr=-flat-for-global which is not needed for generated checks.	2022-10-17 11:02:19 +01:00
Peter Rong	c2e7c9cb33	[CodeGen] Using ZExt for extractelement indices. In https://github.com/llvm/llvm-project/issues/57452, we found that IRTranslator is translating `i1 true` into `i32 -1`. This is because IRTranslator uses SExt for indices. In this fix, we change the expected behavior of extractelement's index, moving from SExt to ZExt. This change covers documentation, SelectionDAG, and IRTranslator. We also included a test for AMDGPU and updated tests for AArch64, Mips, PowerPC, RISCV, VE, WebAssembly and X86. This patch fixes issue #57452. Differential Revision: https://reviews.llvm.org/D132978	2022-10-15 15:45:35 -07:00
Anshil Gandhi	94ac8f3a8c	[BranchRelaxation] Fix test for duplicate branch instruction This patch is a follow up for D134557, inserting a check for a duplicate unconditional branch to fall through. Differential Revision: https://reviews.llvm.org/D135975	2022-10-14 12:21:26 -06:00
Sander de Smalen	02df03c5b7	[AArch64][SME] Add support for arm_locally_streaming functions. Functions with `aarch64_sme_pstatesm_body` will emit a SMSTART at the start of the function, and a SMSTOP at the end of the function, such that all operations use the right value for vscale. Because the placement of these nodes is critically important (i.e. no vscale-dependent operations should be done before SMSTART has been issued), we require glueing the CopyFromReg to the Entry node such that we can insert the SMSTART as part of that glued chain. More details about the SME attributes and design can be found in D131562. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D131582	2022-10-14 13:47:53 +00:00
Leon Clark	6370bc2435	Add f16 nearbyint support. Enable lowering of FNEARBYINT for f16 and extend existing tests. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D135124	2022-10-14 08:05:24 +01:00
Matt Arsenault	99dff82118	AMDGPU: Fix failing test with expensive checks Fixes failure after d383adec4d3914492e67267462e6f00fdd4934af	2022-10-13 23:34:20 -07:00
Anshil Gandhi	d383adec4d	[BranchRelaxation] Fall through only if block has no unconditional branches Prior to inserting an unconditional branch from X to its fall through basic block, check if X has any terminators to avoid inserting additional branches. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134557	2022-10-13 22:48:41 -06:00
Leon Clark	98852a0f3d	Precommit for SWDEV-353076: Add check directives to existing tests. Add FileCheck directives to existing tests in preparation for new tests. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D135788	2022-10-13 08:02:37 +01:00
Matt Arsenault	838fd611b7	AMDGPU: Fix assertion on <1 x i16> vectors Fixes issue 58331.	2022-10-12 17:25:24 -07:00
Matt Arsenault	575eed3dac	AMDGPU: Fix hazard with v_accvgpr_write_b32 and inline asm VGPR defs If inline asm has a VGPR def, it must have come from a VGPR write somewhere inside the asm. This should be further extended to all read after write hazards.	2022-10-12 17:25:24 -07:00
Joe Nash	5f095e4751	[AMDGPU] Add GFX11 tests for fcmp and ballot. NFC Reviewed By: foad Differential Revision: https://reviews.llvm.org/D135782	2022-10-12 15:56:54 -04:00
Craig Topper	ac9209751a	Revert "[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))" This reverts commit 0148df8157f05ecf3b1064508e6f012aefb87dad. Getting lit test failures on AMDGPU but I can't reproduce them so far. Reverting to investigate.	2022-10-11 16:30:40 -07:00
Craig Topper	0148df8157	[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y)) (sra X, BW-1) is either 0 or -1. So the multiply is a conditional negate of Y. This pattern shows up when type legalizing wide multiplies involving a sign extended value. Fixes PR57549. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D133399	2022-10-11 16:20:55 -07:00
Jessica Paquette	0f1a51e173	[GlobalISel] Allow vectors in redundant or + add combines We support KnownBits for vectors, so we can enable these. https://godbolt.org/z/r9a9W4Gj1 Differential Revision: https://reviews.llvm.org/D135719	2022-10-11 15:31:09 -07:00
Abinav Puthan Purayil	3d9f011a9c	[AMDGPU] Make the uses_dynamic_stack field in the kernel descriptor and the metadata map specific to code object v5 and later Unfortunately, we have a broken handling of this in the runtime of rocm 5.3. The runtime is expected to handle this correctly when v5 becomes the default. Differential Revision: https://reviews.llvm.org/D134714	2022-10-11 23:28:43 +05:30
Pierre van Houtryve	4d815bfae0	[GISel] Add redundant bitcast folding combine Simply folds away bitcasts that cancel each other. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D135146	2022-10-11 15:03:08 +00:00
Weining Lu	42b70793a1	Reland "[Clang][LoongArch] Add inline asm support for constraints k/m/ZB/ZC" Reference: https://gcc.gnu.org/onlinedocs/gccint/Machine-Constraints.html k: A memory operand whose address is formed by a base register and (optionally scaled) index register. m: A memory operand whose address is formed by a base register and offset that is suitable for use in instructions with the same addressing mode as st.w and ld.w. ZB: An address that is held in a general-purpose register. The offset is zero. ZC: A memory operand whose address is formed by a base register and offset that is suitable for use in instructions with the same addressing mode as ll.w and sc.w. Note: The INLINEASM SDNode flags in the tests below are updated because the newly introduced enum `Constraint_k` is added before `Constraint_m`. llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll llvm/test/CodeGen/X86/callbr-asm-kill.mir This patch passes `ninja check-all` on an X86 machine with all official targets and the LoongArch target enabled. Differential Revision: https://reviews.llvm.org/D134638	2022-10-11 19:51:48 +08:00
Joe Nash	8a7d4993b7	[AMDGPU] Fix True16 patterns for cmp on GFX11 These patterns should have a True16 version and a non-true16 version. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D135609	2022-10-10 16:41:06 -04:00
Arthur Eubanks	f3a928e233	[opt] Don't translate legacy -analysis flag to require<analysis> Tests relying on this should explicitly use -passes='require<analysis>,foo'.	2022-10-07 14:54:34 -07:00
Matt Arsenault	74ef03d38a	AMDGPU: Update SlotIndexes independently of LiveIntervals Apparently StackColoring depends on SlotIndexes, but not LiveIntervals. If regalloc fast were manually requested, LiveIntervals would be dropped before SILowerSGPRSpills but not SlotIndexes. SILowerSGPRSpills preserved SlotIndexes, but only through LiveIntervals. As a result, SILowerSGPRSpills was incorrectly reporting it preserved SlotIndexes. Start updating these directly, instead of depending on LiveIntervals also being available.	2022-10-07 13:15:15 -07:00
Pierre van Houtryve	36c3833783	[GISel] Add Trunc/Lshr/BuildVector Folding Similar to the current "Trunc/BuildVector" folding - which folds low element extracts of BuildVectors, folds hi element extracts done using bitshifts. For D134354 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D135148	2022-10-07 08:44:03 +00:00
Jeffrey Byrnes	8e5b96cce9	[AMDGPU] Add test coverage to ensure first regallocfast only allocates SGPR Register allocation is split into two passes, and the expected behavior is that the first pass should only work on virtual SGPRs, whereas the second pass works on virtual VGPRs. This adds a test case which breaks if the first pass allocates VGPRs. Differential Revision: https://reviews.llvm.org/D135331	2022-10-06 14:31:51 -07:00
Pierre van Houtryve	3ec0085c3f	[DAG] Update `isKnownNeverNaN` for `FMA/FMAD` We can still get a NaN even if none of the operands are NaN, e.g. from +inf/-inf. D50804 didn't catch that. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134854	2022-10-06 06:52:36 +00:00
Pierre van Houtryve	bb71079e30	[AMDGPU][GISel] Add missing V2S16 BUILD_VECTOR_TRUNC legalization Previously we would be unable to legalize V2S16 BUILD_VECTOR_TRUNC on GFX8 & below as the custom legalization was missing. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D135149	2022-10-06 06:48:53 +00:00
Carl Ritson	c316332e17	[Sink] Allow sinking of invariant loads across critical edges Invariant loads can always be sunk. Reviewed By: foad, arsenm Differential Revision: https://reviews.llvm.org/D135133	2022-10-06 09:21:12 +09:00
jeff	cebec42089	[DAGCombiner] [AMDGPU] Allow vector loads in MatchLoadCombine Since SROA chooses promotion based on reaching load / stores of allocas, we may run into scenarios in which we alloca a vector, but promote it to an integer. The result of which is the familiar LoadCombine pattern (i.e. ZEXT, SHL, OR). However, instead of coming directly from distinct loads, the elements to be combined are coming from ExtractVectorElements which stem from a shared load. This patch identifies such a pattern and combines it into a load. Change-Id: I0bc06588f11e88a0a975cde1fd71e9143e6c42dd	2022-10-04 12:16:00 -07:00
Pierre van Houtryve	75b292cb14	[AMDGPU][DAG] Fix insert_vector_elt lowering for 8 bit elements The bitmask used to extract the bits assumed 16 bit elements and wasn't taking the size of the elements into account. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D135156	2022-10-04 14:48:15 +00:00
Pierre van Houtryve	c93104073c	[AMDGPU] Always lower SHUFFLE_VECTOR Make it illegal, remove InstructionSelector logic for it Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D134967	2022-10-04 14:23:17 +00:00
Jay Foad	af947d9fcb	[ISel] Fix crash in new FMA DAG combine Fix a crash in the FMA combine added by D132837 and amended by D134810. In cases where the newly created node could be folded, the combiner would fail this assertion: llc: DAGCombiner.cpp:268: void (anonymous namespace)::DAGCombiner::AddToWorklist(llvm::SDNode *): Assertion `N->getOpcode() != ISD::DELETED_NODE && "Deleted Node added to Worklist"' failed. Differential Revision: https://reviews.llvm.org/D135150	2022-10-04 15:13:18 +01:00
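
The DAGCombiner fold listed above (e8b3ffa532, first landed as 0148df8157) rests on the observation that (sra X, BW-1) is either 0 or -1, so the multiply is a conditional negate of Y. The following is a minimal sketch that checks this identity exhaustively in plain Python. It is not LLVM code: the 8-bit width and the helper names (`sra`, `mul`, `neg`, `fold_holds`) are assumptions chosen for illustration only.

```python
# Illustrative check of: (mul (sra X, BW-1), Y) == (neg (and (sra X, BW-1), Y))
# BW = 8 is an assumed width so the whole input space can be enumerated.
BW = 8
MASK = (1 << BW) - 1


def to_signed(value):
    """Interpret the low BW bits of value as a two's complement integer."""
    value &= MASK
    return value - (1 << BW) if value & (1 << (BW - 1)) else value


def sra(x, amount):
    """Arithmetic shift right on a BW-bit value, result wrapped to BW bits."""
    return (to_signed(x) >> amount) & MASK


def mul(a, b):
    """BW-bit wrapping multiply."""
    return (a * b) & MASK


def neg(a):
    """BW-bit two's complement negation."""
    return (-a) & MASK


def fold_holds(x, y):
    """True if the original and folded expressions agree for x and y."""
    sign = sra(x, BW - 1)  # 0 when x is non-negative, all ones when negative
    return mul(sign, y) == neg(sign & y)


if __name__ == "__main__":
    ok = all(fold_holds(x, y) for x in range(1 << BW) for y in range(1 << BW))
    print("identity holds for all 8-bit inputs:", ok)
```

The check only confirms the arithmetic identity. It says nothing about the AMDGPU test failures that led to the two reverts recorded in the log.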

5832 Commits