llvm-project

Author	SHA1	Message	Date
Igor Kirillov	deb429e5b0	Revert "[CodeGen] Improve ExpandMemCmp for more efficient non-register aligned sizes handling (#69942 )" This reverts commit 9bcb30d31813bbdea6b65789f64aed3f0e58d507.	2023-10-27 14:12:45 +00:00
Christudasan Devadasan	a0eb6b88f9	[AMDGPU] Try to fix the block prologs broken by RA inserted instructions (#69924 ) The insertion point determined by RA while attempting spills and liverange split at the beginning of a block goes wrong at times, and the newly inserted vector instructions are placed before the exec-mask restore instruction which is wrong. It occurs mainly due to the dependency on isBasicBlockPrologue that doesn't account for early inserted instructions (spills and splits) during RA and causes the block prolog to break. A better approach for deciding the insertion point should be worked out. For now, improving the helper function to consider all possible early insertions. This patch includes the spill instructions. The copies associated with liverange split should also be included in the block prolog.	2023-10-27 19:10:18 +05:30
Simon Pilgrim	37d9dc4793	[X86] Add test case for Issue #66150	2023-10-27 14:37:19 +01:00
Christudasan Devadasan	f9cd789658	[AMDGPU] Add pseudo instructions for SGPR spill to VGPR (#69923 ) For a future patch, it is important to keep the lowered SGPR spills recognizable as spill instructions during regalloc. Directly lowering them into V_WRITELANE/V_READLANE won't allow us to attach the SPILL flag to their instructions. This patch introduces the pseudo instructions with the SGPRSpill flag set in their Desc. They will get lowered to equivalent instructions later during post RA pseudo expansion.	2023-10-27 17:24:10 +05:30
Igor Kirillov	9bcb30d318	[CodeGen] Improve ExpandMemCmp for more efficient non-register aligned sizes handling (#69942 ) * Enhanced the logic of ExpandMemCmp pass to merge contiguous subsequences in LoadSequence, based on sizes allowed in `AllowedTailExpansions`. * This enhancement seeks to minimize the number of basic blocks and produce optimized code when using memcmp with non-register aligned sizes. * Enable this feature for AArch64 with memcmp sizes modulo 8 equal to 3, 5, and 6.	2023-10-27 12:41:08 +01:00
Luke Lau	c8e1fbc3cc	[RISCV] Keep same SEW/LMUL ratio if possible in forward transfer (#69788 ) For instructions like vmv.s.x and friends where we don't care about LMUL or the SEW/LMUL ratio, we can change the LMUL in its state so that it has the same SEW/LMUL ratio as the previous state. This allows us to avoid more VL toggles later down the line (i.e. use vsetvli zero, zero, which requires that the SEW/LMUL ratio must be the same) This is an alternative approach to the idea in #69259, but note that they don't catch exactly the same test cases.	2023-10-27 12:16:28 +01:00
Matt Arsenault	b8b491c9d7	AMDGPU: Add infinite looping testcase after subrange spilling change This infinite looped after d8127b2ba8a87a610851b9a462f2fc2526c36e37	2023-10-27 17:42:14 +09:00
Phoebe Wang	58d4fe287e	[X86][EVEX512] Do not allow 512-bit memcpy without EVEX512 (#70420 ) Solves crash mentioned in #65920.	2023-10-27 15:26:05 +08:00
Craig Topper	116eb323b1	[RISCV] Correct copyPhysReg for GPRPF64. (#70419 ) GPRPF64 represents a pair of registers. We were only copying the even part. We need to copy the odd part too.	2023-10-26 23:54:46 -07:00
Craig Topper	8ff1422353	[RISCV] Fix incorrect use of Zfa fli instruction for negative minimum value. (#70411 ) isSmallestNormalized() only considers the magnitude, not the sign.	2023-10-26 22:11:58 -07:00
Craig Topper	be0cbe9173	[RISCV] Add test cases showing fli being used for negative min normalized value. We can only use fli for the positive normalized value.	2023-10-26 22:11:58 -07:00
Amara Emerson	c9e8b73694	[AArch64][GlobalISel] Add support for extending indexed loads. (#70373 )	2023-10-26 13:38:09 -07:00
David Green	3fe8fd712b	[AArch64] Fix st2 check for nearby store with debug info. It needs to be skipping over debug instructions, whilst not counting them in the MaxLookupDist.	2023-10-26 21:37:04 +01:00
Alex Richardson	e39f6c1844	[opt] Infer DataLayout from triple if not specified There are many tests that specify a target triple/CPU flags but no DataLayout which can lead to IR being generated that has unusual behaviour. This commit attempts to use the default DataLayout based on the relevant flags if there is no explicit override on the command line or in the IR file. One thing that is not currently possible is to differentiate a missing datalayout from `target datalayout = ""` in the IR file, since the current APIs don't allow detecting this case. If it is considered useful to support this case (instead of passing "-data-layout=" on the command line), I can change IR parsers to track whether they have seen such a directive and change the callback type. Differential Revision: https://reviews.llvm.org/D141060	2023-10-26 12:07:37 -07:00
Craig Topper	d307dc5b51	[RISCV][GISel] Allow G_AND/G_OR/G_XOR to have s32 types on RV64. Even though we don't have W instructions for them. This treats them the same as other binary operators.	2023-10-26 11:00:43 -07:00
Amara Emerson	93659947d2	[AArch64][GlobalISel] Add support for pre-indexed loads/stores. (#70185 ) The pre-index matcher just needs some small heuristics to make sure it doesn't cause regressions. Apart from that it's a simple change, since the only difference is an immediate operand of '1' vs '0' in the instruction.	2023-10-26 10:29:12 -07:00
Matt Harding	bf92eba697	Fix comment in wasm unreachable test (#70340 ) Some textual editing errors got through this pull request that was merged a few weeks ago: https://github.com/llvm/llvm-project/pull/65876 This patch clears up the unintentional duplicated line, and white-space at the end of the lines.	2023-10-26 10:23:32 -07:00
Simon Pilgrim	13a349425b	[AArch64] Regenerate addr-of-ret-addr.ll	2023-10-26 15:35:17 +01:00
Simon Pilgrim	aaabf50d52	[AArch64] Regenerate tests to show missing constant comments	2023-10-26 15:35:17 +01:00
Alexander Richardson	f118d474eb	[AMDGPU] Use alloca address space in rewrite-out-arguments.ll (#70269 ) This is needed for the transform to fire with a correct data layout. Pre-committing this change to keep the diff of D141060 smaller.	2023-10-26 15:08:58 +01:00
Luke Lau	2e85123bfe	[VP] Check if VP ops with functional intrinsics are speculatable (#69504 ) Noticed whilst working on #69494. VP intrinsics whose functional equivalent is an intrinsic were being marked as their lanes being non-speculatable, even if the underlying intrinsic was speculatable. This meant that ```llvm %1 = call <4 x i32> @llvm.vp.umax(<4 x i32> %x, <4 x i32> %y, <4 x i1> %mask, i32 %evl) ``` would be expanded out to ```llvm %.splatinsert = insertelement <4 x i32> poison, i32 %evl, i64 0 %.splat = shufflevector <4 x i32> %.splatinsert, <4 x i32> poison, <4 x i32> zeroinitializer %1 = icmp ult <4 x i32> <i32 0, i32 1, i32 2, i32 3>, %.splat %2 = and <4 x i1> %1, %mask %3 = call <4 x i32> @llvm.umax.v4i32(<4 x i32> %x, <4 x i32> %y) ``` instead of ```llvm %1 = call <4 x i32> @llvm.umax.v4i32(<4 x i32> %x, <4 x i32> %y) ``` The cause of this was isSafeToSpeculativelyExecuteWithOpcode checking the function attributes for the VP instruction itself, not the functional intrinsic. Since isSafeToSpeculativelyExecuteWithOpcode expects an already materialized instruction, we can't use it directly for the intrinsic case. So this fixes it by manually checking the function attributes on the intrinsic.	2023-10-26 13:46:32 +01:00
Christudasan Devadasan	bb2b7530ad	[AMDGPU] precommit lit test for PR 69924.	2023-10-26 17:43:14 +05:30
Jay Foad	e9c4dc18bc	Revert "[AMDGPU] Use `S_CSELECT` for uniform i1 ext (#69703 )" This reverts commit a1260b5209968c08886e3c6183aa793de8931578. It was causing some Vulkan CTS failures.	2023-10-26 12:56:32 +01:00
Christudasan Devadasan	16fbc45f48	Revert "[AMDGPU] Cleanup hasUnwantedEffectsWhenEXECEmpty function (#70206 )" This reverts commit 7ce613fc77af092dd6e9db71ce3747b75bc5616e.	2023-10-26 17:04:28 +05:30
Piotr Sobczak	80abbeca8e	[Inline Spiller] Consider bundles when marking defs as dead Fix bug where the code expects just a single MI, but a series of bundled MIs need to be handled instead. The semi-formed bundles are created by SplitKit for the case where not all lanes are live (buildSingleSubRegCopy). Then the remat kicks in, and since the values that are copied in the bundle do not need to be preserved due to the remat (dead defs), all instructions in the bundle should be marked as dead. However, only the first one gets marked as dead, which causes the verifier to complain later with error: "Live range continues after dead def flag". Differential Revision: https://reviews.llvm.org/D156999	2023-10-26 11:52:55 +02:00
Piotr Sobczak	24865a6423	[Inline Spiller] Pre-commit test	2023-10-26 11:52:54 +02:00
Simon Pilgrim	547dc46122	[DAG] SimplifyDemandedBits - ensure we drop NSW/NUW flags when we simplify a SHL node's input We already do this for variable shifts, but we missed it for constant shifts Fixes #69965	2023-10-26 10:34:58 +01:00
Simon Pilgrim	12dfcc0238	[DAG] Update test case for Issue #69965 The previous reduced test case just showed a minor codegen regression, this test now shows the actual miscompilation	2023-10-26 10:34:58 +01:00
Piotr Sobczak	ba3d6e0499	[AMDGPU] Rematerialize scalar loads (#68778 ) Extend the list of instructions that can be rematerialized in SIInstrInfo::isReallyTriviallyReMaterializable() to support scalar loads. Try shrinking instructions to remat only the part needed for current context. Add SIInstrInfo::reMaterialize target hook, and handle shrinking of S_LOAD_DWORDX16_IMM to S_LOAD_DWORDX8_IMM as a proof of concept.	2023-10-26 11:34:33 +02:00
Oliver Stannard	339faffd05	Revert "[AArch64] Move SLS later in pass pipeline" The (MF.size() == 0) assert is triggering when building at -O0. Reverting this while I work out what is going wrong. This reverts commit 7e8eccd990d37d2771ca5ad7a84f54c3cfc4a5e1.	2023-10-26 09:50:13 +01:00
Pierre van Houtryve	a1260b5209	[AMDGPU] Use `S_CSELECT` for uniform i1 ext (#69703 ) Solves #59869	2023-10-26 09:57:14 +02:00
Matthew Devereau	18775a4941	[AArch64][SVE2] Use rshrnb for masked stores (#70026 ) This patch is a follow up on https://reviews.llvm.org/D155299. This patch combines add+lsr to rshrnb when 'B' in: C = A + B D = C >> Shift is equal to (1 << (Shift-1)), and the bits in the top half of each vector element are zeroed or ignored, such as in a truncating masked store.	2023-10-26 08:42:25 +01:00
Stanislav Mekhanoshin	4602802240	[AMDGPU] Shrink to SOPK with 32-bit signed literals (#70263 ) A literal like 0xffff8000 is valid to be used as KIMM in a SOPK instruction, but at the moment our checks expect it to be fully sign extended to a 64-bit signed integer. This is not required since all cases which are being shrunk only accept 32-bit operands. We need to sign extend the operand to 64-bit though so it passes the verifier and is properly printed.	2023-10-26 00:26:54 -07:00
Luke Lau	c285b7f513	[RISCV] Add tests for vmadd for VP intrinsics. NFC (#70042 ) We have VP tests for vmacc but not vmadd. This copies the vmacc tests but swaps the false operand of vp.merge to be the multiplicand instead of the addend. This shows how we could fold the vmerge into the vmadd's mask if we commuted %a and %b.	2023-10-26 07:59:25 +01:00
KAWASHIMA Takahiro	926173c614	[AArch64] Prevent argument promotion of vector with size > 128 bits (#70034 ) This patch prevents argument promotion from promoting pointers to fixed-length vector types larger than 128 bits like `<8 x float>` into the values of the pointees. Such vector types are used for SVE VLS but there is no ABI for SVE VLS arguments and the backend cannot lower such value arguments. Fixes #69147	2023-10-26 14:51:20 +09:00
Craig Topper	109aa586f0	[RISCV] Add an experimental pseudoinstruction to represent a rematerializable constant materialization sequence. (#69983 ) Rematerialization during register allocation is currently limited to a single instruction with no inputs. This patch introduces a pseudoinstruction that represents the materialization of a constant. I've started with a sequence of 2 instructions for now, which covers at least the common LUI+ADDI(W) case. This instruction will be expanded into real instructions immediately after register allocation using a new pass. This gives the post-RA scheduler a chance to separate the 2 instructions to improve ILP. I believe this matches the approach used by AArch64. Unfortunately, this loses some CSE opportunities when an LUI value is used by multiple constants with different LSBs. This feature is off by default and a new backend command line option is added to enable it for testing. This avoids the spill and reloads reported in #69586.	2023-10-25 17:20:32 -07:00
Craig Topper	716c0220f2	[RISCV][GISel] Add instruction selection for G_FADD/G_FSUB/G_FMUL/G_FDIV with F/D extensions. (#69808 )	2023-10-25 13:37:34 -07:00
Evgenii Kudriashov	0a8f54c3fe	[X86][GlobalISel] Add legalization of 64-bit G_ICMP for i686 (#69478 )	2023-10-25 22:30:42 +02:00
Craig Topper	c2b64dfaa4	[RISCV][GISel] Add regbank selection for G_FADD/G_FSUB/G_FMUL/G_FDIV with F/D extensions. (#69805 ) This includes the plumbing for ValueMapping and PartialMapping.	2023-10-25 12:48:17 -07:00
Craig Topper	da1736eba6	[RISCV][GISel] Add legalizer support for G_FADD/G_FSUB/G_FMUL/G_FDIV with F/D extensions. (#69804 ) This is a simple patch to get initial FP support started.	2023-10-25 12:40:43 -07:00
Craig Topper	d32e801d74	[RISCV][GISel] Add FP calling convention support (#69138 ) This includes support for using GPRs, FPRs, and stack.	2023-10-25 12:30:03 -07:00
Matthias Braun	94aaaf4fb4	Update m68k tests to new block placement e3cf80c5c1fe55efd8216575ccadea0ab087e79c affected block placement of some tests in the experimental m68k target. This updates them.	2023-10-25 11:33:56 -07:00
Craig Topper	7fde4ffbd3	[Mips][GISel] Fix a couple issues with passing f64 in 32-bit GPRs. (#69131 ) MipsIncomingValueHandler::assignCustomValue should return 1 instead of 2. The return value is the number of additional ArgLocs being consumed. It's assumed that at least 1 is consumed. Correct the LocVT used for the spill when there are no registers left. It should be f64 instead of i32. This allows a workaround to be removed in the SelectionDAG path.	2023-10-25 11:28:22 -07:00
Mircea Trofin	c362cc2705	[mlgo][regalloc] Fix reference file post e3cf80c	2023-10-25 11:23:15 -07:00
Craig Topper	674b53d1a4	[RISCV][GISel] Add widenScalarToNextPow2 to G_SEXTLOAD/G_ZEXTLOAD legalization. This fixes i8->i48 on RV64.	2023-10-25 11:20:18 -07:00
Craig Topper	8efd6799f0	[RISCV][GISel] Add clampScalar G_ZEXTLOAD/G_SEXTLOAD legalization rules. This fixes i8->i16 on RV32/RV64 and i8/i16/i32->i64 on RV64.	2023-10-25 10:23:55 -07:00
Simon Pilgrim	c60bd0e578	[X86] Regenerate select-mmx.ll Change i686 check-prefix to the more standard X86 instead of I32	2023-10-25 18:10:51 +01:00
Christudasan Devadasan	7ce613fc77	[AMDGPU] Cleanup hasUnwantedEffectsWhenEXECEmpty function (#70206 ) The readlane & writelane instructions don't really depend on the EXEC mask and they should return false from here.	2023-10-25 22:10:16 +05:30
Min-Yih Hsu	e696379c0d	[RISCV][GISel] Falling back to SDISel for scalable vector type values (#70133 ) This patch also tests the fallback of unsupported formal arguments.	2023-10-25 09:02:34 -07:00
Simon Pilgrim	c9c9bf0f20	[DAG] WidenVectorOperand - add basic handling for *_EXTEND_VECTOR_INREG nodes Fixes Issue #70208	2023-10-25 16:52:15 +01:00

52796 Commits