llvm-project

Author	SHA1	Message	Date
David Sherwood	007917b95c	[MVE] Fold fadd(select(..., +0.0)) into a predicated fadd We already have patterns for matching fadd(select(..., -0.0)), but an upcoming patch will lead to patterns using +0.0 as the identity instead of -0.0. I'm adding support for these patterns now to avoid any regressions for MVE. Differential Revision: https://reviews.llvm.org/D127275	2022-06-10 11:09:55 +01:00
Nikita Popov	c10921fa1a	[CGP] Also freeze ctlz/cttz operand when despeculating D125887 changed the ctlz/cttz despeculation transform to insert a freeze for the introduced branch on zero. While this does fix the "branch on poison" issue, we may still get in trouble if we pick a different value for the branch and for the ctz argument (i.e. non-zero for the branch, but zero for the ctz). To avoid this, we should use the same frozen value in both positions. This does cause a regression in RISCV codegen by introducing an additional sext. The DAG looks like this: t0: ch = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %3 t4: i64 = AssertSext t2, ValueType:ch:i32 t23: i64 = freeze t4 t9: ch = CopyToReg t0, Register:i64 %0, t23 t16: ch = CopyToReg t0, Register:i64 %4, Constant:i64<32> t18: ch = TokenFactor t9, t16 t25: i64 = sign_extend_inreg t23, ValueType:ch:i32 t24: i64 = setcc t25, Constant:i64<0>, seteq:ch t28: i64 = and t24, Constant:i64<1> t19: ch = brcond t18, t28, BasicBlock:ch<cond.end 0x8311f68> t21: ch = br t19, BasicBlock:ch<cond.false 0x8311e80> I don't see a really obvious way to improve this, as we can't push the freeze past the AssertSext (which may produce poison). Differential Revision: https://reviews.llvm.org/D126638	2022-06-10 09:46:10 +02:00
Jay Foad	6c372daa84	[AMDGPU] New GFX11 intrinsic llvm.amdgcn.s.sendmsg.rtn Add new intrinsic and codegen support for the s_sendmsg_rtn_b32 and s_sendmsg_rtn_b64 instructions. Differential Revision: https://reviews.llvm.org/D127315	2022-06-10 08:15:23 +01:00
Jay Foad	b0a3849439	[AMDGPU] Update dlc usage for GFX11 In GFX10 dlc controlled L1 cache bypass. In GFX11 it has been repurposed to control MALL NOALLOC, and glc controls L1 as well as L0 cache bypass. Update the documentation and SIMemoryLegalizer accordingly. Set dlc for nontemporal and volatile accesses. Differential Revision: https://reviews.llvm.org/D127405	2022-06-10 08:10:34 +01:00
Yeting Kuo	f68cad9087	[RISCV] Lower VLEFF/VLSEGFF SDNodes to MachineInstrs with VL outputs. This patch replaces D125199. Giving PseudoReadVL a vtype raises the concern of computing the same vtypes for VLEFF/VLSEGFF in two different places, DAGToDAG and InsertVSETVLI. A VLEFF/VLSEGFF MI with a VL output can still provide the vtype of VLEFF/VLSEGFF to the users of its VL. The patch names each new pseudo after the original VLEFF/VLSEGFF name with a "_VL" suffix and expands them in the RISCVInsertVSETVLI pass. This patch also reverts commit 4537aae0d57e17c217c192d8977012ba475b130c, "[RISCV] Make PseudoReadVL have the vtypes of the corresponding VLEFF/VLSEGFF.". Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126794	2022-06-10 13:57:10 +08:00
Craig Topper	8bbcb98848	[RISCV] Teach RISCVMergeBaseOffset about cases where we use SHXADD to add some immediates. For an addition with simm14 and simm15 immediates with 2 or 3 trailing zero bits, we can use a shXadd instruction and an addi to do the addition. This patch teaches RISCVMergeBaseOffset to see through this pattern. I don't think the sh1add case occurs because we use two addis for that, but I implemented it for completeness. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127376	2022-06-09 16:07:35 -07:00
Jay Foad	ffe86e3bdd	[AMDGPU] Update SIInsertHardClauses for GFX11 Changes for GFX11: - Clauses may not mix instructions of different types, and there are more types. For example image instructions with and without a sampler are now different types. - The max size of a clause is explicitly documented as 63 instructions. Previously it was implicitly assumed to be 64. This is such a tiny difference that it does not seem worth making it conditional on the subtarget. - It can be beneficial to clause stores as well as loads. Differential Revision: https://reviews.llvm.org/D127391	2022-06-09 21:29:56 +01:00
Simon Pilgrim	72a049d778	[X86][AVX2] LowerINSERT_VECTOR_ELT - support v4i64 insertion as BLENDI(X, SCALAR_TO_VECTOR(Y))	2022-06-09 21:18:10 +01:00
Stanislav Mekhanoshin	23db8e4b43	[AMDGPU] Use v_mad_u64_u32 for IMAD32 Nic Curtis ran experiments showing that it is faster than a separate mul and add. Fixes: SWDEV-332806 Differential Revision: https://reviews.llvm.org/D127253	2022-06-09 11:39:49 -07:00
Stanislav Mekhanoshin	5c974d086c	[AMDGPU] Fix hazard handling of v_cmpx to permlane - VOP3 and SDWA forms of V_CMPX were not handled - Hazard only exists if the compare defines EXEC (i.e. V_CMPX) forwarded to the permlane. Differential Revision: https://reviews.llvm.org/D127344	2022-06-09 10:33:54 -07:00
Ahmed Bougacha	c68b469e07	[AArch64][SVE] Don't crash on pre-legalizer types in extload combine. This was assuming the vector types were MVTs, but they don't have to be. Note that the concrete output of the test isn't very useful, since it's dominated by nonsensical calling convention lowering for the weird types. Differential Revision: https://reviews.llvm.org/D126505	2022-06-09 10:33:21 -07:00
Kito Cheng	cfa463fdc6	[RISCV][NFC] Update testcase for D126861	2022-06-10 00:18:02 +08:00
Kito Cheng	4b11f90903	[RISCV] Fix missing stack pointer recover To make sure the stack pointer is correct throughout the EH region, we also need to restore the stack pointer from the frame pointer if we don't reserve stack space for outgoing arguments within the prologue/epilogue. Normally, checking whether a variable-sized object is present is enough, but we also don't reserve that space in the prologue/epilogue when there are vector objects on the stack. Example to show what happened:
```
try {
  sp adjust for outgoing args.  // 1. Sp changed.
  func_call                     // 2. Exception raised
  sp restore                    // Oh, not restored
} catch {
  // 3. And now we are here.
}

// 4. Prepare to return!, restore return address from stack, but...sp is wrong.
// 5. Screw up!
```
Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D126861	2022-06-09 23:38:50 +08:00
Kito Cheng	8b3426569e	[RISCV] Pre-commit testcase for PR55442 The testcase shows that the stack pointer isn't recovered when we get an exception from `_Z3fooiiiiiiiiiiPi`, and then things go wrong because the return address is restored from the wrong stack pointer. NOTE: Trigger conditions:
1. Frame pointer is required.
2. Stack has outgoing arguments.
3. Vector extension is enabled.
Another runnable testcase:
$ clang++ -target riscv64-unknown-linux-gnu -march=rv64gcv test.cpp
```
void __attribute__((noinline)) foo(int, int, int, int, int, int, int, int, int, int, int) {
  throw int(0);
}

int main(int argc, char **argv) {
  int exception_value = 1;
  try {
    foo(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
  } catch (int i) {
    exception_value = i;
  }
  return exception_value;
}
```
Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D126860	2022-06-09 23:35:38 +08:00
Jay Foad	a3fc8adb7e	[AMDGPU] Add GFX11 test coverage for the memory legalizer	2022-06-09 15:35:56 +01:00
Simon Pilgrim	7dbfcfa735	[DAG] combineInsertEltToShuffle - if EXTRACT_VECTOR_ELT fails to match an existing shuffle op, try to replace an undef op if there is one. This should fix a number of shuffle regressions in D127115 where the re-ordered combines mean we fail to fold an EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT sequence into a BUILD_VECTOR if we extract from more than one vector source.	2022-06-09 14:56:14 +01:00
Nicolai Hähnle	264d1136f9	AMDGPU/GISel: Introduce custom legalization of G_MUL The generic legalizer framework is still used to reduce the problem to scalar multiplication with the bit size a multiple of 32. Generating optimal code sequences for big integer multiplication is somewhat tricky and has a number of target-specific intricacies:
- The target has V_MAD_U64_U32 instructions that multiply two 32-bit factors and add a 64-bit accumulator. Most partial products should use this instruction.
- The accumulator is mapped to consecutive 32-bit GPRs, and partial-product multiply-adds can feed the accumulator into each other directly. (The register allocator's support for that is somewhat limited, but that only matters for 128-bit integers and larger.)
- OTOH, on some hardware, V_MAD_U64_U32 requires the accumulator to be stored in an even-aligned pair of GPRs. To avoid excessive register copies, it makes sense to compute odd partial products separately from even partial products (where a partial product src0[j0] * src1[j1] is "odd" if j0 + j1 is odd) and add both halves together as a final step.
- We can combine G_MUL+G_ADD into a single cascade of multiply-adds.
- The target can keep many carry bits in flight simultaneously, so combining carries using G_UADDE is preferable to G_ZEXT + G_ADD.
- Not addressed by this patch: When the factors are sign-extended, the V_MAD_I64_I32 instruction (signed version!) can be used.
It is difficult to address these points generically:
1) Finding matching pairs of G_MUL and G_UMULH to find a wide multiply is expensive. We could add a G_UMUL_LOHI generic instruction and conditionally use that in the generic legalizer, but by itself this wouldn't allow us to use the accumulation capability of V_MAD_U64_U32. One could attempt to find matching G_ADD + G_UADDE post-legalization, but this is also expensive.
2) Similarly, making sense of the legalization outcome of a wide pre-legalization G_MUL+G_ADD pair is extremely expensive.
3) How could the generic legalizer possibly deal with the particular idiosyncrasy of "odd" vs. "even" partial products?
All this points in the direction of directly emitting an ideal code sequence during legalization, but the generic legalizer should not be burdened with such overly target-specific concerns. Hence, a custom legalization.
Note that the implemented approach is different from that used by SelectionDAG because narrowing of scalars works differently in general. SelectionDAG iteratively cuts wide scalars into low and high halves until a legal size is reached. By contrast, GlobalISel does the narrowing in a single shot, which should be better for compile-time and for the quality of the generated code.
This patch leaves three gaps open:
1. When the factors are uniform, we should execute the multiplication on the SALU. Register bank mapping already ensures this. However, the resulting code sequence is not optimal because it doesn't fully use the carry-in capabilities of S_ADDC_U32. (V_MAD_U64_U32 doesn't have a carry-in.) It is very difficult to fix this after the fact, so we should really use a different legalization sequence in this case. Unfortunately, we don't have a divergence analysis and so cannot make that choice. (This only matters for 128-bit integers and larger.)
2. Avoid unnecessary multiplies when sources are known to be zero- or sign-extended. The challenge is that the legalizer does not currently have access to GISelKnownBits.
3. When the G_MUL is followed by a G_ADD, we should consider combining the two instructions into a single multiply-add sequence to fully utilize the accumulator of V_MAD_U64_U32 (unless the multiply has multiple uses and the implied duplication of the multiply is an overall negative). However, this is also not true when the factors are uniform: in that case, it is generally better not to combine the two operations, so that the multiply can be done on the SALU.
Again, we don't have a divergence analysis available and so cannot make an informed choice. Differential Revision: https://reviews.llvm.org/D124844	2022-06-09 13:38:56 +02:00
Denis Antrushin	e99e821ce8	[FixupStatepoints] Precommit test for D127308. NFC	2022-06-09 17:04:48 +07:00
Sam Parker	447c411fef	[ARM][ParallelDSP] Fix self reference bug Ensure we don't generate a smlad intrinsic that takes itself as an argument. Differential Revision: https://reviews.llvm.org/D127213	2022-06-09 09:10:57 +00:00
Guillaume Chatelet	dc3367970e	[SelectionDAG] Handle bzero/memset libcalls globally instead of per target Differential Revision: https://reviews.llvm.org/D127279	2022-06-09 08:34:55 +00:00
Lian Wang	91e31fd205	[RISCV][VP] Add fp test of widen and split for vp.setcc Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D127079	2022-06-09 08:14:12 +00:00
Lian Wang	362a02dabe	[RISCV][test] Add widen STEP_VECTOR tests. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127371	2022-06-09 07:47:04 +00:00
Craig Topper	4bcfc41846	[SelectionDAG] Teach computeKnownBits that an nsw self multiply produces a non-negative value. This matches what we do in IR. For the RISC-V test case, this allows us to use -8 for the AND mask instead of materializing a constant in a register. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D127335	2022-06-08 14:55:58 -07:00
Simon Pilgrim	14d50df272	[AMDGPU] Regenerate combine-cond-add-sub.ll	2022-06-08 21:10:12 +01:00
Florian Mayer	0593ce5f0b	[MC] Add 'G' to augmentation string for MTE instrumented functions This was agreed on in https://lists.llvm.org/pipermail/llvm-dev/2020-May/141345.html The thread proposed two options:
* add a character to the augmentation string and handle it in libunwind
* use a separate personality function.
It was determined that the first is the simpler and better option. This is part of ARM's AArch64 ABI: https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst#id22 The next step after this is teaching libunwind to untag when this augmentation character is set. Reviewed By: MaskRay, eugenis Differential Revision: https://reviews.llvm.org/D127007	2022-06-08 12:36:32 -07:00
Kai Nacke	d897a14c2e	[SystemZ] Fix check for zero size when lowering memcmp. During lowering of memcmp/bcmp, the check for a size of 0 is done in 2 different ways. In rare cases this can lead to a crash in SystemZSelectionDAGInfo::EmitTargetCodeForMemcmp(). The root cause is that SelectionDAGBuilder::visitMemCmpBCmpCall() checks for a constant int value which is not yet evaluated. When the value is turned into an SDValue, the evaluation is done and results in a ConstantSDNode. But EmitTargetCodeForMemcmp() expects the special case of 0 length to be handled, which results in an assertion failure. The fix is to turn the value into an SDValue, so that both functions use the same check. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D126900	2022-06-08 14:52:13 -04:00
Thomas Lively	aff679a48c	[WebAssembly] Implement remaining relaxed SIMD instructions Add codegen, intrinsics, and builtins for the i16x8.relaxed_q15mulr_s, i16x8.dot_i8x16_i7x16_s, and i32x4.dot_i8x16_i7x16_add_s instructions. These are the last instructions from the relaxed SIMD proposal[1] that had not been implemented. [1]: https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md. Differential Revision: https://reviews.llvm.org/D127170	2022-06-08 10:32:10 -07:00
Simon Pilgrim	d6bb577ffb	[X86] Regenerate slow-pmulld.ll with common SSE check prefixes Add back some unused check prefixes to simplify the D127115 regeneration	2022-06-08 18:23:25 +01:00
David Green	a1aef4f374	[AArch64] Remove ToBeRemoved from AArch64MIPeepholeOpt ToBeRemoved is used to remove any MachineInstructions that are no longer needed, making sure we don't invalidate the iterator that is currently in use by erasing the instruction straight away. This causes problems for keeping the code in SSA form, though, as subsequent transforms that require SSA form may have been broken by previous peepholes. If we instead use make_early_inc_range, the iteration issue shouldn't be present, so long as we do not remove the subsequent instruction in the peephole optimizations. That way the code is kept in SSA form between transforms, hopefully meaning fewer things can go wrong. Differential Revision: https://reviews.llvm.org/D127296	2022-06-08 17:26:07 +01:00
Simon Pilgrim	26053cddb4	[WebAssembly] Regenerate simd-build-vector.ll to show full codegen	2022-06-08 16:54:26 +01:00
Craig Topper	e4ba24c17d	[RISCV] Support (addi (addi globaladdr, C1), C2) in RISCVMergeBaseOffset. Adds with immediates in the range [-4096, -2049] or [2048, 4095] get converted to two ADDIs. Teach RISCVMergeBaseOffset to recognize this pattern as well. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D126843	2022-06-08 08:20:37 -07:00
Craig Topper	33f4da2455	[RISCV] Support LUI+ADDIW in RISCVMergeBaseOffsetOpt::matchLargeOffset. LUI+ADDIW always produces a simm32. This allows us to always fold it into a global offset. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D126729	2022-06-08 08:19:21 -07:00
Jonas Paulsson	88c1cd86ee	[SystemZ] Use STDY/STEY/LDY/LEY for VR32/VR64 in eliminateFrameIndex(). When e.g. a VR64 register is spilled to a stack slot requiring a long (20-bit) displacement, it is possible to use an FP opcode if the allocated phys reg allows it. This eliminates the use of a separate LAY instruction. Reviewed By: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D115406	2022-06-08 17:10:31 +02:00
David Green	33ead6e444	[AArch64] Add tests for bitcast high register extracts. NFC	2022-06-08 15:26:31 +01:00
Shao-Ce SUN	862f30a428	[RISCV] Add ISD::EH_DWARF_CFA Based on D24038. LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible __builtin_dwarf_cfa() builtin. Reviewed By: StephenFan Differential Revision: https://reviews.llvm.org/D126181	2022-06-08 22:03:30 +08:00
Kito Cheng	6a6f632b93	Revert "[RISCV] Testcase to show wrong register allocation result of subreg liveness" Reverted because it failed with LLVM_ENABLE_EXPENSIVE_CHECKS. This reverts commit cbe22c794348a1962af8a5d21fbedbb65974d94c.	2022-06-08 21:19:27 +08:00
Simon Pilgrim	27f970aac8	[Hexagon] Regenerate build-vector-v4i8-zext.ll to show full codegen	2022-06-08 11:50:49 +01:00
Paul Walker	d88354213c	[SelectionDAG] Remove invalid TypeSize conversion from PromoteIntRes_BITCAST. Extend the TypeWidenVector case of PromoteIntRes_BITCAST to work with TypeSize directly rather than silently casting to unsigned. To accomplish this I've extended TypeSize with an interface that essentially allows TypeSize division when both operands have the same number of dimensions. There still exist combinations of scalable vector bitcasts that cause compiler crashes. I call these out by adding "is missing" entries to sve-bitcast. Depends on D126957. Fixes: #55114 Differential Revision: https://reviews.llvm.org/D127126	2022-06-08 10:30:07 +01:00
Paul Walker	a1121c31d8	[SVE] Fix incorrect code generation for bitcasts of unpacked vector types. Bitcasting between unpacked scalable vector types of different element counts is not a NOP because the live elements are laid out differently.
               01234567
e.g. nxv2i32 = XX??XX??
     nxv4f16 = X?X?X?X?
Differential Revision: https://reviews.llvm.org/D126957	2022-06-08 10:30:07 +01:00
Kito Cheng	7207373e1e	Revert "[SplitKit] Handle early clobber + tied to def correctly" Reverted because it failed with LLVM_ENABLE_EXPENSIVE_CHECKS. This reverts commit e14d04909df4e52e531f6c2e045c3cf9638dd817.	2022-06-08 13:05:35 +08:00
python3kgae	12ca031b0d	[DirectX][Fail crash in DXILPrepareModule pass when input has typed ptr. Check supportsTypedPointers instead of hasSetOpaquePointersValue when query if has typed ptr. Reviewed By: beanz Differential Revision: https://reviews.llvm.org/D127268	2022-06-07 21:11:24 -07:00
Kito Cheng	e14d04909d	[SplitKit] Handle early clobber + tied to def correctly The splitter tries to extend a live range into the `r` slot for a use operand, which works in most situations; however, it does not work correctly when the operand is tied to a def and the def operand is early-clobber. An example to demonstrate what's wrong:
0 %0 = ...
16 early-clobber %0 = Op %0 (tied-def 0), ...
32 ... = Op %0
Before extend: %0 = [0r, 0d) [16e, 32d)
In this case the point we want to extend is 0d to 16e, not 16r; if we use 16r here we extend nothing, because that point is already contained in [16e, 32d). This patch adds a check to detect such cases and adjusts the extension point. Detailed explanation for testcase: https://reviews.llvm.org/D126047 Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D126048	2022-06-08 11:33:05 +08:00
Kito Cheng	cbe22c7943	[RISCV] Testcase to show wrong register allocation result of subreg liveness This testcase shows that the live range isn't constructed correctly when subreg liveness is enabled. In the testcase `early-clobber-tied-def-subreg-liveness.ll`, the first operand of `vsext.vf2 v8, v16, v0.t` is both a def and a use, and the use comes from the memory location of `.L__const._Z3foov.var_49`: it is loaded and spilled to the stack, and then... v8 is overwritten by other instructions.
```
lui a0, %hi(.L__const._Z3foov.var_49)
addi a0, a0, %lo(.L__const._Z3foov.var_49)
...
vle16.v v8, (a0) # Load value from var_49
...
addi a0, sp, 16
...
vs2r.v v8, (a0) # Spill
...
vl2r.v v8, (a1) # Reload
...
lui a0, %hi(.L__const._Z3foov.var_40)
addi a0, a0, %lo(.L__const._Z3foov.var_40)
vle16.v v8, (a0) # Load value...into v8???
vmsbc.vx v0, v8, a0 # And use that.
...
vsext.vf2 v8, v16, v0.t # But v8 is here...which is expected to hold the value from the reload
```
The `early-clobber-tied-def-subreg-liveness.mir` test has more detailed information: `%25.sub_vrm2_0` is defined at 64, used at 464, and defined again at 464, and an inline asm is used to clobber all vector registers to trigger the splitter.
```
0B bb.0.entry:
16B %0:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_49
32B %1:gpr = ADDI %0:gpr, target-flags(riscv-lo) @__const._Z3foov.var_49
48B dead $x0 = PseudoVSETIVLI 2, 73, implicit-def $vl, implicit-def $vtype
64B undef %25.sub_vrm2_0:vrn4m2nov0 = PseudoVLE16_V_M2 %1:gpr, 2, 4, implicit $vl, implicit $vtype
80B %3:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_48
96B %4:gpr = ADDI %3:gpr, target-flags(riscv-lo) @__const._Z3foov.var_48
112B %5:vr = PseudoVLE8_V_M1 %4:gpr, 2, 3, implicit $vl, implicit $vtype
128B %6:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_46
144B %7:gpr = ADDI %6:gpr, target-flags(riscv-lo) @__const._Z3foov.var_46
160B %25.sub_vrm2_1:vrn4m2nov0 = PseudoVLE16_V_M2 %7:gpr, 2, 4, implicit $vl, implicit $vtype
176B %9:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_45
192B %10:gpr = ADDI %9:gpr, target-flags(riscv-lo) @__const._Z3foov.var_45
208B %25.sub_vrm2_2:vrn4m2nov0 = PseudoVLE16_V_M2 %10:gpr, 2, 4, implicit $vl, implicit $vtype
224B INLINEASM &"" [sideeffect] [attdialect], $0:[clobber], ...
240B %12:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_44
256B %13:gpr = ADDI %12:gpr, target-flags(riscv-lo) @__const._Z3foov.var_44
272B dead $x0 = PseudoVSETIVLI 2, 73, implicit-def $vl, implicit-def $vtype
288B %25.sub_vrm2_3:vrn4m2nov0 = PseudoVLE16_V_M2 %13:gpr, 2, 4, implicit $vl, implicit $vtype
304B $x0 = PseudoVSETIVLI 2, 73, implicit-def $vl, implicit-def $vtype
320B %16:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_40
336B %17:gpr = ADDI %16:gpr, target-flags(riscv-lo) @__const._Z3foov.var_40
352B %18:vrm2 = PseudoVLE16_V_M2 %17:gpr, 2, 4, implicit $vl, implicit $vtype
368B $x0 = PseudoVSETIVLI 2, 73, implicit-def $vl, implicit-def $vtype
384B %20:gpr = LUI 1048572
400B %21:gpr = ADDIW %20:gpr, 928
416B early-clobber %22:vr = PseudoVMSBC_VX_M2 %18:vrm2, %21:gpr, 2, 4, implicit $vl, implicit $vtype
432B $x0 = PseudoVSETIVLI 2, 9, implicit-def $vl, implicit-def $vtype
448B $v0 = COPY %22:vr
464B early-clobber %25.sub_vrm2_0:vrn4m2nov0 = PseudoVSEXT_VF2_M2_MASK %25.sub_vrm2_0:vrn4m2nov0(tied-def 0), %5:vr, killed $v0, 2, 4, 0, implicit $vl, implicit $vtype
480B %26:gpr = LUI target-flags(riscv-hi) @var_47
496B %27:gpr = ADDI %26:gpr, target-flags(riscv-lo) @var_47
512B PseudoVSSEG4E16_V_M2 %25:vrn4m2nov0, %27:gpr, 2, 4, implicit $vl, implicit $vtype
528B PseudoRET
```
When the splitter tries to split %25:
```
selectOrSplit VRN4M2NoV0:%25 [64r,160r:4)[160r,208r:0)[208r,288r:1)[288r,464e:2)[464e,512r:3) 0@160r 1@208r 2@288r 3@464e 4@64r L0000000000000030 [160r,512r:0) 0@160r L00000000000000C0 [208r,512r:0) 0@208r L0000000000000300 [288r,512r:0) 0@288r L000000000000000C [64r,464e:1)[464e,512r:0) 0@464e 1@64r weight:1.179245e-02 w=1.179245e-02
```
```
Best local split range: 64r-208r, 6.999861e-03, 3 instrs
enterIntvBefore 64r: not live
leaveIntvAfter 208r: valno 1
useIntv [64B;216r): [64B;216r):1
blit [64r,160r:4): [64r;160r)=1(%29)(recalc)
blit [160r,208r:0): [160r;208r)=1(%29)(recalc)
blit [208r,288r:1): [208r;216r)=1(%29)(recalc) [216r;288r)=0(%28)(recalc)
blit [288r,464e:2): [288r;464e)=0(%28)(recalc)
blit [464e,512r:3): [464e;512r)=0(%28)(recalc)
rewr %bb.0 464e:0 early-clobber %28.sub_vrm2_0:vrn4m2nov0 = PseudoVSEXT_VF2_M2_MASK %25.sub_vrm2_0:vrn4m2nov0(tied-def 0), %5:vr, $v0, 2, 4, 0, implicit $vl, implicit $vtype
rewr %bb.0 288r:0 %28.sub_vrm2_3:vrn4m2nov0 = PseudoVLE16_V_M2 %13:gpr, 2, 4, implicit $vl, implicit $vtype
rewr %bb.0 208r:1 %29.sub_vrm2_2:vrn4m2nov0 = PseudoVLE16_V_M2 %10:gpr, 2, 4, implicit $vl, implicit $vtype
rewr %bb.0 160r:1 %29.sub_vrm2_1:vrn4m2nov0 = PseudoVLE16_V_M2 %7:gpr, 2, 4, implicit $vl, implicit $vtype
rewr %bb.0 64r:1 undef %29.sub_vrm2_0:vrn4m2nov0 = PseudoVLE16_V_M2 %1:gpr, 2, 4, implicit $vl, implicit $vtype
rewr %bb.0 464B:0 early-clobber %28.sub_vrm2_0:vrn4m2nov0 = PseudoVSEXT_VF2_M2_MASK %28.sub_vrm2_0:vrn4m2nov0(tied-def 0), %5:vr, $v0, 2, 4, 0, implicit $vl, implicit $vtype
rewr %bb.0 512B:0 PseudoVSSEG4E16_V_M2 %28:vrn4m2nov0, %27:gpr, 2, 4, implicit $vl, implicit $vtype
rewr %bb.0 216B:1 undef %28.sub_vrm1_0_sub_vrm1_1_sub_vrm1_2_sub_vrm1_3_sub_vrm1_4_sub_vrm1_5:vrn4m2nov0 = COPY %29.sub_vrm1_0_sub_vrm1_1_sub_vrm1_2_sub_vrm1_3_sub_vrm1_4_sub_vrm1_5:vrn4m2nov0
queuing new interval: %28 [216r,288r:0)[288r,464e:1)[464e,512r:2) 0@216r 1@288r 2@464e L000000000000000C [216r,216d:0)[464e,512r:1) 0@216r 1@464e L0000000000000300 [288r,512r:0) 0@288r L00000000000000C0 [216r,512r:0) 0@216r L0000000000000030 [216r,512r:0) 0@216r weight:8.706897e-03
Enqueuing %28
queuing new interval: %29 [64r,160r:0)[160r,208r:1)[208r,216r:2) 0@64r 1@160r 2@208r L000000000000000C [64r,216r:0) 0@64r L00000000000000C0 [208r,216r:0) 0@208r L0000000000000030 [160r,216r:0) 0@160r weight:1.097826e-02
Enqueuing %29
```
The live range of the first subregister part of %25 becomes [216r,216d:0)[464e,512r:1); however, that first live range should extend until 464e rather than being only [216r,216d:0). The register allocator then produces a wrong allocation based on this live range information.
Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D126047	2022-06-08 11:27:24 +08:00
David Green	bccbf5276e	[AArch64] Remove isDef32 isDef32 would attempt to make a guess at which SelectionDag nodes were 32bit sources, and use the nature of 32bit AArch64 instructions implicitly zeroing the upper register half to not emit zext that were expected to already be zero. This was a bit fragile though, needing to guess at the correct opcodes that do not become 32bit defs later in ISel. This patch removes isDef32, relying on the AArch64MIPeephole optimizer to remove redundant SUBREG_TO_REG nodes. A part of SelectArithExtendedRegister was left with the same logic as a heuristic to prevent some regressions from it picking less optimal sequences. The AArch64MIPeepholeOpt pass also needs to be taught that a COPY from a FPR will become a FMOVSWr, which it lowers immediately to make sure that remains true through register allocation. Fixes #55833 Differential Revision: https://reviews.llvm.org/D127154	2022-06-07 18:57:59 +01:00
Simon Pilgrim	a083f3caa1	[DAG] combineShuffleOfSplatVal - fold shuffle(splat,undef) -> splat, iff the splat contains no UNDEF elements As noticed on D127115 - we were missing this fold, instead just having the shuffle(shuffle(x,undef,splatmask),undef) fold. We should be able to merge these into one using SelectionDAG::isSplatValue, but we'll need to match the shuffle's undef handling first. This also exposed an issue in SelectionDAG::isSplatValue which was incorrectly propagating the undef mask across a bitcast (it was trying to just bail with a APInt::isSubsetOf if it found any undefs but that was actually the wrong way around so didn't fire for partial undef cases).	2022-06-07 16:42:24 +01:00
Craig Topper	0c66deb498	[RISCV] Scalarize gather/scatter on RV64 with Zve32* extension. i64 indices aren't supported on Zve32*. Scalarize gathers to prevent generating illegal instructions. Since InstCombine will aggressively canonicalize GEP indices to pointer size, we're pretty much always going to have an i64 index. Trying to predict when SelectionDAG will find a smaller index from the TTI hook used by the ScalarizeMaskedMemIntrinPass seems fragile. To optimize this we probably need an IR pass to rewrite it earlier. Test RUN lines have also been added to make sure the strided load/store optimization still works. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127179	2022-06-07 08:07:50 -07:00
Matt Arsenault	56303223ac	llvm-reduce: Don't assert on functions which don't track liveness Use the query that doesn't assert if TracksLiveness isn't set, which needs to always be available. We also need to start printing liveins regardless of TracksLiveness.	2022-06-07 10:00:25 -04:00
Matt Arsenault	22cc497502	AMDGPU: Fix not checking liveness in test	2022-06-07 10:00:25 -04:00
Simon Pilgrim	f5507978a3	[X86] getFauxShuffleMask - add VSELECT/BLENDV handling First step towards enabling shuffle combining starting from VSELECT/BLENDV nodes - this should eventually help improve the codegen reported at Issue #54819	2022-06-07 14:46:25 +01:00
Simon Pilgrim	61984f9199	[X86] x86-interleaved-access.ll - use nounwind to remove cfi noise from tests	2022-06-07 14:46:25 +01:00
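Commit e4ba24c17d relies on the fact that an add immediate just outside the signed 12-bit ADDI range can be split across two ADDIs. The following is a minimal C++ sketch of one such split, assuming we peel off 2047 (or -2048 for negative values) first; `splitIntoAddiPair` is a hypothetical name and this is not the RISCVMergeBaseOffset code. Because two simm12 values sum to at most 2047 + 2047 = 4094, the sketch accepts [2048, 4094] on the positive side.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// True if v fits in a RISC-V I-type signed 12-bit immediate.
constexpr bool isInt12(int64_t v) { return v >= -2048 && v <= 2047; }

// Hypothetical helper: split imm into two simm12 addends (first, second)
// with first + second == imm, so (addi (addi x, first), second) == x + imm.
// Returns std::nullopt when imm is outside the two-ADDI ranges.
std::optional<std::pair<int64_t, int64_t>> splitIntoAddiPair(int64_t imm) {
  bool negRange = imm >= -4096 && imm <= -2049;
  bool posRange = imm >= 2048 && imm <= 4094;  // 2047 + 2047 == 4094
  if (!negRange && !posRange)
    return std::nullopt;
  int64_t first = imm < 0 ? -2048 : 2047;  // largest simm12 step toward imm
  int64_t second = imm - first;
  assert(isInt12(first) && isInt12(second));
  return std::make_pair(first, second);
}
```

Taking the largest possible first step keeps the remainder small, so both halves land in simm12 across the whole accepted range.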
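To illustrate why commit 33f4da2455 can say LUI+ADDIW always produces a simm32, here is a hedged C++ sketch that splits an arbitrary `int32_t` into LUI/ADDIW operands and simulates the two instructions on RV64. The names `splitLuiAddiw` and `materialize` are hypothetical. The simulated semantics are those of the RISC-V ISA: on RV64, LUI sign-extends its 32-bit result, and ADDIW adds and then sign-extends the low 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Operands for "lui rd, hi20; addiw rd, rd, lo12".
struct LuiAddiw {
  uint32_t hi20;  // 20-bit LUI field, in [0, 0xFFFFF]
  int32_t lo12;   // signed 12-bit ADDIW immediate, in [-2048, 2047]
};

// Sign-extend the low 32 bits of x (RV64 *W-instruction semantics).
inline int64_t sext32(uint64_t x) {
  return static_cast<int32_t>(static_cast<uint32_t>(x));
}

LuiAddiw splitLuiAddiw(int32_t imm) {
  int64_t v = imm;
  // Low 12 bits, sign-extended, because ADDIW adds a signed immediate.
  int64_t lo = ((v & 0xFFF) ^ 0x800) - 0x800;
  assert(lo >= -2048 && lo <= 2047);
  // (v - lo) is a multiple of 4096. For INT32_MAX the quotient is 0x80000,
  // one past the signed 20-bit range; that still works because ADDIW wraps.
  int64_t q = (v - lo) / 4096;
  return {static_cast<uint32_t>(q) & 0xFFFFFu, static_cast<int32_t>(lo)};
}

// Simulate the two-instruction sequence on RV64.
int64_t materialize(LuiAddiw p) {
  int64_t rd = sext32(static_cast<uint64_t>(p.hi20) << 12);  // LUI
  rd = sext32(static_cast<uint64_t>(rd + p.lo12));           // ADDIW
  return rd;
}
```

Using ADDIW rather than ADDI matters at the top of the range: for INT32_MAX the LUI result is INT32_MIN, and only the 32-bit wraparound of ADDIW turns INT32_MIN - 1 back into INT32_MAX.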
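Commit 8bbcb98848 describes adding a simm14 or simm15 immediate with 2 or 3 trailing zero bits as an ADDI plus a Zba `shNadd`. The rule can be sketched as follows: a multiple of 8 whose quotient fits simm12 lies in simm15, and a multiple of 4 whose quotient fits simm12 lies in simm14. The sketch assumes `shNadd rd, rs1, rs2` computes `rs2 + (rs1 << N)` (as in the Zba extension). The struct, function names, and the "try sh3add first" order are hypothetical choices, not RISCVMergeBaseOffset's logic.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

constexpr bool isInt12(int64_t v) { return v >= -2048 && v <= 2047; }

// Hypothetical plan for "addi t, zero, scaled; shNadd rd, t, rs",
// which computes rd = rs + (scaled << shamt).
struct ShXAddPlan {
  int shamt;       // 1, 2 or 3 (sh1add / sh2add / sh3add)
  int64_t scaled;  // simm12 value materialized by the ADDI
};

std::optional<ShXAddPlan> planShXAdd(int64_t imm) {
  if (isInt12(imm))
    return std::nullopt;  // a single ADDI already suffices
  for (int shamt = 3; shamt >= 1; --shamt) {
    int64_t scale = int64_t{1} << shamt;
    if (imm % scale == 0 && isInt12(imm / scale))
      return ShXAddPlan{shamt, imm / scale};
  }
  return std::nullopt;
}

// Simulate the sequence; multiplying by 2^shamt equals the left shift
// and avoids shifting a negative value.
int64_t applyPlan(int64_t rs, ShXAddPlan p) {
  assert(p.shamt >= 1 && p.shamt <= 3);
  int64_t t = p.scaled;                     // addi t, zero, scaled
  return rs + t * (int64_t{1} << p.shamt);  // shNadd rd, t, rs
}
```

As the commit notes, the sh1add case is unlikely in practice because the values it covers are already handled by two ADDIs. The sketch keeps it only for completeness.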
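The odd/even split described in commit 264d1136f9 can be illustrated in plain C++. This is a minimal sketch, not the AMDGPU legalizer code. It models a truncating multiply of two equal-width integers stored as 32-bit limbs and routes each 32x32->64 partial product `src0[j0] * src1[j1]` to an "even" or "odd" accumulator by the parity of `j0 + j1`. The two accumulators are added as a final step. Register pairing and hardware carry chains are not modeled, and `addAt` and `mulEvenOdd` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Little-endian 32-bit limbs: limbs[0] is the least significant word.
using Limbs = std::vector<uint32_t>;

// Add the 64-bit value v into acc starting at limb k, propagating carries
// and dropping anything past the top limb (a truncating, same-width multiply).
void addAt(Limbs &acc, std::size_t k, uint64_t v) {
  uint64_t pending = v;
  for (std::size_t i = k; i < acc.size() && pending != 0; ++i) {
    uint64_t s = uint64_t{acc[i]} + (pending & 0xFFFFFFFFu);
    acc[i] = static_cast<uint32_t>(s);
    pending = (pending >> 32) + (s >> 32);
  }
}

// Sum even partial products (j0 + j1 even) and odd ones (j0 + j1 odd)
// into separate accumulators, then add the two halves at the end.
Limbs mulEvenOdd(const Limbs &a, const Limbs &b) {
  assert(a.size() == b.size());
  const std::size_t n = a.size();
  Limbs even(n, 0), odd(n, 0);
  for (std::size_t j0 = 0; j0 < n; ++j0) {
    for (std::size_t j1 = 0; j0 + j1 < n; ++j1) {
      // 32x32->64 product, like the multiply half of V_MAD_U64_U32.
      uint64_t p = uint64_t{a[j0]} * b[j1];
      addAt((j0 + j1) % 2 == 0 ? even : odd, j0 + j1, p);
    }
  }
  // Final step: add the odd accumulator into the even one.
  Limbs result = even;
  uint64_t carry = 0;
  for (std::size_t i = 0; i < n; ++i) {
    uint64_t s = uint64_t{result[i]} + odd[i] + carry;
    result[i] = static_cast<uint32_t>(s);
    carry = s >> 32;
  }
  return result;
}
```

Every even partial product starts at an even limb index k and covers limbs (k, k+1). That property is what allows the real lowering to keep its accumulators in even-aligned register pairs.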
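Commit 4bcfc41846 relies on a simple fact: if `x * x` does not signed-overflow (the `nsw` guarantee), the result is x squared, which is never negative, so its sign bit is known to be zero. The sketch below checks this. It also shows why a known-zero sign bit lets an AND mask such as 0x7FFFFFFFFFFFFFF8 be replaced by -8, which fits a RISC-V ANDI immediate. That mask value is a hypothetical example, not the one from the commit's test, and `__builtin_mul_overflow` is a GCC/Clang builtin.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Square x, returning nullopt on signed overflow. A "mul nsw x, x" promises
// the overflow never happens, so only the non-overflowing case is relevant.
std::optional<int64_t> squareNsw(int64_t x) {
  int64_t r;
  if (__builtin_mul_overflow(x, x, &r))  // GCC/Clang builtin
    return std::nullopt;
  return r;
}

// Hypothetical mask: clear the low 3 bits and the sign bit.
constexpr int64_t kWideMask = INT64_MAX & ~int64_t{7};
static_assert(kWideMask == 0x7FFFFFFFFFFFFFF8, "mask layout");

int64_t andWide(int64_t v) { return v & kWideMask; }
// Same result whenever v's sign bit is zero; -8 fits a 12-bit immediate.
int64_t andSmall(int64_t v) { return v & -8; }
```

The two masks differ only in bit 63. Once known-bits analysis proves that bit is zero in the operand, the cheaper immediate form gives the same result.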

43669 Commits