llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	aeb3c772d3	[X86] Add shift by splat modulo amount vector tests Shows failure to fold zero_extend_vector_inreg(and(x, c)) -> bitcast(and(x,c')) when we're only demanding the 0'th extended element, such as with the SSE variable shift ops.	2021-11-16 20:46:17 +00:00
Philip Reames	8d85e945b2	[SCEV] Canonicalize X - urem X, Y patterns There are multiple possible ways to represent the X - urem X, Y pattern. SCEV was not canonicalizing, and thus, depending on which you analyzed, you could get different results. The sub representation appears to produce strictly inferior results in practice, so I decided to canonicalize to the Y * X/Y version. The motivation here is that runtime unrolling produces the sub X - (and X, Y-1) pattern when Y is a power of two. SCEV is thus unable to recognize that an unrolled loop exits because we don't figure out that the new unrolled step evenly divides the trip count of the unrolled loop. After instcombine runs, we convert to the andn form which SCEV recognizes, so essentially, this is just fixing a nasty pass-ordering dependency. The ARM loop hardware interaction in the test diff is opaque to me, but comments in the review from others with knowledge of the infrastructure appear to indicate these are improvements in loop recognition, not regressions. Differential Revision: https://reviews.llvm.org/D114018	2021-11-16 11:59:21 -08:00
Victor Huang	ae27ca9a67	[PowerPC] PPC backend optimization on conditional trap instructions This patch adds a PPC backend optimization that analyzes the arguments of a conditional trap instruction and does one of the following: 1. deletes it if it never traps; 2. replaces it if it always traps; 3. otherwise keeps it. Reviewed By: nemanjai, amyk, PowerPC Differential revision: https://reviews.llvm.org/D111434	2021-11-16 13:11:57 -06:00
David Sherwood	4607459022	[AArch64] Fix TypeSize->uint64_t implicit conversion in AArch64ISelLowering::hasAndNot For now I've just changed the code to only return true from AArch64ISelLowering::hasAndNot if the vector is fixed-length. Once we have the right patterns or DAG combines to use bic/bif we can also enable this for SVE. Test added here: CodeGen/AArch64/vselect-constants.ll Differential Revision: https://reviews.llvm.org/D113994	2021-11-16 16:25:16 +00:00
Jon Chesterfield	30b29db7c7	[amdgpu] Don't crash on empty global ctor/dtor Global ctor/dtor can be an empty array, which is a Constant not a ConstantArray. The cast<ConstantArray> therefore asserts / crashes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D113800	2021-11-16 14:36:08 +00:00
Kai Luo	c0da8a4e40	[CGP][PowerPC] Pre-commit test case for D113872. NFC.	2021-11-16 09:18:49 +00:00
Serguei Katkov	0ecb12a27f	[STATEPOINT] Force implicit-def for lr register. The STATEPOINT instruction behaves similarly to a call instruction. On AArch64 the BL instruction implicitly defines the lr register, so STATEPOINT should do the same. However, STATEPOINT is a generic pseudo instruction and I could not find a way to override its list of implicit defs for a specific target. So this patch post-processes STATEPOINT insertion by adding an implicit dead def for lr. Reviewers: reames, loicottet, ostannard Reviewed By: reames Subscribers: danilaml, hiraditya, kristof.beyls, llvm-commits, yrouban Differential Revision: https://reviews.llvm.org/D111114	2021-11-16 12:52:00 +07:00
Amara Emerson	dc84770d55	[GlobalISel] Add a store-merging optimization pass and enable for AArch64. This is a first attempt at a constant value consecutive store merging pass, a counterpart to the DAGCombiner's store merging optimization. The high level goals of this pass: * Have a simple and efficient algorithm. As close to linear time as we can get. Thus, prioritizing scalability of the algorithm over merging every corner case we can find. The DAGCombiner's store merging code has been the source of compile time and complexity issues in the past and I wanted to avoid that. * Don't introduce any new data structures for ordering memory operations. In MIR, we don't have the concept of chains like we do in the DAG, and the instruction order is stricter than enforcing ordering with graph edges. Although I considered adding something similar, I couldn't justify the overhead. The pass is currently split into 3 main parts. The main store merging code focuses on identifying candidate stores and managing the candidate group that's under consideration for merging. Analyzing addressing of stores is a potentially complex part and for now there's just a basic implementation to identify easy cases. Finally, the other main bit of complexity is the alias analysis, which tries to follow the same logic as the DAG's AA. Currently this implementation only supports merging of constant stores. Stores of arbitrary variables are technically possible with a very small change, but the DAG chooses not to do this. Doing so here makes most code worse since there's extra overhead in merging values into wider registers. On AArch64 -Os, this optimization results in very minor savings on CTMark. Differential Revision: https://reviews.llvm.org/D109131	2021-11-15 21:10:39 -08:00
Craig Topper	391b0ba603	[RISCV] Override TargetLowering::hasAndNot for Zbb. Differential Revision: https://reviews.llvm.org/D113937	2021-11-15 18:44:07 -08:00
Craig Topper	d90eeab0ed	[RISCV] Add test cases to prepare for overriding TargetLowering::hasAndNot. NFC These test files are copied directly from AArch64. Some of the cases may benefit from ANDN with the Zbb extension. Some cases already improve using ANDN. selectcc-to-shiftand.ll also contains tests that test select->and conversion even when an ANDN isn't needed. I think this improves our coverage of these optimizations. Differential Revision: https://reviews.llvm.org/D113935	2021-11-15 18:44:07 -08:00
Fabian Wolff	b484fa8289	[X86] Fix crash with inline asm using wrong register name Fixes PR#48678. `X86TargetLowering::getRegForInlineAsmConstraint()` can adjust the register class to match the type, e.g. change `VR128X` to `VR256X` if the type needs 256 bits. However, the function currently returns the unadjusted register and the adjusted register class, e.g. `xmm15` and `VR256X`, which then causes an assertion failure later because the register class does not contain that register. This patch fixes this behavior. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D113834	2021-11-16 10:38:12 +08:00
Matt Arsenault	659887b405	AMDGPU: Mark prolog/epilog SCC defs as dead A future change will add SCC liveness checks. Since we are still relying on forward register scavenging, add dead flags to avoid spuriously detecting SCC as live.	2021-11-15 21:35:06 -05:00
Matt Arsenault	e6bfbd7e0d	AMDGPU: Regenerate test checks	2021-11-15 21:35:06 -05:00
Craig Topper	233def40f7	[DAGCombiner] Prevent unfoldMaskedMerge from creating an AND with two inverted inputs. It's possible that the mask is already a NOT, at least if InstCombine hasn't canonicalized the input. In that case we will form an ANDN with X instead of with Y. So we don't need to worry about Y being a constant. We might need to check that X isn't a constant instead, but we don't have a test case for that yet. This fixes a size regression found when trying to enable this combine for RISCV in D113937. Differential Revision: https://reviews.llvm.org/D113948	2021-11-15 17:15:51 -08:00
Ben Shi	4c3d916c4b	[RISCV] Optimize immediate materialisation with SHADD Use LUI+SHADD+ADDI to compose specific immediates. Reviewed By: craig.topper, luismarques Differential Revision: https://reviews.llvm.org/D113568	2021-11-15 23:34:28 +00:00
Ben Shi	39256ed58c	[RISCV][test] Add more tests of immediate materialisation Reviewed By: craig.topper, luismarques Differential Revision: https://reviews.llvm.org/D113567	2021-11-15 23:34:27 +00:00
Nico Weber	833393e021	[asm] Correctly handle special names in variants There's really no reason why anyone should use these special names in a variant. I noticed this while reading the code: all other writes to OS are guarded by this conditional, and the behavior with the check seems more correct, so let's add the check. Differential Revision: https://reviews.llvm.org/D113909	2021-11-15 15:37:09 -05:00
Lei Huang	f50c6c1718	[PowerPC] Fix 32bit vector insert instructions for ISA3.1 The platform-independent ISD::INSERT_VECTOR_ELT takes an element index, but vins* instructions take a byte index. Update 32bit td patterns for vector insert to handle the element index accordingly. Since vector inserts with a non-constant index are supported in ISA3.1, there is no need to use the platform-specific ISD node PPCISD::VECINSERT. Update the td pattern to directly use ISD::INSERT_VECTOR_ELT instead. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D113802	2021-11-15 14:36:39 -06:00
Craig Topper	f59307bfdc	[RISCV] Teach needVSETVLIPHI to handle mask register instructions. This handles the case where the mask register instruction input comes from a Phi of vsetvlis. If the VLMAX is the same as the VLMAX required by the mask register instruction, we can avoid a vsetvli. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D113204	2021-11-15 09:57:28 -08:00
Sanjay Patel	3d01507c2d	[x86] fold vector (X > -1) & Y to shift+andn (2nd try) The first try at this patch ( bf5748a1af0d ) was reverted ( 5be64d416481 ) because it could crash. The cause of that problem was failing to account for the optional peek-through-bitcast in the enclosing function. This version of the patch adds a clause to avoid the fold in case of bitcasts because it is unlikely to be profitable in that scenario. A test case based on https://llvm.org/PR52504 was added to make sure we don't have that problem again. Original commit message: and (pcmpgt X, -1), Y --> pandn (vsrai X, BitWidth-1), Y This avoids the -1 constant vector in favor of an arithmetic shift instruction if it exists (the ISA is still not complete after all these years...). We catch this pattern late in combining by matching PCMPGT, so it should not interfere with more general folds. Differential Revision: https://reviews.llvm.org/D113603	2021-11-15 11:09:32 -05:00
Sanjay Patel	6efe64cf9f	[x86] add test for vector signbit mask fold (PR52504); NFC This goes with D113603 - which was reverted because it could crash on this and similar examples.	2021-11-15 11:09:31 -05:00
Simon Pilgrim	ea9e6aa423	[X86] getAVX512Node() - find constant broadcasts to encourage load-folding If an operand is a bitcasted or widened constant, try to more aggressively create broadcastable constants for folding, which in particular helps non-VLX modes. I've refactored getAVX512Node so that VLX targets can make better use of this as well. NOTE: In the future, I think we should consider removing the broadcast of constant data from DAG entirely and move this to either X86InstrInfo::foldMemoryOperand or a new pass - AVX1/2 targets have similar problems with missed (whole vector) folds that need to be improved as well. Differential Revision: https://reviews.llvm.org/D113845	2021-11-15 15:52:03 +00:00
Hans Wennborg	5be64d4164	Revert "[x86] fold vector (X > -1) & Y to shift+andn" This caused assertion failures: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:9446: void llvm::SelectionDAG::ReplaceAllUsesWith(llvm::SDNode *, llvm::SDNode *): Assertion `(!From->hasAnyUseOfValue(i) || From->getValueType(i) == To->getValueType(i)) && "Cannot use this version of ReplaceAllUsesWith!"' failed. See comment on the code review. (Had to update some expectations in test/CodeGen/X86/vselect-zero.ll manually due to other changes having landed after the reverted one.) > and (pcmpgt X, -1), Y --> pandn (vsrai X, BitWidth-1), Y > > This avoids the -1 constant vector in favor of an arithmetic shift > instruction if it exists (the ISA is still not complete after all > these years...). > > We catch this pattern late in combining by matching PCMPGT, so it > should not interfere with more general folds. > > Differential Revision: https://reviews.llvm.org/D113603 This reverts commit bf5748a1af0d2f6f9396d9dc6ac89d15de41eee7.	2021-11-15 12:35:49 +01:00
Dmitry Preobrazhensky	91f4650ebb	[AMDGPU][MC][GFX10] Corrected global_atomic_fcmpswap* Corrected src data size of global_atomic_fcmpswap and global_atomic_fcmpswap_x2 opcodes. Differential Revision: https://reviews.llvm.org/D113746	2021-11-15 12:51:12 +03:00
David Green	4c3bfdc7f1	[ARM] Fix GatherScatter AddLikeOr condition	2021-11-15 09:44:41 +00:00
Peter Waller	599ea3e73f	[AArch64][SVE] Break false dependencies for inactive lanes of FP unary operations Follow up to D105889, covering instructions using sve_fp_2op_p_zd_HSD: frintn, frintp, frintm, frintz, frinta, frintx, frinti, frecpx and fsqrt. Reviewed By: bsmith Differential Revision: https://reviews.llvm.org/D113485	2021-11-15 09:15:21 +00:00
Chen Zheng	eec9ca622c	[PowerPC] guard update form prepare with non-const increment with option Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D113471	2021-11-15 02:16:46 +00:00
Simon Pilgrim	c3a772fdf5	[X86] Add getPack helper This helper provides a more complete approach for lowering to X86ISD::PACKSS/PACKUS nodes - testing for existing suitable sign/zero extension before recreating it. It also optionally packs the upper half instead of the lower half.	2021-11-14 21:27:15 +00:00
Koakuma	3e0f3041cc	[SPARC] Zero-extend the operands when doing UMULO on 64-bit integers On SPARC, S/UMULO operations on 64-bit integers work by extending the value to 128 bits, then doing a multiplication and checking the upper half of the result. This makes UMULO work correctly by putting a zero in the upper half rather than doing a sign extension. Reviewed By: LemonBoy Differential Revision: https://reviews.llvm.org/D110555	2021-11-14 19:59:52 +01:00
Sanjay Patel	254c5246e9	[DAGCombiner] match inverted/swapped patterns for vselect of mask of signbit This was noted as a follow-up to D113212 / D113426: 4fc1fc4005f7 7e30404c3b6c 11522cfcad6b https://alive2.llvm.org/ce/z/e4o96b The canonicalization rules for these IR patterns are complicated, and we were not matching the expected forms in 2 out of the 3 cases. We can make codegen more robust by matching the swapped forms (and that will also work if these patterns are created late).	2021-11-14 09:35:26 -05:00
Simon Pilgrim	f4143ffed7	[X86] Widen 128/256-bit VPTERNLOG patterns to 512-bit on non-VLX targets Similar to what we've done for other ops, this patch widens VPTERNLOG to a 512-bit op for non-VLX targets. Fixes regressions in D113192 Differential Revision: https://reviews.llvm.org/D113827	2021-11-14 13:40:53 +00:00
David Green	355ee18c5d	[TypePromotion] Extend TypePromotion::isSafeWrap This modifies the preconditions of TypePromotion's isSafeWrap method, to allow it to work from all constants from the ICmp. Using the code: %a = add %x, C1 %c = icmp ult %a, C2 According to Alive, we can prove that is equivalent to icmp ult (add zext(%x), sext(C1)), zext(C2) given C1 <=s 0 and C1 >s C2. https://alive2.llvm.org/ce/z/CECYZB Which is similar to what is already present. We can also prove icmp ult (add zext(%x), sext(C1)), sext(C2) given C1 <=s 0 and C1 <=s C2. https://alive2.llvm.org/ce/z/KKgyeL The PrepareWrappingAdds method was removed, and the constants are now altered to sext or zext directly as required by the above methods. Differential Revision: https://reviews.llvm.org/D113678	2021-11-14 11:18:31 +00:00
Matt Arsenault	54172326e0	AMDGPU: Regenerate test checks Regenerate with -NEXT checks to make a future diff clearer.	2021-11-13 11:35:35 -05:00
Craig Topper	82bc6a094e	[X86] Promote f16 STRICT_FROUND to f32 and call libc. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D113817	2021-11-12 21:37:03 -08:00
Phoebe Wang	e49fcfc7cd	[X86][ABI] Change the alignment of f80 in 32-bit calling convention to meet with different data layout Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113739	2021-11-13 10:00:34 +08:00
Craig Topper	8909dc5ebe	[RISCV] Fixed duplicate RUN line on float-intrinsics.ll. NFC We had two identical RV64I RUN lines. One should be RV32I.	2021-11-12 16:27:36 -08:00
Craig Topper	02bed66cd5	[RISCV] Improve codegen for i32 udiv/urem by constant on RV64. The division by constant optimization often produces constants that are uimm32, but not simm32. These constants require 3 or 4 instructions to materialize without Zba. Since these instructions are often used by a multiply with a LHS that needs to be zero extended with an AND, we can switch the MUL to a MULHU by shifting both inputs left by 32. Once we shift the constant left, the upper 32 bits no longer need to be 0 so constant materialization is free to use LUI+ADDIW. This reduces the constant materialization from 4 instructions to 3 in some cases while also reducing the zero extend of the LHS from 2 shifts to 1. Differential Revision: https://reviews.llvm.org/D113805	2021-11-12 14:49:10 -08:00
Sanjay Patel	6c32dd4dfa	[AArch64][x86] add tests for swapped cmp+vselect patterns; NFC These patterns were noted in the recent D113212 and follow-ups. I did not bother to duplicate every test because it should be clear if we recognize the swaps from a smaller sample. We have complete coverage for the original patterns.	2021-11-12 15:49:46 -05:00
Simon Pilgrim	3170670541	[AMDGPU] Regenerate udiv.ll tests	2021-11-12 17:57:40 +00:00
Simon Pilgrim	6bb71738e2	[X86] convertShiftLeftToScale - improve vXi8 constant handling Add support for v32i8/v64i8 converting shift-by-constant to multiply-by-constant. This helps us avoid the generic vXi8 shift lowering, and a lot of VPBLENDVB ops which can be particularly slow. We also needed to reorder a few shift lowering patterns to prevent regressions, particularly for XOP+AVX2 (Excavator) targets (which can split to fast v16i8 shifts) and AVX512-BWI targets (which prefers to extend to fast v32i16 shifts).	2021-11-12 16:48:10 +00:00
Jay Foad	a70bbb5f7a	[AMDGPU] Simplify 64-bit division/remainder expansion The old expansion open-coded a 64-bit addition in a strange way, by adding the high parts without carry-in from the low part, and then adding the carry back in later on. Fixing this saves a couple of instructions and makes the code much easier to understand. Differential Revision: https://reviews.llvm.org/D113679	2021-11-12 15:48:41 +00:00
Simon Pilgrim	59087dce3b	[X86] combineX86ShufflesConstants - constant fold from target shuffles unless optsize = true Currently we only constant fold target shuffles if any of the sources has one use, or it would remove a variable shuffle mask - the aim being to avoid constant pool bloat. This patch proposes we should constant fold by default and only limit this if optsize is enabled - I've added a basic test for this in vector-mul.ll (the pmuludq case is by far the most common), I can add other specific test cases if people need them. This should permit further constant folding, break some instruction dependencies and help reduce shuffle port pressure. Differential Revision: https://reviews.llvm.org/D113748	2021-11-12 14:02:43 +00:00
Sanjay Patel	bf5748a1af	[x86] fold vector (X > -1) & Y to shift+andn and (pcmpgt X, -1), Y --> pandn (vsrai X, BitWidth-1), Y This avoids the -1 constant vector in favor of an arithmetic shift instruction if it exists (the ISA is still not complete after all these years...). We catch this pattern late in combining by matching PCMPGT, so it should not interfere with more general folds. Differential Revision: https://reviews.llvm.org/D113603	2021-11-12 08:17:46 -05:00
Phoebe Wang	4721ee7029	Add nounwind for tests. NFC	2021-11-12 21:09:05 +08:00
Markus Lavin	4e94e25c90	Fix minor deficiency in machine-sink. Register uses that are MRI->isConstantPhysReg() should not inhibit sinking transformation. Reviewed By: StephenTozer Differential Revision: https://reviews.llvm.org/D111531	2021-11-12 08:01:13 +01:00
Serge Pavlov	3057e850b8	[X86] Preserve FPSW when popping x87 stack When the compiler converts x87 operations to the stack model, it may insert instructions that pop the top stack element. To do so, the compiler inserts an FSTP instruction right after the instruction that calculates the value on the stack. This can break code that uses the FPSW set by the last instruction. For example, an FXAM instruction is usually followed by FNSTSW, but FSTP is inserted after FXAM. As FSTP leaves the condition codes in FPSW undefined, the compiler produces incorrect code. With this change, FSTP is inserted after the FPSW consumer if the last instruction sets FPSW. Differential Revision: https://reviews.llvm.org/D113335	2021-11-12 12:00:09 +07:00
Phoebe Wang	74b979abcd	[X86][FP16] Avoid to generate VZEXT_MOVL with i16 This fixes the crash due to lacking VZEXT_MOVL support with i16. Reviewed By: LuoYuanke, RKSimon Differential Revision: https://reviews.llvm.org/D113661	2021-11-12 09:32:29 +08:00
Sanjay Patel	ce89335fe8	[x86] add tests and RUNs for vector compares; NFC More coverage for D113603	2021-11-11 14:18:25 -05:00
Craig Topper	eb44f3fc58	[RISCV] Add rv32i/rv64i command lines to some floating point tests. NFC This improves our coverage of soft-float libcall lowering. Remove most of the test cases from rv64i-single-softfloat.ll. They were duplicated in the test files that now test softfloat. Only a couple of test cases for constrained FP remain. Those should be removed when we start supporting constrained FP. This is a follow-up to D113528.	2021-11-11 10:56:27 -08:00
Craig Topper	8e85717dbf	[RISCV] Fix non-sensical intrinsic names in rv64i-single-softfloat.ll. NFC Many of these had an extra 'f' at the beginning of their name that caused them to not be treated as intrinsics. I'm not sure what fpround was supposed to be so I deleted it. frem was changed from an intrinsic to an instruction. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D113528	2021-11-11 08:36:34 -08:00
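
Philip Reames's SCEV commit (8d85e945b2) rests on two unsigned-arithmetic identities. First, X - (X urem Y) equals Y * (X udiv Y). Second, when Y is a power of two, X urem Y equals X & (Y-1), so the unroller's X - (and X, Y-1) has the same value as the andn form X & ~(Y-1). The following minimal C++ sketch brute-forces both identities over a small range of 32-bit values. It is an illustration only, not code from the commit, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// "Sub" form: X - (X urem Y), the shape runtime unrolling produces.
constexpr std::uint32_t subForm(std::uint32_t x, std::uint32_t y) {
  return x - x % y;
}

// Canonical form chosen by the commit: Y * (X udiv Y).
constexpr std::uint32_t mulDivForm(std::uint32_t x, std::uint32_t y) {
  return y * (x / y);
}

// andn form, valid only when y is a power of two: X & ~(Y - 1).
constexpr std::uint32_t maskForm(std::uint32_t x, std::uint32_t y) {
  return x & ~(y - 1);
}

// Brute-force check for all x in [0, maxX] and y in [1, maxY].
bool identitiesHold(std::uint32_t maxX, std::uint32_t maxY) {
  for (std::uint32_t y = 1; y <= maxY; ++y) {
    const bool isPow2 = (y & (y - 1)) == 0;
    for (std::uint32_t x = 0; x <= maxX; ++x) {
      if (subForm(x, y) != mulDivForm(x, y)) return false;
      if (isPow2 && subForm(x, y) != maskForm(x, y)) return false;
    }
  }
  return true;
}
```

For example, subForm(10, 4), mulDivForm(10, 4) and maskForm(10, 4) all evaluate to 8. The exhaustive loop is used only because it is easy to audit. It illustrates the identities but does not prove them.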

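The x86 fold in 3d01507c2d (first landed as bf5748a1af) rewrites and (pcmpgt X, -1), Y as pandn (vsrai X, BitWidth-1), Y. The two forms agree lane by lane for three reasons:

* PCMPGT against -1 produces an all-ones lane exactly when X is non-negative.
* An arithmetic shift by BitWidth-1 produces an all-ones lane exactly when X is negative.
* PANDN computes ~A & B.

The sketch below models a 4 x i32 vector with a hypothetical std::array alias and checks the equivalence in scalar C++. It assumes an arithmetic right shift of negative signed values, which C++20 guarantees and GCC/Clang also provide in earlier language modes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of a v4i32 register.
using V4 = std::array<std::int32_t, 4>;

// and (pcmpgt X, -1), Y: the compare yields -1 (all ones) where X > -1.
V4 andOfCompare(const V4& x, const V4& y) {
  V4 r{};
  for (std::size_t i = 0; i < r.size(); ++i) {
    const std::int32_t mask = (x[i] > -1) ? -1 : 0;  // PCMPGTD-like lane result
    r[i] = mask & y[i];
  }
  return r;
}

// pandn (vsrai X, 31), Y: the shift smears the sign bit across the lane,
// then PANDN computes (~A) & B.
V4 andnOfShift(const V4& x, const V4& y) {
  V4 r{};
  for (std::size_t i = 0; i < r.size(); ++i) {
    const std::int32_t sign = x[i] >> 31;  // -1 if x < 0, else 0 (arithmetic shift)
    r[i] = ~sign & y[i];
  }
  return r;
}
```

For X = {-5, 0, 7, INT32_MIN} and Y = {0x1234, -1, 42, 99}, both functions return {0, -1, 42, 0}. The second form needs no -1 constant vector, which is the motivation given in the commit message.
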

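David Green's TypePromotion change (355ee18c5d) cites Alive2 proofs for two rewrites of %a = add %x, C1; %c = icmp ult %a, C2. Both rules require C1 <=s 0:

* When C1 >s C2, the widened compare uses zext(C2).
* When C1 <=s C2, the widened compare uses sext(C2).

As a complement to the proofs, here is a minimal C++ sketch that exhaustively checks both equivalences, assuming an i8 source type widened to i32. The helper names are hypothetical, and the check covers only the 8-bit case.

```cpp
#include <cassert>
#include <cstdint>

// Narrow form: %a = add i8 %x, C1 ; icmp ult i8 %a, C2 (wraps mod 256).
bool narrowUlt(std::uint8_t x, std::int8_t c1, std::int8_t c2) {
  const auto a = static_cast<std::uint8_t>(x + static_cast<std::uint8_t>(c1));
  return a < static_cast<std::uint8_t>(c2);
}

// Widened form in i32: add zext(x), sext(C1), compared unsigned against an
// already-extended C2 (either zext(C2) or sext(C2)).
bool wideUlt(std::uint8_t x, std::int8_t c1, std::uint32_t c2Ext) {
  const std::uint32_t a = static_cast<std::uint32_t>(x) +
                          static_cast<std::uint32_t>(static_cast<std::int32_t>(c1));
  return a < c2Ext;
}

// Returns true if both preconditioned rules hold for every 8-bit x, C1, C2.
bool isSafeWrapRulesHold() {
  for (int c1 = -128; c1 <= 0; ++c1) {  // both rules require C1 <=s 0
    for (int c2 = -128; c2 <= 127; ++c2) {
      const auto C1 = static_cast<std::int8_t>(c1);
      const auto C2 = static_cast<std::int8_t>(c2);
      const std::uint32_t zextC2 = static_cast<std::uint8_t>(C2);
      const std::uint32_t sextC2 =
          static_cast<std::uint32_t>(static_cast<std::int32_t>(C2));
      for (int xi = 0; xi <= 255; ++xi) {
        const auto x = static_cast<std::uint8_t>(xi);
        const bool narrow = narrowUlt(x, C1, C2);
        if (c1 > c2 && narrow != wideUlt(x, C1, zextC2)) return false;   // rule 1
        if (c1 <= c2 && narrow != wideUlt(x, C1, sextC2)) return false;  // rule 2
      }
    }
  }
  return true;
}
```

The loop visits about 8.5 million (x, C1, C2) triples, so it finishes quickly. Checking a wider source type the same way would not be practical.
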
41060 Commits