llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	fbe6eac8bd	[X86][AVX] Add PR50053 test case	2021-07-26 17:57:38 +01:00
Stephen Tozer	31e7551217	[DebugInfo] Correctly update debug users of SSA values in tail duplication During tail duplication, SSA values may be updated and have their uses replaced with a virtual register, and any debug instructions that use that value are deleted. This patch fixes the implementation of the debug instruction deletion to work correctly for debug instructions that use the SSA value multiple times, by batching deletions so that we don't attempt to delete the same instruction twice. Differential Revision: https://reviews.llvm.org/D106557	2021-07-26 17:27:57 +01:00
Simon Pilgrim	c8472db0a8	[X86][AVX] Prefer vinsertf128 to vperm2f128 on AVX1 targets Splatting the lower xmm with vinsertf128 is at least as quick as vperm2f128, and a lot faster on some AMD targets. First step towards PR50053	2021-07-26 11:11:56 +01:00
Simon Pilgrim	f64e251560	[X86][SSE] Don't scrub address math from interleaved shuffle tests	2021-07-26 11:03:31 +01:00
Roman Lebedev	9ebd0dbf0f	[NFC][Codegen][X86] Improve test coverage for insertions into XMM vector	2021-07-25 21:08:03 +03:00
Simon Pilgrim	b95f66ad78	[X86][SSE] LowerRotate - perform modulo on the amount splat source directly. If the rotation amount is a known splat, perform the modulo on the splat source, and then perform the splat. That way the amount-extension performed later by LowerScalarVariableShift can fold the splats away without any multiple-use issues. Fixes one of the concerns raised on D104156	2021-07-25 17:30:32 +01:00
Roman Lebedev	fa0910e6de	[NFC][Codegen][X86] Improve test coverage for repeated insertions of the same scalar into different elements	2021-07-25 17:37:04 +03:00
Sanjay Patel	1ce05ad619	[x86] improve CMOV codegen by pushing add into operands, part 2 This is a minimum extension of D106607 to allow folding for 2 non-zero constants that can be materialized as immediates. In the reduced test examples, we save 1 instruction by rolling the constants into LEA/ADD. In the motivating test from the bullet benchmark, we absorb both of the constant moves into add ops via LEA magic, so we reduce by 2 instructions. Differential Revision: https://reviews.llvm.org/D106684	2021-07-25 10:05:41 -04:00
Simon Pilgrim	15b883f457	[X86][AVX] Adjust AllowBWIVPERMV3 tolerance to account for VariableCrossLaneShuffleDepth As noticed on D105390 - we were hardwiring the depth limit for combining to VPERMI2W/VPERMI2B instructions. Not only had we made the limit too low, we hadn't accounted for slow/fast shuffles via the VariableCrossLaneShuffleDepth control	2021-07-25 14:05:11 +01:00
Simon Pilgrim	f8191ee32b	[X86] Add additional div-mod-pair negative test coverage As suggested on D106745	2021-07-24 15:21:46 +01:00
Simon Pilgrim	01f20581dd	[X86] Add i128 div-mod-pair test coverage	2021-07-24 14:00:53 +01:00
Simon Pilgrim	478b22d95a	[CGP] despeculateCountZeros - Don't create is-zero branch if cttz/ctlz source is known non-zero If value tracking can confirm that the cttz/ctlz source is known non-zero then we don't need to create a branch (which DAG will struggle to recover from). Differential Revision: https://reviews.llvm.org/D106685	2021-07-24 13:11:49 +01:00
Sanjay Patel	937e7c60c8	[x86] add more tests for add with CMOV of constants; NFC See D106607 / https://llvm.org/PR51069 for details.	2021-07-24 06:23:36 -04:00
Craig Topper	cc6d302c91	[X86] Fix a bug in TEST with immediate creation This code tries to form a TEST from CMP+AND with an optional truncate in between. If we looked through the truncate, we may have extra bits in the AND mask that shouldn't participate in the checks. Normally SimplifyDemandedBits takes care of this, but the AND may have another user. So manually mask out any extra bits. Fixes PR51175. Differential Revision: https://reviews.llvm.org/D106634	2021-07-23 09:03:53 -07:00
Sanjay Patel	f060aa1cf3	[x86] improve CMOV codegen by pushing add into operands This is not the transform direction we want in general, but by the time we have a CMOV, we've already tried everything else that could be better. The transform increases the uses of the other add operand, but that is safe according to Alive2: https://alive2.llvm.org/ce/z/Yn6p-A We could probably extend this to other binops (not just add). This is the motivating pattern discussed in: https://llvm.org/PR51069 The test with i8 shows a missed fold because there's a trunc sitting in front of the add. That can be handled with a small follow-up. Differential Revision: https://reviews.llvm.org/D106607	2021-07-23 09:39:32 -04:00
Sanjay Patel	028eb43654	[x86] add tests for add X, (cmov constants); NFC	2021-07-23 09:39:32 -04:00
Simon Pilgrim	71d0fd3564	[X86][AVX] lowerV2X128Shuffle - attempt to recognise broadcastf128 subvector load As noticed on PR50053 we were failing to recognise when a shuffle of a load was really a subvector broadcast load	2021-07-23 13:10:38 +01:00
Craig Topper	f26ac73fa9	[X86] Add test case simplified from PR51175. NFC	2021-07-22 23:22:39 -07:00
Tianqing Wang	bec4a8157d	[X86] Update MachineLoopInfo in CMOV conversion. If a CMOV is in a loop and is converted to branches, CMOV conversion wouldn't add newly created basic blocks to loop info. Since the candidates are collected based on loops, instructions in these basic blocks will be ignored. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D104623	2021-07-21 10:53:46 +08:00
Jon Roelofs	be8738324c	[MachineVerifier] Diagnose invalid INSERT_SUBREGs Differential revision: https://reviews.llvm.org/D105953	2021-07-20 17:32:29 -07:00
Fangrui Song	3924877932	[IR] Rename `comdat noduplicates` to `comdat nodeduplicate` In the textual format, `noduplicates` means no COMDAT/section group deduplication is performed. Therefore, if both sets of sections are retained, and they happen to define strong external symbols with the same names, there will be a duplicate definition linker error. In PE/COFF, the selection kind lowers to `IMAGE_COMDAT_SELECT_NODUPLICATES`. The name describes the corollary instead of the immediate semantics. The name can cause confusion to other binary formats (ELF, wasm) which have implemented/ want to implement the "no deduplication" selection kind. Rename it to be clearer. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D106319	2021-07-20 12:47:10 -07:00
Roman Lebedev	5b51bd1878	[TLI] prepareSREMEqFold(): use correct VT for the final VSELECT (PR51133) We were using the wrong VT for this final VSELECT, it should be in the final comparison VT, not the source value's VT. Fixes https://bugs.llvm.org/show_bug.cgi?id=51133	2021-07-19 16:44:00 +03:00
Simon Pilgrim	754b1cd713	[X86][SSE] Fix copy+paste typo in dot3_float4_as_float3 partial load test	2021-07-19 11:50:30 +01:00
Simon Pilgrim	fe494fafa9	[X86][SSE] Add codegen tests dot2/3 dot product of 128-bit dereferenceable float data Based off the codegen reports on PR51075 - hopefully we can handle some of this in SLP or VectorCombine, but we usually have to leave load combining until the backend so at least some of these patterns will still appear even then.	2021-07-19 10:44:25 +01:00
Eli Friedman	6601be4419	[X86] Remove incorrect use of known bits in shuffle simplification. This reverts commit 2a419a0b9957ebac9e11e4b43bc9fbe42a9207df. The result of a shufflevector must not propagate poison from any element other than the one noted in the shuffle mask. The regressions outside of fptoui-may-overflow.ll can probably be recovered some other way; for example, using isGuaranteedNotToBePoison. See discussion on https://reviews.llvm.org/D106053 for more background. Differential Revision: https://reviews.llvm.org/D106222	2021-07-18 18:13:11 -07:00
Simon Pilgrim	fd7a54c709	[DAG] DAGCombiner::foldSelectOfBinops - propagate the common flags to the merged binop As discussed on D106058 - we were failing to keep the common flags. This matches the behaviour in InstCombinerImpl::foldSelectOpOp.	2021-07-18 18:38:59 +01:00
Simon Pilgrim	5643be96bc	[DAG] Enable foldSelectOfBinops on select(setcc(),binop(),binop()) calls	2021-07-18 18:38:59 +01:00
Simon Pilgrim	3a1b38049a	[X86] Add i32 (shl (sr[la] exact sel(X,Y), C1), C2) test Shows failure to fold sel(sra(X,C1),sra(Y,C1)) -> sra(sel(X,Y),C1) (and to retain the flags)	2021-07-18 16:48:57 +01:00
Simon Pilgrim	51a12d2ff0	[X86][SSE] matchShuffleWithPACK - avoid poison pollution from bitcasting multiple elements together. D106053 exposed that we've not been taking into account that by bitcasting smaller elements together and then performing a ComputeKnownBits on the result we'd be allowing a poison element to influence other neighbouring elements being used in the pack. Instead we now peek through any existing bitcast to ensure that the source type already matches the width source of the pack node we're trying to match. This has also been a chance to stop matchShuffleWithPACK creating unused nodes on the fly which could affect oneuse tests during shuffle lowering/combining. The only regression we're seeing is due to being unable to peek through a bitcast as it's on the other side of an extract_subvector - which should go away once we finally allow shuffle combining across different vector widths (by making matchShuffleWithPACK using const SelectionDAG& we've gotten closer to this - see PR45974).	2021-07-18 14:25:28 +01:00
Simon Pilgrim	d2458bcdc6	[X86][SSE] combineX86ShufflesRecursively - bail if constant folding fails due to oneuse limits. Fixes issue reported on D105827 where a single shuffle of a constant (with multiple uses) was caught in an infinite loop where one shuffle (UNPCKL) used an undef arg but then that got recombined to SHUFPS as the constant value had its own undef that confused matching.	2021-07-16 19:21:46 +01:00
Guozhi Wei	5609c8b607	[X86FixupLEAs] Try again to transform the sequence LEA/SUB to SUB/SUB This patch transforms the sequence "lea (reg1, reg2), reg3; sub reg3, reg4" to two sub instructions "sub reg1, reg4; sub reg2, reg4". Similar optimization can also be applied to LEA/ADD sequence. The modifications to TwoAddressInstructionPass are to ensure the operands of ADD instruction have the expected order (the dest register of LEA should be src register of ADD). Differential Revision: https://reviews.llvm.org/D104684	2021-07-16 10:16:03 -07:00
Jon Roelofs	6c40abb6fe	Revert "[MachineVerifier] Diagnose invalid INSERT_SUBREGs" This reverts commit dd57ba1a17b93dbe211d04cb2d4de5f6dc898d60. It broke some tests: http://45.33.8.238/linux/51314/step_12.txt	2021-07-16 09:53:55 -07:00
Simon Pilgrim	52cd0c5a8d	[X86] Regenerate twoaddr-lea.ll test checks.	2021-07-16 17:43:36 +01:00
Jon Roelofs	dd57ba1a17	[MachineVerifier] Diagnose invalid INSERT_SUBREGs Differential revision: https://reviews.llvm.org/D105953	2021-07-16 09:43:12 -07:00
Matt Arsenault	e91da668d0	GlobalISel: Track argument pointeriness with arg flags Since we're still building on top of the MVT based infrastructure, we need to track the pointer type/address space on the side so we can end up with the correct pointer LLTs when interpreting CCValAssigns.	2021-07-15 19:11:40 -04:00
Harald van Dijk	a8ad917054	[X86] Fix handling of maskmovdqu in X32 The maskmovdqu instruction is an odd one: it has a 32-bit and a 64-bit variant, the former using EDI, the latter RDI, but the use of the register is implicit. In 64-bit mode, a 0x67 prefix can be used to get the version using EDI, but there is no way to express this in assembly in a single instruction, the only way is with an explicit addr32. This change adds support for the instruction. When generating assembly text, that explicit addr32 will be added. When not generating assembly text, it will be kept as a single instruction and will be emitted with that 0x67 prefix. When parsing assembly text, it will be re-parsed as ADDR32 followed by MASKMOVDQU64, which still results in the correct bytes when converted to machine code. The same applies to vmaskmovdqu as well. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103427	2021-07-15 22:56:08 +01:00
Simon Pilgrim	0aece73aba	[DAG] Fold select(cond,binop(x,y),binop(x,z)) -> binop(x,select(cond,y,z)) Similar to the folds performed in InstCombinerImpl::foldSelectOpOp, this attempts to push a select further up to help merge a pair of binops. I'm primarily interested in select(cond,add(x,y),add(x,z)) folds to help expose pointer math (see https://bugs.llvm.org/show_bug.cgi?id=51069 etc.) but I've tried to use the more generic isBinOp(). Differential Revision: https://reviews.llvm.org/D106058	2021-07-15 16:08:30 +01:00
Tim Northover	5d7632ee72	MachO: don't emit L... private symbols in do_not_dead_strip sections. The linker can sometimes drop the do_not_dead_strip if it can't associate the atom with a symbol (the other place to specify no dead-stripping in MachO files).	2021-07-15 14:40:43 +01:00
Max Kazantsev	69a3acffdf	[Test] We can benefit from pipelining of ymm load/stores This patch demonstrates a scenario when we need to load/store a single 64-byte value, which is done by 2 ymm loads and stores in AVX. The current codegen chooses the following sequence: load ymm0; load ymm1; store ymm1; store ymm0. If we instead stored ymm0 before ymm1, we could execute 2nd load and 1st store in parallel.	2021-07-15 17:15:14 +07:00
Philip Reames	e75a2dfe20	[tests] Stabilize tests for possible change in deref semantics There's a potential change in dereferenceability attribute semantics in the nearish future. See llvm-dev thread "RFC: Decomposing deref(N) into deref(N) + nofree" and D99100 for context. This change simply adds appropriate attributes to tests to keep transform logic exercised under both old and new/proposed semantics. Note that for many of these cases, O3 would infer exactly these attributes on the test IR. This change handles the idiomatic pattern of a dereferenceable object being passed to a call which can not free that memory. There's a couple other tests which need more one-off attention, they'll be handled in another change.	2021-07-14 13:05:43 -07:00
Djordje Todorovic	df686842bc	[RemoveRedundantDebugValues] Add a Pass that removes redundant DBG_VALUEs This new MIR pass removes redundant DBG_VALUEs. After the register allocator is done, more precisely, after the Virtual Register Rewriter, we end up having duplicated DBG_VALUEs, since some virtual registers are being rewritten into the same physical register as some of existing DBG_VALUEs. Each DBG_VALUE should indicate (at least before the LiveDebugValues) variables assignment, but it is being clobbered for function parameters during the SelectionDAG since it generates new DBG_VALUEs after COPY instructions, even though the parameter has no assignment. For example, if we had a DBG_VALUE $regX as an entry debug value representing the parameter, and a COPY and after the COPY, DBG_VALUE $virt_reg, and after the virtregrewrite the $virt_reg gets rewritten into $regX, we'd end up having redundant DBG_VALUE. This breaks the definition of the DBG_VALUE since some analysis passes might be built on top of that premise..., and this patch tries to fix the MIR with respect to that. This first patch performs backward scan, by trying to detect a sequence of consecutive DBG_VALUEs, and to remove all DBG_VALUEs describing one variable but the last one: For example: (1) DBG_VALUE $edi, !"var1", ... (2) DBG_VALUE $esi, !"var2", ... (3) DBG_VALUE $edi, !"var1", ... ... in this case, we can remove (1). By combining the forward scan that will be introduced in the next patch (from this stack), by inspecting the statistics, the RemoveRedundantDebugValues removes 15032 instructions by using gdb-7.11 as a testbed. Differential Revision: https://reviews.llvm.org/D105279	2021-07-14 04:29:42 -07:00
Simon Pilgrim	ee71c1bbcc	[X86] Implement smarter instruction lowering for FP_TO_UINT from f32/f64 to i32/i64 and vXf32/vXf64 to vXi32 for SSE2 and AVX2 by using the exact semantic of the CVTTPS2SI instruction. We know that "CVTTPS2SI" returns 0x80000000 for out of range inputs (and for FP_TO_UINT, negative float values are undefined). We can use this to make unsigned conversions from vXf32 to vXi32 more efficient, particularly on targets without blend using the following logic: small := CVTTPS2SI(x); fp_to_ui(x) := small | (CVTTPS2SI(x - 2^31) & ARITHMETIC_RIGHT_SHIFT(small, 31)) Even on targets where "PBLENDVPS"/"PBLENDVB" exists, it is often a latency 2, low throughput instruction so this logic is applied there too (in particular for AVX2 also). It furthermore gets rid of one high latency floating point comparison in the previous lowering. @TomHender checked the correctness of this for all possible floats between -1 and 2^32 (both ends excluded). Original Patch by @TomHender (Tom Hender) Differential Revision: https://reviews.llvm.org/D89697	2021-07-14 12:03:49 +01:00
Simon Pilgrim	3cee36c5ac	[X86][SSE] X86ISD::FSETCC nodes (cmpss/cmpsd) return a 0/-1 allbits signbits result (REAPPLIED) Annoyingly, i686 cmpsd handling still fails to remove the unnecessary neg(and(x,1)) Reapplied rGe4aa6ad13216 with fix for intrinsic variants of the opcode which uses a vector return type	2021-07-13 12:31:09 +01:00
Simon Pilgrim	afdae7c5d7	[X86][SSE] Add signbit tests to show cmpss/cmpsd intrinsics not recognised as 'allbits' results. This adds test coverage for the crash reported on rGe4aa6ad13216	2021-07-13 11:25:52 +01:00
Vitaly Buka	606551ee98	Revert "[X86][SSE] X86ISD::FSETCC nodes (cmpss/cmpsd) return a 0/-1 allbits signbits result" Fails here https://lab.llvm.org/buildbot/#/builders/37/builds/5267 This reverts commit e4aa6ad132164839a4a97dff0d433ea4766f77f1.	2021-07-12 22:26:54 -07:00
Craig Topper	d5c97f4bf0	[X86] Teach X86FloatingPoint's handleCall to only erase the FP stack if there is a regmask operand that clobbers the FP stack. There are some calls to functions like `__alloca` that are missing a regmask operand. Lack of a regmask operand means that all registers that aren't mentioned by def operands are preserved. __alloca only updates EAX and ESP and has def operands for them so this is ok. Because there is no regmask the register allocator won't spill the FP registers across the call. Assuming we want to keep the FP stack untouched across these calls, we need to handle this in the FP stackifier. We might want to add a proper regmask operand to the code that creates these calls to indicate all registers are preserved, but we'd still need this change to the FP stackifier to know to preserve the FP stack for such a regmask. The test is kind of long, but bugpoint wasn't able to reduce it any further. Fixes PR50782 Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D105762	2021-07-12 10:15:38 -07:00
Simon Pilgrim	e4aa6ad132	[X86][SSE] X86ISD::FSETCC nodes (cmpss/cmpsd) return a 0/-1 allbits signbits result Annoyingly, i686 cmpsd handling still fails to remove the unnecessary neg(and(x,1))	2021-07-12 09:56:59 +01:00
Simon Pilgrim	99718d5377	[X86][SSE] Add signbit tests to show cmpss/cmpsd ops not recognised as 'allbits' results.	2021-07-12 09:41:10 +01:00
Simon Pilgrim	a328ee6577	[X86] Add tests from D93707 for fsub_strict(x,fneg(y)) -> fadd_strict(x,y) folds. Also, add matching i686 coverage to strict-fadd-combines.ll and regenerate checks.	2021-07-10 15:08:58 +01:00
Nico Weber	97c675d3d4	Revert "Revert "Temporarily do not drop volatile stores before unreachable"" This reverts commit 52aeacfbf5ce5f949efe0eae029e56db171ea1f7. There isn't full agreement on a path forward yet, but there is agreement that this shouldn't land as-is. See discussion on https://reviews.llvm.org/D105338 Also reverts unreviewed "[clang] Improve `-Wnull-dereference` diag to be more in-line with reality" This reverts commit f4877c78c0fc98be47b926439bbfe33d5e1d1b6d. And all the related changes to tests: This reverts commit 9a0152799f8e4a59e0483728c9f11c8a7805616f. This reverts commit 3f7c9cc27422f7302cf5a683eeb3978e6cb84270. This reverts commit 329f8197ef59f9bd23328b52d623ba768b51dbb2. This reverts commit aa9f58cc2c48ca6cfc853a2467cd775dc7622746. This reverts commit 2df37d5ddd38091aafbb7d338660e58836f4ac80. This reverts commit a72a44181264fd83e05be958c2712cbd4560aba7.	2021-07-09 11:44:34 -04:00
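
Note on ee71c1bbcc: the FP_TO_UINT formula quoted in that commit message can be checked lane by lane with a small scalar model. The sketch below is only an illustration under stated assumptions, not the LLVM lowering itself: `cvttps2si`, `sra31` and `fp_to_ui` are hypothetical helper names, inputs are assumed to be f32-representable values in [0, 2^32), and Python floats stand in for single precision.

```python
import math

# Bit pattern CVTTPS2SI produces for NaN and out-of-range inputs,
# as stated in the commit message.
INT32_INDEFINITE = 0x80000000


def cvttps2si(x: float) -> int:
    """Model one lane of CVTTPS2SI; returns the 32-bit result pattern."""
    if math.isnan(x) or math.isinf(x):
        return INT32_INDEFINITE
    t = math.trunc(x)  # truncate toward zero
    if t < -2**31 or t >= 2**31:
        return INT32_INDEFINITE
    return t & 0xFFFFFFFF  # two's-complement pattern of the signed result


def sra31(bits: int) -> int:
    """Arithmetic right shift by 31 of a 32-bit pattern: all ones or zero."""
    return 0xFFFFFFFF if bits & 0x80000000 else 0


def fp_to_ui(x: float) -> int:
    """small | (CVTTPS2SI(x - 2^31) & ARITHMETIC_RIGHT_SHIFT(small, 31))."""
    small = cvttps2si(x)
    big = cvttps2si(x - 2.0**31)
    return small | (big & sra31(small))


if __name__ == "__main__":
    for value in (0.0, 1.5, 2147483648.0, 3e9, 4294967040.0):
        print(value, fp_to_ui(value))
```

For inputs below 2^31 the sign bit of `small` is clear, so the mask is zero and `small` is returned unchanged. For inputs in [2^31, 2^32) `small` is 0x80000000, the mask is all ones, and OR-ing in the conversion of `x - 2^31` rebuilds the unsigned value.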

17044 Commits