llvm-project

Author	SHA1	Message	Date
Philip Reames	e6ad9ef4e7	[instcombine] Canonicalize constant index type to i64 for extractelement/insertelement The basic idea to this is that a) having a single canonical type makes CSE easier, and b) many of our transforms are inconsistent about which types we end up with based on visit order. I'm restricting this to constants as for non-constants, we'd have to decide whether the simplicity was worth extra instructions. For constants, there are no extra instructions. We chose the canonical type as i64 arbitrarily. We might consider changing this to something else in the future if we have cause. Differential Revision: https://reviews.llvm.org/D115387	2021-12-13 16:56:22 -08:00
Alexey Bataev	ddce6e0561	[SLP]Improve vectorization of cmp instructions sequences. Final attempt to vectorize bundles of compatible cmp instructions after all other instructions processing. Metric: SLP.NumVectorInstructions Program results results0 diff test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test 1.00 5.00 400.0% test-suite :: MultiSource/Benchmarks/PAQ8p/paq8p.test 8.00 11.00 37.5% test-suite :: MultiSource/Benchmarks/Olden/voronoi/voronoi.test 20.00 26.00 30.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1344.00 1648.00 22.6% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1344.00 1648.00 22.6% test-suite :: MultiSource/Benchmarks/Olden/bh/bh.test 102.00 124.00 21.6% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test 118.00 133.00 12.7% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 3233.00 3554.00 9.9% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 3233.00 3554.00 9.9% test-suite :: MultiSource/Benchmarks/Olden/power/power.test 64.00 70.00 9.4% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 7879.00 8604.00 9.2% test-suite :: MultiSource/Benchmarks/Prolangs-C/simulator/simulator.test 50.00 54.00 8.0% test-suite :: MultiSource/Applications/sqlite3/sqlite3.test 27.00 29.00 7.4% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 8345.00 8955.00 7.3% test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 694.00 738.00 6.3% test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test 361.00 382.00 5.8% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 409.00 430.00 5.1% test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test 140.00 147.00 5.0% test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test 140.00 147.00 5.0% test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 4013.00 4206.00 4.8% test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 966.00 1011.00 4.7% test-suite :: SingleSource/Benchmarks/Misc/oourafft.test 65.00 68.00 4.6% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 4219.00 4381.00 3.8% test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1911.00 1973.00 3.2% test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test 62.00 64.00 3.2% test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test 62.00 64.00 3.2% test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 852.00 877.00 2.9% test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 852.00 877.00 2.9% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 1624.00 1668.00 2.7% test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test 39.00 40.00 2.6% test-suite :: MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test 613.00 624.00 1.8% test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test 378.00 383.00 1.3% test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 293.00 295.00 0.7% test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test 297.00 299.00 0.7% test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 5522.00 5534.00 0.2% test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 5522.00 5534.00 0.2% Differential Revision: https://reviews.llvm.org/D114799	2021-12-01 07:26:29 -08:00
Alexey Bataev	ce14d1b690	[SLP]Do not reorder reduction nodes. The final reduction nodes should not be reordered, the order does not matter for reductions. Also, it might be profitable to vectorize smaller reduction trees, reduction cost may compensate small tree cost. Part of D111574 Differential Revision: https://reviews.llvm.org/D112467	2021-10-26 07:41:24 -07:00
Roman Lebedev	d7378259aa	[SimplifyCFG] SimplifyCondBranchToTwoReturns(): really only deal with different ret blocks This function is called when some predecessor of an empty return block ends with a conditional branch, with both successors being empty ret blocks. Now, because of the way SimplifyCFG works, it might happen to simplify one of the blocks in a way that makes a conditional branch into an unconditional one, since its destinations are now identical, but it might not have actually simplified said conditional branch into an unconditional one yet. So, we have to check that ourselves first, especially now that SimplifyCFG aggressively tail-merges all ret and resume blocks. Even if it was an unconditional branch already, `SimplifyCFGOpt::simplifyReturn()` doesn't call `FoldReturnIntoUncondBranch()` by default.	2021-07-23 00:36:59 +03:00
Roman Lebedev	9c4c2f2472	[SimplifyCFG] Tail-merging all blocks with `ret` terminator Based on top of D104598, which is a NFCI-ish refactoring. Here, a restriction, that only empty blocks can be merged, is lifted. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D104597	2021-06-24 13:15:39 +03:00
Juneyoung Lee	8a156d1c27	[InstCombine] Fully disable select to and/or i1 folding This is a patch that disables the poison-unsafe select -> and/or i1 folding. It has been blocking D72396 and also has been the source of a few miscompilations described in llvm.org/pr49688 . D99674 conditionally blocked this folding and successfully fixed the latter one. The former one was still blocked, and this patch addresses it. Note that a few test functions that have the `_logical` suffix are now deoptimized. These are created by @nikic to check the impact of disabling this optimization by copying existing original functions and replacing and/or with select. I can see that most of these are poison-unsafe; they can be revived by introducing freeze instruction. I left comments at fcmp + select optimizations (or-fcmp.ll, and-fcmp.ll) because I think they are good targets for freeze fix. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D101191	2021-05-06 09:29:52 +09:00
Juneyoung Lee	4a8e6ed2f7	[SLP,LV] Use poison constant vector for shufflevector/initial insertelement This patch makes SLP and LV emit operations with initial vectors set to poison constant instead of undef. This is a part of efforts for using poison vector instead of undef to represent "doesn't care" vector. The goal is to make nice shufflevector optimizations valid that are currently incorrect due to the tricky interaction between undef and poison (see https://bugs.llvm.org/show_bug.cgi?id=44185 ). Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94061	2021-01-06 11:22:50 +09:00
Juneyoung Lee	278aa65cc4	[IR] Let IRBuilder's CreateVectorSplat/CreateShuffleVector use poison as placeholder This patch updates IRBuilder to create insertelement/shufflevector using poison as a placeholder. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93793	2020-12-30 04:21:04 +09:00
Roman Lebedev	897c985e1e	[InstCombine] Canonicalize SPF to abs intrinsic This patch enables canonicalization of SPF_ABS and SPF_NABS to the abs intrinsic. This is a recommit, the original try was 05d4c4ebc2fb006b8a2bd05b24c6aba10dd2eef8, but it was reverted due to an apparent miscompile, which since then has just been fixed by the previous commit. Differential Revision: https://reviews.llvm.org/D87188	2020-12-18 21:18:14 +03:00
Nikita Popov	20b386aae0	[LoopUtils] Fix neutral value for vector.reduce.fadd Use -0.0 instead of 0.0 as the start value. The previous use of 0.0 was fine for all existing uses of this function though, as it is always generated with fast flags right now, and thus nsz.	2020-10-29 21:45:13 +01:00
Amara Emerson	322d0afd87	[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics. This change renames the intrinsics to not have "experimental" in the name. The autoupgrader will handle legacy intrinsics. Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html Differential Revision: https://reviews.llvm.org/D88787	2020-10-07 10:36:44 -07:00
Nikita Popov	13e19d2e7c	Revert "[InstCombine] Canonicalize SPF_ABS to abs intrinc" This reverts commit 05d4c4ebc2fb006b8a2bd05b24c6aba10dd2eef8. mstorsjo reports a miscompile after this change in https://reviews.llvm.org/D87188#2281093. Reverting until I can investigate this.	2020-09-18 09:38:26 +02:00
Nikita Popov	05d4c4ebc2	[InstCombine] Canonicalize SPF_ABS to abs intrinc Enable canonicalization of SPF_ABS and SPF_NABS to the abs intrinsic. To be conservative, the one-use check on the comparison is retained, this may be relaxed if all goes well. It's pretty likely that this will uncover places that are missing handling for the abs() intrinsic. Please report any seen performance regressions. Differential Revision: https://reviews.llvm.org/D87188	2020-09-17 22:28:34 +02:00
Sanjay Patel	b6315aee5b	[VectorCombine] try to form vector compare and binop to eliminate scalar ops binop i1 (cmp Pred (ext X, Index0), C0), (cmp Pred (ext X, Index1), C1) --> vcmp = cmp Pred X, VecC ext (binop vNi1 vcmp, (shuffle vcmp, Index1)), Index0 This is a larger pattern than the existing extractelement folds because we can't reasonably vectorize the sub-patterns with constants based on cost model calcs (it doesn't usually make sense to replace a single extracted scalar op with constant operand with a vector op). I salvaged as much of the existing logic as I could, but there might be better ways to share and reduce code. The motivating case from PR43745: https://bugs.llvm.org/show_bug.cgi?id=43745 ...is the special case of a 2-way reduction. We tried to get SLP to handle that particular pattern in D59710, but that caused crashing and regressions. This patch is more general, but hopefully safer. The v2f64 test with SSE2 surprised me - the cost model accounting looks like this: OldCost = 0 (free extract of f64 at index 0) + 1 (extract of f64 at index 1) + 2 (scalar fcmps) + 1 (and of bools) = 4 NewCost = 2 (vector fcmp) + 1 (shuffle) + 1 (vector 'and') + 1 (extract of bool) = 5 Differential Revision: https://reviews.llvm.org/D82474	2020-06-29 10:38:52 -04:00
Sanjay Patel	2f3549f813	Revert "[VectorCombine] add test for scalable vectors; NFC" This reverts commit 700ec6b848c02ca3de9751d63a7a5a26671c3fe9. An extra test diff snuck here.	2020-06-28 12:43:11 -04:00
Sanjay Patel	700ec6b848	[VectorCombine] add test for scalable vectors; NFC	2020-06-28 12:42:00 -04:00
Sanjay Patel	a809cea68c	[PhaseOrdering] add test for missed vectorization; NFC (PR43745) Either SLP or VectorCombine should be able to form vector compares reliably on this example.	2020-06-23 11:57:32 -04:00
Sanjay Patel	8953ecf22b	[InstCombine] reassociate diff of sums into sum of diffs This is the integer sibling to D81491. (a[0] + a[1] + a[2] + a[3]) - (b[0] + b[1] + b[2] +b[3]) --> (a[0] - b[0]) + (a[1] - b[1]) + (a[2] - b[2]) + (a[3] - b[3]) Removing the "experimental" from these intrinsics is likely not too far away.	2020-06-22 20:47:09 -04:00
Sanjay Patel	de65b356dc	[VectorCombine] add/use pass-level IRBuilder This saves creating/destroying a builder every time we perform some transform. The tests show instruction ordering diffs resulting from always inserting at the root instruction now, but those should be benign.	2020-06-22 09:01:29 -04:00
Sanjay Patel	cce625f73d	[VectorCombine] improve IR debugging by providing/salvaging value names The tests are regenerated to show the diffs, but there should be no functional change from this patch.	2020-06-22 08:35:47 -04:00
Sanjay Patel	b5fb26951a	[InstCombine] reassociate FP diff of sums into sum of diffs (a[0] + a[1] + a[2] + a[3]) - (b[0] + b[1] + b[2] +b[3]) --> (a[0] - b[0]) + (a[1] - b[1]) + (a[2] - b[2]) + (a[3] - b[3]) This should be the last step in solving PR43953: https://bugs.llvm.org/show_bug.cgi?id=43953 We started emitting reduction intrinsics with: D80867/ rGe50059f6b6b3 So it's a relatively easy pattern match now to re-order those ops. Also, I have not seen any complaints for the switch to intrinsics yet, so I'll propose to remove the "experimental" tag from the intrinsics soon. Differential Revision: https://reviews.llvm.org/D81491	2020-06-14 09:09:03 -04:00
Sanjay Patel	e50059f6b6	[x86] form reduction intrinsics from vectorizers instead of raw IR Motivating examples are seen in the PhaseOrdering tests based on: https://bugs.llvm.org/show_bug.cgi?id=43953#c2 - if we have intrinsics there, some pass can fold them. The intrinsics are still named "experimental" at this point, but if there is no fallout from this patch, that will be a good indicator that it is safe to finalize them. Differential Revision: https://reviews.llvm.org/D80867	2020-06-05 12:38:49 -04:00
Sanjay Patel	22c4c6dd38	[PhaseOrdering] add tests for reductions; NFC (PR43953)	2020-06-05 12:38:49 -04:00
Sanjay Patel	6438ea45e0	[VectorCombine] position pass after SLP in the optimization pipeline rather than before There are 2 known problem patterns shown in the test diffs here: vector horizontal ops (an x86 specialization) and vector reductions. SLP has greater ability to match and fold those than vector-combine, so let SLP have first chance at that. This is a quick fix while we continue to improve vector-combine and possibly canonicalize to reduction intrinsics. In the longer term, we should improve matching of these patterns because if they were created in the "bad" forms shown here, then we would miss optimizing them. I'm not sure what is happening with alias analysis on the addsub test. The old pass manager now shows an extra line for that, and we see an improvement that comes from SLP vectorizing a store. I don't know what's missing with the new pass manager to make that happen. Strangely, I can't reproduce the behavior if I compile from C++ with clang and invoke the new PM with "-fexperimental-new-pass-manager". Differential Revision: https://reviews.llvm.org/D80236	2020-05-22 12:22:44 -04:00
Sanjay Patel	81e9ede3a2	[VectorCombine] forward walk through instructions to improve chaining of transforms This is split off from D79799 - where I was proposing to fully iterate over a function until there are no more transforms. I suspect we are still going to want to do something like that eventually. But we can achieve the same gains much more efficiently on the current set of regression tests just by reversing the order that we visit the instructions. This may also reduce the motivation for D79078, but we are still not getting the optimal pattern for a reduction.	2020-05-16 13:08:01 -04:00
Sanjay Patel	43017ceb78	[PhaseOrdering] add vector reduction tests; NFC These are based on tests originally included in: D79078	2020-05-16 12:51:10 -04:00
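
Commits 8953ecf22b and b5fb26951a both describe rewriting a difference of sums as a sum of differences. The following is a minimal Python sketch of that identity, not LLVM code; the function names are hypothetical and the loops are written out by hand so that no library summation strategy is involved. For integers the two forms always agree. For floating point they can round differently, which is presumably why the FP variant would depend on flags that permit reassociation (that dependency is an assumption here, not something the commit messages state).

```python
def diff_of_sums(a, b):
    """(a[0] + a[1] + ...) - (b[0] + b[1] + ...), summed left to right."""
    sum_a = 0
    for x in a:
        sum_a = sum_a + x
    sum_b = 0
    for y in b:
        sum_b = sum_b + y
    return sum_a - sum_b


def sum_of_diffs(a, b):
    """(a[0] - b[0]) + (a[1] - b[1]) + ..., summed left to right."""
    acc = 0
    for x, y in zip(a, b):
        acc = acc + (x - y)
    return acc


# Integer case: both forms give the same result.
ints_a, ints_b = [5, 7, 11, 13], [1, 2, 3, 4]
int_same = diff_of_sums(ints_a, ints_b) == sum_of_diffs(ints_a, ints_b)

# Floating-point case: 1e16 + 1.0 rounds back to 1e16 in double precision,
# so the two forms disagree on these inputs.
fp_a, fp_b = [1e16, 1.0], [1e16, 0.0]
fp_before = diff_of_sums(fp_a, fp_b)
fp_after = sum_of_diffs(fp_a, fp_b)
```

On the floating-point inputs the first form loses the small term to rounding while the second keeps it, which illustrates why the rewrite is not value-preserving for FP without some license to reassociate.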

26 Commits
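
Commit 20b386aae0 switches the start value for vector.reduce.fadd from 0.0 to -0.0. As a rough illustration of why -0.0 is the neutral element under IEEE 754 addition, here is a small Python sketch; `reduce_fadd` is a hypothetical stand-in for an ordered reduction, not an LLVM API. The difference only shows up when the sign of zero matters, which is consistent with the commit's note that 0.0 was fine under nsz.

```python
import math


def reduce_fadd(start, values):
    """Ordered floating-point add reduction, beginning from `start`."""
    acc = start
    for v in values:
        acc = acc + v
    return acc


def sign_of(x):
    """Return 1.0 or -1.0 according to the sign bit, so -0.0 is visible."""
    return math.copysign(1.0, x)


only_neg_zero = [-0.0, -0.0]

# Starting from +0.0 turns the result into +0.0: (+0.0) + (-0.0) is +0.0.
from_pos_zero = reduce_fadd(0.0, only_neg_zero)

# Starting from -0.0 leaves the result as -0.0: (-0.0) + (-0.0) is -0.0.
from_neg_zero = reduce_fadd(-0.0, only_neg_zero)

# For inputs that do not sum to a zero, the start value makes no difference.
ordinary = reduce_fadd(-0.0, [1.5, 2.5]) == reduce_fadd(0.0, [1.5, 2.5])
```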