llvm-project

Author	SHA1	Message	Date
Arthur Eubanks	9c6250ee41	Revert "[SLP] Schedule only sub-graph of vectorizable instructions" This reverts commit 0539a26d91a1b7c74022fa9cf33bd7faca87544d. Causes a miscompile, see comments on D118538. Required updating bottom-to-top-reorder.ll.	2022-03-01 17:31:16 -08:00
Arthur Eubanks	6987ac7903	Revert "[SLP] Remove SchedulingPriority from ScheduleData [NFC]" This reverts commit a3e9b32c00959ad5c73189d8378d019fbe80ade5. Required for reverting D118538.	2022-03-01 17:28:52 -08:00
Philip Reames	8cb0ac5825	[SLP] Check invariant that all instructions in bundle are in same block [NFC]	2022-02-28 13:17:44 -08:00
Alexey Bataev	e4b9640867	[SLP]Improve bottom-to-top reordering. Currently bottom-to-top reordering analysis counts orders of the operands and then adds natural order counts for the operand users. It is very conservative, since the user nodes themselves may require reordering. Patch improves bottom-to-top analysis by checking for the user nodes if they require/allow the reordering. If the user node must be reordered, has reused scalars, is an alternate op vectorization node, is a non-ordered gather node or may allow reordering because of the reordered operands, such node is considered as the node that allows reordering and is not counted as a node with the natural order. Differential Revision: https://reviews.llvm.org/D120492	2022-02-28 06:48:46 -08:00
Philip Reames	319265328c	[SLP] Remove field unused after 33ce97f to silence buildbots [NFC]	2022-02-27 10:18:10 -08:00
Philip Reames	33ce97f413	[SLP] Use BatchAA to reduce capture analysis cost [NFC] SLP makes very heavy use of aliasing queries to construct pointer dependencies for scheduling purposes. AA internally uses pointerMayBeCaptured to prove some noalias results. In a local profile, we were spending about 4% of total O2 time in capture tracking. By using BatchAA interface - which caches capture results - this drops to 2%. Note that there is no invalidation of BatchAA here. This assumes that no transformation done by SLP invalidates alias or capture results. This is the same assumption made by the existing AliasCache, so this is not a new assumption in the code.	2022-02-27 09:47:24 -08:00
Evgeniy Brevnov	10e99eb7e4	[SLP] "Normal" instructions should not go between PHI and Landing pad Currently, SLP can insert "shuffle" instruction between PHI and Landing pad instruction. The problem is demonstrated by LIT test. The solution is to adjust insertion point once we are done with PHI generation. Differential Revision: https://reviews.llvm.org/D120552	2022-02-26 11:44:26 +07:00
Vasileios Porpodas	4bbc3290a2	[SLP] Fix for the min/max intrinsic cost. The min/max intrinsic cost is currently too low because in the cost calculation we subtract the cost of the vector compare as we will not emit it. For the cost of the vector compare we are currently passing BAD_ICMP_PREDICATE which returns 3, the worst case cost. I think we should be passing VecPred instead, since we know the predicates of the compare instr. I think this is related to commit b3b993a7ad817 which introduced the predicate argument to getCmpSelInstrCost(). https://reviews.llvm.org/rGb3b993a7ad817c3c5801341fa78f34332900eb83 Differential Revision: https://reviews.llvm.org/D120439	2022-02-24 18:08:40 -08:00
Philip Reames	ed54296ea3	[SLP] Fastpath instructions not in block being scheduled [nfc]	2022-02-23 13:51:36 -08:00
Philip Reames	a4541fdfe4	[SLP] Replace an impossible branch condition with an assert [NFC] An entire bundle must be inside the scheduling window. Assert that this property holds as opposed to checking it at runtime.	2022-02-23 13:43:45 -08:00
Philip Reames	9a40f9f681	[SLP] Make it clear ScheduleDataMap is keyed by instructions [NFC]	2022-02-23 13:31:36 -08:00
Philip Reames	9392c0d4ef	Revert "[SLP] Remove cap on schedule window size" This reverts commit 6adf4b039e095224edbbecda5972e5e3353b53b6. Reverting while investigating https://github.com/llvm/llvm-project/issues/54029	2022-02-23 13:12:07 -08:00
Philip Reames	a83441e8cd	Revert "[SLP] Simplify extendSchedulingRegion" This reverts commit 8c85f3a0523070ef656e30e368df0a679c1400cd.	2022-02-23 13:12:07 -08:00
Philip Reames	222e8610f1	[SLP] Rearrange fields in ScheduleData for density [NFC]	2022-02-23 12:33:43 -08:00
Philip Reames	a3e9b32c00	[SLP] Remove SchedulingPriority from ScheduleData [NFC] First step in trying to shrink the memory footprint of ScheduleData to improve cache locality.	2022-02-23 11:43:46 -08:00
Philip Reames	8c85f3a052	[SLP] Simplify extendSchedulingRegion This change uses instruction's comesBefore method to simplify the code significantly. There's little compile time concern here because getSpillCost already calls comesBefore on every basic block which contains a vectorization candidate. The only additional times we'll build basic block ordering is when we can't schedule a vector candidate anywhere in the containing block. Differential Revision: https://reviews.llvm.org/D120364	2022-02-23 11:23:38 -08:00
Philip Reames	6adf4b039e	[SLP] Remove cap on schedule window size This cap was first added in 848c1aa45 (back in 2015). Per the original commit message, the purpose was to avoid a compile time explosion in long basic blocks. The algorithmic problem in scheduling has now been fixed in 0539a26d. In the meantime, the code has rotted fairly badly. Some intermediate refactoring caused the size to only be incremented if both iterators advance in the window search. This causes the size to be badly undercounted when near one end of a basic block. We no longer have any test which exercises the logic in an intentional way; there's one test which differs with this change, but the changes appear fairly orthogonal to the purpose of the test file. Unfortunately, we no longer have the original motivating example, so it's possible that it also hits some other issue. I tested locally with a large example, but even at its worst, that one doesn't demonstrate anything too extreme even without the algorithmic fix. It's clearly faster with, but only by ~20% which doesn't seem in line with the original commit message. If regressions with this patch are seen, please file a bug and I'll try to fix any other algorithmic problems which fall out.	2022-02-23 08:27:45 -08:00
Brendon Cahoon	3cc15e2cb6	[SLP] Fix assert from non-constant index in insertelement A call to getInsertIndex() in getTreeCost() is returning None, which causes an assert because a non-constant index value for insertelement was not expected. This case occurs when the insertelement index value is defined with a PHI. Differential Revision: https://reviews.llvm.org/D120223	2022-02-22 15:57:14 -06:00
Philip Reames	8612b11c86	[SLP] Use isInSchedulingRegion consistently [NFC]	2022-02-22 10:27:16 -08:00
Philip Reames	0539a26d91	[SLP] Schedule only sub-graph of vectorizable instructions SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users. This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example: Before this patch: 704357 SLP - Number of calcDeps actions 699021 SLP - Number of schedule calls 5598 SLP - Number of ReSchedule actions 59 SLP - Number of ReScheduleOnFail actions 10084 SLP - Number of schedule resets 8523 SLP - Number of vector instructions generated After this patch: 102895 SLP - Number of calcDeps actions 161916 SLP - Number of schedule calls 5637 SLP - Number of ReSchedule actions 55 SLP - Number of ReScheduleOnFail actions 10083 SLP - Number of schedule resets 8403 SLP - Number of vector instructions generated I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore. The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. 
While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass. For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch. Differential Revision: https://reviews.llvm.org/D118538	2022-02-22 10:15:55 -08:00
Philip Reames	3ad0bdae8f	[SLP] Address post commit comment from 2e50760	2022-02-18 10:57:15 -08:00
Alexey Bataev	b0a0df9809	[SLP]Fix vectorization of the alternate cmp instruction with swapped predicates. If the alternate cmp instruction is a swapped predicate of the main cmp instruction, need to generate alternate instruction, not the one with the swapped predicate. Also, the lane with the alternate opcode should be selected only, if the corresponding operands are not compatible. Correctness confirmed: https://alive2.llvm.org/ce/z/94BG66 Differential Revision: https://reviews.llvm.org/D119855	2022-02-18 04:27:45 -08:00
Alexey Bataev	d1cd64ffdd	[SLP][NFC]Fix misprint in function name, NFC.	2022-02-17 05:57:51 -08:00
Arthur Eubanks	826fae51d2	[SLPVectorizer][OpaquePtrs] Check GEP source element type Fixes a miscompile with opaque pointers. Reviewed By: #opaque-pointers, nikic Differential Revision: https://reviews.llvm.org/D119980	2022-02-16 14:47:20 -08:00
Philip Reames	2e50760775	[SLP] Add assert that entities are scheduled as expected Requested in D118538	2022-02-15 12:21:49 -08:00
Anton Afanasyev	b7574b092a	[SLP] Don't try to vectorize pair with insertelement Particularly this breaks vectorization of insertelements where some of intermediate (i.e. not last) insertelements are used externally. Fixes PR52275 Fixes #51617 Differential Revision: https://reviews.llvm.org/D119679	2022-02-15 16:12:59 +03:00
Anton Afanasyev	954ea0f044	[SLP] Simplify indices processing for insertelements Get rid of non-constant and undef indices of insertelements at `buildTree()` stage. Fix bugs. Differential Revision: https://reviews.llvm.org/D119623	2022-02-14 14:50:44 +03:00
Anton Afanasyev	cd685f5736	[NFC][SLP] Set default parameter for Offset equal to zero	2022-02-11 17:22:33 +03:00
Alexey Bataev	370ea1a199	[SLP][NFC]Fix comment, NFC.	2022-02-09 07:14:14 -08:00
Djordje Todorovic	afd54e1ed1	[SLPVectorizer] Fix "unused variable" build warning	2022-02-07 10:38:19 +01:00
Benjamin Kramer	ce9417348e	[SLP] Skip a DenseSet<unsigned> -> bit vector conversion. NFCI.	2022-02-06 00:57:47 +01:00
Philip Reames	0cc6165d05	[SLP] Strengthen internal asserts about scheduled node state [NFC] All members of a scheduled bundle must have valid dependencies, with no unscheduled ones, and only the lead element gets marked scheduled.	2022-02-04 12:22:52 -08:00
Philip Reames	f3f8e3da9f	[SLP] Remove ScheduleData::UnscheduledDepsInBundle field [NFC-ish] We can simply compute the value of this field on demand. Doing so clarifies the behavior when one of the instructions within a bundle doesn't have valid dependencies. I vaguely think this could change behavior slightly, but none of the test cases are affected, and my attempts to write one by hand have failed. This also slightly reduces memory usage, but that's a secondary value at best.	2022-02-04 10:12:09 -08:00
Philip Reames	bb9964ba43	[SLP] Have only ready items in ready list [NFC] This adds the assertion that all items in the ready list are in-fact scheduleable entities ready to be scheduled. This involves changing the ReadyInsts structure to be a set, and fixing a couple places where we left nodes on the list when they were no longer ready.	2022-02-03 19:49:24 -08:00
Philip Reames	2cbc92fb11	[SLP] Strengthen internal invariant assertions slightly This builds on the invariant checks introduced in 1519629, and adds a couple more that seem to hold without additional work.	2022-02-03 14:56:39 -08:00
Philip Reames	1519629a20	[SLP] Add basic self consistency asserts into scheduling The idea here is to have a verify routine we can call during scheduling to ensure broken invariants are reported. The intent is to help in debugging scheduling bugs. At the moment, only the most basic properties are checked as adding several I thought held reported failures.	2022-02-03 13:27:35 -08:00
Philip Reames	6d0c007bc1	[SLP] Fix a typo in comment	2022-02-03 09:11:47 -08:00
Alexey Bataev	802ceb8343	[SLP]Excluded external uses from the reordering estimation. Compiler adds the estimation for the external uses during operands reordering analysis, which makes it tend to prefer duplicates in the lanes rather than diamond/shuffled match in the graph. It changes the sizes of the vector operands and may prevent some vectorization. We don't need this kind of estimation for the analysis phase, because we just need to choose the most compatible instruction and it does not matter if it has external user or used in the non-matching lane. Instead, we count the number of unique instruction in the lane and see if the reassociation changes the number of unique scalars to be power of 2 or not. If we have power of 2 unique scalars in the lane, it is considered more profitable rather than having non-power-of-2 number of unique scalars. Metric: SLP.NumVectorInstructions test-suite :: MultiSource/Benchmarks/FreeBench/distray/distray.test 70.00 86.00 22.9% test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test 346.00 353.00 2.0% test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test 346.00 353.00 2.0% test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test 235.00 239.00 1.7% test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test 235.00 239.00 1.7% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 8723.00 8834.00 1.3% test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 1051.00 1064.00 1.2% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1628.00 1646.00 1.1% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1628.00 1646.00 1.1% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 9100.00 9184.00 0.9% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 3565.00 3577.00 0.3% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 3565.00 3577.00 0.3% test-suite :: 
External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 4235.00 4245.00 0.2% test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1996.00 1998.00 0.1% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 1671.00 1672.00 0.1% test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 783.00 782.00 -0.1% test-suite :: SingleSource/Benchmarks/Misc/oourafft.test 69.00 68.00 -1.4% test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test 207.00 192.00 -7.2% test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test 207.00 192.00 -7.2% test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test 89.00 80.00 -10.1% test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test 89.00 80.00 -10.1% test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test 260.00 215.00 -17.3% test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 256.00 211.00 -17.6% MultiSource/Benchmarks/Prolangs-C/TimberWolfMC - pretty much the same. SingleSource/Benchmarks/Misc/oourafft.test - 2 <2 x > loads replaced by one <4 x> load. External/SPEC/CINT2017speed/641.leela_s - function gets vectorized and not inlined anymore. External/SPEC/CINT2017rate/541.leela_r - same. External/SPEC/CINT2017rate/531.deepsjeng_r - changed the order in multi-block tree, the result is pretty much the same. External/SPEC/CINT2017speed/631.deepsjeng_s - same. MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a - the result is the same as before. MultiSource/Benchmarks/MiBench/consumer-jpeg - same. Differential Revision: https://reviews.llvm.org/D116688	2022-02-03 06:50:06 -08:00
Alexey Bataev	ad2a0ccf8f	[SLP]Alternate vectorization for cmp instructions. Added support for alternate ops vectorization of the cmp instructions. It allows to vectorize either cmp instructions with same/swapped predicate but different (swapped) operands kinds or cmp instructions with different predicates and compatible operands kinds. Differential Revision: https://reviews.llvm.org/D115955	2022-02-03 06:24:10 -08:00
Alexey Bataev	8a1dfbc4d8	Revert "[SLP]Alternate vectorization for cmp instructions." This reverts commit 842a2360a84692f2e4c37cc3e652640e6627d004 to fix the bugs reported by users in https://reviews.llvm.org/D115955#3291538.	2022-02-02 12:06:36 -08:00
Alexey Bataev	842a2360a8	[SLP]Alternate vectorization for cmp instructions. Added support for alternate ops vectorization of the cmp instructions. It allows to vectorize either cmp instructions with same/swapped predicate but different (swapped) operands kinds or cmp instructions with different predicates and compatible operands kinds. Differential Revision: https://reviews.llvm.org/D115955	2022-02-02 10:32:52 -08:00
Benjamin Kramer	0c3d22a592	Revert "[SLP]Alternate vectorization for cmp instructions." This reverts commit 83620bd2ad867f706c699d0f2b8be10e43d9f3d7. It's causing miscompilations, see review comments at https://reviews.llvm.org/D115955	2022-02-02 13:08:51 +01:00
Alexey Bataev	83620bd2ad	[SLP]Alternate vectorization for cmp instructions. Added support for alternate ops vectorization of the cmp instructions. It allows to vectorize either cmp instructions with same/swapped predicate but different (swapped) operands kinds or cmp instructions with different predicates and compatible operands kinds. Differential Revision: https://reviews.llvm.org/D115955	2022-02-01 09:54:20 -08:00
Benjamin Kramer	5281f0dab2	Revert "[SLP]Alternate vectorization for cmp instructions." This reverts commit afaaecc88c6e5989de8a6a0266610860ef99d9d6. Crashes when compiling SciPy, test case https://reviews.llvm.org/P8276	2022-02-01 11:40:43 +01:00
Alexey Bataev	afaaecc88c	[SLP]Alternate vectorization for cmp instructions. Added support for alternate ops vectorization of the cmp instructions. It allows to vectorize either cmp instructions with same/swapped predicate but different (swapped) operands kinds or cmp instructions with different predicates and compatible operands kinds. Differential Revision: https://reviews.llvm.org/D115955	2022-01-31 11:11:25 -08:00
Philip Reames	6888081e32	[SLP] Use moveBefore to simplify code [NFC]	2022-01-28 12:44:07 -08:00
Philip Reames	746e435ff7	Revert "[SLP] Add a clarifying assert in block scheduling [NFC]" This reverts commit db49a78900f5e4b59714565876b5dbb5e2dfe840. The reasoning in the patch applied to a downstream branch, and I got myself confused when trying to split apart pieces. Thankfully, the assert was simply weaker than the actual invariant currently upstream which is that ReadyInsts is not empty.	2022-01-28 12:10:31 -08:00
Philip Reames	db49a78900	[SLP] Add a clarifying assert in block scheduling [NFC] The fact we could have a block with a valid scheduling window, but nothing to schedule was surprising to me. After digging through the code, this can only happen if we don't find anything to directly vectorize. However, the reduction handling code relies on this mode, so we can't simply consider such trees unvectorizeable. The assert conveys both that this situation can happen, but also that it can only happen for an immediate gather. Context: We built the bundle before deciding that vectorization of a bundle is possible. A side effect of bundle construction is manipulating the scheduling window, so a bundle which isn't vectorizable can cause the creation or expansion of a scheduling window.	2022-01-28 11:08:59 -08:00
Alexey Bataev	cec8b614f3	[SLP]Do not reorder top nodes if they do not require reordering. No need to reorder the top nodes, if they are not stores or insertelement instructions and each node should be analyzed only once, when the bottom-to-top analysis is performed. We still end up with extractelements for the top node scalars and the final shuffle just adds an extra cost and currently crashes the compiler for PHI nodes. Differential Revision: https://reviews.llvm.org/D116760	2022-01-28 09:16:18 -08:00
eopXD	6be77561f8	[SLP][NFC] Add debug logs for entry. Tell the users they are specifying something without vector register. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D117980	2022-01-24 09:05:21 -08:00

1054 Commits