1095 Commits

Author SHA1 Message Date
Vasileios Porpodas
b1333f03d9 Recommit "[SLP] Support internal users of splat loads"
Code review: https://reviews.llvm.org/D121940

This reverts commit 359dbb0d3daa8295848a09ddd083c79f6851888e.
2022-04-18 15:58:01 -07:00
Vasileios Porpodas
359dbb0d3d Revert "[SLP] Support internal users of splat loads"
This reverts commit f8e1337115623cb879f734940fd9dfeb29a611e7.
2022-04-18 12:12:34 -07:00
Vasileios Porpodas
f8e1337115 [SLP] Support internal users of splat loads
Until now we would only accept a broadcast load pattern if it is only used
by a single vector of instructions.

This patch relaxes this, and allows for the broadcast to have more than one
user vector, as long as all of its uses are internal to the SLP graph and
vectorized.

Differential Revision: https://reviews.llvm.org/D121940
2022-04-18 11:59:44 -07:00
serge-sans-paille
fa5a4e1b95 [iwyu] Handle regressions in libLLVM header include
Running iwyu-diff on LLVM codebase since a96638e50ef5 detected a few
regressions, fixing them.
2022-04-13 20:53:19 +02:00
Alexey Bataev
0e1f4d4d3c [SLP]Improve reductions analysis and emission, part 1.
Currently SLP vectorizer walks through the instructions and selects
3 main classes of values: 1) reduction operations - instructions with same
reduction opcode (add, mul, min/max, etc.), which build the reduction,
2) reduced values - instructions with the same opcodes, but different
from the reduction opcode, 3) extra arguments - all other values,
instructions from the different basic block rather than the root node,
instructions with to many/less uses.

This scheme is not very efficient. It excludes some instructions and all
non-instruction values from the reductions (constants, proficient
gathers), to many possibly reduced values are marked as extra arguments.
Patch improves this process by introducing a bit extended analysis
stage. During this stage, we still try to select 3 classes of the
values: 1) reduction operations - same as before, 2) possibly reduced
values - all instructions from the current block/non-instructions, which
may build a vectorization tree, 3) extra arguments - instructions from
the different basic blocks. Additionally, an extra sorting of the
possibly reduced values occurs to build the scalar sequences which
highly likely will bed vectorized, e.g. loads are grouped by the
distance between them, constants are grouped together, cmp instructions
are sorted by their compare types and predicates, extractelement
instructions are sorted by the vector operand, etc. Also, these groups
are reordered by their length so the longest group is the first in the
list of the possibly reduced values.

The vectorization process tries to emit the reductions for all these
groups. These reductions, remaining non-vectorized possible reduced
values and extra arguments are then combined into the final expression
just like it was before.

Differential Revision: https://reviews.llvm.org/D114171
2022-04-12 17:46:11 -07:00
Philip Reames
7d6e8f2a96 [slp] Delete dead scalar instructions feeding vectorized instructions
If we vectorize a e.g. store, we leave around a bunch of getelementptrs for the individual scalar stores which we removed. We can go ahead and delete them as well.

This is purely for test output quality and readability. It should have no effect in any sane pipeline.

Differential Revision: https://reviews.llvm.org/D122493
2022-03-28 20:10:13 -07:00
Philip Reames
f80aaa675f [SLP] Simplify eraseInstruction [NFC]
This simplifies the implementation of eraseInstruction by moving the odd-replace-users-with-undef handling back to the only caller which uses it.  This handling was not obviously correct, so add the asserts which make it clear why this is safe to do at all.  The result is simpler code and stronger assertions.
2022-03-25 12:01:52 -07:00
Philip Reames
48cc9287f5 Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"" (try 3)
The original commit exposed several missing dependencies (e.g. latent bugs in SLP scheduling).  Most of these were fixed over the weekend and have had several days to bake.  The last was fixed this morning after being noticed in manual review of test changes yesterday.  See the review thread for links to each change.

Original commit message follows:

SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

   Before this patch:

   704357 SLP                          - Number of calcDeps actions
   699021 SLP                          - Number of schedule calls
   5598 SLP                          - Number of ReSchedule actions
   59 SLP                          - Number of ReScheduleOnFail actions
   10084 SLP                          - Number of schedule resets
   8523 SLP                          - Number of vector instructions generated

   After this patch:

   102895 SLP                          - Number of calcDeps actions
   161916 SLP                          - Number of schedule calls
   5637 SLP                          - Number of ReSchedule actions
   55 SLP                          - Number of ReScheduleOnFail actions
   10083 SLP                          - Number of schedule resets
   8403 SLP                          - Number of vector instructions generated

I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.

Differential Revision: https://reviews.llvm.org/D118538
2022-03-25 10:39:23 -07:00
Philip Reames
ec858f0201 [SLP] Optimize stacksave dependence handling [NFC]
After writing the commit message for 4b1bace28, realized that the mentioned optimization was rather straight forward.  We already have the code for scanning a block during region initialization, we can simply keep track if we've seen a stacksave or stackrestore.  If we haven't, none of these dependencies are relevant and we can avoid the relatively expensive scans entirely.
2022-03-25 10:04:10 -07:00
Philip Reames
a16308c282 [SLP] Explicit track required stacksave/alloca dependency (try 3)
This is an extension of commit b7806c to handle one last case noticed in test changes for D118538.  Again, this is thought to be a latent bug in the existing code, though this time I have not managed to reduce tests for the original algoritthm.

The prior attempt had failed to account for this case:
  %a = alloca i8
  stacksave
  stackrestore
  store i8 0, i8* %a

If we allow '%a' to reorder into the stacksave/restore region, then the alloca will be deallocated before the use.  We will have taken a well defined program, and introduced a use-after-free bug.

There's also an inverse case where the alloca originally follows the stackrestore, and we need to prevent the reordering it above the restore.

Compile time wise, we potentially do an extra scan of the block for each alloca seen in a bundle.  This is significantly more expensive than the stacksave rooted version and is why I'd tried to avoid this in the initial patch.  There is room to optimize this (by essentially caching a "has stacksave" bit per block), but I'm leaving that to future work if it actually shows up in practice.  Since allocas in bundles should be rare in practice, I suspect we can defer the complexity for a long while.
2022-03-25 10:04:10 -07:00
Philip Reames
d9756fa723 [slp] Factor out a lambda to avoid uplicating code a third time in upcoming patch [nfc] 2022-03-25 09:02:39 -07:00
Alexey Bataev
20973c0841 [SLP][NFC]Fix param name in comments, NFC. 2022-03-24 05:58:42 -07:00
Vasileios Porpodas
39aa202aff Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 3, fixed assertion crash.
Original review: https://reviews.llvm.org/D121354

This reverts commit e6ead19b774718113007ecb1a4449d7af0cbcfeb.
2022-03-23 18:32:17 -07:00
Arthur Eubanks
e6ead19b77 Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash."
This reverts commit 27bd8f94928201f87f6b659fc2228efd539e8245.

Causes crashes, see comments in D121973
2022-03-23 10:57:45 -07:00
serge-sans-paille
1b89c83254 Cleanup includes: Transforms/Instrumentation & Transforms/Vectorize
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D122181
2022-03-23 11:06:13 +01:00
Vasileios Porpodas
27bd8f9492 Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash.
Original review: https://reviews.llvm.org/D121354

This reverts commit f7d7d2a08d16356c57f6d2d36bc2fc0589a55df9.
2022-03-22 16:41:55 -07:00
Arthur Eubanks
f7d7d2a08d Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads.""
This reverts commit 79613185d305013de743cdbd6690e4d77c8af27e.

Causes crashes, see comments in https://reviews.llvm.org/D121973.
2022-03-22 13:33:49 -07:00
Vasileios Porpodas
79613185d3 Recommit "[SLP] Fix lookahead operand reordering for splat loads."
Original review: https://reviews.llvm.org/D121354

The original commit 9136145eb019e1d18c966d4d06a3df349b88cc14 broke the build on several targets.

Differential Revision: https://reviews.llvm.org/D121973
2022-03-21 15:57:32 -07:00
Philip Reames
ee7324b898 Rename mayBeMemoryDependent to mayHaveNonDefUseDependency [nfc] 2022-03-21 10:01:40 -07:00
Alexey Bataev
79a182371e [SLP]Make stricter check for instructions that do not require
scheduling.

Need to check that the instructions with external operands can be
reordered safely before actualy exclude them from the scheduling.
2022-03-21 06:09:12 -07:00
Philip Reames
b7806c8b37 [SLP] Explicit track required stacksave/alloca dependency
The semantics of an inalloca alloca instruction requires that it not be reordered with a preceeding stacksave intrinsic call.  Unfortunately, there's no def/use edge or memory dependence edge.  (THe memory point is slightly subtle, but in general a new allocation can't alias with a call which executes strictly before it comes into existance.)

I'd tried to tackle this same case previously in 689babdf6, but the fix chosen there turned out to be incomplete.  As such, this change contains a fully revert of the first fix attempt.

This was noticed when investigating problems which surfaced with D118538, but this is definitely an existing bug.  This time around, I managed to reduce a couple of additional cases, including one which was being actively miscompiled even without the new scheduling change.  (See test diffs)

Compile time wise, we only spend extra time when seeing a stacksave (rare), and even then we walk the block at most once per schedule window extension.  Likely a non-issue.
2022-03-20 13:58:45 -07:00
Kazu Hirata
bce1bf0ee2 [Transform] Apply clang-tidy fixes for readability-redundant-smartptr-get (NFC) 2022-03-20 10:41:22 -07:00
Philip Reames
6253b77da9 [SLP] Respect control dependence within a block during scheduling
This fixes an active miscompile visible in the test changes.  The basic problem is that the scheduling dependency graph didn't have any edges for control dependence within a single basic block.  The result is that we could (and in some rare cases *did*) perform reorderings within a block which could introduce new undefined behavior along paths which didn't previously contain any.

Impact wise, we have two major cases where control is not guaranteed to reach a later instruction in the block: may throw calls, and calls containing infinite loops.
* The former case was mostly covered by the memory dependencies, and to trigger require a function which can throw, but not write to memory.  In theory, such a case is possible, but not likely in practice.
* The later case is likely more of an issue in practice.  After this code was first written, we changed the IR semantics to allow well defined infinite loops without satisifying mustprogress.  Even for C/C++ - which do imply mustprogress - recent changes to how we treat atomics (e.g. an atomic read does not always imply a write) could expose this issue.  I'm a bit shocked we don't seem to have a bug report which hit this in real code actually.

Compile time wise, this results in a single extra scan of the scheduling window in the common case.  Since we stop scanning at the next instruction which isn't guaranteed to execute, no matter what order we traverse instructions in, we scan the block once.  The exception to this is that when we extend the scheduling window downwards, we invalidate all dependencies, and thus rescan.  So the potentially expensive case is when we a call in a big schedule window which is frequently extended.  We could optimize this case (by caching the last instruction not guaranteeed to transfer execution and scanning only the extended window) and starting there), but I decided to leave the complexity until it mattered.  That same case is already degenerate with memory dependences which is more expensive than the control dependence scan.

We could also consider combining the memory dependence and control dependence sets to reduce memory usage, but since it complicates the code slightly and makes debugging a bit harder, I went with the simplest scheme for now.

This was noticed while trying to understand the failures reported against D118538, but is not otherwise related to that change.
2022-03-19 13:36:24 -07:00
Philip Reames
1093949cff [SLP] Add comment clarifying assumption that tripped me up [NFC]
I keep thinking this assumption is probably exploitable for a bug in the existing implementation, but all of my attempts at writing a test case have failed.  So for the moment, just document this very subtle assumption.
2022-03-18 11:40:19 -07:00
Kazu Hirata
3e0f7c7881 [Vectorize] Fix an 'unused function' warning
This patch fixes:

  llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:3917:13: error:
  unused function 'needToScheduleSingleInstruction'
  [-Werror,-Wunused-function]
2022-03-18 11:24:57 -07:00
Kazu Hirata
b3d8c0d069 [Vectorize] Fix an 'unused variable' warning
This patch fixes:

  llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:8148:18: error:
  unused variable 'SDTE' [-Werror,-Wunused-variable]
2022-03-18 11:24:54 -07:00
Philip Reames
8f108c32bc Revert "[SLP] Optionally preserve MemorySSA"
This reverts commit 1cfa986d68e2f04854ef30c432b8aa28e13a9706.  See https://github.com/llvm/llvm-project/issues/54256 for why I'm discontinuing the project.

Seperately, it turns out that while this patch does correctly preserve MSSA, it's correct only at the end of the pass; not between vectorization attempts.  Even if we decide to resurrect this, we'll need to fix that before reapplying.
2022-03-18 10:45:59 -07:00
Vasileios Porpodas
9136145eb0 Revert "[SLP] Fix lookahead operand reordering for splat loads." due to build failures
This reverts commit 5efa78985bf5cbba1c4346ba41a16435fc516446.
2022-03-17 18:22:04 -07:00
Vasileios Porpodas
5efa78985b [SLP] Fix lookahead operand reordering for splat loads.
Splat loads are inexpensive in X86. For a 2-lane vector we need just one
instruction: `movddup (%reg), xmm0`. Using the standard Splat score leads
to worse code. This patch adds a new score dedicated for splat loads.

Please note that a splat is usually three IR instructions:
- It is usually a load and 2 inserts:
 %ld = load double, double* %gep
 %ins1 = insertelement <2 x double> poison, double %ld, i32 0
 %ins2 = insertelement <2 x double> %ins1, double %ld, i32 1

- But it can also be a load, an insert and a shuffle:
 %ld = load double, double* %gep
 %ins = insertelement <2 x double> poison, double %ld, i32 0
 %shf = shufflevector <2 x double> %ins, <2 x double> poison, <2 x i32> zeroinitializer

Because of this some of the lit tests contain more IR instructions.

Differential Revision: https://reviews.llvm.org/D121354
2022-03-17 18:05:54 -07:00
Alexey Bataev
d65cc85977 [SLP]Do not schedule instructions with constants/argument/phi operands and external users.
No need to schedule entry nodes where all instructions are not memory
read/write instructions and their operands are either constants, or
arguments, or phis, or instructions from others blocks, or their users
are phis or from the other blocks.
The resulting vector instructions can be placed at
the beginning of the basic block without scheduling (if operands does
not need to be scheduled) or at the end of the block (if users are
outside of the block).
It may save some compile time and scheduling resources.

Differential Revision: https://reviews.llvm.org/D121121
2022-03-17 11:03:45 -07:00
Nikita Popov
1dbeb64493 [SLP] Avoid unnecessary getIncomingValueForBlock() call (NFC)
This code just wants to check all incoming values, we don't care
care what the incoming block is here.
2022-03-17 12:23:46 +01:00
Alexey Bataev
150ea76543 Revert "[SLP]Do not schedule instructions with constants/argument/phi operands and external users."
This reverts commit 1eeb2bfe727323332800e8d390f2f8c63c953779 to fix
a bug reported in https://reviews.llvm.org/D121121
2022-03-16 13:54:59 -07:00
Alexey Bataev
1eeb2bfe72 [SLP]Do not schedule instructions with constants/argument/phi operands and external users.
No need to schedule entry nodes where all instructions are not memory
read/write instructions and their operands are either constants, or
arguments, or phis, or instructions from others blocks, or their users
are phis or from the other blocks.
The resulting vector instructions can be placed at
the beginning of the basic block without scheduling (if operands does
not need to be scheduled) or at the end of the block (if users are
outside of the block).
It may save some compile time and scheduling resources.

Differential Revision: https://reviews.llvm.org/D121121
2022-03-16 06:05:43 -07:00
Philip Reames
1cfa986d68 [SLP] Optionally preserve MemorySSA
This initial patch adds code to preserve MemorySSA through a run of SLP vectorizer. The eventual plan is to use MemorySSA to accelerate SLP's memory dependence checking, but we're a ways from that.  In particular, this patch is correct, but really slow. It's being landed so that we can work incrementally in tree, not because it's expected to be useful to anyone just yet.

The broader effort is being tracked in https://github.com/llvm/llvm-project/issues/54256.  Its worth noting expicitly that this may not work out, and if not, we will be reverting all of the MSSA support in SLP at some point in the next few weeks.

Differential Revision: https://reviews.llvm.org/D117926
2022-03-15 16:36:15 -07:00
Nikita Popov
8361c5da30 [SLPVectorizer] Handle external load/store pointer uses with opaque pointers
In this case we may not generate a bitcast, so the new load/store
becomes the external user.
2022-03-14 16:55:09 +01:00
serge-sans-paille
ed98c1b376 Cleanup includes: DebugInfo & CodeGen
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121332
2022-03-12 17:26:40 +01:00
Philip Reames
a2e9c68fcd [SLP] Extract a helper for buildvector [nfc] 2022-03-07 19:11:40 -08:00
Philip Reames
8ab3befa3f [SLP] Fix spelling in a lambda name [NFC] 2022-03-07 18:52:57 -08:00
Philip Reames
deae979a2c Revert "Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"""
This reverts commit 738042711bc08cde9135873200b1d088e6cf11c3. A second, apparently separate, issue has been reported on the original review.
2022-03-03 11:35:34 -08:00
Philip Reames
738042711b Reapply "[SLP] Schedule only sub-graph of vectorizable instructions""
Root issue which triggered the revert was fixed in 689bab.  No changes in the reapplied patch.

Original commit message follows:

SLP currently schedules all instructions within a scheduling window which stretches from the first instr
uction potentially vectorized to the last. This window can include a very large number of unrelated instruct
ions which are not being considered for vectorization. This change switches the code to only schedule the su
b-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

    Before this patch:

    704357 SLP                          - Number of calcDeps actions
     699021 SLP                          - Number of schedule calls
       5598 SLP                          - Number of ReSchedule actions
         59 SLP                          - Number of ReScheduleOnFail actions
      10084 SLP                          - Number of schedule resets
       8523 SLP                          - Number of vector instructions generated

    After this patch:

    102895 SLP                          - Number of calcDeps actions
     161916 SLP                          - Number of schedule calls
       5637 SLP                          - Number of ReSchedule actions
         55 SLP                          - Number of ReScheduleOnFail actions
      10083 SLP                          - Number of schedule resets
       8403 SLP                          - Number of vector instructions generated

I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.

Differential Revision: https://reviews.llvm.org/D118538
2022-03-02 10:47:20 -08:00
Philip Reames
689babdf68 [SLP] Don't try to vectorize allocas
While a collection of allocas are technically vectorizeable - by forming a wider alloca - this was not a transform SLP actually knows how to do.  Instead, we were forming a bundle with missing dependencies, and then relying on the scheduling code to preserve program order if multiple instructions were scheduleable at once.  I haven't been able to write a test case, but I'm 99% sure this was wrong in some edge case.

The unknown op case was flowing down the shufflevector path.  This did result in some splat handling being lost with this change, but the same lack of splat handling is visible in a whole bunch of simple examples for the gather path.  I didn't consider this interesting to fix given how narrow the splat of allocas case is.
2022-03-02 10:08:43 -08:00
Arthur Eubanks
9c6250ee41 Revert "[SLP] Schedule only sub-graph of vectorizable instructions"
This reverts commit 0539a26d91a1b7c74022fa9cf33bd7faca87544d.

Causes a miscompile, see comments on D118538.

Required updating bottom-to-top-reorder.ll.
2022-03-01 17:31:16 -08:00
Arthur Eubanks
6987ac7903 Revert "[SLP] Remove SchedulingPriority from ScheduleData [NFC]"
This reverts commit a3e9b32c00959ad5c73189d8378d019fbe80ade5.

Required for reverting D118538.
2022-03-01 17:28:52 -08:00
Philip Reames
8cb0ac5825 [SLP] Check invariant that all instructions in bundle are in same block [NFC] 2022-02-28 13:17:44 -08:00
Alexey Bataev
e4b9640867 [SLP]Improve bottom-to-top reordering.
Currently bottom-to-top reordering analysis counts orders of the
operands and then adds natural order counts for the operand users. It is
very conservative, this the user nodes themselves may require
reordering. Patch improves bottom-to-top analysis by checking for the
user nodes if they require/allows the reordring. If the user node must
be reordered, has reused scalars, is an alternate op vectorization node,
is a non-ordered gather node or may allow reordering because of the
reordered operands, such node is considered as the node that allows
reodring and is not counted as a node with the natural order.

Differential Revision: https://reviews.llvm.org/D120492
2022-02-28 06:48:46 -08:00
Philip Reames
319265328c [SLP] Remove field unused after 33ce97f to silence buildbots [NFC] 2022-02-27 10:18:10 -08:00
Philip Reames
33ce97f413 [SLP] Use BatchAA to reduce capture analysis cost [NFC]
SLP makes very heavy use of aliasing queries to construct pointer dependencies for scheduling purposes.  AA internally usings pointerMayBeCaptured to prove some noalias results.  In a local profile, we were spending about 4% of total O2 time in capture tracking.  By using BatchAA interface - which caches capture results - this drops to 2%.

Note that there is no invalidation of BatchAA here.  This assumes that no transformation done by SLP invalidates alias or capture results.  This is the same assumption made by the existing AliasCache, so this is not a new assumption in the code.
2022-02-27 09:47:24 -08:00
Evgeniy Brevnov
10e99eb7e4 [SLP] "Normal" instructions should not go between PHI and Lading pad
Currently, SLP can insert "shuffle" instruction beween PHI and Landing pad instruction. The problem is demonstrated by LIT test. The solution is to adjust insertion point once we are done with PHI generation.

Differential Revision: https://reviews.llvm.org/D120552
2022-02-26 11:44:26 +07:00
Vasileios Porpodas
4bbc3290a2 [SLP] Fix for the min/max intrinsic cost.
The min/max intrinsic cost is currently too low because in the cost calculation
we subtract the cost of the vector compare as we will not emit it.
For the cost of the vector compare we are currently passing BAD_ICMP_PREDICATE
which returns 3, the worst case cost.
I think we should be passing VecPred instead, since we know the predicates of
the compare instr.

I think this is related to commit b3b993a7ad817 which introduced the predicate
argument to getCmpSelInstrCost().
https://reviews.llvm.org/rGb3b993a7ad817c3c5801341fa78f34332900eb83

Differential Revision: https://reviews.llvm.org/D120439
2022-02-24 18:08:40 -08:00
Philip Reames
ed54296ea3 [SLP] Fastpath instructions not in block being scheduled [nfc] 2022-02-23 13:51:36 -08:00