3258 Commits

Author SHA1 Message Date
Vasileios Porpodas
5efa78985b [SLP] Fix lookahead operand reordering for splat loads.
Splat loads are inexpensive in X86. For a 2-lane vector we need just one
instruction: `movddup (%reg), xmm0`. Using the standard Splat score leads
to worse code. This patch adds a new score dedicated for splat loads.

Please note that a splat is usually three IR instructions:
- It is usually a load and 2 inserts:
 %ld = load double, double* %gep
 %ins1 = insertelement <2 x double> poison, double %ld, i32 0
 %ins2 = insertelement <2 x double> %ins1, double %ld, i32 1

- But it can also be a load, an insert and a shuffle:
 %ld = load double, double* %gep
 %ins = insertelement <2 x double> poison, double %ld, i32 0
 %shf = shufflevector <2 x double> %ins, <2 x double> poison, <2 x i32> zeroinitializer

Because of this some of the lit tests contain more IR instructions.

Differential Revision: https://reviews.llvm.org/D121354
2022-03-17 18:05:54 -07:00
Alexey Bataev
d65cc85977 [SLP]Do not schedule instructions with constants/argument/phi operands and external users.
No need to schedule entry nodes where all instructions are not memory
read/write instructions and their operands are either constants, or
arguments, or phis, or instructions from others blocks, or their users
are phis or from the other blocks.
The resulting vector instructions can be placed at
the beginning of the basic block without scheduling (if operands does
not need to be scheduled) or at the end of the block (if users are
outside of the block).
It may save some compile time and scheduling resources.

Differential Revision: https://reviews.llvm.org/D121121
2022-03-17 11:03:45 -07:00
Florian Hahn
151c144350
[LV] Use usesScalars in widenPHIInstruction.
This uses the existing VPlan helpers to check whether there are scalar
uses of a phi recipe. It remove one of the few remaining dependencies on
the cost model from VPlan code generation.

Depends on D121612.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D121613
2022-03-17 13:16:32 +00:00
Florian Hahn
a6e70e4056
[VPlan] VPInterleaveRecipe only requires the first lane of the address.
VPInterleaveRecipe only uses the first lane of the address. Add
onlyFirstLaneUsed implementation. This is needed for a follow-up patch.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D121612
2022-03-17 11:56:43 +00:00
Nikita Popov
1dbeb64493 [SLP] Avoid unnecessary getIncomingValueForBlock() call (NFC)
This code just wants to check all incoming values, we don't care
care what the incoming block is here.
2022-03-17 12:23:46 +01:00
Alexey Bataev
150ea76543 Revert "[SLP]Do not schedule instructions with constants/argument/phi operands and external users."
This reverts commit 1eeb2bfe727323332800e8d390f2f8c63c953779 to fix
a bug reported in https://reviews.llvm.org/D121121
2022-03-16 13:54:59 -07:00
Malhar Jajoo
a36d269658 [VPlan] Avoid collecting scalars for SVE
This patch ensures scalars (except for uniforms) are no
longer collected (prior to LVP planning phase) for
scalable vectorization.

This is to avoid the chances of generating scalarized
instructions later (during LVP execute phase) as they
are not supported for scalable vectorization.

Relevant test has also been added.

Differential Revision: https://reviews.llvm.org/D121452
2022-03-16 16:33:34 +00:00
Alexey Bataev
1eeb2bfe72 [SLP]Do not schedule instructions with constants/argument/phi operands and external users.
No need to schedule entry nodes where all instructions are not memory
read/write instructions and their operands are either constants, or
arguments, or phis, or instructions from others blocks, or their users
are phis or from the other blocks.
The resulting vector instructions can be placed at
the beginning of the basic block without scheduling (if operands does
not need to be scheduled) or at the end of the block (if users are
outside of the block).
It may save some compile time and scheduling resources.

Differential Revision: https://reviews.llvm.org/D121121
2022-03-16 06:05:43 -07:00
Philip Reames
1cfa986d68 [SLP] Optionally preserve MemorySSA
This initial patch adds code to preserve MemorySSA through a run of SLP vectorizer. The eventual plan is to use MemorySSA to accelerate SLP's memory dependence checking, but we're a ways from that.  In particular, this patch is correct, but really slow. It's being landed so that we can work incrementally in tree, not because it's expected to be useful to anyone just yet.

The broader effort is being tracked in https://github.com/llvm/llvm-project/issues/54256.  Its worth noting expicitly that this may not work out, and if not, we will be reverting all of the MSSA support in SLP at some point in the next few weeks.

Differential Revision: https://reviews.llvm.org/D117926
2022-03-15 16:36:15 -07:00
Florian Hahn
ca1b2fc9fb
[LV] Remove LoopVectorBody from InnerLoopVectorizer. (NFCI)
Update places still referencing LoopVectorBody to use the vector loop to
get the vector loop header. This is needed to move vector loop
code-generation to VPlan completely, which in turn is needed to model
pre-header & exit blocks in VPlan as well.
2022-03-15 08:22:31 +00:00
Florian Hahn
4a0481e981
[LV] Check for users of truncated IVs, add more detailed comment.
Add missing outside user check for truncated IVs. Also hoist the code in
the helper with additional explanations.

Fixes #54370.
2022-03-14 19:39:30 +00:00
Nikita Popov
8361c5da30 [SLPVectorizer] Handle external load/store pointer uses with opaque pointers
In this case we may not generate a bitcast, so the new load/store
becomes the external user.
2022-03-14 16:55:09 +01:00
Florian Hahn
d621ae30e2
[LV] Remove dead Loop argument from emitMinimumVector... (NFC)
The argument is not used, remove it.
2022-03-14 15:47:40 +00:00
Florian Hahn
3ee2d908a9
[LV] Remove dead Loop argument from emitSCEVChecks. (NFC)
The argument is not used, remove it.
2022-03-14 13:00:03 +00:00
Florian Hahn
8896c36624
[LV] Do not set insert point in completeLoopSkeleton. (NFCI)
The insertion point for the builder used during VPlan code generation is
set during code generation. Setting the insert point here is dead code
and can be removed.
2022-03-14 12:21:26 +00:00
Florian Hahn
1c0fc1f074
[VPlan] Ensure each iv user is only visited once in transform.
If a recipe has multiple uses of an IV, we crash. It causes a crash when
building llvm-test-suite.

Exposed by 95f76bff1c40bc1c2f.
2022-03-13 21:42:17 +00:00
Florian Hahn
95f76bff1c
[LV] Create & use VPScalarIVSteps for all scalar users.
This patch is a follow-up to D115953. It updates optimizeInductions
to also introduce new VPScalarIVStepsRecipes if an IV has both vector
and scalar uses.

It updates all uses that only need scalar values to use the newly
created recipe for the scalar steps.

This completes untangling of VPWidenIntOrFpInductionRecipe
code-generation. Now the recipe *only* creates the widened vector
values, as it says on the tin.

The code to genereate IR has been moved directly to
VPWidenIntOrFpInductionRecipe::execute.

Note that the recipe has been updated to hold a reference to
ScalarEvolution, which is needed to expand the step, until we can place
the corresponding SCEV expansion in the pre-header.

Depends on D120827.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D120828
2022-03-13 17:15:24 +00:00
serge-sans-paille
ed98c1b376 Cleanup includes: DebugInfo & CodeGen
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121332
2022-03-12 17:26:40 +01:00
Florian Hahn
d3e1094473
[VPlan] Implement VPCanonicalIVPHIRecipe::onlyFirstLaneUsed.
The recipe only uses the first lane of its operands.

Suggested & split off D120827.
2022-03-11 18:07:26 +00:00
Florian Hahn
ecea477df3
[VPlan] Helper to check if a recipe uses scalar values of op.
This patch adds a helper to check if a recipe only uses scalars of a
given operand. This is similar to onlyFirstLaneUsed, which was
introduced earlier.

By default, usesScalars falls back on onlyFirstLaneUsed.

Will be used by D120828.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D120827
2022-03-11 13:41:08 +00:00
Florian Hahn
a12403cfea
[LV] Do not consider instrs dead if used by phi that's not in plan.
Single value phis won't be modeled in VPlan. If the phi only gets used
outside the loop, the current code misses the fact that the incoming
value is not dead. Update the code to also look through such phis to
check for outside users.

Fixes #54266
2022-03-09 16:04:44 +00:00
Philip Reames
a2e9c68fcd [SLP] Extract a helper for buildvector [nfc] 2022-03-07 19:11:40 -08:00
Philip Reames
8ab3befa3f [SLP] Fix spelling in a lambda name [NFC] 2022-03-07 18:52:57 -08:00
Roman Lebedev
2f80ea7f4f
[NFC][LV] Use different braces in debug output
The analysis passes output function name encapsulated in `'` braces,
but LV uses `"`. Harmonizing this may help in creating an update script
for the LV costmodel test checks.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D121105
2022-03-07 19:32:37 +03:00
Philip Reames
deae979a2c Revert "Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"""
This reverts commit 738042711bc08cde9135873200b1d088e6cf11c3. A second, apparently separate, issue has been reported on the original review.
2022-03-03 11:35:34 -08:00
Philip Reames
738042711b Reapply "[SLP] Schedule only sub-graph of vectorizable instructions""
Root issue which triggered the revert was fixed in 689bab.  No changes in the reapplied patch.

Original commit message follows:

SLP currently schedules all instructions within a scheduling window which stretches from the first instr
uction potentially vectorized to the last. This window can include a very large number of unrelated instruct
ions which are not being considered for vectorization. This change switches the code to only schedule the su
b-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

    Before this patch:

    704357 SLP                          - Number of calcDeps actions
     699021 SLP                          - Number of schedule calls
       5598 SLP                          - Number of ReSchedule actions
         59 SLP                          - Number of ReScheduleOnFail actions
      10084 SLP                          - Number of schedule resets
       8523 SLP                          - Number of vector instructions generated

    After this patch:

    102895 SLP                          - Number of calcDeps actions
     161916 SLP                          - Number of schedule calls
       5637 SLP                          - Number of ReSchedule actions
         55 SLP                          - Number of ReScheduleOnFail actions
      10083 SLP                          - Number of schedule resets
       8403 SLP                          - Number of vector instructions generated

I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.

Differential Revision: https://reviews.llvm.org/D118538
2022-03-02 10:47:20 -08:00
Philip Reames
689babdf68 [SLP] Don't try to vectorize allocas
While a collection of allocas are technically vectorizeable - by forming a wider alloca - this was not a transform SLP actually knows how to do.  Instead, we were forming a bundle with missing dependencies, and then relying on the scheduling code to preserve program order if multiple instructions were scheduleable at once.  I haven't been able to write a test case, but I'm 99% sure this was wrong in some edge case.

The unknown op case was flowing down the shufflevector path.  This did result in some splat handling being lost with this change, but the same lack of splat handling is visible in a whole bunch of simple examples for the gather path.  I didn't consider this interesting to fix given how narrow the splat of allocas case is.
2022-03-02 10:08:43 -08:00
Florian Hahn
8777cb66a8
[VPlan] Remove reliance on underlying instr for ScalarIVSteps (NFCI).
Instead of relying on underlying instructions, this patch updates
VPScalarIVStepsRecipe to only store the required type information.

This removes access to unrelated information, as well as avoiding issues
with the same underlying instruction being shared by multiple recipes.

This change should only change the debug output and not cause any
codegen changes, hence NFCI.
2022-03-02 16:23:19 +00:00
Florian Hahn
9e46866c0c
[LV] Remove dead EntryVal argument from buildScalarSteps (NFC).
The EntryVal argument is not needed after recent refactoring. Remove it.
2022-03-02 14:59:22 +00:00
Arthur Eubanks
9c6250ee41 Revert "[SLP] Schedule only sub-graph of vectorizable instructions"
This reverts commit 0539a26d91a1b7c74022fa9cf33bd7faca87544d.

Causes a miscompile, see comments on D118538.

Required updating bottom-to-top-reorder.ll.
2022-03-01 17:31:16 -08:00
Arthur Eubanks
6987ac7903 Revert "[SLP] Remove SchedulingPriority from ScheduleData [NFC]"
This reverts commit a3e9b32c00959ad5c73189d8378d019fbe80ade5.

Required for reverting D118538.
2022-03-01 17:28:52 -08:00
serge-sans-paille
a494ae43be Cleanup includes: TransformsUtils
Estimation on the impact on preprocessor output:
before: 1065307662
after:  1064800684

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120741
2022-03-01 21:00:07 +01:00
serge-sans-paille
71c3a5519d Cleanup includes: LLVMAnalysis
Number of lines output by preprocessor:
before: 1065940348
after:  1065307662

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120659
2022-03-01 18:01:54 +01:00
Philip Reames
8cb0ac5825 [SLP] Check invariant that all instructions in bundle are in same block [NFC] 2022-02-28 13:17:44 -08:00
Alexey Bataev
e4b9640867 [SLP]Improve bottom-to-top reordering.
Currently bottom-to-top reordering analysis counts orders of the
operands and then adds natural order counts for the operand users. It is
very conservative, this the user nodes themselves may require
reordering. Patch improves bottom-to-top analysis by checking for the
user nodes if they require/allows the reordring. If the user node must
be reordered, has reused scalars, is an alternate op vectorization node,
is a non-ordered gather node or may allow reordering because of the
reordered operands, such node is considered as the node that allows
reodring and is not counted as a node with the natural order.

Differential Revision: https://reviews.llvm.org/D120492
2022-02-28 06:48:46 -08:00
Florian Hahn
b3e8ace198
Recommit "[VPlan] Introduce recipe to build scalar steps."
This reverts the revert commit ff93260bf6bddfbad1fa65c4d5184988885b900f.

The underlying issue causing the PPC bot failures has been fixed in
cbaac1473403 and a corresponding test case has been added in
ad2cad1c521c.

Original message:

    This patch adds a new VPScalarIVStepsRecipe to handle building scalar
    steps.

    In the first patch, it only handles the case where there is no vector
    induction variable needed.

    Reviewed By: Ayal

    Differential Revision: https://reviews.llvm.org/D115953
2022-02-28 14:12:20 +00:00
Florian Hahn
cbaac14734
[LV] Remove induction recipes only used outside vector loop.
Exit values of vector inductions are generated completely independent of
the induction recipes. Consider them for removal, if they are not used
in loop.

This fixes a crash exposed by 49b23f451cf7130.
2022-02-28 11:14:22 +00:00
Philip Reames
319265328c [SLP] Remove field unused after 33ce97f to silence buildbots [NFC] 2022-02-27 10:18:10 -08:00
Florian Hahn
ff93260bf6
Revert "[VPlan] Introduce recipe to build scalar steps."
This reverts commit 49b23f451cf713036c99573a35daed308d2ac894.

This appears to break some PPC build bots. Revert while I investigate.
2022-02-27 17:51:19 +00:00
Philip Reames
33ce97f413 [SLP] Use BatchAA to reduce capture analysis cost [NFC]
SLP makes very heavy use of aliasing queries to construct pointer dependencies for scheduling purposes.  AA internally usings pointerMayBeCaptured to prove some noalias results.  In a local profile, we were spending about 4% of total O2 time in capture tracking.  By using BatchAA interface - which caches capture results - this drops to 2%.

Note that there is no invalidation of BatchAA here.  This assumes that no transformation done by SLP invalidates alias or capture results.  This is the same assumption made by the existing AliasCache, so this is not a new assumption in the code.
2022-02-27 09:47:24 -08:00
Florian Hahn
49b23f451c
[VPlan] Introduce recipe to build scalar steps.
This patch adds a new VPScalarIVStepsRecipe to handle building scalar
steps.

In the first patch, it only handles the case where there is no vector
induction variable needed.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D115953
2022-02-27 17:32:41 +00:00
Florian Hahn
9bc866cc6f
[VPlan] Add recipe to handle SCEV expansion (NFC).
This can be used to explicitly model VPValues that depend on SCEV
expansion, like the step for inductions.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D116288
2022-02-27 12:47:02 +00:00
Florian Hahn
da740492b0
[VPlan] Remove dead header-phi recipes.
This patch adds a new transform to remove dead recipes. For now, it only
removes dead recipes in the header, to keep the number tests that require
updating manageable. Future patches will extend this to remove dead
recipes across the whole plan.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D118051
2022-02-26 16:26:39 +00:00
Evgeniy Brevnov
10e99eb7e4 [SLP] "Normal" instructions should not go between PHI and Lading pad
Currently, SLP can insert "shuffle" instruction beween PHI and Landing pad instruction. The problem is demonstrated by LIT test. The solution is to adjust insertion point once we are done with PHI generation.

Differential Revision: https://reviews.llvm.org/D120552
2022-02-26 11:44:26 +07:00
Vasileios Porpodas
4bbc3290a2 [SLP] Fix for the min/max intrinsic cost.
The min/max intrinsic cost is currently too low because in the cost calculation
we subtract the cost of the vector compare as we will not emit it.
For the cost of the vector compare we are currently passing BAD_ICMP_PREDICATE
which returns 3, the worst case cost.
I think we should be passing VecPred instead, since we know the predicates of
the compare instr.

I think this is related to commit b3b993a7ad817 which introduced the predicate
argument to getCmpSelInstrCost().
https://reviews.llvm.org/rGb3b993a7ad817c3c5801341fa78f34332900eb83

Differential Revision: https://reviews.llvm.org/D120439
2022-02-24 18:08:40 -08:00
Philip Reames
ed54296ea3 [SLP] Fastpath instructions not in block being scheduled [nfc] 2022-02-23 13:51:36 -08:00
Philip Reames
a4541fdfe4 [SLP] Replace a impossible branch condition with an assert [NFC]
An entire bundle must be inside the scheduling window.  Assert that this property holds as opposed to checking it at runtime.
2022-02-23 13:43:45 -08:00
Philip Reames
9a40f9f681 {SLP] Make it clear ScheduleDataMap is keyed by instructions [NFC] 2022-02-23 13:31:36 -08:00
Philip Reames
9392c0d4ef Revert "[SLP] Remove cap on schedule window size"
This reverts commit 6adf4b039e095224edbbecda5972e5e3353b53b6.  Reverting while investigating https://github.com/llvm/llvm-project/issues/54029
2022-02-23 13:12:07 -08:00
Philip Reames
a83441e8cd Revert "[SLP] Simplify extendSchedulingRegion"
This reverts commit 8c85f3a0523070ef656e30e368df0a679c1400cd.
2022-02-23 13:12:07 -08:00