1578 Commits

Author SHA1 Message Date
David Green
b65267ca7b [LV] Invalidate widening decisions after maximizing vector bandwidth
When MaximizeVectorBandwidth is enabled, we can end up (via calls to
collectUniformsAndScalars/setCostBasedWideningDecision through
calculateRegisterUsage) making widening decisions before we have decided
whether to fold the tail by masking. These decisions will be wrong if we
later decide to fold the tail, for example when the trip count is very
low. It will use incorrect costs for loads that should get masked, using
standard memory operation costs instead.

This currently still uses the EmulatedMaskMemRefHack costs (a bit
unfortunately), but the old costs without this change were 1, leading to
too optimistic vectorization.

This slightly changes the way that the MaximizeVectorBandwidth option
works to make it easier to test, always honouring the option if it is
set.

Differential Revision: https://reviews.llvm.org/D120215
2022-03-31 09:19:31 +01:00
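The ordering problem described in b65267ca7b can be illustrated with a toy cache. This is a minimal sketch and not LLVM code: `ToyCostModel`, `getLoadCost`, `invalidateDecisions`, and the cost values are hypothetical names and numbers chosen only to show why a decision cached before the tail-folding choice has to be dropped.

```cpp
#include <cassert>
#include <map>

// Toy stand-in for a cost model that caches per-VF widening decisions.
// All names and cost values are hypothetical, not LLVM's.
struct ToyCostModel {
  bool FoldTailByMasking = false;
  std::map<unsigned, unsigned> LoadCostByVF; // cached decision per VF

  // Compute (or reuse) the cost of a load at the given VF.
  unsigned getLoadCost(unsigned VF) {
    assert(VF > 0 && "VF must be non-zero");
    auto It = LoadCostByVF.find(VF);
    if (It != LoadCostByVF.end())
      return It->second; // stale if FoldTailByMasking changed since caching
    // Illustrative numbers: masked loads are priced higher than plain ones.
    unsigned Cost = FoldTailByMasking ? 10 * VF : 1;
    LoadCostByVF[VF] = Cost;
    return Cost;
  }

  // Drop cached decisions so they are recomputed with the current settings.
  void invalidateDecisions() { LoadCostByVF.clear(); }
};
```

In this sketch, querying a cost, then enabling `FoldTailByMasking`, keeps returning the old cost until `invalidateDecisions()` is called.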
Florian Hahn
ecb4171dcb
[LV] Handle zero cost loops in selectInterleaveCount.
In some cases, like in the added test case, we can reach
selectInterleaveCount with loops that actually have a cost of 0.

Unfortunately a loop cost of 0 is also used to communicate that the cost
has not been computed yet. To resolve the crash, bail out if the cost
remains zero after computing it.

This seems like the best option, as there are multiple code paths that
return a cost of 0 to force a computation in selectInterleaveCount.
Computing the cost at multiple places up front there would unnecessarily
complicate the logic.

Fixes #54413.
2022-03-29 22:52:43 +01:00
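The bail-out described in ecb4171dcb, where a cost of 0 means both "not computed yet" and "genuinely free", can be sketched as follows. This is a simplified illustration under stated assumptions: `selectInterleaveCountSketch`, the `ComputeCost` callback, and the budget heuristic are hypothetical and do not reproduce the real selectInterleaveCount logic.

```cpp
#include <cassert>
#include <functional>

// Hypothetical sketch: a cost of 0 doubles as "not computed yet".
// Returns an interleave count, or 1 (no interleaving) when bailing out.
unsigned selectInterleaveCountSketch(
    unsigned LoopCost, const std::function<unsigned()> &ComputeCost,
    unsigned MaxIC) {
  assert(MaxIC >= 1 && "need at least one copy of the loop");
  if (LoopCost == 0)
    LoopCost = ComputeCost(); // 0 meant "compute it here"
  if (LoopCost == 0)
    return 1; // cost is still zero: bail out before the division below
  // Illustrative heuristic only: cheaper loops get a higher count.
  const unsigned SmallLoopBudget = 20;
  unsigned IC = SmallLoopBudget / LoopCost;
  if (IC < 1)
    IC = 1;
  if (IC > MaxIC)
    IC = MaxIC;
  return IC;
}
```

The second zero check is the point of the sketch: once the cost has been computed, zero can no longer mean "unknown", so the function returns early.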
Florian Hahn
d1d3563278
[LV] Move code to place pointer induction increment to VPlan post-processing.
This patch moves the code to set the correct incoming block for the
backedge value to VPlan::execute.

When generating the phi node, the backedge value is temporarily added
using the pre-header as incoming block. The invalid phi node will be
fixed up during VPlan::execute after main VPlan code generation.
At the same time, the backedge value is also moved to the latch.

This change removes the requirement to create the latch block up-front
for VPWidenInductionPHIRecipe::execute, which in turn will enable
modeling the pre-header in VPlan.

Depends on D121617.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D121618
2022-03-29 20:27:59 +01:00
Florian Hahn
e7bf2ea934
[LV] Move code to place induction increment to VPlan post-processing.
This patch moves the code to set the correct incoming block for the
backedge value to VPlan::execute.

When generating the phi node, the backedge value is temporarily added
using the pre-header as incoming block. The invalid phi node will be
fixed up during VPlan::execute after main VPlan code generation.
At the same time, the backedge value is also moved to the latch.

This change removes the requirement to create the latch block up-front
for VPWidenIntOrFpInductionRecipe::execute, which in turn will enable
modeling the pre-header in VPlan.

As an alternative, the increment could be modeled as separate recipe,
but that would require more work and a bit of redundant code, as we need
to create the step-vector during VPWidenIntOrFpInductionRecipe::execute
anyways, to create the values for different parts.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D121617
2022-03-28 16:20:02 +01:00
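The two-phase phi construction described in e7bf2ea934 and d1d3563278 (placeholder incoming block first, fix-up after code generation) can be sketched with a toy data structure. `ToyPhi`, `addBackedgePlaceholder`, and `fixupBackedge` are hypothetical names; blocks and values are plain strings here rather than LLVM IR objects.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy phi node: a list of (value, incoming block) pairs. Hypothetical type.
struct ToyPhi {
  std::vector<std::pair<std::string, std::string>> Incoming;
};

// During recipe execution the latch may not exist yet, so the backedge
// value is recorded with the pre-header as a placeholder incoming block.
void addBackedgePlaceholder(ToyPhi &Phi, const std::string &BackedgeValue,
                            const std::string &Preheader) {
  Phi.Incoming.emplace_back(BackedgeValue, Preheader);
}

// Post-processing step: once the latch is known, repoint the last incoming
// entry (the backedge value) at it.
void fixupBackedge(ToyPhi &Phi, const std::string &Latch) {
  assert(!Phi.Incoming.empty() && "phi has no backedge entry to fix");
  Phi.Incoming.back().second = Latch;
}
```

Between the two calls the phi is deliberately invalid (two entries from the same block), which mirrors the "invalid phi node will be fixed up" wording in the commit messages.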
Florian Hahn
e47d220230
[LV] Use getVectorLoopRegion to retrieve header. (NFC)
Update all places that currently assume the entry block to the plan is
also the vector loop header to use getVectorLoopRegion instead.

getVectorLoopRegion will keep doing the right thing when the pre-header
is modeled explicitly (and becomes the new entry block in the plan).
2022-03-25 16:57:12 +00:00
Simon Pilgrim
597aefa89c Fix unused variable warning by embedding inside assertion 2022-03-24 17:41:24 +00:00
Florian Hahn
46432a0088
[VPlan] Add VPWidenPointerInductionRecipe.
This patch moves pointer induction handling from VPWidenPHIRecipe to its
own recipe. In the process, it adds all information required to generate
code for pointer inductions without relying on Legal to access the list
of induction phis.

Alternatively, VPWidenPHIRecipe could also take an optional pointer to InductionDescriptor.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D121615
2022-03-24 14:58:45 +00:00
serge-sans-paille
1b89c83254 Cleanup includes: Transforms/Instrumentation & Transforms/Vectorize
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D122181
2022-03-23 11:06:13 +01:00
Florian Hahn
50c8588e44
[LV] Remove Loop argument from createInductionResumeValues (NFCI).
createInductionResumeValues uses its loop argument only to get the
pre-header, but the pre-header is already known (we created/cached it
earlier). Remove the unneeded loop argument.
2022-03-22 14:23:12 +00:00
Sophia
72bde608d2 [LV] Fix typo in comment
Reviewed by: fhahn (Florian Hahn)

Differential Revision: https://reviews.llvm.org/D121781
2022-03-21 20:30:05 +08:00
Florian Hahn
0ebac76e6e
[LV] Remove unneeded Loop argument from completeLoopSkeleton. (NFCI)
completeLoopSkeleton uses its loop argument only to get the
pre-header, but the pre-header is already known (we created/cached it
earlier). Remove the unneeded loop argument.
2022-03-21 10:07:25 +00:00
Florian Hahn
487629cc61
[LV] Remove dead Loop argument from emitMemRuntimeChecks. (NFC) 2022-03-20 21:01:15 +00:00
Florian Hahn
1a820ff039
[LV] Remove unnecessary uses of Loop* (NFC).
Update functions that previously took a loop pointer only to get the
pre-header. Instead, pass the block directly. This removes the
requirement for the loop object to be created up-front.
2022-03-19 20:18:47 +00:00
Florian Hahn
151c144350
[LV] Use usesScalars in widenPHIInstruction.
This uses the existing VPlan helpers to check whether there are scalar
uses of a phi recipe. It removes one of the few remaining dependencies on
the cost model from VPlan code generation.

Depends on D121612.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D121613
2022-03-17 13:16:32 +00:00
Malhar Jajoo
a36d269658 [VPlan] Avoid collecting scalars for SVE
This patch ensures scalars (except for uniforms) are no
longer collected (prior to LVP planning phase) for
scalable vectorization.

This is to avoid the chances of generating scalarized
instructions later (during LVP execute phase) as they
are not supported for scalable vectorization.

Relevant test has also been added.

Differential Revision: https://reviews.llvm.org/D121452
2022-03-16 16:33:34 +00:00
Florian Hahn
ca1b2fc9fb
[LV] Remove LoopVectorBody from InnerLoopVectorizer. (NFCI)
Update places still referencing LoopVectorBody to use the vector loop to
get the vector loop header. This is needed to move vector loop
code-generation to VPlan completely, which in turn is needed to model
pre-header & exit blocks in VPlan as well.
2022-03-15 08:22:31 +00:00
Florian Hahn
d621ae30e2
[LV] Remove dead Loop argument from emitMinimumVector... (NFC)
The argument is not used, remove it.
2022-03-14 15:47:40 +00:00
Florian Hahn
3ee2d908a9
[LV] Remove dead Loop argument from emitSCEVChecks. (NFC)
The argument is not used, remove it.
2022-03-14 13:00:03 +00:00
Florian Hahn
8896c36624
[LV] Do not set insert point in completeLoopSkeleton. (NFCI)
The insertion point for the builder used during VPlan code generation is
set during code generation. Setting the insert point here is dead code
and can be removed.
2022-03-14 12:21:26 +00:00
Florian Hahn
95f76bff1c
[LV] Create & use VPScalarIVSteps for all scalar users.
This patch is a follow-up to D115953. It updates optimizeInductions
to also introduce new VPScalarIVStepsRecipes if an IV has both vector
and scalar uses.

It updates all uses that only need scalar values to use the newly
created recipe for the scalar steps.

This completes untangling of VPWidenIntOrFpInductionRecipe
code-generation. Now the recipe *only* creates the widened vector
values, as it says on the tin.

The code to generate IR has been moved directly to
VPWidenIntOrFpInductionRecipe::execute.

Note that the recipe has been updated to hold a reference to
ScalarEvolution, which is needed to expand the step, until we can place
the corresponding SCEV expansion in the pre-header.

Depends on D120827.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D120828
2022-03-13 17:15:24 +00:00
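As a rough illustration of what "scalar steps" for an induction variable amount to in 95f76bff1c, the sketch below computes the per-lane scalar values for one unrolled part. It assumes a fixed-width VF and an integer induction, and it assumes that lane L of part P holds `Base + (P * VF + L) * Step`. The function name `buildScalarStepsSketch` and its signature are hypothetical, not the recipe's actual interface.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: scalar IV values for one unrolled part.
// Lane L of part P gets Base + (P * VF + L) * Step.
std::vector<long> buildScalarStepsSketch(long Base, long Step, unsigned VF,
                                         unsigned Part) {
  assert(VF > 0 && "VF must be non-zero");
  std::vector<long> Lanes;
  Lanes.reserve(VF);
  for (unsigned Lane = 0; Lane < VF; ++Lane)
    Lanes.push_back(Base + static_cast<long>(Part * VF + Lane) * Step);
  return Lanes;
}
```

For example, with `Base = 10`, `Step = 2`, `VF = 4`, part 0 yields 10, 12, 14, 16 and part 1 yields 18, 20, 22, 24.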
serge-sans-paille
ed98c1b376 Cleanup includes: DebugInfo & CodeGen
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121332
2022-03-12 17:26:40 +01:00
Roman Lebedev
2f80ea7f4f
[NFC][LV] Use different braces in debug output
The analysis passes output function name encapsulated in `'` braces,
but LV uses `"`. Harmonizing this may help in creating an update script
for the LV costmodel test checks.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D121105
2022-03-07 19:32:37 +03:00
Florian Hahn
8777cb66a8
[VPlan] Remove reliance on underlying instr for ScalarIVSteps (NFCI).
Instead of relying on underlying instructions, this patch updates
VPScalarIVStepsRecipe to only store the required type information.

This removes access to unrelated information, as well as avoiding issues
with the same underlying instruction being shared by multiple recipes.

This change should only change the debug output and not cause any
codegen changes, hence NFCI.
2022-03-02 16:23:19 +00:00
Florian Hahn
9e46866c0c
[LV] Remove dead EntryVal argument from buildScalarSteps (NFC).
The EntryVal argument is not needed after recent refactoring. Remove it.
2022-03-02 14:59:22 +00:00
Florian Hahn
b3e8ace198
Recommit "[VPlan] Introduce recipe to build scalar steps."
This reverts the revert commit ff93260bf6bddfbad1fa65c4d5184988885b900f.

The underlying issue causing the PPC bot failures has been fixed in
cbaac1473403 and a corresponding test case has been added in
ad2cad1c521c.

Original message:

    This patch adds a new VPScalarIVStepsRecipe to handle building scalar
    steps.

    In the first patch, it only handles the case where there is no vector
    induction variable needed.

    Reviewed By: Ayal

    Differential Revision: https://reviews.llvm.org/D115953
2022-02-28 14:12:20 +00:00
Florian Hahn
ff93260bf6
Revert "[VPlan] Introduce recipe to build scalar steps."
This reverts commit 49b23f451cf713036c99573a35daed308d2ac894.

This appears to break some PPC build bots. Revert while I investigate.
2022-02-27 17:51:19 +00:00
Florian Hahn
49b23f451c
[VPlan] Introduce recipe to build scalar steps.
This patch adds a new VPScalarIVStepsRecipe to handle building scalar
steps.

In the first patch, it only handles the case where there is no vector
induction variable needed.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D115953
2022-02-27 17:32:41 +00:00
Florian Hahn
da740492b0
[VPlan] Remove dead header-phi recipes.
This patch adds a new transform to remove dead recipes. For now, it only
removes dead recipes in the header, to keep the number of tests that require
updating manageable. Future patches will extend this to remove dead
recipes across the whole plan.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D118051
2022-02-26 16:26:39 +00:00
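The general idea behind a dead-recipe removal transform such as the one in da740492b0 can be sketched on a toy representation. This is an assumption-laden illustration: `ToyRecipe` and `findDeadRecipes` are hypothetical, recipes are identified by index, and the sketch does not model the header-only restriction mentioned in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy recipe: hypothetical stand-in with a user count and side-effect flag.
struct ToyRecipe {
  unsigned NumUsers;
  bool HasSideEffects;
  std::vector<std::size_t> Operands; // indices of recipes this one uses
};

// Repeatedly mark recipes with no users and no side effects as dead,
// releasing their operands so that chains of dead recipes are found too.
// Takes the list by value because user counts are updated while scanning.
std::vector<bool> findDeadRecipes(std::vector<ToyRecipe> Recipes) {
  std::vector<bool> Dead(Recipes.size(), false);
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t I = 0; I < Recipes.size(); ++I) {
      if (Dead[I] || Recipes[I].NumUsers != 0 || Recipes[I].HasSideEffects)
        continue;
      Dead[I] = true;
      Changed = true;
      for (std::size_t Op : Recipes[I].Operands) {
        assert(Op < Recipes.size() && Recipes[Op].NumUsers > 0);
        --Recipes[Op].NumUsers;
      }
    }
  }
  return Dead;
}
```

A recipe that is only used by a dead recipe becomes dead on a later pass of the loop, which is why the scan repeats until nothing changes.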
Kerry McLaughlin
12fb133eba [LoopVectorize] Support conditional in-loop vector reductions
Extends getReductionOpChain to look through Phis which may be part of
the reduction chain. adjustRecipesForReductions will now also create a
CondOp for VPReductionRecipe if the block is predicated and not only if
foldTailByMasking is true.

Changes were required in tryToBlend to ensure that we don't attempt
to convert the reduction Phi into a select by returning a VPBlendRecipe.
The VPReductionRecipe will create a select between the Phi and the reduction.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D117580
2022-02-22 12:04:35 +00:00
Florian Hahn
2cd22ce0d0
[LV] Pass start value directly to emitTransformedIndex (NFC). 2022-02-12 19:03:32 +00:00
Philip Reames
5ba115031d [PSE] Remove assumption that top level predicate is union from public interface [NFC*]
Note that this doesn't actually cause the top level predicate to become a non-union just yet.

The * above comes from a case in the LoopVectorizer where a predicate that is later proven no longer blocks vectorization, due to a change from checking whether predicates exist to checking whether the predicate is possibly false.
2022-02-10 16:14:52 -08:00
Simon Pilgrim
6af7c1371a [LoopVectorize] getStepVector - reduce scope of local variable. NFC. 2022-02-10 20:44:25 +00:00
David Green
b55d4c2ad8 Revert "[LV] Remove LoopVectorizationCostModel::useEmulatedMaskMemRefHack()"
This reverts commit 77a0da926c9ea86afa9baf28158d79c7678fc6b9 as we've
received multiple reports of this significantly impacting performance,
in ways that don't seem to just be target specific cost models going
wrong. I would offer some reproducers, but the test changes here seem to
be full of them!

Reverting for now and hopefully we can remove the "hack" more carefully
as we go.
2022-02-09 20:02:54 +00:00
Florian Hahn
8aa122081f
[LV] Pass step to emitTransformedIndex (NFC).
Move out the induction step creation from emitTransformedIndex to the
callers. In some places (e.g. widenIntOrFpInduction) the step is already
created. Passing the step in ensures the steps are kept in sync.
2022-02-09 11:12:45 +00:00
Florian Hahn
c9e6678b56
[LV] Move buildScalarSteps out of ILV (NFC).
This makes the function independent of shared state in ILV (ensures no
new dependencies on things like the cost model are introduced) and allows
for use directly in recipe's ::execute functions.
2022-02-08 21:18:40 +00:00
David Green
b4c6d1bb37 [LoopVectorizer] Don't perform interleaving of predicated scalar loops
The vectorizer will choose at times to "vectorize" loops with a scalar
factor (VF=1) with interleaving (IC > 1). This can occasionally produce
better code than the unroller (notably for reductions where it can
produce independent reduction chains that are combined after the loop).
At times this is not very beneficial though, for example when runtime
checks are needed or when the scalar code requires predication.

This addresses the second point, preventing the vectorizer from
interleaving when the scalar loop will require predication. This
prevents it from making a bit of a mess, that is worse than the original
and better left for the unroller to unroll if beneficial. It helps
reverse some of the regressions from D118090.

Differential Revision: https://reviews.llvm.org/D118566
2022-02-07 19:34:28 +00:00
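The rule introduced by b4c6d1bb37 can be condensed into a small decision function. This is a minimal sketch, assuming the decision depends only on the vectorization factor and on whether the scalar loop needs predication. `chooseInterleaveCountSketch` and its parameters are hypothetical, and the real cost model weighs more inputs.

```cpp
#include <cassert>

// Hypothetical sketch of the decision: with a scalar VF (VF == 1),
// interleaving a loop that needs predication is left to the unroller.
unsigned chooseInterleaveCountSketch(unsigned VF, unsigned RequestedIC,
                                     bool NeedsPredication) {
  assert(VF >= 1 && RequestedIC >= 1);
  if (VF == 1 && NeedsPredication)
    return 1; // do not interleave predicated scalar loops
  return RequestedIC;
}
```

With `VF = 1` and `RequestedIC = 4`, the sketch returns 1 when predication is required and 4 otherwise. A vector VF is unaffected.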
Florian Hahn
5a72357697
[LV] Use IRBuilderBase in VPlan.h, remove IRBuilder.h include (NFC).
By using IRBuilderBase instead of IRBuilder<> a forward declaration can
be used instead of including IRBuilder.h
2022-02-07 17:46:16 +00:00
Roman Lebedev
77a0da926c
[LV] Remove LoopVectorizationCostModel::useEmulatedMaskMemRefHack()
D43208 extracted `useEmulatedMaskMemRefHack()` from legality into cost model.
What it essentially does is prevent scalarized vectorization of masked memory operations:
```
  // TODO: Cost model for emulated masked load/store is completely
  // broken. This hack guides the cost model to use an artificially
  // high enough value to practically disable vectorization with such
  // operations, except where previously deployed legality hack allowed
  // using very low cost values. This is to avoid regressions coming simply
  // from moving "masked load/store" check from legality to cost model.
  // Masked Load/Gather emulation was previously never allowed.
  // Limited number of Masked Store/Scatter emulation was allowed.
```

While I don't really understand what specifically `is completely broken`
was referring to, I believe that at least on X86 with AVX2-or-later,
this is no longer true (or at least, I would like to know what is still broken).
So I would like to follow suit after D111460, and likewise disable that hack for AVX2+.

But since this was added for X86 specifically, let's just instead completely remove this hack.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D114779
2022-02-07 16:08:31 +03:00
Florian Hahn
541ca12dcd
[LV] Use VPReplicateRecipe::isUniform instead of isUniformAfterVec (NFCI).
In scalarizeInstruction(), isUniformAfterVectorization is used to detect
cases where it is sufficient to always access the first lane. This
should map directly to checking whether the operand is a uniform replicate
recipe.

Differential Revision: https://reviews.llvm.org/D116654
2022-02-06 16:37:20 +00:00
Sander de Smalen
eaee477eda [LV] Use VScaleForTuning to allow wider epilogue VFs.
When the main loop is e.g. VF=vscale x 1 and the epilogue VF cannot
be any smaller, the vectorizer should try to estimate how many lanes are
executed at runtime and allow a suitable fixed-width VF to be chosen. It
can use VScaleForTuning to figure out what a suitable fixed-width VF could
be. For the case where the main loop VF is VF=vscale x 1, and VScaleForTuning=8,
it could still choose an epilogue VF up to VF=4.

This was a bit tricky to test, so this patch also introduces a wrapper
function to get 'VScaleForTuning' by also considering vscale_range.
If min and max are equal, then that will be the vscale we compile for.
It makes little sense to tune for a different width if the code
will not be portable for other widths.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D118709
2022-02-03 15:40:17 +00:00
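The estimate described in eaee477eda can be sketched as two small helpers. Both are hypothetical (`getVScaleForTuningSketch`, `maxFixedEpilogueVFSketch`), and two conventions are assumptions made for this sketch only: a `vscale_range` bound of 0 stands for "not specified", and the epilogue VF must be strictly smaller than the estimated number of lanes of the main loop, which is what the VF=4 example in the message suggests.

```cpp
#include <cassert>

// Hypothetical wrapper: if vscale_range pins vscale to one value
// (min == max), tune for that value; otherwise use the target's value.
// Convention for this sketch only: a bound of 0 means "not specified".
unsigned getVScaleForTuningSketch(unsigned RangeMin, unsigned RangeMax,
                                  unsigned TargetVScaleForTuning) {
  if (RangeMin != 0 && RangeMin == RangeMax)
    return RangeMin;
  return TargetVScaleForTuning;
}

// Largest power-of-two fixed-width epilogue VF that is strictly smaller
// than the estimated number of lanes the scalable main loop processes per
// iteration. Returns 1 when no smaller vector width exists.
unsigned maxFixedEpilogueVFSketch(unsigned MainKnownMinLanes,
                                  unsigned VScale) {
  assert(MainKnownMinLanes >= 1 && VScale >= 1);
  unsigned EstimatedLanes = MainKnownMinLanes * VScale;
  unsigned VF = 1;
  while (VF * 2 < EstimatedLanes)
    VF *= 2;
  return VF;
}
```

For a main loop VF of vscale x 1 and a tuning value of 8, the estimated lane count is 8 and the sketch returns 4, matching the example in the commit message.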
Sander de Smalen
2a44eaf20f [LV] Allow a scalable VF for the epilogue.
For some reason we limited the epilogue VF to be fixed-width, but there
is not necessarily a reason for doing so. If the main VF=vscale x 16, the
epilogue VF could be either fixed-width, or a scalable VF up to vscale x 8.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D118688
2022-02-01 22:38:55 +00:00
Florian Hahn
7fe4fa9a0a
[LV] Use onlyFirstLaneDemanded when widening pointer phis (NFCI).
This removes another instance of recipe execution still relying on
the cost model.

Depends on D116554.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D116656
2022-02-01 09:50:47 +00:00
Florian Hahn
8f12175fed
[VPlan] Use VPlan to check if only the first lane is used.
This removes the remaining dependence on LoopVectorizationCostModel from
buildScalarSteps and is required so it can be moved out of ILV.

It also allows us to remove a few unneeded instructions.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D116554
2022-01-30 13:07:29 +00:00
Florian Hahn
efd4938723
[VPlan] Handle IV vector splat using VPWidenCanonicalIV.
This patch tries to use an existing VPWidenCanonicalIVRecipe
instead of creating another step-vector for canonical
induction recipes in widenIntOrFpInduction.

This has the following benefits:

 1. First step to avoid setting both vector and scalar values for the
    same induction def.
 2. Reducing complexity of widenIntOrFpInduction through making things
    more explicit in VPlan
 3. Only need to splat the vector IV for block in masks.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D116123
2022-01-29 16:25:27 +00:00
Florian Hahn
96400f179f
[VPlan] Record whether scalar IVs are needed in induction recipe. (NFC)
This explicitly records whether a scalar IV is needed in the
VPWidenIntOrFpInductionRecipe, to remove a dependence on the cost-model
during its ::execute.

It will also be used in D116123 to determine if a vector phi will be
generated.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D118167
2022-01-28 09:34:03 +00:00
Kerry McLaughlin
8082ab2fc3 [LoopVectorize] Support epilogue vectorisation of loops with reductions
isCandidateForEpilogueVectorization will currently return false for loops
which contain reductions. This patch removes this restriction and makes
the following changes to support epilogue vectorisation with reductions:

- `fixReduction`: If fixReduction is being called during vectorisation of the
  epilogue, the phi node it creates will need to additionally carry incoming
  values from the middle block of the main loop.

- `createEpilogueVectorizedLoopSkeleton`: The incoming values of the phi
  created by fixReduction are updated after the vec.epilog.iter.check block
  is added. The phi is also moved to the preheader of the epilogue.

- `processLoop`: The start values of any VPReductionPHIRecipes are updated before
  vectorising the epilogue loop. The getResumeInstr function added to the ILV
  will return the resume instruction associated with the recurrence descriptor.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D116928
2022-01-24 12:03:31 +00:00
Florian Hahn
5f2854f1da
[LV] Always create VPWidenCanonicalIVRecipe, optimize away later.
This patch updates createBlockInMask to always generate
VPWidenCanonicalIVRecipe and adds a transform to optimize it away later,
if it is not needed.

This is a step towards breaking up VPWidenIntOrFpInductionRecipe and
explicitly distinguishing between vector phis and scalarizing.

Split off from D116123.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D117140
2022-01-22 15:34:20 +00:00
Florian Hahn
c0cf209076
[VPlan] Add VPWidenIntOrFpInductionRecipe::isCanonical, use it (NFCI).
This patch adds VPWidenIntOrFpInductionRecipe::isCanonical to check if
an induction recipe is canonical. The code is also updated to use it
instead of isCanonicalID.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D117551
2022-01-21 09:35:06 +00:00
Florian Hahn
070d1034da
[LV] Restore metadata to disable runtime unrolling for epilogue loop.
After d4a8fc3a87a1 LV stopped adding metadata to disable runtime
unrolling to the vectorized epilogue loop. This was missed because
278aa65cc495 removed the relevant test coverage.

This patch fixes that by adding the relevant metadata after
vector loop generation.
2022-01-16 13:14:16 +00:00
Florian Hahn
62739204d4
[LV] Move AddRuntimeUnrollDisableMetaData so it can be used earlier (NFC)
Move up the definition of AddRuntimeUnrollDisableMetaData, so it can be
re-used earlier in the file in a follow-up patch.
2022-01-16 10:30:24 +00:00