1882 Commits

Author SHA1 Message Date
Florian Hahn
bf279a0f8e
[VPlan] Remove dangling comment and newlines (NFC).
Apply missed cleanups.
2023-05-11 22:06:56 +01:00
Florian Hahn
3d4eed0133
[LV] Reuse SCEV expansion results for epilogue vectorization.
When generating code for the epilogue vector loop, we need to re-use the
expansion results for induction steps generated for the main vector
loop, as the pre-header of the epilogue vector loop may not dominate the
vector preheader of the epilogue.

This fixes a reported crash. Note that this is a workaround which should
be removed soon once induction resume value creation is handled in VPlan
directly.
2023-05-11 22:00:07 +01:00
Philip Reames
7fbfcc653f [LV/LAA] Use PSE to identify stride multiplies which simplify [mostly nfc]
LV/LAA will speculate that (some) strided access patterns have unit stride, and insert runtime checks if required.

LV cost models a multiply by such a stride as free.  We did this by keeping around the StrideSet structure, just to check if one of the operands was one of the strides we speculated.

We can instead just ask PredicatedScalarEvolution if either of the operands is one (after predicates are applied).  We get mostly the same result - PSE can prove it in more cases in theory - and simpler code.
2023-05-11 11:16:04 -07:00
Florian Hahn
236a0e82df
[LV] Use VPValue to get expanded value for SCEV step expressions.
Update skeleton creation logic to use SCEV expansion results from
expanding the pre-header. This avoids another set of SCEV expansions
that may happen after the CFG has been modified.

Fixes #58811.

Depends on D147964.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147965
2023-05-11 16:49:19 +01:00
Florian Hahn
127b00b25c
[VPlan] Record IR flags on VPWidenRecipe directly (NFC).
This patch introduces a VPRecipeWithIRFlags class to record various IR
flags for a recipe. This allows de-coupling of IR flags from the
underlying instructions. The main benefit is that it allows dropping of
IR flags from recipes directly, without the need to go through
State::MayGeneratePoisonRecipes. The plan is to remove
MayGeneratePoisonRecipes once all relevant recipes are transitioned.

It also allows dropping IR flags during VPlan-to-VPlan transforms, which
will be used in a follow-up patch to implement truncateToMinimalBitwidths
as VPlan-to-VPlan transform.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D149079
2023-05-08 17:28:50 +01:00
Florian Hahn
823d35fd3b
[VPlan] Use RecipeBuilder to look up member when fixing IG (NFC).
Recipes for interleave group members are recorded directly in the
RecipeBuilder. Use it directly instead of going indirectly through
VPlan's Value->VPValue mapping.
2023-05-07 18:02:27 +01:00
Florian Hahn
e3afe0b89d
[VPlan] Add VPWidenCastRecipe, split off from VPWidenRecipe (NFCI).
To generate cast instructions, the result type is needed. To allow
creating widened casts without underlying instruction, introduce a new
VPWidenCastRecipe that also holds the result type.

This functionality will be used in a follow-up patch to
implement truncateToMinimalBitwidths as VPlan-to-VPlan transform.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D149081
2023-05-05 13:20:16 +01:00
Florian Hahn
b85a402dd8
[VPlan] Introduce new entry block to VPlan for early SCEV expansion.
This patch adds a new preheader block to the VPlan to place SCEV
expansions like the trip count. This preheader block is disconnected
at the moment, as the bypass blocks of the skeleton are not yet modeled
in VPlan.

The preheader block is executed before skeleton creation, so the SCEV
expansion results can be used during skeleton creation. At the moment,
the trip count expression and induction steps are expanded in the new
preheader. The remainder of SCEV expansions will be moved gradually in
the future.

D147965 will update skeleton creation to use the steps expanded in the
pre-header to fix #58811.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147964
2023-05-04 14:00:13 +01:00
Florian Hahn
79692750d2
[LV] Use VPValue for SCEV expansion in fixupIVUsers.
The step is already expanded in the VPlan. Use this expansion instead.
This is a step towards modeling fixing up IV users in VPlan.

It also fixes a crash caused by SCEV-expanding the Step expression in
fixupIVUsers, where the IR is in an incomplete state.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147963
2023-05-04 09:25:59 +01:00
Nikita Popov
5362a0d859 [LCSSA] Remove unused ScalarEvolution argument (NFC)
After D149435, LCSSA formation no longer needs access to
ScalarEvolution, so remove the argument from the utilities.
2023-05-02 12:17:05 +02:00
Florian Hahn
6303fa369c
[VPlan] Remove DeadInsts arg from VPInstructionsToVPRecipes (NFC)
The argument isn't used. VPlan-based dead recipe removal can be used
instead.
2023-05-01 15:03:29 +01:00
Florian Hahn
0b24436591
[LV] Clarify comment for selectVectorizationFactor (NFC).
The comment is stale, as UserVF is handled before selectVectorizationFactor
is called. Clarify the comment by removing the mention of UserVF.

Suggested as independent improvement in D143938.
2023-04-30 21:12:15 +01:00
Florian Hahn
a431402fd2
[LV] Remove loop arg from CM::isCandidateForEpilogueVectorization (NFC)
LVP operates on the loop it stores in TheLoop. Use it instead of the
argument, to be in line with other member functions.

Suggested as independent improvement in D143938.
2023-04-30 21:11:12 +01:00
Florian Hahn
6fa07a87ab
[LV] Document selectEpilogueVectorizationFactor (NFC).
Add missing documentation for selectEpilogueVectorizationFactor.

Suggested as independent improvement in D143938.
2023-04-30 21:09:24 +01:00
Florian Hahn
8d3ff24e11
[LV] Sink collect* calls to LVP::plan() (NFC).
Move calls of collect* helpers closer to where the cost-model is used.
Should help simplify D142669 & D142670.

Differential Revision: https://reviews.llvm.org/D142674
2023-04-30 11:41:22 +01:00
Florian Hahn
4583d7ef7c
[LV] Rename Preheader -> VecPreheader (NFC).
Clarify variable name as suggested in D147964 to reduce diff.
2023-04-28 22:15:47 +01:00
Nikita Popov
1745341296 [LoopVectorize] Preserve SCEV
As far as I can tell, LoopVectorize preserves SCEV, mainly by dint
of forgetting the loop being vectorized. We should mark it as
preserved in the pass manager.

This is a very small compile-time improvement.

Differential Revision: https://reviews.llvm.org/D149147
2023-04-26 09:43:54 +02:00
Philip Reames
09d879d060 [SCEV] Common code for computing trip count in a fixed type [NFC-ish]
This is a follow on to D147117 and D147355. In both cases, we were adding special cases to compute zext(BTC+1) instead of zext(BTC)+1 when the BTC+1 computation was known not to overflow.

Differential Revision: https://reviews.llvm.org/D148661
2023-04-25 12:04:42 -07:00
David Green
1869a9c225 [LV] Use the known trip count when costing non-tail folded VFs
Now that we store the ScalarCost in the VectorizationFactor it is possible to
use it to get a slightly more accurate cost in isMoreProfitable between two
vector factors. This extends the logic added in D101726 to non-tail-folded
cases, using the costs of `VecCost * (TripCount / VF) + ScalarCost * (TripCount % VF)`
to compare VFs where the TripCount is known and we are not folding the tail.

This shouldn't alter very much as small trip counts are usually not vectorized,
but does seem to help in the testcase where 4 * VF4 is chosen as profitable
compared to 2 * VF8 + 4 * scalar.
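The cost expression quoted above can be sketched in isolation. This is a minimal illustration, not LLVM's isMoreProfitable: the function name `estimateLoopCost` and the cost numbers in the usage note are assumptions chosen only to show how the formula behaves.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): estimated cost of a loop with a
// known trip count at vectorization factor VF when the tail is not folded.
// VecCost is the cost of one vector iteration; ScalarCost is the cost of one
// scalar iteration in the remainder.
std::uint64_t estimateLoopCost(std::uint64_t VecCost, std::uint64_t ScalarCost,
                               std::uint64_t TripCount, std::uint64_t VF) {
  assert(VF > 0 && "vectorization factor must be non-zero");
  return VecCost * (TripCount / VF) + ScalarCost * (TripCount % VF);
}
```

With made-up costs and a trip count of 20, a VF of 4 at 6 per vector iteration gives 5 * 6 = 30, while a VF of 8 at 10 per vector iteration plus 4 scalar iterations at 4 each gives 2 * 10 + 4 * 4 = 36, so the smaller VF compares as cheaper.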

Differential Revision: https://reviews.llvm.org/D147720
2023-04-24 22:02:30 +01:00
Florian Hahn
3157f03a34
[VPlan] Add VPValue::isLiveIn() (NFC).
This helps to clarify checks in multiple places.

Suggested as cleanup in D147892.
2023-04-24 17:51:12 +01:00
Florian Hahn
6f999769b9
[VPlan] Remove unnecessary includes from VPlan.h (NFC).
Clean up some unnecessary includes from VPlan.h, which is imported in
multiple files.
2023-04-24 16:10:46 +01:00
Florian Hahn
6b8d19d2b5
Recommit "[VPlan] Switch to checking sinking legality for recurrences in VPlan."
This reverts the revert commit 3d8ed8b5192a59104bfbd5bf7ac84d035ee0a4a5.

The new version of the patch adds a set to avoid duplicating work in
isFixedOrderRecurrence, which was previously done through the removed
SinkAfter map.

Original commit message:
    Building on D142885 and D142589, retire the SinkAfter map from the
    recurrence handling code. It is replaced by checking whether it is
    possible to sink all users of a recurrence directly in VPlan. This
    results in simpler code overall and allows handling additional cases
    (see the improvements in @test_crash).

    Depends on D142885.
    Depends on D142589.

    Reviewed By: Ayal

    Differential Revision: https://reviews.llvm.org/D142886
2023-04-20 09:31:16 +01:00
Florian Hahn
ff0ec4f42e
Recommit "[VPlan] Unify Value2VPValue and VPExternalDefs maps (NFCI)."
This reverts the revert commit 8c2276f89887d0a27298a1bbbd2181fa54bbb509.

The updated patch re-orders the getDefiningRecipe check in getVPValue to avoid
a use-after-free.

Original commit message:

    Before this patch, a VPlan contained 2 mappings for Values -> VPValue:
    1) Value2VPValue and 2) VPExternalDefs.

    This duplication is unnecessary and there are already cases where
    external defs are added to Value2VPValue. This patch replaces all uses
    of VPExternalDefs with Value2VPValue.

    It clarifies the naming of getOrAddVPValue (to getOrAddExternalVPValue)
    and addVPValue (to addExternalVPValue).

    At the moment, this is NFC, but will enable additional simplifications
    in D147783.

    Depends on D147891.

    Reviewed By: Ayal

    Differential Revision: https://reviews.llvm.org/D147892
2023-04-18 10:29:31 +01:00
Vitaly Buka
8c2276f898 Revert "[VPlan] Unify Value2VPValue and VPExternalDefs maps (NFCI)."
Asan detects heap-use-after-free, see D147892.

This reverts commit 4fc190351e5af901b6107d162d07e1fbca90934f.
This reverts commit 668045eb77628be13e448ffbb855473ffca1cc43.
2023-04-17 17:24:10 -07:00
Manoj Gupta
3d8ed8b519 Revert "[VPlan] Switch to checking sinking legality for recurrences in VPlan."
This reverts commit 7fc0b3049df532fce726d1ff6869a9f6e3183780.

Causes a clang hang when building xz utils, github issue #62187.
2023-04-17 12:19:36 -07:00
Shraiysh Vaishay
7021182d6b [nfc][llvm] Replace pointer cast functions in PointerUnion by llvm casting functions.
This patch replaces the uses of PointerUnion.is function by llvm::isa,
PointerUnion.get function by llvm::cast, and PointerUnion.dyn_cast by
llvm::dyn_cast_if_present. This is according to the FIXME in
the definition of the class PointerUnion.

This patch does not remove them as they are being used in other
subprojects.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D148449
2023-04-17 13:40:51 -05:00
Florian Hahn
4fc190351e
[VPlan] Remove unneeded NeedsVectorIV from VPWidenIntOrFpInduction.
After recent improvements, all instances of
VPWidenIntOrFpInductionRecipe should need a vector IV and there's no
need for a separate field.
2023-04-17 13:38:00 +01:00
Bjorn Pettersson
a20f7efbc5 Remove several no longer needed includes. NFCI
Mostly removing includes of InitializePasses.h and Pass.h in
passes that no longer have support for the legacy PM.
2023-04-17 13:54:19 +02:00
David Sherwood
69ee653313 [LoopVectorize] Take vscale into account when deciding to create epilogues
In LoopVectorizationCostModel::isEpilogueVectorizationProfitable we
check to see if the chosen main vector loop VF >= 16. If so, we
decide to create a vector epilogue loop. However, this doesn't
take VScaleForTuning into account because we could be targeting a
CPU where vscale > 1, and hence the runtime VF would be a multiple
of the known minimum value.

This patch multiplies scalable VFs by VScaleForTuning and several
tests have been updated that now produce vector epilogues.
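The threshold check described above might be sketched as follows. The `VF` struct and both function names are hypothetical stand-ins rather than LLVM's ElementCount or cost-model API; only the threshold of 16 and the multiplication by VScaleForTuning come from the message.

```cpp
#include <cassert>

// Hypothetical stand-in for a vectorization factor: a known minimum element
// count plus a flag saying whether it is scaled by vscale at runtime.
struct VF {
  unsigned KnownMin;
  bool Scalable;
};

// Estimated runtime element count: scalable VFs are multiplied by the
// tuning value of vscale, fixed VFs are used as-is.
unsigned estimatedRuntimeVF(VF F, unsigned VScaleForTuning) {
  assert(VScaleForTuning >= 1 && "vscale tuning value must be at least 1");
  return F.Scalable ? F.KnownMin * VScaleForTuning : F.KnownMin;
}

// Returns true if the main loop VF is wide enough to consider an epilogue.
bool meetsEpilogueThreshold(VF F, unsigned VScaleForTuning,
                            unsigned Threshold = 16) {
  return estimatedRuntimeVF(F, VScaleForTuning) >= Threshold;
}
```

Under these assumptions a scalable VF with a known minimum of 8 passes the check when the tuning value is 2, but not when it is 1.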

Differential Revision: https://reviews.llvm.org/D147522
2023-04-17 10:49:40 +00:00
Florian Hahn
83ab5708d1
[LV] Don't sink scalar instructions that may read from memory.
The current sinking code doesn't prevent us from sinking a load past an
aliasing store. Skip sinking instructions that may read from memory to
avoid a mis-compile.

See @minimal_bit_widths_with_aliasing_store for an example where 2 loads
are sunk past aliasing stores before this fix.
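Why sinking a load past an aliasing store is a mis-compile can be shown with a small source-level analogy. The two functions below are illustrative only and are not taken from the vectorizer or the referenced test; they differ solely in the order of the load and the store.

```cpp
#include <cassert>

// Load first, then store: returns the value Src held before the store.
int loadBeforeStore(int *Src, int *Dst) {
  assert(Src && Dst);
  int V = *Src;
  *Dst = 42;
  return V;
}

// Same operations with the load sunk below the store. If Src and Dst alias,
// the load now observes the stored value instead of the original one.
int loadAfterStore(int *Src, int *Dst) {
  assert(Src && Dst);
  *Dst = 42;
  int V = *Src;
  return V;
}
```

When both pointers refer to the same object the two orderings return different values, so the reordering is only safe if the store is known not to alias the load.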

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147259
2023-04-17 09:30:25 +01:00
Kazu Hirata
c83c4b58d1 [Transforms] Apply fixes from performance-for-range-copy (NFC)
2023-04-16 08:25:28 -07:00
Florian Hahn
668045eb77
[VPlan] Unify Value2VPValue and VPExternalDefs maps (NFCI).
Before this patch, a VPlan contained 2 mappings for Values -> VPValue:
1) Value2VPValue and 2) VPExternalDefs.

This duplication is unnecessary and there are already cases where
external defs are added to Value2VPValue. This patch replaces all uses
of VPExternalDefs with Value2VPValue.

It clarifies the naming of getOrAddVPValue (to getOrAddExternalVPValue)
and addVPValue (to addExternalVPValue).

At the moment, this is NFC, but will enable additional simplifications
in D147783.

Depends on D147891.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147892
2023-04-16 15:38:31 +01:00
Florian Hahn
7fc0b3049d
[VPlan] Switch to checking sinking legality for recurrences in VPlan.
Building on D142885 and D142589, retire the SinkAfter map from the
recurrence handling code. It is replaced by checking whether it is
possible to sink all users of a recurrence directly in VPlan. This
results in simpler code overall and allows handling additional cases
(see the improvements in @test_crash).

Depends on D142885.
Depends on D142589.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142886
2023-04-13 22:00:52 +01:00
Craig Topper
4b47d875a1 [LV] Optimize trip count SCEV.
To calculate the trip count we need to add 1 to the backedge
taken count. If we need to widen the backedge count, it's better
to do the add before the widening if we can guarantee it won't
overflow.
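The difference between the two orderings can be illustrated with fixed-width integers. This is a sketch that uses an 8-bit backedge-taken count widened to 32 bits as a stand-in for the SCEV types involved; all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// zext(BTC) + 1: widen first, then add. Correct for every input.
std::uint32_t widenThenAdd(std::uint8_t BTC) {
  return static_cast<std::uint32_t>(BTC) + 1u;
}

// zext(BTC + 1): add in the narrow type, then widen. Wraps to 0 when
// BTC is the maximum 8-bit value.
std::uint32_t addThenWiden(std::uint8_t BTC) {
  std::uint8_t Narrow = static_cast<std::uint8_t>(BTC + 1);
  return static_cast<std::uint32_t>(Narrow);
}

// Use the add-before-widen form only when the narrow add cannot overflow.
std::uint32_t tripCount(std::uint8_t BTC) {
  if (BTC != UINT8_MAX) {
    assert(addThenWiden(BTC) == widenThenAdd(BTC));
    return addThenWiden(BTC);
  }
  return widenThenAdd(BTC);
}
```

The two forms agree whenever the narrow add does not wrap, which is why the no-overflow guarantee is needed before preferring the add-then-widen form.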

The code here is based on similar code I found in
LoopIdiomRecognize.

This is the vectorizer version of this InstCombine patch D142783.
Looking at the IR diffs, this does look like it gets more cases
than the InstCombine patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D147355
2023-04-12 16:17:58 -07:00
Florian Hahn
68afaa3f48
[LV] Use std::make_optional to fix build failure after 082a0046.
Some compilers require std::make_optional(std::move()) to force construction
of the std::optional return value. This should fix the build failure in
  https://lab.llvm.org/buildbot#builders/67/builds/10991
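The pattern named in this message can be sketched as follows. `Plan`, `tryBuildPlan`, and `planVF` are hypothetical and are not the vectorizer's types or functions; the point is only the explicit std::make_optional(std::move(...)) at the return, which does not rely on the compiler converting a local std::unique_ptr implicitly.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <utility>

// Hypothetical plan type used only for illustration.
struct Plan {
  int VF;
};

// Returns std::nullopt when no plan can be built, otherwise an owning
// pointer wrapped explicitly in an optional.
std::optional<std::unique_ptr<Plan>> tryBuildPlan(int VF) {
  if (VF <= 0)
    return std::nullopt;
  auto P = std::make_unique<Plan>(Plan{VF});
  return std::make_optional(std::move(P));
}

// Reads the VF of a successfully built plan.
int planVF(const std::optional<std::unique_ptr<Plan>> &MaybePlan) {
  assert(MaybePlan.has_value() && "plan construction failed");
  return (*MaybePlan)->VF;
}
```

This sketch assumes a C++17 compiler, since it uses std::optional.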
2023-04-11 17:56:15 +01:00
Florian Hahn
082a004690
[VPlan] Allow building a VPlan to fail.
Update the planning code constructing VPlan to allow building VPlans to
fail. This allows us to gradually shift some legality checks to VPlan
construction. The first candidate is checking if all users of
first-order recurrence phis can be sunk past the recipe computing the
previous value.

The new functionality will be used by D142886 which is approved and will
be landed shortly.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142885
2023-04-11 15:41:18 +01:00
Florian Hahn
f9d0b35d22
[LV] Re-use already computed runtime VF in fixFixedOrderRecurrence.
This was suggested as independent cleanup in D147472.

This removes a redundant runtime VF computation when using scalable
vectors.
2023-04-10 21:25:12 +01:00
Florian Hahn
954befe2a7
[LV] Turn check into assert in fixFixedOrderRecurrence (NFCI).
Suggested as independent cleanup in D147567. Either VF or UF needs to be
> 1. Note that if the condition were false, the code below would use
a nullptr and crash.
2023-04-10 21:11:41 +01:00
Florian Hahn
35af27c30a
[VPlan] Only create extracts for recurrence exits if there are live-outs.
Move the code to collect live-out earlier and only generate extracts for
exit values if there are any live-outs that use them.

Depends on D147472.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147567
2023-04-10 21:08:34 +01:00
Florian Hahn
c255eb2c4b
[VPlan] Use VPLiveOut to update FOR live-out users.
Instead of iterating over all LCSSA phis in the exit block, collect all
LiveOut users of the FOR splice VPInstruction and only update those
users.

Building on top of D147471, this removes an access to the cost model
after VPlan execution.

Depends on D147471.

Reviewed By: Ayal, michaelmaitland

Differential Revision: https://reviews.llvm.org/D147472
2023-04-10 13:02:44 +01:00
Florian Hahn
620e011a25
[VPlan] Don't add live-outs if scalar epilogue is required.
Instead of clearing live outs when a scalar epilogue is required late,
don't add live outs during VPlan construction if a scalar epilogue is
required.

This enables more VPlan-based DCE (if the live out would be the only
user in the plan) and is a step towards removing an access of the cost
model in fixVectorizedLoop (which is after VPlan execution).

Depends on D147468.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147471
2023-04-09 09:18:24 +01:00
Florian Hahn
c7a34d355a
[VPlan] Require VFRange.End to be a power-of-2. (NFCI)
This removes the need to convert the end of the range to the next
power-of-2 for the end iterator after 4bd3fda5124962 and was suggested
as follow-up TODO in D147468.
2023-04-08 13:04:08 +01:00
Florian Hahn
4bd3fda512
[VPlan] Add VFRange::begin() and end() iterators. (NFCI)
Add an iterator to iterate over all VFs in VFRange. This simplifies some
existing code and allows using all_of, any_of and none_of on a VFRange.
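How begin() and end() iterators make a range of factors usable with standard algorithms can be sketched with a simplified stand-in. `PowerOfTwoRange` below is hypothetical and is not LLVM's VFRange; it assumes a half-open range [Start, End) in which both bounds are powers of two, so that repeated doubling reaches End exactly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

static bool isPowerOf2(unsigned X) { return X != 0 && (X & (X - 1)) == 0; }

// Hypothetical half-open range [Start, End) of power-of-two factors.
struct PowerOfTwoRange {
  unsigned Start;
  unsigned End;

  class iterator {
    unsigned VF;

  public:
    using iterator_category = std::input_iterator_tag;
    using value_type = unsigned;
    using difference_type = std::ptrdiff_t;
    using pointer = const unsigned *;
    using reference = unsigned;

    explicit iterator(unsigned Value) : VF(Value) {}
    unsigned operator*() const { return VF; }
    // Advance to the next power of two.
    iterator &operator++() {
      VF *= 2;
      return *this;
    }
    iterator operator++(int) {
      iterator Tmp = *this;
      ++*this;
      return Tmp;
    }
    bool operator==(const iterator &O) const { return VF == O.VF; }
    bool operator!=(const iterator &O) const { return VF != O.VF; }
  };

  iterator begin() const {
    // Doubling from Start only reaches End if both are powers of two.
    assert(isPowerOf2(Start) && isPowerOf2(End) && Start <= End);
    return iterator(Start);
  }
  iterator end() const { return iterator(End); }
};

// True if every factor in the range is at least Min.
bool allAtLeast(const PowerOfTwoRange &R, unsigned Min) {
  return std::all_of(R.begin(), R.end(),
                     [Min](unsigned VF) { return VF >= Min; });
}

// True if some factor in the range equals Target.
bool anyEquals(const PowerOfTwoRange &R, unsigned Target) {
  return std::any_of(R.begin(), R.end(),
                     [Target](unsigned VF) { return VF == Target; });
}
```

For the range [2, 16) the iteration visits 2, 4 and 8; the end bound 16 itself is excluded.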

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147468
2023-04-08 10:22:25 +01:00
Florian Hahn
11896357d4
[VPlan] Add VPInterleaveRecipe::NeedsMaskForGaps field (NFCI).
This patch adds a NeedsMaskForGaps field to VPInterleaveRecipe to record
whether a mask for gaps is needed. This removes a dependence on the cost
model in VPlan code-generation.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147467
2023-04-07 13:11:03 +01:00
Michael Maitland
e86ed9bf2a [LV][NFC] Improve complexity of fixing users of recurrences
The original loop has O(MxN) complexity since `is_contained` iterates over
all incoming values. This change makes it so only the phis
which use the value as an incoming value are iterated over so
it is now O(M).

Differential Revision: https://reviews.llvm.org/D146999
2023-04-06 16:15:51 -07:00
Florian Hahn
3f36b9b456
[LV] Move conditional MaskForGaps construction to load case.
Conditionally setting MaskForGaps is only needed for loads. This avoids
re-computing MaskForGaps for stores.

Suggested as independent cleanup in D147467.
2023-04-06 21:16:37 +01:00
David Sherwood
9278dd7b2b [LoopVectorize] Fix zext/sext cost calculations when types are shrunk
In getInstructionCost if we know a zext/sext is going to be shrunk
we should only be changing the destination type, and leave the
source type unchanged. For example, we may change a zext from

  zext <16 x i8> %a to <16 x i32>

to

  zext <16 x i8> %a to <16 x i16>

However, we were previously calculating the cost for doing

  zext <16 x i16> %a to <16 x i16>

which is incorrect.

Differential Revision: https://reviews.llvm.org/D147152
2023-04-06 08:52:25 +00:00
David Green
28c8616a5b [LV] Cleanup and reformatting for some debug messages. NFC
This is just some cleanup of various debug messages, pulled out of another
patch to simplify it a little.
2023-04-05 17:50:01 +01:00
Philip Reames
c416f6700f [IVDescriptors] Add pointer InductionDescriptors with non-constant strides (try 2)
(JFYI - This has been heavily reframed since original attempt at landing.)

This change updates the InductionDescriptor logic to allow matching a pointer IV with a non-constant stride, but also updates the LoopVectorizer to bail out on such descriptors by default. This preserves the default vectorizer behavior.

In review, it was pointed out that there are multiple unfortunate performance implications which need to be addressed before this can be enabled. Having a flag allows us to exercise the behavior, and write test cases for logic which is otherwise unreachable (or hard to reach).

This will also enable non-constant stride pointer recurrences for other consumers. I've audited said code, and don't see any obvious issues.

Differential Revision: https://reviews.llvm.org/D147336
2023-04-05 09:32:35 -07:00
Graham Hunter
185863f7de [LV] Use available masked vector function variants when required
LLVM has the ability to vectorize using function variants that require
a mask by creating an all-true mask, and to vectorize a conditional
call via scalarization. Now we want to join the two parts together
and use a masked variant when a mask is required.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D136251
2023-04-05 11:18:38 +01:00