1874 Commits

Author SHA1 Message Date
Florian Hahn
79692750d2
[LV] Use VPValue for SCEV expansion in fixupIVUsers.
The step is already expanded in the VPlan. Use this expansion instead.
This is a step towards modeling fixing up IV users in VPlan.

It also fixes a crash caused by SCEV-expanding the Step expression in
fixupIVUsers, where the IR is in an incomplete state.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147963
2023-05-04 09:25:59 +01:00
Nikita Popov
5362a0d859 [LCSSA] Remove unused ScalarEvolution argument (NFC)
After D149435, LCSSA formation no longer needs access to
ScalarEvolution, so remove the argument from the utilities.
2023-05-02 12:17:05 +02:00
Florian Hahn
6303fa369c
[VPlan] Remove DeadInsts arg from VPInstructionsToVPRecipes (NFC)
The argument isn't used. VPlan-based dead recipe removal can be used
instead.
2023-05-01 15:03:29 +01:00
Florian Hahn
0b24436591
[LV] Clarify comment for selectVectorizationFactor (NFC).
The comment is stale, as UserVF is handled before selectVectorizationFactor
is called. Clarify the comment by removing the mention of UserVF.

Suggested as independent improvement in D143938.
2023-04-30 21:12:15 +01:00
Florian Hahn
a431402fd2
[LV] Remove loop arg from CM::isCandidateForEpilogueVectorization (NFC)
LVP operates on the loop it stores in TheLoop. Use it instead of the
argument, to be in line with other member functions.

Suggested as independent improvement in D143938.
2023-04-30 21:11:12 +01:00
Florian Hahn
6fa07a87ab
[LV] Document selectEpilogueVectorizationFactor (NFC).
Add missing documentation for selectEpilogueVectorizationFactor.

Suggested as independent improvement in D143938.
2023-04-30 21:09:24 +01:00
Florian Hahn
8d3ff24e11
[LV] Sink collect* calls to LVP::plan() (NFC).
Move calls of collect* helpers closer to where the cost-model is used.
Should help simplify D142669 & D142670.

Differential Revision: https://reviews.llvm.org/D142674
2023-04-30 11:41:22 +01:00
Florian Hahn
4583d7ef7c
[LV] Rename Preheader -> VecPreheader (NFC).
Clarify variable name as suggested in D147964 to reduce diff.
2023-04-28 22:15:47 +01:00
Nikita Popov
1745341296 [LoopVectorize] Preserve SCEV
As far as I can tell, LoopVectorize preserves SCEV, mainly by dint
of forgetting the loop being vectorized. We should mark it as
preserved in the pass manager.

This is a very small compile-time improvement.

Differential Revision: https://reviews.llvm.org/D149147
2023-04-26 09:43:54 +02:00
Philip Reames
09d879d060 [SCEV] Common code for computing trip count in a fixed type [NFC-ish]
This is a follow on to D147117 and D147355. In both cases, we were adding special cases to compute zext(BTC+1) instead of zext(BTC)+1 when the BTC+1 computation was known not to overflow.

Differential Revision: https://reviews.llvm.org/D148661
2023-04-25 12:04:42 -07:00
David Green
1869a9c225 [LV] Use the known trip count when costing non-tail folded VFs
Now that we store the ScalarCost in the VectorizationFactor it is possible to
use it to get a slightly more accurate cost in isMoreProfitable between two
vector factors. This extends the logic added in D101726 to non-tail-folded
cases, using the costs of `VecCost * (TripCount / VF) + ScalarCost * (TripCount % VF)`
to compare VFs where the TripCount is known and we are not folding the tail.

This shouldn't alter very much as small trip counts are usually not vectorized,
but does seem to help in the testcase where 4 * VF4 is chosen as profitable
compared to 2 * VF8 + 4 * scalar.

Differential Revision: https://reviews.llvm.org/D147720
2023-04-24 22:02:30 +01:00
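As a rough illustration of the formula in 1869a9c225, the following C++ sketch computes the total cost for a known trip count when the tail is not folded. The function name and all cost numbers are hypothetical and are not taken from LoopVectorize; only the formula itself comes from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not the LLVM implementation: total cost of running
// TripCount iterations with a vector body of width VF plus a scalar
// remainder, following the formula quoted in the commit message.
uint64_t costWithKnownTripCount(uint64_t VecCost, uint64_t ScalarCost,
                                uint64_t TripCount, uint64_t VF) {
  assert(VF > 0 && "VF must be non-zero");
  return VecCost * (TripCount / VF) + ScalarCost * (TripCount % VF);
}
```

With made-up costs of 6 per VF4 iteration, 10 per VF8 iteration, and 4 per scalar iteration, a trip count of 20 costs 6 * 5 = 30 at VF4 but 10 * 2 + 4 * 4 = 36 at VF8, even though VF8 is cheaper per lane (1.25 versus 1.5).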
Florian Hahn
3157f03a34
[VPlan] Add VPValue::isLiveIn() (NFC).
This helps to clarify checks in multiple places.

Suggested as cleanup in D147892.
2023-04-24 17:51:12 +01:00
Florian Hahn
6f999769b9
[VPlan] Remove unnecessary includes from VPlan.h (NFC).
Clean up some unnecessary includes from VPlan.h, which is imported in
multiple files.
2023-04-24 16:10:46 +01:00
Florian Hahn
6b8d19d2b5
Recommit "[VPlan] Switch to checking sinking legality for recurrences in VPlan."
This reverts the revert commit 3d8ed8b5192a59104bfbd5bf7ac84d035ee0a4a5.

The new version of the patch adds a set to avoid duplicating work in
isFixedOrderRecurrence, which was previously done through the removed
SinkAfter map.

Original commit message:
    Building on D142885 and D142589, retire the SinkAfter map from the
    recurrence handling code. It is replaced by checking whether it is
    possible to sink all users of a recurrence directly in VPlan. This
    results in simpler code overall and allows handling additional cases
    (see the improvements in @test_crash).

    Depends on D142885.
    Depends on D142589.

    Reviewed By: Ayal

    Differential Revision: https://reviews.llvm.org/D142886
2023-04-20 09:31:16 +01:00
Florian Hahn
ff0ec4f42e
Recommit "[VPlan] Unify Value2VPValue and VPExternalDefs maps (NFCI)."
This reverts the revert commit 8c2276f89887d0a27298a1bbbd2181fa54bbb509.

The updated patch re-orders the getDefiningRecipe check in getVPValue to avoid
a use-after-free.

Original commit message:

    Before this patch, a VPlan contained 2 mappings for Values -> VPValue:
    1) Value2VPValue and 2) VPExternalDefs.

    This duplication is unnecessary and there are already cases where
    external defs are added to Value2VPValue. This patch replaces all uses
    of VPExternalDefs with Value2VPValue.

    It clarifies the naming of getOrAddVPValue (to getOrAddExternalVPValue)
    and addVPValue (to addExternalVPValue).

    At the moment, this is NFC, but will enable additional simplifications
    in D147783.

    Depends on D147891.

    Reviewed By: Ayal

    Differential Revision: https://reviews.llvm.org/D147892
2023-04-18 10:29:31 +01:00
Vitaly Buka
8c2276f898 Revert "[VPlan] Unify Value2VPValue and VPExternalDefs maps (NFCI)."
Asan detects heap-use-after-free, see D147892.

This reverts commit 4fc190351e5af901b6107d162d07e1fbca90934f.
This reverts commit 668045eb77628be13e448ffbb855473ffca1cc43.
2023-04-17 17:24:10 -07:00
Manoj Gupta
3d8ed8b519 Revert "[VPlan] Switch to checking sinking legality for recurrences in VPlan."
This reverts commit 7fc0b3049df532fce726d1ff6869a9f6e3183780.

Causes a clang hang when building xz utils, github issue #62187.
2023-04-17 12:19:36 -07:00
Shraiysh Vaishay
7021182d6b [nfc][llvm] Replace pointer cast functions in PointerUnion by llvm casting functions.
This patch replaces the uses of PointerUnion.is function by llvm::isa,
PointerUnion.get function by llvm::cast, and PointerUnion.dyn_cast by
llvm::dyn_cast_if_present. This is according to the FIXME in
the definition of the class PointerUnion.

This patch does not remove them as they are being used in other
subprojects.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D148449
2023-04-17 13:40:51 -05:00
Florian Hahn
4fc190351e
[VPlan] Remove unneeded NeedsVectorIV from VPWidenIntOrFpInduction.
After recent improvements, all instances of
VPWidenIntOrFpInductionRecipe should need a vector IV and there's no
need for a separate field.
2023-04-17 13:38:00 +01:00
Bjorn Pettersson
a20f7efbc5 Remove several no longer needed includes. NFCI
Mostly removing includes of InitializePasses.h and Pass.h in
passes that no longer have support for the legacy PM.
2023-04-17 13:54:19 +02:00
David Sherwood
69ee653313 [LoopVectorize] Take vscale into account when deciding to create epilogues
In LoopVectorizationCostModel::isEpilogueVectorizationProfitable we
check to see if the chosen main vector loop VF >= 16. If so, we
decide to create a vector epilogue loop. However, this doesn't
take VScaleForTuning into account: we could be targeting a
CPU where vscale > 1, in which case the runtime VF would be a
multiple of the known minimum value.

This patch multiplies scalable VFs by VScaleForTuning and several
tests have been updated that now produce vector epilogues.

Differential Revision: https://reviews.llvm.org/D147522
2023-04-17 10:49:40 +00:00
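The check described in 69ee653313 might be sketched as follows. This is a simplified model, not the code from LoopVectorizationCostModel: the struct, the function name, and the default threshold parameter are hypothetical, and only the `VF >= 16` comparison and the multiplication by VScaleForTuning come from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a vectorization factor: a known minimum lane
// count plus a flag saying whether it is multiplied by vscale at runtime.
struct SketchVF {
  uint64_t KnownMin;
  bool Scalable;
};

// Returns true when the estimated runtime VF of the main loop reaches the
// threshold at which a vector epilogue is considered worthwhile.
bool epilogueProfitableSketch(SketchVF VF, uint64_t VScaleForTuning,
                              uint64_t MinMainLoopVF = 16) {
  assert(VScaleForTuning > 0 && "vscale estimate must be positive");
  uint64_t EstimatedVF =
      VF.Scalable ? VF.KnownMin * VScaleForTuning : VF.KnownMin;
  return EstimatedVF >= MinMainLoopVF;
}
```

Under this model a scalable VF with a known minimum of 8 passes the check when the tuning vscale is 2, while a fixed-width VF of 8 does not.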
Florian Hahn
83ab5708d1
[LV] Don't sink scalar instructions that may read from memory.
The current sinking code doesn't prevent us from sinking a load past an
aliasing store. Skip sinking instructions that may read from memory to
avoid a mis-compile.

See @minimal_bit_widths_with_aliasing_store for an example where 2 loads
are sunk past aliasing stores before this fix.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147259
2023-04-17 09:30:25 +01:00
Kazu Hirata
c83c4b58d1 [Transforms] Apply fixes from performance-for-range-copy (NFC) 2023-04-16 08:25:28 -07:00
Florian Hahn
668045eb77
[VPlan] Unify Value2VPValue and VPExternalDefs maps (NFCI).
Before this patch, a VPlan contained 2 mappings for Values -> VPValue:
1) Value2VPValue and 2) VPExternalDefs.

This duplication is unnecessary and there are already cases where
external defs are added to Value2VPValue. This patch replaces all uses
of VPExternalDefs with Value2VPValue.

It clarifies the naming of getOrAddVPValue (to getOrAddExternalVPValue)
and addVPValue (to addExternalVPValue).

At the moment, this is NFC, but will enable additional simplifications
in D147783.

Depends on D147891.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147892
2023-04-16 15:38:31 +01:00
Florian Hahn
7fc0b3049d
[VPlan] Switch to checking sinking legality for recurrences in VPlan.
Building on D142885 and D142589, retire the SinkAfter map from the
recurrence handling code. It is replaced by checking whether it is
possible to sink all users of a recurrence directly in VPlan. This
results in simpler code overall and allows handling additional cases
(see the improvements in @test_crash).

Depends on D142885.
Depends on D142589.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142886
2023-04-13 22:00:52 +01:00
Craig Topper
4b47d875a1 [LV] Optimize trip count SCEV.
To calculate the trip count we need to add 1 to the backedge
taken count. If we need to widen the backedge count, it's better
to do the add before the widening if we can guarantee it won't
overflow.

The code here is based on similar code I found in
LoopIdiomRecognize.

This is the vectorizer version of this InstCombine patch D142783.
Looking at the IR diffs, this does look like it gets more cases
than the InstCombine patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D147355
2023-04-12 16:17:58 -07:00
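To see why the order of the add and the widening matters in 4b47d875a1, here is a small C++ sketch using an 8-bit backedge-taken count widened to 32 bits. All function names are hypothetical. The vectorizer proves the absence of overflow statically on SCEV expressions; the runtime comparison below only stands in for that proof.

```cpp
#include <cassert>
#include <cstdint>

// zext(BTC + 1): the add happens in the narrow type and may wrap.
uint32_t addThenWiden(uint8_t BTC) {
  return static_cast<uint32_t>(static_cast<uint8_t>(BTC + 1));
}

// zext(BTC) + 1: always correct, but the add happens in the wide type.
uint32_t widenThenAdd(uint8_t BTC) { return static_cast<uint32_t>(BTC) + 1; }

// Uses the narrow add only when it is known not to overflow.
uint32_t tripCountSketch(uint8_t BTC) {
  if (BTC != UINT8_MAX) {
    assert(addThenWiden(BTC) == widenThenAdd(BTC));
    return addThenWiden(BTC);
  }
  return widenThenAdd(BTC);
}
```

For a backedge-taken count of 255 the narrow add wraps to 0, so only the widen-then-add form yields the correct trip count of 256.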
Florian Hahn
68afaa3f48
[LV] Use std::make_optional to fix build failure after 082a0046.
Some compilers require std::make_optional(std::move()) to force construction
of the std::optional return value. This should fix the build failure in
  https://lab.llvm.org/buildbot#builders/67/builds/10991
2023-04-11 17:56:15 +01:00
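The pattern mentioned in 68afaa3f48 can be illustrated with a self-contained C++17 sketch. The function and the `std::unique_ptr<int>` payload are hypothetical stand-ins; the actual return type used in LoopVectorize is not shown in the commit message.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <utility>

// Hypothetical builder that may fail. On success the move-only result is
// wrapped explicitly with std::make_optional(std::move(...)) rather than
// relying on an implicit conversion in the return statement.
std::optional<std::unique_ptr<int>> buildSketch(bool Succeeds) {
  if (!Succeeds)
    return std::nullopt;
  auto Plan = std::make_unique<int>(42);
  assert(Plan && "allocation produced a null pointer");
  return std::make_optional(std::move(Plan));
}
```

Spelling out the construction avoids depending on how a particular compiler resolves the implicit conversion from a move-only local to the optional return type.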
Florian Hahn
082a004690
[VPlan] Allow building a VPlan to fail.
Update the planning code constructing VPlan to allow building VPlans to
fail. This allows us to gradually shift some legality checks to VPlan
construction. The first candidate is checking if all users of
first-order recurrence phis can be sunk past the recipe computing the
previous value.

The new functionality will be used by D142886 which is approved and will
be landed shortly.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142885
2023-04-11 15:41:18 +01:00
Florian Hahn
f9d0b35d22
[LV] Re-use already computed runtime VF in fixFixedOrderRecurrence.
This was suggested as independent cleanup in D147472.

This removes a redundant runtime VF computation when using scalable
vectors.
2023-04-10 21:25:12 +01:00
Florian Hahn
954befe2a7
[LV] Turn check into assert in fixFixedOrderRecurrence (NFCI).
Suggested as independent cleanup in D147567. Either VF or UF needs to be
> 1. Note that if the condition were false, the code below would use
a nullptr and crash.
2023-04-10 21:11:41 +01:00
Florian Hahn
35af27c30a
[VPlan] Only create extracts for recurrence exits if there are live-outs.
Move the code to collect live-outs earlier and only generate extracts for
exit values if there are any live-outs that use them.

Depends on D147472.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147567
2023-04-10 21:08:34 +01:00
Florian Hahn
c255eb2c4b
[VPlan] Use VPLiveOut to update FOR live-out users.
Instead of iterating over all LCSSA phis in the exit block, collect all
LiveOut users of the FOR splice VPInstruction and only update those
users.

Building on top of D147471, this removes an access to the cost model
after VPlan execution.

Depends on D147471.

Reviewed By: Ayal, michaelmaitland

Differential Revision: https://reviews.llvm.org/D147472
2023-04-10 13:02:44 +01:00
Florian Hahn
620e011a25
[VPlan] Don't add live-outs if scalar epilogue is required.
Instead of clearing live outs when a scalar epilogue is required late,
don't add live outs during VPlan construction if a scalar epilogue is
required.

This enables more VPlan-based DCE (if the live out would be the only
user in the plan) and is a step towards removing an access of the cost
model in fixVectorizedLoop (which is after VPlan execution).

Depends on D147468.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147471
2023-04-09 09:18:24 +01:00
Florian Hahn
c7a34d355a
[VPlan] Require VFRange.End to be a power-of-2. (NFCI)
This removes the need to convert the end of the range to the next
power-of-2 for the end iterator after 4bd3fda5124962 and was suggested
as follow-up TODO in D147468.
2023-04-08 13:04:08 +01:00
Florian Hahn
4bd3fda512
[VPlan] Add VFRange::begin() and end() iterators. (NFCI)
Add an iterator to iterate over all VFs in VFRange. This simplifies some
existing code and allows using all_of, any_of and none_of on a VFRange.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147468
2023-04-08 10:22:25 +01:00
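A minimal sketch of the idea in 4bd3fda512, assuming fixed-width VFs stored as plain integers (the real VFRange works on ElementCount, and every name below is hypothetical). The iterator doubles the VF on each step, which is why the end of the range has to be a power of 2 for the end iterator to be reached.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>

// Hypothetical stand-in for VFRange, restricted to fixed-width VFs.
struct VFRangeSketch {
  uint64_t Start; // first VF in the range, a power of 2
  uint64_t End;   // one past the last VF, also a power of 2

  static bool isPowerOf2(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

  class iterator {
    uint64_t VF;

  public:
    using iterator_category = std::input_iterator_tag;
    using value_type = uint64_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const uint64_t *;
    using reference = uint64_t;

    explicit iterator(uint64_t VF) : VF(VF) {}
    uint64_t operator*() const { return VF; }
    // Each step moves to the next power-of-2 VF.
    iterator &operator++() {
      VF *= 2;
      return *this;
    }
    iterator operator++(int) {
      iterator Old = *this;
      VF *= 2;
      return Old;
    }
    bool operator==(const iterator &O) const { return VF == O.VF; }
    bool operator!=(const iterator &O) const { return VF != O.VF; }
  };

  iterator begin() const {
    assert(isPowerOf2(Start) && isPowerOf2(End) && Start <= End);
    return iterator(Start);
  }
  iterator end() const { return iterator(End); }
};
```

With begin() and end() in place, standard algorithms such as std::all_of can be applied directly, for example over the range {2, 16}, which visits 2, 4 and 8.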
Florian Hahn
11896357d4
[VPlan] Add VPInterleaveRecipe::NeedsMaskForGaps field (NFCI).
This patch adds a NeedsMaskForGaps field to VPInterleaveRecipe to record
whether a mask for gaps is needed. This removes a dependence on the cost
model in VPlan code-generation.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147467
2023-04-07 13:11:03 +01:00
Michael Maitland
e86ed9bf2a [LV][NFC] Improve complexity of fixing users of recurrences
The original loop has O(MxN) complexity since `is_contained` iterates over
all incoming values. This change makes it so only the phis
which use the value as an incoming value are iterated over so
it is now O(M).

Differential Revision: https://reviews.llvm.org/D146999
2023-04-06 16:15:51 -07:00
Florian Hahn
3f36b9b456
[LV] Move conditional MaskForGaps construction to load case.
Conditionally setting MaskForGaps is only needed for loads. This avoids
re-computing MaskForGaps for stores.

Suggested as independent cleanup in D147467.
2023-04-06 21:16:37 +01:00
David Sherwood
9278dd7b2b [LoopVectorize] Fix zext/sext cost calculations when types are shrunk
In getInstructionCost if we know a zext/sext is going to be shrunk
we should only be changing the destination type, and leave the
source type unchanged. For example, we may change a zext from

  zext <16 x i8> %a to <16 x i32>

to

  zext <16 x i8> %a to <16 x i16>

However, we were previously calculating the cost for doing

  zext <16 x i16> %a to <16 x i16>

which is incorrect.

Differential Revision: https://reviews.llvm.org/D147152
2023-04-06 08:52:25 +00:00
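The fix in 9278dd7b2b can be summarised with a small C++ sketch that tracks only element bit widths. The struct and function names are hypothetical and the real getInstructionCost operates on LLVM vector types; the sketch also assumes the minimal width lies strictly above the source width and at or below the original destination width.

```cpp
#include <cassert>

// Hypothetical pair of element widths used when querying the cast cost.
struct CastCostQuery {
  unsigned SrcBits;
  unsigned DstBits;
};

// For an extend that will be shrunk to MinBits, only the destination type
// is narrowed; the source type is left unchanged.
CastCostQuery queryForShrunkExtend(unsigned SrcBits, unsigned DstBits,
                                   unsigned MinBits) {
  assert(SrcBits < MinBits && MinBits <= DstBits &&
         "sketch only covers extends that remain extends after shrinking");
  return {SrcBits, MinBits};
}
```

For the example in the commit message, an i8 to i32 extend shrunk to 16 bits is costed as i8 to i16, not as i16 to i16.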
David Green
28c8616a5b [LV] Cleanup and reformatting for some debug messages. NFC
This is just some cleanup of various debug messages, pulled out of another
patch to simplify it a little.
2023-04-05 17:50:01 +01:00
Philip Reames
c416f6700f [IVDescriptors] Add pointer InductionDescriptors with non-constant strides (try 2)
(JFYI - This has been heavily reframed since original attempt at landing.)

This change updates the InductionDescriptor logic to allow matching a pointer IV with a non-constant stride, but also updates the LoopVectorizer to bail out on such descriptors by default. This preserves the default vectorizer behavior.

In review, it was pointed out that there are multiple unfortunate performance implications which need to be addressed before this can be enabled. Having a flag allows us to exercise the behavior, and write test cases for logic which is otherwise unreachable (or hard to reach).

This will also enable non-constant stride pointer recurrences for other consumers. I've audited said code, and don't see any obvious issues.

Differential Revision: https://reviews.llvm.org/D147336
2023-04-05 09:32:35 -07:00
Graham Hunter
185863f7de [LV] Use available masked vector function variants when required
LLVM has the ability to vectorize using function variants that require
a mask by creating an all-true mask, and to vectorize a conditional
call via scalarization. This patch joins the two parts together
and uses a masked variant when a mask is required.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D136251
2023-04-05 11:18:38 +01:00
David Sherwood
b4089cfa2f [NFC][LoopVectorize] Simplify preferPredicateOverEpilogue interface
Given just how many arguments we pass to
preferPredicateOverEpilogue and considering this list may
grow over time I've decided to pass in a pointer to a new
TailFoldingInfo structure instead, similar to what we do
with IntrinsicCostAttributes, etc. In addition, many of the
arguments we pass in are actually available in the
LoopVectorizationLegality class so I've managed to
reduce the set of pointers that we need to pass in the
TailFoldingInfo struct.

Differential Revision: https://reviews.llvm.org/D146127
2023-04-04 14:00:49 +00:00
Philip Reames
f6b217c7cb [LV] Remove unused default argument to isLegalGatherOrScatter [nfc] 2023-04-03 11:03:35 -07:00
David Green
965a090f02 Revert "[IVDescriptors] Add pointer InductionDescriptors with non-constant strides"
Multiple errors have been reported on
https://reviews.llvm.org/rG498aa534f472d28db893aa9a8627d0b46e17f312

Reverting until the correctness issues can be resolved.

We are also seeing a lot of performance differences from the patch.  Some are
looking good, but some are looking pretty bad.
2023-03-31 11:08:50 +01:00
Philip Reames
498aa534f4 [IVDescriptors] Add pointer InductionDescriptors with non-constant strides
This matches the handling for integer IVs.  I left the non-opaque cases alone, mostly because they're largely irrelevant today.

This doesn't actually make much difference in vectorization right now as we immediately fail on aliasing checks (which also bail on non-constant strides).  Slightly surprisingly, it's the cases which *do* need runtime checks which work after this patch as they don't use the same dependency analysis path.

This will also enable non-constant stride pointer recurrences for other consumers.  I've audited said code, and don't see any obvious issues.
2023-03-30 11:56:00 -07:00
David Sherwood
0ef8a79b12 [LoopVectorize] Add non-zero check for MaxPowerOf2RuntimeVF in computeMaxVF
This one-line patch just tightens up the code added in
1c4fedfa35aeb8b456e2d8f4f826c0e026b9d863
where we try to avoid tail-folding if we know the trip count
will always be a multiple of the runtime VF.
2023-03-29 10:08:32 +00:00
David Sherwood
1c4fedfa35 [LoopVectorize] Don't tail-fold for scalable VFs when there is no scalar tail
Currently in LoopVectorize we avoid tail-folding if we can
prove the trip count is always a multiple of the maximum
fixed-width VF. This works because we know the vectoriser
only ever chooses a VF that is a power of 2. However, if
we are also considering scalable VFs then we conservatively
bail out of the optimisation because we don't know the value
of vscale, which could be an odd or prime number, etc.

This patch tries to enable the same optimisation for scalable
VFs by asking if vscale is known to be a power of 2. If so,
we can then query the maximum value of vscale and use the same
logic as we do for fixed-width VFs. I've also added a new TTI
hook called isVScaleKnownToBeAPowerOfTwo that does the same
thing as the existing TargetLowering hook.

Differential Revision: https://reviews.llvm.org/D146199
2023-03-27 08:34:30 +00:00
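The reasoning in 1c4fedfa35, together with the non-zero check added in 0ef8a79b12, might look roughly like this in C++. Everything here is a simplified model with hypothetical names: it assumes the fixed-width maximum VF, the scalable minimum VF, and the maximum vscale are each powers of 2, and it is not the code from computeMaxVF.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

bool isPowerOf2Sketch(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Returns true when no scalar tail can remain for any power-of-2 VF up to
// the largest one considered, so tail-folding can be skipped.
bool noScalarTailSketch(uint64_t TripCount, uint64_t MaxFixedVF,
                        std::optional<uint64_t> MaxScalableMinVF,
                        bool VScaleIsPowerOf2, uint64_t MaxVScale) {
  uint64_t MaxPowerOf2RuntimeVF = MaxFixedVF;
  if (MaxScalableMinVF) {
    // Without a power-of-2 guarantee for vscale, stay conservative.
    if (!VScaleIsPowerOf2)
      return false;
    MaxPowerOf2RuntimeVF =
        std::max(MaxPowerOf2RuntimeVF, *MaxScalableMinVF * MaxVScale);
  }
  // Non-zero check: a zero VF would make the remainder test meaningless.
  if (MaxPowerOf2RuntimeVF == 0)
    return false;
  assert(isPowerOf2Sketch(MaxPowerOf2RuntimeVF));
  return TripCount % MaxPowerOf2RuntimeVF == 0;
}
```

The remainder test is sufficient because every smaller power-of-2 VF divides the largest one, so a trip count that is a multiple of the largest VF is a multiple of all of them.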
Florian Hahn
ea929a07b6
[LV] Set inbounds flag using CreateGEP in vectorizeInterleaveGroup (NFC).
This avoids having to cast the result of the builder to
GetElementPtrInst.
2023-03-22 11:29:57 +00:00
Florian Hahn
af99aa0ff7
[LV] Set inbounds flag using CreateGEP in VPWidenMemInst (NFC).
This avoids having to cast the result of the builder to
GetElementPtrInst.
2023-03-21 11:44:21 +00:00