3703 Commits

Author SHA1 Message Date
Florian Hahn
4fc190351e
[VPlan] Remove uneeded NeedsVectorIV from VPWidenIntOrFpInduction.
After recent improvements, all instances of
VPWidenIntOrFpInductionRecipe should needs a vector IV and there's no
need for a separate field.
2023-04-17 13:38:00 +01:00
Bjorn Pettersson
21a6890856 [Vectorize] Clean up Transforms/Vectorize.h
Removed definitions of vectorizeBasicBlock and VectorizeConfig
(possibly a remnant from the BBVectorize pass that was removed
way back in 2017).

Also reduced amount of include dependencies to Transforms/Vectorize.h.
2023-04-17 13:54:19 +02:00
Bjorn Pettersson
a20f7efbc5 Remove several no longer needed includes. NFCI
Mostly removing includes of InitializePasses.h and Pass.h in
passes that no longer has support for the legacy PM.
2023-04-17 13:54:19 +02:00
Florian Hahn
02369b75fd
[VPlan] Mark recurrence recipes as not having side-effects.
Add support for FirstOrderRecurrenceSplice and VPFirstOrderRecurrencePHI
recipes to mayHaveSideEffects. They both don't have side-effects.
2023-04-17 12:30:52 +01:00
David Sherwood
69ee653313 [LoopVectorize] Take vscale into account when deciding to create epilogues
In LoopVectorizationCostModel::isEpilogueVectorizationProfitable we
check to see if the chosen main vector loop VF >= 16. If so, we
decide to create a vector epilogue loop. However, this doesn't
take VScaleForTuning into account because we could be targeting a
CPU where vscale > 1, and hence the runtime VF would be a multiple
of the known minimum value.

This patch multiplies scalable VFs by VScaleForTuning and several
tests have been updated that now produce vector epilogues.

Differential Revision: https://reviews.llvm.org/D147522
2023-04-17 10:49:40 +00:00
Florian Hahn
83ab5708d1
[LV] Don't sink scalar instructions that may read from memory.
The current sinking code doesn't prevent us from sinking a load past an
aliasing store. Skip sinking instructions that may read from memory to
avoid a mis-compile.

See @minimal_bit_widths_with_aliasing_store for an example where 2 loads
are sunk past aliasing stores before this fix.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147259
2023-04-17 09:30:25 +01:00
Kazu Hirata
c83c4b58d1 [Transforms] Apply fixes from performance-for-range-copy (NFC) 2023-04-16 08:25:28 -07:00
Florian Hahn
668045eb77
[VPlan] Unify Value2VPValue and VPExternalDefs maps (NFCI).
Before this patch, a VPlan contained 2 mappings for Values -> VPValue:
1) Value2VPValue and 2) VPExternalDefs.

This duplication is unnecessary and there are already cases where
external defs are added to Value2VPValue. This patch replaces all uses
of VPExternalDefs with Value2VPValue.

It clarifies the naming of getOrAddVPValue (to getOrAddExternalVPValue)
and addVPValue (to addExternalVPValue).

At the moment, this is NFC, but will enable additional simplifications
in D147783.

Depends on D147891.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147892
2023-04-16 15:38:31 +01:00
Florian Hahn
2db031528e
[VPlan] Check VPValue step in isCanonical (NFCI).
Update the isCanonical() implementations to check the VPValue step
operand instead of the step in the induction descriptor.

At the moment this is NFC, but it enables further optimizations if the
step is replaced by a constant in D147783.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147891
2023-04-16 14:48:03 +01:00
Vasileios Porpodas
7e67a9473d [SLP][NFC] Remove Limit from tryToVectorizeSequence() arguments.
Limit turns out to be implemented in the exact same way for all calls to
tryToVectorizeSequence(). So this patch removes it and implements it internally
as a lambda function.

Differential Revision: https://reviews.llvm.org/D148382
2023-04-14 14:58:57 -07:00
Nikita Popov
62ef97e063 [llvm-c] Remove PassRegistry and initialization APIs
Remove C APIs for interacting with PassRegistry and pass
initialization. These are legacy PM concepts, and are no longer
relevant for the new pass manager.

Calls to these initialization functions can simply be dropped.

Differential Revision: https://reviews.llvm.org/D145043
2023-04-14 12:12:48 +02:00
Florian Hahn
7fc0b3049d
[VPlan] Switch to checking sinking legality for recurrences in VPlan.
Building on D142885 and D142589, retire the SinkAfter map from the
recurrence handling code. It is replaced by checking whether it is
possible to sink all users of a recurrence directly in VPlan. This
results in simpler code overall and allows to handle additional cases
(see the improvements in @test_crash).

Depends on D142885.
Depends on D142589.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142886
2023-04-13 22:00:52 +01:00
Alexey Bataev
a4eff2b56c [SLP][NFC]Remove extra semicolons after function definitions, NFC 2023-04-13 11:33:25 -07:00
Alexey Bataev
f82eb7e066 [SLP]Introduce gather cost estimation function.
Introduced BoUpSLP::ShuffleCostEstimator::gather function as an initial
implementation of the gather/buildvector cost estimation for buildvector
nodes. It will allow to use general codegen infrastructure for better
cost estimation + it improves the cost estimation for the
gathers/buildvectors.

Improved part of D110978.

Differential Revision: https://reviews.llvm.org/D148174
2023-04-13 10:16:00 -07:00
Simon Pilgrim
b3480d5ede [SLP] Compute min/max scalar reduction costs using min/max intrinsics instead of expanded cmp+sel
By default these will expand back to cmp/sel, but some targets (X86) has optimized costs for scalar integer min/max patterns which are lower than the default expansion (pre-SSE41 is particularly weak for vector min/max support).

Differential Revision: [SLP] Compute min/max scalar reduction costs using min/max intrinsics instead of expanded cmp+sel
2023-04-13 17:00:39 +01:00
Simon Pilgrim
9e30b87afb [TTI] getMinMaxReductionCost - add FastMathFlag argument
Similar to the getArithmeticReductionCost / getExtendedReductionCost calls (which really don't need to use std::optional<>).

This will be necessary to correct recognize fast/nnan fmax/fmul reductions which can avoid nan handling - which will allow us to remove the fmax/fmin special case in X86TTIImpl::getMinMaxCost and use getIntrinsicInstrCost like we do for integer reductions (63c3895327839ba5b57f5b99ec9e888abf976ac6).

Differential Revision: https://reviews.llvm.org/D148149
2023-04-13 10:42:42 +01:00
Bjorn Pettersson
410775ecfd [Transforms][LTO] Remove some redundant includes. NFC
No need to include CallGraphSCCPass.h from the IPO/Inliner.

Also removed the include of LegacyPassManager.h in a couple of files
that do not really depend on that header file.

Differential Revision: https://reviews.llvm.org/D148083
2023-04-13 10:12:00 +02:00
Craig Topper
4b47d875a1 [LV] Optimize trip count SCEV.
To calculate the trip count we need to add 1 to the backedge
taken count. If we need to widen the backedge count, it's better
to do the add before the widening if we can guarantee it won't
overflow.

The code here is based on similar code I found in
LoopIdiomRecognize.

This is the vectorizer version of this InstCombine patch D142783.
Looking at the IR diffs, this does look like it gets more cases
than the InstCombine patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D147355
2023-04-12 16:17:58 -07:00
Alexey Bataev
b28f407df9 [SLP]Improve reduction cost model for scalars.
Instead of abstract cost of the scalar reduction ops, try to use the
cost of actual reduction operation instructions, where possible. Also,
remove the estimation of the vectorized GEPs pointers for reduced loads,
since it is already handled in the tree.

Differential Revision: https://reviews.llvm.org/D148036
2023-04-12 11:32:51 -07:00
Alexey Bataev
d00158cd28 [SLP][NFC]Introduce ShuffleCostEstimator and adjustExtracts member function.
Added ShuffleCostEstimator class and the first adjustExtracts member,
which is just a copy of previous AdjustExtractCost lambda.

Differential Revision: https://reviews.llvm.org/D147787
2023-04-11 12:47:07 -07:00
Florian Hahn
68afaa3f48
[LV] Use std::make_optional to fix build failure after 082a0046.
Some compilers require std::make_optional(std::move()) to force construction
of the std::optional return value. This should fix the build failure in
  https://lab.llvm.org/buildbot#builders/67/builds/10991
2023-04-11 17:56:15 +01:00
Florian Hahn
082a004690
[VPlan] Allow building a VPlan to may fail.
Update the planning code constructing VPlan to allow building VPlans to
fail. This allows us to gradually shift some legality checks to VPlan
construction. The first candidate is checking if all users of
first-order recurrence phis can be sunk past the recipe computing the
previous value.

The new functionality will be used by D142886 which is approved and will
be landed shortly.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142885
2023-04-11 15:41:18 +01:00
Florian Hahn
f9d0b35d22
[LV] Re-use already computed runtime VF in fixFixedOrderRecurrence.
This was suggested as independent cleanup in D147472.

This removes a redundant runtime VF computation when using scalable
vectors.
2023-04-10 21:25:12 +01:00
Florian Hahn
954befe2a7
[LV] Turn check into assert in fixFixedOrderRecurrence (NFCI).
Suggested as independent cleanup in D147567. Either VF or UF need to be
> 1. Note that if the condition would be false, the code below would use
a nullptr and crash.
2023-04-10 21:11:41 +01:00
Florian Hahn
35af27c30a
[VPlan] Only create extracts for recurrence exits if there are live-outs.
Move the code to collect live-out earlier and only generate extracts for
exit values if there are any live-outs that use them.

Depends on D147472.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147567
2023-04-10 21:08:34 +01:00
Florian Hahn
c255eb2c4b
[VPlan] Use VPLiveOut to update FOR live-out users.
Instead of iterating over all LCSSA phis in the exit block, collect all
LiveOut users of the FOR splice VPInstruction and only update those
users.

Building on top of D147471, this removes an access to the cost model
after VPlan execution.

Depends on D147471.

Reviewed By: Ayal, michaelmaitland

Differential Revision: https://reviews.llvm.org/D147472
2023-04-10 13:02:44 +01:00
Florian Hahn
0dbcbfe0d0
[VPlan] Don't assign slots for external defs (NFCI).
External defs are VPValues wrapping an IR value and hence will get
printed as ir<>. We don't need to assign a slot for a VPValue number.
2023-04-09 21:01:21 +01:00
Florian Hahn
620e011a25
[VPlan] Don't add live-outs if scalar epilogue is required.
Instead of clearing live outs when a scalar epilogue is required late,
don't add live outs during VPlan construction if a scalar epilogue is
required.

This enables more VPlan-based DCE (if the live out would be the only
user in the plan) and is a step towards removing an access of the cost
model in fixedVectorizedLoop (which is after VPlan execution).

Depends on D147468.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147471
2023-04-09 09:18:24 +01:00
Florian Hahn
c7a34d355a
[VPlan] Require VFRange.End to be a power-of-2. (NFCI)
This removes the need to convert the end of the range to the next
power-of-2 for the end iterator after 4bd3fda5124962 and was suggested
as follow-up TODO in D147468.
2023-04-08 13:04:08 +01:00
Florian Hahn
4bd3fda512
[VPlan] Add VFRange::begin() and end() iterators. (NFCI)
Add an iterator to iterate over all VFs in VFRange. This simplifies some
existing code and allows using all_of,any_of and none_of on a VFRange.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147468
2023-04-08 10:22:25 +01:00
Alexey Bataev
85327f307b [SLP][NFC]Make clusterSortPtrAccesses static. 2023-04-07 13:24:24 -07:00
Alexey Bataev
6ff177d928 [SLP][NFC]Improve SLP time by precomputing value<->gather nodes
dependencies.

Improved compiled time by the precomputing the mapping between gathered
scalars and their gather/buildvector nodes for later use in
isGatherShuffledEntry to avoid recomputing this map each time this
function is called.
2023-04-07 12:12:02 -07:00
Florian Hahn
11896357d4
[VPlan] Add VPInterleaveRecipe::NeedsMaskForGaps field (NFCI).
This patch adds a NeedsMaskForGaps field to VPInterleaveRecipe to record
whether a mask for gaps is needed. This removes a dependence on the cost
model in VPlan code-generation.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147467
2023-04-07 13:11:03 +01:00
Alexey Bataev
52dd72a37a [SLP][NFC]Make adjustExtracts/needToDelay members of ShuffleInstructionBuilder.
Make adjustExtracts/needToDelay lambdas members of ShuffleInstructionBuilder to allow to overload them later for cost model.

Differential Revision: https://reviews.llvm.org/D147730
2023-04-06 16:27:19 -07:00
Michael Maitland
e86ed9bf2a [LV][NFC] Improve complexity of fixing users of recurrences
The original loop has O(MxN) since `is_contained` iterates over
all incoming values. This change makes it so only the phis
which use the value as an incoming value are iterated over so
it is now O(M).

Differential Revision: https://reviews.llvm.org/D146999
2023-04-06 16:15:51 -07:00
Florian Hahn
3f36b9b456
[LV] Move conditional MaskForGaps construction to load case.
Conditionally setting MaskForGaps is only needed for loads. This avoid
re-computing MaskForGaps for stores.

Suggested as independent cleanup in D147467.
2023-04-06 21:16:37 +01:00
Alexey Bataev
e58a49300e [SLP][NFC]Evaluate FMF for reductions before the loop, no need to
reevaluate it.
2023-04-06 11:57:20 -07:00
Alexey Bataev
50af6ab0ab [SLP]Fix emission of the masks in shuffles for undefs.
If the value is used in the expression, need to adjust the mask before
applying the mask. Plus, need to fix the analysis of the phi nodes for
reused scalars.
2023-04-06 10:16:58 -07:00
Alexey Bataev
cf62adbbd8 [SLP]Fix delete of the extractelement with users.
Made the condition for the erasing of the gathered extractelements
stricter, remove it only if it has single vectorized use, otherwise
leave it for instcombiner/instsimplify analysis.
2023-04-06 09:15:30 -07:00
Philip Reames
92aae9e725 [LV] Remove a cover function with a single use [nfc]
And more importantly, move the fixme to the sole caller where it actually makes sense in context.
2023-04-06 08:27:57 -07:00
David Sherwood
9278dd7b2b [LoopVectorize] Fix zext/sext cost calculations when types are shrunk
In getInstructionCost if we know a zext/sext is going to be shrunk
we should only be changing the destination type, and leave the
source type unchanged. For example, we may change a zext from

  zext <16 x i8> %a to <16 x i32>

to

  zext <16 x i8> %a to <16 x i16>

However, we were previously calculating the cost for doing

  zext <16 x i16> %a to <16 x i16>

which is incorrect.

Differential Revision: https://reviews.llvm.org/D147152
2023-04-06 08:52:25 +00:00
David Green
28c8616a5b [LV] Cleanup and reformatting for some debug messages. NFC
This is just some cleanup of various debug messages, pulled out of another
patch to simplify it a little.
2023-04-05 17:50:01 +01:00
Alexey Bataev
40105a9933 [SLP]Find reused scalars in buildvector sequences, if any.
Patch generalizes analysis of scalars. The main part is outlined into
lambda, which can be used to find reused inserted scalars and emit
shuffle for them instead of multiple insertelement instructions, if the
permutation is found alreadyi. I.e. some scalars are transformed by the
permutation of previously vectorized nodes, and some are inserted
directly.

Reworked part of D110978

Differential Revision: https://reviews.llvm.org/D146564
2023-04-05 09:37:05 -07:00
Philip Reames
c416f6700f [IVDescriptors] Add pointer InductionDescriptors with non-constant strides (try 2)
(JFYI - This has been heavily reframed since original attempt at landing.)

This change updates the InductionDescriptor logic to allow matching a pointer IV with a non-constant stride, but also updates the LoopVectorizer to bailout on such descriptors by default. This preserves the default vectorizer behavior.

In review, it was pointed out that there's multiple unfortunate performance implications which need to be addressed before this can be enabled. Having a flag allows us to exercise the behavior, and write test cases for logic which is otherwise unreachable (or hard to reach).

This will also enable non-constant stride pointer recurrences for other consumers. I've audited said code, and don't see any obvious issues.

Differential Revision: https://reviews.llvm.org/D147336
2023-04-05 09:32:35 -07:00
Jie Fu
ae5f049378 [Transforms] Fix -Wunused-function for 'GetReplicateRegion' with -DLLVM_ENABLE_ASSERTIONS=OFF (NFC)
/Users/jiefu/llvm-project/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:614:23: error: unused function 'GetReplicateRegion' [-Werror,-Wunused-function]
static VPRegionBlock *GetReplicateRegion(VPRecipeBase *R) {
                      ^
1 error generated.
2023-04-05 22:34:42 +08:00
Florian Hahn
c18bc7f7fe
[VPlan] Replace check for replicate regions with assert (NFCI).
After recent changes, replication regions only get introduced later, so
there's no need to check for them.
2023-04-05 14:29:24 +01:00
Graham Hunter
185863f7de [LV] Use available masked vector function variants when required
LLVM has the ability to vectorize using function variants that require
a mask by creating an all-true mask, and to vectorize a conditional
call via scalarization, now we want to join the two parts together
and use a masked variant when a mask is required.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D136251
2023-04-05 11:18:38 +01:00
David Sherwood
b4089cfa2f [NFC][LoopVectorize] Simplify preferPredicateOverEpilogue interface
Given just how many arguments we pass to
preferPredicateOverEpilogue and considering this list may
grow over time I've decided to pass in a pointer to a new
TailFoldingInfo structure instead, similar to what we do
with IntrinsicCostAttributes, etc. In addition, many of the
arguments we pass in are actually available in the
LoopVectorizationLegality class so I've managed to
reduce the set of pointers that we need to pass in the
TailFoldingInfo struct.

Differential Revision: https://reviews.llvm.org/D146127
2023-04-04 14:00:49 +00:00
Philip Reames
f6b217c7cb [LV] Remmove unused default argument to isLegalGatherOrScatter [nfc] 2023-04-03 11:03:35 -07:00
Alexey Bataev
c1660006b2 [SLP]Reorder counters for same values, if the root node is reordered.
The counters for the repeated scalars are ordered in the natural order,
but the original scalars might be reordered during SLP graph reordering
and this order can be dropped. Need to use the scalars after the
reordering, not the original ones, to emit correct code for same value
counters.
2023-04-03 07:52:49 -07:00