3553 Commits

Author SHA1 Message Date
Alexey Bataev
9bdcf8778a [SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars.
The compiler may produce better results if it does not look for
constants, uses an extra analysis of phi nodes, looks through all tree
nodes without skipping the cases, where the very first set of nodes is
empty. Also, it tries to reshufle the nodes if it is profitable for
sure, i.e. at least 2 scalars are used for single node permutation and at
least 3 scalars are used for the permutation of 2 nodes.

Part of D110978

Differential Revision: https://reviews.llvm.org/D141512
2023-01-19 13:46:25 -08:00
Florian Hahn
e2c43a547b
[VPlan] Add vp_depth_first_deep (NFC)
Similar to vp_depth_first_shallow (D140512) add vp_depth_first_deep to
make existing code clearer and more compact.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142055
2023-01-19 20:34:23 +00:00
Florian Hahn
655c88ca36
[VPlan] Add vp_depth_first_shallow + graph traits for wrapper(NFC)
This patch adds a new VPBlockShallowTraversalWrapper struct to
provide graph traits specialization that do not traverse through
VPRegionBlocks. This matches the behavior of the existing traits for
plain VPBlockBase and is a step before moving the graph traits for
VPBlockBase to traverse through VPRegionBlocks to enable cross region
support in VPDominatorTree.

Depends on D140511.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D140512
2023-01-19 12:07:27 +00:00
Florian Hahn
feee22db52
[VPlan] Disconnect VPRegionBlock from successors in graph iterator(NFCI)
This updates the VPAllSuccessorsIterator to not connect the
VPRegionBlock itself to its successors. The successors are connected to
the exit block of the region. At the moment, this doesn't change any
exisint functionality.

But the new schema ensures the following property when used for
VPDominatorTree:

1. Entry & exit blocks of regions dominate the successors of the region.

This allows for convenient checking of dominance between defs and uses
that are not defined in the same region. I will share a follow-up patch
to use it for the VPDominatorTree soon.

Depends on D140500.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D140511
2023-01-18 15:02:41 +00:00
Florian Hahn
22c9f4cf2d
[VPlan] Replace VPInterleaveRecipe::classof with VP_CLASSOF_IMPL. (NFC) 2023-01-18 14:23:22 +00:00
Florian Hahn
f615de7e26
[VPlan] Replace VPBranchOnMaskSC::classof with VP_CLASSOF_IMPL. (NFC) 2023-01-18 12:14:58 +00:00
Florian Hahn
cdd8fcdbd7
[VPlan] Replace VPExpandSCEVRecipe::classof with VP_CLASSOF_IMPL. (NFC) 2023-01-17 21:11:33 +00:00
Florian Hahn
bf1ba6bb52
[VPlan] Replace VPScalarIVStepsRecipe::classof with VP_CLASSOF_IMPL(NFC) 2023-01-17 20:53:14 +00:00
Florian Hahn
d47bdae28e
[VPlan] Remove duplicated VPValue IDs (NFCI).
At the moment, both VPValue and VPDef have an ID used when casting via
classof. This duplication is cumbersome, because it requires adding IDs
for new recipes twice and also requires setting them twice. In a few
cases, there's only a VPDef ID and no VPValue ID, which can cause same
confusion.

To simplify things, remove the VPValue IDs for different recipes.
Instead, only retain the generic VPValue ID (= used VPValues without a
corresponding defining recipe) and VPVRecipe for VPValues that are
defined by recipes that inherit from VPValue.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D140848
2023-01-17 15:11:38 +00:00
Florian Hahn
c95138392a
[VPlan] Remove unnecessary getNumSuccessors call (NFC).
If ParentWithSuccs is nullptr, the number of successors is guaranteed to
be 0. Simplify the code as suggested by @Ayal in D140511.
2023-01-17 11:44:50 +00:00
Florian Hahn
133f017479
[VPlan] Remove unneeded VPUser::classof(const VPDef *) (NFC).
This specialization is not needed any longer as VPRecipeBase inherits
from VPUser and getDefiningRecipe returns a VPRecipeBase.
2023-01-17 09:08:33 +00:00
Joe Loser
a288d7f937 [llvm][ADT] Replace uses of makeMutableArrayRef with deduction guides
Similar to how `makeArrayRef` is deprecated in favor of deduction guides, do the
same for `makeMutableArrayRef`.

Once all of the places in-tree are using the deduction guides for
`MutableArrayRef`, we can mark `makeMutableArrayRef` as deprecated.

Differential Revision: https://reviews.llvm.org/D141814
2023-01-16 14:49:37 -07:00
Florian Hahn
56ffd39c3d
[VPlan] Use VPDef prefix for VPDef IDs instead of VPRecipeBase (NFC).
Various places in the code where still using the VPRecipeBase:: prefix
for VPDef IDs or not prefix at all. Now that the VPDef IDs have been
moved to VPDef, use this prefix instead and consistently use it.
2023-01-16 10:23:52 +00:00
Florian Hahn
2b054d5dd4
[VPlan] Use to_vector when iterating over a temporary vector. (NFC) 2023-01-13 17:51:00 +00:00
Guillaume Chatelet
8fd5558b29 [NFC] Use TypeSize::geFixedValue() instead of TypeSize::getFixedSize()
This change is one of a series to implement the discussion from
https://reviews.llvm.org/D141134.
2023-01-11 16:49:38 +00:00
Guillaume Chatelet
48f5d77eee [NFC] Use TypeSize::getKnownMinValue() instead of TypeSize::getKnownMinSize()
This change is one of a series to implement the discussion from
https://reviews.llvm.org/D141134.
2023-01-11 16:36:39 +00:00
Miguel Saldivar
3c5e0d87f8
[LoopVectorize] Clear cache of LoopAccessInfoManager
LAI is cached during the LoopDistribute pass, and is later re-used during LoopVectorize. The problem is that LoopVectorize changes SCEV, and the cached LAI does not get updated. Hence, when re-using the cached LAI, it references an invalid SCEV.

Fixes #59319

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D139601
2023-01-11 09:03:40 +00:00
Valery N Dmitriev
fd7273359a [SLP] Do not ignore ordering for root node when it has in-tree uses.
When rooted with PHIs, a vectorization tree may have another node with PHIs
which have roots as their operands. We cannot ignore ordering information
for root in such a case.

Differential Revision: https://reviews.llvm.org/D141309
2023-01-10 10:12:51 -08:00
Alexey Bataev
755282ec1e [SLP][NFC]Move getExtractIndex function for future changes, NFC. 2023-01-09 09:53:01 -08:00
Benjamin Kramer
b6942a2880 [NFC] Hide implementation details in anonymous namespaces 2023-01-08 17:37:02 +01:00
Florian Hahn
78914e8c32
[VPlan] Keep entries in worklist in sinkScalarOperands.
Not removing the entries ensures that duplicates are avoided,
reducing the number of iterations.
2023-01-08 15:52:57 +00:00
Alexey Bataev
996ad44b97 [SLP][NFC]Fix compile build by declaring ArrayRef, NFC.
Fix compiler build reported in https://lab.llvm.org/buildbot#builders/243/builds/218
2023-01-06 17:01:48 -08:00
Alexey Bataev
cc17e93178 [SLP][NFC]Remove unused variables, NFC. 2023-01-06 16:55:54 -08:00
Alexey Bataev
7439e1b2de [SLP]Fix incorrect reordering of clustered scalars.
The new mask represents the order, not the mask itself. At first, need
to treat as the order, convert to mask and only after that reorder
gathered scalars to build correct clustered order.

Differential Revision: https://reviews.llvm.org/D141161
2023-01-06 16:04:09 -08:00
Alexey Bataev
9b5f62685a [SLP]Fix cost of the broadcast buildvector/gather.
Need to include the cost of the initial insertelement to the cost of the
broadcasts. Also, need to adjust the cost of the gather/buildvector if
the element is inserted into poison/undef vector.

Differential Revision: https://reviews.llvm.org/D140498
2023-01-06 09:25:05 -08:00
Florian Hahn
68469a80cb
[LV] Disable runtime unrolling for vectorized loops.
This patch adds metadata to disable runtime unrolling to the vectorized
loop. If runtime unrolling/interleaving is considered profitable, LV
will interleave the loop directly. There should be no need to perform
runtime unrolling at a later stage.

Note that we already add metadata to disable runtime unrolling to the
scalar loop after vectorization.

The additional unrolling unnecessarily increases code size and compile
time. In addition to that we have several bug reports of unncessary
runtime unrolling for vectorized loops, e.g. PR40961

Compile-time improvements:

  NewPM-O3: -1.04%
  NewPM-ReleaseThinLTO: -0.59%
  NewPM-ReleaseLTO-g: -0.97%

https://llvm-compile-time-tracker.com/compare.php?from=ce1be13a868d0f8afa367975558c1a6175cce33a&to=78bc2e67f22e9e10e61cdb6cdac4bb857d95eb1b&stat=instructions:u

Fixes #40306.

Reviewed By: lebedev.ri, nikic

Differential Revision: https://reviews.llvm.org/D115261
2023-01-06 10:56:17 +00:00
Valery N Dmitriev
6d677c0b3d [SLP] Unify GEP cost modeling for load, store and GEP nodes.
Make a separate routine for GEPs cost calculation and make
the approach uniform across load, store and GEP tree nodes.
Additional issue fixed is GEP cost savings were applied twice
for ScatterVectorize nodes (aka gather load) making them look
unrealistically profitable for vectorization.

Differential Revision: https://reviews.llvm.org/D140789
2023-01-05 10:11:36 -08:00
serge-sans-paille
38818b60c5
Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part
Use deduction guides instead of helper functions.

The only non-automatic changes have been:

1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).

Per reviewers' comment, some useless makeArrayRef have been removed in the process.

This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.

Differential Revision: https://reviews.llvm.org/D140955
2023-01-05 14:11:08 +01:00
David Green
586fd86b0a [LoopVectorizer] Fix inloop reductions mask placement
The validation of vplans could fail if an inloop reduction was created
with a block-in mask that did not dominate the reduction. This makes
sure that the insert point is set when creating the mask, to ensure it
dominates the reduction.

Differential Revision: https://reviews.llvm.org/D141003
2023-01-05 11:37:37 +00:00
Augie Fackler
0676156f81 Revert "[VPlan] Also consider operands of sink candidates in same block."
This reverts commit aa2414729ebbcb2d8f162e9002a3a6aa768b1f9d.

Previously-valid IR from a tensorflow test case (as shown on the
Diffusion revision for aa2414729ebbcb2d8f162e9002a3a6aa768b1f9d) started
hanging in the loop-vectorize pass. Reverting to keep everyone working.
2023-01-04 16:17:13 -05:00
Alexey Bataev
a1b18946f9 [SLP]Fix incorrect shuffle results because of missing shuffle mask
analysis.

Missed the analysis of the shuffle mask when trying to analyze the
operands of the shuffle instruction during peeking through shuffle
instructions.
2023-01-04 13:10:40 -08:00
Dinar Temirbulatov
55c600819f [SLP][AArch64] Incorrectly estimated intrinsic as a function call.
We incorrectly assume intrinsic as a function call and it prevents us from
the opportunity to vectorize. On Aarch64 Cortex-A53 we think that
llvm.fmuladd.f64 is a function call which is wrong.

Differential Revision: https://reviews.llvm.org/D140392
2023-01-03 19:45:24 +00:00
Alexey Bataev
26fec4e845 [SLP]Fix crash on casting non-instruction extractelement.
Need to check if the extractelement operation is an extraction before
trying to move it around the buildblocks to avoid crash on cast.
2023-01-03 09:45:57 -08:00
Florian Hahn
ce1be13a86
[VPlan] Use VP_CLASSOF_IMPL for VPWidenCanonicalIVRecipe(NFC).
Replace VPWidenCanonicalIVRecipe::classof implementation with general
VP_CLASSOF_IMPL.
2023-01-02 17:52:13 +00:00
Florian Hahn
64f1d845b3
[VPlan] Use VP_CLASSOF_IMPL for VPWidenMemoryInstructionRecipe (NFC).
Replace VPWidenMemoryInstructionRecipe ::classof implementation with general
VP_CLASSOF_IMPL.
2023-01-02 17:32:31 +00:00
Florian Hahn
2d6d47f807
[VPlan] Use VP_CLASSOF_IMPL for VPPredInstPHI (NFC).
Replace VPPredInstPHI::classof implementation with general
VP_CLASSOF_IMPL.
2023-01-02 17:22:34 +00:00
Florian Hahn
89718815c6
[VPlan] Adjust mergeReplicateRegions to be in line with mergeBlock (NFC)
Adjust mergeReplicateRegions to be in line with
mergeBlocksIntoPredecessors added in 36d70a6aea6b by collecting only the
valid candidates first.

Also rename to mergeReplicateRegionsIntoSuccessors and add missing
doc-comment.

This addresses post-commit suggestions by @Ayal.
2023-01-01 19:48:49 +00:00
Florian Hahn
cd16a3f04c
[VPlan] Move GraphTraits definitions to separate header (NFC).
This reduces the size of VPlan.h and avoids future growth of the file
when the graph traits are extended in future patches.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D140500
2022-12-31 15:14:57 +00:00
Florian Hahn
aa2414729e
[VPlan] Also consider operands of sink candidates in same block.
Even if the the sink candidate is already in the target block, its
operands can be candidates for sinking. Queue them up as well. Also
moves the queuing logic to a helper.
2022-12-30 18:24:35 +00:00
Alexey Bataev
5dccea5a68 [SLP]Do not emit many extractelements, reuse the single one emitted.
We do not need to emit many extractelements for each particular use, we
can reuse the only one, just need to adjust it to make it dominate on
all uses.

Differential Revision: https://reviews.llvm.org/D140580
2022-12-30 06:38:06 -08:00
Valery N Dmitriev
ad956ed568 [SLP] Fix debug print for cost in tryToVectorizeList - NFC.
Actual VF was confused with local variable named "VF".
2022-12-29 11:30:10 -08:00
Valery N Dmitriev
8eb3698b94 [SLP] A couple of minor improvements for slp graph view - NFC.
Show ScatterVectorize nodes in frames of blue color
and print vectorize tree indices.
2022-12-29 11:02:36 -08:00
Alexey Bataev
ac01ae71f0 [SLP]Use ShuffleInstructionBuilder for vector shrinking.
We can use ShuffleInstructionBuilder now for shrinking shuffle emission.
It allows to remove extra shuffle from the emitted code and reuse
original vector.

Part of D110978

Differential Revision: https://reviews.llvm.org/D140499
2022-12-28 06:09:04 -08:00
Michael Maitland
396b0b2b13 [LV] Remove duplicate name set of vector header basic block. NFC
The preheader was named explicitly in 256c6b0ba14e8a7ab6373b61b7193ea8c0a3651c
which makes setting the name in prior commit 95b2aa511eea1f31e183a2a3aed4d2aa852d089c
unnecessary.

Differential Revision: https://reviews.llvm.org/D140246
2022-12-27 17:19:08 -08:00
Florian Hahn
e91e62db14
[LV] Sink scalar operands and merge regions repeatedly.
Merging regions can enable new sinking opportunities (e.g. if users of a
scalar value are moved from different VPBBs into the same VPBB). Sinking
in turn can also enable new merging opportunities (e.g. if a recipe
between to merge-able regions is moved.

To enable more sinking opportunities, repeat sinking & merging if
regions could be merged.

Also fix mergeReplicateRegions to return the correct Changed status.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D139788
2022-12-27 18:08:32 +00:00
Alexey Bataev
a9b052e2ef [SLP]Fix PR59693: Do not crash trying to set insert point for buildvector
of extractvalues.

No need to get the last instruction only for vectorized extractvalues,
for gathered(buildvector sequence) still need to get the insertion
  point.
2022-12-27 06:01:38 -08:00
Florian Hahn
36d70a6aea
[VPlan] Remove redundant blocks by merging them into predecessors.
Add and run VPlan transform to fold blocks with a single predecessor
into the predecessor. This remove redundant blocks and addresses a TODO
to replace special handling for the vector latch VPBB.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D139927
2022-12-26 22:47:09 +00:00
Florian Hahn
435e220ba6
[VPlan] Use VPBB in sinkScalarOperands directly. (NFC)
Suggested by @Ayal in D139790.
2022-12-25 21:34:59 +00:00
Florian Hahn
9758242046
[LV] Use SCEV to check if the trip count <= VF * UF.
Just comparing constant trip counts causes LV to miss cases where the
vector loop body only executes once.

The motivation for this is to remove the need for unrolling to remove
vector loop back-edges, if the body only executes once in more cases.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D133017
2022-12-24 18:34:54 +00:00
Florian Hahn
e1650c8d52
[LV] Move exit cond simplification to separate transform.
This sets the stage for D133017 by moving out the code that performs
VPlan based simplifications to a separate transform that takes the
chosen VF & UF as arguments.

The main advantage is that this transform runs before any changes to
the CFG are being made. This allows using SCEV without worrying about
making queries while the IR is in an incomplete state.

Note that this patch switches the reasoning to use SCEV, but still only
simplifies loops with constant trip counts. Using SCEV here is needed to
access the backedge taken count, because the trip count IR value has not
been created yet.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D135017
2022-12-23 12:51:21 +00:00