3584 Commits

Author SHA1 Message Date
Florian Hahn
f61c9b7569
[SLP] Fix infinite loop in isUndefVector.
This fixes an infinite loop if isa<T>(II->getOperand(1)) is true.
Update Base at the top of the loop, before the continue.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D144292
2023-02-19 21:42:24 +00:00
Florian Hahn
7737c05696
[VPlan] Make sure properlyDominates(A, A) returns false.
At the moment, properlyDominates(A, A) can return true via
LocalComesBefore. Add an early exit to ensure it returns false if
A == B.

Note: no test has been added because the existing test suite covers this
case already with libc++ with assertions enabled.

Fixes https://github.com/llvm/llvm-project/issues/60850.
2023-02-19 18:01:16 +00:00
Alexey Bataev
e03d254bbd [SLP]Do not reduce repeated values, use scalar red ops instead.
Metric: size..text

                                                     size..text                 results     results0    diff
SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-980605-1.test      445.00      461.00  3.6%
SingleSource/Benchmarks/Adobe-C++/loop_unroll.test                               428477.00   428445.00 -0.0%
External/SPEC/CFP2006/447.dealII/447.dealII.test                                 618849.00   618785.00 -0.0%

For all tests some extra code was optimized, GCC-C-execute has some more
inlining after

Differential Revision: https://reviews.llvm.org/D132261
2023-02-17 07:19:35 -08:00
Florian Hahn
a3d1de3e29
[LV] Move invalid cost remark code to separate function (NFC).
The code only needs access to INvalidCosts, ORE and TheLoop, so it can
easily be moved into a helper to make selectVectorizationFactor more
compact.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D143957
2023-02-16 11:28:19 +00:00
Kazu Hirata
7e6e636fb6 Use llvm::has_single_bit<uint32_t> (NFC)
This patch replaces isPowerOf2_32 with llvm::has_single_bit<uint32_t>
where the argument is wider than uint32_t.
2023-02-15 22:17:27 -08:00
Hongtao Yu
eddec9de44 [Pseudo probe] Duplicate probes in vectorized loop body.
Prevoius pseudo probes were dropped out of a vectorized loop body during loop vectorization. This can result in the samples of the loop entry is used for the loop body, which in turn can cause undercounting of the loop iteration count. The undercounting can further prevent the loop from being vectorized in the next build. I'm fixing this by explicting allowing pseudo probes to be kept in the vectorized loop body, and by claiming a probe instruction is not "uniform", the vectorizer will duplicate it by the number of vector lanes.

For one internal service, I'm seeing the change causes the size increase of the .pseudoprobe section by 0.7%, which should count around 0.2% of the whole binary size.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D144066
2023-02-15 10:18:08 -08:00
Graham Hunter
0fa5df1959 [LV] Synthesize all true masks for masked vector function variants
When vectorizing code with function calls in it, if we encounter
a function which only has vectorized variants requiring a mask
we can synthesize an all-true mask to enable us to proceed.

Since we want the mask to be represented in vplan, the pointer
to the chosen Function is now stored as part of the
VPWidenCallRecipe, and mask arguments are added at the
appropriate index to the recipe operands.

Reviewed By: david-arm, fhahn, reames

Differential Revision: https://reviews.llvm.org/D132458
2023-02-14 14:33:18 +00:00
Fangrui Song
1e6921131a Move global namespace cl::opt inside llvm:: 2023-02-14 00:09:44 -08:00
Florian Hahn
807d43239a
[VPlan] Use properlyDominates predicate for ordering FOR users.
The current implementation may return true for A < B and B < A, which
may cause issues if the sort implementation assures this property of the
comperator. This should fix a crash with MSVC.
2023-02-13 21:24:58 +00:00
Florian Hahn
af3c25dc3d
[VPlan] Fix iterator invalidation in adjustFixedOrderRecurrences.
adjustFixedOrderRecurrences may insert instructions after immediately
after the PHI nodes in the block. This invalidates the phis() iterator.
To avoid crashing/accessing invalid recipes, first collect all
first-order recurrence phi recipes.

This should fix a crash reported by @dmgreen after D142589 landed.
2023-02-13 13:51:14 +00:00
Florian Hahn
2e6430666c
[LV] Update recipe builder functions to pass VPlan directly (NFC).
Passing VPlanPtr requires a dereference of std::unique_ptr on each
access, which is unnecessary. Just pass the plan by reference.
2023-02-12 22:35:14 +00:00
Sanjay Patel
af39acda88 [VectorCombine] fix insertion point of shuffles
As shown in issue #60649, the new shuffles were
being inserted before a phi, and that is invalid.

It seems like most test coverage for this fold
(foldSelectShuffle) lives in the AArch64 dir,
but this doesn't repro there for a base target.
2023-02-10 10:57:11 -05:00
Sander de Smalen
5a115452c4 Reland D143267: [LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices.
Fixed issue where 'ConstantInt::get(IndextTy, -Part)' was executed with the wrong type for Part,
e.g. IndexTy was i64, but Part was 'unsigned', which led to things like 'mul i64 .., 4294967292',
which was obviously wrong.

Also changed sve-vector-reverse.ll to be vectorized with UF>1 to test this.

This reverts commit 1f01cdda68614dba12af3cc3aff38541d0abcc6b.
2023-02-09 09:42:29 +00:00
Florian Hahn
c83fdc905a
[LV] Perform recurrence sinking directly on VPlan.
This patch updates LV to sink recipes directly using the VPlan use
chains. The initial patch only moves sinking to be purely VPlan-based.
Follow-up patches will move legality checks to VPlan as well.

At the moment, there's a single test failure remaining.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142589
2023-02-08 15:49:29 +00:00
Sander de Smalen
1f01cdda68 Revert "[LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices."
This patch causes a regression, so reverting it while I investigate the issue.

This reverts commit e6eb84a191ca2a1afd5789c5bb398da68bb6065e.
2023-02-08 15:46:52 +00:00
Florian Hahn
32efff591a
[VPlan] Mark load VPWidenMemoryInstruction as not having side-effects.
Also add an assert using the underlying instruction to catch any
potential violations.
2023-02-07 22:02:50 +00:00
Arthur Eubanks
15977742d3 Reland [LegacyPM] Remove some legacy passes
These are part of the optimization pipeline, of which the legacy pass manager version is deprecated.

Namely
* Internalize
* StripSymbols
* StripNonDebugSymbols
* StripDeadDebugInfo
* StripDeadPrototypes
* VectorCombine
* WarnMissedTransformations

Fixed previously failing ocaml tests (one of them seems to already be failing?)
2023-02-07 12:56:05 -08:00
Arthur Eubanks
1b254022b2 Revert "[LegacyPM] Remove some legacy passes"
This reverts commit a4b4f62beb0bf40123181e5f5bdf32ef54f87166.

Ocaml bindings tests failing.
2023-02-07 10:17:45 -08:00
Arthur Eubanks
a4b4f62beb [LegacyPM] Remove some legacy passes
These are part of the optimization pipeline, of which the legacy pass manager version is deprecated.

Namely
* Internalize
* StripSymbols
* StripNonDebugSymbols
* StripDeadDebugInfo
* StripDeadPrototypes
* VectorCombine
* WarnMissedTransformations
2023-02-07 09:57:48 -08:00
Sander de Smalen
e6eb84a191 [LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices.
This is specifically relevant for loops that vectorize using a scalable VF,
where the code results in:

  %vscale = call i32 llvm.vscale.i32()
  %vf.part1 = mul i32 %vscale, 4
  %gep = getelementptr  ..., i32 %vf.part1

Which InstCombine then changes into:

  %vscale = call i32 llvm.vscale.i32()
  %vf.part1 = mul i32 %vscale, 4
  %vf.part1.zext = sext i32 %vf.part1 to i64
  %gep = getelementptr  ..., i32 %vf.part1.zext

D143016 tried to remove these extends, but that only works when
the call to llvm.vscale.i32() has a single use. After doing any
kind of CSE on these calls the combine no longer kicks in.

It seems more sensible to ask DataLayout what type to use, rather
than relying on InstCombine to insert the extend and hoping it can
fold it away.

I've only changed this for indices that are not constant, because
I vaguely remember there was a reason for sticking with i32. It
would also mean patching up loads more tests.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D143267
2023-02-07 11:47:51 +00:00
Simon Pilgrim
552e27c521 [SLP] Use allConstant helper. NFCI. 2023-02-05 19:21:48 +00:00
Sander de Smalen
005311399e [LoopVectorize][TTI] NFCI: Clarify enum for the tail folding style.
This NFC (intended) patch has several small changes:
* It renames PredicationStyle to TailFoldingStyle.
* It renames TTI.emitActiveLaneMask() to TTI.getPreferredTailFoldingStyle()
* Simplifies some of its uses in the LoopVectorizer

Rationale: To my surprise PredicationStyle::None did not mean 'no
predication', but rather 'no active lane mask intrinsic', such that the
predicate is created using a splat + compare with stepvector. The enum is
also highly specific to tail folding, so it seems better to name this
around that feature, i.e. 'tail folding style'.

This also makes it more amenable to extend it to other tail folding styles,
such as the one added in D142109.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D142887
2023-02-03 14:59:57 +00:00
Florian Hahn
cf2d436b31
[VPlan] VPPredInstPHIRecipe does not read from memory.
VPPredInstPHIRecipe just merges the incoming values and does not write
to memory.
2023-01-31 21:51:03 +00:00
Florian Hahn
5368536cf1
[VPlan] VPPredInstPHIRecipes does not write to memory.
VPPredInstPHIRecipe just merges the incoming values and does not write
to memory.
2023-01-30 10:29:27 +00:00
Kazu Hirata
526966d07d Use llvm::bit_ceil (NFC)
Note that:

  std::has_single_bit(X) ? X : llvm::NextPowerOf2(X);

is equivalent to:

  std::bit_ceil(X)

even for input 0.
2023-01-28 16:13:09 -08:00
Kazu Hirata
f20b5071f3 [llvm] Use llvm::bit_floor instead of llvm::PowerOf2Floor (NFC) 2023-01-28 09:06:31 -08:00
Florian Hahn
b6b3d20d06
[VPlan] Use VPDominatorTree in VPlanVerifier .
Use VPDominatorTree to generalize def-use verification.

Depends on D140513.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D140514
2023-01-25 16:32:40 +00:00
Florian Hahn
bf9e0da1a5
[VPlan] Switch default graph traits to be recursive, update VPDomTree.
This updates the GraphTraits specialization for VPBlockBase to recurse
through VPRegionBlocks.

This in turn enables using VPDominatorTree to query dominance between
any block in a plan. This should enable additional use cases, including
improvements to def-use verification and porting IR-based transforms
that rely on the dominator tree.

Specifically, this change means that for regions, the entry and exit
blocks dominate the successors of the region.

Depends on D140512 and D142162.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D140513
2023-01-23 14:00:43 +00:00
Florian Hahn
31d46ca8aa
[Dominators] Introduce DomTreeNodeTraits to allow customization. (NFC)
This patch introduces DomTreeNodeTraits for customization. Clients can implement
DomTreeNodeTraitsCustom to provide custom ParentPtr, getEntryNode and getParent.
There's also a default specialization if DomTreeNodeTraitsCustom is not implemented,
that assume a Function-like NodeT. This is what is used for the existing DominatorTree
and MachineDominatorTree.

The main motivation for this patch is using DominatorTreeBase across all
regions of a VPlan, see D140513.

Reviewed By: kuhar

Differential Revision: https://reviews.llvm.org/D142162
2023-01-22 20:22:41 +00:00
Florian Hahn
fb40c34b8f
[VPlan] Consider all recipes in replicate blocks as sink candidates.
Update sinkScalarOperands to consider all operands of recipes in
replicate blocks as sink candidates This enables additional sinking
opportunities and is another step towards retiring LLVM IR-based
sinkScalarOperands.

This enables iterative sinking of operands for successive calls of
sinkScalarOperands.

Depends on D139788.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D139790
2023-01-21 17:14:13 +00:00
ShihPo Hung
5fb3a57ea7 [Cost] Add CostKind to getVectorInstrCost and its related users
LoopUnroll estimates the loop size via getInstructionCost(),
but getInstructionCost() cannot pass CostKind to getVectorInstrCost().
And so does getShuffleCost() to getBroadcastShuffleOverhead(),
getPermuteShuffleOverhead(), getExtractSubvectorOverhead(),
and getInsertSubvectorOverhead().

To address this, this patch adds an argument CostKind to these
functions.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D142116
2023-01-21 05:29:24 -08:00
Alexey Bataev
9bdcf8778a [SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars.
The compiler may produce better results if it does not look for
constants, uses an extra analysis of phi nodes, looks through all tree
nodes without skipping the cases, where the very first set of nodes is
empty. Also, it tries to reshufle the nodes if it is profitable for
sure, i.e. at least 2 scalars are used for single node permutation and at
least 3 scalars are used for the permutation of 2 nodes.

Part of D110978

Differential Revision: https://reviews.llvm.org/D141512
2023-01-19 13:46:25 -08:00
Florian Hahn
e2c43a547b
[VPlan] Add vp_depth_first_deep (NFC)
Similar to vp_depth_first_shallow (D140512) add vp_depth_first_deep to
make existing code clearer and more compact.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142055
2023-01-19 20:34:23 +00:00
Florian Hahn
655c88ca36
[VPlan] Add vp_depth_first_shallow + graph traits for wrapper(NFC)
This patch adds a new VPBlockShallowTraversalWrapper struct to
provide graph traits specialization that do not traverse through
VPRegionBlocks. This matches the behavior of the existing traits for
plain VPBlockBase and is a step before moving the graph traits for
VPBlockBase to traverse through VPRegionBlocks to enable cross region
support in VPDominatorTree.

Depends on D140511.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D140512
2023-01-19 12:07:27 +00:00
Florian Hahn
feee22db52
[VPlan] Disconnect VPRegionBlock from successors in graph iterator(NFCI)
This updates the VPAllSuccessorsIterator to not connect the
VPRegionBlock itself to its successors. The successors are connected to
the exit block of the region. At the moment, this doesn't change any
exisint functionality.

But the new schema ensures the following property when used for
VPDominatorTree:

1. Entry & exit blocks of regions dominate the successors of the region.

This allows for convenient checking of dominance between defs and uses
that are not defined in the same region. I will share a follow-up patch
to use it for the VPDominatorTree soon.

Depends on D140500.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D140511
2023-01-18 15:02:41 +00:00
Florian Hahn
22c9f4cf2d
[VPlan] Replace VPInterleaveRecipe::classof with VP_CLASSOF_IMPL. (NFC) 2023-01-18 14:23:22 +00:00
Florian Hahn
f615de7e26
[VPlan] Replace VPBranchOnMaskSC::classof with VP_CLASSOF_IMPL. (NFC) 2023-01-18 12:14:58 +00:00
Florian Hahn
cdd8fcdbd7
[VPlan] Replace VPExpandSCEVRecipe::classof with VP_CLASSOF_IMPL. (NFC) 2023-01-17 21:11:33 +00:00
Florian Hahn
bf1ba6bb52
[VPlan] Replace VPScalarIVStepsRecipe::classof with VP_CLASSOF_IMPL(NFC) 2023-01-17 20:53:14 +00:00
Florian Hahn
d47bdae28e
[VPlan] Remove duplicated VPValue IDs (NFCI).
At the moment, both VPValue and VPDef have an ID used when casting via
classof. This duplication is cumbersome, because it requires adding IDs
for new recipes twice and also requires setting them twice. In a few
cases, there's only a VPDef ID and no VPValue ID, which can cause same
confusion.

To simplify things, remove the VPValue IDs for different recipes.
Instead, only retain the generic VPValue ID (= used VPValues without a
corresponding defining recipe) and VPVRecipe for VPValues that are
defined by recipes that inherit from VPValue.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D140848
2023-01-17 15:11:38 +00:00
Florian Hahn
c95138392a
[VPlan] Remove unnecessary getNumSuccessors call (NFC).
If ParentWithSuccs is nullptr, the number of successors is guaranteed to
be 0. Simplify the code as suggested by @Ayal in D140511.
2023-01-17 11:44:50 +00:00
Florian Hahn
133f017479
[VPlan] Remove unneeded VPUser::classof(const VPDef *) (NFC).
This specialization is not needed any longer as VPRecipeBase inherits
from VPUser and getDefiningRecipe returns a VPRecipeBase.
2023-01-17 09:08:33 +00:00
Joe Loser
a288d7f937 [llvm][ADT] Replace uses of makeMutableArrayRef with deduction guides
Similar to how `makeArrayRef` is deprecated in favor of deduction guides, do the
same for `makeMutableArrayRef`.

Once all of the places in-tree are using the deduction guides for
`MutableArrayRef`, we can mark `makeMutableArrayRef` as deprecated.

Differential Revision: https://reviews.llvm.org/D141814
2023-01-16 14:49:37 -07:00
Florian Hahn
56ffd39c3d
[VPlan] Use VPDef prefix for VPDef IDs instead of VPRecipeBase (NFC).
Various places in the code where still using the VPRecipeBase:: prefix
for VPDef IDs or not prefix at all. Now that the VPDef IDs have been
moved to VPDef, use this prefix instead and consistently use it.
2023-01-16 10:23:52 +00:00
Florian Hahn
2b054d5dd4
[VPlan] Use to_vector when iterating over a temporary vector. (NFC) 2023-01-13 17:51:00 +00:00
Guillaume Chatelet
8fd5558b29 [NFC] Use TypeSize::geFixedValue() instead of TypeSize::getFixedSize()
This change is one of a series to implement the discussion from
https://reviews.llvm.org/D141134.
2023-01-11 16:49:38 +00:00
Guillaume Chatelet
48f5d77eee [NFC] Use TypeSize::getKnownMinValue() instead of TypeSize::getKnownMinSize()
This change is one of a series to implement the discussion from
https://reviews.llvm.org/D141134.
2023-01-11 16:36:39 +00:00
Miguel Saldivar
3c5e0d87f8
[LoopVectorize] Clear cache of LoopAccessInfoManager
LAI is cached during the LoopDistribute pass, and is later re-used during LoopVectorize. The problem is that LoopVectorize changes SCEV, and the cached LAI does not get updated. Hence, when re-using the cached LAI, it references an invalid SCEV.

Fixes #59319

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D139601
2023-01-11 09:03:40 +00:00
Valery N Dmitriev
fd7273359a [SLP] Do not ignore ordering for root node when it has in-tree uses.
When rooted with PHIs, a vectorization tree may have another node with PHIs
which have roots as their operands. We cannot ignore ordering information
for root in such a case.

Differential Revision: https://reviews.llvm.org/D141309
2023-01-10 10:12:51 -08:00
Alexey Bataev
755282ec1e [SLP][NFC]Move getExtractIndex function for future changes, NFC. 2023-01-09 09:53:01 -08:00