3597 Commits

Author SHA1 Message Date
Sander de Smalen
9449deda12 [LV] NFC: Move logic to query maximum vscale to its own function.
To query the maximum value for vscale, the LV queries the vscale_range
attribute or a TTI hook. To avoid having to reimplement the same behaviour
for multiple uses (such as in D142894), it makes sense to move this code
to a separate function.
2023-02-23 15:12:35 +00:00
Jonas Paulsson
1387a13e1d [SLP] Check with target before vectorizing GEP Indices.
The target hook prefersVectorizedAddressing() already exists to check with
target if address computations should be vectorized, so it seems like this
should be used in SLPVectorizer as well.

Reviewed By: ABataev, RKSimon

Differential Revision: https://reviews.llvm.org/D144128
2023-02-23 15:31:34 +01:00
OCHyams
620a529760 [Assignment Tracking] Choose better passes for RemoveRedundantDbgInstrs call
Enabling assignment tracking without this patch, a significant amount of
additional compiler run time comes from the RemoveRedundantDbgInstrs call in
InstCombine. This patch reduces compiler run time by choosing better places to
call RemoveRedundantDbgInstrs.

In non-assignment-tracking builds, RemoveRedundantDbgInstrs is called by
InstCombine if LowerDbgDeclare makes a change (i.e. it is _sometimes_
called). In assignment tracking builds LowerDbgDeclare doesn't do anything. We
still need to clean up redundant intrinsics to avoid a large performance hit
due to the number of instructions, so the current approach is to have
InstCombine _always_ call RemoveRedundantDbgInstrs.

Instrumenting the compiler to run RemoveRedundantDbgInstrs after every pass and
dump the numbers and building CTMark/tramp3d-v4 indicates that SROA and
LoopVectorize give us a bigger bang (number removed) for buck (times pass is
run).

The compile time tracker reports that this patch reduces the number of
instructions retired building CTMark projects by an average of 1.1%.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D144483
2023-02-22 16:28:06 +00:00
Luke Lau
b02b1e0ed6 [LV][NFC] Use ElementCount for getMaxInterleaveFactor
In order to allow targets to disable interleaving for scalable vectors, pass the entire VF's ElementCount to getMaxInterleaveFactor.
This is based off of the approach used here: 8d36708507

The plan would then be to disable interleaving on scalable VFs on RISC-V in a follow up patch.
See https://reviews.llvm.org/D143723#4132349

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D144474
2023-02-22 10:15:05 +00:00
Liren Peng
529ee9750b [NFC] Use single quotes for single char output during printPipline
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144365
2023-02-22 02:35:13 +00:00
Alexey Bataev
cbcdd747e8 [SLP]Do not swap not counted extractelements.
No need to swap extractelements, which were not excluded from the list
during cost analysis. It leads to incorrect cost calculation and make
vector code more profitable than it is actually is.
2023-02-21 13:16:51 -08:00
Alexey Bataev
5f928a223e [SLP]Properly define incoming block for user PHI nodes.
MainOp of the PHI vectorizable entries contains the proper order of
incoming blocks, not the last instruction in the block.
2023-02-21 08:01:24 -08:00
Alexey Bataev
708eb1b96d [SLP]Add shuffling of extractelements to avoid extra costs/data movement.
If the scalar must be extracted and then used in the gather node,
instead we can emit shuffle instruction to avoid those extra
extractelements and vector-to-scalar and back data movement.

Part of D110978

Differential Revision: https://reviews.llvm.org/D141940
2023-02-20 06:14:42 -08:00
Florian Hahn
c21ccebe6f
[VPlan] Use usesScalars in shouldPack.
Suggested by @Ayal as follow-up improvement in D143864.

I was unable to find a case where this actually changes generated code,
but it is a unifying code to use common infrastructure.
2023-02-20 14:11:40 +00:00
Florian Hahn
df016a9525
[VPlan] Move shouldPack outside of DEBUG ifdef.
This fixes a build failure with assertions disabled.
2023-02-20 10:53:45 +00:00
Florian Hahn
9333b97763
[VPlan] Replace AlsoPack field with shouldPack() method (NFC).
There is no need to update the AlsoPack field when creating
VPReplicateRecipes. It can be easily computed based on the VP def-use
chains when it is needed.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D143864
2023-02-20 10:28:26 +00:00
Kazu Hirata
a28b252d85 Use APInt::getSignificantBits instead of APInt::getMinSignedBits (NFC)
Note that getMinSignedBits has been soft-deprecated in favor of
getSignificantBits.
2023-02-19 23:56:52 -08:00
Kazu Hirata
f8f3db2756 Use APInt::count{l,r}_{zero,one} (NFC) 2023-02-19 22:04:47 -08:00
Florian Hahn
f61c9b7569
[SLP] Fix infinite loop in isUndefVector.
This fixes an infinite loop if isa<T>(II->getOperand(1)) is true.
Update Base at the top of the loop, before the continue.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D144292
2023-02-19 21:42:24 +00:00
Florian Hahn
7737c05696
[VPlan] Make sure properlyDominates(A, A) returns false.
At the moment, properlyDominates(A, A) can return true via
LocalComesBefore. Add an early exit to ensure it returns false if
A == B.

Note: no test has been added because the existing test suite covers this
case already with libc++ with assertions enabled.

Fixes https://github.com/llvm/llvm-project/issues/60850.
2023-02-19 18:01:16 +00:00
Alexey Bataev
e03d254bbd [SLP]Do not reduce repeated values, use scalar red ops instead.
Metric: size..text

                                                     size..text                 results     results0    diff
SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-980605-1.test      445.00      461.00  3.6%
SingleSource/Benchmarks/Adobe-C++/loop_unroll.test                               428477.00   428445.00 -0.0%
External/SPEC/CFP2006/447.dealII/447.dealII.test                                 618849.00   618785.00 -0.0%

For all tests some extra code was optimized, GCC-C-execute has some more
inlining after

Differential Revision: https://reviews.llvm.org/D132261
2023-02-17 07:19:35 -08:00
Florian Hahn
a3d1de3e29
[LV] Move invalid cost remark code to separate function (NFC).
The code only needs access to INvalidCosts, ORE and TheLoop, so it can
easily be moved into a helper to make selectVectorizationFactor more
compact.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D143957
2023-02-16 11:28:19 +00:00
Kazu Hirata
7e6e636fb6 Use llvm::has_single_bit<uint32_t> (NFC)
This patch replaces isPowerOf2_32 with llvm::has_single_bit<uint32_t>
where the argument is wider than uint32_t.
2023-02-15 22:17:27 -08:00
Hongtao Yu
eddec9de44 [Pseudo probe] Duplicate probes in vectorized loop body.
Prevoius pseudo probes were dropped out of a vectorized loop body during loop vectorization. This can result in the samples of the loop entry is used for the loop body, which in turn can cause undercounting of the loop iteration count. The undercounting can further prevent the loop from being vectorized in the next build. I'm fixing this by explicting allowing pseudo probes to be kept in the vectorized loop body, and by claiming a probe instruction is not "uniform", the vectorizer will duplicate it by the number of vector lanes.

For one internal service, I'm seeing the change causes the size increase of the .pseudoprobe section by 0.7%, which should count around 0.2% of the whole binary size.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D144066
2023-02-15 10:18:08 -08:00
Graham Hunter
0fa5df1959 [LV] Synthesize all true masks for masked vector function variants
When vectorizing code with function calls in it, if we encounter
a function which only has vectorized variants requiring a mask
we can synthesize an all-true mask to enable us to proceed.

Since we want the mask to be represented in vplan, the pointer
to the chosen Function is now stored as part of the
VPWidenCallRecipe, and mask arguments are added at the
appropriate index to the recipe operands.

Reviewed By: david-arm, fhahn, reames

Differential Revision: https://reviews.llvm.org/D132458
2023-02-14 14:33:18 +00:00
Fangrui Song
1e6921131a Move global namespace cl::opt inside llvm:: 2023-02-14 00:09:44 -08:00
Florian Hahn
807d43239a
[VPlan] Use properlyDominates predicate for ordering FOR users.
The current implementation may return true for A < B and B < A, which
may cause issues if the sort implementation assures this property of the
comperator. This should fix a crash with MSVC.
2023-02-13 21:24:58 +00:00
Florian Hahn
af3c25dc3d
[VPlan] Fix iterator invalidation in adjustFixedOrderRecurrences.
adjustFixedOrderRecurrences may insert instructions after immediately
after the PHI nodes in the block. This invalidates the phis() iterator.
To avoid crashing/accessing invalid recipes, first collect all
first-order recurrence phi recipes.

This should fix a crash reported by @dmgreen after D142589 landed.
2023-02-13 13:51:14 +00:00
Florian Hahn
2e6430666c
[LV] Update recipe builder functions to pass VPlan directly (NFC).
Passing VPlanPtr requires a dereference of std::unique_ptr on each
access, which is unnecessary. Just pass the plan by reference.
2023-02-12 22:35:14 +00:00
Sanjay Patel
af39acda88 [VectorCombine] fix insertion point of shuffles
As shown in issue #60649, the new shuffles were
being inserted before a phi, and that is invalid.

It seems like most test coverage for this fold
(foldSelectShuffle) lives in the AArch64 dir,
but this doesn't repro there for a base target.
2023-02-10 10:57:11 -05:00
Sander de Smalen
5a115452c4 Reland D143267: [LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices.
Fixed issue where 'ConstantInt::get(IndextTy, -Part)' was executed with the wrong type for Part,
e.g. IndexTy was i64, but Part was 'unsigned', which led to things like 'mul i64 .., 4294967292',
which was obviously wrong.

Also changed sve-vector-reverse.ll to be vectorized with UF>1 to test this.

This reverts commit 1f01cdda68614dba12af3cc3aff38541d0abcc6b.
2023-02-09 09:42:29 +00:00
Florian Hahn
c83fdc905a
[LV] Perform recurrence sinking directly on VPlan.
This patch updates LV to sink recipes directly using the VPlan use
chains. The initial patch only moves sinking to be purely VPlan-based.
Follow-up patches will move legality checks to VPlan as well.

At the moment, there's a single test failure remaining.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142589
2023-02-08 15:49:29 +00:00
Sander de Smalen
1f01cdda68 Revert "[LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices."
This patch causes a regression, so reverting it while I investigate the issue.

This reverts commit e6eb84a191ca2a1afd5789c5bb398da68bb6065e.
2023-02-08 15:46:52 +00:00
Florian Hahn
32efff591a
[VPlan] Mark load VPWidenMemoryInstruction as not having side-effects.
Also add an assert using the underlying instruction to catch any
potential violations.
2023-02-07 22:02:50 +00:00
Arthur Eubanks
15977742d3 Reland [LegacyPM] Remove some legacy passes
These are part of the optimization pipeline, of which the legacy pass manager version is deprecated.

Namely
* Internalize
* StripSymbols
* StripNonDebugSymbols
* StripDeadDebugInfo
* StripDeadPrototypes
* VectorCombine
* WarnMissedTransformations

Fixed previously failing ocaml tests (one of them seems to already be failing?)
2023-02-07 12:56:05 -08:00
Arthur Eubanks
1b254022b2 Revert "[LegacyPM] Remove some legacy passes"
This reverts commit a4b4f62beb0bf40123181e5f5bdf32ef54f87166.

Ocaml bindings tests failing.
2023-02-07 10:17:45 -08:00
Arthur Eubanks
a4b4f62beb [LegacyPM] Remove some legacy passes
These are part of the optimization pipeline, of which the legacy pass manager version is deprecated.

Namely
* Internalize
* StripSymbols
* StripNonDebugSymbols
* StripDeadDebugInfo
* StripDeadPrototypes
* VectorCombine
* WarnMissedTransformations
2023-02-07 09:57:48 -08:00
Sander de Smalen
e6eb84a191 [LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices.
This is specifically relevant for loops that vectorize using a scalable VF,
where the code results in:

  %vscale = call i32 llvm.vscale.i32()
  %vf.part1 = mul i32 %vscale, 4
  %gep = getelementptr  ..., i32 %vf.part1

Which InstCombine then changes into:

  %vscale = call i32 llvm.vscale.i32()
  %vf.part1 = mul i32 %vscale, 4
  %vf.part1.zext = sext i32 %vf.part1 to i64
  %gep = getelementptr  ..., i32 %vf.part1.zext

D143016 tried to remove these extends, but that only works when
the call to llvm.vscale.i32() has a single use. After doing any
kind of CSE on these calls the combine no longer kicks in.

It seems more sensible to ask DataLayout what type to use, rather
than relying on InstCombine to insert the extend and hoping it can
fold it away.

I've only changed this for indices that are not constant, because
I vaguely remember there was a reason for sticking with i32. It
would also mean patching up loads more tests.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D143267
2023-02-07 11:47:51 +00:00
Simon Pilgrim
552e27c521 [SLP] Use allConstant helper. NFCI. 2023-02-05 19:21:48 +00:00
Sander de Smalen
005311399e [LoopVectorize][TTI] NFCI: Clarify enum for the tail folding style.
This NFC (intended) patch has several small changes:
* It renames PredicationStyle to TailFoldingStyle.
* It renames TTI.emitActiveLaneMask() to TTI.getPreferredTailFoldingStyle()
* Simplifies some of its uses in the LoopVectorizer

Rationale: To my surprise PredicationStyle::None did not mean 'no
predication', but rather 'no active lane mask intrinsic', such that the
predicate is created using a splat + compare with stepvector. The enum is
also highly specific to tail folding, so it seems better to name this
around that feature, i.e. 'tail folding style'.

This also makes it more amenable to extend it to other tail folding styles,
such as the one added in D142109.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D142887
2023-02-03 14:59:57 +00:00
Florian Hahn
cf2d436b31
[VPlan] VPPredInstPHIRecipe does not read from memory.
VPPredInstPHIRecipe just merges the incoming values and does not write
to memory.
2023-01-31 21:51:03 +00:00
Florian Hahn
5368536cf1
[VPlan] VPPredInstPHIRecipes does not write to memory.
VPPredInstPHIRecipe just merges the incoming values and does not write
to memory.
2023-01-30 10:29:27 +00:00
Kazu Hirata
526966d07d Use llvm::bit_ceil (NFC)
Note that:

  std::has_single_bit(X) ? X : llvm::NextPowerOf2(X);

is equivalent to:

  std::bit_ceil(X)

even for input 0.
2023-01-28 16:13:09 -08:00
Kazu Hirata
f20b5071f3 [llvm] Use llvm::bit_floor instead of llvm::PowerOf2Floor (NFC) 2023-01-28 09:06:31 -08:00
Florian Hahn
b6b3d20d06
[VPlan] Use VPDominatorTree in VPlanVerifier .
Use VPDominatorTree to generalize def-use verification.

Depends on D140513.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D140514
2023-01-25 16:32:40 +00:00
Florian Hahn
bf9e0da1a5
[VPlan] Switch default graph traits to be recursive, update VPDomTree.
This updates the GraphTraits specialization for VPBlockBase to recurse
through VPRegionBlocks.

This in turn enables using VPDominatorTree to query dominance between
any block in a plan. This should enable additional use cases, including
improvements to def-use verification and porting IR-based transforms
that rely on the dominator tree.

Specifically, this change means that for regions, the entry and exit
blocks dominate the successors of the region.

Depends on D140512 and D142162.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D140513
2023-01-23 14:00:43 +00:00
Florian Hahn
31d46ca8aa
[Dominators] Introduce DomTreeNodeTraits to allow customization. (NFC)
This patch introduces DomTreeNodeTraits for customization. Clients can implement
DomTreeNodeTraitsCustom to provide custom ParentPtr, getEntryNode and getParent.
There's also a default specialization if DomTreeNodeTraitsCustom is not implemented,
that assume a Function-like NodeT. This is what is used for the existing DominatorTree
and MachineDominatorTree.

The main motivation for this patch is using DominatorTreeBase across all
regions of a VPlan, see D140513.

Reviewed By: kuhar

Differential Revision: https://reviews.llvm.org/D142162
2023-01-22 20:22:41 +00:00
Florian Hahn
fb40c34b8f
[VPlan] Consider all recipes in replicate blocks as sink candidates.
Update sinkScalarOperands to consider all operands of recipes in
replicate blocks as sink candidates This enables additional sinking
opportunities and is another step towards retiring LLVM IR-based
sinkScalarOperands.

This enables iterative sinking of operands for successive calls of
sinkScalarOperands.

Depends on D139788.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D139790
2023-01-21 17:14:13 +00:00
ShihPo Hung
5fb3a57ea7 [Cost] Add CostKind to getVectorInstrCost and its related users
LoopUnroll estimates the loop size via getInstructionCost(),
but getInstructionCost() cannot pass CostKind to getVectorInstrCost().
And so does getShuffleCost() to getBroadcastShuffleOverhead(),
getPermuteShuffleOverhead(), getExtractSubvectorOverhead(),
and getInsertSubvectorOverhead().

To address this, this patch adds an argument CostKind to these
functions.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D142116
2023-01-21 05:29:24 -08:00
Alexey Bataev
9bdcf8778a [SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars.
The compiler may produce better results if it does not look for
constants, uses an extra analysis of phi nodes, looks through all tree
nodes without skipping the cases, where the very first set of nodes is
empty. Also, it tries to reshufle the nodes if it is profitable for
sure, i.e. at least 2 scalars are used for single node permutation and at
least 3 scalars are used for the permutation of 2 nodes.

Part of D110978

Differential Revision: https://reviews.llvm.org/D141512
2023-01-19 13:46:25 -08:00
Florian Hahn
e2c43a547b
[VPlan] Add vp_depth_first_deep (NFC)
Similar to vp_depth_first_shallow (D140512) add vp_depth_first_deep to
make existing code clearer and more compact.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142055
2023-01-19 20:34:23 +00:00
Florian Hahn
655c88ca36
[VPlan] Add vp_depth_first_shallow + graph traits for wrapper(NFC)
This patch adds a new VPBlockShallowTraversalWrapper struct to
provide graph traits specialization that do not traverse through
VPRegionBlocks. This matches the behavior of the existing traits for
plain VPBlockBase and is a step before moving the graph traits for
VPBlockBase to traverse through VPRegionBlocks to enable cross region
support in VPDominatorTree.

Depends on D140511.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D140512
2023-01-19 12:07:27 +00:00
Florian Hahn
feee22db52
[VPlan] Disconnect VPRegionBlock from successors in graph iterator(NFCI)
This updates the VPAllSuccessorsIterator to not connect the
VPRegionBlock itself to its successors. The successors are connected to
the exit block of the region. At the moment, this doesn't change any
exisint functionality.

But the new schema ensures the following property when used for
VPDominatorTree:

1. Entry & exit blocks of regions dominate the successors of the region.

This allows for convenient checking of dominance between defs and uses
that are not defined in the same region. I will share a follow-up patch
to use it for the VPDominatorTree soon.

Depends on D140500.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D140511
2023-01-18 15:02:41 +00:00
Florian Hahn
22c9f4cf2d
[VPlan] Replace VPInterleaveRecipe::classof with VP_CLASSOF_IMPL. (NFC) 2023-01-18 14:23:22 +00:00
Florian Hahn
f615de7e26
[VPlan] Replace VPBranchOnMaskSC::classof with VP_CLASSOF_IMPL. (NFC) 2023-01-18 12:14:58 +00:00