1650 Commits

Author SHA1 Message Date
Florian Hahn
b0da3c6fa4
[VPlan] Move setDebugLocFromInst to VPTransformState (NFC).
The moved helpers are only used for codegen. It will allow moving the
remaining ::execute implementations out of LoopVectorize.cpp.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D128657
2022-07-02 15:18:17 +01:00
Florian Hahn
0dddf04cab
[LV] Don't optimize exit cond during epilogue vectorization.
At the moment, the same VPlan can be used for code generation of both the
main vector and epilogue vector loop. This can lead to wrong results, if
the plan is optimized based on the VF of the main vector loop and then
re-used for the epilogue loop.

One example where this is problematic is if the scalar loops need to
execute at least one iteration, e.g. due to interleave groups.

To prevent mis-compiles in the short-term, disable optimizing exit
conditions for VPlans when using epilogue vectorization. The proper fix
is to avoid re-using the same plan for both loops, which will require
support for cloning plans first.

Fixes #56319.
2022-07-01 13:48:38 +01:00
Florian Hahn
583abd0e36
[VPlan] Move addMetadata to VPTransformState (NFC).
The moved helpers are only used for codegen. It will allow moving the
remaining ::execute implementations out of LoopVectorize.cpp.

Depends on D127966.
Depends on D127965.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D127968
2022-07-01 12:03:25 +01:00
Florian Hahn
68884dde70
[LV] Move LoopVersioning creation to LVP::execute.
At the moment LoopVersioning is only created for inner-loop
vectorization. This patch moves it to LVP::execute, which means it will
also be added for epilogue vectorization. As a consequence, the proper
noalias metadata is now also added to epilogue vector loops.

LVer will be moved to VPTransformState as follow-up.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D127966
2022-06-30 12:14:32 +01:00
Philip Reames
20dd3297b1 [LV] Allow scalable vectorization with vscale = 1
This change is a bit subtle. If we have a type like <vscale x 1 x i64>, the vectorizer will currently reject vectorization. The reason is that a type like <1 x i64> is likely to get simply rescalarized, and the vectorizer doesn't want to be in the game of simple unrolling.

(I've given the example in terms of 1 x types which use a single register, but the same issue exists for any N x types which use N registers. e.g. RISCV LMULs.)

This change distinguishes scalable types from fixed types under the reasoning that converting to a scalable type isn't unrolling. Because the actual vscale isn't known until runtime, using a vscale type is potentially very profitable.
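
As a rough illustration of why a scalable type is not the same as unrolling, the sketch below computes the run-time lane count of a `<vscale x MinElts x ty>` type. The helper name `runtimeLanes` is hypothetical and not part of LLVM; it only restates the definition of a scalable vector type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): number of elements held by a
// <vscale x MinElts x ty> value once vscale is known at run time.
std::uint64_t runtimeLanes(std::uint64_t VScale, std::uint64_t MinElts) {
  assert(VScale >= 1 && "vscale is a positive integer");
  return VScale * MinElts;
}
```

With `MinElts == 1`, the type degenerates to a single lane only when vscale happens to be 1; on hardware with a larger vscale the same code processes several lanes per iteration.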

This makes an important, but unchecked, assumption. Specifically, the scalable type is assumed to only be legal per the cost model if there's actually a scalable register class which is distinct from the scalar domain. This is, to my knowledge, true for all targets which return non-invalid costs for scalable vector ops today, but in theory, we could have a target decide to lower scalable to fixed length vector or even scalar registers. If that ever happens, we'd need to revisit this code.

In practice, this patch unblocks scalable vectorization for ELEN types on RISCV.

Let me sketch one alternate implementation I considered. We could have restricted this to when we know a minimum value for vscale. Specifically, for the default +v extension for RISCV, we actually know that vscale >= 2 for ELEN types. However, doing it this way means we can't generate scalable vectors when using the various embedded vector extensions which have a minimum vscale of 1.

Differential Revision: https://reviews.llvm.org/D128542
2022-06-27 13:38:57 -07:00
Kazu Hirata
a7938c74f1 [llvm] Don't use Optional::hasValue (NFC)
This patch replaces Optional::hasValue with the implicit cast to bool
in conditionals only.
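
A minimal sketch of the pattern being replaced, assuming `std::optional` as a stand-in for `llvm::Optional` so that it compiles without LLVM headers. The helper `valueOrDefault` is hypothetical.

```cpp
#include <cassert>
#include <optional>

// std::optional stands in for llvm::Optional here (an assumption made for
// self-containment). Hypothetical helper: returns the held value or a default.
int valueOrDefault(const std::optional<int> &MaybeVF, int Default) {
  // Before the cleanup this would read: if (MaybeVF.has_value())
  // After: rely on the contextual conversion to bool in the conditional.
  if (MaybeVF)
    return *MaybeVF;
  return Default;
}
```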
2022-06-25 21:42:52 -07:00
Kazu Hirata
3b7c3a654c Revert "Don't use Optional::hasValue (NFC)"
This reverts commit aa8feeefd3ac6c78ee8f67bf033976fc7d68bc6d.
2022-06-25 11:56:50 -07:00
Kazu Hirata
aa8feeefd3 Don't use Optional::hasValue (NFC) 2022-06-25 11:55:57 -07:00
Florian Hahn
cb69ba4faa
[LV] Create RT checks once VF/IC are selected, track scalar cost.
This patch updates LV to generate runtime checks after the VF & IC are selected. It
allows deciding whether to vectorize with runtime checks or not based on
their cost compared to the vector loop.

It also updates VectorizationFactor to include the scalar cost.

Reviewed By: lebedev.ri, dmgreen

Differential Revision: https://reviews.llvm.org/D75981
2022-06-24 17:42:11 +02:00
Florian Hahn
b18141a8f2
[VPlan] Set VFs included in plan before last set of VPTransforms (NFC).
This allows VPlanTransforms to query the VFs included in the plan in the
future.
2022-06-24 10:16:56 +02:00
Philip Reames
46ea4b5ea1 [LV] Avoid a crash when costing a uniform store which doesn't correspond to a legal scatter
If we have an unaligned uniform store, then when costing a scalable VF we can't emit code to scalarize it.  (Well, we could, but we haven't implemented that case.)  This change replaces an assert with a cost-model bailout such that we reject vectorization with the scalable VF instead of crashing.
2022-06-23 12:41:09 -07:00
Serguei Katkov
8f891b7c39 [LoopVectorize] Uninitialized phi node leads to a crash in SSAUpdater.
createInductionResumeValues creates a phi node placeholder
without filling incoming values. Then it generates the incoming values.

Generating them triggers the SCEV expander, which may invoke SSAUpdater.
SSAUpdater has an optimization that detects the number of predecessors
from the incoming values of an existing phi node.
If the phi node is not yet filled with incoming values, the number of predecessors
is detected as 0, which leads to a segmentation fault.

In other words SSAUpdater expects that phi is in good shape while
LoopVectorizer breaks this requirement.

The fix is to prepare all incoming values first and then build the phi node.
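
The ordering issue can be sketched with a toy model. `ToyPhi`, `numPredsFromPhi` and `buildResumePhi` are hypothetical stand-ins, not LLVM classes or functions; the incoming values are placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for a phi node: (value, predecessor block id) pairs.
// This is not LLVM's PHINode.
struct ToyPhi {
  std::vector<std::pair<int, int>> Incoming;
};

// Toy analogue of the shortcut described above: infer the number of
// predecessors from the phi's incoming values. An unfilled phi yields 0.
std::size_t numPredsFromPhi(const ToyPhi &Phi) { return Phi.Incoming.size(); }

// Fixed ordering: compute every incoming value first, then build the phi in
// one step, so no observer ever sees a half-built node.
ToyPhi buildResumePhi(const std::vector<int> &Preds) {
  std::vector<std::pair<int, int>> Values;
  for (int Pred : Preds)
    Values.emplace_back(/*value=*/Pred * 10, /*block=*/Pred); // placeholder value
  return ToyPhi{std::move(Values)};
}
```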

Reviewed By: fhahn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D128033
2022-06-22 10:49:27 +07:00
Florian Hahn
88ce403c6a
[LV] Add new block to place recurrence splice, if needed.
In some cases, a recurrence splice instruction needs to be inserted
between two regions, for example if the regions get re-arranged during
sinking.

Fixes #56146.
2022-06-21 21:54:37 +02:00
Kazu Hirata
7a47ee51a1 [llvm] Don't use Optional::getValue (NFC) 2022-06-20 22:45:45 -07:00
Florian Hahn
949c13649c
[LV] Remove widenPHIInstruction dependence on underlying instr (NFC).
Instead of using the underlying instruction and VF to get the type, use
the type of the incoming value. This removes an unnecessary dependence
on the underlying instruction and enables using the recipe without an
underlying instruction.
2022-06-16 16:03:01 +02:00
Florian Hahn
5ff5b460d9
[LV] Remove unneeded CustomBuilder arg from setDebugLocFromInst (NFC).
The only user that passed in a custom builder was passing in
VPTransformState::Builder, which is the same as ILV::Builder.
2022-06-15 18:48:02 +01:00
Florian Hahn
9129e7bb54
[LV] Replace OrigPHIsToFix in native with VPlan traversal. (NFC)
OrigPHIsToFix is only used in the native path. Collecting phis can be
replaced by iterating over the plan. This also removes another
unnecessary use of a late getVPValue.

This also reduces the coupling between ILV and the VPlan utilities.
2022-06-13 22:20:58 +01:00
Hubert Tong
5efb380c26 [NFC] Undo AIX build compiler workaround
Removes the workaround from https://reviews.llvm.org/D98509#2732628 for
an AIX build compiler issue.

The AIX build compiler product that caused the issue has since been
fixed. Also, the AIX build compiler has been changed to one based on
LLVM.
2022-06-13 17:00:33 -04:00
Florian Hahn
763f2bdba5
[VPlan] Remove dead OrigLoop argument from removeDeadRecipes (NFC).
The use of the argument was removed a while ago. Remove the dead
argument.
2022-06-11 23:36:47 +01:00
Florian Hahn
85983ca42e
[VPlan] Replace remaining use of needsScalarIV.
All information is already available in VPlan. Note that there are some
test changes, because we now can correctly look through instructions
like truncates to analyze the actual users.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D123541
2022-06-09 12:05:37 +01:00
David Sherwood
997ecb0036 [LoopVectorize] Add FastMathFlags to the select used for reductions with tail-folding
Based on reviewer comments on https://reviews.llvm.org/D126692 I've
added FastMathFlags to the select instruction used when tail-folding
with reductions. These flags can then be used by InstCombine to
decide upon the most optimal floating point identity value for
fadd/fsub. Doing so unlocks further optimisations, such as folding
selects into masked loads.

Differential Revision: https://reviews.llvm.org/D126778
2022-06-07 10:21:31 +01:00
Florian Hahn
a5bb4a3b4d
[VPlan] Replace CondBit with BranchOnCond VPInstruction.
This patch removes CondBit and Predicate from VPBasicBlock. To do so,
the patch introduces a new branch-on-cond VPInstruction opcode to model
a branch on a condition explicitly.

This addresses a long-standing TODO/FIXME that blocks shouldn't be users
of VPValues. Those extra users can cause issues for VPValue-based
analyses that don't expect blocks. Addressing this fixme should allow us
to re-introduce 266ea446ab7476.

The generic branch opcode can also be used in follow-up patches.

Depends on D123005.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D126618
2022-06-03 11:48:31 +01:00
Florian Hahn
08482830eb
[LV] Update var name to Exiting, in line with terminology (NFC)
Recently the terminology used has been changed from Exit->Exiting in
line with common LLVM loop terminology. Update a remaining use of the
old terminology.
2022-06-01 22:13:29 +01:00
Florian Hahn
05776122b6
[VPlan] Use region for each loop in native path.
This patch updates the VPlan native path to use VPRegionBlocks for all
loops in a loop nest. Up to now, only the outermost loop used a region.

This is a step towards unifying both paths and keeping things consistent
between them. It also prepares various code-gen parts for modeling the
pre-header in the inner loop vectorizer (D121624).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D123005
2022-06-01 10:41:05 +01:00
Florian Hahn
d157019482
[VPlan] Remove unused native utilities incompatible with nested regions.
The implementations of VPlanDominatorTree, VPlanLoopInfo and VPlanPredicator
are all incompatible with modeling loops in VPlans as regions without
explicit back-edges.

Those pieces are not actively used and only exercised by a few gtest
unit tests. They are at the moment blocking progress towards unifying
the native and inner-loop vectorizer paths in D121624 and D123005.

I think we should not block forward progress on unused pieces of code,
so this patch removes the utilities for now. The plan is to re-introduce
them as needed in a way that is compatible with the unified VPlan scheme
used in both the inner loop vectorizer and the native path.

Reviewed By: sguggill

Differential Revision: https://reviews.llvm.org/D123017
2022-06-01 09:32:59 +01:00
Mel Chen
b0fc765350 [NFC] Change LoopVectorizationCostModel::useOrderedReductions() to be a const function.
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D126200
2022-05-31 05:39:13 -07:00
Florian Hahn
6abce17fc2
[VPlan] Use Exiting-block instead of Exit-block terminology (NFC).
In LLVM's common loop terminology, an exit block is a block outside a
loop with a predecessor inside the loop. An exiting block is a block
inside the loop which branches to an exit block outside the loop.

This patch updates a few places where VPlan was using ExitBlock for a
block exiting a region. Those instances have been updated to use
ExitingBlock.
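
The distinction can be sketched on a toy CFG. The `Block` and `CFG` aliases and both helper functions are hypothetical illustrations, not LLVM's LoopInfo API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

using Block = int;                               // toy block id
using CFG = std::map<Block, std::vector<Block>>; // block -> successors

// Exiting blocks: inside the loop, with at least one successor outside it.
std::set<Block> exitingBlocks(const CFG &G, const std::set<Block> &Loop) {
  std::set<Block> Result;
  for (Block B : Loop) {
    auto It = G.find(B);
    if (It == G.end())
      continue;
    for (Block S : It->second)
      if (!Loop.count(S))
        Result.insert(B);
  }
  return Result;
}

// Exit blocks: outside the loop, with a predecessor inside it.
std::set<Block> exitBlocks(const CFG &G, const std::set<Block> &Loop) {
  std::set<Block> Result;
  for (Block B : Loop) {
    auto It = G.find(B);
    if (It == G.end())
      continue;
    for (Block S : It->second)
      if (!Loop.count(S))
        Result.insert(S);
  }
  return Result;
}
```

For a CFG `0 -> 1 -> 2 -> {1, 3}` with loop `{1, 2}`, block 2 is the exiting block and block 3 is the exit block.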

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D126173
2022-05-28 21:16:05 +01:00
Florian Hahn
390c0ac28d
[LV] Fix indentation in tryToCreateWidenRecipe (NFC). 2022-05-26 08:53:34 +01:00
David Sherwood
87936c7b13 [LoopVectorize] Fix assertion failure in fixReduction when tail-folding
When compiling the attached new test in scalable-reductions-tf.ll we
were hitting this assertion in fixReduction:

  Assertion `isa<PHINode>(U) && "Reduction exit must feed Phi's or select"

The loop contains a reduction and an intermediate store of the reduction
value. When vectorising with tail-folding the contents of 'U' in the
assertion above happened to be a scatter_store. It turns out that we
were still creating a widen recipe for the invariant store, despite
knowing that we can actually sink it. The simplest fix is to change
buildVPlanWithVPRecipes so that we look for invariant stores before
attempting to widen it.

Differential Revision: https://reviews.llvm.org/D126295
2022-05-25 11:46:32 +01:00
Florian Hahn
c6e45ea074
[VPlan] Exit earlier when trying to widen with scalar VFs.
This simplifies the code a bit, suggested in D124718.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D125029
2022-05-25 11:05:23 +01:00
Jingu Kang
bb82f74612 Revert "Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth""
This reverts commit 42ebfa8269470e6b1fe2de996d3f1db6d142e16a.

The commit from https://reviews.llvm.org/D125918 has fixed the stage 2 build
failure.

Differential Revision: https://reviews.llvm.org/D118979
2022-05-23 16:15:45 +01:00
Peter Waller
ade47bdc31 [LV] Improve register pressure estimate at high VFs
Previously, `getRegUsageForType` was implemented using
`getTypeLegalizationCost`.  `getRegUsageForType` is used by the loop
vectorizer to estimate the register pressure caused by using a vector
type.  However, `getTypeLegalizationCost` currently only appears to
understand splitting and not scalarization, so significantly
underestimates the register requirements.

Instead, use `getNumRegisters`, which understands when scalarization
can occur (via computeRegisterProperties).

This was discovered while investigating D118979 (Set maximum VF with
shouldMaximizeVectorBandwidth), where under fixed-length 512-bit SVE the
loop vectorizer previously ended up costing a v128i1 as 2 v64i*
registers where it actually occupies 128 i32 registers.

I'm sending this patch early for comment, I'm still doing some sanity checking
with LNT.  I note that getRegisterClassForType appears to return VectorRC even
though the types in question (large vNi1 types) end up occupying scalar
registers. That might be worth fixing too.

Differential Revision: https://reviews.llvm.org/D125918
2022-05-23 07:57:45 +00:00
Florian Hahn
145fe57106
[LV] Use exiting block instead of latch in addUsersInExitBlock.
The latch may not be the exiting block. Use the exiting block instead
when looking up the incoming value of the LCSSA phi node. This fixes a
crash with early-exit loops.
2022-05-22 18:27:41 +01:00
Florian Hahn
97590baead
[LV] Widen ptr-inductions with scalar uses for scalable VFs.
Current codegen only supports scalarization of pointer inductions for
scalable VFs if they are uniform. After 3bebec659 we now may enter the
scalarization code path in VPWidenPointerInductionRecipe::execute for
scalable vectors.

Fall back to widening for scalable vectors if necessary.

This should fix a build failure when bootstrapping LLVM with SVE, e.g.
https://lab.llvm.org/buildbot/#/builders/176/builds/1723
2022-05-22 16:24:13 +01:00
Florian Hahn
3bebec6592
[VPlan] Model first exit values using VPLiveOut.
This patch introduces a new VPLiveOut subclass of VPUser to model
exit values explicitly. The initial version handles exit values that
are neither part of induction or reduction chains nor first order
recurrence phis.

Fixes #51366, #54867, #55167, #55459

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D123537
2022-05-21 16:01:38 +01:00
Florian Hahn
cd61d4bd2f
[LV] Do not LoopSimplify/LCSSA after generating main vector loop.
At the moment LV runs LoopSimplify and reconstructs LCSSA form after
generating the main vector loop and before generating the epilogue
vector loop.

In practice, this adds a new exit block for the scalar loop because the
middle block now also branches to the original exit block of the scalar
loop. It also requires adding a new LCSSA phi in the newly created exit
block.

This complicates things when modeling exit values in VPlan, because we
would need to update the VPlan for the epilogue loop to update the newly
created LCSSA phi node.

But none of that should be necessary, as all analysis requiring
loop-simplify form is already done at this point and LCSSA form of the
original loop is not broken.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D125810
2022-05-20 09:58:40 +01:00
Florian Hahn
c90235f0ef
[LV] Drop wrap flags for reductions using VP def-use chain.
Update clearReductionWrapFlags to use the VPlan def-use chain from the
reduction phi recipe to drop reduction wrap flags.

This addresses an existing FIXME and fixes a crash when instructions in
the reduction chain are not used and have been removed before VPlan
code generation.

Fixes #55540.
2022-05-19 20:36:46 +01:00
Tiehu Zhang
3ed9f603fd [LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold
The runtime check threshold should also restrict interleave count.
Otherwise, too many runtime checks will be generated for some cases.

Reviewed By: fhahn, dmgreen

Differential Revision: https://reviews.llvm.org/D122126
2022-05-19 23:29:00 +08:00
Florian Hahn
df56fb44f5
[VPlan] Update VPWidenMemoryInstruction to not inherit from VPValue.
VPWidenMemoryInstruction also models stores which may not produce a value.
This can trip up analyses. Improve the modeling by only adding
VPValues for VPWidenMemoryInstructionRecipes modeling loads.
2022-05-19 16:24:58 +01:00
lizhijin
90ea81fcb2 [LV] Widen freeze instead of scalarizing it
This patch changes the strategy for vectorizing freeze instructions, from
replicating multiple times to widening according to selected VF.

Fixes #54992

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D125016
2022-05-19 12:28:01 +08:00
Florian Hahn
fcfb86483b
[LV] set Header earlier, use variable instead of repeated access (NFC). 2022-05-18 09:29:59 +01:00
Florian Hahn
5b00d13c00
[LV] Fetch vector loop region once and remember it (NFC).
This avoids an unnecessary lookup and makes the code slightly more
compact.
2022-05-17 15:57:23 +01:00
Florian Hahn
c1a9d14982
[VPlan] Move usesScalars/onlyFirstLaneUsed to VPUser.
Those helpers model properties of a user and they should also be
available to non-recipe users. This will be used in D123537 for a new
exit value user.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D124936
2022-05-17 11:20:06 +01:00
Florian Hahn
b7315ffc3c
[LAA,LV] Add initial support for pointer-diff memory checks.
This patch adds initial support for a pointer diff based runtime check
scheme for vectorization. This scheme requires fewer computations and
checks than the existing full overlap checking, if it is applicable.

The main idea is to only check if source and sink of a dependency are
far enough apart so the accesses won't overlap in the vector loop. To do
so, it is sufficient to compute the difference and compare it to
`VF * UF * AccessSize`. It is sufficient to check
`(Sink - Src) <u VF * UF * AccessSize` to rule out a backwards
dependence in the vector loop with the given VF and UF. If Src >=u Sink,
there is no dependence preventing vectorization, hence the overflow
should not matter and using the ULT should be sufficient.

Note that the initial version is restricted in multiple ways:

1. Pointers must only either be read or written, by a single
   instruction (this allows re-constructing source/sink for
   dependences with the available information)
2. Source and sink pointers must be add-recs, with matching steps
3. The step must be a constant.
4. abs(step) == AccessSize.

Most of those restrictions can be relaxed in the future.
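
A minimal sketch of the comparison described above, written over plain 64-bit addresses. The function name `mayConflict` is hypothetical, and the sketch assumes the product `VF * UF * AccessSize` itself does not overflow; it is not the code LAA or LV emits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring (Sink - Src) <u VF * UF * AccessSize.
// Returns true when source and sink are too close for the vector loop,
// i.e. the runtime check fails and the scalar loop must be used.
bool mayConflict(std::uint64_t Src, std::uint64_t Sink, std::uint64_t VF,
                 std::uint64_t UF, std::uint64_t AccessSize) {
  // Unsigned subtraction wraps when Src > Sink, producing a huge value, so
  // the unsigned less-than comparison reports no conflict in that case.
  std::uint64_t Diff = Sink - Src;
  return Diff < VF * UF * AccessSize;
}
```

For example, with VF = 4, UF = 1 and 4-byte accesses, a sink 8 bytes after the source is flagged, while a sink 16 bytes after it, or any sink before the source, is not.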

See https://github.com/llvm/llvm-project/issues/53590.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D119078
2022-05-16 15:27:22 +01:00
David Sherwood
befc952045 [LoopVectorize] Permit tail-folding for low trip counts using scalable vectors
When the loop vectoriser encounters a known low trip count it tries
to create a single predicated loop in order to get the benefit of
vectorisation and eliminate the scalar tail. However, until now the
vectoriser prevented the use of scalable vectors in this case due
to concerns in the past about stability. I believe that tail-folded
loops using scalable vectors are now sufficiently well tested that
we can enable this. For the same reason I've also enabled it when
optimising for code size.
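
As a rough scalar emulation of what a tail-folded (predicated) loop does, assuming a fixed `VF` for simplicity rather than a scalable one: every vector iteration covers `VF` lanes and a per-lane mask disables lanes past the trip count, so no scalar remainder loop is needed. The function `addOneTailFolded` is a hypothetical illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical emulation of a tail-folded loop that adds 1 to each element.
void addOneTailFolded(std::vector<int> &Data, std::size_t VF) {
  assert(VF > 0 && "vectorization factor must be positive");
  const std::size_t N = Data.size();
  for (std::size_t Base = 0; Base < N; Base += VF) {
    for (std::size_t Lane = 0; Lane < VF; ++Lane) {
      // Lane mask: true only for lanes that map to a valid iteration.
      const bool Active = Base + Lane < N;
      if (Active)
        Data[Base + Lane] += 1;
    }
  }
}
```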

Tests added here:

  Transforms/LoopVectorize/AArch64/sve-low-trip-count.ll
  Transforms/LoopVectorize/AArch64/sve-tail-folding-optsize.ll
  Transforms/LoopVectorize/RISCV/low-trip-count.ll

Differential Revision: https://reviews.llvm.org/D121595
2022-05-16 09:14:24 +01:00
Florian Hahn
8b7c3d2179
[LV] Set SCEVCheckCond to nullptr whenever it was used.
Under some circumstances, SCEVExpander will insert new instructions when
expanding a predicate, but the final result of the expansion can be a
false constant.

In those cases, the expanded instructions may later be used by other
expansions, e.g. the trip count. This may trigger an assertion during
SCEVExpander cleanup. To avoid this, always mark the result as used.

Fixes #55100.
2022-05-15 21:52:07 +01:00
David Sherwood
92c645b5c1 [LoopVectorize] Add overflow checks when tail-folding with scalable vectors
In InnerLoopVectorizer::getOrCreateVectorTripCount there is an
assert that the known minimum value for the VF is a power of 2
when tail-folding is enabled. However, for scalable vectors the
value of vscale may not be a power of 2, which means we have
to worry about the possibility of overflow. I have solved this
problem by adding preheader checks that prevent us from entering
the vector body if the canonical IV would overflow, i.e.

  if ((IntMax - TripCount) < (VF * UF)) ... skip vector loop ...
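
The guard above can be sketched as a standalone predicate. This assumes a 64-bit unsigned induction variable and takes `VF` as the run-time element count; the name `mustSkipVectorLoop` is hypothetical and this is not the IR the vectoriser emits.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true when incrementing the canonical IV by VF * UF
// could wrap before the loop exits, so the vector body must be skipped.
bool mustSkipVectorLoop(std::uint64_t TripCount, std::uint64_t VF,
                        std::uint64_t UF) {
  const std::uint64_t IntMax = std::numeric_limits<std::uint64_t>::max();
  return (IntMax - TripCount) < (VF * UF);
}
```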

Differential Revision: https://reviews.llvm.org/D125235
2022-05-13 14:09:43 +01:00
Nikita Popov
ed1cb01baf [IRBuilder] Add IsInBounds parameter to CreateGEP()
We commonly want to create either an inbounds or non-inbounds GEP
based on a boolean value, e.g. when preserving inbounds from
existing GEPs. Directly accept such a boolean in the API, rather
than requiring a ternary between CreateGEP and CreateInBoundsGEP.

This change is not entirely NFC, because we now preserve an
inbounds flag in a constant expression edge-case in InstCombine.
2022-05-13 14:30:55 +02:00
Florian Hahn
f9f7aa30f8
[VPlan] Remove dead code to create VPWidenPHIRecipes (NFCI).
After introducing VPWidenPointerInductionRecipe, VPWidenPHIRecipes
should not be created at this point. Turn check into an assert.
2022-05-05 19:29:02 +01:00
Igor Kirillov
4e5e042d9a [LoopVectorize] Support reductions that store intermediary result
Adds ability to vectorize loops containing a store to a loop-invariant
address as part of a reduction that isn't converted to SSA form due to
lack of aliasing info. Runtime checks are generated to ensure the store
does not alias any other accesses in the loop.

Ordered fadd reductions are not yet supported.
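
A sketch of the kind of source loop this targets, under the assumption that the compiler cannot prove `Out` does not alias `A`. Both function names are hypothetical; the second shows the equivalent form with the store sunk out of the loop, which is only valid when the two do not alias (the condition the runtime checks establish).

```cpp
#include <cassert>
#include <cstddef>

// Reduction whose running value is stored to a loop-invariant address on
// every iteration, because Out may alias A.
void sumWithStore(const int *A, std::size_t N, int *Out) {
  for (std::size_t I = 0; I < N; ++I)
    *Out += A[I]; // load, add, store to the invariant address each iteration
}

// Equivalent only when Out does not alias A[0..N): accumulate in a local
// and store the final value once after the loop.
void sumWithSunkStore(const int *A, std::size_t N, int *Out) {
  int Sum = *Out;
  for (std::size_t I = 0; I < N; ++I)
    Sum += A[I];
  *Out = Sum;
}
```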

Differential Revision: https://reviews.llvm.org/D110235
2022-05-03 10:12:30 +01:00