258 Commits

Author SHA1 Message Date
Florian Hahn
241fe83704
[VPlan] Introduce ComputeReductionResult VPInstruction opcode. (#70253)
This patch introduces a new ComputeReductionResult opcode to compute the
final reduction result in the middle block. The code from fixReduction
has been moved to ComputeReductionResult, after some earlier cleanup
changes to model parts of fixReduction explicitly elsewhere as needed.

The recipe may be broken down further in the future.

Note that  the phi nodes to merge the reduction result from the trip 
count check and the middle block, to be used as resume value for the
scalar remainder loop are also generated based on 
ComputeReductionResult.

Once we have a VPValue for the reduction result, this can also be
modeled explicitly and moved out of the recipe.
2024-01-04 22:53:18 +00:00
Florian Hahn
b1bfe221e6
[VPlan] Remove unneeded getNumUsers calls in replaceAllUsesWith (NFC).
As suggested post-commit for a00227197, replace unnecessary getNumUsers
calls by boolean variable to indicate if users changed. Note that this
also requires an early exit to detect the case where a value is replaced
by itself.
2023-12-15 13:43:15 +00:00
Florian Hahn
a5891fa4d2
[VPlan] Initial modeling of VF * UF as VPValue. (#74761)
This patch starts initial modeling of VF * UF in VPlan.
Initially, introduce a dedicated VFxUF VPValue, which is then
populated during VPlan::prepareToExecute. Initially, the VF * UF
applies only to the main vector loop region. Once we extend the
scope of VPlan in the future, we may want to associate different VFxUFs
with different vector loop regions (e.g. the epilogue vector loop)

This allows explicitly parameterizing recipes that rely on the
VF * UF, like the canonical induction increment. At the moment, this
mainly helps to avoid generating some duplicated calls to vscale with
scalable vectors. It should also allow using EVL as induction increments
explicitly in D99750. Referring to VF * UF is also needed in other
places that we plan to migrate to VPlan, like the minimum trip count
check during skeleton creation.

The first version creates the value for VF * UF directly in
prepareToExecute to limit the scope of the patch. A follow-on patch will
model VF * UF computation explicitly in VPlan using recipes.

Moved from Phabricator (https://reviews.llvm.org/D157322)
2023-12-08 18:30:30 +00:00
Florian Hahn
99aa5311ee
[VPlan] Add missing output of live-ins to VPlan dot printing.
Split off live-in printing to VPlan::printLiveIns and use it to print
Live-ins when printing in the DOT format.
2023-12-04 13:41:28 +00:00
Florian Hahn
906f598263
[VPlan] Remove dead IsEpilogueVec argument from prepareToExecute (NFC). 2023-11-23 16:59:50 +00:00
Florian Hahn
34c2dcd5ac
[VPlan] Move initial skeleton construction to createInitialVPlan. (NFC)
This patch moves creating the  middle VPBBs and an initial empty
vector loop region for the top-level loop to createInitialVPlan.

This consolidates code to create the initial VPlan skeleton and enables
adding other bits outside the main region during initial VPlan
construction. In particular, D150398 will add the exit check & branch to
the middle block.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D158333
2023-11-12 13:00:44 +00:00
Florian Hahn
a002271972
[VPlan] Add VPValue::replaceUsesWithIf (NFCI).
Add replaceUsesWithIf helper and use it in a few places.
2023-11-06 16:08:22 +00:00
Kazu Hirata
3af0ff99b1 [llvm] Stop including llvm/ADT/DepthFirstIterator.h (NFC)
Identified with misc-include-cleaner.
2023-10-22 12:15:46 -07:00
Florian Hahn
97687b7aea
[VPlan] Add active-lane-mask as VPlan-to-VPlan transformation.
This patch updates the mask creation code to always create compares of
the form (ICMP_ULE, wide canonical IV, backedge-taken-count) up front
when tail folding and introduce active-lane-mask as later
transformation.

This effectively makes (ICMP_ULE, wide canonical IV, backedge-taken-count)
the canonical form for tail-folding early on. Introducing more specific
active-lane-mask recipes is treated as a VPlan-to-VPlan optimization.

This has the advantage of keeping the logic  (and complexity) of
introducing active-lane-mask recipes in a single place, instead of
spreading the logic out across multiple functions. It also simplifies
initial VPlan construction and enables treating introducing EVL as
similar optimization.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D158779
2023-09-25 13:34:45 +01:00
Florian Hahn
168e23c741
[VPlan] Remove reference to Instr when setting debug loc. (NFCI)
This allows untangling references to underlying IR for various recipes.
2023-09-05 10:59:13 +01:00
Florian Hahn
19d286bca0
[VPlan] Assert that inst isnt' a debug or pseudo inst (NFCI).
Debug and pseudo instructions aren't modeled in VPlan. Turn a check into
an assertion. This will help removing the direct use of Inst here in the
future.
2023-09-03 21:31:31 +01:00
Florian Hahn
e18a547ce2
[VPlan] Fold if into return in prepareToExecute assertion (NFC).
Independent simplification suggested in D157194.
2023-08-08 12:45:55 +01:00
Florian Hahn
af635a5547
[VPlan] Model wrap flags directly, remove *NUW opcodes (NFC)
Model wrap flags directly using VPRecipeWithIRFlags and clean up the
duplicated *NUW opcodes.

D157144 will build on this and also model FMFs for VPInstruction.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D157194
2023-08-08 12:12:30 +01:00
Florian Hahn
deec9e7674
[VPlan] Move VPTransformState::get() to VPlan.cpp (NFC).
The last dependency of code defined in LoopVectorize.cpp has been
removed a while ago. Move VPTransformState::get() to VPlan.cpp where
other members are also defined.
2023-08-03 21:49:58 +01:00
Florian Hahn
d1d0e135a1
[LV] Move packScalarIntoVectorValue to VPTransformState (NFC).
This moves packScalarIntoVectorValue from ILV to the more approriate
VPTransformState.
2023-08-02 12:36:48 +01:00
Elliot Goodrich
b0abd4893f [llvm] Add missing StringExtras.h includes
In preparation for removing the `#include "llvm/ADT/StringExtras.h"`
from the header to source file of `llvm/Support/Error.h`, first add in
all the missing includes that were previously included transitively
through this header.
2023-06-25 15:42:22 +01:00
Florian Hahn
96686796f6
[VPlan] Move live-out printing to VPLiveOut::print (NFC).
Preparation for D150398. This brings live-out printing in line with how
printing for recipes is handled.
2023-05-22 09:53:53 +01:00
Hongtao Yu
9272d0f079 [PseudoProbe] Clean up dwarf discriminator and avoid duplicating factor.
A pseudo probe is created with dwarf line information shared with its nearest instruction. If the instruction comes with a dwarf discriminator, it will be shared with the probe as well. This can confuse the later FS-AFDO discriminator assignment pass. To fix this, I'm cleaning up the discriminator fields for probes when they are inserted.

I also notice another possibility to change the discriminator field of pseudo probes in the pipeline before the FS discriminator assignment pass. That is the loop unroller, which assigns duplication factor to instruction being vectorized. I'm disabling that for pseudo probe intrinsics specifically, also for callsites with probes.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D148569
2023-05-10 11:26:23 -07:00
Florian Hahn
e3afe0b89d
[VPlan] Add VPWidenCastRecipe, split off from VPWidenRecipe (NFCI).
To generate cast instructions, the result type is needed. To allow
creating widened casts without underlying instruction, introduce a new
VPWidenCastRecipe that also holds the result type.

This functionality will be used in a follow-up patch to
implement truncateToMinimalBitwidths as VPlan-to-VPlan transform.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D149081
2023-05-05 13:20:16 +01:00
Florian Hahn
147a56149c
[VPlan] Clean up preheader block after b85a402dd899fc.
Fix a leak introduced in b85a402dd899fc and flagged by LSan
https://lab.llvm.org/buildbot#builders/5/builds/33452
2023-05-04 16:29:57 +01:00
Florian Hahn
b85a402dd8
[VPlan] Introduce new entry block to VPlan for early SCEV expansion.
This patch adds a new preheader block the VPlan to place SCEV expansions
expansions like the trip count. This preheader block is disconnected
at the moment, as the bypass blocks of the skeleton are not yet modeled
in VPlan.

The preheader block is executed before skeleton creation, so the SCEV
expansion results can be used during skeleton creation. At the moment,
the trip count expression and induction steps are expanded in the new
preheader. The remainder of SCEV expansions will be moved gradually in
the future.

D147965 will update skeleton creation to use the steps expanded in the
pre-header to fix #58811.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147964
2023-05-04 14:00:13 +01:00
Florian Hahn
79692750d2
[LV] Use VPValue for SCEV expansion in fixupIVUsers.
The step is already expanded in the VPlan. Use this expansion instead.
This is a step towards modeling fixing up IV users in VPlan.

 It also fixes a crash casued by SCEV-expanding the Step expression in
fixupIVUsers, where the IR is in an incomplete state

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147963
2023-05-04 09:25:59 +01:00
Florian Hahn
b9efffa7e9
[VPlan] Add assignSlot(const VPBasicBlock *) (NFC).
Factor out utility to simplify D147964 as sugested.
2023-05-03 19:51:09 +01:00
Florian Hahn
2c9d21a2a3
[VPlan] Turn Plan entry node into VPBasicBlock (NFCI).
The entry to the plan is the preheader of the vector loop and
guaranteed to be a VPBasicBlock. Make sure this is the case by
adjusting the type.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D149005
2023-04-28 12:29:06 +01:00
Florian Hahn
3157f03a34
[VPlan] Add VPValue::isLiveIn() (NFC).
This helps to clarify checks in multiple places.

Suggested as cleanup in D147892.
2023-04-24 17:51:12 +01:00
Florian Hahn
ff0ec4f42e
Recommit "[VPlan] Unify Value2VPValue and VPExternalDefs maps (NFCI)."
This reverts the revert commit 8c2276f89887d0a27298a1bbbd2181fa54bbb509.

The updated patch re-orders the getDefiningRecipe check in getVPalue to avoid
a use-after-free.

Original commit message:

    Before this patch, a VPlan contained 2 mappings for Values -> VPValue:
    1) Value2VPValue and 2) VPExternalDefs.

    This duplication is unnecessary and there are already cases where
    external defs are added to Value2VPValue. This patch replaces all uses
    of VPExternalDefs with Value2VPValue.

    It clarifies the naming of getOrAddVPValue (to getOrAddExternalVPValue)
    and addVPValue (to addExternalVPValue).

    At the moment, this is NFC, but will enable additional simplifications
    in D147783.

    Depends on D147891.

    Reviewed By: Ayal

    Differential Revision: https://reviews.llvm.org/D147892
2023-04-18 10:29:31 +01:00
Vitaly Buka
8c2276f898 Revert "[VPlan] Unify Value2VPValue and VPExternalDefs maps (NFCI)."
Asan detects heap-use-after-free, see D147892.

This reverts commit 4fc190351e5af901b6107d162d07e1fbca90934f.
This reverts commit 668045eb77628be13e448ffbb855473ffca1cc43.
2023-04-17 17:24:10 -07:00
Florian Hahn
668045eb77
[VPlan] Unify Value2VPValue and VPExternalDefs maps (NFCI).
Before this patch, a VPlan contained 2 mappings for Values -> VPValue:
1) Value2VPValue and 2) VPExternalDefs.

This duplication is unnecessary and there are already cases where
external defs are added to Value2VPValue. This patch replaces all uses
of VPExternalDefs with Value2VPValue.

It clarifies the naming of getOrAddVPValue (to getOrAddExternalVPValue)
and addVPValue (to addExternalVPValue).

At the moment, this is NFC, but will enable additional simplifications
in D147783.

Depends on D147891.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147892
2023-04-16 15:38:31 +01:00
Florian Hahn
0dbcbfe0d0
[VPlan] Don't assign slots for external defs (NFCI).
External defs are VPValues wrapping an IR value and hence will get
printed as ir<>. We don't need to assign a slot for a VPValue number.
2023-04-09 21:01:21 +01:00
Florian Hahn
620e011a25
[VPlan] Don't add live-outs if scalar epilogue is required.
Instead of clearing live outs when a scalar epilogue is required late,
don't add live outs during VPlan construction if a scalar epilogue is
required.

This enables more VPlan-based DCE (if the live out would be the only
user in the plan) and is a step towards removing an access of the cost
model in fixedVectorizedLoop (which is after VPlan execution).

Depends on D147468.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147471
2023-04-09 09:18:24 +01:00
David Green
28c8616a5b [LV] Cleanup and reformatting for some debug messages. NFC
This is just some cleanup of various debug messages, pulled out of another
patch to simplify it a little.
2023-04-05 17:50:01 +01:00
Kazu Hirata
398af9b43b [llvm] Use *{Map,Set}::contains (NFC) 2023-03-15 18:06:32 -07:00
David Green
98481bc723 [LV][VPlan] Fix printing TripCount liveins. NFC
The TripCount liveins would currently be printed as badref in the vplan as they
are not allocated slots in the VPSlotTracker. This patch allocates them a slot
and adds them to the printed Live-Ins. It also makes a minor adjustment to
printing of Live-ins to reduce the empty lines when multiple Live-ins are
present.

Differential Revision: https://reviews.llvm.org/D145507
2023-03-13 19:44:12 +00:00
Fangrui Song
1e6921131a Move global namespace cl::opt inside llvm:: 2023-02-14 00:09:44 -08:00
Florian Hahn
e2c43a547b
[VPlan] Add vp_depth_first_deep (NFC)
Similar to vp_depth_first_shallow (D140512) add vp_depth_first_deep to
make existing code clearer and more compact.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142055
2023-01-19 20:34:23 +00:00
Florian Hahn
655c88ca36
[VPlan] Add vp_depth_first_shallow + graph traits for wrapper(NFC)
This patch adds a new VPBlockShallowTraversalWrapper struct to
provide graph traits specialization that do not traverse through
VPRegionBlocks. This matches the behavior of the existing traits for
plain VPBlockBase and is a step before moving the graph traits for
VPBlockBase to traverse through VPRegionBlocks to enable cross region
support in VPDominatorTree.

Depends on D140511.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D140512
2023-01-19 12:07:27 +00:00
Florian Hahn
2b054d5dd4
[VPlan] Use to_vector when iterating over a temporary vector. (NFC) 2023-01-13 17:51:00 +00:00
Florian Hahn
cd16a3f04c
[VPlan] Move GraphTraits definitions to separate header (NFC).
This reduces the size of VPlan.h and avoids future growth of the file
when the graph traits are extended in future patches.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D140500
2022-12-31 15:14:57 +00:00
Florian Hahn
e1650c8d52
[LV] Move exit cond simplification to separate transform.
This sets the stage for D133017 by moving out the code that performs
VPlan based simplifications to a separate transform that takes the
chosen VF & UF as arguments.

The main advantage is that this transform runs before any changes to
the CFG are being made. This allows using SCEV without worrying about
making queries while the IR is in an incomplete state.

Note that this patch switches the reasoning to use SCEV, but still only
simplifies loops with constant trip counts. Using SCEV here is needed to
access the backedge taken count, because the trip count IR value has not
been created yet.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D135017
2022-12-23 12:51:21 +00:00
Florian Hahn
5df34e971d
[VPlan] Add support for tracking UFs applicable to VPlan (NFC).
Explicitly track the UFs supported in a VPlan. This is needed to
allow transformations to restrict the UFs which are supported.

Discussed as separate improvement in D135017.
2022-12-22 18:58:25 +00:00
Florian Hahn
96296922b6
[VPlan] Move VF and UF string generation to getName() (NFC).
The VFs and UFs may be more constrained as the plans are transformed
(e.g. see D135017 for an example).

To make sure the VFs/UFs included in the VPlan dump are accurate,
generate them when accessing a plan's name, rather than include them in
the name string set after initial construction.
2022-12-22 13:15:01 +00:00
Mircea Trofin
946831ea2d [NFC] Rename Function::isDebugInfoForProfiling to shouldEmit[...]
The function name was misleading - the expectation set both by the name
and by other members of Function (like isDeclaration or isIntrinsic)
would be that the function somehow would "be" "debug info for
profiling". But that's not the case - the property indicates (as the
comment over the declaration also explains) whether debug info should be
emitted (for profiling).
2022-12-21 18:36:59 -08:00
Florian Hahn
0c5df7cd2f
Recommit "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe."
This reverts commit bf15f1e489aa2f1ac13268c9081a992a8963eb5b.

The updated version fixes a crash by checking the induction kind instead
of the opcode; for integer inductions, the step is always added, but the
opcode might not be set.
2022-11-30 17:04:20 +00:00
Florian Hahn
bf15f1e489
Revert "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe."
This reverts commit 0fa666ecedc3f36471c0fee925d664512e7525a8.

This triggers an assertion during AArch64 stage2 builds. Revert while I
investigate.

See https://lab.llvm.org/buildbot/#/builders/179/builds/4967/steps/11/logs/stdio
2022-11-28 22:43:11 +00:00
Florian Hahn
0fa666eced
[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe.
This patch splits off the logic to transform the canonical IV to a
a value for an induction with a different start and step. This
transformation only needs to be done once (independent of VF/UF) and
enables sinking of VPScalarIVStepsRecipe as follow-up.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D133758
2022-11-28 16:32:31 +00:00
Florian Hahn
55f56cdc33
[VPlan] Introduce VPValue::hasDefiningRecipe helper (NFC).
This clarifies the intention of code that uses the helper.

Suggested by @Ayal during review of D136068, thanks!
2022-11-16 23:12:40 +00:00
Florian Hahn
aa16689f82
[VPlan] Use recipe type to avoid getDefiningRecipe call (NFC).
Suggested by @Ayal during review of D136068, thanks!
2022-11-16 23:03:34 +00:00
Florian Hahn
32f1c5531b
[VPlan] Update VPValue::getDef to return VPRecipeBase, adjust name(NFC)
The return value of getDef is guaranteed to be a VPRecipeBase and all
users can also accept a VPRecipeBase *. Most users actually case to
VPRecipeBase or a specific recipe before using it, so this change
removes a number of redundant casts.

Also rename it to getDefiningRecipe to make the name a bit clearer.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D136068
2022-11-16 22:12:08 +00:00
Florian Hahn
2c692d891e
[LV] Update handling of scalable pointer inductions after b73d2c8.
The dependent code has been changed quite a lot since 151c144 which
b73d2c8 effectively reverts. Now we run into a case where lowering
didn't expect/support the behavior pre 151c144 any longer.

Update the code dealing with scalable pointer inductions to also check
for uniformity in combination with isScalarAfterVectorization. This
should ensure scalable pointer inductions are handled properly during
epilogue vectorization.

Fixes #57912.
2022-09-23 18:23:02 +01:00
Florian Hahn
582f8ef19f
[LV] Keep track of cost-based ScalarAfterVec in VPWidenPointerInd.
Epilogue vectorization uses isScalarAfterVectorization to check if
widened versions for inductions need to be generated and bails out in
those cases.

At the moment, there are scenarios where isScalarAfterVectorization
returns true but VPWidenPointerInduction::onlyScalarsGenerated would
return false, causing widening.

This can lead to widened phis with incorrect start values being created
in the epilogue vector body.

This patch addresses the issue by storing the cost-model decision in
VPWidenPointerInductionRecipe and restoring the behavior before 151c144.
This effectively reverts 151c144, but the long-term fix is to properly
support widened inductions during epilogue vectorization

Fixes #57712.
2022-09-19 18:14:35 +01:00