116 Commits

Author SHA1 Message Date
Florian Hahn
56f5738d85
[LV] Move induction ::execute impls to VPlanRecipes.cpp (NFC).
All dependencies on code from LoopVectorize.cpp have been
removed/refactored. Move the ::execute implementations to other recipe
definitions in VPlanRecipes.cpp
2023-08-20 21:00:05 +01:00
Florian Hahn
ada2a455fc
[VPlan] Use VPBasicBlock to get incoming block for exit phi fixup (NFC)
Retrieve block via VPlan infrastructure as suggested as independent
cleanup in D150398.
2023-08-17 18:17:45 +01:00
Mel Chen
463e7cb892 [LV][VPlan] Refactor VPReductionRecipe to use reference for member RdxDesc
This commit refactors the implementation of VPReductionRecipe to use
reference instead of pointer for member RdxDesc. Because the member
RdxDesc in VPReductionRecipe should not be a nullptr, using a reference
will provide clearer semantics.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D158058
2023-08-16 19:37:49 -07:00
Florian Hahn
aacaf3d580
[VPlan] Simplify VPDerivedIV truncation handling (NFCI).
Address post-commit simplification suggestion for 8a56179bcd8c: Replace
IsTruncated by conditionally setting TruncResultTy only if truncation
is required.
2023-08-14 17:33:10 +01:00
Florian Hahn
8a56179bcd
[VPlan] Store induction kind & binop directly in VPDerviedIVRecipe(NFC)
Limit the information stored in VPDerivedIVRecipe to the ingredients
really needed.
2023-08-10 10:57:32 +01:00
Florian Hahn
698ae66092
[VPlan] Replace FMF in VPInstruction with VPRecipeWithIRFlags (NFC).
Update VPInstruction to use VPRecipeWithIRFlags to manage FMFs for
VPInstruction.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D157144
2023-08-08 20:13:11 +01:00
Florian Hahn
b6d994de0f
[VPlan] Address post-commit suggestions for af635a554 (NFC). 2023-08-08 12:59:34 +01:00
Florian Hahn
af635a5547
[VPlan] Model wrap flags directly, remove *NUW opcodes (NFC)
Model wrap flags directly using VPRecipeWithIRFlags and clean up the
duplicated *NUW opcodes.

D157144 will build on this and also model FMFs for VPInstruction.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D157194
2023-08-08 12:12:30 +01:00
Florian Hahn
93c5bae00e
[VPlan] Use printOperands for VPInstruction.
Use the printOperands for printing VPInstruction's operands to be more
in line with other recipes and ensure consistent printing after D15719.

Also removes some stray spaces in print output.
2023-08-08 11:31:21 +01:00
Florian Hahn
0b17e9d285
[VPlan] Move VPRecipeWithIRFlags::getFastMathFlags. (NFCI)
Split off suggested refactoring from D157144. Also adds a assert to make
sure this is only used when OpType is FPMathOp.
2023-08-07 12:35:53 +01:00
Mel Chen
425e9e81a0 [LV] Rename the Select[I|F]Cmp reduction pattern to [I|F]AnyOf. (NFC)
Regarding this NFC change, please refer to the discussion in this thread. https://reviews.llvm.org/D150851#4467261

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D155786
2023-08-03 00:37:19 -07:00
Florian Hahn
2265bb064b
[LV] Update generateInstruction to return produced value (NFC).
Update generateInstruction to return the produced value instead of
setting it for each opcode. This reduces the amount of duplicated code
and is a preparation for D153696.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D154240
2023-07-05 19:53:59 +01:00
Florian Hahn
0a246a0c72
[LV] Use VPValues when creating GEP with all invariant indices.
Update VPWidenGEPRecipe::execute to use the VPValue operands of the
recipe when creating the GEP instruction.

Fixes #63340.
2023-06-16 16:14:01 +01:00
Florian Hahn
8f781b96e2
Revert "[VPlan] Mark recurrence recipes as not having side-effects."
This reverts commit 02369b75fdd7b5fc5d9b47f1b60587c225918511.

At the moment, live-outs used *only* for the resume values in the scalar
loop are not modeled in VPlan yet. This means first-order recurrence
recipes could be removed, when a scalar epilogue is required and the
only use of a FOR is outside the loop.

Keep treating recurrence recipes as having side-effects for now, to
avoid them being removed.

Fixes #62954.
2023-06-06 11:35:26 +02:00
Florian Hahn
299f0ff60e
[VPlan] Print IR flags for VPRecipeWithIRFlags.
Now that IR flags are modeled as part of VPRecipeWithIRFlags, include
the flags when printing recipes.

Depends on D150027.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D150029
2023-05-23 20:36:16 +01:00
Florian Hahn
8eaf7a75fe
[VPlan] Add missing ifdef after 96686796f606.
Fixes build with debug printing disabled.
2023-05-22 10:44:17 +01:00
Florian Hahn
96686796f6
[VPlan] Move live-out printing to VPLiveOut::print (NFC).
Preparation for D150398. This brings live-out printing in line with how
printing for recipes is handled.
2023-05-22 09:53:53 +01:00
Florian Hahn
236a0e82df
[LV] Use VPValue to get expanded value for SCEV step expressions.
Update skeleton creation logic to use SCEV expansion results from
expanding the pre-header. This avoids another set of SCEV expansions
that may happen after the CFG has been modified.

Fixes #58811.

Depends on D147964.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147965
2023-05-11 16:49:19 +01:00
Florian Hahn
c096e91735
[VPlan] Address missed suggestions from D149082.
This address 2 comments missed from D149082. It sets inbounds directly
when creating the GEP and fixes the order in the enum.
2023-05-09 15:17:20 +01:00
Florian Hahn
5f3343985b
[VPlan] Use VPRecipeWithIRFlags for VPWidenGEPRecipe (NFCI).
Extend VPRecipeWithIRFlags to also include InBounds and use for VPWidenGEPRecipe.

The last remaining recipe that needs updating for
MayGeneratePoisonRecipes is VPReplicateRecipe.

Depends on D149081.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D149082
2023-05-09 12:33:28 +01:00
Florian Hahn
127b00b25c
[VPlan] Record IR flags on VPWidenRecipe directly (NFC).
This patch introduces a VPRecipeWithIRFlags class to record various IR
flags for a recipe. This allows de-coupling of IR flags from the
underlying instructions. The main benefit is that it allows dropping of
IR flags from recipes directly, without the need to go through
State::MayGeneratePoisonRecipes. The plan is to remove
MayGeneratePoisonRecipes once all relevant recipes are transitioned.

It also allows dropping IR flags during VPlan-to-VPlan transforms, which
will be used in a follow-up patch to implement truncateToMinimalBitwidths
as VPlan-to-VPlan transform.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D149079
2023-05-08 17:28:50 +01:00
Florian Hahn
01fa764c9a
[VPlan] Assert instead of check if VF is vector when widening GEPs(NFC)
VPWidenGEPRecipe should not be generated for scalar VFs. Replace
check with an assert.
2023-05-06 09:25:56 +01:00
Florian Hahn
8bd02e5aef
[VPlan] Assert instead checking if VF is vec when widening calls (NFC)
VPWidenCallRecipe should not be generated for scalar VFs. Replace check
with an assert.
2023-05-05 18:21:57 +01:00
Florian Hahn
e3afe0b89d
[VPlan] Add VPWidenCastRecipe, split off from VPWidenRecipe (NFCI).
To generate cast instructions, the result type is needed. To allow
creating widened casts without underlying instruction, introduce a new
VPWidenCastRecipe that also holds the result type.

This functionality will be used in a follow-up patch to
implement truncateToMinimalBitwidths as VPlan-to-VPlan transform.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D149081
2023-05-05 13:20:16 +01:00
Florian Hahn
29712ccda6
[VPlan] Assert instead of check if VF is vector when widening casts.
VPWidenRecipes should not be generated for scalar VFs. Replace check
with an assert. Suggested in preparation for D149081.
2023-05-05 09:02:33 +01:00
Florian Hahn
1b05e74982
[VPlan] Reorder cases in switch (NFC).
Reorder cases to make sure they are ordered properly in preparation
for D149081.
2023-05-04 21:40:22 +01:00
Florian Hahn
b85a402dd8
[VPlan] Introduce new entry block to VPlan for early SCEV expansion.
This patch adds a new preheader block the VPlan to place SCEV expansions
expansions like the trip count. This preheader block is disconnected
at the moment, as the bypass blocks of the skeleton are not yet modeled
in VPlan.

The preheader block is executed before skeleton creation, so the SCEV
expansion results can be used during skeleton creation. At the moment,
the trip count expression and induction steps are expanded in the new
preheader. The remainder of SCEV expansions will be moved gradually in
the future.

D147965 will update skeleton creation to use the steps expanded in the
pre-header to fix #58811.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147964
2023-05-04 14:00:13 +01:00
Jay Foad
593e25ffae [Vectorize] Fix vectorization, scalarization and folding of llvm.is.fpclass
llvm.is.fpclass is different from other vectorizable intrinsics in that
it is overloaded on an argument type, not on the return type.

Differential Revision: https://reviews.llvm.org/D148905
2023-04-24 13:42:08 +01:00
Florian Hahn
02369b75fd
[VPlan] Mark recurrence recipes as not having side-effects.
Add support for FirstOrderRecurrenceSplice and VPFirstOrderRecurrencePHI
recipes to mayHaveSideEffects. They both don't have side-effects.
2023-04-17 12:30:52 +01:00
Florian Hahn
2db031528e
[VPlan] Check VPValue step in isCanonical (NFCI).
Update the isCanonical() implementations to check the VPValue step
operand instead of the step in the induction descriptor.

At the moment this is NFC, but it enables further optimizations if the
step is replaced by a constant in D147783.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147891
2023-04-16 14:48:03 +01:00
Florian Hahn
9be8d90e62
[VPlan] Add VPWidenSelectRecipe::getCond() (NFC).
Add helper to access condition, as suggested in D144489.
2023-03-10 17:49:23 +01:00
Florian Hahn
54558fd8f3
[VPlan] Replace InvariantCond field from VPWidenSelectRecipe.
There is no need to store information about invariance in the recipe.
Replace the fields with checks of the operands using
isDefinedOutsideVectorRegions.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D144489
2023-03-10 15:28:43 +01:00
Florian Hahn
a8adb38a96
[VPlan] Replace invariance fields from VPWidenGEPRecipe.
There is no need to store information about invariance in the recipe.
Replace the fields with checks of the operands using
isDefinedOutsideVectorRegions.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D144487
2023-03-09 17:52:22 +01:00
Nikita Popov
ffe8f47d72 [IR] Add operator<< overload for CmpInst::Predicate (NFC)
I regularly try and fail to use this while debugging.
2023-03-07 15:10:56 +01:00
Florian Hahn
be968dbeee
[VPlan] VPWidenCallRecipe has side-effects if the call has.
Handle VPWidenCallRecipe in VPRecipeBase::mayHaveSideEffects by
delegating to the underlying call.
2023-03-05 12:08:56 +01:00
Sander de Smalen
fe1b51ffee [LoopVectorize] Remove runtime check and scalar tail loop when tail-folding.
When using tail-folding and using the predicate for both data and control-flow
(the next vector iteration's predicate is generated with the llvm.active.lane.mask
intrinsic and then tested for the backedge), the LoopVectorizer still inserts a
runtime check to see if the 'i + VF' may at any point overflow for the given
trip-count. When it does, it falls back to a scalar epilogue loop.

We can get rid of that runtime check in the pre-header and therefore also
remove the scalar epilogue loop. This reduces code-size and avoids a runtime
check.

Consider the following loop:

  void foo(char * __restrict__ dst, char *src, unsigned long N) {
      for (unsigned long  i=0; i<N; ++i)
          dst[i] = src[i] + 42;
  }

If 'N' is e.g. ULONG_MAX, and the VF > 1, then the loop iteration counter
will overflow when calculating the predicate for the next vector iteration
at some point, because LLVM does:

  vector.ph:
    %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)

  vector.body:
    %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
    %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ]
    ...

    %index.next = add i64 %index, 16
      ; The add above may overflow, which would affect the lane mask and control flow. Hence a runtime check is needed.
    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index.next, i64 %N)
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7

The solution:

What we can do instead is calculate the predicate before incrementing
the loop iteration counter, such that the llvm.active.lane.mask is
calculated from 'i' to 'tripcount > VF ? tripcount - VF : 0', i.e.

  vector.ph:
    %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)
    %N_minus_VF = select %N > 16 ? %N - 16 : 0

  vector.body:
    %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
    %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ]
    ...

    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index, i64 %N_minus_VF)
    %index.next = add i64 %index, %4
      ; The add above may still overflow, but this time the active.lane.mask is not affected
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7

For N = 20, we'd then get:

  vector.ph:
    %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)
      ; %active.lane.mask.entry = <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1>
    %N_minus_VF = select 20 > 16 ? 20 - 16 : 0
      ; %N_minus_VF = 4

  vector.body: (1st iteration)
    ... ; using <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1> as predicate in the loop
    ...
    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 4)
      ; %active.lane.mask.next = <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
    %index.next = add i64 0, 16
      ; %index.next = 16
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
      ; %8 = 1
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7
      ; branch to %vector.body

  vector.body: (2nd iteration)
    ... ; using <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> as predicate in the loop
    ...
    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 16, i64 4)
      ; %active.lane.mask.next = <0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
    %index.next = add i64 16, 16
      ; %index.next = 32
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
      ; %8 = 0
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7
      ; branch to %for.cond.cleanup

Reviewed By: fhahn, david-arm

Differential Revision: https://reviews.llvm.org/D142109
2023-03-01 09:01:19 +00:00
Florian Hahn
c21ccebe6f
[VPlan] Use usesScalars in shouldPack.
Suggested by @Ayal as follow-up improvement in D143864.

I was unable to find a case where this actually changes generated code,
but it is a unifying code to use common infrastructure.
2023-02-20 14:11:40 +00:00
Florian Hahn
df016a9525
[VPlan] Move shouldPack outside of DEBUG ifdef.
This fixes a build failure with assertions disabled.
2023-02-20 10:53:45 +00:00
Florian Hahn
9333b97763
[VPlan] Replace AlsoPack field with shouldPack() method (NFC).
There is no need to update the AlsoPack field when creating
VPReplicateRecipes. It can be easily computed based on the VP def-use
chains when it is needed.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D143864
2023-02-20 10:28:26 +00:00
Graham Hunter
0fa5df1959 [LV] Synthesize all true masks for masked vector function variants
When vectorizing code with function calls in it, if we encounter
a function which only has vectorized variants requiring a mask
we can synthesize an all-true mask to enable us to proceed.

Since we want the mask to be represented in vplan, the pointer
to the chosen Function is now stored as part of the
VPWidenCallRecipe, and mask arguments are added at the
appropriate index to the recipe operands.

Reviewed By: david-arm, fhahn, reames

Differential Revision: https://reviews.llvm.org/D132458
2023-02-14 14:33:18 +00:00
Fangrui Song
1e6921131a Move global namespace cl::opt inside llvm:: 2023-02-14 00:09:44 -08:00
Florian Hahn
32efff591a
[VPlan] Mark load VPWidenMemoryInstruction as not having side-effects.
Also add an assert using the underlying instruction to catch any
potential violations.
2023-02-07 22:02:50 +00:00
Florian Hahn
cf2d436b31
[VPlan] VPPredInstPHIRecipe does not read from memory.
VPPredInstPHIRecipe just merges the incoming values and does not write
to memory.
2023-01-31 21:51:03 +00:00
Florian Hahn
5368536cf1
[VPlan] VPPredInstPHIRecipes does not write to memory.
VPPredInstPHIRecipe just merges the incoming values and does not write
to memory.
2023-01-30 10:29:27 +00:00
Florian Hahn
3c5f07349f
[VPlan] Mark VPScalarIVStepsRecipe as not reading/writing memory.
The recipe only computes the inductions steps using its operands. It
does neither read nor write memory.

Split of from D133760.
2022-12-04 12:58:46 +00:00
Florian Hahn
0c5df7cd2f
Recommit "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe."
This reverts commit bf15f1e489aa2f1ac13268c9081a992a8963eb5b.

The updated version fixes a crash by checking the induction kind instead
of the opcode; for integer inductions, the step is always added, but the
opcode might not be set.
2022-11-30 17:04:20 +00:00
Florian Hahn
bf15f1e489
Revert "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe."
This reverts commit 0fa666ecedc3f36471c0fee925d664512e7525a8.

This triggers an assertion during AArch64 stage2 builds. Revert while I
investigate.

See https://lab.llvm.org/buildbot/#/builders/179/builds/4967/steps/11/logs/stdio
2022-11-28 22:43:11 +00:00
Florian Hahn
0fa666eced
[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe.
This patch splits off the logic to transform the canonical IV to a
a value for an induction with a different start and step. This
transformation only needs to be done once (independent of VF/UF) and
enables sinking of VPScalarIVStepsRecipe as follow-up.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D133758
2022-11-28 16:32:31 +00:00
Florian Hahn
55f56cdc33
[VPlan] Introduce VPValue::hasDefiningRecipe helper (NFC).
This clarifies the intention of code that uses the helper.

Suggested by @Ayal during review of D136068, thanks!
2022-11-16 23:12:40 +00:00
Florian Hahn
32f1c5531b
[VPlan] Update VPValue::getDef to return VPRecipeBase, adjust name(NFC)
The return value of getDef is guaranteed to be a VPRecipeBase and all
users can also accept a VPRecipeBase *. Most users actually case to
VPRecipeBase or a specific recipe before using it, so this change
removes a number of redundant casts.

Also rename it to getDefiningRecipe to make the name a bit clearer.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D136068
2022-11-16 22:12:08 +00:00