79 Commits

Author SHA1 Message Date
Florian Hahn
f7a8a78cb7
[VPlan] Also print operands of canonical IV (NFC).
Also print the operands of VPCanonicalIVPHIRecipe. That was missed
earlier.
2023-10-16 20:28:23 +01:00
Florian Hahn
d9f83169d1
[VPlan] Ensure start value of phis is the first op at construction (NFC)
Header phi recipes have the start value (incoming from outside the loop)
as first operand. This wasn't the case for VPWidenPHIRecipes. Instead
the start value was picked during execute() by doing extra work.

To be in line with other recipes, ensure the operand order is as
expected during construction.
2023-09-22 21:24:15 +01:00
Florian Hahn
f23246a0bb
[LV] Directly add fast-math flags to select recipe (NFC).
Now that VPInstruction can manage fast math flags via
VPRecipeWithIRFlags, use them directly to model the fast-math flags of
the select created for the final reduction value instead of adding them
late.
2023-09-21 11:05:55 +01:00
Florian Hahn
1d1cba44ea
[VPlan] Remove stray indent when printing scalar steps recipe.
VPScalarIVStepsRecipe will now be printed as
      vp<%6> = SCALAR-STEPS vp<%3>, ir<1>
instead of
      vp<%6>      = SCALAR-STEPS vp<%3>, ir<1>
2023-09-17 10:15:52 +01:00
Jeremy Morse
6942c64e81 [NFC][RemoveDIs] Prefer iterator-insertion over instructions
Continuing the patch series to get rid of debug intrinsics [0], instruction
insertion needs to be done with iterators rather than instruction pointers,
so that we can communicate information in the iterator class. This patch
adds an iterator-taking insertBefore method and converts various call sites
to take iterators. These are all sites where such debug-info needs to be
preserved so that a stage2 clang can be built identically; it's likely that
many more will need to be changed in the future.

At this stage, this is just changing the spelling of a few operations,
which will eventually become significant once the debug-info bearing
iterator is used.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939

Differential Revision: https://reviews.llvm.org/D152537
2023-09-11 11:48:45 +01:00
Florian Hahn
3e2d564c3d
[VPlan] Use VPRecipeWithFlags for VPScalarIVStepsRecipe (NFC).
This directly models the flags as part of the recipe, which allows
dropping them using the VPlan infrastructure when required.

It also allows removing the full reference to InductionDescriptor and
limiting it to only the opcode.
2023-09-08 15:46:12 +01:00
Florian Hahn
785e7063b9
[VPlan] Don't rely on underlying instr in VPWidenRecipe (NFCI).
VPWidenRecipe only needs the opcode to widen, all other information
(flags, debug loc and operands) is already modeled directly via the
recipe.

This removes the remaining uses of the underlying instruction from
VPWidenRecipe::execute.
2023-09-06 16:27:09 +01:00
Florian Hahn
165e24aa2a
[VPlan] Move DebugLoc to VPRecipeBase (NFCI).
Add a dedicated debug location to VPRecipeBase to remove another
unneeded use of the underlying LLVM IR instruction and also consolidate
various DL fields in sub classes.

Each recipe can have a debug location, and it shouldn't rely on a reference
to the underlying LLVM IR instruction to retain it. See the various recipes
that already had separate DL fields.
2023-09-05 15:45:16 +01:00
Florian Hahn
168e23c741
[VPlan] Remove reference to Instr when setting debug loc. (NFCI)
This allows untangling references to underlying IR for various recipes.
2023-09-05 10:59:13 +01:00
Florian Hahn
3fa1b254b7
[VPlan] Print blend recipe as operand directly, instead of IR PHI.
Update VPBlendRecipe::print() to print the result directly, instead of
relying on the stored Phi pointer. This brings the recipe in line with
how other recipes are printed.
2023-09-04 12:35:58 +01:00
Florian Hahn
fd66195777
[VPlan] Manage compare predicates in VPRecipeWithIRFlags.
Extend VPRecipeWithIRFlags to also manage predicates for compares. This
allows removing the custom ICmpULE opcode from VPInstruction which was a
workaround for missing proper predicate handling.

This simplifies the code a bit while also allowing compares with any
predicates. It also fixes a case where the compare predicate wasn't
printed properly for VPReplicateRecipes.

Discussed/split off from D150398.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D158992
2023-09-02 21:45:24 +01:00
Florian Hahn
32cb8f519e
[VPlan] Generalize variable names for ICmpULE operands (NFC)
ICmp codegen for VPInstruction will be extended for other predicates,
and the operands could be any values (not just IV and TC as implied by
the names). Suggested cleanup from D150398.
2023-08-28 15:47:04 +01:00
Florian Hahn
34d25924c4
[VPlan] Mark some VPInstruction opcodes as not having side effects.
Mark some VPInstruction opcodes as not having side effects, preparation
for D157037.
2023-08-22 20:05:57 +01:00
Florian Hahn
56f5738d85
[LV] Move induction ::execute impls to VPlanRecipes.cpp (NFC).
All dependencies on code from LoopVectorize.cpp have been
removed/refactored. Move the ::execute implementations to the other recipe
definitions in VPlanRecipes.cpp.
2023-08-20 21:00:05 +01:00
Florian Hahn
ada2a455fc
[VPlan] Use VPBasicBlock to get incoming block for exit phi fixup (NFC)
Retrieve block via VPlan infrastructure as suggested as independent
cleanup in D150398.
2023-08-17 18:17:45 +01:00
Mel Chen
463e7cb892 [LV][VPlan] Refactor VPReductionRecipe to use reference for member RdxDesc
This commit refactors the implementation of VPReductionRecipe to use
reference instead of pointer for member RdxDesc. Because the member
RdxDesc in VPReductionRecipe should not be a nullptr, using a reference
will provide clearer semantics.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D158058
2023-08-16 19:37:49 -07:00
Florian Hahn
aacaf3d580
[VPlan] Simplify VPDerivedIV truncation handling (NFCI).
Address post-commit simplification suggestion for 8a56179bcd8c: Replace
IsTruncated by conditionally setting TruncResultTy only if truncation
is required.
2023-08-14 17:33:10 +01:00
Florian Hahn
8a56179bcd
[VPlan] Store induction kind & binop directly in VPDerivedIVRecipe (NFC)
Limit the information stored in VPDerivedIVRecipe to the ingredients
really needed.
2023-08-10 10:57:32 +01:00
Florian Hahn
698ae66092
[VPlan] Replace FMF in VPInstruction with VPRecipeWithIRFlags (NFC).
Update VPInstruction to use VPRecipeWithIRFlags to manage FMFs for
VPInstruction.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D157144
2023-08-08 20:13:11 +01:00
Florian Hahn
b6d994de0f
[VPlan] Address post-commit suggestions for af635a554 (NFC).
2023-08-08 12:59:34 +01:00
Florian Hahn
af635a5547
[VPlan] Model wrap flags directly, remove *NUW opcodes (NFC)
Model wrap flags directly using VPRecipeWithIRFlags and clean up the
duplicated *NUW opcodes.

D157144 will build on this and also model FMFs for VPInstruction.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D157194
2023-08-08 12:12:30 +01:00
Florian Hahn
93c5bae00e
[VPlan] Use printOperands for VPInstruction.
Use the printOperands for printing VPInstruction's operands to be more
in line with other recipes and ensure consistent printing after D15719.

Also removes some stray spaces in print output.
2023-08-08 11:31:21 +01:00
Florian Hahn
0b17e9d285
[VPlan] Move VPRecipeWithIRFlags::getFastMathFlags. (NFCI)
Split off suggested refactoring from D157144. Also adds an assert to make
sure this is only used when OpType is FPMathOp.
2023-08-07 12:35:53 +01:00
Mel Chen
425e9e81a0 [LV] Rename the Select[I|F]Cmp reduction pattern to [I|F]AnyOf. (NFC)
Regarding this NFC change, please refer to the discussion in this thread. https://reviews.llvm.org/D150851#4467261

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D155786
2023-08-03 00:37:19 -07:00
Florian Hahn
2265bb064b
[LV] Update generateInstruction to return produced value (NFC).
Update generateInstruction to return the produced value instead of
setting it for each opcode. This reduces the amount of duplicated code
and is a preparation for D153696.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D154240
2023-07-05 19:53:59 +01:00
Florian Hahn
0a246a0c72
[LV] Use VPValues when creating GEP with all invariant indices.
Update VPWidenGEPRecipe::execute to use the VPValue operands of the
recipe when creating the GEP instruction.

Fixes #63340.
2023-06-16 16:14:01 +01:00
Florian Hahn
8f781b96e2
Revert "[VPlan] Mark recurrence recipes as not having side-effects."
This reverts commit 02369b75fdd7b5fc5d9b47f1b60587c225918511.

At the moment, live-outs used *only* for the resume values in the scalar
loop are not modeled in VPlan yet. This means first-order recurrence
recipes could be removed, when a scalar epilogue is required and the
only use of a FOR is outside the loop.

Keep treating recurrence recipes as having side-effects for now, to
avoid them being removed.

Fixes #62954.
2023-06-06 11:35:26 +02:00
Florian Hahn
299f0ff60e
[VPlan] Print IR flags for VPRecipeWithIRFlags.
Now that IR flags are modeled as part of VPRecipeWithIRFlags, include
the flags when printing recipes.

Depends on D150027.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D150029
2023-05-23 20:36:16 +01:00
Florian Hahn
8eaf7a75fe
[VPlan] Add missing ifdef after 96686796f606.
Fixes build with debug printing disabled.
2023-05-22 10:44:17 +01:00
Florian Hahn
96686796f6
[VPlan] Move live-out printing to VPLiveOut::print (NFC).
Preparation for D150398. This brings live-out printing in line with how
printing for recipes is handled.
2023-05-22 09:53:53 +01:00
Florian Hahn
236a0e82df
[LV] Use VPValue to get expanded value for SCEV step expressions.
Update skeleton creation logic to use SCEV expansion results from
expanding the pre-header. This avoids another set of SCEV expansions
that may happen after the CFG has been modified.

Fixes #58811.

Depends on D147964.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147965
2023-05-11 16:49:19 +01:00
Florian Hahn
c096e91735
[VPlan] Address missed suggestions from D149082.
This addresses 2 comments missed from D149082. It sets inbounds directly
when creating the GEP and fixes the order in the enum.
2023-05-09 15:17:20 +01:00
Florian Hahn
5f3343985b
[VPlan] Use VPRecipeWithIRFlags for VPWidenGEPRecipe (NFCI).
Extend VPRecipeWithIRFlags to also include InBounds and use for VPWidenGEPRecipe.

The last remaining recipe that needs updating for
MayGeneratePoisonRecipes is VPReplicateRecipe.

Depends on D149081.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D149082
2023-05-09 12:33:28 +01:00
Florian Hahn
127b00b25c
[VPlan] Record IR flags on VPWidenRecipe directly (NFC).
This patch introduces a VPRecipeWithIRFlags class to record various IR
flags for a recipe. This allows de-coupling of IR flags from the
underlying instructions. The main benefit is that it allows dropping of
IR flags from recipes directly, without the need to go through
State::MayGeneratePoisonRecipes. The plan is to remove
MayGeneratePoisonRecipes once all relevant recipes are transitioned.

It also allows dropping IR flags during VPlan-to-VPlan transforms, which
will be used in a follow-up patch to implement truncateToMinimalBitwidths
as VPlan-to-VPlan transform.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D149079
2023-05-08 17:28:50 +01:00
Florian Hahn
01fa764c9a
[VPlan] Assert instead of check if VF is vector when widening GEPs(NFC)
VPWidenGEPRecipe should not be generated for scalar VFs. Replace
check with an assert.
2023-05-06 09:25:56 +01:00
Florian Hahn
8bd02e5aef
[VPlan] Assert instead of checking if VF is vec when widening calls (NFC)
VPWidenCallRecipe should not be generated for scalar VFs. Replace check
with an assert.
2023-05-05 18:21:57 +01:00
Florian Hahn
e3afe0b89d
[VPlan] Add VPWidenCastRecipe, split off from VPWidenRecipe (NFCI).
To generate cast instructions, the result type is needed. To allow
creating widened casts without underlying instruction, introduce a new
VPWidenCastRecipe that also holds the result type.

This functionality will be used in a follow-up patch to
implement truncateToMinimalBitwidths as VPlan-to-VPlan transform.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D149081
2023-05-05 13:20:16 +01:00
Florian Hahn
29712ccda6
[VPlan] Assert instead of check if VF is vector when widening casts.
VPWidenRecipes should not be generated for scalar VFs. Replace check
with an assert. Suggested in preparation for D149081.
2023-05-05 09:02:33 +01:00
Florian Hahn
1b05e74982
[VPlan] Reorder cases in switch (NFC).
Reorder cases to make sure they are ordered properly in preparation
for D149081.
2023-05-04 21:40:22 +01:00
Florian Hahn
b85a402dd8
[VPlan] Introduce new entry block to VPlan for early SCEV expansion.
This patch adds a new preheader block to the VPlan to place SCEV
expansions like the trip count. This preheader block is disconnected
at the moment, as the bypass blocks of the skeleton are not yet modeled
in VPlan.

The preheader block is executed before skeleton creation, so the SCEV
expansion results can be used during skeleton creation. At the moment,
the trip count expression and induction steps are expanded in the new
preheader. The remainder of SCEV expansions will be moved gradually in
the future.

D147965 will update skeleton creation to use the steps expanded in the
pre-header to fix #58811.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147964
2023-05-04 14:00:13 +01:00
Jay Foad
593e25ffae [Vectorize] Fix vectorization, scalarization and folding of llvm.is.fpclass
llvm.is.fpclass is different from other vectorizable intrinsics in that
it is overloaded on an argument type, not on the return type.

Differential Revision: https://reviews.llvm.org/D148905
2023-04-24 13:42:08 +01:00
Florian Hahn
02369b75fd
[VPlan] Mark recurrence recipes as not having side-effects.
Add support for FirstOrderRecurrenceSplice and VPFirstOrderRecurrencePHI
recipes to mayHaveSideEffects. Neither of them has side-effects.
2023-04-17 12:30:52 +01:00
Florian Hahn
2db031528e
[VPlan] Check VPValue step in isCanonical (NFCI).
Update the isCanonical() implementations to check the VPValue step
operand instead of the step in the induction descriptor.

At the moment this is NFC, but it enables further optimizations if the
step is replaced by a constant in D147783.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147891
2023-04-16 14:48:03 +01:00
Florian Hahn
9be8d90e62
[VPlan] Add VPWidenSelectRecipe::getCond() (NFC).
Add helper to access condition, as suggested in D144489.
2023-03-10 17:49:23 +01:00
Florian Hahn
54558fd8f3
[VPlan] Replace InvariantCond field from VPWidenSelectRecipe.
There is no need to store information about invariance in the recipe.
Replace the fields with checks of the operands using
isDefinedOutsideVectorRegions.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D144489
2023-03-10 15:28:43 +01:00
Florian Hahn
a8adb38a96
[VPlan] Replace invariance fields from VPWidenGEPRecipe.
There is no need to store information about invariance in the recipe.
Replace the fields with checks of the operands using
isDefinedOutsideVectorRegions.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D144487
2023-03-09 17:52:22 +01:00
Nikita Popov
ffe8f47d72 [IR] Add operator<< overload for CmpInst::Predicate (NFC)
I regularly try and fail to use this while debugging.
2023-03-07 15:10:56 +01:00
Florian Hahn
be968dbeee
[VPlan] VPWidenCallRecipe has side-effects if the call has.
Handle VPWidenCallRecipe in VPRecipeBase::mayHaveSideEffects by
delegating to the underlying call.
2023-03-05 12:08:56 +01:00
Sander de Smalen
fe1b51ffee [LoopVectorize] Remove runtime check and scalar tail loop when tail-folding.
When using tail-folding and using the predicate for both data and control-flow
(the next vector iteration's predicate is generated with the llvm.active.lane.mask
intrinsic and then tested for the backedge), the LoopVectorizer still inserts a
runtime check to see if the 'i + VF' may at any point overflow for the given
trip-count. When it does, it falls back to a scalar epilogue loop.

We can get rid of that runtime check in the pre-header and therefore also
remove the scalar epilogue loop. This reduces code-size and avoids a runtime
check.

Consider the following loop:

  void foo(char * __restrict__ dst, char *src, unsigned long N) {
      for (unsigned long  i=0; i<N; ++i)
          dst[i] = src[i] + 42;
  }

If 'N' is e.g. ULONG_MAX, and the VF > 1, then the loop iteration counter
will overflow when calculating the predicate for the next vector iteration
at some point, because LLVM does:

  vector.ph:
    %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)

  vector.body:
    %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
    %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ]
    ...

    %index.next = add i64 %index, 16
      ; The add above may overflow, which would affect the lane mask and control flow. Hence a runtime check is needed.
    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index.next, i64 %N)
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7
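
The wrap-around described above can be reproduced with a small simulation. This is only a sketch under stated assumptions: it uses a hypothetical 8-bit counter in place of i64 so the overflow is easy to trigger, a fixed VF of 16 (vscale is ignored), and the helper names are made up for illustration. The mask helper follows the documented semantics of llvm.get.active.lane.mask (lane i is active iff base + i < n, with no overflow inside the intrinsic itself).

```python
BITS = 8                   # assumption: narrow counter standing in for i64
WRAP = (1 << BITS) - 1     # all-ones mask used to model wrap-around
VF = 16                    # assumption: fixed vector factor, vscale ignored


def active_lane_mask(base, n, vf=VF):
    """Lane i is active iff base + i < n (evaluated without overflow)."""
    return [base + i < n for i in range(vf)]


def old_scheme(n, max_iters=64):
    """Walk the original loop: increment first, then compute the next mask.

    Returns (visited index values, whether the loop terminated).
    """
    index = 0
    visited = []
    for _ in range(max_iters):
        visited.append(index)
        index_next = (index + VF) & WRAP          # this add may wrap
        mask_next = active_lane_mask(index_next, n)
        if not mask_next[0]:                      # extractelement lane 0
            return visited, True
        index = index_next
    return visited, False                         # gave up: no termination


if __name__ == "__main__":
    print(old_scheme(20))        # small trip count: terminates normally
    visited, done = old_scheme(255)
    print(done, visited[14:18])  # large trip count: index wraps back to 0
```

With the trip count at the maximum representable value, the incremented index wraps to 0, lane 0 of the next mask is active again, and the loop restarts over elements it already processed. That is the case the runtime check guarded against.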

The solution:

What we can do instead is calculate the predicate before incrementing
the loop iteration counter, such that the llvm.active.lane.mask is
calculated from 'i' to 'tripcount > VF ? tripcount - VF : 0', i.e.

  vector.ph:
    %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)
    %N_minus_VF = select %N > 16 ? %N - 16 : 0

  vector.body:
    %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
    %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ]
    ...

    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index, i64 %N_minus_VF)
    %index.next = add i64 %index, %4
      ; The add above may still overflow, but this time the active.lane.mask is not affected
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7

For N = 20, we'd then get:

  vector.ph:
    %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)
      ; %active.lane.mask.entry = <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1>
    %N_minus_VF = select 20 > 16 ? 20 - 16 : 0
      ; %N_minus_VF = 4

  vector.body: (1st iteration)
    ... ; using <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1> as predicate in the loop
    ...
    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 4)
      ; %active.lane.mask.next = <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
    %index.next = add i64 0, 16
      ; %index.next = 16
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
      ; %8 = 1
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7
      ; branch to %vector.body

  vector.body: (2nd iteration)
    ... ; using <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> as predicate in the loop
    ...
    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 16, i64 4)
      ; %active.lane.mask.next = <0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
    %index.next = add i64 16, 16
      ; %index.next = 32
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
      ; %8 = 0
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7
      ; branch to %for.cond.cleanup
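
The N = 20 walk-through above can be checked with a short simulation of the reordered scheme. This is a sketch rather than the vectorizer's implementation: it assumes a fixed VF of 16 (vscale is ignored), uses a hypothetical 8-bit counter so that the wrap-around case can be exercised, and the function names are invented for illustration.

```python
BITS = 8                   # assumption: narrow counter standing in for i64
WRAP = (1 << BITS) - 1     # all-ones mask used to model wrap-around
VF = 16                    # assumption: fixed vector factor, vscale ignored


def active_lane_mask(base, n, vf=VF):
    """Lane i is active iff base + i < n (evaluated without overflow)."""
    return [base + i < n for i in range(vf)]


def new_scheme(n, max_iters=64):
    """Walk the reordered loop: compute the next mask from the current
    index against N - VF (clamped at 0), then increment.

    Returns (list of (index, active lanes used), whether it terminated).
    """
    n_minus_vf = n - VF if n > VF else 0
    index = 0
    mask = active_lane_mask(0, n)                 # %active.lane.mask.entry
    trace = []
    for _ in range(max_iters):
        trace.append((index, sum(mask)))
        mask_next = active_lane_mask(index, n_minus_vf)
        index = (index + VF) & WRAP               # may wrap, mask unaffected
        if not mask_next[0]:                      # extractelement lane 0
            return trace, True
        mask = mask_next
    return trace, False


if __name__ == "__main__":
    print(new_scheme(20))                         # two iterations: 16 + 4 lanes
    trace, done = new_scheme(255)
    print(done, len(trace), sum(lanes for _, lanes in trace))
```

For N = 20 the trace shows a full first iteration and four active lanes in the second, matching the listing. For N = 255 with the 8-bit counter, the index still wraps on the final increment, but the loop has already decided to exit and every element is covered exactly once.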

Reviewed By: fhahn, david-arm

Differential Revision: https://reviews.llvm.org/D142109
2023-03-01 09:01:19 +00:00
Florian Hahn
c21ccebe6f
[VPlan] Use usesScalars in shouldPack.
Suggested by @Ayal as follow-up improvement in D143864.

I was unable to find a case where this actually changes generated code,
but it unifies the code by using common infrastructure.
2023-02-20 14:11:40 +00:00