1754 Commits

Author SHA1 Message Date
Florian Hahn
bf0bd85f9d
[LV] Move trunc codegen to buildScalarSteps (NFCI).
This moves the code to truncate step and IV into buildScalarSteps,
closer to the place where they are actually used.

Suggested in D133758.
2022-11-26 23:48:46 +00:00
Florian Hahn
12bb5535d2
[VPlan] Move cast codegen to emitTransformedIndex (NFCI).
This reduces duplication a bit.

Suggested as simplification in D133758.
2022-11-26 22:47:13 +00:00
Florian Hahn
ed2fdace89
[LV] Use separate index to access StoredValues in vectorizeInterleave.
StoredValues only has entries for members of the interleave group. If
there are gaps, then using the index i here will either access a wrong
entry or be out-of-bounds.

Instead use a dedicated index that only gets incremented for members of
the interleave group.

Fixes #59090.
2022-11-25 15:28:05 +00:00
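The indexing bug above can be shown with a small standalone model. This is a minimal sketch that assumes an interleave group represented as a hypothetical vector of member flags (false marks a gap) and a `StoredValues` vector with one entry per actual member. It is not the LLVM code. Indexing `StoredValues` with the group position `I` would read the wrong entry, or read out of bounds, once a gap appears. The dedicated index only advances on members.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model: Members[I] is true if position I of the interleave
// group has a member and false if it is a gap. StoredValues holds one
// entry per *member* only, so it can be shorter than Members.
std::vector<std::optional<int>>
collectStoredValues(const std::vector<bool> &Members,
                    const std::vector<int> &StoredValues) {
  std::vector<std::optional<int>> Result;
  unsigned StoredIdx = 0; // dedicated index: advances only for members
  for (unsigned I = 0; I < Members.size(); ++I) {
    if (!Members[I]) {
      Result.push_back(std::nullopt); // gap: consumes no stored value
      continue;
    }
    assert(StoredIdx < StoredValues.size() && "more members than values");
    Result.push_back(StoredValues[StoredIdx++]);
  }
  return Result;
}
```

With members `{true, false, true}` and stored values `{10, 20}`, using `I` directly would access `StoredValues[2]` and go out of bounds, while the separate index yields `10`, a gap, and `20`.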
Fangrui Song
fa36d72305 [LoopVectorize] Internalize some cl::opt 2022-11-23 23:03:02 -08:00
Bjorn Pettersson
1c308d6641 [LV] Clean up LoopVectorizationCostModel::calculateRegisterUsage. NFC
Minor refactoring in LoopVectorizationCostModel::calculateRegisterUsage.

Also add some FIXMEs about what appear to be shortcomings in how the
register usage is calculated.

Differential Revision: https://reviews.llvm.org/D138342
2022-11-20 20:52:13 +01:00
Florian Hahn
55f56cdc33
[VPlan] Introduce VPValue::hasDefiningRecipe helper (NFC).
This clarifies the intention of code that uses the helper.

Suggested by @Ayal during review of D136068, thanks!
2022-11-16 23:12:40 +00:00
Florian Hahn
239b52d4b6
[VPlan] Update stale comment (NFC).
Update comment to reflect current code, which also allows for
VPScalarIVStepsRecipes to be uniform.

Suggested by @Ayal during review of D136068, thanks!
2022-11-16 22:39:50 +00:00
Florian Hahn
bcc9c5d959
[LV] Replace unnecessary cast_or_null with cast (NFC).
The existing code already unconditionally dereferences RepR, so
cast_or_null can be replaced by just cast.

Suggested by @Ayal during review of D136068, thanks!
2022-11-16 22:31:59 +00:00
Florian Hahn
32f1c5531b
[VPlan] Update VPValue::getDef to return VPRecipeBase, adjust name(NFC)
The return value of getDef is guaranteed to be a VPRecipeBase and all
users can also accept a VPRecipeBase *. Most users actually cast to
VPRecipeBase or a specific recipe before using it, so this change
removes a number of redundant casts.

Also rename it to getDefiningRecipe to make the name a bit clearer.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D136068
2022-11-16 22:12:08 +00:00
Jordan Rupprecht
81896f88ce [NFC] Remove unused OrigLoopID vars 2022-11-11 07:51:40 -08:00
Florian Hahn
2d7e5e29b7
[LV] Remove unused OrigLoopID argument from completeLoopSkeleton (NFC).
The argument is not used any longer and can be removed.
2022-11-11 15:39:08 +00:00
LiDongjin
d1cee3539f [LoopVectorize] Fix crash on "Cannot dereference end iterator!"(PR56627)
Check hasOneUser before user_back().

Differential Revision: https://reviews.llvm.org/D136227
2022-11-03 23:13:37 +08:00
Philip Reames
269bc684e7 [LV][RISCV] Disable vectorization of epilogue loops
Epilogue loop vectorization is a feature in the vectorizer intended to avoid running fully scalar code when the vector length of the main loop turns out to be either longer than the trip count of the actual loop, or leaves a large remainder.

In practice, this feature appears to not have been well tuned. I honestly don't think it should be on by default at all, but it definitely shouldn't be on for RISCV. Note that other targets have also disabled it, but they've done so via disabling interleaving - which is, well, completely unrelated - and we don't want to do that for RISCV.

In the near term, many examples I'm seeing have terrible codegen for epilogue vectorization. We are greatly increasing code size for little value at reasonable VLEN values for small types. In the long term, the cases that epilogue vectorization is intended to handle are likely better handled via tail folding on RISCV.

As an aside, I also don't really trust the correctness of epilogue vectorization. The code structure is such that otherwise straightforward changes sometimes break only epilogue vectorization. The reuse of an existing VPlan without careful validation opens significant room for nasty bugs. Given how rarely the code is exercised, that is not a good combination.

As such, this patch introduces a TTI hook, and completely disables epilogue vectorization on RISCV.

Differential Revision: https://reviews.llvm.org/D136695
2022-10-25 14:28:02 -07:00
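To make the trade-off above concrete, the sketch below splits a trip count between a main vector loop, an optional vector epilogue, and a scalar remainder. It is a simplified arithmetic model under stated assumptions: the helper `splitTripCount` and its parameters are hypothetical, and it ignores cases where the vectorizer must keep at least one scalar iteration.

```cpp
#include <cassert>

// Hypothetical result type: how many iterations each loop executes.
struct IterationSplit {
  unsigned MainIters;   // main vector loop iterations (MainVF * UF elements each)
  unsigned EpiIters;    // vector epilogue iterations (EpiVF elements each)
  unsigned ScalarIters; // scalar remainder iterations
};

// Idealized model: EpiVF == 0 means "no vector epilogue", so the whole
// remainder after the main vector loop runs as scalar code.
IterationSplit splitTripCount(unsigned N, unsigned MainVF, unsigned UF,
                              unsigned EpiVF) {
  assert(MainVF * UF > 0 && "main loop step must be non-zero");
  unsigned MainStep = MainVF * UF;
  unsigned MainIters = N / MainStep;
  unsigned Rem = N - MainIters * MainStep;
  unsigned EpiIters = EpiVF ? Rem / EpiVF : 0;
  unsigned ScalarIters = Rem - EpiIters * EpiVF;
  return {MainIters, EpiIters, ScalarIters};
}
```

For example, with `N = 10` and a main VF of 16, the main loop never runs. An 8-wide epilogue covers 8 elements and leaves 2 scalar iterations, while without an epilogue all 10 run as scalar code. The cost of the extra epilogue code is what the commit argues is not worth paying on RISCV.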
Florian Hahn
e25ed058bc
[LV] Use buildScalarSteps to also handle VF = 1. (NFCI)
The code in buildScalarSteps already properly handles creating the
scalar induction values with VF = 1. Use it directly instead of using
extra code to handle that case.

Suggested by @Ayal in D133760.
2022-10-20 14:30:01 +01:00
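The reason VF = 1 needs no special handling can be seen in the formula for scalar steps. This is a minimal standalone sketch, assuming each unrolled part `P` and lane `L` gets the value `BaseIV + (P * VF + L) * Step`. The function `scalarSteps` is hypothetical and is not the LLVM `buildScalarSteps` signature. With VF = 1 the inner loop runs once per part, so the general code already produces one scalar per part.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of scalar induction steps for UF unrolled parts of
// VF lanes each. Part P, lane L gets BaseIV + (P * VF + L) * Step.
std::vector<std::vector<long>> scalarSteps(long BaseIV, long Step,
                                           unsigned VF, unsigned UF) {
  assert(VF >= 1 && UF >= 1);
  std::vector<std::vector<long>> Parts(UF);
  for (unsigned P = 0; P < UF; ++P)
    for (unsigned L = 0; L < VF; ++L) // VF == 1: a single lane per part
      Parts[P].push_back(BaseIV + static_cast<long>(P * VF + L) * Step);
  return Parts;
}
```

For example, `scalarSteps(0, 1, 1, 3)` yields one value per part (0, 1, 2), which matches what the removed special case for VF = 1 would have computed.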
Florian Hahn
d72fcee8f4
[VPlan] Add VPValue::isDefinedOutsideVectorRegions helper (NFC).
@Ayal suggested a better-named helper than using `!getDef()` to check if
a value is invariant across all parts.

The property we are using here is that the VPValue is defined outside
any vector loop region. There's a TODO left to handle recipes defined in
pre-header blocks.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133666
2022-10-19 13:20:30 +01:00
Craig Topper
d3366efd43 [LV] Simplify register usage code and avoid double map lookups. NFC
Instead of checking whether a map entry exists to decide if we should
initialize it or add to it, we can rely on the map entry being constructed
and initialized to 0 before the addition happens.

For the std::max case, I've made a reference to the map entry to
avoid looking it up twice.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D135977
2022-10-14 11:55:48 -07:00
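The two map idioms described above can be illustrated with `std::map`, whose `operator[]` value-initializes a missing entry (to 0 for `unsigned`). The helpers `addUsage` and `updateMax` are hypothetical illustrations of the pattern, not the LLVM code, which may use LLVM's own map containers.

```cpp
#include <algorithm>
#include <cassert>
#include <map>

// operator[] creates a zero-initialized entry if the key is missing, so no
// separate find()/count() check is needed before adding to it.
void addUsage(std::map<unsigned, unsigned> &Usage, unsigned RegClass,
              unsigned N) {
  Usage[RegClass] += N;
}

// Binding a reference to the entry avoids looking the key up twice
// (once to read the old value and once to write the new maximum).
void updateMax(std::map<unsigned, unsigned> &MaxUsage, unsigned RegClass,
               unsigned N) {
  unsigned &Entry = MaxUsage[RegClass];
  Entry = std::max(Entry, N);
}
```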
Florian Hahn
c1fe52bfa6
[VPlan] Remove dead recipes before sinking.
optimizeInductions may leave dead recipes which can prevent sinking.
Sinking on the other hand should not introduce new dead recipes, so
clean up dead recipes before sinking.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D133762
2022-10-12 12:49:42 +01:00
Florian Hahn
73950f26f5
[LV] Replace check with assert for reduction resume values (NFC).
At this point, we need to have resume values for all inductions. If not,
this would result in silent mis-compiles.
2022-10-08 16:26:10 +01:00
Florian Hahn
825e16969e
[LAA] Pass LoopAccessInfoManager instead of GetLAA function.
Use LoopAccessInfoManager directly instead of various GetLAA lambdas.

Depends on D134608.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D134609
2022-10-04 11:51:25 +01:00
Florian Hahn
7c0ff64b0f
[LAA] Change to function analysis for new PM.
At the moment, LoopAccessAnalysis is a loop analysis for the new pass
manager. The issue with that is that LAI caches SCEV expressions and
modifications in a loop may impact SCEV expressions in other loops, but
we do not have a convenient way to invalidate LAI for other loops
within a loop pipeline.

To avoid this issue, turn it into a function analysis which returns a
manager object that keeps track of the individual LAI objects per loop.

Fixes #50940.

Fixes #51669.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D134606
2022-10-01 15:44:27 +01:00
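The manager design can be sketched in a few lines. The sketch below is illustrative only. `Loop`, `LoopAccessInfo`, and `LoopAccessInfoManagerSketch` are hypothetical stand-ins and do not reproduce the real LLVM classes or their methods. It shows the key property: results are cached per loop inside a single function-level object, so one `clear()` can drop stale results for every loop at once.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical stand-ins for the real IR loop and per-loop analysis result.
struct Loop { int Id; };
struct LoopAccessInfo {
  explicit LoopAccessInfo(const Loop &L) : LoopId(L.Id) {}
  int LoopId; // placeholder for cached per-loop data (e.g. SCEV-based info)
};

// Function-level manager: computes per-loop results lazily and caches them.
class LoopAccessInfoManagerSketch {
  std::map<const Loop *, std::unique_ptr<LoopAccessInfo>> Cache;
  unsigned NumComputed = 0;

public:
  const LoopAccessInfo &getInfo(const Loop &L) {
    auto &Entry = Cache[&L];
    if (!Entry) { // not cached yet (or cleared): compute it now
      Entry = std::make_unique<LoopAccessInfo>(L);
      ++NumComputed;
    }
    return *Entry;
  }
  // Drops the results for all loops, e.g. after a transform in one loop
  // may have changed SCEV expressions that other loops' results depend on.
  void clear() { Cache.clear(); }
  unsigned numComputed() const { return NumComputed; }
};
```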
Florian Hahn
080a1e2bbb
[LV] Create createInductionResumeValue helper (NFC).
Factor out the logic to create induction resume values for a specific
induction. This will be used in D92132 to support widened IVs during
epilogue vectorization.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D134211
2022-09-29 11:13:01 +01:00
serge-sans-paille
16544cbe64 [iwyu] Move <cmath> out of llvm/Support/MathExtras.h
Interestingly, MathExtras.h doesn't use any <cmath> declarations, so move the
include out of that header and include <cmath> where it is needed.

No functional change intended, but there's no longer a transitive include
from MathExtras.h to cmath.
2022-09-28 20:49:01 +02:00
Philip Reames
899ebd7e99 [LV] Remove two unused default arguments [nfc] 2022-09-27 14:33:53 -07:00
Philip Reames
dc7387b587 [LV] Adjust cost model to use uniform store lowering for unpredicated uniform stores
Follow up to D133580; adjust the cost model to prefer uniform store lowering for scalable stores which are unpredicated.

The impact here isn't in the uniform store lowering quality itself. InstCombine happily converts the scatter form into the single store form. The main impact is in letting the rest of the cost model make choices based on the knowledge that the vector will be scalarized on use.

Differential Revision: https://reviews.llvm.org/D134460
2022-09-27 07:28:40 -07:00
Florian Hahn
2c692d891e
[LV] Update handling of scalable pointer inductions after b73d2c8.
The dependent code has changed considerably since 151c144, which
b73d2c8 effectively reverts. As a result, lowering now hits a case where
it no longer expects or supports the pre-151c144 behavior.

Update the code dealing with scalable pointer inductions to also check
for uniformity in combination with isScalarAfterVectorization. This
should ensure scalable pointer inductions are handled properly during
epilogue vectorization.

Fixes #57912.
2022-09-23 18:23:02 +01:00
Philip Reames
32dc1151e2 [VPlan] Only generate single instr for unpredicated stores of varying value to invariant address
This extends the previously added uniform store case to handle stores of loop varying values to a loop invariant address. Note that the placement of this code only allows unpredicated stores; this is important for correctness. (That is "IsPredicated" is always false at this point in the function.)

This patch does not include scalable types. The diff felt "large enough" as it were; I'll handle that in a separate patch. (It requires some changes to cost modeling.)

Differential Revision: https://reviews.llvm.org/D133580
2022-09-22 08:53:46 -07:00
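The reason a single scalar store suffices can be shown with a standalone model. This is a minimal sketch under two assumptions: a trip count that is a multiple of a hypothetical `VF`, and no predication. When every iteration stores to the same address, only the last stored value is observable after the loop. If all lanes are active, the last lane of each vector iteration holds that value. With predication, the last lane might be inactive, which is why the commit limits this to unpredicated stores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar semantics: every iteration stores A[I] to the same address *P.
void scalarLoop(const std::vector<int> &A, int *P) {
  for (int V : A)
    *P = V;
}

// Vectorized model (hypothetical): VF lanes per iteration, all lanes active
// (unpredicated), trip count a multiple of VF. Instead of one store per
// lane, emit a single scalar store of the last lane's value.
void vectorizedLoop(const std::vector<int> &A, int *P, unsigned VF) {
  assert(VF > 0 && A.size() % VF == 0);
  for (std::size_t I = 0; I < A.size(); I += VF)
    *P = A[I + VF - 1];
}
```

Both versions leave the same final value at the invariant address.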
Florian Hahn
dcbc8a0daa
[LV] Remove unused widenCallInstruction declaration (NFC).
The definition and uses have been removed a while ago. Clean up the
unused declaration.
2022-09-20 15:20:28 +01:00
Florian Hahn
582f8ef19f
[LV] Keep track of cost-based ScalarAfterVec in VPWidenPointerInd.
Epilogue vectorization uses isScalarAfterVectorization to check if
widened versions for inductions need to be generated and bails out in
those cases.

At the moment, there are scenarios where isScalarAfterVectorization
returns true but VPWidenPointerInduction::onlyScalarsGenerated would
return false, causing widening.

This can lead to widened phis with incorrect start values being created
in the epilogue vector body.

This patch addresses the issue by storing the cost-model decision in
VPWidenPointerInductionRecipe and restoring the behavior before 151c144.
This effectively reverts 151c144, but the long-term fix is to properly
support widened inductions during epilogue vectorization.

Fixes #57712.
2022-09-19 18:14:35 +01:00
Craig Topper
90a004b4a1 [LV] Remove FIXME about NoImplicitFloat. NFC
My understanding is that NoImplicitFloat, despite its name, is
supposed to disable all vectors, not just float vectors.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D134084
2022-09-19 10:01:02 -07:00
Florian Hahn
ac80b0e84f
[LV] Mark Instr as const in scalarizeInstruction. (NFC).
This is to reduce the diff in follow-up changes.
2022-09-13 09:10:02 +01:00
Florian Hahn
69d9bb2aad
[VPlan] Check recipe uses instead of type of underlying instr (NFC).
Suggested by @Ayal post-commit, to reduce the dependence on the
underlying instruction in favor of information available directly for
the recipe.
2022-09-11 12:24:44 +01:00
Florian Hahn
da734473fa
[LV] Remove now dead variable after 2a78890b7b7f08 (NFC). 2022-09-09 20:25:55 +01:00
Florian Hahn
2a78890b7b
[VPlan] Move SCEV expansion for pointer induction to VPExpandSCEV (NFC).
Use VPExpandSCEVRecipe to expand the step of pointer inductions. This
cleanup addresses a corresponding FIXME.

It should be NFC, as steps for pointer induction must be constants,
which makes expansion trivial.
2022-09-09 19:20:13 +01:00
Philip Reames
a33d98e20a [LV] Pull out common expression [nfc] 2022-09-09 07:31:46 -07:00
Philip Reames
edb26268ce [VPlan] Only generate single instr for stores uniform across all parts.
Extend the approach taken by D133019 to store instructions.

Differential Revision: https://reviews.llvm.org/D133497
2022-09-09 07:15:12 -07:00
Philip Reames
4c4c0d2c06 [LV] Use safe-divisor lowering for fixed vectors if profitable
This extends the safe-divisor widening scheme recently added for scalable vectors to handle fixed vectors as well.

Differential Revision: https://reviews.llvm.org/D132591
2022-09-08 09:15:54 -07:00
Florian Hahn
422cf99161
[VPlan] Only generate single instr for loads uniform across all parts.
VPReplicateRecipe::isUniform actually means uniform per part, hence a
scalar instruction is generated per part.

This is a potential alternative to D132892. For now the current patch only
catches cases where the address is trivially invariant (defined outside
VPlan), while D132892 catches any address that is considered invariant
by SCEV AFAICT.

It should be possible to hoist fully invariant recipes feeding loads out
of the vector loop region as well, but in practice LICM should do that
already.

This version of the patch artificially limits this to loads to make it
easier to compare, but this restriction should be easily liftable.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133019
2022-09-08 14:27:58 +01:00
Florian Hahn
408ebe5e3a
[VPlan] Move VPWidenCallRecipe to VPlanRecipes.cpp (NFC).
Depends on D132585.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D132586
2022-09-05 10:48:29 +01:00
Florian Hahn
fc444ddc77
[VPlan] Add field to track if intrinsic should be used for call. (NFC)
This patch moves the cost-based decision whether to use an intrinsic or
library call to the point where the recipe is created. This untangles
code-gen from the cost model and also avoids doing some extra work as
the information is already computed at construction.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D132585
2022-09-01 13:14:40 +01:00
Philip Reames
8936d86469 [LV] Add debug output for force scalar tracing [nfc]
I keep finding myself needing to rule this out as a possible source of scalarization, so add debug output like we have for other instructions we decide to scalarize.
2022-08-29 15:17:51 -07:00
Florian Hahn
c78696813f
[LV] Remove unneeded getVectorIntrinsicIDForCall call (NFC).
Suggested as independent fix during the review of D132585.
2022-08-29 10:19:47 +01:00
Kazu Hirata
56ea4f9bd3 [Transforms] Qualify auto in range-based for loops (NFC)
Identified with readability-qualified-auto.
2022-08-27 21:21:02 -07:00
Kazu Hirata
a33ef8f2b7 Use llvm::all_equal (NFC) 2022-08-27 09:53:10 -07:00
Philip Reames
3dcec5e29f [LV] Consistently use vputils::isUniformAfterVectorization [mostly nfc]
I'd extracted isUniform, and Florian moved isUniformAfterVectorization out of VPlan at basically the same time. Let's go ahead and merge them.

For the VPTransformState::get path, a VPValue without a def (which corresponds to an external IR value outside of VPlan) is explicitly handled above the uniform check. On the scalarizeInstruction path, I'm less sure why the change isn't visible, but test cases which would seem likely to hit it were already being handled as uniform through some other mechanism. It would be correct to consider values defined outside of VPlan uniform here.
2022-08-26 11:09:17 -07:00
Philip Reames
2d5f025779 [LV] Extract utility for checking if VPValue is uniform [nfc] 2022-08-26 09:56:13 -07:00
Daniil Fukalov
9c710ebbdb [TTI] NFC: Reduce InstructionCost::getValue() usage...
in order to propagate `InstructionCost` values further up.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D103406
2022-08-26 16:37:32 +03:00
Philip Reames
23245a914b [LV] Simplify code given isPredicatedInst doesn't dependent on VF any more [nfc] 2022-08-24 11:42:10 -07:00
Philip Reames
3ab00cfca9 [LV] Adjust code added in f79214d1 for 531dd3634 [nfc]
When rebasing the review which became f79214d1, I forgot to adjust for the changed semantics introduced by 531dd3634.  Functionally, this had no impact, but semantically it resulted in an incorrect result for isPredicatedInst.  I noticed this while doing a follow up change.
2022-08-24 10:38:17 -07:00
Philip Reames
f79214d1e1 [LV] Support predicated div/rem operations via safe-divisor select idiom
This patch adds support for vectorizing conditionally executed div/rem operations via a variant of widening. The existing support for predicated divrem in the vectorizer requires scalarization which we can't do for scalable vectors.

The basic idea is that we can always divide (take remainder) by 1 without triggering UB. As such, we can use the active lane mask to conditionally select either the actual divisor for active lanes, or a constant one for inactive lanes. We already account for the cost of the active lane mask, so the only additional cost is a splat of one and the vector select. This is one of several possible approaches to this problem; see the review thread for discussion on some of the others. This one was chosen mostly because it was straightforward, and none of the others seemed obviously better.

I enabled the new code only for scalable vectors. We could also legally enable it for fixed vectors as well, but I haven't thought through the cost tradeoffs between widening and scalarization enough to know if that's profitable. This will be explored in future patches.

Differential Revision: https://reviews.llvm.org/D130164
2022-08-24 10:07:59 -07:00
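The safe-divisor select idiom can be modeled lane by lane with a `std::array` standing in for a vector register. This is a minimal sketch and not the vectorizer's IR. `safeDivide` is a hypothetical helper, and the mask plays the role of the active lane mask. Inactive lanes divide by 1, so the division can run unconditionally on every lane. Their results are simply not used afterwards.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Lane-wise model: Divisor = select(Mask, B, splat(1)); Result = A / Divisor.
template <std::size_t N>
std::array<int, N> safeDivide(const std::array<int, N> &A,
                              const std::array<int, N> &B,
                              const std::array<bool, N> &Mask) {
  std::array<int, N> Result{};
  for (std::size_t I = 0; I < N; ++I) {
    int Divisor = Mask[I] ? B[I] : 1; // inactive lanes can never divide by 0
    Result[I] = A[I] / Divisor;       // executed on every lane
  }
  return Result;
}
```

For example, with divisors `{2, 0, 5, 0}` and mask `{true, false, true, false}`, the zero divisors sit only in inactive lanes and are replaced by 1, so no lane traps.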
David Green
8d830f8d68 [LV] Replace fixed-order cost model with a SK_Splice shuffle
The existing cost model for fixed-order recurrences models the phi as an
extract shuffle of a v1 vector. The shuffle produced should be a splice,
as it takes two vector inputs and extracts from a subset of their
lanes. On certain architectures the existing cost model can drastically
under-estimate the correct cost for the shuffle, so this changes it to a
SK_Splice and passes a correct Mask through to the getShuffleCost call.

I believe this might be the first use of a SK_Splice shuffle cost model
outside of scalable vectors, and some targets may require additions to
the cost model to correctly account for them. In-tree targets appear to
all have been updated where needed.

Differential Revision: https://reviews.llvm.org/D132308
2022-08-24 13:00:32 +01:00
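The commit does not spell out the mask, so the sketch below assumes the usual splice for a fixed-order recurrence. For a phi that reads the previous iteration's value, each vector iteration needs the last lane of the previous vector followed by the first VF-1 lanes of the current one. Indexing into `concat(Prev, Cur)`, that gives the mask `VF-1, VF, ..., 2*VF-2`. `spliceMask` and `applyShuffle` are hypothetical helpers, not LLVM APIs.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Splice mask for VF lanes, indexing into concat(Prev, Cur):
// 0..VF-1 select from Prev, VF..2*VF-1 select from Cur.
std::vector<int> spliceMask(int VF) {
  assert(VF > 0);
  std::vector<int> Mask(VF);
  std::iota(Mask.begin(), Mask.end(), VF - 1); // VF-1, VF, ..., 2*VF-2
  return Mask;
}

// Applies a two-input shuffle mask, as a shufflevector would.
std::vector<int> applyShuffle(const std::vector<int> &Prev,
                              const std::vector<int> &Cur,
                              const std::vector<int> &Mask) {
  assert(Prev.size() == Cur.size());
  int VF = static_cast<int>(Prev.size());
  std::vector<int> Out;
  for (int Idx : Mask)
    Out.push_back(Idx < VF ? Prev[Idx] : Cur[Idx - VF]);
  return Out;
}
```

Because the result draws from both inputs, it is a splice rather than an extract from a single vector. This is why costing it as `SK_Splice` with the real mask is more accurate.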