1743 Commits

Author SHA1 Message Date
LiDongjin
d1cee3539f [LoopVectorize] Fix crash on "Cannot dereference end iterator!"(PR56627)
Check hasOneUser before user_back().

Differential Revision: https://reviews.llvm.org/D136227
2022-11-03 23:13:37 +08:00
Philip Reames
269bc684e7 [LV][RISCV] Disable vectorization of epilogue loops
Epilogue loop vectorization is a feature in the vectorize intended to avoid running fully scalar code when the vector length of the main loop turns out to be either longer than the trip count of the actual loop, or with a huge remainder.

In practice, this feature appears to not have been well tuned. I honestly don't think it should be on by default at all, but it definitely shouldn't be on for RISCV. Note that other targets have also disabled it, but they've done so via disabling interleaving - which is, well, completely unrelated - and we don't want to do that for RISCV.

In the near term, many examples I'm seeing have terrible codegen for epilogue vectorization. We are greatly increasing code size for little value at reasonable VLEN values for small types. In the long term, the cases that epilogue vectorization are intended to handle are likely better handled via tail folding on RISCV.

As an aside, I also don't really trust the correctness of epilogue vectorization. The code structure is such that otherwise straight forward changes sometimes break only epilogue vectorization. The reuse of an existing vplan without careful validation opens significant room for nasty bugs. Given how rarely the code is exercised, that is not a good combination.

As such, this patch introduces a TTI hook, and completely disables epilogue vectorization on RISCV.

Differential Revision: https://reviews.llvm.org/D136695
2022-10-25 14:28:02 -07:00
Florian Hahn
e25ed058bc
[LV] Use buildScalarSteps to also handle VF = 1. (NFCI)
The code in buildScalarSteps already properly handles creating the
scalar induction values with VF = 1. Use it directly instead of using
extra code to handle that case.

Suggested by @Ayal in D133760.
2022-10-20 14:30:01 +01:00
Florian Hahn
d72fcee8f4
[VPlan] Add VPValue::isDefinedOutsideVectorRegions helper (NFC).
@Ayal suggested a better named helper than using `!getDef()` to check if
a value is invariant across all parts.

The property we are using here is that the VPValue is defined outside
any vector loop region. There's a TODO left to handle recipes defined in
pre-header blocks.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133666
2022-10-19 13:20:30 +01:00
Craig Topper
d3366efd43 [LV] Simplify register usage code and avoid double map lookups. NFC
Instead of checking whether a map entry exists to decide if we should
initialize it or add to it, we can rely on the map entry being constructed
and initialized to 0 before the addition happens.

For the std::max case, I've made a reference to the map entry to
avoid looking it up twice.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D135977
2022-10-14 11:55:48 -07:00
Florian Hahn
c1fe52bfa6
[VPlan] Remove dead recipes before sinking.
optimizeInductions may leave dead recipes which can prevent sinking.
Sinking on the other hand should not introduce new dead recipes, so
clean up dead recipes before sinking.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D133762
2022-10-12 12:49:42 +01:00
Florian Hahn
73950f26f5
[LV] Replace check with assert for reduction resume values (NFC).
At this point, we need to have resume values for all inductions. If not,
this would result in silent mis-compiles.
2022-10-08 16:26:10 +01:00
Florian Hahn
825e16969e
[LAA] Pass LoopAccessInfoManager instead of GetLAA function.
Use LoopAccessInfoManager directly instead of various GetLAA lambdas.

Depends on D134608.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D134609
2022-10-04 11:51:25 +01:00
Florian Hahn
7c0ff64b0f
[LAA] Change to function analysis for new PM.
At the moment, LoopAccessAnalysis is a loop analysis for the new pass
manager. The issue with that is that LAI caches SCEV expressions and
modifications in a loop may impact SCEV expressions in other loops, but
we do not have a convenient way to invalidate LAI for other loops
withing a loop pipeline.

To avoid this issue, turn it into a function analysis which returns a
manager object that keeps track of the individual LAI objects per loop.

Fixes #50940.

Fixes #51669.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D134606
2022-10-01 15:44:27 +01:00
Florian Hahn
080a1e2bbb
[LV] Create createInductionResumeValue helper (NFC).
Factor out the logic to create induction resume values for a specific
induction. This will be used in D92132 to support widened IVs during
epilogue vectorization.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D134211
2022-09-29 11:13:01 +01:00
serge-sans-paille
16544cbe64 [iwyu] Move <cmath> out of llvm/Support/MathExtras.h
Interestingly, MathExtras.h doesn't use <cmath> declaration, so move it out of
that header and include it when needed.

No functional change intended, but there's no longer a transitive include
fromMathExtras.h to cmath.
2022-09-28 20:49:01 +02:00
Philip Reames
899ebd7e99 [LV] Remove two unused default arguments [nfc] 2022-09-27 14:33:53 -07:00
Philip Reames
dc7387b587 [LV] Adjust cost model to use uniform store lowering for unpredicated uniform stores
Follow up to D133580; adjust the cost model to prefer uniform store lowering for scalable stores which are unpredicated.

The impact here isn't in the uniform store lowering quality itself. InstCombine happily converts the scatter form into the single store form. The main impact is in letting the rest of the cost model make choices based on the knowledge that the vector will be scalarized on use.

Differential Revision: https://reviews.llvm.org/D134460
2022-09-27 07:28:40 -07:00
Florian Hahn
2c692d891e
[LV] Update handling of scalable pointer inductions after b73d2c8.
The dependent code has been changed quite a lot since 151c144 which
b73d2c8 effectively reverts. Now we run into a case where lowering
didn't expect/support the behavior pre 151c144 any longer.

Update the code dealing with scalable pointer inductions to also check
for uniformity in combination with isScalarAfterVectorization. This
should ensure scalable pointer inductions are handled properly during
epilogue vectorization.

Fixes #57912.
2022-09-23 18:23:02 +01:00
Philip Reames
32dc1151e2 [VPlan] Only generate single instr for unpredicated stores of varying value to invariant address
This extends the previously added uniform store case to handle stores of loop varying values to a loop invariant address. Note that the placement of this code only allows unpredicated stores; this is important for correctness. (That is "IsPredicated" is always false at this point in the function.)

This patch does not include scalable types. The diff felt "large enough" as it were; I'll handle that in a separate patch. (It requires some changes to cost modeling.)

Differential Revision: https://reviews.llvm.org/D133580
2022-09-22 08:53:46 -07:00
Florian Hahn
dcbc8a0daa
[LV] Remove unused widenCallInstruction declaration (NFC).
The definition and uses have been removed a while ago. Clean up the
unused declaration.
2022-09-20 15:20:28 +01:00
Florian Hahn
582f8ef19f
[LV] Keep track of cost-based ScalarAfterVec in VPWidenPointerInd.
Epilogue vectorization uses isScalarAfterVectorization to check if
widened versions for inductions need to be generated and bails out in
those cases.

At the moment, there are scenarios where isScalarAfterVectorization
returns true but VPWidenPointerInduction::onlyScalarsGenerated would
return false, causing widening.

This can lead to widened phis with incorrect start values being created
in the epilogue vector body.

This patch addresses the issue by storing the cost-model decision in
VPWidenPointerInductionRecipe and restoring the behavior before 151c144.
This effectively reverts 151c144, but the long-term fix is to properly
support widened inductions during epilogue vectorization

Fixes #57712.
2022-09-19 18:14:35 +01:00
Craig Topper
90a004b4a1 [LV] Remove FIXME about NoImplicitFloat. NFC
My understanding is that NoImplicitFloat, despite it's name, is
supposed to disable all vectors not just float vectors.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D134084
2022-09-19 10:01:02 -07:00
Florian Hahn
ac80b0e84f
[LV] Mark Instr as const in scalarizeInstruction. (NFC).
This is to reduce the diff in follow-up changes.
2022-09-13 09:10:02 +01:00
Florian Hahn
69d9bb2aad
[VPlan] Check recipe uses instead of type of underlying instr (NFC).
Suggested by @Ayal post-commit, to reduce the dependence on the
underlying instruction in favor of information available directly for
the recipe.
2022-09-11 12:24:44 +01:00
Florian Hahn
da734473fa
[LV] Remove now dead variable after 2a78890b7b7f08 (NFC). 2022-09-09 20:25:55 +01:00
Florian Hahn
2a78890b7b
[VPlan] Move SCEV expansion for pointer induction to VPExpandSCEV (NFC).
Use VPExpandSCEVRecipe to expand the step of pointer inductions. This
cleanup addresses a corresponding FIXME.

It should be NFC, as steps for pointer induction must be constants,
which makes expansion trivial.
2022-09-09 19:20:13 +01:00
Philip Reames
a33d98e20a [LV] Pull out common expression [nfc] 2022-09-09 07:31:46 -07:00
Philip Reames
edb26268ce [VPlan] Only generate single instr for stores uniform across all parts.
Extend the approach taken by D133019 to store instructions.

Differential Revision: https://reviews.llvm.org/D133497
2022-09-09 07:15:12 -07:00
Philip Reames
4c4c0d2c06 [LV] Use safe-divisor lowering for fixed vectors if profitable
This extends the safe-divisor widening scheme recently added for scalable vectors to handle fixed vectors as well.

Differential Revision: https://reviews.llvm.org/D132591
2022-09-08 09:15:54 -07:00
Florian Hahn
422cf99161
[VPlan] Only generate single instr for loads uniform across all parts.
VPReplicateRecipe::isUniform actually means uniform-per-parts, hence a
scalar instruction is generated per-part.

This is a potential alternative D132892. For now the current patch only
catches cases where the address is trivially invariant (defined outside
VPlan), while D132892 catches any address that is considered invariant
by SCEV AFAICT.

It should be possible to hoist fully invariant recipes feeding loads out
of the vector loop region as well, but in practice LICM should do that
already.

This version of the patch artificially limits this to loads to make it
easier to compare, but this restriction should be easily liftable.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133019
2022-09-08 14:27:58 +01:00
Florian Hahn
408ebe5e3a
[VPlan] Move VPWidenCallRecipe to VPlanRecipes.cpp (NFC).
Depends on D132585.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D132586
2022-09-05 10:48:29 +01:00
Florian Hahn
fc444ddc77
[VPlan] Add field to track if intrinsic should be used for call. (NFC)
This patch moves the cost-based decision whether to use an intrinsic or
library call to the point where the recipe is created. This untangles
code-gen from the cost model and also avoids doing some extra work as
the information is already computed at construction.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D132585
2022-09-01 13:14:40 +01:00
Philip Reames
8936d86469 [LV] Add debug output for force scalar tracing [nfc]
I keep finding myself needing to rule this out as a possible source of scalarization, so add debug output like we have for other instructions we decide to scalarize.
2022-08-29 15:17:51 -07:00
Florian Hahn
c78696813f
[LV] Remove unneeded getVectorIntrinsicIDForCall call (NFC).
Suggested as independent fix during the review of D132585.
2022-08-29 10:19:47 +01:00
Kazu Hirata
56ea4f9bd3 [Transforms] Qualify auto in range-based for loops (NFC)
Identified with readability-qualified-auto.
2022-08-27 21:21:02 -07:00
Kazu Hirata
a33ef8f2b7 Use llvm::all_equal (NFC) 2022-08-27 09:53:10 -07:00
Philip Reames
3dcec5e29f [LV] Consistently use vputils::isUniformAfterVectorization [mostly nfc]
I'd extracted isUniform, and Florian moved isUniformAfterVectorization out of VPlan at basically the same time. Let's go ahead and merge them.

For the VPTransformState::get path, a VPValue without a def (which corresponds to an external IR value outside of VPLan) is explicitly handled above the uniform check.  On the scalarizeInstruction path, I'm less sure why the change isn't visible, but test cases which would seem likely to hit it were already being handled as uniform through some other mechanism.  It would be correct to consider values defined outside of vplan uniform here.
2022-08-26 11:09:17 -07:00
Philip Reames
2d5f025779 [LV] Extract utility for checking if VPValue is uniform [nfc] 2022-08-26 09:56:13 -07:00
Daniil Fukalov
9c710ebbdb [TTI] NFC: Reduce InstructionCost::getValue() usage...
in order to propagate `InstructionCost` value upper.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D103406
2022-08-26 16:37:32 +03:00
Philip Reames
23245a914b [LV] Simplify code given isPredicatedInst doesn't dependent on VF any more [nfc] 2022-08-24 11:42:10 -07:00
Philip Reames
3ab00cfca9 [LV] Adjust code added in f79214d1 for 531dd3634 [nfc]
When rebasing the review which became f79214d1, I forgot to adjust for the changed semantics introduced by 531dd3634.  Functionally, this had no impact, but semantically it resulted in an incorrect result for isPredicatedInst.  I noticed this while doing a follow up change.
2022-08-24 10:38:17 -07:00
Philip Reames
f79214d1e1 [LV] Support predicated div/rem operations via safe-divisor select idiom
This patch adds support for vectorizing conditionally executed div/rem operations via a variant of widening. The existing support for predicated divrem in the vectorizer requires scalarization which we can't do for scalable vectors.

The basic idea is that we can always divide (take remainder) by 1 without executing UB. As such, we can use the active lane mask to conditional select either the actual divisor for active lanes, or a constant one for inactive lanes. We already account for the cost of the active lane mask, so the only additional cost is a splat of one and the vector select. This is one of several possible approaches to this problem; see the review thread for discussion on some of the others.  This one was chosen mostly because it was straight forward, and none of the others seemed oviously better.

I enabled the new code only for scalable vectors. We could also legally enable it for fixed vectors as well, but I haven't thought through the cost tradeoffs between widening and scalarization enough to know if that's profitable. This will be explored in future patches.

Differential Revision: https://reviews.llvm.org/D130164
2022-08-24 10:07:59 -07:00
David Green
8d830f8d68 [LV] Replace fixed-order cost model with a SK_Splice shuffle
The existing cost model for fixed-order recurrences models the phi as an
extract shuffle of a v1 vector. The shuffle produced should be a splice,
as they take two vectors inputs are extracting from a subset of the
lanes. On certain architectures the existing cost model can drastically
under-estimate the correct cost for the shuffle, so this changes it to a
SK_Splice and passes a correct Mask through to the getShuffleCost call.

I believe this might be the first use of a SK_Splice shuffle cost model
outside of scalable vectors, and some targets may require additions to
the cost-model to correctly account for them. In tree targets appear to
all have been updated where needed.

Differential Revision: https://reviews.llvm.org/D132308
2022-08-24 13:00:32 +01:00
Florian Hahn
ff34432649
[LoopUtils] Remove unused Loop arg from addDiffRuntimeChecks (NFC).
The argument is no longer used, remove it.
2022-08-23 10:15:28 +01:00
Philip Reames
27d3321c4f [TTI] Use OperandValueInfo in getMemoryOpCost client api [nfc]
This removes the last use of OperandValueKind from the client side API, and (once this is fully plumbed through TTI implementation) allow use of the same properties in store costing as arithmetic costing.
2022-08-22 11:26:31 -07:00
Philip Reames
c42a5f1cc2 [TTI] Migrate getOperandInfo to OperandVaueInfo [nfc]
This is part of merging OperandValueKind and OperandValueProperties.
2022-08-22 10:19:02 -07:00
Philip Reames
5cd427106d [TTI] Start process of merging OperandValueKind and OperandValueProperties [nfc]
OperandValueKind and OperandValueProperties both provide facts about the operands of an instruction for purposes of cost modeling.  We've discussed merging them several times; before I plumb through more flags, let's go ahead and do so.

This change only adds the client side interface for getArithmeticInstrCost and makes a couple of minor changes in client code to prove that it works.  Target TTI implementations still use the split flags.  I'm deliberately splitting what could be one big change into a series of smaller ones so that I can lean on the compiler to catch errors along the way.
2022-08-22 09:48:15 -07:00
Simon Pilgrim
5263155d5b [CostModel] Add CostKind argument to getShuffleCost
Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or was just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future.

Differential Revision: https://reviews.llvm.org/D132287
2022-08-21 10:54:51 +01:00
Philip Reames
b0a2c48e9f [tti] Consolidate getOperandInfo without OperandValueProperties copies [nfc] 2022-08-19 16:22:22 -07:00
Alexey Bataev
d53e245951 [COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC.
Added OperandValueKind OpdInfo parameter to getMemoryOpCost functions to
better estimate cost with immediate values.

Part of D126885.
2022-08-19 07:33:00 -07:00
Florian Hahn
b8709a9d03
[LV] Support fixed order recurrences.
If the incoming previous value of a fixed-order recurrence is a phi in
the header, go through incoming values from the latch until we find a
non-phi value. Use this as the new Previous, all uses in the header
will be dominated by the original phi, but need to be moved after
the non-phi previous value.

At the moment, fixed-order recurrences are modeled as a chain of
first-order recurrences.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D119661
2022-08-18 19:15:52 +01:00
Philip Reames
531dd3634d [LV] Restructure isPredicatedInst and isScalarWithPredication (w/a fix for uniform mem ops)
This change reorganizes the code and comments to make the expected semantics of these routines more clear. However, this is *not* an NFC change. The functional change is having isScalarWithPredication return false if the instruction does not need predicated. Specifically, for the case of a uniform memory operation we were previously considering it *not* to be a predicated instruction, but *were* considering it to be scalable with predication.

As can be seen with the test changes, this causes uniform memory ops which should have been lowered as uniform-per-parts values to instead be lowering via naive scalarization or if scalarization is infeasible (i.e. scalable vectors) aborted entirely. I also don't trust the code to bail out correctly 100% of the time, so it's possible we had a crash or miscompile from trying to scalarize something which isn't scalaralizable. I haven't found a concrete example here, but I am suspicious.

Differential Revision: https://reviews.llvm.org/D131093
2022-08-18 07:14:04 -07:00
Kazu Hirata
50724716cd [Transforms] Qualify auto in range-based for loops (NFC)
Identified with readability-qualified-auto.
2022-08-14 12:51:58 -07:00
Kazu Hirata
109df7f9a4 [llvm] Qualify auto in range-based for loops (NFC)
Identified with readability-qualified-auto.
2022-08-13 12:55:42 -07:00