1324 Commits

Author SHA1 Message Date
Caroline Concatto
cf06c8eee3 [LoopVectorize][SVE] Remove assert for scalable vector in InnerLoopVectorizer::fixReduction
The function fixReduction used to assert/crash for scalable vector when
a vector reduce could be done with a smaller vector.
This patch removes this assertion as it is safe to use scalable vector for
vector reduce and truncate.

Differential Revision: https://reviews.llvm.org/D101260
2021-05-07 09:37:37 +01:00
David Green
4979c90458 [LV] Account for tripcount when calculation vectorization profitability
The loop vectorizer will currently assume a large trip count when
calculating which of several vectorization factors are more profitable.
That is often not a terrible assumption to make as small trip count
loops will usually have been fully unrolled. There are cases however
where we will try to vectorize them, and especially when folding the
tail by masking can incorrectly choose to vectorize loops that are not
beneficial, due to the folded tail rounding the iteration count up for
the vectorized loop.

The motivating example here has a trip count of 5, so either performs 5
scalar iterations or 2 vector iterations (with VF=4). At a high enough
trip count the vectorization becomes profitable, but the rounding up to
2 vector iterations vs only 5 scalar makes it unprofitable.

This adds an alternative cost calculation when we know the max trip
count and are folding tail by masking, rounding the iteration count up
to the correct number for the vector width. We still do not account for
anything like setup cost or the mixture of vector and scalar loops, but
this is at least an improvement in a few cases that we have had
reported.

Differential Revision: https://reviews.llvm.org/D101726
2021-05-06 12:36:46 +01:00
Kerry McLaughlin
8c9742bd23 [SVE][LoopVectorize] Add support for scalable vectorization of first-order recurrences
Adds support for scalable vectorization of loops containing first-order recurrences, e.g:
```
for(int i = 0; i < n; i++)
  b[i] =  a[i] + a[i - 1]
```
This patch changes fixFirstOrderRecurrence for scalable vectors to take vscale into
account when inserting into and extracting from the last lane of a vector.
CreateVectorSplice has been added to construct a vector for the recurrence, which
returns a splice intrinsic for scalable types. For fixed-width the behaviour
remains unchanged as CreateVectorSplice will return a shufflevector instead.

The tests included here are the same as test/Transform/LoopVectorize/first-order-recurrence.ll

Reviewed By: david-arm, fhahn

Differential Revision: https://reviews.llvm.org/D101076
2021-05-06 11:35:39 +01:00
Philip Reames
80e8025083 [LV] Workaround PR49900 (a crash due to analyzing partially mutated IR)
LoopVectorize has a fairly deeply baked in design problem where it will try to query analysis (primarily SCEV, but also ValueTracking) in the midst of mutating IR. In particular, the intermediate IR state does not represent the semantics of the original (or final) program.

Fixing this for real is hard, but all of the cases seen so far share a common symptom. In cases seen to date, the analysis being queried is the computation of the original loop's trip count. We can fix this particular instance of the issue by simply computing the trip count early, and caching it.

I want to be really clear that this is nothing but a workaround. It does nothing to fix the root issue, and at best, delays the time until we have to fix this for real. Florian and I have discussed an eventual solution in the review comments for https://reviews.llvm.org/D100663, but it's a lot of work.

Test taken from https://reviews.llvm.org/D100663.

Differential Revision: https://reviews.llvm.org/D101487
2021-05-05 09:56:28 -07:00
Florian Hahn
ccebf7a109
[VPlan] Properly handle sinking of replicate regions.
This patch updates the code that sinks recipes required for first-order
recurrences to properly handle replicate-regions. At the moment, the
code would just move the replicate recipe out of its replicate-region,
producing an invalid VPlan.

When sinking a recipe in a replicate-region, we have to sink the whole
region. To do that, we first need to split the block at the target
recipe and move the region in between.

This patch also adds a splitAt helper to VPBasicBlock to split a
VPBasicBlock at a given iterator.

Fixes PR50009.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100751
2021-05-04 22:36:01 +01:00
Florian Hahn
4ba8720f88
[VPlan] Representing backedge def-use feeding reduction phis.
This patch updates the code handling reduction recipes to also keep
track of the incoming value from the latch in the recipe. This is needed
to model the def-use chains completely in VPlan, so that it is possible
to replace the incoming value with an arbitrary VPValue.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D99294
2021-05-04 16:33:22 +01:00
Sander de Smalen
9931ae645e Reland "[LV] Calculate max feasible scalable VF."
Relands https://reviews.llvm.org/D98509

This reverts commit 51d648c119d7773ce6fb809353bd6bd14bca8818.
2021-05-04 15:44:41 +01:00
Florian Hahn
2b7fa7f744 [LV] Iterate over recipes in VPlan to fix PHI (NFC).
As we gradually move more elements of LV to VPlan, we are trying to
reduce the number of places that still has to check IR of the original
loop.

This patch adjusts the code to fix cross iteration phis to get the PHIs
to fix directly from the VPlan that is executed. We still need the
original PHI to check for first-order recurrences, but we can get rid of
that once we model that explicitly in VPlan as well.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D99293
2021-05-03 14:09:46 +01:00
Sander de Smalen
51d648c119 Revert "[LV] Calculate max feasible scalable VF."
Temporarily reverting this patch due to some unexpected issue found
by one of the PPC buildbots.

This reverts commit 584e9b6e4b4987b882719923e640eed854613d91.
2021-04-29 16:04:37 +01:00
Florian Hahn
a0e1313c23
[VPlan] Add getVPSingleValue helper.
As suggested in D99294, this adds a getVPSingleValue helper to use for
recipes that are guaranteed to define a single value. This replaces uses
of getVPValue() which used to default to I = 0.
2021-04-29 13:37:38 +01:00
Bardia Mahjour
ddb3b26a12 [LV] Consider Loop Unroll Hints When Making Interleave Decisions
This patch causes the loop vectorizer to not interleave loops that have
nounroll loop hints (llvm.loop.unroll.disable and llvm.loop.unroll_count(1)).
Note that if a particular interleave count is being requested
(through llvm.loop.interleave_count), it will still be honoured, regardless
of the presence of nounroll hints.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D101374
2021-04-28 17:27:52 -04:00
David Sherwood
00e65f3345 [LoopVectorize][SVE] Fix crash when vectorising FP negation
This patch fixes a crash encountered when vectorising the following loop:

 void foo(float *dst, float *src, long long n) {
   for (long long i = 0; i < n; i++)
     dst[i] = -src[i];
 }

using scalable vectors. I've added a test to

 Transforms/LoopVectorize/AArch64/sve-basic-vec.ll

as well as cleaned up the other tests in the same file.

Differential Revision: https://reviews.llvm.org/D98054
2021-04-28 15:22:35 +01:00
Tres Popp
f0e848e63d Silence unused variable warning 2021-04-28 15:46:09 +02:00
David Sherwood
6998f8ae2d [LoopVectorize] Simplify scalar cost calculation in getInstructionCost
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.

  unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
  return N * TTI.getArithmeticInstrCost(...

After some investigation it seems that there are only these cases that occur
in practice:

1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction
variable that remains scalar.

I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. PHI nodes are costed separately and were never previously
multiplied by VF. For all other cases I have added an assert that none of
the users needs scalarising, which didn't fire in any unit tests.

Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.

I have also added a new test for the case when a pointer PHI feeds directly
into a store that will be scalarised as we were previously never testing it.

Differential Revision: https://reviews.llvm.org/D99718
2021-04-28 13:41:07 +01:00
Sander de Smalen
584e9b6e4b [LV] Calculate max feasible scalable VF.
This patch also refactors the way the feasible max VF is calculated,
although this is NFC for fixed-width vectors.

After this change scalable VF hints are no longer truncated/clamped
to a shorter scalable VF, nor does it drop the 'scalable flag' from
the suggested VF to vectorize with a similar VF that is fixed.

Instead, the hint is ignored which means the vectorizer is free
to find a more suitable VF, using the CostModel to determine the
best possible VF.

Reviewed By: c-rhodes, fhahn

Differential Revision: https://reviews.llvm.org/D98509
2021-04-28 12:30:00 +01:00
Kerry McLaughlin
9cc217ab36 [LoopVectorize] Prevent multiple Phis being generated with in-order reductions
When using the -enable-strict-reductions flag where UF>1 we generate multiple
Phi nodes, though only one of these is used as an input to the vector.reduce.fadd
intrinsics. The unused Phi nodes are removed later by instcombine.

This patch changes widenPHIInstruction/fixReduction to only generate
one Phi, and adds an additional test for unrolling to strict-fadd.ll

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D100570
2021-04-28 11:29:01 +01:00
David Sherwood
6968520c3b Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost"
This reverts commit 4afeda9157cffd2daa83f8075d73f1e11ea34c81.
2021-04-27 15:46:03 +01:00
David Sherwood
4afeda9157 [LoopVectorize] Simplify scalar cost calculation in getInstructionCost
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.

  unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
  return N * TTI.getArithmeticInstrCost(...

After some investigation it seems that there are only these cases that occur
in practice:

1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction
variable that remains scalar.

I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. PHI nodes are costed separately and were never previously
multiplied by VF. For all other cases I have added an assert that none of
the users needs scalarising, which didn't fire in any unit tests.

Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.

I have also added a new test for the case when a pointer PHI feeds directly
into a store that will be scalarised as we were previously never testing it.

Differential Revision: https://reviews.llvm.org/D99718
2021-04-27 15:26:15 +01:00
Florian Hahn
cb96d802d4
[LV] Hoist code to get vector loop latch (NFC).
Address suggestion from D99294.
2021-04-27 13:30:17 +01:00
Joe Ellis
2c551aedcf [LoopVectorize] Fix bug where predicated loads/stores were dropped
This commit fixes a bug where the loop vectoriser fails to predicate
loads/stores when interleaving for targets that support masked
loads and stores.

Code such as:

     1  void foo(int *restrict data1, int *restrict data2)
     2  {
     3    int counter = 1024;
     4    while (counter--)
     5      if (data1[counter] > data2[counter])
     6        data1[counter] = data2[counter];
     7  }

... could previously be transformed in such a way that the predicated
store implied by:

    if (data1[counter] > data2[counter])
       data1[counter] = data2[counter];

... was lost, resulting in miscompiles.

This bug was causing some tests in llvm-test-suite to fail when built
for SVE.

Differential Revision: https://reviews.llvm.org/D99569
2021-04-22 15:05:54 +00:00
David Sherwood
5a229a6702 [LoopVectorize] Don't create unnecessary vscale intrinsic calls
In quite a few cases in LoopVectorize.cpp we call createStepForVF
with a step value of 0, which leads to unnecessary generation of
llvm.vscale intrinsic calls. I've optimised IRBuilder::CreateVScale
and createStepForVF to return 0 when attempting to multiply
vscale by 0.

Differential Revision: https://reviews.llvm.org/D100763
2021-04-22 09:01:52 +01:00
Sander de Smalen
86729538bd [LV] Let selectVectorizationFactor reason directly on VectorizationFactor.
Rather than maintaining two separate values, a `float` for the per-lane
cost and a Width for the VF, maintain a single VectorizationFactor which
comprises the two and also removes the need for converting an integer value
to float.

This simplifies the query when asking if one VF is more profitable than
another when we want to extend this for scalable vectors (which may
require additional options to determine if e.g. a scalable VF of the
some cost, is more profitable than a fixed VF of the same cost).

The patch isn't entirely NFC because it also fixes an issue in
selectEpilogueVectorizationFactor, where the cost passed to ProfitableVFs
no longer truncates the floating-point cost from `float` to `unsigned` to
then perform the calculation on the truncated cost. It now does
a cost comparison with the correct precision.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100121
2021-04-20 09:54:45 +01:00
Cullen Rhodes
f0bc2782f2 [TTI] NFC: Remove unused 'OptSize' parameter from shouldMaximizeVectorBandwidth
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D100377
2021-04-19 11:01:34 +00:00
David Sherwood
ea14df695e [SVE][LoopVectorize] Fix crash in InnerLoopVectorizer::widenPHIInstruction
There were a few places in widenPHIInstruction where calculations of
offsets were failing to take the runtime calculation of VF into
account for scalable vectors. I've fixed those cases in this patch
as well as adding an assert that we should not be scalarising for
scalable vectors.

Tests are added here:

  Transforms/LoopVectorize/AArch64/sve-widen-phi.ll

Differential Revision: https://reviews.llvm.org/D99254
2021-04-15 10:51:49 +01:00
David Sherwood
7120f89f7d [NFC][LoopVectorize] Remove unnecessary VF.isScalable asserts
There are a few places in LoopVectorize.cpp where we have been too
cautious in adding VF.isScalable() asserts and it can be confusing.
It also makes it more difficult to see the genuine places where
work needs doing to improve scalable vectorization support.

This patch changes getMemInstScalarizationCost to return an
invalid cost instead of firing an assert for scalable vectors. Also,
vectorizeInterleaveGroup had multiple asserts all for the same
thing. I have removed all but one assert near the start of the
function, and added a new assert that we aren't dealing with masks
for scalable vectors.

Differential Revision: https://reviews.llvm.org/D99727
2021-04-15 09:41:03 +01:00
Sander de Smalen
bd86824d98 [TTI] NFC: Change getArithmeticReductionCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

This patch is practically NFC, with the exception of an AArch64 SVE related
cost-model change, where we can now return an Invalid cost instead of some
bogus number.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100201
2021-04-13 14:20:59 +01:00
Sander de Smalen
92d8421f49 [TTI] NFC: Change getCastInstrCost and getExtractWithExtendCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100199
2021-04-13 14:20:58 +01:00
Florian Hahn
e4de3cdf3d [LV] Pass VPWidenPHIRecipe to widenPHIInstruction (NFC).
Instead of passing the start value and the defined value to
widenPHIInstruction, pass the VPWidenPHIRecipe directly, which can be
used to get both (and more in future patches).
2021-04-08 14:25:10 +01:00
David Green
8675ef100f [LV] Logical and/or select costs
D99674 stopped the folding of certain select operations into and/or, due
to incorrect folding in the presence of poison. D97360 added some costs
to attempt to account for the change, but only worked at the getUserCost
level, not the getCmpSelInstrCost that the vectorizer will use directly.
This adds similar logic into the vectorizer to handle these logical
and/or selects, treating them like and/or directly.

This fixes 60% performance regressions from code like the attached test
case.

Differential Revision: https://reviews.llvm.org/D99884
2021-04-08 10:39:47 +01:00
Philip Reames
a6d2a8d6f5 Add a subclass of IntrinsicInst for llvm.assume [nfc]
Add the subclass, update a few places which check for the intrinsic to use idiomatic dyn_cast, and update the public interface of AssumptionCache to use the new class.  A follow up change will do the same for the newer assumption query/bundle mechanisms.
2021-04-06 11:16:22 -07:00
Kerry McLaughlin
7344f3d39a [LoopVectorize] Add strict in-order reduction support for fixed-width vectorization
Previously we could only vectorize FP reductions if fast math was enabled, as this allows us to
reorder FP operations. However, it may still be beneficial to vectorize the loop by moving
the reduction inside the vectorized loop and making sure that the scalar reduction value
be an input to the horizontal reduction, e.g:

  %phi = phi float [ 0.0, %entry ], [ %reduction, %vector_body ]
  %load = load <8 x float>
  %reduction = call float @llvm.vector.reduce.fadd.v8f32(float %phi, <8 x float> %load)

This patch adds a new flag (IsOrdered) to RecurrenceDescriptor and makes use of the changes added
by D75069 as much as possible, which already teaches the vectorizer about in-loop reductions.
For now in-order reduction support is off by default and controlled with the `-enable-strict-reductions` flag.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D98435
2021-04-06 14:45:34 +01:00
Kerry McLaughlin
857b8a73da [LoopVectorize] Change the identity element for FAdd
Changes getRecurrenceIdentity to always return a neutral value of -0.0 for FAdd.

Reviewed By: dmgreen, spatel

Differential Revision: https://reviews.llvm.org/D98963
2021-04-06 12:13:43 +01:00
Florian Hahn
8867fc69f0 [LV] Hoist mapping of IR operands to VPValues (NFC).
This patch moves mapping of IR operands to VPValues out of
tryToCreateWidenRecipe. This allows using existing VPValue operands when
widening recipes directly, which will be introduced in future patches.
2021-04-02 17:57:20 +01:00
Huihui Zhang
fe5c4a06a4 [LoopVectorize] Use SetVector to track uniform uses to prevent non-determinism.
Use SetVector instead of SmallPtrSet to track values with uniform use. Doing this
can help avoid non-determinism caused by iterating over unordered containers.

This bug was found with reverse iteration turning on,
--extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON".
Failing LLVM test consecutive-ptr-uniforms.ll .

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D99549
2021-03-31 11:21:07 -07:00
Sander de Smalen
7108b2dec1 [SVE] Fix LoopVectorizer test scalalable-call.ll
This marks FSIN and other operations to EXPAND for scalable
vectors, so that they are not assumed to be legal by the cost-model.

Depends on D97470

Reviewed By: dmgreen, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D97471
2021-03-31 14:52:49 +01:00
David Sherwood
a08c7736a7 [LoopVectorize] Add support for scalable vectorization of induction variables
This patch adds support for the vectorization of induction variables when
using scalable vectors, which required the following changes:

1. Removed assert from InnerLoopVectorizer::getStepVector.
2. Modified InnerLoopVectorizer::createVectorIntOrFpInductionPHI to use
   a runtime determined value for VF and removed an assert.
3. Modified InnerLoopVectorizer::buildScalarSteps to work for scalable
   vectors. I did this by calculating the full vector value for each Part
   of the unroll factor (UF) and caching this in the VP state. This means
   that we are always able to extract an arbitrary element from the vector
   if necessary. In addition to this, I also permitted the caching of the
   individual lane values themselves for the known minimum number of elements
   in the same way we do for fixed width vectors. This is a further
   optimisation that improves the code quality since it avoids unnecessary
   extractelement operations when extracting the first lane.
4. Added an assert to InnerLoopVectorizer::widenPHIInstruction, since while
   testing some code paths I noticed this is currently broken for scalable
   vectors.

Various tests to support different cases have been added here:

  Transforms/LoopVectorize/AArch64/sve-inductions.ll

Differential Revision: https://reviews.llvm.org/D98715
2021-03-30 11:13:31 +01:00
Florian Hahn
c773d0f973
Recommit "[LV] Move runtime pointer size check to LVP::plan()."
Re-apply 25fbe803d4db, with a small update to emit the right remark
class.

Original message:
    [LV] Move runtime pointer size check to LVP::plan().

    This removes the need for the remaining doesNotMeet check and instead
    directly checks if there are too many runtime checks for vectorization
    in the planner.

    A subsequent patch will adjust the logic used to decide whether to
    vectorize with runtime to consider their cost more accurately.

    Reviewed By: lebedev.ri
2021-03-29 16:14:27 +01:00
Florian Hahn
485c8ce733
Revert "[LV] Move runtime pointer size check to LVP::plan()."
This reverts commit 25fbe803d4dbcf8ff3a3a9ca161f5b9a68353ed0.

This breaks a clang test which filters for the wrong remark type.
2021-03-29 14:41:53 +01:00
Florian Hahn
25fbe803d4
[LV] Move runtime pointer size check to LVP::plan().
This removes the need for the remaining doesNotMeet check and instead
directly checks if there are too many runtime checks for vectorization
in the planner.

A subsequent patch will adjust the logic used to decide whether to
vectorize with runtime to consider their cost more accurately.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D98634
2021-03-29 14:12:29 +01:00
Florian Hahn
8c6c357897
[LV] Mark a few more cost-model members as const (NFC). 2021-03-28 14:59:48 +01:00
Florian Hahn
d2855eba81
[LV] Fix formatting from 2f9d68c3f12a. 2021-03-27 21:29:56 +00:00
Florian Hahn
2f9d68c3f1
[LV] Mark some methods as const (NFC).
Mark a few methods as const, as they do not modify any state.
2021-03-27 21:27:53 +00:00
David Sherwood
c39460cc4f Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost"
This reverts commit 240aa96cf25d880dde7a0db5d96918cfaa4b8891.
2021-03-26 11:36:53 +00:00
David Sherwood
240aa96cf2 [LoopVectorize] Simplify scalar cost calculation in getInstructionCost
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.

  unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
  return N * TTI.getArithmeticInstrCost(...

After some investigation it seems that there are only these cases that occur
in practice:

1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast with scalar uses, or b) this is an update to an induction variable
that remains scalar.

I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. For all other cases I have added an assert that none of the
users needs scalarising, which didn't fire in any unit tests.

Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.

Differential Revision: https://reviews.llvm.org/D98512
2021-03-26 11:27:12 +00:00
Florian Hahn
9d45579279
[LV] Factor out phi type access to variable (NFC).
A slight simplification of the code to reduce future diffs.
2021-03-24 19:25:22 +00:00
Florian Hahn
8d1342f79d
[LV] Remove redundant access to Legal::getReductionVars() (NFC).
The reduction descriptor is retrieved earlier and stored in a variable
RdxDesc already.
2021-03-24 19:15:14 +00:00
Sander de Smalen
55d18b3cc2 [TTI] Return a TypeSize from getRegisterBitWidth.
This patch changes the interface to take a RegisterKind, to indicate
whether the register bitwidth of a scalar register, fixed-width vector
register, or scalable vector register must be returned.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D98874
2021-03-24 14:45:13 +00:00
Florian Hahn
cd0c00c9fe
[LV] Move exact FP math check out of Requirements.
We know if the loop contains FP instructions preventing vectorization
after we are done with legality checks. This patch updates the code the
check for un-vectorizable FP operations earlier, to avoid unnecessarily
running the cost model and picking a vectorization factor. It also makes
the code more direct and moves the check to a position where similar
checks are done.

I might be missing something, but I don't see any reason to handle this
check differently to other, similar checks.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D98633
2021-03-24 11:01:44 +00:00
David Sherwood
d70251163f [LoopVectorize][NFC] Refactor code to use IRBuilder::CreateStepVector
In places where we create a ConstantVector whose elements are a
linear sequence of the form <start, start + 1, start + 2, ...>
I've changed the code to make use of CreateStepVector, which creates
a vector with the sequence <0, 1, 2, ...>, and a vector addition
operation. This patch is a non-functional change, since the output
from the vectoriser remains unchanged for fixed length vectors and
there are existing asserts that still fire when attempting to use
scalable vectors for vectorising induction variables.

In a later patch we will enable support for scalable vectors
in InnerLoopVectorizer::getStepVector(), which relies upon the new
stepvector intrinsic in IRBuilder::CreateStepVector.

Differential Revision: https://reviews.llvm.org/D97861
2021-03-23 11:29:05 +00:00
Andrei Elovikov
92205cb27f [NFC][VPlan] Guard print routines with "#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)"
Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D98897
2021-03-19 10:50:12 -07:00