1460 Commits

David Sherwood
c63b0f471b [NFC][LoopVectorize] Make the createStepForVF interface more caller-friendly
The common use case for calling createStepForVF is currently something
like:

  Value *Step = createStepForVF(Builder, ConstantInt::get(Ty, UF), VF);

and it makes more sense to reduce overall lines of code and change the
function to let it create the constant instead. With my patch this
becomes:

  Value *Step = createStepForVF(Builder, Ty, VF, UF);

and the ConstantInt is created inside createStepForVF. A side-effect of
this is that the code in createStepForVF also becomes simpler.

As part of this patch I've also replaced some calls to getRuntimeVF
with calls to createStepForVF, i.e.

  getRuntimeVF(Builder, Count->getType(), VFactor * UFactor) ->
  createStepForVF(Builder, Count->getType(), VFactor, UFactor)

because this feels semantically better.
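
For context, a minimal standalone sketch (not the LLVM implementation; the
helper name is made up) of what the computed step represents: VF * UF
elements per vector iteration, with a scalable VF additionally scaled by
the runtime vscale.

  #include <cstdint>

  // Conceptual sketch only (the real createStepForVF builds IR with an
  // IRBuilder): the step is VF * UF elements per vector iteration, and a
  // scalable VF is additionally multiplied by the runtime vscale.
  uint64_t conceptualStep(uint64_t VFKnownMin, bool VFIsScalable,
                          uint64_t UF, uint64_t VScale) {
    uint64_t Step = VFKnownMin * UF;
    return VFIsScalable ? Step * VScale : Step;
  }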

Differential Revision: https://reviews.llvm.org/D113122
2021-11-08 15:14:14 +00:00
David Sherwood
c42bb30b9e [LoopVectorize] Permit fixed-width epilogue loops for scalable vector bodies
At the moment in LoopVectorizationCostModel::selectEpilogueVectorizationFactor
we bail out if the main vector loop uses a scalable VF. This patch adds
support for generating epilogue vector loops using a fixed-width VF when the
main vector loop uses a scalable VF.

I've changed LoopVectorizationCostModel::selectEpilogueVectorizationFactor
so that we convert the scalable VF into a fixed-width VF and do profitability
checks on that instead. In addition, since the scalable and fixed-width VFs
live in different VPlans, I had to change the calls to LVP.hasPlanWithVFs so
that we only pass in the fixed-width VF.
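
As a rough illustration of the resulting loop structure (plain C++, not
generated IR; the VF of 4 is just an assumed example), the scalable-VF main
loop is followed by a fixed-width epilogue vector loop and a scalar
remainder:

  #include <cstddef>

  // Conceptual shape of the emitted loops (VFs assumed for illustration,
  // vscale >= 1): a main vector loop with a scalable VF (vscale x 4), then
  // a fixed-width epilogue vector loop (VF = 4), then a scalar remainder.
  void addOne(int *a, std::size_t n, std::size_t vscale) {
    std::size_t i = 0;
    std::size_t mainVF = 4 * vscale;
    for (; i + mainVF <= n; i += mainVF)        // main loop, scalable VF
      for (std::size_t j = 0; j < mainVF; ++j)
        a[i + j] += 1;
    for (; i + 4 <= n; i += 4)                  // fixed-width epilogue loop
      for (std::size_t j = 0; j < 4; ++j)
        a[i + j] += 1;
    for (; i < n; ++i)                          // scalar remainder
      a[i] += 1;
  }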

New tests added here:

  Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll

Differential Revision: https://reviews.llvm.org/D109432
2021-11-08 09:41:13 +00:00
Kazu Hirata
0d182d9d1e [Transforms] Use make_early_inc_range (NFC) 2021-11-07 17:03:15 -08:00
Nikita Popov
f8627877a9 [SCEV] Make eraseValueFromMap() private (NFC)
The public API for this functionality is forgetValue(). There was
only one call from LoopVectorize, which was directly next to a
forgetValue() call and as such redundant.
2021-11-06 17:14:02 +01:00
Florian Hahn
b4992dbb21
[LV] Clarify uniform worklist contains instrs demanding lane 0. 2021-11-04 13:11:50 +01:00
David Sherwood
c0f2774973 [NFC][LoopVectorize] Simple tidy-up in InnerLoopVectorizer::createVectorIntOrFpInductionPHI
Use getSignedIntOrFpConstant instead of creating int or FP constants
manually.
2021-11-03 14:05:21 +00:00
Florian Hahn
64bc31ee93
[LV] Drop unneeded use of getVPSingleValue (NFC).
VPReductionPHIRecipe inherits from VPValue, so there's no need to call
getVPSingleValue.
2021-11-03 14:26:15 +01:00
Florian Hahn
8e44bdd12a
[VPlan] Make VPWidenCanonicalIVRecipe a VPValue (NFC).
The recipe produces exactly one VPValue and can inherit directly from
it. This is in line with other recipes and avoids having to use
getVPSingleValue.
2021-11-03 14:11:01 +01:00
Rosie Sumpter
dcb8222d87 [LoopVectorize] Propagate fast-math flags for inloop reductions
This patch updates VPReductionRecipe::execute so that the fast-math
flags associated with the underlying instruction of the VPRecipe are
propagated through to the reductions which are created.

Differential Revision: https://reviews.llvm.org/D112548
2021-11-02 08:59:53 +00:00
David Sherwood
87a294d5eb [LoopVectorize] Change getRuntimeVFAsFloat to use unsigned int->FP conversion
We never expect the runtime VF to be negative so we should use
the uitofp instruction instead of sitofp.
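
A small standalone C++ illustration (not code from the patch) of why the
signedness of the conversion matters for large bit patterns:

  #include <cstdint>
  #include <cstdio>

  int main() {
    // A runtime VF is always non-negative, but if its bit pattern were
    // treated as signed (sitofp) a value with the top bit set would convert
    // to a negative float; converting as unsigned (uitofp) does not.
    uint64_t vf = 0x8000000000000000ull;
    std::printf("as signed:   %f\n", (double)(int64_t)vf); // about -9.2e18
    std::printf("as unsigned: %f\n", (double)vf);          // about  9.2e18
    return 0;
  }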

Differential revision: https://reviews.llvm.org/D112610
2021-11-01 09:58:14 +00:00
Florian Hahn
c45045bfd0
[VPlan] Keep induction recipes in header.
This patch updates recipe creation to ensure all
VPWidenIntOrFpInductionRecipes are in the header block. At the moment,
new induction recipes can be created in different blocks when trying to
optimize casts and induction variables.

Having all induction recipes in the header makes it easier to
analyze/transform them in VPlan.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D111300
2021-10-28 18:22:05 +01:00
David Sherwood
5d9318638e [NFC][LoopVectorize] Change getStepVector to take a Value* for the StartIdx
This patch changes the definition of getStepVector from:

  Value *getStepVector(Value *Val, int StartIdx, Value *Step, ...

to

  Value *getStepVector(Value *Val, Value *StartIdx, Value *Step, ...

because:

1. it seems inconsistent to pass some values as Value* and some as
   integer, and
2. future work will require the StartIdx to be an expression made up
   of runtime calculations of the VF.

In widenIntOrFpInduction I've changed the code to pass in the
value returned from getRuntimeVF, but the presence of the assert:

  assert(!VF.isScalable() && "scalable vectors not yet supported.");

means that currently this code path is only exercised for fixed-width
VFs and so the patch is still NFC.
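
Conceptually, a step vector holds StartIdx + I * Step in lane I; a minimal
plain-C++ sketch (not the IR-building code, helper name invented for
illustration):

  #include <vector>

  // Conceptual equivalent of a step vector: lane I holds StartIdx + I * Step.
  // Taking StartIdx as a runtime value (rather than a compile-time int) is
  // what the signature change above allows.
  std::vector<int> stepVector(int StartIdx, int Step, unsigned NumLanes) {
    std::vector<int> Lanes(NumLanes);
    for (unsigned I = 0; I < NumLanes; ++I)
      Lanes[I] = StartIdx + int(I) * Step;
    return Lanes;
  }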

Differential revision: https://reviews.llvm.org/D111882
2021-10-27 16:12:38 +01:00
David Sherwood
3d706c20f8 [NFC][LoopVectorize] Remove setBestPlan in favour of getBestPlanFor
I have removed LoopVectorizationPlanner::setBestPlan, since this
function is quite aggressive: it deletes all other plans except the
one containing the required <VF,UF> pair. The code is currently
written to assume that all <VF,UF> pairs will live in the same vplan.
This is overly restrictive, since scalable VFs live in different
plans to fixed-width VFs. When we add support for vectorising
epilogue loops when the main loop uses scalable vectors, the vplan
for the main loop will be different to the one for the epilogue.

Instead I have added a new function called

  LoopVectorizationPlanner::getBestPlanFor

that returns the best vplan for the <VF,UF> pair requested and leaves
all the vplans untouched. We then pass this best vplan to

  LoopVectorizationPlanner::executePlan

which now takes an additional VPlanPtr argument.

Differential revision: https://reviews.llvm.org/D111125
2021-10-27 09:38:27 +01:00
Rosie Sumpter
b716d0aa94 [LoopVectorize] Clean up VPReductionRecipe::execute. NFC
Use RdxDesc->getOpcode instead of getUnderlyingInstr()->getOpcode.
Move the code which finds Kind and IsOrdered outside the for loop,
since neither of these changes with the vector part.

Differential Revision: https://reviews.llvm.org/D112547
2021-10-26 17:18:25 +01:00
Florian Hahn
a6c4969f5f
[VPlan] Do not create dummy entry block (NFC).
At the moment a dummy entry block is created at the beginning of VPlan
construction. This dummy block is later removed again.

This means it is not easy to identify the VPlan header block in a
general fashion, because during recipe creation it is the single
successor of the entry block, while later it is the entry block.

To make getting the header easier, just skip creating the dummy block.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D111299
2021-10-25 09:52:58 +01:00
Florian Hahn
e844f05397
[LoopUtils] Simplify addRuntimeCheck to return a single value.
This simplifies the return value of addRuntimeCheck from a pair of
instructions to a single `Value *`.

The existing users of addRuntimeChecks were ignoring the first element
of the pair, hence there is no reason to track FirstInst and return it.

Additionally all users of addRuntimeChecks use the second returned
`Instruction *` just as `Value *`, so there is no need to return an
`Instruction *`. Therefore there is no need to create a redundant
dummy `and X, true` instruction any longer.

Effectively this change should not impact the generated code because the
redundant AND will be folded by later optimizations. But it is easy to
avoid creating it in the first place and it allows more accurately
estimating the cost of the runtime checks.
2021-10-18 18:03:09 +01:00
Gil Rapaport
1156bd4fc3 [LV] Record memory widening decisions (NFCI)
Record widening decisions for memory operations within the planned recipes and
use the recorded decisions in code-gen rather than querying the cost model.

Differential Revision: https://reviews.llvm.org/D110479
2021-10-18 18:03:35 +03:00
Kazu Hirata
81e9c90686 [llvm] Use llvm::is_contained (NFC) 2021-10-14 22:44:09 -07:00
Ayal Zaks
15692fd6b5 [LV] Fix 2nd crash for reverse interleaved groups under mask/fold-tail.
This patch fixes another crash revealed by PR51614:
when *deciding* to vectorize with masked interleave groups, check if the access
is reverse (which is currently not supported).

Differential Revision: https://reviews.llvm.org/D108900
2021-10-12 21:44:42 +03:00
Kerry McLaughlin
1439ef1a3f [LoopVectorize] Classify pointer induction updates as scalar only if they have one use
collectLoopScalars collects pointer induction updates in ScalarPtrs, assuming
that the instruction will be scalar after vectorization. This may crash later
in VPReplicateRecipe::execute() if there is another user of the instruction,
other than the Phi node, which needs to be widened.

This changes collectLoopScalars so that if there are any other users of
Update other than a Phi node, it is not added to ScalarPtrs.
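
A hypothetical C++ loop shape (not taken from the patch or its tests) in
which the pointer-induction update has a user besides the phi and therefore
cannot be assumed scalar:

  // Hypothetical loop: the pointer update 'p + 1' feeds both the next
  // iteration (the phi) and the store to q[i], so the update instruction
  // has a second user that needs to be widened.
  void recordNext(int *p, int **q, int n) {
    for (int i = 0; i < n; ++i) {
      q[i] = p + 1;  // extra use of the induction update
      p = p + 1;     // pointer induction update feeding the phi
    }
  }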

Reviewed By: david-arm, fhahn

Differential Revision: https://reviews.llvm.org/D111294
2021-10-12 13:24:49 +01:00
David Sherwood
26b7d9d622 [LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns
This patch adds further support for vectorisation of loops that involve
selecting an integer value based on a previous comparison. Consider the
following C++ loop:

  int r = a;
  for (int i = 0; i < n; i++) {
    if (src[i] > 3) {
      r = b;
    }
    src[i] += 2;
  }

We should be able to vectorise this loop because all we are doing is
selecting between two states - 'a' and 'b' - both of which are loop
invariant. This just involves building a vector of values that contain
either 'a' or 'b', where the final reduced value will be 'b' if any lane
contains 'b'.

The IR generated by clang typically looks like this:

  %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ]
  ...
  %pred = icmp ugt i32 %val, i32 3
  %phi.update = select i1 %pred, i32 %b, i32 %phi

We already detect min/max patterns, which also involve a select + cmp.
However, with the min/max patterns we are selecting loaded values (and
hence loop variant) in the loop. In addition we only support certain
cmp predicates. This patch adds a new pattern matching function
(isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp.
We only support selecting values that are integer and loop invariant;
however, we can support any kind of compare - integer or float.
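
To make the reduction concrete, here is a conceptual lane-wise sketch in
plain C++ (not the generated vector IR; a fixed VF of 4 is assumed and the
src[i] += 2 update and the scalar remainder loop are left out for brevity):

  // Conceptual lane-wise form of the select(cmp(), b, phi) reduction.
  int selectCmpReduce(const int *src, int n, int a, int b) {
    int lanes[4] = {a, a, a, a};
    for (int i = 0; i + 4 <= n; i += 4)
      for (int j = 0; j < 4; ++j)
        lanes[j] = src[i + j] > 3 ? b : lanes[j];
    // Final reduction: the result is 'b' if any lane ever selected 'b'.
    for (int j = 0; j < 4; ++j)
      if (lanes[j] == b)
        return b;
    return a;
  }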

Tests have been added here:

  Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
  Transforms/LoopVectorize/select-cmp-predicated.ll
  Transforms/LoopVectorize/select-cmp.ll

Differential Revision: https://reviews.llvm.org/D108136
2021-10-11 09:41:38 +01:00
David Sherwood
28388645a3 [NFC] Simple tidy-up in LoopVectorizationCostModel::selectEpilogueVectorizationFactor
Avoid creating EpilogueVectorizationForceVF twice.
2021-10-04 10:14:22 +01:00
Kazu Hirata
4f0225f6d2 [Transforms] Migrate from getNumArgOperands to arg_size (NFC)
Note that getNumArgOperands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.
2021-10-01 09:57:40 -07:00
Krasimir Georgiev
685f1bfd0a Revert "[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns"
It appears to cause stage2 clang build failures, e.g.,
https://lab.llvm.org/buildbot/#/builders/74/builds/7145.

This reverts commit 1fb37334bdb3cdb028977382fbd84cebde64ebb2.
2021-10-01 11:39:43 +02:00
David Sherwood
1fb37334bd [LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns
This patch adds further support for vectorisation of loops that involve
selecting an integer value based on a previous comparison. Consider the
following C++ loop:

  int r = a;
  for (int i = 0; i < n; i++) {
    if (src[i] > 3) {
      r = b;
    }
    src[i] += 2;
  }

We should be able to vectorise this loop because all we are doing is
selecting between two states - 'a' and 'b' - both of which are loop
invariant. This just involves building a vector of values that contain
either 'a' or 'b', where the final reduced value will be 'b' if any lane
contains 'b'.

The IR generated by clang typically looks like this:

  %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ]
  ...
  %pred = icmp ugt i32 %val, i32 3
  %phi.update = select i1 %pred, i32 %b, i32 %phi

We already detect min/max patterns, which also involve a select + cmp.
However, with the min/max patterns we are selecting loaded values (and
hence loop variant) in the loop. In addition we only support certain
cmp predicates. This patch adds a new pattern matching function
(isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp.
We only support selecting values that are integer and loop invariant;
however, we can support any kind of compare - integer or float.

Tests have been added here:

  Transforms/LoopVectorize/AArch64/sve-select-cmp.ll
  Transforms/LoopVectorize/select-cmp-predicated.ll
  Transforms/LoopVectorize/select-cmp.ll

Differential Revision: https://reviews.llvm.org/D108136
2021-10-01 08:41:03 +01:00
Kazu Hirata
f631173d80 [llvm] Migrate from arg_operands to args (NFC)
Note that arg_operands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.
2021-09-30 08:51:21 -07:00
Anna Thomas
452714f8f8 [BPI] Keep BPI available in loop passes through LoopStandardAnalysisResults
This is analogous to D86156 (which preserves "lossy" BFI in loop
passes). Lossy means that the analysis preserved may not be up to date
with regards to new blocks that are added in loop passes, but BPI will
not contain stale pointers to basic blocks that are deleted by the loop
passes.

This is achieved through BasicBlockCallbackVH in BPI, which calls
eraseBlock that updates the data structures in BPI whenever a basic
block is deleted.

This patch does not have any changes in the upstream pipeline, since
none of the loop passes in the pipeline use BPI currently.
However, since BPI wasn't previously preserved in loop passes, the loop
predication pass was invoking BPI *on the entire function* every time it
ran in an LPM. This caused a massive compile-time increase in our
downstream LPM invocation, which contained loop predication.

See updated test with an invocation of a loop-pipeline containing loop
predication and -debug-pass turned ON.

Reviewed-By: asbirlea, modimo
Differential Revision: https://reviews.llvm.org/D110438
2021-09-30 10:27:05 -04:00
Ayal Zaks
ab6a69dfea [LV] Fix crash for reverse interleaved loads with gap under fold-tail.
This patch fixes the crash found by PR51614:
whenever doing tail folding, interleave groups must be considered under mask.

Another fix D108900 follows for targets that support masked loads and stores:
when *deciding* to vectorize with masked interleave groups, check if the access
is reverse - which is currently not supported; rather than (only) asserting when
computing cost and generating code.

Differential Revision: https://reviews.llvm.org/D108891
2021-09-21 20:13:32 +03:00
Kazu Hirata
24c8eaec94 [Transforms] Use make_early_inc_range (NFC) 2021-09-15 19:55:24 -07:00
Markus Lavin
1ac209ed76 [NPM] Added -print-pipeline-passes print params for a few passes.
Added '-print-pipeline-passes' printing of parameters for those passes
declared with *_WITH_PARAMS macro in PassRegistry.def.

Note that it only prints the parameters declared inside *_WITH_PARAMS, as in
a few cases there appear to be additional parameters that are not parsable.

The following passes are now covered (i.e. all of those with *_WITH_PARAMS in
PassRegistry.def).

LoopExtractorPass - loop-extract
HWAddressSanitizerPass - hwsan
EarlyCSEPass - early-cse
EntryExitInstrumenterPass - ee-instrument
LowerMatrixIntrinsicsPass - lower-matrix-intrinsics
LoopUnrollPass - loop-unroll
AddressSanitizerPass - asan
MemorySanitizerPass - msan
SimplifyCFGPass - simplifycfg
LoopVectorizePass - loop-vectorize
MergedLoadStoreMotionPass - mldst-motion
GVN - gvn
StackLifetimePrinterPass - print<stack-lifetime>
SimpleLoopUnswitchPass - simple-loop-unswitch

Differential Revision: https://reviews.llvm.org/D109310
2021-09-15 08:34:04 +02:00
David Sherwood
bbada9ff45 [NFC] Replace unsigned VF with ElementCount in EpilogueLoopVectorizationInfo
This patch simply replaces any unsigned VFs with ElementCounts. It's
still NFC because at the moment epilogue vectorisation is disabled
when the main vector loop uses scalable vectors.

Differential Revision: https://reviews.llvm.org/D109364
2021-09-13 10:18:30 +01:00
Kazu Hirata
15e9575fb5 [Vectorize] Fix "unused variable" warnings 2021-09-11 12:06:43 -07:00
Nikita Popov
45c467346a [LAA] Pass access type to getPtrStride()
Pass the access type to getPtrStride(), so it is not determined
from the pointer element type. Many cases still fetch the element
type at a higher level though, so this only partially addresses
the issue.
2021-09-11 19:16:49 +02:00
Rosie Sumpter
9d1bea9c88 [SVE][LoopVectorize] Optimise code generated by widenPHIInstruction
For SVE, when scalarising the PHI instruction the whole vector part is
generated, as opposed to fixed-width vectors where instructions are
created for each lane. However, in some cases the lane values may be
needed later (e.g. for a load instruction), so we still need to calculate
these values to avoid extractelement being called on the vector part.

Differential Revision: https://reviews.llvm.org/D109445
2021-09-10 11:58:04 +01:00
Chris Lattner
735f46715d [APInt] Normalize naming on keep constructors / predicate methods.
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`.  This achieves two things:

1) This starts standardizing predicates across the LLVM codebase,
   following (in this case) ConstantInt.  The word "Value" doesn't
   convey anything of merit, and is already missing from some of the other names.

2) Calling an integer "null" doesn't make any sense.  The original sin
   here is mine and I've regretted it for years.  This moves us to calling
   it "zero" instead, which is correct!

APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go.  As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.

Included in this patch are changes to a bunch of the codebase, but there are
more.  We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.
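
For example, with the standardized names described above (a sketch; the
soft-deprecated spellings are shown in comments):

  #include "llvm/ADT/APInt.h"

  // Before (soft-deprecated spellings):
  //   llvm::APInt Zero = llvm::APInt::getNullValue(32);
  //   bool AllOnes = (~Zero).isAllOnesValue();
  // After (the standardized names):
  bool demo() {
    llvm::APInt Zero = llvm::APInt::getZero(32);
    return (~Zero).isAllOnes();
  }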

Differential Revision: https://reviews.llvm.org/D109483
2021-09-09 09:50:24 -07:00
Kazu Hirata
5648f7170e [Analysis, Target, Transforms] Construct SmallVector with iterator ranges (NFC) 2021-09-07 09:19:33 -07:00
Nikita Popov
02f74eadbe [IVDescriptors] Make pointer inductions compatible with opaque pointers
Store the used element type in the InductionDescriptor. For typed
pointers, it remains the pointer element type. For opaque pointers,
we always use an i8 element type, such that the step is a simple
offset.

A previous version of this patch instead tried to guess the element
type from an induction GEP, but this is not reliable, as the GEP
may be hidden (see @both in iv_outside_user.ll).
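
Conceptually, with an i8 element type the induction step is just a byte
offset; a plain-C++ analogue (not LLVM code, helper name invented for
illustration):

  #include <cstddef>
  #include <cstdint>

  // Analogue of an i8-element pointer induction: the step is expressed in
  // bytes, so advancing by one element of a 4-byte type uses a step of 4.
  const std::uint8_t *advanceBytes(const std::uint8_t *Base, std::size_t Index,
                                   std::size_t ElementSizeInBytes) {
    return Base + Index * ElementSizeInBytes;
  }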

Differential Revision: https://reviews.llvm.org/D104795
2021-09-01 21:02:05 +02:00
Florian Hahn
abd36fe512
[VPlan] Introduce code to limit querying VPValues using IR references.
After applying VPlan-to-VPlan transformations, using IR references to
query VPlan values may be incorrect, as the IR is not in sync with the
VPlan any longer.

To better detect such mismatches, this patch introduces a new flag to
VPlans to indicate whether it is safe to query VPValues using IR values.

getVPValue is updated to assert if it is called when the flag indicates
it is not safe any longer.

There is an escape hatch via an extra argument, because there are 3
places that need to be fixed first. Those are

1. truncateToMinimalBitwidths
2. clearReductionWrapFlags
3. fixLCSSAPHIs

As a first step, this flag will help preventing new code from violating
this property.

Any suggestions with respect to naming very welcome!

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D108573
2021-08-30 09:12:09 +02:00
Florian Hahn
d024a01511
Recommit "[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64"
This reverts the revert ab9296f13be45cd190608f54a69bdd5c7c561b16.

The issue causing the revert should be fixed in 9baed023b4b5.
2021-08-23 11:25:27 +01:00
Florian Hahn
9baed023b4
[LV] Adjust reduction recipes before recurrence handling.
Adjusting the reduction recipes still relies on references to the
original IR, which can become outdated by the first-order recurrence
handling. Until reduction recipe construction does not require IR
references, move it before first-order recurrence handling, to prevent a
crash as exposed by D106653.
2021-08-22 11:02:33 +01:00
Florian Hahn
ab9296f13b
Revert "[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64"
This reverts commit f4122398e7c195147cde120d070f9b72905d7c91 to
investigate a crash exposed by it.

The patch breaks building the code below with `clang -O2 --target=aarch64-linux`

     int a;
     double b, c;
     void d() {
       for (; a; a++) {
         b += c;
         c = a;
       }
     }
2021-08-20 21:24:28 +01:00
David Sherwood
f4122398e7 [LoopVectorize][AArch64] Enable ordered reductions by default for AArch64
I have added a new TTI interface called enableOrderedReductions() that
controls whether or not ordered reductions should be enabled for a
given target. By default this returns false, whereas for AArch64 it
returns true and we rely upon the cost model to make sensible
vectorisation choices. It is still possible to override the new TTI
interface by setting the command line flag:

  -force-ordered-reductions=true|false
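
For reference, an ordered (strict) reduction keeps the sequential
association of the floating-point adds, whereas the usual vectorized form
reassociates them into per-lane partial sums; a small standalone sketch
(not code from the patch, a VF of 4 is assumed):

  #include <cstddef>

  // Ordered (strict) reduction: floating-point adds stay in source order,
  // so the vectorized loop must chain the partial sums serially.
  double orderedSum(const double *x, std::size_t n) {
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i)
      acc += x[i];
    return acc;
  }

  // Reassociated reduction (the usual fast-math vectorization): per-lane
  // partial sums combined at the end; this may round differently, which is
  // why the cost model has to decide when ordered reductions are worthwhile.
  double reassociatedSum(const double *x, std::size_t n) {
    double lane[4] = {0.0, 0.0, 0.0, 0.0};
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
      for (int j = 0; j < 4; ++j)
        lane[j] += x[i + j];
    double acc = lane[0] + lane[1] + lane[2] + lane[3];
    for (; i < n; ++i)
      acc += x[i];
    return acc;
  }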

I have added a new RUN line to show that we use ordered reductions by
default for SVE and Neon:

  Transforms/LoopVectorize/AArch64/strict-fadd.ll
  Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll

Differential Revision: https://reviews.llvm.org/D106653
2021-08-19 09:29:40 +01:00
Dylan Fleming
ef198cd99e [SVE] Remove usage of getMaxVScale for AArch64, in favour of IR Attribute
Removed AArch64 usage of the getMaxVScale interface, replacing it with
the vscale_range(min, max) IR Attribute.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D106277
2021-08-17 14:42:47 +01:00
Nikita Popov
570c9beb8e [MemorySSA] Remove unnecessary MSSA dependencies
LoopLoadElimination, LoopVersioning and LoopVectorize currently
fetch MemorySSA when constructing LoopAccessAnalysis. However,
LoopAccessAnalysis does not actually use MemorySSA and we can pass
nullptr instead.

This saves one MemorySSA calculation in the default pipeline, and
thus improves compile-time.

Differential Revision: https://reviews.llvm.org/D108074
2021-08-16 20:40:55 +02:00
Paul Walker
f7a831daa6 [LoopVectorize] Don't emit remarks about lack of scalable vectors unless they're specifically requested.
Previously we emitted a "does not support scalable vectors"
remark for all targets whenever vectorisation is attempted. This
pollutes the output for architectures that don't support scalable
vectors and is likely confusing to the user.

Instead this patch introduces a debug message that reports when
scalable vectorisation is allowed by the target and only issues
the previous remark when scalable vectorisation is specifically
requested, for example:

  #pragma clang loop vectorize_width(2, scalable)

Differential Revision: https://reviews.llvm.org/D108028
2021-08-15 12:15:52 +01:00
Dorit Nuzman
67278b8a90 [LV] Support Interleaved Store Group With Gaps
Teach LV to use masked-store to support interleave-store-group with
gaps (instead of scatters/scalarization).

The symmetric case of using masked-load to support
interleaved-load-group with gaps was introduced a while ago, by
https://reviews.llvm.org/D53668; This patch completes the store-scenario
leftover from D53668, and solves PR50566.
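
A hypothetical C++ loop (not taken from the patch) that forms an
interleaved store group with a gap - two of three interleaved fields are
written each iteration - which can now be handled with a masked store
instead of scatters/scalarization:

  struct Pixel { int r, g, b; };

  // Hypothetical interleave-store-group-with-gap: r and b are stored every
  // iteration but g is not, leaving a gap at member index 1 of the group.
  void writeRB(Pixel *p, int n) {
    for (int i = 0; i < n; ++i) {
      p[i].r = 1;
      p[i].b = 2;
    }
  }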

Reviewed by: Ayal Zaks

Differential Revision: https://reviews.llvm.org/D104750
2021-08-08 10:32:02 +03:00
Florian Hahn
a00aafc30d
[VPlan] Iterate over phi recipes to detect reductions to fix.
After refactoring the phi recipes, we can now iterate over all header
phis in a VPlan to detect reductions when it comes to fixing them up
when tail folding.

This reduces the coupling with the cost model & legal by using the
information directly available in VPlan. It also removes a call to
getOrAddVPValue, which references the original IR value which may
become outdated after VPlan transformations.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100102
2021-08-07 14:06:50 +01:00
David Sherwood
3fd96e1b2e [LoopVectorize] Improve vectorisation of some intrinsics by treating them as uniform
This patch adds more instructions to the Uniforms list, for example certain
intrinsics that are uniform by definition or whose operands are loop invariant.
This list includes:

  1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which
  are always uniform by definition.
  2. If intrinsics 'lifetime.start', 'lifetime.end' and 'assume' have
  loop invariant input operands then these are also uniform too.

Also, in VPRecipeBuilder::handleReplication we check if an instruction is
uniform based purely on whether or not the instruction lives in the Uniforms
list. However, there are certain cases where calls to some intrinsics can
be effectively treated as uniform too. Therefore, we now also treat the
following cases as uniform for scalable vectors:

  1. If the 'assume' intrinsic's operand is not loop invariant, then we
  are free to treat this as uniform anyway since it's only a performance
  hint. We will get the benefit for the first lane.
  2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop
  variant then for scalable vectors we assume these still ultimately come
  from the broadcast of an alloca. We do not support scalable vectorisation
  of loops containing alloca instructions, hence the alloca itself would
  be invariant. If the pointer does not come from an alloca then the
  intrinsic itself has no effect.

I have updated the assume test for fixed width, since we now treat it
as uniform:

  Transforms/LoopVectorize/assume.ll

I've also added new scalable vectorisation tests for other intrinsics:

  Transforms/LoopVectorize/scalable-assume.ll
  Transforms/LoopVectorize/scalable-lifetime.ll
  Transforms/LoopVectorize/scalable-noalias-scope-decl.ll

Differential Revision: https://reviews.llvm.org/D107284
2021-08-06 10:13:15 +01:00
David Sherwood
43a5c750d1 Revert "[LoopVectorize] Add support for replication of more intrinsics with scalable vectors"
This reverts commit 95800da914938129083df2fa0165c1901909c273.
2021-08-06 09:48:16 +01:00
Florian Hahn
3e58dd19df
[LV] Move reduction PHI node fixup to VPlan::execute (NFC).
All information to fix-up the reduction phi nodes in the vectorized loop
is available in VPlan now. This patch moves the code to do so, to make
this clearer. Fixing up the loop exit value still relies on other
information and remains outside of VPlan for now.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100113
2021-08-06 08:29:20 +01:00