9 Commits

Author SHA1 Message Date
Dávid Bolvanský
872f7000fc Revert "[NFCI] Regenerate SROA/LoopVectorize test checks"
This reverts commit 14e3450fb57305aa9ff3e9e60687b458e43835c9.
2022-04-04 01:15:30 +02:00
Dávid Bolvanský
a113a582b1 [NFCI] Regenerate LoopVectorize test checks 2022-04-03 21:56:24 +02:00
David Green
b55d4c2ad8 Revert "[LV] Remove LoopVectorizationCostModel::useEmulatedMaskMemRefHack()"
This reverts commit 77a0da926c9ea86afa9baf28158d79c7678fc6b9 as we've
received multiple reports of this significantly impacting performance,
in ways that don't seem to just be target specific cost models going
wrong. I would offer some reproducers, but the test changes here seem to
be full of them!

Reverting for now and hopefully we can remove the "hack" more carefully
as we go.
2022-02-09 20:02:54 +00:00
Roman Lebedev
77a0da926c
[LV] Remove LoopVectorizationCostModel::useEmulatedMaskMemRefHack()
D43208 extracted `useEmulatedMaskMemRefHack()` from legality into cost model.
What it essentially does is prevents scalarized vectorization of masked memory operations:
```
  // TODO: Cost model for emulated masked load/store is completely
  // broken. This hack guides the cost model to use an artificially
  // high enough value to practically disable vectorization with such
  // operations, except where previously deployed legality hack allowed
  // using very low cost values. This is to avoid regressions coming simply
  // from moving "masked load/store" check from legality to cost model.
  // Masked Load/Gather emulation was previously never allowed.
  // Limited number of Masked Store/Scatter emulation was allowed.
```

While i don't really understand about what specifically `is completely broken`
was talking about, i believe that at least on X86 with AVX2-or-later,
this is no longer true. (or at least, i would like to know what is still broken).
So i would like to follow suit after D111460, and like wise disable that hack for AVX2+.

But since this was added for X86 specifically, let's just instead completely remove this hack.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D114779
2022-02-07 16:08:31 +03:00
Florian Hahn
efd4938723
[VPlan] Handle IV vector splat using VPWidenCanonicalIV.
This patch tries to use an existing VPWidenCanonicalIVRecipe
instead of creating another step-vector for canonical
induction recipes in widenIntOrFpInduction.

This has the following benefits:

 1. First step to avoid setting both vector and scalar values for the
    same induction def.
 2. Reducing complexity of widenIntOrFpInduction through making things
    more explicit in VPlan
 3. Only need to splat the vector IV for block in masks.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D116123
2022-01-29 16:25:27 +00:00
David Sherwood
e781620dee [LoopVectorize][AArch64] Use get.active.lane.mask intrinsic when SVE is enabled
When SVE is enabled for AArch64 targets it makes more sense to use the
get.active.lane.mask intrinsic, because SVE has an exact 1-1 mapping
from the intrinsic to the 'whilelo' instruction for legal vector types.
This instruction neatly takes overflow into account as well. This patch
fixes an issue in VPInstruction::generateInstruction that assumed we are
only dealing with fixed-width vectors.

Differential Revision: https://reviews.llvm.org/D117109
2022-01-18 11:59:30 +00:00
Rosie Sumpter
552eb372cb [LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter
This is required to query the legality more precisely in the LoopVectorizer.

This adds another TTI function named 'forceScalarizeMaskedGather/Scatter'
function to work around the hack introduced for MVE, where
isLegalMaskedGather/Scatter would return an answer by second-guessing
where the function was called from, based on the Type passed in (vector
vs scalar). The new interface makes this explicit. It is also used by
X86 to check for vector widths where gather/scatters aren't profitable
(or don't exist) for certain subtargets.

Differential Revision: https://reviews.llvm.org/D115329
2022-01-12 13:34:12 +00:00
David Sherwood
e20391fc5d [LoopVectorize] When tail-folding, don't always predicate uniform loads
In VPRecipeBuilder::handleReplication if we believe the instruction
is predicated we then proceed to create new VP region blocks even
when the load is uniform and only predicated due to tail-folding.

I have updated isPredicatedInst to avoid treating a uniform load as
predicated when tail-folding, which means we can do a single scalar
load and a vector splat of the value.

Tests added here:

  Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll

Differential Revision: https://reviews.llvm.org/D112552
2021-11-26 11:30:54 +00:00
David Sherwood
9da8dde7fd [NFC][LoopVectorize] Add test for tail-folding loop with conditional uniform load
I've added a test for a loop containing a conditional uniform load for
a target that supports masked loads. The test just ensures that we
correctly use gather instructions and have the correct mask.

Differential Revision: https://reviews.llvm.org/D112619
2021-11-03 09:51:11 +00:00