1971 Commits

Author SHA1 Message Date
Luke Lau
d7323f6a7a [RISCV][NFC] Add tests for interleaved accesses in loop vectorizer
Precommit test for D145155

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D145697
2023-03-09 17:43:16 +00:00
Anna Thomas
ac4c0ea73b [Tests] Precommit tests for D145616 2023-03-08 17:30:53 -05:00
Florian Hahn
79272ec028
[VPlan] Add predicate to VPReplicateRecipe, expand region later.
This patch adds the predicate as additional operand to VPReplicateRecipe
during initial construction. The predicated recipes are later moved into
replicate regions. This simplifies constructions and some VPlan
transformations, like fixed-order recurrence handling.

It also improves codegen in some cases (e.g. for in-loop reductions),
because the recipes remain in the same block.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D143865
2023-03-08 20:11:28 +01:00
Florian Hahn
7019624ee1
[SCEV] Strengthen nowrap flags via ranges for ARs on construction.
At the moment, proveNoWrapViaConstantRanges is only used when creating
SCEV[Zero,Sign]ExtendExprs. We can get significant improvements by
strengthening flags after creating the AddRec.

I'll also share a follow-up patch that removes the code to strengthen
flags when creating SCEV[Zero,Sign]ExtendExprs. Modifying AddRecs while
creating those can lead to surprising changes.

Compile-time looks neutral:
https://llvm-compile-time-tracker.com/compare.php?from=94676cf8a13c511a9acfc24ed53c98964a87bde3&to=aced434e8b103109104882776824c4136c90030d&stat=instructions:u

Reviewed By: mkazantsev, nikic

Differential Revision: https://reviews.llvm.org/D144050
2023-03-07 17:10:34 +01:00
Mel Chen
d5c404d1b9 [RISCV] Enable ordered reduction.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D144455
2023-03-07 04:36:50 -08:00
Graham Hunter
92e0ab937f [AArch64] Don't map llvm sqrt intrinsics to veclib functions
Since AArch64 has sqrt instructions, we want to use those instead of
calls to vector math routines for llvm sqrt intrinsics (since those
don't imply some of the constraints that libm calls might have) so
we just remove the mappings.

Code originally written by mgabka

Reviewed By: danielkiss, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D145392
2023-03-07 11:43:41 +00:00
Graham Hunter
a180344589 [LV] Allow scalarization of function calls when masking is required
This patch adds support for scalarizing calls to a function when
there is a vector variant that cannot be used, either because there
isn't a masked variant or because the cost model indicated a VF
without a masked variant was better.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D134422
2023-03-03 15:26:04 +00:00
Mel Chen
9d0703a646 [RISCV] Pre-commit test case for ordered reduction, NFC
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D144458
2023-03-01 06:27:43 -08:00
Sander de Smalen
c41b41eb11 [LoopVectorize] Use overflow-check analysis to improve tail-folding.
This work follows on from D142109 and addresses a possible regression
when we know the loop iteration counter cannot overflow.

When we know the overflow-check always evaluates to false, it's better to
use the other style of tail folding where it assumes a runtime check was
added, because that avoids having to calculate a modified trip-count.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D142894
2023-03-01 14:17:58 +00:00
Sander de Smalen
fe1b51ffee [LoopVectorize] Remove runtime check and scalar tail loop when tail-folding.
When using tail-folding and using the predicate for both data and control-flow
(the next vector iteration's predicate is generated with the llvm.active.lane.mask
intrinsic and then tested for the backedge), the LoopVectorizer still inserts a
runtime check to see if the 'i + VF' may at any point overflow for the given
trip-count. When it does, it falls back to a scalar epilogue loop.

We can get rid of that runtime check in the pre-header and therefore also
remove the scalar epilogue loop. This reduces code-size and avoids a runtime
check.

Consider the following loop:

  void foo(char * __restrict__ dst, char *src, unsigned long N) {
      for (unsigned long  i=0; i<N; ++i)
          dst[i] = src[i] + 42;
  }

If 'N' is e.g. ULONG_MAX, and the VF > 1, then the loop iteration counter
will overflow when calculating the predicate for the next vector iteration
at some point, because LLVM does:

  vector.ph:
    %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)

  vector.body:
    %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
    %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ]
    ...

    %index.next = add i64 %index, 16
      ; The add above may overflow, which would affect the lane mask and control flow. Hence a runtime check is needed.
    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index.next, i64 %N)
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7

The solution:

What we can do instead is calculate the predicate before incrementing
the loop iteration counter, such that the llvm.active.lane.mask is
calculated from 'i' to 'tripcount > VF ? tripcount - VF : 0', i.e.

  vector.ph:
    %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)
    %N_minus_VF = select %N > 16 ? %N - 16 : 0

  vector.body:
    %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
    %active.lane.mask = phi <vscale x 16 x i1> [ %active.lane.mask.entry, %vector.ph ], [ %active.lane.mask.next, %vector.body ]
    ...

    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 %index, i64 %N_minus_VF)
    %index.next = add i64 %index, %4
      ; The add above may still overflow, but this time the active.lane.mask is not affected
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7

For N = 20, we'd then get:

  vector.ph:
    %active.lane.mask.entry = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 %N)
      ; %active.lane.mask.entry = <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1>
    %N_minus_VF = select 20 > 16 ? 20 - 16 : 0
      ; %N_minus_VF = 4

  vector.body: (1st iteration)
    ... ; using <1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1> as predicate in the loop
    ...
    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 4)
      ; %active.lane.mask.next = <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
    %index.next = add i64 0, 16
      ; %index.next = 16
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
      ; %8 = 1
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7
      ; branch to %vector.body

  vector.body: (2nd iteration)
    ... ; using <1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0> as predicate in the loop
    ...
    %active.lane.mask.next = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 16, i64 4)
      ; %active.lane.mask.next = <0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
    %index.next = add i64 16, 16
      ; %index.next = 32
    %8 = extractelement <vscale x 16 x i1> %active.lane.mask.next, i64 0
      ; %8 = 0
    br i1 %8, label %vector.body, label %for.cond.cleanup, !llvm.loop !7
      ; branch to %for.cond.cleanup

Reviewed By: fhahn, david-arm

Differential Revision: https://reviews.llvm.org/D142109
2023-03-01 09:01:19 +00:00
Sander de Smalen
de111ae70a NFC: Use generate_test_checks script for LV tests which seem to have been auto-generated. 2023-03-01 09:01:19 +00:00
sgokhale
ac67ec3a54 [LV] Reland testcase in 0ec4cae 2023-02-28 18:14:18 +05:30
sgokhale
0ec4cae146 [LV] Modify test case for commit 4f9a544
Was observing test failure. Relanding the test
2023-02-28 18:07:36 +05:30
sgokhale
4f9a5447c6 [LV] Reland "Update logic for calculating register usage due to invariants"
Previously, while calculating register usage due to invariants, it was assumed that invariant would always be part of widening
instructions. This resulted in calculating vector register types for vectors which cant be legalized(check the newly added test for more details).

An invariant might not always need a vector register. For e.g., invariant might just be used for iteration check.

This patch checks if the invariant is part of any widening instruction and considers register usage accordingly. Fixes issue 60493

Differential Revision: https://reviews.llvm.org/D143422
2023-02-28 17:32:39 +05:30
sgokhale
3c8ddbde37 Revert "[LV] Update logic for calculating register usage due to invariants"
Observing test failure for llvm/test/Transforms/LoopVectorize/AArch64/reg-usage.ll

This reverts commit d1628266946fdddb44bdad2b3ccf3cd5fc769f42.
2023-02-28 15:46:59 +05:30
sgokhale
d162826694 [LV] Update logic for calculating register usage due to invariants
Previously, while calculating register usage due to invariants, it was assumed that invariant would always be part of widening
instructions. This resulted in calculating vector register types for vectors which cant be legalized(check the newly added test for more details).

An invariant might not always need a vector register. For e.g., invariant might just be used for iteration check.

This patch checks if the invariant is part of any widening instruction and considers register usage accordingly. Fixes issue 60493

Differential Revision: https://reviews.llvm.org/D143422
2023-02-28 11:05:26 +05:30
Luke Lau
15f9cf164c [LV][RISCV] Don't interleave scalable vector loops
It's less clear with scalable vectors than fixed length vectors that
interleaving exposes more ILP, as scalable vectors can be thought of a
sort of hardware form of interleaving, especially with larger LMULs.
This also addresses the unexpected additional unrolling that occurs when
using larger LMULs in the loop vectorizer.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D144485
2023-02-22 10:15:11 +00:00
Luke Lau
0b336e9ef0 [RISCV][NFC] Add test for different LMULs in vectorizer
This is a test for an upcoming patch that proposes to change the default LMUL used by the loop vectorizer from 1 to 2

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D143722
2023-02-20 22:40:40 +00:00
Mel Chen
3e84fc857f [LV] Harden the test of the minmax with index pattern. (NFC)
- Add test config: -force-vector-width=4 -force-vector-interleave=1
  - New test case: The test case both returns the minimum value and the index.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D143905
2023-02-20 03:16:28 -08:00
Hongtao Yu
c38c8d6743 [PseudoProbe] Refactoring a test
As titled.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D144137
2023-02-15 14:07:51 -08:00
Graham Hunter
0fa5df1959 [LV] Synthesize all true masks for masked vector function variants
When vectorizing code with function calls in it, if we encounter
a function which only has vectorized variants requiring a mask
we can synthesize an all-true mask to enable us to proceed.

Since we want the mask to be represented in vplan, the pointer
to the chosen Function is now stored as part of the
VPWidenCallRecipe, and mask arguments are added at the
appropriate index to the recipe operands.

Reviewed By: david-arm, fhahn, reames

Differential Revision: https://reviews.llvm.org/D132458
2023-02-14 14:33:18 +00:00
Florian Hahn
af3c25dc3d
[VPlan] Fix iterator invalidation in adjustFixedOrderRecurrences.
adjustFixedOrderRecurrences may insert instructions after immediately
after the PHI nodes in the block. This invalidates the phis() iterator.
To avoid crashing/accessing invalid recipes, first collect all
first-order recurrence phi recipes.

This should fix a crash reported by @dmgreen after D142589 landed.
2023-02-13 13:51:14 +00:00
Sander de Smalen
5a115452c4 Reland D143267: [LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices.
Fixed issue where 'ConstantInt::get(IndextTy, -Part)' was executed with the wrong type for Part,
e.g. IndexTy was i64, but Part was 'unsigned', which led to things like 'mul i64 .., 4294967292',
which was obviously wrong.

Also changed sve-vector-reverse.ll to be vectorized with UF>1 to test this.

This reverts commit 1f01cdda68614dba12af3cc3aff38541d0abcc6b.
2023-02-09 09:42:29 +00:00
Florian Hahn
c83fdc905a
[LV] Perform recurrence sinking directly on VPlan.
This patch updates LV to sink recipes directly using the VPlan use
chains. The initial patch only moves sinking to be purely VPlan-based.
Follow-up patches will move legality checks to VPlan as well.

At the moment, there's a single test failure remaining.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142589
2023-02-08 15:49:29 +00:00
Sander de Smalen
1f01cdda68 Revert "[LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices."
This patch causes a regression, so reverting it while I investigate the issue.

This reverts commit e6eb84a191ca2a1afd5789c5bb398da68bb6065e.
2023-02-08 15:46:52 +00:00
Fangrui Song
af12879146 [RISCV] Allow mismatched SmallDataLimit and use Min for conflicting values
Fix an issue about module linking with LTO.

When compiling with PIE, the small data limitation needs to be consistent with that in PIC, otherwise there will be linking errors due to conflicting values.

bar.c
```
int bar() { return 1; }
```

foo.c
```
int foo() { return 1; }
```

```
clang --target=riscv64-unknown-linux-gnu -flto -c foo.c -o foo.o -fPIE
clang --target=riscv64-unknown-linux-gnu -flto -c bar.c -o bar.o -fPIC

clang --target=riscv64-unknown-linux-gnu -flto foo.o bar.o -flto -nostdlib -v -fuse-ld=lld
```

```
ld.lld: error: linking module flags 'SmallDataLimit': IDs have conflicting values in 'bar.o' and 'ld-temp.o'
clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
```

Use Min instead of Error for conflicting SmallDataLimit.

Authored by: @joshua-arch1
Signed-off-by: xiaojing.zhang <xiaojing.zhang@xcalibyte.com>
Signed-off-by: jianxin.lai <jianxin.lai@xcalibyte.com>

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D131230
2023-02-07 17:13:21 -08:00
Florian Hahn
a69f23493e
[LV] Remove unused load from RISCV test (NFC).
The test contained a unused load that appears unrelated to the test
(store of vector of i1). Remove it to avoid test changes in follow-up
change which will lead to dead loads being removed.
2023-02-07 21:55:44 +00:00
Sander de Smalen
e6eb84a191 [LoopVectorize] Use DataLayout::getIndexType instead of i32 for non-constant GEP indices.
This is specifically relevant for loops that vectorize using a scalable VF,
where the code results in:

  %vscale = call i32 llvm.vscale.i32()
  %vf.part1 = mul i32 %vscale, 4
  %gep = getelementptr  ..., i32 %vf.part1

Which InstCombine then changes into:

  %vscale = call i32 llvm.vscale.i32()
  %vf.part1 = mul i32 %vscale, 4
  %vf.part1.zext = sext i32 %vf.part1 to i64
  %gep = getelementptr  ..., i32 %vf.part1.zext

D143016 tried to remove these extends, but that only works when
the call to llvm.vscale.i32() has a single use. After doing any
kind of CSE on these calls the combine no longer kicks in.

It seems more sensible to ask DataLayout what type to use, rather
than relying on InstCombine to insert the extend and hoping it can
fold it away.

I've only changed this for indices that are not constant, because
I vaguely remember there was a reason for sticking with i32. It
would also mean patching up loads more tests.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D143267
2023-02-07 11:47:51 +00:00
Florian Hahn
40621ff4b8
[LV] Also check interleaving only in select-min-index.ll
The new combination exposed a crash in earlier versions of
D132063.
2023-02-06 11:30:14 +00:00
Florian Hahn
a9ac22b501
[LV] Add users for loads to make tests more robust.
Update a few tests to add users to loads to avoid them being optimized
out by future changes. In cases the unused loads didn't matter for the
test, remove them.
2023-02-04 20:42:57 +00:00
Florian Hahn
fd5bccb8b1
[LV] Add initial tests for sinking loads past other instructions.
Extend test coverage for sinking loads that use fixed order recurrences.
2023-02-04 18:18:33 +00:00
Craig Topper
df76ff98e8 [InstCombine][LV] Fold (add (zext (add X, -1)), 1) -> (zext X) if X is non-zero.
This artifact can appear from the vectorizer. (add X, -1) is the
backedge taken count. It gets zero extended and then 1 is added to
it to get the trip count.

There is usually a dominating branch that rules out X being zero.

Alive: https://alive2.llvm.org/ce/z/NsRDwX
2023-01-30 17:45:01 -08:00
Matt Devereau
8ff47f6032 [LoopVectorize] Enable integer Mul and Add as select reduction patterns
This patch vectorizes Phi node loop reductions for select's whos condition
comes from a floating-point comparison, with its operands being integers
for Add, Sub, and Mul reductions.

Example:

int foo(float *x, int n) {
    int sum = 0;
    for (int i=0; i<n; ++i) {
        float elem = x[i];
        if (elem > 0) {
            sum += 2;
        }
    }
    return sum;
}

This would previously fail to vectorize due to the integer reduction.
2023-01-30 09:41:40 +00:00
Matt Devereau
4468e27d9f Revert "[LoopVectorize] Enable integer Mul and Add as select reduction patterns"
This reverts commit f90103851f9a381bbf7ed6da250217577afd00d2.
2023-01-26 12:02:16 +00:00
Matt Devereau
f90103851f [LoopVectorize] Enable integer Mul and Add as select reduction patterns
This patch vectorizes Phi node loop reductions for select's whos condition
comes from a floating-point comparison, with its operands being integers
for Add, Sub, and Mul reductions.

Example:

int foo(float *x, int n) {
    int sum = 0;
    for (int i=0; i<n; ++i) {
        float elem = x[i];
        if (elem > 0) {
            sum += 2;
        }
    }
    return sum;
}

Differential Revision: https://reviews.llvm.org/D141842
2023-01-25 13:25:18 +00:00
Florian Hahn
fb40c34b8f
[VPlan] Consider all recipes in replicate blocks as sink candidates.
Update sinkScalarOperands to consider all operands of recipes in
replicate blocks as sink candidates This enables additional sinking
opportunities and is another step towards retiring LLVM IR-based
sinkScalarOperands.

This enables iterative sinking of operands for successive calls of
sinkScalarOperands.

Depends on D139788.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D139790
2023-01-21 17:14:13 +00:00
Daniel Kiss
c4fa504f79 [AArch64] Enable libm vectorized functions via SLEEF
It enables trigonometry functions vectorization via SLEEF: http://sleef.org/.

  - A new vectorization library enum is added to TargetLibraryInfo.h: SLEEF.
  - A new option is added to TargetLibraryInfoImpl - ClVectorLibrary: SLEEF.
  - A comprehensive test case is included in this changeset.
  - A new vectorization library argument is added to -fveclib: -fveclib=SLEEF.

Trigonometry functions that are vectorized by sleef:
acos
asin
atan
atanh
cos
cosh
exp
exp2
exp10
lgamma
log10
log2
log
sin
sinh
sqrt
tan
tanh
tgamma

Co-authored-by: Stefan Teleman

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D134719
2023-01-20 18:52:38 +01:00
Nikita Popov
9ed2f14c87 [AsmParser] Remove typed pointer auto-detection
IR is now always parsed in opaque pointer mode, unless
-opaque-pointers=0 is explicitly given. There is no automatic
detection of typed pointers anymore.

The -opaque-pointers=0 option is added to any remaining IR tests
that haven't been migrated yet.

Differential Revision: https://reviews.llvm.org/D141912
2023-01-18 09:58:32 +01:00
Florian Hahn
830d0bc56b
[AArch64] Set MaxInterleaveFactor for Apple A14, A15, A16.
Those CPUs can benefit from additional interleaving.

Reviewed By: jroelofs

Differential Revision: https://reviews.llvm.org/D141499
2023-01-11 18:52:51 +00:00
Florian Hahn
42f6783747
[AArch64] Add tests for selecting interleave counts for different CPUs.
Add extra tests for interleaving heuristics for different AArch64 CPUs.
2023-01-11 14:50:33 +00:00
Paul Walker
eae26b6640 [IRBuilder] Use canonical i64 type for insertelement index used by vector splats.
Instcombine prefers this canonical form (see getPreferredVectorIndex),
as does IRBuilder when passing the index as an integer so we may as
well use the prefered form from creation.

NOTE: All test changes are mechanical with nothing else expected
beyond a change of index type from i32 to i64.

Differential Revision: https://reviews.llvm.org/D140983
2023-01-11 14:08:06 +00:00
Miguel Saldivar
3c5e0d87f8
[LoopVectorize] Clear cache of LoopAccessInfoManager
LAI is cached during the LoopDistribute pass, and is later re-used during LoopVectorize. The problem is that LoopVectorize changes SCEV, and the cached LAI does not get updated. Hence, when re-using the cached LAI, it references an invalid SCEV.

Fixes #59319

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D139601
2023-01-11 09:03:40 +00:00
Florian Hahn
78914e8c32
[VPlan] Keep entries in worklist in sinkScalarOperands.
Not removing the entries ensures that duplicates are avoided,
reducing the number of iterations.
2023-01-08 15:52:57 +00:00
Craig Topper
e5a71a41d8 [RISCV] Add support for the vscale_range attribute.
This is based on @frasercrmck's D107290. At least some of the clang
portion of D107290 has already been committed.

This uses vscale_range for min/max vector width unless the command
line overrides are used.

As a follow up, I plan to add a max or exact VLEN option to clang
to control the vscale_range. This will eliminate many of the reasons
for users to use the overrides through the -mllvm interface.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D139873
2023-01-06 08:20:37 -08:00
Nikita Popov
5867241eac [Transforms] Convert some tests to opaque pointers (NFC) 2023-01-06 12:14:45 +01:00
Florian Hahn
68469a80cb
[LV] Disable runtime unrolling for vectorized loops.
This patch adds metadata to disable runtime unrolling to the vectorized
loop. If runtime unrolling/interleaving is considered profitable, LV
will interleave the loop directly. There should be no need to perform
runtime unrolling at a later stage.

Note that we already add metadata to disable runtime unrolling to the
scalar loop after vectorization.

The additional unrolling unnecessarily increases code size and compile
time. In addition to that we have several bug reports of unncessary
runtime unrolling for vectorized loops, e.g. PR40961

Compile-time improvements:

  NewPM-O3: -1.04%
  NewPM-ReleaseThinLTO: -0.59%
  NewPM-ReleaseLTO-g: -0.97%

https://llvm-compile-time-tracker.com/compare.php?from=ce1be13a868d0f8afa367975558c1a6175cce33a&to=78bc2e67f22e9e10e61cdb6cdac4bb857d95eb1b&stat=instructions:u

Fixes #40306.

Reviewed By: lebedev.ri, nikic

Differential Revision: https://reviews.llvm.org/D115261
2023-01-06 10:56:17 +00:00
David Green
586fd86b0a [LoopVectorizer] Fix inloop reductions mask placement
The validation of vplans could fail if an inloop reduction was created
with a block-in mask that did not dominate the reduction. This makes
sure that the insert point is set when creating the mask, to ensure it
dominates the reduction.

Differential Revision: https://reviews.llvm.org/D141003
2023-01-05 11:37:37 +00:00
Augie Fackler
0676156f81 Revert "[VPlan] Also consider operands of sink candidates in same block."
This reverts commit aa2414729ebbcb2d8f162e9002a3a6aa768b1f9d.

Previously-valid IR from a tensorflow test case (as shown on the
Diffusion revision for aa2414729ebbcb2d8f162e9002a3a6aa768b1f9d) started
hanging in the loop-vectorize pass. Reverting to keep everyone working.
2023-01-04 16:17:13 -05:00
Nikita Popov
2fab927546 [LoopVectorize] Convert some tests to opaque pointers (NFC)
Check lines for some of these tests were regenerated. The difference
is that with opaque pointers SCEVExpander always emits i8 GEPs,
making the address calculation explicit. This is a known problem
that will be solved long term by making all address calculations
explicit.
2023-01-04 17:25:42 +01:00
David Green
11f3308ca2 [NFC] Regenerate reduction-inloop.ll check lines. NFC 2023-01-04 16:02:20 +00:00