2209 Commits

Author SHA1 Message Date
Florian Hahn
fd31112634
[VPlan] Insert Trunc/Exts for reductions directly in VPlan.
Update the code to create Trunc/Ext recipes directly in
adjustRecipesForReductions instead of fixing it up later in
fixReductions.

This explicitly models the required conversions and also makes sure they
are generated at the right place (instead of after the exit condition),
hence the changes in a few tests.
2023-10-17 19:17:40 +01:00
Yingwei Zheng
4718b4011f
[LV] Invalidate disposition of SCEV values after loop vectorization (#69230)
This PR fixes the assertion failure of `SE.verify()` after loop vectorization.
2023-10-17 03:49:39 +08:00
Florian Hahn
f7a8a78cb7
[VPlan] Also print operands of canonical IV (NFC).
Also print the operands of VPCanonicalIVPHIRecipe. That was missed
earlier.
2023-10-16 20:28:23 +01:00
Florian Hahn
38f8b7cbe4
[LV] Replace value numbers with patterns in tests (NFC).
Replace some hardcoded value numbers in CHECK lines with patterns, to
make the tests more robust against renumbering.
2023-10-16 19:53:44 +01:00
JolantaJensen
afdb18df4d
[NFC][AArch64][LV] Reorganise LV tests using symbols from SLEEF (#68207)
The tests introduced by https://reviews.llvm.org/D134719 and later
modified in https://reviews.llvm.org/D146839 are not testing LV in
isolation. This patch:
  1. Ensures that all tests test LV in isolation.
  2. Adds LV tests using llvm intrinsics that have libm mappings.

llrint, llround and lrint are not included, as the IR verifier pass
currently does not allow vector types with them.
2023-10-13 12:10:21 +01:00
Ramkumar Ramachandra
8593c0bc02
LoopVectorize/test: clean up reduction.ll; generate using UTC (NFC) (#68890)
The test reduction.ll was introduced before utils/update_test_checks.py,
and hence contains hand-written CHECK lines. Revisit the test today, and
modernize it by:

- Removing extraneous attributes on functions and their arguments, as
LoopVectorize doesn't even look at these attributes.
- Removing the target datalayout, as it is not essential for
LoopVectorize.

Finally, regenerate the CHECK lines using update_test_checks.py,
eliminating hand-written error-prone CHECK lines.
2023-10-12 15:45:15 +01:00
Nikita Popov
30faaaf626 [LoopVectorize] Regenerate test checks (NFC) 2023-10-12 14:35:23 +02:00
Rin
df8e0d057d
[AArch64][LoopVectorize] Use upper bound trip count instead of the constant TC when choosing max VF (#67697)
This patch is based off of
https://github.com/llvm/llvm-project/pull/67543.

We are currently using the exact trip count to make decisions regarding
the maximum VF. We can instead use the upper bound TC, which will be the
same as the constant trip count when that is known.
2023-10-09 16:26:19 +01:00
Dmitriy Smirnov
e13bed4c5f [PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP
This patch tries to canonicalise add + gep to gep + gep.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D155688
2023-10-06 12:29:06 +01:00
Alexey Bataev
e22818d5c9 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() does
not always match the sizes of the permuted vector(s). This allows better
cost estimation in SLP and fixes uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-05 06:17:07 -07:00
Rin
d3e4702c0f
[AArch64] [LoopVectorize] Use either fixed-width or scalable VF when tail-folding (#67543)
Since the getMaximisedVFForTarget function is called twice, once for fixed-width and once for scalable, it adds no value to always return a fixed-width VF. Instead, when we are tail-folding, we can use either fixed-width or scalable vectors.
2023-10-05 10:24:30 +01:00
Arthur Eubanks
07389535a7 Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit b186f1f68be11630355afb0c08b80374a6d31782.

Causes crashes, see https://reviews.llvm.org/D158449.
2023-10-04 14:37:16 -07:00
Alex Richardson
e86d6a43f0 Regenerate test checks for tests affected by D141060 2023-10-04 10:51:35 -07:00
Alexey Bataev
b186f1f68b [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() does
not always match the sizes of the permuted vector(s). This allows better
cost estimation in SLP and fixes uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-04 07:53:30 -07:00
Alexey Bataev
1129dec778 Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit 6f43d28f3452b3ef598bc12b761cfc2dbd0f34c9 to fix
a crash reported in https://reviews.llvm.org/D158449.
2023-10-03 13:02:16 -07:00
Alexey Bataev
6f43d28f34 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() does
not always match the sizes of the permuted vector(s). This allows better
cost estimation in SLP and fixes uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-10-03 10:26:11 -07:00
JolantaJensen
01797dad86
Fix mechanism propagating mangled names for TLI function mappings (#66656)
Currently the mappings from TLI are used to generate the list of
available "scalar to vector" mappings attached to scalar calls as
"vector-function-abi-variant" LLVM IR attribute. Function names from TLI
are wrapped in mangled name following the pattern:
_ZGV<isa><mask><vlen><parameters>_<scalar_name>[(<vector_redirection>)]
The problem is the mangled name uses _LLVM_ as the ISA name which
prevents the compiler from computing the vectorization factor for scalable
vectors as it cannot make any decision based on the _LLVM_ ISA. If we
use "s" as the ISA name, the compiler can make decisions based on the VFABI
specification where SVE specific rules are described.
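
As a rough illustration of the naming pattern above, the sketch below
assembles such a name with snprintf. The helper vfabi_mangle is
hypothetical and not part of LLVM. The "_LLVM_" and "s" ISA names come
from the text; the mask letters, the "x" vlen, the "v" parameter token
and the vector function names are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not an LLVM API): formats
 *   _ZGV<isa><mask><vlen><parameters>_<scalar_name>[(<vector_redirection>)]
 * into buf. Pass vector_name == NULL to omit the optional redirection.
 * Returns the number of characters written (excluding the NUL), or -1 if
 * the buffer is too small. */
static int vfabi_mangle(char *buf, size_t size, const char *isa, char mask,
                        const char *vlen, const char *params,
                        const char *scalar_name, const char *vector_name) {
  assert(isa && vlen && params && scalar_name);
  int n = vector_name
              ? snprintf(buf, size, "_ZGV%s%c%s%s_%s(%s)", isa, mask, vlen,
                         params, scalar_name, vector_name)
              : snprintf(buf, size, "_ZGV%s%c%s%s_%s", isa, mask, vlen,
                         params, scalar_name);
  return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Calling vfabi_mangle(buf, sizeof buf, "s", 'M', "x", "v", "sin", "vec_sin")
would produce "_ZGVsMxv_sin(vec_sin)", whereas passing "_LLVM_" as the ISA
yields the opaque form the message describes.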

This patch is only a refactoring stage where there is no change to the
compiler's behaviour.
2023-10-02 18:58:39 +01:00
Alexey Bataev
ebcb5d59fc Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit 9f5960e004ff54082ccfa9396522e07358f5b66b to fix
buildbots reported here https://lab.llvm.org/buildbot/#/builders/230/builds/19412.
2023-09-29 15:03:46 -07:00
Alexey Bataev
9f5960e004 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() does
not always match the sizes of the permuted vector(s). This allows better
cost estimation in SLP and fixes uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-09-29 13:16:03 -07:00
Mel Chen
ab9cd27fa4
[LV][NFC] Move and add truncated-related FindLastIV reduction test cases. (#67674) 2023-09-29 22:18:32 +08:00
Alexey Bataev
3204f88a8b Revert "[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst."
This reverts commit c88c281cf1ac1a01c55231b93826d7c8ae83985b to fix the
crash revealed by https://lab.llvm.org/buildbot/#/builders/230/builds/19353.
2023-09-28 11:57:32 -07:00
Alexey Bataev
c88c281cf1 [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() does
not always match the sizes of the permuted vector(s). This allows better
cost estimation in SLP and fixes uses of the functions in other cases.

Differential Revision: https://reviews.llvm.org/D158449
2023-09-28 11:03:21 -07:00
Mel Chen
707686b0fc
[LV][NFC] Remove unnecessary parameter attributes from the test cases. (#67630)
The vectorization of the FindLastIV reduction does not depend on the
nocapture and readonly attributes.
2023-09-28 21:15:34 +08:00
Ramkumar Ramachandra
ad415e3095 LoopVectorize/iv-select-cmp: comment out-of-bound tests (NFC)
To help future contributors understand a couple of mysterious
out-of-bound tests, add a brief comment to each.
2023-09-25 14:02:19 +01:00
Florian Hahn
97687b7aea
[VPlan] Add active-lane-mask as VPlan-to-VPlan transformation.
This patch updates the mask creation code to always create compares of
the form (ICMP_ULE, wide canonical IV, backedge-taken-count) up front
when tail folding and introduces active-lane-mask as a later
transformation.

This effectively makes (ICMP_ULE, wide canonical IV, backedge-taken-count)
the canonical form for tail-folding early on. Introducing more specific
active-lane-mask recipes is treated as a VPlan-to-VPlan optimization.
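
A minimal scalar sketch of why the two forms are interchangeable,
assuming a trip count of at least 1 and no wraparound in iv + lane. The
names VF, mask_icmp_ule, mask_active_lane and masks_agree are
illustrative only and do not correspond to VPlan recipes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VF 4 /* illustrative vectorization factor (assumption) */

/* Canonical form from the text: a lane is active iff
 * (wide canonical IV) <=u (backedge-taken count). */
static void mask_icmp_ule(uint64_t iv, uint64_t btc, bool mask[VF]) {
  for (unsigned lane = 0; lane < VF; ++lane)
    mask[lane] = (iv + lane) <= btc;
}

/* Active-lane-mask style form: a lane is active iff (iv + lane) <u tc. */
static void mask_active_lane(uint64_t iv, uint64_t tc, bool mask[VF]) {
  for (unsigned lane = 0; lane < VF; ++lane)
    mask[lane] = (iv + lane) < tc;
}

/* Returns true if both forms agree on every vector iteration of a loop
 * with trip count tc. Requires tc >= 1 so that btc = tc - 1 does not wrap. */
static bool masks_agree(uint64_t tc) {
  assert(tc >= 1);
  uint64_t btc = tc - 1;
  for (uint64_t iv = 0; iv < tc; iv += VF) {
    bool a[VF], b[VF];
    mask_icmp_ule(iv, btc, a);
    mask_active_lane(iv, tc, b);
    for (unsigned lane = 0; lane < VF; ++lane)
      if (a[lane] != b[lane])
        return false;
  }
  return true;
}
```

For a trip count of 6 and VF of 4, the second vector iteration (iv = 4)
has lanes 0 and 1 active and lanes 2 and 3 masked off under either form.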

This has the advantage of keeping the logic (and complexity) of
introducing active-lane-mask recipes in a single place, instead of
spreading the logic out across multiple functions. It also simplifies
initial VPlan construction and enables treating the introduction of EVL
as a similar optimization.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D158779
2023-09-25 13:34:45 +01:00
Ramkumar Ramachandra
ef48e90489 LoopVectorize/iv-select-cmp: add test for decreasing IV out-of-bound
The most straightforward extension to D150851 would involve handling the
decreasing IV case, for which tests have been added in 110ec1863a
(LoopVectorize/iv-select-cmp: add test for decreasing IV, const start).
However, the commit missed a testcase for the out-of-bound sentinel
value LONG_MAX, which should not be vectorized. Fix this by adding a
test corresponding to the following program:

  long test(long *a) {
    long rdx = 331;
    for (long i = LONG_MAX; i >= 0; i--) {
      if (a[i] > 3)
        rdx = i;
    }
    return rdx;
  }

Differential Revision: https://reviews.llvm.org/D157969
2023-09-25 13:20:11 +01:00
Sergey Kachkov
0a5d52a757
[RISCV][CostModel] Add getCFInstrCost RISC-V implementation (#65599)
This patch implements getCFInstrCost TTI hook that mostly affects
LoopVectorizer decisions. It sets zero cost for PHI nodes and zero
throughput cost for branches (assuming that branches are likely to
be predicted). The implementation is similar to X86/AArch64/PowerPC
targets and reduces loop cost by excluding induction PHIs/loop latch
branches, which in turn leads to selecting smaller vectorization
factor.
2023-09-25 12:26:01 +03:00
Florian Hahn
1a9358c090
[LV] Relax over-strict assertion for reduction exit value selects.
After f108c6c, (mul x, 1) is simplified to x, which can cause the select
for the final reduction value when tail-folding to use the reduction
value for both options. Relax the assertion to make sure this case is
allowed.

Note that the reduction is now redundant itself and could be further
simplified.

Fixes #66895.
2023-09-21 10:12:29 +01:00
Dhruv Chawla
3e992d81af
[InferAlignment] Enable InferAlignment pass by default
This gives an improvement of 0.6%:
https://llvm-compile-time-tracker.com/compare.php?from=7d35fe6d08e2b9b786e1c8454cd2391463832167&to=0456c8e8a42be06b62ad4c3e3cf34b21f2633d1e&stat=instructions:u

Differential Revision: https://reviews.llvm.org/D158600
2023-09-20 12:08:52 +05:30
Nikita Popov
c41b4b6397 [InstCombine] Make flag drop during select equiv fold more generic
Instead of unsetting flags on the instruction, attempting the
fold, and then resetting the flags if it failed, add support to
simplifyWithOpReplaced() to ignore poison-generating flags/metadata
and collect all instructions where they may need to be dropped.

This allows us to perform the fold a) with poison-generating
metadata, which was previously not handled and b) poison-generating
flags/metadata that are not on the root instruction.

Proof for the ctpop case: https://alive2.llvm.org/ce/z/3H3HFs

Fixes https://github.com/llvm/llvm-project/issues/62450.
2023-09-19 14:54:25 +02:00
Florian Hahn
f108c6cdc1
[VPlan] Fold (MUL A, 1) -> A as VPlan2VPlan transform.
Add first VPlan-based recipe simplification to fold (MUL A, 1) -> A.
Among other things, this enables additional simplifications after
applying versioned strides, as follow up to D147783.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D159200
2023-09-18 21:45:34 +01:00
Yingwei Zheng
44e5afdb91
[InstCombine] Generalize foldICmpWithMinMax
This patch generalizes the fold of `icmp pred min/max(X, Y), Z` to address the issue https://github.com/llvm/llvm-project/issues/62898.

For example, we can fold `smin(X, Y) < Z` into `X < Z` when `Y > Z` is implied by constant folds/invariants/dom conditions.
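
As a quick sanity check of the example above (not a substitute for the
Alive2 proof), the following sketch brute-forces the equivalence over a
small range of plain C ints. The helper names smin and fold_holds are
illustrative and not taken from InstCombine.

```c
#include <assert.h>
#include <stdbool.h>

static int smin(int a, int b) { return a < b ? a : b; }

/* Exhaustively checks, over [-range, range]^3, that whenever Y > Z holds,
 * smin(X, Y) < Z and X < Z give the same answer. */
static bool fold_holds(int range) {
  assert(range >= 0);
  for (int x = -range; x <= range; ++x)
    for (int y = -range; y <= range; ++y)
      for (int z = -range; z <= range; ++z) {
        if (!(y > z))
          continue; /* precondition of the fold */
        if ((smin(x, y) < z) != (x < z))
          return false;
      }
  return true;
}
```

Without the precondition the fold is unsound: with X = 5, Y = 0, Z = 3,
smin(X, Y) < Z is true while X < Z is false.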

Alive2 (with `--disable-undef-input` due to the limitation of --smt-to=10000): https://alive2.llvm.org/ce/z/rB7qLc
You can run the standalone translation validation tool `alive-tv` locally to verify these transformations.
```
alive-tv transforms.ll --smt-to=600000 --exit-on-error
```

Reviewed By: goldstein.w.n

Differential Revision: https://reviews.llvm.org/D156238
2023-09-11 02:26:48 +08:00
Florian Hahn
3fa1b254b7
[VPlan] Print blend recipe as operand directly, instead of IR PHI.
Update VPBlendRecipe::print() to print the result directly, instead of
relying on the stored Phi pointer. This brings the recipe in line with
how other recipes are printed.
2023-09-04 12:35:58 +01:00
Florian Hahn
cb54522853
[LV] Add test coverage for adding DebugLoc to vector select.
Add missing test coverage for selects with !dbg info.
2023-09-04 12:01:14 +01:00
Nuno Lopes
66a652ab08 recommit test for #65212 2023-09-04 09:17:18 +01:00
Muhammad Omair Javaid
42a46730bb Revert "fix test for #65212"
This reverts commit a0b0d7493db64d897ef68b43636810cfcb12bd22.

It has broken the following buildbots:

https://lab.llvm.org/buildbot/#/builders/188/builds/34873
https://lab.llvm.org/buildbot/#/builders/245/builds/13538
https://lab.llvm.org/buildbot/#/builders/65/builds/11074
2023-09-04 12:53:12 +05:00
Nuno Lopes
a0b0d7493d fix test for #65212
I committed the wrong test, sorry.
2023-09-03 17:01:36 +01:00
Nuno Lopes
5a3fd5f3f5 [LoopVectorizer] Fix PR #65212: vectorization of reduction loop wasn't respecting original store alignment 2023-09-03 16:35:05 +01:00
Nuno Lopes
335a9bc4d9 precommit test for #65212 2023-09-03 16:33:57 +01:00
Florian Hahn
fd66195777
[VPlan] Manage compare predicates in VPRecipeWithIRFlags.
Extend VPRecipeWithIRFlags to also manage predicates for compares. This
allows removing the custom ICmpULE opcode from VPInstruction which was a
workaround for missing proper predicate handling.

This simplifies the code a bit while also allowing compares with any
predicates. It also fixes a case where the compare predicate wasn't
printed properly for VPReplicateRecipes.

Discussed/split off from D150398.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D158992
2023-09-02 21:45:24 +01:00
Igor Kirillov
ac65fb8699 [LoopVectorize] Fix incorrect order of invariant stores when there are multiple reductions.
When a loop has multiple reductions, each with an intermediate invariant
store, the order in which those reductions are processed is not considered.
This can result in the invariant stores outside the loop not preserving the
original order.
This patch sorts VPReductionPHIRecipes by the order in which they have
stores in the original loop before running
`InnerLoopVectorizer::fixReduction` function, and it helps to maintain
the correct order of stores.

Fixes https://github.com/llvm/llvm-project/issues/64047

Differential Revision: https://reviews.llvm.org/D157631
2023-08-31 16:21:44 +00:00
Igor Kirillov
2df9ed11c5 [LoopVectorize] Pre-commit tests for D157631
Differential Revision: https://reviews.llvm.org/D157630
2023-08-31 09:50:53 +00:00
Dhruv Chawla
4ea8212775
[NFC][LoopVectorize] Regenerate test checks 2023-08-30 23:22:57 +05:30
Ramkumar Ramachandra
04b1276ad3 LoopVectorize/iv-select-cmp: add tests for truncated IV
The current tests in iv-select-cmp.ll are not representative of clang
output of common real-world C programs, which are often written with i32
induction vars, as opposed to i64 induction vars. Hence, add five tests
corresponding to the following programs:

  int test(int *a, int n) {
    int rdx = 331;
    for (int i = 0; i < n; i++) {
      if (a[i] > 3)
        rdx = i;
    }
    return rdx;
  }

  int test(int *a) {
    int rdx = 331;
    for (int i = 0; i < 20000; i++) {
      if (a[i] > 3)
        rdx = i;
    }
    return rdx;
  }

  int test(int *a, long n) {
    int rdx = 331;
    for (int i = 0; i < n; i++) {
      if (a[i] > 3)
        rdx = i;
    }
    return rdx;
  }

  int test(int *a, unsigned n) {
    int rdx = 331;
    for (int i = 0; i < n; i++) {
      if (a[i] > 3)
        rdx = i;
    }
    return rdx;
  }

  int test(int *a) {
    int rdx = 331;
    for (long i = INT_MIN - 1; i < UINT_MAX; i++) {
      if (a[i] > 3)
        rdx = i;
    }
    return rdx;
  }

The first two can theoretically be vectorized without a runtime-check,
while the third and fourth cannot. The fifth cannot be vectorized, even
with a runtime-check.

This issue was found while reviewing D150851.

Differential Revision: https://reviews.llvm.org/D156124
2023-08-30 13:09:37 +01:00
Florian Hahn
96e83d3705
[LV] Use IRBuilder to create and optimize middle-block compare.
Split off from D150398 to avoid builder-related diff changes there.
Using IRBuilder to create ICmps simplifies the result if both operands
are constants.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D158332
2023-08-29 11:42:18 +01:00
David Sherwood
c02184f286 [LoopVectorize] Allow inner loop runtime checks to be hoisted above an outer loop
Suppose we have a nested loop like this:

  void foo(int32_t *dst, int32_t *src, int m, int n) {
    for (int i = 0; i < m; i++) {
      for (int j = 0; j < n; j++) {
        dst[(i * n) + j] += src[(i * n) + j];
      }
    }
  }

We currently generate runtime memory checks as a precondition for
entering the vectorised version of the inner loop. However, if the
runtime-determined trip count for the inner loop is quite small
then these checks become quite expensive. This patch
attempts to mitigate these costs by adding a new option to
expand the memory ranges being checked to include the outer loop
as well. This leads to runtime checks that can then be hoisted
above the outer loop. For example, rather than looking for a
conflict between the memory ranges:

1. &dst[(i * n)] -> &dst[(i * n) + n]
2. &src[(i * n)] -> &src[(i * n) + n]

we can instead look at the expanded ranges:

1. &dst[0] -> &dst[((m - 1) * n) + n]
2. &src[0] -> &src[((m - 1) * n) + n]

which are outer-loop-invariant. As with many optimisations there
is a trade-off here, because there is a danger that using the
expanded ranges we may never enter the vectorised inner loop,
whereas with the smaller ranges we might enter at least once.
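
The idea can be sketched by hand in C, assuming a simple overlap test on
the expanded ranges. The helpers ranges_overlap, row_add_noalias,
row_add_scalar and foo_hoisted are hypothetical; the code the vectoriser
actually emits will differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the element ranges [a, a + len) and [b, b + len) overlap. */
static bool ranges_overlap(const int32_t *a, const int32_t *b, size_t len) {
  uintptr_t ua = (uintptr_t)a, ub = (uintptr_t)b;
  uintptr_t bytes = len * sizeof(int32_t);
  return ua < ub + bytes && ub < ua + bytes;
}

/* Stand-in for the vectorised inner loop: restrict promises no aliasing. */
static void row_add_noalias(int32_t *restrict d, const int32_t *restrict s,
                            int n) {
  for (int j = 0; j < n; j++)
    d[j] += s[j];
}

/* Scalar fallback, safe when the rows may alias. */
static void row_add_scalar(int32_t *d, const int32_t *s, int n) {
  for (int j = 0; j < n; j++)
    d[j] += s[j];
}

/* Returns true if the no-alias path was taken. */
static bool foo_hoisted(int32_t *dst, const int32_t *src, int m, int n) {
  assert(dst && src);
  if (m <= 0 || n <= 0)
    return false;
  /* One check over the expanded ranges &p[0] -> &p[((m - 1) * n) + n],
   * hoisted above the outer loop instead of being run m times. */
  bool no_alias = !ranges_overlap(dst, src, (size_t)m * (size_t)n);
  for (int i = 0; i < m; i++) {
    size_t row = (size_t)i * (size_t)n;
    if (no_alias)
      row_add_noalias(dst + row, src + row, n);
    else
      row_add_scalar(dst + row, src + row, n);
  }
  return no_alias;
}
```

The trade-off described above shows up with dst = c + 3, src = c, m = 2
and n = 3: no single row of dst overlaps the matching row of src, yet the
expanded ranges do overlap, so the hoisted check sends every row down the
scalar path.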

I have added a HoistRuntimeChecks option that is turned off by
default, but can be enabled for workloads where we know this is
guaranteed to be of real benefit. In future, we can also use
PGO to determine if this is worthwhile by using the inner loop
trip count information.

When enabling this option for SPEC2017 on neoverse-v1 with the
flags "-Ofast -mcpu=native -flto" I see an overall geomean
improvement of ~0.5%:

SPEC2017 results (+ is an improvement, - is a regression):
520.omnetpp: +2%
525.x264: +2%
557.xz: +1.2%
...
GEOMEAN: +0.5%

I didn't investigate all the differences to see if they are
genuine or noise, but I know the x264 improvement is real because
it has some hot nested loops with low trip counts where I can
see this hoisting is beneficial.

Tests have been added here:

  Transforms/LoopVectorize/runtime-checks-hoist.ll

Differential Revision: https://reviews.llvm.org/D152366
2023-08-24 12:14:02 +00:00
David Sherwood
494d28ec07 [LoopVectorize] Add pre-commit tests for D152366
Differential Revision: https://reviews.llvm.org/D154075
2023-08-24 10:52:18 +00:00
Florian Hahn
c071dba1a3
[LV] update hexagon test to use load results.
The current version of the test doesn't use any of the loads, so they
can be removed together with the mask of the interleave group.

Use some loaded values and store them, to prevent the mask from being
optimized away.
2023-08-22 20:20:58 +01:00
Florian Hahn
34d25924c4
[VPlan] Mark some VPInstruction opcodes as not having side effects.
Mark some VPInstruction opcodes as not having side effects, preparation
for D157037.
2023-08-22 20:05:57 +01:00
Kolya Panchenko
acbe886880 [LV] Vectorization remark for outerloop
Reviewed By: fhahn, ABataev

Differential Revision: https://reviews.llvm.org/D150696
2023-08-21 13:05:06 -04:00