679 Commits

Author SHA1 Message Date
Nikita Popov
30faaaf626 [LoopVectorize] Regenerate test checks (NFC) 2023-10-12 14:35:23 +02:00
Dhruv Chawla
3e992d81af
[InferAlignment] Enable InferAlignment pass by default
This gives an improvement of 0.6%:
https://llvm-compile-time-tracker.com/compare.php?from=7d35fe6d08e2b9b786e1c8454cd2391463832167&to=0456c8e8a42be06b62ad4c3e3cf34b21f2633d1e&stat=instructions:u

Differential Revision: https://reviews.llvm.org/D158600
2023-09-20 12:08:52 +05:30
Yingwei Zheng
44e5afdb91
[InstCombine] Generalize foldICmpWithMinMax
This patch generalizes the fold of `icmp pred min/max(X, Y), Z` to address the issue https://github.com/llvm/llvm-project/issues/62898.

For example, we can fold `smin(X, Y) < Z` into `X < Z` when `Y > Z` is implied by constant folds/invariants/dom conditions.

Alive2 (with `--disable-undef-input` due to the limitation of --smt-to=10000): https://alive2.llvm.org/ce/z/rB7qLc
You can run the standalone translation validation tool `alive-tv` locally to verify these transformations.
```
alive-tv transforms.ll --smt-to=600000 --exit-on-error
```

Reviewed By: goldstein.w.n

Differential Revision: https://reviews.llvm.org/D156238
2023-09-11 02:26:48 +08:00
Florian Hahn
96e83d3705
[LV] Use IRBuilder to create and optimize middle-block compare.
Split off from D150398 to avoid builder-related diff changes there.
Using IRBuilder to create ICmps simplifies the result if both operands
are constants.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D158332
2023-08-29 11:42:18 +01:00
Anna Thomas
5dfdf34df0 [LV] Move interleaved test to X86 directory
Remove the x86-registered-target under REQUIRES.
2023-08-09 16:03:33 -04:00
Anna Thomas
3cf24dbbdd [LV] Complete load groups and release store groups. Try 2.
This is a complete fix for CompleteLoadGroups introduced in
D154309. We need to check for dependency between A and every member of
the load Group of B.
This patch also fixes another miscompile seen when we incorrectly sink stores
below a depending load (see testcase in
interleaved-accesses-sink-store-across-load.ll). This is fixed by
releasing store groups correctly.

This change was previously reverted (e85fd3cbdd68) due to Asan failure with
use-after-free error. A testcase is added and the bug is fixed in this
version of the patch.

Differential Revision: https://reviews.llvm.org/D155520
2023-08-08 18:10:23 -04:00
Florian Hahn
707359ecf5
Recommit "[LV] Re-use existing broadcast value for live-ins."
This reverts commit 245ec675a4e41f7ec24dfc998720bffdc46a6c53.

Recommits eea9258648ce with a fix to only erase the instruction from the
first part if it is defined outside the loop. This fixes a
use-after-free error reported.
2023-08-01 15:54:02 +01:00
Nikita Popov
d01aec4c76 [InstCombine] Set dead phi inputs to poison in more cases
Set phi inputs to poison whenever we find a dead edge (either
during initial worklist population or the main InstCombine run),
instead of only doing this for successors of dead blocks.

This means that the phi operand is set to poison even if for
critical edges without an intermediate block.

There are quite a few test changes, because the pattern is fairly
common in vectorizer output, for cases where we know the vectorized
loop will be entered.
2023-08-01 11:53:47 +02:00
Anna Thomas
e85fd3cbdd Revert "[LV] Complete load groups and release store groups in presence of dependency"
This reverts commit eaf6117f3388615f51198e47c0d6be0252729508 (D155520).
There's an ASAN build failure that needs investigation.
2023-07-26 15:07:26 -04:00
Anna Thomas
eaf6117f33 [LV] Complete load groups and release store groups in presence of dependency
This is a complete fix for CompleteLoadGroups introduced in
D154309. We need to check for dependency between A and every member of
the load Group of B.
This patch also fixes another miscompile seen when we incorrectly sink stores
below a depending load (see testcase in
interleaved-accesses-sink-store-across-load.ll). This is fixed by
releasing store groups correctly.

Differential Revision: https://reviews.llvm.org/D155520
2023-07-25 17:32:09 -04:00
Martin Storsjö
245ec675a4 Revert "[LV] Re-use existing broadcast value for live-ins."
This reverts commit eea9258648ce73507f6f85c395de978af659d498.

That commit triggered crashes in the following testcase:

$ cat reduced.c
typedef struct {
  int a[8]
} b;
typedef struct {
  b *c;
  short d
} e;
void f() {
  int g;
  char *h;
  e *i = f;
  short j = i->d;
  int a = i->c->a[0];
  for (;;)
    for (; g < a; g++) {
      *h = j * i->d >> 8;
      h++;
    }
}
$ clang -target aarch64-linux-gnu -w -c -O2 reduced.c
2023-07-25 10:35:41 +03:00
Florian Hahn
eea9258648
[LV] Re-use existing broadcast value for live-ins.
When requesting a vector value for a live-in, we can re-use the
broadcast of the live-in of part 0 for parts > 0.
2023-07-24 11:50:47 +01:00
Anna Thomas
a5573bf030 [LV] Precommit test for interleaving miscompile
Identified another miscompile while working on fixing interleaving's
current miscompile in D154309. This is different from testcases landed in D154309,
since it showcases an incorrect sinking of store (the former testcases
in that review and follow-up ones) showed incorrect hoisting of loads
across stores.
2023-07-17 17:24:40 -04:00
Anna Thomas
dfaf4587e4 Precommit follow-up testcase for interleaved miscompile
Follow-up testcase for PR63602.

Suggested by Ayal in D154309, more complete fix coming up which should
handle this testcase as well.
2023-07-14 16:04:56 -04:00
Florian Hahn
14ec3f4b06
[LV] Skip VFs > # iterations remaining for epilogue vectorization.
If a candidate VF for epilogue vectorization is greater than the number of
remaining iterations, the epilogue loop would be dead. Skip such factors.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D154264
2023-07-07 21:43:51 +01:00
Florian Hahn
aee851fd0e
Revert "[LV] Skip VFs < iterations remaining for epilogue vectorization."
This reverts commit 7cc0be01a0068946ea3613dc2cb45c81b0f45860.

The title of the commit is incorrect, revert to fix the commit message.
2023-07-07 21:41:24 +01:00
Florian Hahn
7cc0be01a0
[LV] Skip VFs < iterations remaining for epilogue vectorization.
If a candidate VF for epilogue vectorization is less than the number of
remaining iterations, the epilogue loop would be dead. Skip such factors.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D154264
2023-07-07 20:33:42 +01:00
Florian Hahn
4d847bf4d0
[LV] Do not add load to group if it moves across conflicting store.
This patch prevents invalid load groups from being formed, where a load
needs to be moved across a conflicting store.

Once we hit a store that conflicts with a load with an existing
interleave group, we need to stop adding earlier loads to the group, as
this would force hoisting the previous stores in the group across the
conflicting load.

To detect such cases, add a new CompletedLoadGroups set, which is used
to keep track of load groups to which no earlier loads can be added.

Fixes https://github.com/llvm/llvm-project/issues/63602

Reviewed By: anna

Differential Revision: https://reviews.llvm.org/D154309
2023-07-07 11:06:30 +01:00
Florian Hahn
a0fcf84a8c
[LV] Consider if scalar epilogue is required in getMaximizedVFForTarget.
When a scalar epilogue is required, at least one iteration of the scalar loop
has to execute. Adjust ConstTripCount accordingly to avoid picking a max VF
that results in a dead vector loop.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D154261
2023-07-06 13:35:35 +01:00
Florian Hahn
8a25dc3787
[LV] Regenerate check lines to reduced diff.
Regenerate checks to avoid unnecessary changes in D154264.
2023-07-04 14:01:05 +01:00
Florian Hahn
e561edaaa5
[LV] Prepare tests for D154261.
Update trip count of test in
pr56319-vector-exit-cond-optimization-epilogue-vectorization.ll to
make sure epilogue vectorization will still trigger after D154261,
checking for the original issue.

Move the original test to limit-vf-by-tripcount.ll for testing new
functionality of D154261.
2023-07-03 17:49:36 +01:00
Florian Hahn
c14b0a7c55
[LV] Check for vector instruction in main vector loop.
Update the test to check for the vectorization call in the main vector
loop, not the dead epilogue vector loop as it does currently.
2023-07-03 14:16:47 +01:00
Florian Hahn
6954cb5425
[LV] Add test case for #63602. 2023-07-02 22:17:16 +01:00
Florian Hahn
9078a9942d
[LV] Add additional tests with dead vector epilogues. 2023-06-30 12:17:57 +01:00
Simon Pilgrim
4cbedaeff5 [LoopVectorize][X86] Regenerate slm-no-vectorize.ll 2023-06-13 14:15:37 +01:00
Nikita Popov
9cf67f6ea0 [LoopVectorize] Convert most tests to opaque pointers (NFC)
The unsized-pointee-crash.ll and zero-sized-pointee-crash.ll tests
have been removed, because these issues are not relevant for opaque
pointers.
2023-06-12 13:10:22 +02:00
Florian Hahn
8f781b96e2
Revert "[VPlan] Mark recurrence recipes as not having side-effects."
This reverts commit 02369b75fdd7b5fc5d9b47f1b60587c225918511.

At the moment, live-outs used *only* for the resume values in the scalar
loop are not modeled in VPlan yet. This means first-order recurrence
recipes could be removed, when a scalar epilogue is required and the
only use of a FOR is outside the loop.

Keep treating recurrence recipes as having side-effects for now, to
avoid them being removed.

Fixes #62954.
2023-06-06 11:35:26 +02:00
Florian Hahn
f47084ecfb
[LV] Use force-vector-width for X86 recurrence test.
This makes sure that all tests that can be vectorized in the file are
vectorized.
2023-06-06 11:27:35 +02:00
Florian Hahn
4c51a45e80
[LV] Add test for #62954. 2023-06-06 11:20:22 +02:00
Florian Hahn
572cfa3fde
[LV] Use SCEV for uniformity analysis across VF
This patch uses SCEV to check if a value is uniform across a given VF.

The basic idea is to construct SCEVs where the AddRecs of the loop are
adjusted to reflect the version in the vectorized loop (Step multiplied
by VF). We construct a SCEV for the value of the vector lane 0
(offset 0) compare it to the expressions for lanes 1 to the last vector
lane (VF - 1). If they are equal, consider the expression uniform.

While re-writing expressions, we also need to catch expressions we
cannot determine uniformity (e.g. SCEVUnknown).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D148841
2023-05-31 16:01:00 +01:00
Tobias Hieta
f84bac329b
[NFC][Py Reformat] Reformat lit.local.cfg python files in llvm
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: barannikov88, kwk

Differential Revision: https://reviews.llvm.org/D150762
2023-05-17 17:03:15 +02:00
Florian Hahn
3d4eed0133
[LV] Reuse SCEV expansion results for epilogue vectorization.
When generating code for the epilogue vector loop, we need to re-use the
expansion results for induction steps generated for the main vector
loop, as the pre-header of the epilogue vector loop may not dominate the
vector preheader of the epilogue.

This fixes a reported crash. Note that this is a workaround which should
be removed soon once induction resume value creation is handled in VPlan
directly.
2023-05-11 22:00:07 +01:00
Philip Reames
cb3cb417a0 [LV] Refresh some auto-gen tests to reduce diff [nfc] 2023-05-01 13:55:11 -07:00
Noah Goldstein
d840391401 [ValueTracking] Add logic for isKnownNonZero(smin/smax X, Y)
For `smin` if either `X` or `Y` is negative, the result is non-zero.
For `smax` if either `X` or `Y` is strictly positive, the result is
non-zero.

For both if `X != 0` and `Y != 0` the result is non-zero.

Alive2 Link:
    https://alive2.llvm.org/ce/z/7yvbgN
    https://alive2.llvm.org/ce/z/zizbvq

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D149417
2023-04-30 10:06:46 -05:00
ManuelJBrito
8b56da5e9f [IR] Change shufflevector undef mask to poison
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.

Differential Revision: https://reviews.llvm.org/D149210
2023-04-27 14:41:10 +01:00
David Green
1869a9c225 [LV] Use the known trip count when costing non-tail folded VFs
Now that we store the ScalarCost in the VectorizationFactor it is possible to
use it to get a slightly more accurate cost in isMoreProfitable between two
vector factors. This extends the logic added in D101726 to non-tail-folded
cases, using the costs of `VecCost * (TripCount / VF) + ScalarCost * (TripCount % VF)`
to compare VFs where the TripCount is known and we are not folding the tail.

This shouldn't alter very much as small trip counts are usually not vectorized,
but does seem to help in the testcase where 4 * VF4 is chosen as profitable
compared to 2 * VF8 + 4 * scalar.

Differential Revision: https://reviews.llvm.org/D147720
2023-04-24 22:02:30 +01:00
Nikita Popov
2ec1d0f427 [InstCombine] Don't reassociate GEPs for loop invariance
Since D146813, LICM will reassociate GEPs to expose hoisting
opportunities itself. Don't perform this transform in InstCombine,
where it is fragile because it depends on an optional LoopInfo
analysis.
2023-04-18 12:17:07 +02:00
Craig Topper
4b47d875a1 [LV] Optimize trip count SCEV.
To calculate the trip count we need to add 1 to the backedge
taken count. If we need to widen the backedge count, it's better
to do the add before the widening if we can guarantee it won't
overflow.

The code here is based on similar code I found in
LoopIdiomRecognize.

This is the vectorizer version of this InstCombine patch D142783.
Looking at the IR diffs, this does look like it gets more cases
than the InstCombine patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D147355
2023-04-12 16:17:58 -07:00
Florian Hahn
35af27c30a
[VPlan] Only create extracts for recurrence exits if there are live-outs.
Move the code to collect live-out earlier and only generate extracts for
exit values if there are any live-outs that use them.

Depends on D147472.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147567
2023-04-10 21:08:34 +01:00
Nikita Popov
53f7f85703 [LoopVectorize] Convert some tests to opaque pointers (NFC) 2023-04-06 09:38:47 +02:00
Florian Hahn
0d61ffd350
[Loads] Support SCEVAddExpr as start for pointer AddRec.
Extend handling to support `%base + offset` as start for AddRecs in
isDereferenceableAndAlignedInLoop. This is done by adjusting AccessSize
by the offset and effectively checking if the full object starting from
%base to %base + offset + access-size is dereferenceable.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D147260
2023-04-02 12:33:44 +01:00
Anna Thomas
4277d932ef [LV] Use speculatability within entire loop to avoid strided load predication
Use existing functionality for identifying total access size by strided
loads. If we can speculate the load across all vector iterations, we can
avoid predication for these strided loads (or masked gathers in
architectures which support it).

Differential Revision: https://reviews.llvm.org/D145616
2023-03-21 12:08:25 -04:00
Sanjay Patel
ef6f23535d Revert "[InstCombine] use loop info when running the pass after loop vectorization"
This reverts commit 43ae4b62b2671cf73e691c0b53324cd39405cd51.

This was intended to be practically NFC in terms of the overall
opt pipeline, but there is experimental data showing that code
changes occurred here:
https://llvm-compile-time-tracker.com/compare.php?from=772aa05452f8ff90a47168e6801cda2acb5a1873&to=43ae4b62b2671cf73e691c0b53324cd39405cd51&stat=size-text
2023-03-11 17:28:56 -05:00
Sanjay Patel
43ae4b62b2 [InstCombine] use loop info when running the pass after loop vectorization
This is the follow-up to D144199 and suggestion from D144045.
We make use of loop info explicit via InstCombine pass parameter
rather than semi-arbitrary via caching.

The only InstCombine transform that uses LoopInfo currently is a
GEP fold in visitGEPOfGEP(), so that shows up as a failure in the
dedicated test for the fold as well as several LoopVectorizer tests
that run extra passes.

I don't see any pass manager regression tests that actually check
for pass options, but this is intended to be NFC for the pass
pipeline behavior - we only try to use loop info where it would
have been used before via caching .

Differential Revision: https://reviews.llvm.org/D144274
2023-03-11 14:20:30 -05:00
Anna Thomas
ac4c0ea73b [Tests] Precommit tests for D145616 2023-03-08 17:30:53 -05:00
Florian Hahn
79272ec028
[VPlan] Add predicate to VPReplicateRecipe, expand region later.
This patch adds the predicate as additional operand to VPReplicateRecipe
during initial construction. The predicated recipes are later moved into
replicate regions. This simplifies constructions and some VPlan
transformations, like fixed-order recurrence handling.

It also improves codegen in some cases (e.g. for in-loop reductions),
because the recipes remain in the same block.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D143865
2023-03-08 20:11:28 +01:00
Florian Hahn
7019624ee1
[SCEV] Strengthen nowrap flags via ranges for ARs on construction.
At the moment, proveNoWrapViaConstantRanges is only used when creating
SCEV[Zero,Sign]ExtendExprs. We can get significant improvements by
strengthening flags after creating the AddRec.

I'll also share a follow-up patch that removes the code to strengthen
flags when creating SCEV[Zero,Sign]ExtendExprs. Modifying AddRecs while
creating those can lead to surprising changes.

Compile-time looks neutral:
https://llvm-compile-time-tracker.com/compare.php?from=94676cf8a13c511a9acfc24ed53c98964a87bde3&to=aced434e8b103109104882776824c4136c90030d&stat=instructions:u

Reviewed By: mkazantsev, nikic

Differential Revision: https://reviews.llvm.org/D144050
2023-03-07 17:10:34 +01:00
sgokhale
4f9a5447c6 [LV] Reland "Update logic for calculating register usage due to invariants"
Previously, while calculating register usage due to invariants, it was assumed that invariant would always be part of widening
instructions. This resulted in calculating vector register types for vectors which cant be legalized(check the newly added test for more details).

An invariant might not always need a vector register. For e.g., invariant might just be used for iteration check.

This patch checks if the invariant is part of any widening instruction and considers register usage accordingly. Fixes issue 60493

Differential Revision: https://reviews.llvm.org/D143422
2023-02-28 17:32:39 +05:30
sgokhale
3c8ddbde37 Revert "[LV] Update logic for calculating register usage due to invariants"
Observing test failure for llvm/test/Transforms/LoopVectorize/AArch64/reg-usage.ll

This reverts commit d1628266946fdddb44bdad2b3ccf3cd5fc769f42.
2023-02-28 15:46:59 +05:30
sgokhale
d162826694 [LV] Update logic for calculating register usage due to invariants
Previously, while calculating register usage due to invariants, it was assumed that invariant would always be part of widening
instructions. This resulted in calculating vector register types for vectors which cant be legalized(check the newly added test for more details).

An invariant might not always need a vector register. For e.g., invariant might just be used for iteration check.

This patch checks if the invariant is part of any widening instruction and considers register usage accordingly. Fixes issue 60493

Differential Revision: https://reviews.llvm.org/D143422
2023-02-28 11:05:26 +05:30