Add a number of LAA test cases with both forward and backward
dependences with non-constant strides and dependence distances.
This includes test coverage for
https://github.com/llvm/llvm-project/issues/87336
Also includes a LoopLoadElimination test to make sure the pass does not
crash on non-constant dependence distances.
At the moment, getUnderlyingObjects simply continues for phis that do
not refer to the same underlying object in loops, without adding them to
the list of underlying objects, effectively ignoring those phis.
Instead of ignoring those phis, add them to the list of underlying
objects. This fixes a miscompile where LoopAccessAnalysis fails to
identify a memory dependence, because no underlying objects can be found
for a set of memory accesses.
Fixes https://github.com/llvm/llvm-project/issues/82665.
PR: https://github.com/llvm/llvm-project/pull/84339
Similar to d39b4ce3ce8a3c256e01bdec2b140777a332a633
Using "eabi" or "gnueabi" for aarch64 targets is a common mistake and
warned by Clang Driver. We want to avoid them elsewhere as well. Just
use the common "aarch64" without other triple components.
LAA currently adds memory locations with their original AATags to AST.
However, scoped alias AATags may be valid only within one loop
iteration, while LAA reasons across iterations.
Fix this by determining which alias scopes are defined inside the loop,
and drop AATags that reference these scopes.
Fixes https://github.com/llvm/llvm-project/issues/79137.
depend_diff_types.ll already covers the same tests afer it hs been
converted to opaque pointersj, so remove the redundant
depend_diff_types_opaque_ptr.ll
As shown in #70473, the following loop was not considered safe to
vectorize. When determining the memory access dependencies in
a loop which has negative iteration step, we invert the source and
sink of the dependence. Perhaps we should just invert the operands
to getMinusSCEV(). This way the dependency is not regarded to be
true, since the users of the `IsWrite` variables, which correspond to
each of the memory accesses, rely on program order and therefore
should not be swapped.
void vectorizable_Read_Write(int *A) {
for (unsigned i = 1022; i >= 0; i--)
A[i+1] = A[i] + 1;
}
These tests rely on SCEV looking recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
After 9645267, TypeByteSize is 0 if both access do not have the same
size (i.e. HasSameSize will be false). This can cause an infinite loop
in couldPreventStoreLoadForward, if HasSameSize is not checked first.
So check HasSameSize first instead of after
couldPreventStoreLoadForward. Checking HasSameSize first is also
cheaper.
Auto-generate checks for -loop-carried.ll to make it easier to update in
follow-on patch. As this test only checks the dependence, mark pointers
as noalias to avoid also checking various runtime pointer check groups.
Add an additional test case where we currently incorrectly identify a
dependence as Foward instead of ForwardButPreventsForwarding.
Also cleans up the names in the tests a bit to improve readability.
This patch adds a new dependence kind UnsafeIndirect, for cases where at
least one of the memory access instructions may access a loop varying object,
e.g. the address of underlying object is loaded inside the loop, like A[B[i]].
We cannot determine direction or distance in those cases, and also are unable
to generate any runtime checks.
This fixes a miscompile, if we attempt to generate runtime checks for
unknown dependencies.
Note that in most cases we do not attempt to generate runtime checks for
unknown dependences, except if FoundNonConstantDistanceDependence is
true.
Fixes https://github.com/llvm/llvm-project/issues/69744.
SCEV expressions may contain multiple {{ or }} in the debug output,
which needs escaping.
See
llvm/test/Analysis/LoopAccessAnalysis/loops-with-indirect-reads-and-writes.ll
for a test that needs escaping.
Note that both loops in the tests are needed to incorrectly determine that
the loops are safe with runtime checks via FoundNonConstantDistanceDependence
handling code in LAA.
Currently the loop access analysis classifies this loop as unsafe to
vectorize because the memory dependencies are
'ForwardButPreventsForwarding'. However, the access pattern is
'write-after-read' with no subsequent read accessing the written memory
locations. I can't see how store-to-load forwarding is applicable here.
void vectorizable_Read_Write(int *A) {
for (unsigned i = 1022; i >= 0; i--)
A[i+1] = A[i] + 1;
}
update_analyze_test_checks.py is an invaluable tool in updating tests.
Unfortunately, it only supports output from the CostModel,
ScalarEvolution, and LoopVectorize analyses. Many LoopAccessAnalysis
tests use hand-crafted CHECK lines, and it is moreover tedious to
generate these CHECK lines, as the output fom the analysis is not
stable, and requires the test-writer to hand-craft FileCheck matches.
Alleviate this pain, and support output from:
$ opt -passes='print<loop-accesses>'
This patch includes several non-trivial changes including:
- Preserving whitespace at the beginning of the line, so that the LAA
output can be properly indented.
- Regexes matching the unstable output, which is basically a pointer
address hex.
- Separating is_analyze from preserve_names clearly, as the former was
formerly used as an overload for the latter.
To demonstate the utility of this patch, several tests in
LoopAccessAnalysis have been auto-generated by
update_analyze_test_checks.py.
Given a function like the following: https://godbolt.org/z/T9c99fr88
```c
1161_noReadWrite(int *Preds) {
for (int i = 0; i < LEN_1D-1; ++i) {
if (Preds[i] != 0)
b[i] = c[i] + 1;
else
a[i] = i * i;
}
}
```
LLVM will optimize the IR to a single store by a phi instruction:
```llvm
%1 = load ptr, ptr @a, align 64
%2 = load ptr, ptr @b, align 64
...
for.inc:
%.sink = phi ptr [ %1, %if.then ], [ %2, %if.else ]
%add.sink = phi double [ %add, %if.then ], [ %conv8, %if.else ]
%arrayidx7 = getelementptr inbounds double, ptr %.sink, i64 %indvars.iv
store double %add.sink, ptr %arrayidx7, align 8
```
LAA is currently unable to analyze such IR, since ScalarEvolution will
return a SCEVUnknown for the forked pointer operand of the store.
This patch adds initial optional support for analyzing both
possibilities for the pointer and allowing LAA to generate runtime
checks for the bounds if required, refers to D108699, but here address
the phi node.
Fixes https://github.com/llvm/llvm-project/issues/64888
Reviewed By: huntergr-arm, fhahn
Differential Revision: https://reviews.llvm.org/D158965
`MaxSafeDepDistBytes` was not correct based on its name an semantics
in instances when there was a non-unit stride loop. For example,
```
for (int k = 0; k < len; k+=3) {
a[k] = a[k+4];
a[k+2] = a[k+6];
}
```
Here, the smallest dependence distance is 24 bytes, but only vectorizing 8 bytes
is safe. `MaxSafeVectorWidthInBits` reported the correct number of bits
that could be vectorized as 64 bits.
The semantics of of `MaxSafeDepDistBytes` should be:
The smallest dependence distance in bytes in the loop. This may not be
the same as the maximum number of bytes that are safe to operate on
simultaneously.
The name of this variable should reflect those semantics and
its docstring should be updated accordingly, `MinDepDistBytes`.
A debug message that used `MaxSafeDepDistBytes` to signify to the user
how many bytes could be accessed in parallel is updated to use
`MaxSafeVectorWidthInBits` instead. That way, the same message if
communicated to the user, just in different units.
This patch makes sure that when `MinDepDistBytes` is modified in a way
that should impact `MaxSafeVectorWidthInBits`, that we update the latter
accordingly. This patch also clarifies why `MaxSafeVectorWidthInBits`
does not to be updated when `MinDepDistBytes` is (i.e. in the case of a
forward dependency).
Differential Revision: https://reviews.llvm.org/D156158
The default invalidate method for analysis results is just looking
at the preserved state of the pass itself. It does not consider if
the analysis has an internal state that depend on other analyses.
Thus, we need to implement LoopAccessInfoManager::invalidate in order
to catch if LoopAccessAnalysis needs to be invalidated due to
transitive analyses such as AAManager is being invalidated. Otherwise
we might end up having references to an AAManager that is stale.
Fixes https://github.com/llvm/llvm-project/issues/61324
Differential Revision: https://reviews.llvm.org/D146206
When converting this test to opaque pointers (and dropping bitcast),
we get improved memory checks. Per fhahn:
> It looks like the difference is due to the logic that determines
> pointer strides in LAA not handling bitcasts. Without the
> bitcasts, the logic now triggers successfully.
Differential Revision: https://reviews.llvm.org/D140204
Over the past day or so, i've took a large swing at our tests,
and reduced the number of tests that were still using the old syntax
from ~1800 to just 200.
Left to handle: (as it is seen in this patch)
* Transforms/LSR
* Transforms/CGP
* Transforms/TypePromotion
* Transforms/HardwareLoops
* Analysis/*
* some misc.
I think this is the right point to start actively refusing
to honor the old syntax, except for the old tests,
to prevent the old syntax from creeping back in.
Thus, let's add temporary default-off flag,
and if it is not passed refuse to accept old syntax.
The tests that still need porting are annotated with this flag.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D139647
LAA analyzes cross-iteration memory dependencies, as such AA should
not make assumptions about equality of values inside the loop, as
they may come from different iterations.
Fix this by exposing the MayBeCrossIteration AA flag and enabling
it for LAA.
Differential Revision: https://reviews.llvm.org/D137958
The IR from https://github.com/llvm/llvm-project/issues/57368 results
in an assert firing when trying to create a runtime check for the
forked pointer. One of the forks is fine since it's loop invariant,
but the other is a scAddExpr (containing a scAddRecExpr, so not
invariant) when RtCheck::insert expects a scAddRecExpr.
This is a simple fix to just avoid forks which aren't AddRec or
loop invariant. We can allow it as a forked pointer later with
more work.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D133020
This updates the naming for the LAA printing pass to be in line with
most other analysis printing passes.
The old name has come up as confusing multiple times already, e.g. in
D131924.
Handle cases where a forked pointer has an add or sub instruction
before reaching a select.
Reviewed By: fhahn
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D130278
As test in PR56672 shows, LAA produces different results which lead to either
positive or negative vectorization decisions depending on the order of blocks
in loop. The exact reason of this is not clear to me, however this makes investigation
of related bugs extremely complex.
Current order of blocks in the loop is arbitrary. It may change, for example, if loop
info analysis is dropped and recomputed. Seems that it interferes with LAA's logic.
This patch chooses fixed traversal order of blocks in loops, making it RPOT.
Note: this is *not* a fix for bug with incorrect analysis result. It just makes
the answer more robust to make the investigation easier.
Differential Revision: https://reviews.llvm.org/D130482
Reviewed By: aeubanks, fhahn
Adds new tests for add and sub instructions before reaching a select.
Also adds tests using different bit widths for memory, including
non-power-of-two integers.
This builds on the previous forked pointers patch, which only accepted
a single select as the pointer to check. A recursive function to walk
through IR has been added, which searches for either a loop-invariant
or addrec SCEV.
This will only handle a single fork at present, so selects of selects
or a GEP with a select for both the base and offset will be rejected.
There is also a recursion limit with a cli option to change it.
Reviewed By: fhahn, david-arm
Differential Revision: https://reviews.llvm.org/D108699
* Converted tests to use opaque pointers
* Added suggested test for inbounds GEP
* Added a test for forks on both the base and offset terms of a GEP
* Added a test for a select of a select
* Added a test for a GEP with >2 operands
* Added a test for vector GEPs
This reverts commit 7aa8a678826dea86ff3e6c7df9d2a8a6ef868f5d.
This version includes fixes to address issues uncovered after
the commit landed and discussed at D11448.
Those include:
* Limit select-traversal to selects inside the loop.
* Freeze pointers resulting from looking through selects to avoid
branch-on-poison.
Scaffolding support for generating runtime checks for multiple SCEV expressions
per pointer. The initial version just adds support for looking through
a single pointer select.
The more sophisticated logic for analyzing forks is in D108699
Reviewed By: huntergr
Differential Revision: https://reviews.llvm.org/D114487