This is purely so that we can expose and work through downstream codegen issues. My intention is to see if we can get this disabled by default, but that requires fixing a bunch of downstream issues first.
The previous code structure and comments were exceedingly confusing. I have multiple times looked at this code and suspected a bug. This time, I decided to take the time to reflow the code and comment out why it is correct.
The only suspect (to me) case left is that an underaligned access with a unit stride (in terms of the access type) might miss the undefined null pointer when wrapping. This is unlikely to be an issue for C/C++ code with real page sizes, so I'm not bothering to fully convince myself whether that case is correct or not.
The vectorisation analysis collects strides for loop invariant
pointers, which is wrong because they are not strided. We don't
need to generate SCEV checks (which are costly performancewise)
for such pointers, we just need to do the appropriate aliasing
checks.
This patch fixes the problem by changing getStrideFromPointer()
to treat loop invariant pointers as having no stride.
Originally proposed by David Sherwood with further suggestions
from Florian Hahn.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D146958
When debugging and using debug-pass-manager (e.g. in regression tests)
we prefer a consistent order in which analysis passes are executed.
But when for example doing
return MyClass(AM.getResult<LoopAnalysis>(F),
AM.getResult<DominatorTreeAnalysis>(F));
then the order in which LoopAnalysis and DominatorTreeAnalysis isn't
guaranteed, and might for example depend on which compiler that is
used when building LLVM.
I've not scanned the full source tree, but this fixes some occurances
of the above pattern found in lib/Analysis.
This problem was discussed briefly in review for D146206.
The default invalidate method for analysis results is just looking
at the preserved state of the pass itself. It does not consider if
the analysis has an internal state that depend on other analyses.
Thus, we need to implement LoopAccessInfoManager::invalidate in order
to catch if LoopAccessAnalysis needs to be invalidated due to
transitive analyses such as AAManager is being invalidated. Otherwise
we might end up having references to an AAManager that is stale.
Fixes https://github.com/llvm/llvm-project/issues/61324
Differential Revision: https://reviews.llvm.org/D146206
LAA analyzes cross-iteration memory dependencies, as such AA should
not make assumptions about equality of values inside the loop, as
they may come from different iterations.
Fix this by exposing the MayBeCrossIteration AA flag and enabling
it for LAA.
Differential Revision: https://reviews.llvm.org/D137958
D138014 restricted AST to work on immutable IR. This means it is
also safe to use a single BatchAA instance for the entire AST
lifetime, instead of only batching parts of individual queries.
The primary motivation for this is not compile-time, but rather
having a central place to control cross-iteration AA, which will
be used by D137958.
Differential Revision: https://reviews.llvm.org/D137955
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Simplify LoopAccessLegacyAnalysis by using LoopAccessInfoManager from
D134606. As a side-effect this also removes printing support from
LoopAccessLegacyAnalysis.
Depends on D134606.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D134608
At the moment, LoopAccessAnalysis is a loop analysis for the new pass
manager. The issue with that is that LAI caches SCEV expressions and
modifications in a loop may impact SCEV expressions in other loops, but
we do not have a convenient way to invalidate LAI for other loops
withing a loop pipeline.
To avoid this issue, turn it into a function analysis which returns a
manager object that keeps track of the individual LAI objects per loop.
Fixes#50940.
Fixes#51669.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D134606
This is purely NFC restructure in advance of a change which actually exposes zero strides. This is mostly because I find this interface confusing each time I look at it.
The IR from https://github.com/llvm/llvm-project/issues/57368 results
in an assert firing when trying to create a runtime check for the
forked pointer. One of the forks is fine since it's loop invariant,
but the other is a scAddExpr (containing a scAddRecExpr, so not
invariant) when RtCheck::insert expects a scAddRecExpr.
This is a simple fix to just avoid forks which aren't AddRec or
loop invariant. We can allow it as a forked pointer later with
more work.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D133020
The simpler diff-checks require pointers with add-recs from the same
innermost loop, but this property wasn't check completely. Add the
missing check to ensure both addrecs are in the innermost loop.
Fixes#57315.
When we have a dependency with a dependence distance which can only be hit on an iteration beyond the actual trip count of the loop, we can ignore that dependency when analyzing said loop. We already had this code, but had restricted it solely to unknown dependence distances. This change applies it to all dependence distances.
Without this code, we relied on the vectorizer reducing VF such that our infeasible dependence was respected. This usually worked out to about the same result, but not always. For fixed length vectorization, this could mean a smaller VF than optimal being chosen or additional runtime checks. For scalable vectorization - where the bounds on access implied by VF are broader - we could often not find a feasible VF at all.
Differential Revision: https://reviews.llvm.org/D131924
A debug message in `LoopAccessAnalysis` did not have a newline in it, causing printed debug messages to be formatted incorrectly.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D132172
Handle cases where a forked pointer has an add or sub instruction
before reaching a select.
Reviewed By: fhahn
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D130278
As test in PR56672 shows, LAA produces different results which lead to either
positive or negative vectorization decisions depending on the order of blocks
in loop. The exact reason of this is not clear to me, however this makes investigation
of related bugs extremely complex.
Current order of blocks in the loop is arbitrary. It may change, for example, if loop
info analysis is dropped and recomputed. Seems that it interferes with LAA's logic.
This patch chooses fixed traversal order of blocks in loops, making it RPOT.
Note: this is *not* a fix for bug with incorrect analysis result. It just makes
the answer more robust to make the investigation easier.
Differential Revision: https://reviews.llvm.org/D130482
Reviewed By: aeubanks, fhahn
No need to add checks for every type per pointer that we couldn't create
a check for the first time around, just the types that weren't
successful.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D119376
Noticed via inspection; to my knowledge, impossible to hit today. In theory, we could have a fixed stride check be analyzed, then a scalable one. With the old code, the scalable one would be silently dropped, and the runtime guard would go ahead with only the fixed one. This would be a miscompile.
llvm/lib/Analysis/LoopAccessAnalysis.cpp:916:12: error: no viable conversion from returned value of type 'SmallVector<[...], 2>' to function return type 'SmallVector<[...], (default)
CalculateSmallVectorDefaultInlinedElements<T>::value aka 3>'
return Scevs;
^~~~~
This builds on the previous forked pointers patch, which only accepted
a single select as the pointer to check. A recursive function to walk
through IR has been added, which searches for either a loop-invariant
or addrec SCEV.
This will only handle a single fork at present, so selects of selects
or a GEP with a select for both the base and offset will be rejected.
There is also a recursion limit with a cli option to change it.
Reviewed By: fhahn, david-arm
Differential Revision: https://reviews.llvm.org/D108699
This reverts commit 7aa8a678826dea86ff3e6c7df9d2a8a6ef868f5d.
This version includes fixes to address issues uncovered after
the commit landed and discussed at D11448.
Those include:
* Limit select-traversal to selects inside the loop.
* Freeze pointers resulting from looking through selects to avoid
branch-on-poison.
This patch adds initial support for a pointer diff based runtime check
scheme for vectorization. This scheme requires fewer computations and
checks than the existing full overlap checking, if it is applicable.
The main idea is to only check if source and sink of a dependency are
far enough apart so the accesses won't overlap in the vector loop. To do
so, it is sufficient to compute the difference and compare it to the
`VF * UF * AccessSize`. It is sufficient to check
`(Sink - Src) <u VF * UF * AccessSize` to rule out a backwards
dependence in the vector loop with the given VF and UF. If Src >=u Sink,
there is not dependence preventing vectorization, hence the overflow
should not matter and using the ULT should be sufficient.
Note that the initial version is restricted in multiple ways:
1. Pointers must only either be read or written, by a single
instruction (this allows re-constructing source/sink for
dependences with the available information)
2. Source and sink pointers must be add-recs, with matching steps
3. The step must be a constant.
3. abs(step) == AccessSize.
Most of those restrictions can be relaxed in the future.
See https://github.com/llvm/llvm-project/issues/53590.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D119078
Scaffolding support for generating runtime checks for multiple SCEV expressions
per pointer. The initial version just adds support for looking through
a single pointer select.
The more sophisticated logic for analyzing forks is in D108699
Reviewed By: huntergr
Differential Revision: https://reviews.llvm.org/D114487
Adds ability to vectorize loops containing a store to a loop-invariant
address as part of a reduction that isn't converted to SSA form due to
lack of aliasing info. Runtime checks are generated to ensure the store
does not alias any other accesses in the loop.
Ordered fadd reductions are not yet supported.
Differential Revision: https://reviews.llvm.org/D110235
Adds new optimization remarks when loop vectorization fails due to
the compiler being unable to find bound of an array access inside
a loop
Differential Revision: https://reviews.llvm.org/D115873
Note that this doesn't actually cause the top level predicate to become a non-union just yet.
The * above comes from a case in the LoopVectorizer where a predicate which is later proven no longer blocks vectorization due to a change from checking if predicates exists to whether the predicate is possibly false.
Previously we relied on the pointee type to determine what type we need
to do runtime pointer access checks.
With opaque pointers, we can access a pointer with more than one type,
so now we keep track of all the types we're accessing a pointer's
memory with.
Also some other minor getPointerElementType() removals.
Reviewed By: #opaque-pointers, nikic
Differential Revision: https://reviews.llvm.org/D119047
Adds new optimization remarks when vectorization fails.
More specifically, new remarks are added for following 4 cases:
- Backward dependency
- Backward dependency that prevents Store-to-load forwarding
- Forward dependency that prevents Store-to-load forwarding
- Unknown dependency
It is important to note that only one of the sources
of failures (to vectorize) is reported by the remarks.
This source of failure may not be first in program order.
A regression test has been added to test the following cases:
a) Loop can be vectorized: No optimization remark is emitted
b) Loop can not be vectorized: In this case an optimization
remark will be emitted for one source of failure.
Reviewed By: sdesmalen, david-arm
Differential Revision: https://reviews.llvm.org/D108371
0a00d64 turned an early exit here into an assertion, but the assertion
can be triggered, as PR52920 shows.
The later code is agnostic to the accessed type, so just drop the
assert. The patch also adds tests for LAA directly and
loop-load-elimination to show the behavior is sane.
In the isDependence function the code does not try hard enough
to determine the dependence between types. If the types are
different it simply gives up, whereas in fact what we really
care about are the type sizes. I've changed the code to compare
sizes instead of types.
Reviewed By: fhahn, sdesmalen
Differential Revision: https://reviews.llvm.org/D108763