It was `inaccessiblemem: readwrite` before, no need for the read.
No real benefit is expected but it can help debugging and other efforts.
Differential Revision: https://reviews.llvm.org/D156478
After 572cfa3fde5433, isUniform now checks VF based uniformity instead of
just invariance as before.
As follow-up cleanup suggested in D148841, separate the invariance check
out and update callers that currently check only for invariance.
This also moves the implementation of isUniform from LoopAccessAnalysis
to LoopVectorizationLegality, as LoopAccesAnalysis doesn't use the more
general isUniform.
If the step is not loop-invariant, we cannot create a modified AddRec,
as the start needs to be loop-invariant. Mark those cases as
CannotAnalyze and bail out, to fix a crash.
This patch uses SCEV to check if a value is uniform across a given VF.
The basic idea is to construct SCEVs where the AddRecs of the loop are
adjusted to reflect the version in the vectorized loop (Step multiplied
by VF). We construct a SCEV for the value of the vector lane 0
(offset 0) compare it to the expressions for lanes 1 to the last vector
lane (VF - 1). If they are equal, consider the expression uniform.
While re-writing expressions, we also need to catch expressions we
cannot determine uniformity (e.g. SCEVUnknown).
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D148841
The last use was removed by:
commit d5b840131223f2ffef4e48ca769ad1eb7bb1869a
Author: Philip Reames <preames@rivosinc.com>
Date: Thu May 11 08:10:49 2023 -0700
Mostly just avoiding the need to keep both Value and SCEVs flowing through with consistent handling. We can do everything in terms of SCEV - aside from the profitability heuristics which are now isolated in one spot.
LV/LAA will speculate that (some) strided access patterns have unit stride, and insert runtime checks if required.
LV cost models a multiply by such a stride as free. We did this by keeping around the StrideSet structure, just to check if one of the operands were one of the strides we speculated.
We can instead just ask PredicatedScalarEvolution if either of the operands are one (after predicates are applied). We get mostly the same result - PSE can prove it in more cases in theory - and simpler code.
The original commit wasn't quite NFC, and this was caught by an arguably overly strong assert. Specifically, I'd failed to strip off the integer cast off the SCEV before saving it in the map. The result - other than a failed assert - is that we'd speculate on the casted unknown, not the unknown. The only case I can think of where that might change behavior would be a sext(i1 load). I doubt that case is interesting in practice, but it's good to be strictly NFC on this change regardless.
Original commit message follows..
The existing code makes it hard to tell that collectStridedAccess is really about identifying some loop invariant SCEV which is *profitable* to speculate is equal to one. The odd dual usage structure of Value and SCEV confuses this point.
We could choose to loosen the profitability analysis if desired. I'm not proposing doing so at this time as it exposes too many cases where the speculation is unprofitable.
Differential Revision: https://reviews.llvm.org/D147750
This reverts commit d5b840131223f2ffef4e48ca769ad1eb7bb1869a. Running this through broader testing after rebasing is revealing a crash. Reverting while I investigate.
The existing code makes it hard to tell that collectStridedAccess is really about identifying some loop invariant SCEV which is *profitable* to speculate is equal to one. The odd dual usage structure of Value and SCEV confuses this point.
We could choose to loosen the profitability analysis if desired. I'm not proposing doing so at this time as it exposes too many cases where the speculation is unprofitable.
Differential Revision: https://reviews.llvm.org/D147750
This is purely so that we can expose and work through downstream codegen issues. My intention is to see if we can get this disabled by default, but that requires fixing a bunch of downstream issues first.
The previous code structure and comments were exceedingly confusing. I have multiple times looked at this code and suspected a bug. This time, I decided to take the time to reflow the code and comment out why it is correct.
The only suspect (to me) case left is that an underaligned access with a unit stride (in terms of the access type) might miss the undefined null pointer when wrapping. This is unlikely to be an issue for C/C++ code with real page sizes, so I'm not bothering to fully convince myself whether that case is correct or not.
The vectorisation analysis collects strides for loop invariant
pointers, which is wrong because they are not strided. We don't
need to generate SCEV checks (which are costly performancewise)
for such pointers, we just need to do the appropriate aliasing
checks.
This patch fixes the problem by changing getStrideFromPointer()
to treat loop invariant pointers as having no stride.
Originally proposed by David Sherwood with further suggestions
from Florian Hahn.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D146958
When debugging and using debug-pass-manager (e.g. in regression tests)
we prefer a consistent order in which analysis passes are executed.
But when for example doing
return MyClass(AM.getResult<LoopAnalysis>(F),
AM.getResult<DominatorTreeAnalysis>(F));
then the order in which LoopAnalysis and DominatorTreeAnalysis isn't
guaranteed, and might for example depend on which compiler that is
used when building LLVM.
I've not scanned the full source tree, but this fixes some occurances
of the above pattern found in lib/Analysis.
This problem was discussed briefly in review for D146206.
The default invalidate method for analysis results is just looking
at the preserved state of the pass itself. It does not consider if
the analysis has an internal state that depend on other analyses.
Thus, we need to implement LoopAccessInfoManager::invalidate in order
to catch if LoopAccessAnalysis needs to be invalidated due to
transitive analyses such as AAManager is being invalidated. Otherwise
we might end up having references to an AAManager that is stale.
Fixes https://github.com/llvm/llvm-project/issues/61324
Differential Revision: https://reviews.llvm.org/D146206
LAA analyzes cross-iteration memory dependencies, as such AA should
not make assumptions about equality of values inside the loop, as
they may come from different iterations.
Fix this by exposing the MayBeCrossIteration AA flag and enabling
it for LAA.
Differential Revision: https://reviews.llvm.org/D137958
D138014 restricted AST to work on immutable IR. This means it is
also safe to use a single BatchAA instance for the entire AST
lifetime, instead of only batching parts of individual queries.
The primary motivation for this is not compile-time, but rather
having a central place to control cross-iteration AA, which will
be used by D137958.
Differential Revision: https://reviews.llvm.org/D137955
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Simplify LoopAccessLegacyAnalysis by using LoopAccessInfoManager from
D134606. As a side-effect this also removes printing support from
LoopAccessLegacyAnalysis.
Depends on D134606.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D134608
At the moment, LoopAccessAnalysis is a loop analysis for the new pass
manager. The issue with that is that LAI caches SCEV expressions and
modifications in a loop may impact SCEV expressions in other loops, but
we do not have a convenient way to invalidate LAI for other loops
withing a loop pipeline.
To avoid this issue, turn it into a function analysis which returns a
manager object that keeps track of the individual LAI objects per loop.
Fixes#50940.
Fixes#51669.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D134606
This is purely NFC restructure in advance of a change which actually exposes zero strides. This is mostly because I find this interface confusing each time I look at it.
The IR from https://github.com/llvm/llvm-project/issues/57368 results
in an assert firing when trying to create a runtime check for the
forked pointer. One of the forks is fine since it's loop invariant,
but the other is a scAddExpr (containing a scAddRecExpr, so not
invariant) when RtCheck::insert expects a scAddRecExpr.
This is a simple fix to just avoid forks which aren't AddRec or
loop invariant. We can allow it as a forked pointer later with
more work.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D133020
The simpler diff-checks require pointers with add-recs from the same
innermost loop, but this property wasn't check completely. Add the
missing check to ensure both addrecs are in the innermost loop.
Fixes#57315.
When we have a dependency with a dependence distance which can only be hit on an iteration beyond the actual trip count of the loop, we can ignore that dependency when analyzing said loop. We already had this code, but had restricted it solely to unknown dependence distances. This change applies it to all dependence distances.
Without this code, we relied on the vectorizer reducing VF such that our infeasible dependence was respected. This usually worked out to about the same result, but not always. For fixed length vectorization, this could mean a smaller VF than optimal being chosen or additional runtime checks. For scalable vectorization - where the bounds on access implied by VF are broader - we could often not find a feasible VF at all.
Differential Revision: https://reviews.llvm.org/D131924
A debug message in `LoopAccessAnalysis` did not have a newline in it, causing printed debug messages to be formatted incorrectly.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D132172
Handle cases where a forked pointer has an add or sub instruction
before reaching a select.
Reviewed By: fhahn
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D130278
As test in PR56672 shows, LAA produces different results which lead to either
positive or negative vectorization decisions depending on the order of blocks
in loop. The exact reason of this is not clear to me, however this makes investigation
of related bugs extremely complex.
Current order of blocks in the loop is arbitrary. It may change, for example, if loop
info analysis is dropped and recomputed. Seems that it interferes with LAA's logic.
This patch chooses fixed traversal order of blocks in loops, making it RPOT.
Note: this is *not* a fix for bug with incorrect analysis result. It just makes
the answer more robust to make the investigation easier.
Differential Revision: https://reviews.llvm.org/D130482
Reviewed By: aeubanks, fhahn
No need to add checks for every type per pointer that we couldn't create
a check for the first time around, just the types that weren't
successful.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D119376
Noticed via inspection; to my knowledge, impossible to hit today. In theory, we could have a fixed stride check be analyzed, then a scalable one. With the old code, the scalable one would be silently dropped, and the runtime guard would go ahead with only the fixed one. This would be a miscompile.
llvm/lib/Analysis/LoopAccessAnalysis.cpp:916:12: error: no viable conversion from returned value of type 'SmallVector<[...], 2>' to function return type 'SmallVector<[...], (default)
CalculateSmallVectorDefaultInlinedElements<T>::value aka 3>'
return Scevs;
^~~~~
This builds on the previous forked pointers patch, which only accepted
a single select as the pointer to check. A recursive function to walk
through IR has been added, which searches for either a loop-invariant
or addrec SCEV.
This will only handle a single fork at present, so selects of selects
or a GEP with a select for both the base and offset will be rejected.
There is also a recursion limit with a cli option to change it.
Reviewed By: fhahn, david-arm
Differential Revision: https://reviews.llvm.org/D108699
This reverts commit 7aa8a678826dea86ff3e6c7df9d2a8a6ef868f5d.
This version includes fixes to address issues uncovered after
the commit landed and discussed at D11448.
Those include:
* Limit select-traversal to selects inside the loop.
* Freeze pointers resulting from looking through selects to avoid
branch-on-poison.
This patch adds initial support for a pointer diff based runtime check
scheme for vectorization. This scheme requires fewer computations and
checks than the existing full overlap checking, if it is applicable.
The main idea is to only check if source and sink of a dependency are
far enough apart so the accesses won't overlap in the vector loop. To do
so, it is sufficient to compute the difference and compare it to the
`VF * UF * AccessSize`. It is sufficient to check
`(Sink - Src) <u VF * UF * AccessSize` to rule out a backwards
dependence in the vector loop with the given VF and UF. If Src >=u Sink,
there is not dependence preventing vectorization, hence the overflow
should not matter and using the ULT should be sufficient.
Note that the initial version is restricted in multiple ways:
1. Pointers must only either be read or written, by a single
instruction (this allows re-constructing source/sink for
dependences with the available information)
2. Source and sink pointers must be add-recs, with matching steps
3. The step must be a constant.
3. abs(step) == AccessSize.
Most of those restrictions can be relaxed in the future.
See https://github.com/llvm/llvm-project/issues/53590.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D119078