getPtrStride returns 0 when the PtrScev is loop-invariant, and this is
not an erroneous value: it returns std::nullopt to communicate that it
was not able to find a valid pointer stride. In analyzeLoop, we call
getPtrStride with a value_or(0) conflating the zero return value with
std::nullopt. Fix this, handling loop-invariant loads correctly.
The last attempt failed a sanitiser build because we were
creating a reference to a null Predicates pointer in
isDereferenceableAndAlignedInLoop. This was exposed by
the unit test IsDerefReadOnlyLoop in
unittests/Analysis/LoadsTest.cpp. I fixed this by falling
back on getConstantMaxBackedgeTakenCount if Predicates is
null - see line 316 in llvm/lib/Analysis/Loads.cpp. There
are no other changes.
Currently when we encounter a negative step in the induction
variable isDereferenceableAndAlignedInLoop bails out because
the element size is signed greater than the step. This patch
adds support for negative steps in cases where we detect the
start address for the load is of the form base + offset. In
this case the address decrements in each iteration so we need
to calculate the access size differently. I have done this by
caling getStartAndEndForAccess from LoopAccessAnalysis.cpp.
The motivation for this patch comes from PR #88385 where a
reviewer requested reusing isDereferenceableAndAlignedInLoop,
but that PR itself does support reverse loops.
The changed test in LoopVectorize/X86/load-deref-pred.ll now
passes because previously we were calculating the total access
size incorrectly, whereas now it is 412 bytes and fits
perfectly into the alloca.
When inverting source and sink on a negative induction step, the types
of the source and sink should also be swapped. This fixes a bug in the
code that follows, that computes properties based on these types. With
234cc40 ([LAA] Limit no-overlap check to at least one loop-invariant
accesses.), that code is guarded by a loop-invariant condition: however,
the commit did not add any new tests exercising the guarded code, and
hence the bugfix in this patch requires additional tests to exercise
that guarded codepath.
Rearrange the DepDistanceAndSizeInfo struct in preparation to scale
strides. getDependenceDistanceStrideAndSize now returns the data of
CommonStride, MaxStride, and clarifies when to retry with runtime
checks, in place of (unscaled) strides.
I believe that this code doesn't care whether the offsets are known to
be inbounds a priori. For the same reason the change is not testable, as
the SCEV based fallback code will look through non-inbounds offsets
anyway. So make it clear that there is no special inbounds requirement
here.
If we have a pointer AddRec, the maximum increment is
2^(pointer-index-wdith - 1) - 1. This means that if incrementing the
AddRec wraps, the distance between the previously accessed location and
the wrapped location is > 2^(pointer-index-wdith - 1), i.e. if the GEP
for the AddRec is inbounds, this would be poison due to the object being
larger than half the pointer index type space. The poison would be
immediate UB when the memory access gets executed..
Similar reasoning can be applied for decrements.
PR: https://github.com/llvm/llvm-project/pull/113126
With the introduction of the nusw flag in GEPNoWrapFlags, it should be
safe to weaken the check in LoopAccessAnalysis to just check the nusw
flag on the GEP, instead of inbounds.
isNoWrap has exactly one caller which handles Assume = true separately,
but too conservatively. Instead, pass Assume to isNoWrap, so it is
threaded into getPtrStride, which has the correct handling for the
Assume flag. Also note that the Stride == 1 check in isNoWrap is
incorrect: getPtrStride returns Strides == 1 or -1, except when
isNoWrapAddRec or Assume are true, assuming ShouldCheckWrap is true; we
can include the case of -1 Stride, and when isNoWrapAddRec is true. With
this change, passing Assume = true to getPtrStride could return a
non-unit stride, and we correctly handle that case as well.
LoopAccessAnalysis currently does not check/track aliasing from the
output pointers, but assumes vectorizing library calls with a mapping is
safe.
This can result in incorrect codegen if something like the following is
vectorized:
```
for(int i=0; i<N; i++) {
// No aliasing between input and output pointers detected.
sincos(cos_out[0], sin_out+i, cos_out+i);
}
```
Where for VF >= 2 `cos_out[1]` to `cos_out[VF-1]` is the cosine of the
original value of `cos_out[0]` not the updated value.
Use computeConstantDifference() instead of casting getMinusSCEV() to
SCEVConstant. This can be much faster in some cases, because
computeConstantDifference() computes the result without creating new
SCEV expressions.
This improves LTO/ThinLTO compile-time for lencod by more than 10%.
I've verified that computeConstantDifference() does not produce worse
results than the previous code for anything in llvm-test-suite. This
required raising the iteration cutoff to 6. I ended up increasing it to
8 just to be on the safe side (for code outside llvm-test-suite), and
because this doesn't materially affect compile-time anyway (we'll almost
always bail out earlier).
Update getDependenceDistanceStrideAndSize to reason about different
combinations of strides directly and explicitly.
Update getPtrStride to return 0 for invariant pointers.
Then proceed by checking the strides.
If either source or sink are not strided by a constant (i.e. not a
non-wrapping AddRec) or invariant, the accesses may overlap
with earlier or later iterations and we cannot generate runtime
checks to disambiguate them.
Otherwise they are either loop invariant or strided. In that case, we
can generate a runtime check to disambiguate them.
If both are strided by constants, we proceed as previously.
This is an alternative to
https://github.com/llvm/llvm-project/pull/99239 and also replaces
additional checks if the underlying object is loop-invariant.
Fixes https://github.com/llvm/llvm-project/issues/87189.
PR: https://github.com/llvm/llvm-project/pull/99577
Similarly to Unknown, IndirectUnsafe should also be considered possibly
backward, as it may be a backwards dependency e.g. via loading
different base pointers.
This also brings isPossiblyBackward in line with
Dependence::isSafeForVectorization. At the moment this can't be tested,
as it is not possible to write a test with an AddRec that is based on a
loop varying value. But this may change in the future and may cause
mis-compiles in the future.
The same pointer may be accessed with different types and the bound
includes the size of the accessed type to compute the end. Update the
cache to correctly disambiguate between different accessed types.
This patch implements limited loop vectorization support for the 'all-in-one' histogram intrinsic. The feature is disabled by default, and when enabled will only vectorize if there are no other users of values in the gather-modify-scatter sequence.
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...
`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
Introduce a Loop::getLocStr stolen from LoopVectorize's static function
getDebugLocString in order to have uniform debug output headers across
LoopVectorize, LoopAccessAnalysis, and LoopDistribute. The motivation
for this change is to have UpdateTestChecks recognize the headers and
automatically generate CHECK lines for debug output, with minimal
special-casing.
733b8b2 ([LAA] Simplify identification of speculatable strides [nfc])
refactored getStrideFromPointer() to compute directly on SCEVs, and
return an SCEV expression instead of a Value. However, it left behind a
call to getUniqueCastUse(), which is completely unnecessary. Remove
this, showing a positive test update, and simplify the surrounding
program logic.
Avoid wastefully setting CanVecMem in several places in analyzeLoop,
complicating the logic, to get the function to return a bool, and set
CanVecMem in the caller.
Update LAA to use PSE::getSymbolicMaxBackedgeTakenCount which returns
the minimum of the countable exits.
When analyzing dependences and computing runtime checks, we need the
smallest upper bound on the number of iterations. In terms of memory
safety, it shouldn't matter if any uncomputable exits leave the loop,
as long as we prove that there are no dependences given the minimum of
the countable exits. The same should apply also for generating runtime
checks.
Note that this shifts the responsiblity of checking whether all exit
counts are computable or handling early-exits to the users of LAA.
Depends on https://github.com/llvm/llvm-project/pull/93498
PR: https://github.com/llvm/llvm-project/pull/93499
Use getStartAndEndForAccess to compute the start and end of both src
and sink (factored out to helper in bce3680f45b57f). If they do not
overlap (i.e. SrcEnd <= SinkStart || SinkEnd <= SrcStart), there is no
dependence, regardless of stride.
PR: https://github.com/llvm/llvm-project/pull/92307
Applying the loop guards to the distance may prevent
isSafeDependenceDistance from determining NoDep, unless loop guards are
also applied to the backedge-taken-count.
Instead of applying the guards to both Dist and the
backedge-taken-count, just apply them after handling
isSafeDependenceDistance and constant distances; there is no benefit to
applying the guards before then.
This fixes a regression flagged by @bjope due to
ecae3ed958481cba7d60868cf3504292f7f4fdf5.
tryToCreateDiffCheck has one caller, and exits early if CanUseDiffCheck
is false. Hence, we can get/set CanUseDiffCheck in the caller to avoid
wastefully calling tryToCreateDiffCheck. This patch is an NFC
simplification of program logic.
Following up to 933f49248, also update the code reasoning about
backwards dependences to support non-constant distances.
Update the code to use the signed minimum distance instead of a constant
distance
This means e checked the lower bound of the dependence distance and the
distance may be larger at runtime (and safe for vectorization). Whether
to classify it as Unknown or Backwards depends on the vector width and
LAA was updated to take TTI to get the maximum vector register width.
If the minimum dependence distance is larger than the max vector width,
we consider it as backwards-vectorizable. Otherwise we classify them as
Unknown, so we re-try with runtime checks.
PR: https://github.com/llvm/llvm-project/pull/91525
Instead of passing LoopAccessInfo only to fetch the MemoryDepChecker,
directly pass MemoryDepChecker. This simplifies the code and also allows
new uses in places where no LAI is available.
Code checking stores to invariant addresses and reductions made an
incorrect assumption that the case of both a load & store to the same
invariant address does not need to be handled.
In some cases when vectorizing with runtime checks, there may be
dependences with a load and store to the same address, storing a
reduction value.
Update LAA to separately track if there was a store-store and a
load-store dependence with an invariant addresses.
Bail out early if there as a load-store dependence with invariant
address. If there was a store-store one, still apply the logic checking
if they all store a reduction.
As discussed in https://github.com/llvm/llvm-project/pull/88039, support
different strides with isSafeDependenceDistance by passing the maximum
of both strides.
isSafeDependenceDistance tries to prove that
|Dist| > BackedgeTakenCount * Step
holds. Chosing the maximum stride computes the maximum range accesed by
the loop for all strides.
PR: https://github.com/llvm/llvm-project/pull/90036
Extend LoopAccessAnalysis to support different strides and as a
consequence non-constant distances between dependences using SCEV to
reason about the direction of the dependence.
In multiple places, logic to rule out dependences using the stride has
been updated to only be used if StrideA == StrideB, i.e. there's a
common stride.
We now also may bail out at multiple places where we may have to set
FoundNonConstantDistanceDependence. This is done when we need to bail
out and the distance is not constant to preserve original behavior.
Fixes https://github.com/llvm/llvm-project/issues/87336
PR: https://github.com/llvm/llvm-project/pull/88039