LoopAccessAnalysis is missing coverage of the special-case of invariant
stores with unit stride. It was previously determined that
stride-versioning for stores is not profitable, but test coverage is
missing. Fix this.
getPtrStride returns 0 when the PtrScev is loop-invariant, and this is
not an erroneous value: it returns std::nullopt to communicate that it
was not able to find a valid pointer stride. In analyzeLoop, we call
getPtrStride with a value_or(0) conflating the zero return value with
std::nullopt. Fix this, handling loop-invariant loads correctly.
When inverting source and sink on a negative induction step, the types
of the source and sink should also be swapped. This fixes a bug in the
code that follows, that computes properties based on these types. With
234cc40 ([LAA] Limit no-overlap check to at least one loop-invariant
accesses.), that code is guarded by a loop-invariant condition: however,
the commit did not add any new tests exercising the guarded code, and
hence the bugfix in this patch requires additional tests to exercise
that guarded codepath.
A SCEVWrapPredicate A implies B, if
* they have the same flag,
* both steps are positive and
* B's start and step are ULE/SLE (for NSUW/NSSW) than A's.
See https://alive2.llvm.org/ce/z/n2T4ss (first pair with known constants
as strides, second pair with variable strides).
Note that this is limited to steps of the same size, due to NSUW having
slightly different semantics than regular NUW. We should be able to
remove this restriction for NSSW (which matches NSW) in the future.
PR: https://github.com/llvm/llvm-project/pull/118184
In 4de3184f07fd8c548125d315dd306d4afa7c9698 we exposed BasicAA's
cross-iteration mode for use in LAA, so we can handle selects with equal
conditions correctly (where the select condition is not actually equal
across iterations).
However, if we replace the selects with equivalent phis, the issue still
exists. In the phi case, we effectively still have an assumption that
the condition(s) that control which phi arg is used will be the same
across iterations. Fix this by disabling this phi handling in
cross-iteration mode.
(I'm not entirely sure whether this is also needed when BasicAA enables
cross-iteration mode during internal phi recursion, but I wouldn't be
surprised if that's the case.)
If we have a pointer AddRec, the maximum increment is
2^(pointer-index-wdith - 1) - 1. This means that if incrementing the
AddRec wraps, the distance between the previously accessed location and
the wrapped location is > 2^(pointer-index-wdith - 1), i.e. if the GEP
for the AddRec is inbounds, this would be poison due to the object being
larger than half the pointer index type space. The poison would be
immediate UB when the memory access gets executed..
Similar reasoning can be applied for decrements.
PR: https://github.com/llvm/llvm-project/pull/113126
With the introduction of the nusw flag in GEPNoWrapFlags, it should be
safe to weaken the check in LoopAccessAnalysis to just check the nusw
flag on the GEP, instead of inbounds.
isNoWrap has exactly one caller which handles Assume = true separately,
but too conservatively. Instead, pass Assume to isNoWrap, so it is
threaded into getPtrStride, which has the correct handling for the
Assume flag. Also note that the Stride == 1 check in isNoWrap is
incorrect: getPtrStride returns Strides == 1 or -1, except when
isNoWrapAddRec or Assume are true, assuming ShouldCheckWrap is true; we
can include the case of -1 Stride, and when isNoWrapAddRec is true. With
this change, passing Assume = true to getPtrStride could return a
non-unit stride, and we correctly handle that case as well.
After pr96656.ll were added to LAA and LoopVersioning, it was decided
that the bug is in a caller of LoopVersioning, not in LAA or
LoopVersioning itself. The new candidate was LoopLoadElim, but #96656
has since been marked invalid. Hence, re-organize the added tests to
avoid confusion, and the testcase from the investigation to
LoopLoadElim.
Update getDependenceDistanceStrideAndSize to reason about different
combinations of strides directly and explicitly.
Update getPtrStride to return 0 for invariant pointers.
Then proceed by checking the strides.
If either source or sink are not strided by a constant (i.e. not a
non-wrapping AddRec) or invariant, the accesses may overlap
with earlier or later iterations and we cannot generate runtime
checks to disambiguate them.
Otherwise they are either loop invariant or strided. In that case, we
can generate a runtime check to disambiguate them.
If both are strided by constants, we proceed as previously.
This is an alternative to
https://github.com/llvm/llvm-project/pull/99239 and also replaces
additional checks if the underlying object is loop-invariant.
Fixes https://github.com/llvm/llvm-project/issues/87189.
PR: https://github.com/llvm/llvm-project/pull/99577
Updated @B_indices_loaded_in_loop_A_stored to use a different offset
for one of the accesses we create runtime checks for; the original
version had a runtime check that was always true as the accesses always
overlapped.
The same pointer may be accessed with different types and the bound
includes the size of the accessed type to compute the end. Update the
cache to correctly disambiguate between different accessed types.
Add tests with accesses to the same pointer with different types. At the
moment, runtime checks for those accesses are incorrectly based on the
smaller type.
Add tests which are not safe to vectorize because %indices are loaded in
the loop and the same indices could be loaded in later iterations.
Tests for https://github.com/llvm/llvm-project/issues/87189.
This patch implements limited loop vectorization support for the 'all-in-one' histogram intrinsic. The feature is disabled by default, and when enabled will only vectorize if there are no other users of values in the gather-modify-scatter sequence.
The issue is in LoopAccessAnalysis, but the regression was seen in the
user LoopVersioning. Hence, add pre-commit tests for both, in
preparation to fix the issue in LoopAccessAnalysis.
Introduce a Loop::getLocStr stolen from LoopVectorize's static function
getDebugLocString in order to have uniform debug output headers across
LoopVectorize, LoopAccessAnalysis, and LoopDistribute. The motivation
for this change is to have UpdateTestChecks recognize the headers and
automatically generate CHECK lines for debug output, with minimal
special-casing.
733b8b2 ([LAA] Simplify identification of speculatable strides [nfc])
refactored getStrideFromPointer() to compute directly on SCEVs, and
return an SCEV expression instead of a Value. However, it left behind a
call to getUniqueCastUse(), which is completely unnecessary. Remove
this, showing a positive test update, and simplify the surrounding
program logic.
Update LAA to use PSE::getSymbolicMaxBackedgeTakenCount which returns
the minimum of the countable exits.
When analyzing dependences and computing runtime checks, we need the
smallest upper bound on the number of iterations. In terms of memory
safety, it shouldn't matter if any uncomputable exits leave the loop,
as long as we prove that there are no dependences given the minimum of
the countable exits. The same should apply also for generating runtime
checks.
Note that this shifts the responsiblity of checking whether all exit
counts are computable or handling early-exits to the users of LAA.
Depends on https://github.com/llvm/llvm-project/pull/93498
PR: https://github.com/llvm/llvm-project/pull/93499
The test select-dependence.ll can be eliminated completely by dce, as it
returns a constant, and doesn't write any arguments. Lift out the local
allocas into arguments, so that it is less nonsensical. While at it,
rename the variables for greater readability, and regenerate the test
with UpdateTestChecks.
The test symbolic-stride.ll does not exercise all codepaths in
getStrideFromPointer, particularly when the operand is an
SCEVIntegralCastExpr. Cover these codepaths as well. This patch serves
as pre-commit tests for #92119.
Use getStartAndEndForAccess to compute the start and end of both src
and sink (factored out to helper in bce3680f45b57f). If they do not
overlap (i.e. SrcEnd <= SinkStart || SinkEnd <= SrcStart), there is no
dependence, regardless of stride.
PR: https://github.com/llvm/llvm-project/pull/92307
Applying the loop guards to the distance may prevent
isSafeDependenceDistance from determining NoDep, unless loop guards are
also applied to the backedge-taken-count.
Instead of applying the guards to both Dist and the
backedge-taken-count, just apply them after handling
isSafeDependenceDistance and constant distances; there is no benefit to
applying the guards before then.
This fixes a regression flagged by @bjope due to
ecae3ed958481cba7d60868cf3504292f7f4fdf5.
Following up to 933f49248, also update the code reasoning about
backwards dependences to support non-constant distances.
Update the code to use the signed minimum distance instead of a constant
distance
This means e checked the lower bound of the dependence distance and the
distance may be larger at runtime (and safe for vectorization). Whether
to classify it as Unknown or Backwards depends on the vector width and
LAA was updated to take TTI to get the maximum vector register width.
If the minimum dependence distance is larger than the max vector width,
we consider it as backwards-vectorizable. Otherwise we classify them as
Unknown, so we re-try with runtime checks.
PR: https://github.com/llvm/llvm-project/pull/91525