This patch replaces:
llvm::copy(Src, std::back_inserter(Dst));
with:
llvm::append_range(Dst, Src);
for breavity.
One side benefit is that llvm::append_range eventually calls
llvm::SmallVector::reserve if Dst is of llvm::SmallVector.
MaxVF computed in couldPreventStoreLoadFowrard may not be a power of 2,
as CommonStride may not be a power-of-2.
This can cause crashes after 78777a20. Use bit_floor to make sure it is
a suitable power-of-2.
Fixes https://github.com/llvm/llvm-project/issues/134696.
Introduce members() iterator-helper to shorten the members_{begin,end}
idiom. A previous attempt of this patch was #130319, which had to be
reverted due to unit-test failures when attempting to call members() on
the end iterator. In this patch, members() accepts either an ECValue or
an ElemTy, which is more intuitive and doesn't suffer from the same
issue.
This reverts the revert commit 616f447fc84bdc7655117f1b303d895dc3b93e4d.
It includes updates to remaining users in Polly and Clang, to avoid
failures when building those projects.
The patch splits the store-load forwarding distance analysis from other
dependency analysis in LAA. Currently it supports only power-of-2
distances, required to support non-power-of-2 distances in future.
Part of #100755
In some cases, it is possible for the same underlying object to be
accessed via pointers to different address spaces. This could lead to
pointers from different address spaces ending up in the same dependency
set, which isn't allowed (and triggers an assertion).
Update the mapping from underlying object -> last access to also include
the accessing address space.
Fixes https://github.com/llvm/llvm-project/issues/124759.
PR: https://github.com/llvm/llvm-project/pull/129087
Currently we only check if the pointers involved in runtime checks do
not wrap if we need to perform dependency checks. If that's not the
case, we generate runtime checks, even if the pointers may wrap (see
test/Analysis/LoopAccessAnalysis/runtime-checks-may-wrap.ll).
If the pointer wraps, then we swap start and end of the runtime check,
leading to incorrect checks.
An Alive2 proof of what the runtime checks are checking conceptually (on
i4 to have it complete in reasonable time) showing the incorrect result
should be https://alive2.llvm.org/ce/z/KsHzn8
Depends on https://github.com/llvm/llvm-project/pull/127410 to avoid
more regressions.
PR: https://github.com/llvm/llvm-project/pull/127543
Change getDependenceDistanceStrideAndSize to scale strides by
TypeByteSize, scaling the returned CommonStride and MaxStride. Even
though there is a seemingly-functional change of setting CommonStride
when scaled strides are equal, it ends up being a non-functional change
due to aggressive HasSameSize checking.
Update isNoWrap to make the IR Ptr argument optional. This allows using
isNoWrap when dealing with things like pointer-selects, where a select
is translated to multiple pointer SCEV expressions, but there is no IR
value that can be used. We don't try to retrieve pointer values for the
pointer SCEVs and using info from the IR would not be safe. For example,
we cannot use inbounds, because the pointer may never be accessed.
PR: https://github.com/llvm/llvm-project/pull/127410
The stripGetElementPtr function is mysteriously named, and calls into
another mysterious getGEPInductionOperand which does something
complicated with GEP indices. The real purpose of the badly-named
stripGetElementPtr function is to get a loop-variant GEP index, if there
is one. The getGEPInductionOperand is totally redundant, as stripping
off zeros from the end of GEP indices has no effect on computing the
loop-variant GEP index, as constant zeros are always loop-invariant.
Moreover, the GEP induction operand is simply the first non-zero index
from the end, which stripGetElementPtr returns when it finds that any of
the GEP indices are loop-variant: this is a completely unrelated value
to the GEP index that is loop-variant. The implicit assumption here is
that there is only ever one loop-variant index, and it is the first
non-zero one from the end.
The logic is unnecessarily complicated for what stripGetElementPtr wants
to achieve, and the header comments are confusing as well. Strip
getGEPInductionOperand, rework and rename stripGetElementPtr.
Inline hasComputableBounds into createCheckForAccess. This removes a
level of indirection and allows for passing the AddRec directly to
isNoWrap, removing the need to retrieve the AddRec for the pointer
again.
The early continue for invariant SCEVs now also applies to forked
pointers (i.e. when there's more than one entry in TranslatedPtrs) when
ShouldCheckWrap is true, as those trivially won't wrap.
The change is NFC otherwise. replaceSymbolicStrideSCEV is now called
earlier.
Move up replaceSymbolicStrideSCEV before isNoWrap. It needs to be called
after hasComputableBounds, as this may create an AddRec via PSE, which
replaceSymbolicStrideSCEV will look up.
This is in preparation for simplifying isNoWrap.
Reorganize the code in isNoWrap to perform the no-wrap checks without
relying on getPtrStride directly. getPtrStride now uses isNoWrap.
The new structure allows deriving no-wrap in more cases in LAA, because
there are some cases where getPtrStride bails out early because it
cannot return a constant stride, but we can still prove no-wrap for the
pointer.
An example are AddRecs with non-ConstantInt strides with inbound GEPs,
in the improved test cases.
This enables vectorization with runtime checks in a few more cases.
PR: https://github.com/llvm/llvm-project/pull/126971
Uses the fixed register width if scalable vectorization is not enabled
(via TargetTransformInfo::enableScalableVectorization) and improves
results if there are scalable vector registers, but they shouldn't be
used.
getPtrStride returns 0 when the PtrScev is loop-invariant, and this is
not an erroneous value: it returns std::nullopt to communicate that it
was not able to find a valid pointer stride. In analyzeLoop, we call
getPtrStride with a value_or(0) conflating the zero return value with
std::nullopt. Fix this, handling loop-invariant loads correctly.
The last attempt failed a sanitiser build because we were
creating a reference to a null Predicates pointer in
isDereferenceableAndAlignedInLoop. This was exposed by
the unit test IsDerefReadOnlyLoop in
unittests/Analysis/LoadsTest.cpp. I fixed this by falling
back on getConstantMaxBackedgeTakenCount if Predicates is
null - see line 316 in llvm/lib/Analysis/Loads.cpp. There
are no other changes.
Currently when we encounter a negative step in the induction
variable isDereferenceableAndAlignedInLoop bails out because
the element size is signed greater than the step. This patch
adds support for negative steps in cases where we detect the
start address for the load is of the form base + offset. In
this case the address decrements in each iteration so we need
to calculate the access size differently. I have done this by
caling getStartAndEndForAccess from LoopAccessAnalysis.cpp.
The motivation for this patch comes from PR #88385 where a
reviewer requested reusing isDereferenceableAndAlignedInLoop,
but that PR itself does support reverse loops.
The changed test in LoopVectorize/X86/load-deref-pred.ll now
passes because previously we were calculating the total access
size incorrectly, whereas now it is 412 bytes and fits
perfectly into the alloca.
When inverting source and sink on a negative induction step, the types
of the source and sink should also be swapped. This fixes a bug in the
code that follows, that computes properties based on these types. With
234cc40 ([LAA] Limit no-overlap check to at least one loop-invariant
accesses.), that code is guarded by a loop-invariant condition: however,
the commit did not add any new tests exercising the guarded code, and
hence the bugfix in this patch requires additional tests to exercise
that guarded codepath.
Rearrange the DepDistanceAndSizeInfo struct in preparation to scale
strides. getDependenceDistanceStrideAndSize now returns the data of
CommonStride, MaxStride, and clarifies when to retry with runtime
checks, in place of (unscaled) strides.
I believe that this code doesn't care whether the offsets are known to
be inbounds a priori. For the same reason the change is not testable, as
the SCEV based fallback code will look through non-inbounds offsets
anyway. So make it clear that there is no special inbounds requirement
here.
If we have a pointer AddRec, the maximum increment is
2^(pointer-index-wdith - 1) - 1. This means that if incrementing the
AddRec wraps, the distance between the previously accessed location and
the wrapped location is > 2^(pointer-index-wdith - 1), i.e. if the GEP
for the AddRec is inbounds, this would be poison due to the object being
larger than half the pointer index type space. The poison would be
immediate UB when the memory access gets executed..
Similar reasoning can be applied for decrements.
PR: https://github.com/llvm/llvm-project/pull/113126
With the introduction of the nusw flag in GEPNoWrapFlags, it should be
safe to weaken the check in LoopAccessAnalysis to just check the nusw
flag on the GEP, instead of inbounds.
isNoWrap has exactly one caller which handles Assume = true separately,
but too conservatively. Instead, pass Assume to isNoWrap, so it is
threaded into getPtrStride, which has the correct handling for the
Assume flag. Also note that the Stride == 1 check in isNoWrap is
incorrect: getPtrStride returns Strides == 1 or -1, except when
isNoWrapAddRec or Assume are true, assuming ShouldCheckWrap is true; we
can include the case of -1 Stride, and when isNoWrapAddRec is true. With
this change, passing Assume = true to getPtrStride could return a
non-unit stride, and we correctly handle that case as well.
LoopAccessAnalysis currently does not check/track aliasing from the
output pointers, but assumes vectorizing library calls with a mapping is
safe.
This can result in incorrect codegen if something like the following is
vectorized:
```
for(int i=0; i<N; i++) {
// No aliasing between input and output pointers detected.
sincos(cos_out[0], sin_out+i, cos_out+i);
}
```
Where for VF >= 2 `cos_out[1]` to `cos_out[VF-1]` is the cosine of the
original value of `cos_out[0]` not the updated value.