I tried to split this into small individual commits to ease review, but
uploading them as separate PRs would definitely be overkill.
---------
Co-authored-by: Ramkumar Ramachandra <r@artagnon.com>
Co-authored-by: Florian Hahn <flo@fhahn.com>
Add an overload of isDereferenceableAndAlignedInLoop that directly takes
the pointer and element sizes as SCEVs. This allows using it from
contexts without relying on an underlying load instruction in follow-up
patches.
Update groupChecks to always use DepCands to try and merge runtime
checks. DepCands contains the dependency partition, grouping together
all accessed pointers to the same underlying object.
If we computed the dependencies, we only need to check accesses to the
same underlying object if there is an unknown dependency for this
underlying object; otherwise we already proved that all accesses within
the underlying object are safe w.r.t. vectorization and we only need to
check that accesses to the underlying object don't overlap with accesses
to other underlying objects.
To ensure runtime checks are generated for the case with unknown
dependencies, remove equivalence classes containing accesses involved in
unknown dependencies.
This reduces the number of runtime checks needed in case non-constant
dependence distances are found, and is in preparation for removing the
restriction that the accesses need to have the same stride which was
added in https://github.com/llvm/llvm-project/pull/88039.
PR: https://github.com/llvm/llvm-project/pull/91196
When a memory-reading or memory-writing instruction is not a
LoadInst/StoreInst, the dyn_cast to Ld/St returns nullptr, which is then
passed to recordAnalysis. This causes the optimization remark to fall
back to the loop header location instead of pointing at the actual
problematic instruction.
Pass &I (the actual Instruction) instead.
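A minimal standalone sketch of the failure mode, using plain C++
`dynamic_cast` in place of LLVM's `dyn_cast`. The `Instruction`,
`LoadInst`, `CallInst`, `remarkLocation`, `oldRemark` and `newRemark`
definitions are hypothetical stand-ins for illustration, not the LLVM
classes or the LAA code:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-ins for the LLVM classes; only the shape matters.
struct Instruction {
  std::string Loc;
  explicit Instruction(std::string L) : Loc(std::move(L)) {}
  virtual ~Instruction() = default;
};
struct LoadInst : Instruction {
  using Instruction::Instruction;
};
struct CallInst : Instruction {
  using Instruction::Instruction;
};

// Mimics the fallback: with a null instruction, the remark can only point
// at the loop header.
std::string remarkLocation(const Instruction *I, const std::string &HeaderLoc) {
  return I ? I->Loc : HeaderLoc;
}

// Old behaviour: pass the result of the cast, which is null for a call.
std::string oldRemark(const Instruction &I, const std::string &HeaderLoc) {
  return remarkLocation(dynamic_cast<const LoadInst *>(&I), HeaderLoc);
}

// New behaviour: pass the instruction itself.
std::string newRemark(const Instruction &I, const std::string &HeaderLoc) {
  return remarkLocation(&I, HeaderLoc);
}
```

With a memory-accessing call at a made-up location, `oldRemark` reports
the header location while `newRemark` reports the call's own location.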
The function returns nullptr when the multiplication WOULD overflow,
matching the semantics of its sibling addSCEVNoOverflow. The old name
reads as if the function multiplies with overflow, which is the opposite
of what it does.
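To illustrate the naming convention, here is a rough analogue on plain
unsigned integers rather than SCEVs. `mulNoOverflow` is a hypothetical
name for this sketch, and `std::nullopt` plays the role of the nullptr
result:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical analogue on plain integers: yields the product only when it
// is representable, and std::nullopt when the multiplication would overflow.
std::optional<uint64_t> mulNoOverflow(uint64_t A, uint64_t B) {
  if (A != 0 && B > std::numeric_limits<uint64_t>::max() / A)
    return std::nullopt;
  return A * B;
}
```

A caller reads the name as "multiply, provided there is no overflow" and
treats the empty result as "could not prove it safe".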
This patch fixes an issue found during LoopAccessAnalysis with respect
to recognizing strided pointers that make use of runtime constants. Loop
accesses of the form `p[base + offset * const]`, where `const` is a
runtime constant, should be considered for vectorization. However, there
were cases where these access patterns weren't recognized. This patch
resolves this by adding an explicit pattern match within LAA.
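For illustration, a source-level loop with this kind of access might look
like the sketch below. The function name `sumStrided` and its parameters
are made up for this example; the induction variable `i` plays the role
of `offset` and `stride` the role of `const`:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// `stride` is only known at run time but does not change inside the loop,
// so the index `base + i * stride` is a strided access in `i`.
long sumStrided(const std::vector<long> &p, std::size_t base,
                std::size_t stride, std::size_t n) {
  long sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += p[base + i * stride];
  return sum;
}
```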
---------
Co-authored-by: Florian Hahn <flo@fhahn.com>
The checks created by LAA only compute a pointer difference and do not
need to capture provenance. Use SCEVPtrToAddr instead of SCEVPtrToInt
for computations.
To avoid regressions while parts of SCEV are migrated to use PtrToAddr,
this adds logic to rewrite all PtrToInt to PtrToAddr if possible in the
created expressions.
Similarly, if in the original IR we have a PtrToInt, SCEVExpander tries
to re-use it if possible when expanding PtrToAddr.
Depends on https://github.com/llvm/llvm-project/pull/178727.
Fixes https://github.com/llvm/llvm-project/issues/156978.
PR: https://github.com/llvm/llvm-project/pull/178861
Update isNoWrap to only use the inbounds/nusw flags from GEPs that are
guaranteed to be dereferenced on every iteration. This fixes a case
where we incorrectly determine no dependence.
I think the issue is isolated to code that evaluates the resulting
AddRec at BTC; just using it to compute the distance between accesses
should still be fine: if the access does not execute in a given
iteration, there's no dependence in that iteration. But isolating the
code is not straightforward, so be conservative for now. The practical
impact should be very minor (only one loop changed across a corpus with
27k modules from large C/C++ workloads).
Fixes https://github.com/llvm/llvm-project/issues/160912.
PR: https://github.com/llvm/llvm-project/pull/161445
The map returned by collectStridedAccess is used to replace strides with
their versioned values. This does not work for Undef/Poison, which don't
have use-lists. Don't try to version them, as versioning won't be useful in
practice.
Fixes https://github.com/llvm/llvm-project/issues/162922.
When using information from dereferenceable assumptions, we need to make
sure that the memory is not freed between the assume and the specified
context instruction. Instead of just checking canBeFreed, check if there
are any calls that may free between the assume and the context instruction.
This patch introduces a willNotFreeBetween to check for calls that may
free between an assume and a context instruction, to also be used in
https://github.com/llvm/llvm-project/pull/161255.
PR: https://github.com/llvm/llvm-project/pull/161725
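The idea of the check can be sketched on a toy instruction list. This is
not the LLVM implementation: `Inst`, `noFreeBetween` and the index-based
interface are assumptions of this sketch, which also assumes both
positions sit in one straight-line block:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction model: each entry only records whether it is a
// call that may free memory.
struct Inst {
  bool MayFree;
};

// Returns true if no instruction strictly between positions AssumeIdx and
// CtxIdx may free memory. Assumes AssumeIdx <= CtxIdx and both are valid
// indices into the same straight-line block.
bool noFreeBetween(const std::vector<Inst> &Block, std::size_t AssumeIdx,
                   std::size_t CtxIdx) {
  for (std::size_t I = AssumeIdx + 1; I < CtxIdx; ++I)
    if (Block[I].MayFree)
      return false;
  return true;
}
```

Only when such a scan succeeds can the dereferenceable assumption still
be relied on at the context instruction.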
The arguments aren't modified, mark them as const. This prepares for new
users in a follow-up, which only have access to const versions of the
arguments.
This reverts commits 56a1cbb ([LAA] Fix non-NFC parts of 1aded51),
1aded51 ([LAA] Prepare to handle diff type sizes (NFC)). The original
NFC patch caused some regressions, which the later patch tried to fix.
However, the later patch is the cause of some crashes, and it would be
best to revert both for now, and re-land after thorough testing.
1aded51 ([LAA] Prepare to handle diff type sizes (NFC)) was supposed to
be a non-functional patch, but introduced functional changes, as being
both known-non-negative and known-non-positive is not equivalent to
!known-non-zero. Fix this.
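A small sketch of why the two conditions differ, using a hypothetical
`KnownFacts` record of conservative facts about a value (the struct and
both helper names are made up for this example):

```cpp
#include <cassert>

// Hypothetical record of what an analysis managed to prove about a value.
// A false field means "not proven", not "proven false".
struct KnownFacts {
  bool NonNegative;
  bool NonPositive;
  bool NonZero;
};

// Both bounds proven: the value must be exactly zero.
bool isKnownZero(const KnownFacts &K) { return K.NonNegative && K.NonPositive; }

// Merely failing to prove non-zero: the value may or may not be zero.
bool notKnownNonZero(const KnownFacts &K) { return !K.NonZero; }
```

For a value about which nothing is known, `isKnownZero` is false while
`notKnownNonZero` is true, so swapping one for the other changes
behaviour.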
As depend_diff_types shows, there are several places where the
HasSameSize check can be relaxed for higher analysis precision. As a
first step, return both the source size and the sink size from
getDependenceDistanceStrideAndSize, along with a HasSameSize boolean for
the moment.
Update evaluatePtrAddrecAtMaxBTCWillNotWrap to support non-constant
sizes in dereferenceable assumptions.
Apply loop-guards in a few places needed to reason about expressions
involving trip counts of the form (BTC - 1).
PR: https://github.com/llvm/llvm-project/pull/156758
This reverts commit f0df1e3dd4ec064821f673ced7d83e5a2cf6afa1.
Recommit with extra check for SCEVCouldNotCompute. Test has been added in
b16930204b.
Original message:
Remove the fall-back to constant max BTC if the backedge-taken-count
cannot be computed.
The constant max backedge-taken count is computed considering loop
guards, so to avoid regressions we need to apply loop guards as needed.
Also remove the special handling for Mul in willNotOverflow, as this
should no longer be needed after 914374624f
(https://github.com/llvm/llvm-project/pull/155300).
PR: https://github.com/llvm/llvm-project/pull/155672
Remove the fall-back to constant max BTC if the backedge-taken-count
cannot be computed.
The constant max backedge-taken count is computed considering loop
guards, so to avoid regressions we need to apply loop guards as needed.
Also remove the special handling for Mul in willNotOverflow, as this
should no longer be needed after 914374624f
(https://github.com/llvm/llvm-project/pull/155300).
PR: https://github.com/llvm/llvm-project/pull/155672
Add support for identifying multiplication overflow in SCEV.
This is needed in LoopAccessAnalysis and that limitation was worked
around by 484417a.
This allows early-exit vectorization to work as expected in the
vect.stats.ll test without needing the workaround.
Emit safety guards for ptr accesses when cross partition loads exist
which have a corresponding store to the same address in a different
partition. This will emit the necessary ptr checks for these accesses.
The test case was obtained from SuperTest, which SiFive runs regularly.
We enabled LoopDistribution by default in our downstream compiler; this
change was part of that enablement.
This reverts commit d43a80936d437d217d5a6dbbaa5fb131c27e7085.
With the correctness issue blocking the recommit finally fixed
(5d01697ec6cb), again unconditionally check if accesses are completely
before or after each other.
Factor out code to check if accesses are completely before/after each
other. This reduces the diff for an upcoming re-commit and moving to a
function also helps to reduce the nesting level via early exits.
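As a rough illustration of what "completely before or after each other"
means, here is a sketch over concrete byte ranges. The real check works
on symbolic SCEV bounds; `AccessRange`, `Overlap` and `classify` are
hypothetical names, and the early returns mirror the reduced nesting
mentioned above:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical half-open byte range [Start, End) touched by one access over
// the whole loop.
struct AccessRange {
  uint64_t Start;
  uint64_t End;
};

enum class Overlap { NoOverlap, MayOverlap };

Overlap classify(const AccessRange &A, const AccessRange &B) {
  if (A.End <= B.Start) // A lies completely before B
    return Overlap::NoOverlap;
  if (B.End <= A.Start) // A lies completely after B
    return Overlap::NoOverlap;
  return Overlap::MayOverlap;
}
```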
Strip ShouldRetryWithRuntimeCheck from the
DepDistanceStrideAndSizeInfo struct, and free isDependent from the
responsibility of setting the condition for when runtime-checks are
needed, transferring this responsibility to
getDependenceDistanceStrideAndSize.
We can have multiple DepType::Unknown dependences that, by themselves,
do not trigger the retrying with runtime memory checks, and therefore
block vectorization. But once a single
FoundNonConstantDistanceDependence is found, the analysis seems to
switch to the "LAA: Retrying with memory checks" path and allows all
these dependences to be handled via runtime checks. There is hence no
rationale for predicating FoundNonConstantDistanceDependence on
DepType::Unknown, and removing this predication is one of the
side-effects of this patch.
Evaluating AR at the symbolic max BTC may wrap and create an expression
that is less than the start of the AddRec due to wrapping (for example
consider MaxBTC = -2).
If that's the case, set ScEnd to -(EltSize + 1). ScEnd will get
incremented by EltSize before returning, so this effectively sets ScEnd
to unsigned max. Note that LAA separately checks that accesses cannot
wrap (52ded672492,
https://github.com/llvm/llvm-project/pull/127543), so unsigned max
represents an upper bound.
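The arithmetic can be checked with a small sketch, assuming 64-bit
unsigned wrap-around (the width and the helper name `scEndAfterIncrement`
are assumptions of this example, not taken from the patch):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Models the wrapped case in 64-bit unsigned arithmetic: start from
// -(EltSize + 1), then add EltSize as the caller would before returning.
uint64_t scEndAfterIncrement(uint64_t EltSize) {
  uint64_t ScEnd = 0 - (EltSize + 1); // wraps modulo 2^64
  return ScEnd + EltSize;             // equals 2^64 - 1 for any EltSize
}
```

Since -(EltSize + 1) + EltSize = -1, the result is the unsigned maximum
regardless of the element size.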
When there is a computable backedge-taken count, we are guaranteed to
execute that number of iterations, and if any pointer would wrap it would
be UB (or the access will never be executed, so cannot alias). This patch
includes new tests from the previous discussion that show a case where we
wrap with a BTC, but it is UB due to the pointer after the object wrapping
(in `evaluate-at-backedge-taken-count-wrapping.ll`).
When we have only a maximum backedge taken count, we instead try to use
dereferenceability information to determine if the pointer access must be in
bounds for the maximum backedge taken count.
PR: https://github.com/llvm/llvm-project/pull/128061
Currently if there's any memory access that AccessAnalysis couldn't
analyze then all of the runtime pointer check results are discarded.
This patch makes this controllable via the AllowPartial option: when it
is set, we generate the runtime check information for those pointers
that we could analyze, as transformations may still be able to make use
of the partial information.
Of the transformations that use LoopAccessAnalysis, only
LoopVersioningLICM changes behaviour as a result of this change. This is
because the others either:
* Check canVectorizeMemory, which will return false when we have partial
pointer information as analyzeLoop() will return false.
* Examine the dependencies returned by getDepChecker(), which will be
empty as we exit analyzeLoop if we have partial pointer information
before calling areDepsSafe(), which is what fills in the dependency
information.
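The effect of the AllowPartial option described above can be sketched
with a toy model. `CheckInfo`, `collectCheckInfo` and the use of integer
ids for pointers are assumptions of this sketch, not the LAA data
structures:

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model: each pointer either has analyzable bounds (an id
// here) or could not be analyzed (std::nullopt).
struct CheckInfo {
  bool AllAnalyzed;          // false if any pointer could not be analyzed
  std::vector<int> Pointers; // pointers for which check info is kept
};

CheckInfo collectCheckInfo(const std::vector<std::optional<int>> &Ptrs,
                           bool AllowPartial) {
  CheckInfo Info{true, {}};
  for (const std::optional<int> &P : Ptrs) {
    if (P)
      Info.Pointers.push_back(*P);
    else
      Info.AllAnalyzed = false;
  }
  // Without AllowPartial, one unanalyzable pointer discards everything.
  if (!Info.AllAnalyzed && !AllowPartial)
    Info.Pointers.clear();
  return Info;
}
```

In both modes the overall result still reports that analysis was
incomplete, which is why clients that require a complete answer see no
change.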
CodeRegions were previously passed as Value*, but then immediately
cast to BasicBlock. Let's keep the type information around until the
use cases for non-BasicBlock code regions actually materialize.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
UpdateTestChecks has a make_analyzer_generalizer to replace pointer
addresses from the debug output of LAA with a pattern, which is an
acceptable solution when there is one RUN line. However, when there are
multiple RUN lines with a common pattern, UTC fails to recognize common
output due to mismatched pointer addresses. Instead of hacking UTC to
scrub the output before comparing the outputs from the different RUN
lines, fix the issue once and for all by making LAA not output unstable
pointer addresses in the first place.
The removal of the now-dead make_analyzer_generalizer is left as a
non-trivial exercise for a follow-up.
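To show why raw addresses defeat output comparison, here is a small
sketch contrasting two printers. Both functions and the `ptr #N` format
are invented for this example; the commit does not specify what LAA
prints instead:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical printer that leaks raw addresses: output differs between
// runs and between objects.
std::string printWithAddresses(const std::vector<const int *> &Ptrs) {
  std::ostringstream OS;
  for (const int *P : Ptrs)
    OS << "ptr " << static_cast<const void *>(P) << "\n";
  return OS.str();
}

// Hypothetical printer that uses a stable position instead: output depends
// only on the order of the pointers.
std::string printWithIndices(const std::vector<const int *> &Ptrs) {
  std::ostringstream OS;
  for (std::size_t I = 0; I < Ptrs.size(); ++I)
    OS << "ptr #" << I << "\n";
  return OS.str();
}
```

Two structurally identical inputs backed by different storage produce
identical text only with the second printer, which is the property a
tool comparing output across RUN lines needs.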