507 Commits

Author SHA1 Message Date
Florian Hahn
af9a4263a1
[LAA] Only use inbounds/nusw in isNoWrap if the GEP is dereferenced. (#161445)
Update isNoWrap to only use the inbounds/nusw flags from GEPs that are
guaranteed to be dereferenced on every iteration. This fixes a case
where we incorrectly determine no dependence.

I think the issue is isolated to code that evaluates the resulting
AddRec at BTC; just using it to compute the distance between accesses
should still be fine: if the access does not execute in a given
iteration, there's no dependence in that iteration. But isolating the
code is not straightforward, so be conservative for now. The practical
impact should be very minor (only one loop changed across a corpus with
27k modules from large C/C++ workloads).

Fixes https://github.com/llvm/llvm-project/issues/160912.

PR: https://github.com/llvm/llvm-project/pull/161445
2025-11-04 17:08:12 +00:00
Florian Hahn
0e28c9bc9d
[LAA] Skip undef/poison strides in collectStridedAccess.
The map returned by collectStridedAccess is used to replace strides with
their versioned values. This does not work for Undef/Poison, which don't
have use-lists. Don't try to version them, as versioning won't be useful in
practice.

Fixes https://github.com/llvm/llvm-project/issues/162922.
2025-10-27 05:01:17 +00:00
Florian Hahn
7ceef762c8
[LAA] Check if Ptr can be freed between Assume and CtxI. (#161725)
When using information from dereferenceable assumptions, we need to make
sure that the memory is not freed between the assume and the specified
context instruction. Instead of just checking canBeFreed, check if there
are any calls that may free between the assume and the context instruction.

This patch introduces a willNotFreeBetween to check for calls that may
free between an assume and a context instruction, to also be used in
https://github.com/llvm/llvm-project/pull/161255.

PR: https://github.com/llvm/llvm-project/pull/161725
2025-10-03 13:44:58 +00:00
Florian Hahn
9d42c75256
[LAA] Fix picking context instr in evaluatePtrAddRec for multiple preds.
A loop may have more than one predecessor out of the loop. In that case,
just pick the first non-phi instruction in the loop header.
2025-09-30 20:04:29 +01:00
Florian Hahn
0898348abd
[LAA] Make blockNeedsPredication arguments const (NFC).
The arguments aren't modified, mark them as const. This prepares for new
users in a follow-up, which only have access to const versions of the
arguments.
2025-09-30 17:05:04 +01:00
Ramkumar Ramachandra
08c1e9e80a
[LAA] Revert 56a1cbb and 1aded51, due to crash (#160993)
This reverts commits 56a1cbb ([LAA] Fix non-NFC parts of 1aded51),
1aded51 ([LAA] Prepare to handle diff type sizes (NFC)). The original
NFC patch caused some regressions, which the later patch tried to fix.
However, the later patch is the cause of some crashes, and it would be
best to revert both for now, and re-land after thorough testing.
2025-09-27 10:42:20 +01:00
Ramkumar Ramachandra
56a1cbbd1c
[LAA] Fix non-NFC parts of 1aded51 (#160701)
1aded51 ([LAA] Prepare to handle diff type sizes (NFC)) was supposed to
be a non-functional patch, but introduced functional changes as
known-non-negative and known-non-positive is not equivalent to
!known-non-zero. Fix this.
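
The distinction can be sketched with three boolean facts about a value, reading the message as referring to the conjunction of the first two. The struct and function names below are hypothetical and are not the LAA or SCEV API; the point is only that "proven zero" and "not proven non-zero" differ for a value about which nothing is known.

```cpp
#include <cassert>

// Hypothetical facts an analysis may or may not have proven about a value.
struct SignFacts {
  bool KnownNonNegative;
  bool KnownNonPositive;
  bool KnownNonZero;
};

// Both bounds proven: the value must be exactly zero.
constexpr bool provenZero(SignFacts F) {
  return F.KnownNonNegative && F.KnownNonPositive;
}

// Merely the absence of a non-zero proof; the value may still be non-zero.
constexpr bool notProvenNonZero(SignFacts F) { return !F.KnownNonZero; }

// For a value with no proven facts, the two predicates disagree.
inline void checkUnknownValue() {
  constexpr SignFacts Unknown{false, false, false};
  assert(!provenZero(Unknown));
  assert(notProvenNonZero(Unknown));
}
```

Swapping one predicate for the other therefore changes behaviour whenever the analysis has proven nothing about the value.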
2025-09-25 15:52:02 +01:00
Ramkumar Ramachandra
1aded51d74
[LAA] Prepare to handle diff type sizes (NFC) (#122318)
As depend_diff_types shows, there are several places where the
HasSameSize check can be relaxed for higher analysis precision. As a
first step, return both the source size and the sink size from
getDependenceDistanceStrideAndSize, along with a HasSameSize boolean for
the moment.
2025-09-18 09:30:20 +01:00
Ramkumar Ramachandra
b7e31e7462
[LAA] Strip findForkedPointer (NFC) (#140298)
Remove a level of indirection due to findForkedPointer, in an effort to
improve code.
2025-09-10 14:06:15 +01:00
Florian Hahn
b400fd1151
[LAA] Support assumptions with non-constant deref sizes. (#156758)
Update evaluatePtrAddrecAtMaxBTCWillNotWrap to support non-constant
sizes in dereferenceable assumptions.

Apply loop-guards in a few places needed to reason about expressions
involving trip counts of the form (BTC - 1).

PR: https://github.com/llvm/llvm-project/pull/156758
2025-09-04 11:32:33 +01:00
Florian Hahn
a434a7a4f1
Reapply "[LAA,Loads] Use loop guards and max BTC if needed when checking deref. (#155672)"
This reverts commit f0df1e3dd4ec064821f673ced7d83e5a2cf6afa1.

Recommit with extra check for SCEVCouldNotCompute. Test has been added in
b16930204b.

Original message:
Remove the fall-back to constant max BTC if the backedge-taken-count
cannot be computed.

The constant max backedge-taken count is computed considering loop
guards, so to avoid regressions we need to apply loop guards as needed.

Also remove the special handling for Mul in willNotOverflow, as this
should no longer be needed after 914374624f
(https://github.com/llvm/llvm-project/pull/155300).

PR: https://github.com/llvm/llvm-project/pull/155672
2025-09-03 12:45:28 +01:00
Florian Hahn
f0df1e3dd4
Revert "[LAA,Loads] Use loop guards and max BTC if needed when checking deref. (#155672)"
This reverts commit 08001cf340185877665ee381513bf22a0fca3533.

This triggers an assertion in some build configs, e.g.
 https://lab.llvm.org/buildbot/#/builders/24/builds/12211
2025-09-02 21:44:30 +01:00
Florian Hahn
08001cf340
[LAA,Loads] Use loop guards and max BTC if needed when checking deref. (#155672)
Remove the fall-back to constant max BTC if the backedge-taken-count
cannot be computed.

The constant max backedge-taken count is computed considering loop
guards, so to avoid regressions we need to apply loop guards as needed.

Also remove the special handling for Mul in willNotOverflow, as this
should no longer be needed after 914374624f
(https://github.com/llvm/llvm-project/pull/155300).

PR: https://github.com/llvm/llvm-project/pull/155672
2025-09-02 18:58:33 +01:00
annamthomas
00926a6db6
[SCEV][LAA] Support multiplication overflow computation (#155236)
Add support for identifying multiplication overflow in SCEV.
This is needed in LoopAccessAnalysis and that limitation was worked
around by 484417a.
This allows early-exit vectorization to work as expected in
vect.stats.ll test without needing the workaround.
2025-08-27 12:11:32 +00:00
Benjamin Maxwell
bb3066d42b
[LAA] Move scalable vector check into getStrideFromAddRec() (#154013)
This moves the check closer to the `.getFixedValue()` call and fixes
#153797 (which is a regression from #126971).
2025-08-19 06:40:07 +01:00
Michael Berg
334a046a3c
[LoopDist] Consider reads and writes together for runtime checks (#145623)
Emit safety guards for ptr accesses when cross partition loads exist
which have a corresponding store to the same address in a different
partition. This will emit the necessary ptr checks for these accesses.

The test case was obtained from SuperTest, which SiFive runs regularly.
We enabled LoopDistribution by default in our downstream compiler; this
change was part of that enablement.
2025-08-14 12:50:17 -07:00
Florian Hahn
2ae996cbbe
[LAA] Support assumptions in evaluatePtrAddRecAtMaxBTCWillNotWrap (#147047)
This patch extends the logic added in
https://github.com/llvm/llvm-project/pull/128061 to support
dereferenceability information from assumptions as well.

Unfortunately both assumption cache and the dominator tree need to be
threaded through multiple layers to make them available where needed.

PR: https://github.com/llvm/llvm-project/pull/147047
2025-08-01 14:18:07 +01:00
Ramkumar Ramachandra
b692b239f0
[LAA] Rename var used to retry with RT-checks (NFC) (#147307)
FoundNonConstantDistanceDependence is a misleading name for a variable
that determines whether we retry with runtime checks. Rename it.
2025-07-22 13:36:33 +01:00
Ramkumar Ramachandra
584158f9ae
[LAA] Hoist check for SCEV-uncomputable dist (NFC) (#148841)
Hoist the check for SCEVCouldNotCompute distance into
getDependenceDistanceAndSize.
2025-07-16 15:30:53 +01:00
Florian Hahn
5a4586f468
Reapply "[LAA] Remove loop-invariant check added in 234cc40adc61."
This reverts commit d43a80936d437d217d5a6dbbaa5fb131c27e7085.

With the correctness issue blocking the recommit finally fixed
(5d01697ec6cb), again unconditionally check if accesses are completely
before or after each other.
2025-07-14 21:21:22 +01:00
Florian Hahn
9693056aac
[LAA] Move code to check if accesses are completely before/after (NFC).
Factor out code to check if accesses are completely before/after each
other. This reduces the diff for an upcoming re-commit, and moving to a
function also helps to reduce the nesting level via early exits.
2025-07-11 19:53:57 +01:00
Ramkumar Ramachandra
20864c4379
[LAA] Strip outdated comment in isDependent (NFC) (#146367)
The comment has been outdated since 87ddd3a1 ([LAA] Rename and fix
semantics of MaxSafeDepDistBytes to MinDepDistBytes).
2025-07-07 13:54:37 +01:00
Ramkumar Ramachandra
fb845f93c0
[LAA] Hoist setting condition for RT-checks (#128045)
Strip ShouldRetyWithRuntimeCheck from the
DepedenceDistanceStrideAndSizeInfo struct, and free isDependent from the
responsibility of setting the condition for when runtime-checks are
needed, transferring this responsibility to
getDependenceDistanceStrideAndSize.

We can have multiple DepType::Unknown dependences that, by themselves,
do not trigger the retrying with runtime memory checks, and therefore
block vectorization. But once a single
FoundNonConstantDistanceDependence is found, the analysis seems to
switch to the "LAA: Retrying with memory checks" path and allows all
these dependences to be handled via runtime checks. There is hence no
rationale for predicating FoundNonConstantDistanceDependence on
DepType::Unknown, and removing this predication is one of the
side-effects of this patch.
2025-07-07 12:02:41 +01:00
Ramkumar Ramachandra
619f7afd71
[LAA] Clean up APInt-overflow related code (#140048)
Co-authored-by: Florian Hahn <flo@fhahn.com>
2025-06-30 14:48:56 +01:00
Florian Hahn
b8769104f1
[LAA] Address follow-up suggestions for #128061.
Adjust naming and add argument comments as suggested.
2025-06-24 12:00:17 +01:00
Florian Hahn
5d01697ec6
[LAA] Be more careful when evaluating AddRecs at symbolic max BTC. (#128061)
Evaluating AR at the symbolic max BTC may wrap and create an expression
that is less than the start of the AddRec due to wrapping (for example
consider MaxBTC = -2).

If that's the case, set ScEnd to -(EltSize + 1). ScEnd will get
incremented by EltSize before returning, so this effectively sets ScEnd
to unsigned max. Note that LAA separately checks that accesses cannot
wrap (52ded672492,
https://github.com/llvm/llvm-project/pull/127543), so unsigned max
represents an upper bound.

When there is a computable backedge-taken count, we are guaranteed to
execute that number of iterations, and if any pointer would wrap it would
be UB (or the access will never be executed, so cannot alias). This patch
includes new tests from the previous discussion that show a case where we
wrap with a BTC, but it is UB due to the pointer after the object wrapping
(in `evaluate-at-backedge-taken-count-wrapping.ll`).

When we have only a maximum backedge taken count, we instead try to use
dereferenceability information to determine if the pointer access must be in
bounds for the maximum backedge taken count.

PR: https://github.com/llvm/llvm-project/pull/128061
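
The arithmetic described above can be checked with plain 64-bit unsigned integers. This is only a sketch of the wrapping behaviour, not the LAA code: the function names are hypothetical, and the values (Start = 1000, Step = 4, EltSize = 4) are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Evaluate Start + Step * Iter in 64-bit modular arithmetic, the way an
// AddRec {Start,+,Step} would be evaluated at iteration Iter.
constexpr uint64_t evalAddRec(uint64_t Start, uint64_t Step, uint64_t Iter) {
  return Start + Step * Iter;
}

// Hypothetical helper: the sentinel -(EltSize + 1) from the message.
constexpr uint64_t sentinelEnd(uint64_t EltSize) {
  return uint64_t(0) - (EltSize + 1);
}

// After the usual "ScEnd += EltSize" step, the sentinel becomes unsigned max.
inline uint64_t upperBoundAfterWrap(uint64_t EltSize) {
  uint64_t ScEnd = sentinelEnd(EltSize);
  ScEnd += EltSize;
  assert(ScEnd == UINT64_MAX && "sentinel plus EltSize must be unsigned max");
  return ScEnd;
}
```

With MaxBTC = -2 (that is, `UINT64_MAX - 1`), `evalAddRec(1000, 4, UINT64_MAX - 1)` wraps around to 992, which is below the start of the AddRec; this is the situation the sentinel is meant to cover.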
2025-06-23 20:23:40 +01:00
Ramkumar Ramachandra
c8c4bd1ebc
[LV] Strengthen loop-invariance checks in isPredicatedInst (#140744)
Check loop-invariance against SCEV as well.
2025-06-20 14:01:48 +01:00
Kazu Hirata
03f616eb3a
[llvm] Compare std::optional<T> to values directly (NFC) (#143340)
This patch transforms:

  X && *X == Y

to:

  X == Y

where X is of std::optional<T>, and Y is of T or similar.
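
A minimal sketch of why the two forms agree, using `std::optional<int>`; the function names are hypothetical and exist only for this illustration. The comparison operator of `std::optional` treats an empty optional as unequal to any plain value, so the explicit emptiness check is redundant.

```cpp
#include <cassert>
#include <optional>

// Old idiom: test for a value, then dereference and compare.
inline bool equalsVerbose(const std::optional<int> &X, int Y) {
  return X && *X == Y;
}

// New idiom: operator== handles the empty case itself; an empty optional
// never compares equal to a plain value.
inline bool equalsDirect(const std::optional<int> &X, int Y) {
  return X == Y;
}

// Both idioms give the same answer for any input.
inline void checkSameAnswer(const std::optional<int> &X, int Y) {
  assert(equalsVerbose(X, Y) == equalsDirect(X, Y));
}
```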
2025-06-08 22:37:59 -07:00
John Brawn
81d3189891
[LAA] Keep pointer checks on partial analysis (#139719)
Currently if there's any memory access that AccessAnalysis couldn't
analyze then all of the runtime pointer check results are discarded.
This patch makes this able to be controlled with the AllowPartial
option, which makes it so we generate the runtime check information
for those pointers that we could analyze, as transformations may still
be able to make use of the partial information.

Of the transformations that use LoopAccessAnalysis, only
LoopVersioningLICM changes behaviour as a result of this change. This is
because the others either:
* Check canVectorizeMemory, which will return false when we have partial
pointer information as analyzeLoop() will return false.
* Examine the dependencies returned by getDepChecker(), which will be
empty as we exit analyzeLoop if we have partial pointer information
before calling areDepsSafe(), which is what fills in the dependency
information.
2025-06-04 16:47:20 +01:00
Ramkumar Ramachandra
ba57ff66a3
[LAA] Improve code in findForkedSCEVs (NFC) (#140384)
2025-06-03 11:00:37 +01:00
Jon Roelofs
798058fca5
[Remarks] Remove an upcast footgun. NFC (#142191)
CodeRegion's were previously passed as Value*, but then immediately
upcast to BasicBlock. Let's keep the type information around until the
use cases for non-BasicBlock code regions actually materialize.
2025-05-31 11:07:54 -07:00
Kazu Hirata
89308de4b0
[llvm] Value-initialize values with *Map::try_emplace (NFC) (#141522)
try_emplace value-initializes values, so we do not need to pass
nullptr to try_emplace when the value types are raw pointers or
std::unique_ptr<T>.
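
The same behaviour can be seen with `std::unordered_map`, used here as a stand-in for the LLVM map types the patch touches. `Node` and `getSlot` are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

struct Node {
  int Id = 0;
};

// Returns the slot for Key, creating it if needed. With no extra arguments,
// try_emplace value-initializes the mapped value, so a raw pointer starts
// out as nullptr without spelling it out.
inline Node *&getSlot(std::unordered_map<std::string, Node *> &Map,
                      const std::string &Key) {
  auto Result = Map.try_emplace(Key);
  assert((!Result.second || Result.first->second == nullptr) &&
         "a freshly inserted slot must be null");
  return Result.first->second;
}
```

A `std::unique_ptr<Node>` mapped type behaves the same way: `Map.try_emplace(Key)` leaves the new entry holding an empty pointer.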
2025-05-26 15:13:02 -07:00
Florian Hahn
c554fc9245
[LAA] Use m_scev_AffineAddRec in LAA (NFC).
2025-05-26 19:58:22 +01:00
Kazu Hirata
0918361d8b
[Analysis] Remove unused includes (NFC) (#141319)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-05-23 23:59:56 -07:00
Ramkumar Ramachandra
5a1311d516
[LAA] Strip isNoWrapGEP: dead code (NFC) (#140308)
isNoWrap is the only caller of isNoWrapGEP, and it has subsuming check
on the GEP immediately after.
2025-05-22 22:47:17 +01:00
Florian Hahn
4a6b1fb9da
[LAA] Remove dead SE arg from canCheckPtrAtRT (NFC).
2025-05-22 20:05:35 +01:00
Ramkumar Ramachandra
bb2791609d
[LAA] Tweak debug output for UTC stability (#140764)
UpdateTestChecks has a make_analyzer_generalizer to replace pointer
addresses from the debug output of LAA with a pattern, which is an
acceptable solution when there is one RUN line. However, when there are
multiple RUN lines with a common pattern, UTC fails to recognize common
output due to mismatched pointer addresses. Instead of hacking UTC to
scrub the output before comparing the outputs from the different RUN
lines, fix the issue once and for all by making LAA not output unstable
pointer addresses in the first place.

The removal of the now-dead make_analyzer_generalizer is left as a
non-trivial exercise for a follow-up.
2025-05-21 12:01:49 +01:00
Florian Hahn
35ee462fef
[LAA] Add assert check CanDoRTIfNeeded can be computed w/o RT.Need (NFC)
Add assert to ensure that CanDoRTIfNeeded can be computed w/o
RtCheck.Need, to prepare for adjusting the condition.
2025-05-18 22:12:28 +01:00
Ramkumar Ramachandra
c807395011
[LAA/SLP] Don't truncate APInt in getPointersDiff (#139941)
Change getPointersDiff to return an std::optional<int64_t>, and fill
this value using APInt::trySExtValue. This simple change requires
changes to other functions in LAA, and major changes in SLPVectorizer
changing types from 32-bit to 64-bit.

Fixes #139202.
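
The general pattern of reporting "does not fit" instead of silently truncating can be sketched at a smaller width. This is an analogy only: it narrows `int64_t` to `int32_t` with hypothetical helper names, whereas the patch converts an APInt to `int64_t`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Truncating conversion: silently drops the high bits of a large difference.
inline int32_t truncatingDiff(int64_t Diff) {
  return static_cast<int32_t>(Diff);
}

// Checked conversion: returns std::nullopt when the difference does not fit,
// so the caller has to handle that case explicitly.
inline std::optional<int32_t> checkedDiff(int64_t Diff) {
  if (Diff < std::numeric_limits<int32_t>::min() ||
      Diff > std::numeric_limits<int32_t>::max())
    return std::nullopt;
  return static_cast<int32_t>(Diff);
}

// Whenever the checked conversion succeeds, the value is preserved exactly.
inline void checkRoundTrip(int64_t Diff) {
  if (std::optional<int32_t> R = checkedDiff(Diff))
    assert(static_cast<int64_t>(*R) == Diff);
}
```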
2025-05-15 10:08:05 +01:00
Igor Kirillov
a3fb54c1ae
[LAA][NFC] Unify naming of DepCandidates to DepCands (#139534)
The MemoryDepChecker::DepCandidates instance in each LoopAccessInfo had multiple names (AccessSets, DepCands, DependentAccesses), which was confusing. This patch renames all references to DepCands for consistency.
2025-05-13 08:52:46 +01:00
Ramkumar Ramachandra
c1e678b134
[LAA] Improve code in replaceSymbolicStrideSCEV (NFC) (#139532)
Prefer DenseMap::lookup over DenseMap::find.
2025-05-12 14:18:26 +01:00
Ramkumar Ramachandra
68dccb9fa0
[LAA] Strip dead code in getStrideFromPointer (NFC) (#139140)
The SCEV multiply by 1 doesn't make sense, because SCEV would fold it:
therefore, the OrigPtr == Ptr branch effectively rejects a multiply.
However, in this branch, we have a pointer SCEV that cannot be a
multiply, and hence the code is dead. Strip it.
2025-05-09 09:20:50 +01:00
Ramkumar Ramachandra
458991197d
[SCEVPatternMatch] Extend with more matchers (#138836)
2025-05-09 09:20:14 +01:00
vaibhav
384a5b00a7
[LAA] Use MaxStride instead of CommonStride to calculate MaxVF (#98142)
We bail out from MaxVF calculation if the strides are not the same.
Instead, we are dependent on runtime checks, though not yet implemented.
We could instead use the MaxStride to conservatively use an upper bound.

This handles cases like the following:
```c
#define LEN 256 * 256
float a[LEN];

void gather() {
  for (int i = 0; i < LEN - 1024 - 255; i++) {
  #pragma clang loop interleave(disable)
  #pragma clang loop unroll(disable)
    for (int j = 0; j < 256; j++)
      a[i + j + 1024] += a[j * 4 + i];
  }
}
```

---------

Co-authored-by: Florian Hahn <flo@fhahn.com>
2025-05-07 21:02:21 +01:00
Kazu Hirata
2f3067ed69
[llvm] Remove unused local variables (NFC) (#138454)
2025-05-04 09:38:16 -07:00
Ramkumar Ramachandra
faf87e1414
[LAA] Prefer set-contains over set-count (NFC) (#136749)
Improve code by preferring {SmallSet,SmallPtrSet}::contains() over the
count() function, when used in a boolean context.
2025-04-29 13:56:04 +01:00
Kazu Hirata
47d8fec9b8
[llvm] Use llvm::append_range (NFC) (#136066)
This patch replaces:

  llvm::copy(Src, std::back_inserter(Dst));

with:

  llvm::append_range(Dst, Src);

for brevity.

One side benefit is that llvm::append_range eventually calls
llvm::SmallVector::reserve if Dst is of llvm::SmallVector.
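
The equivalence can be sketched with the standard library alone. `appendRange` below is a hypothetical stand-in written for this illustration, assuming the helper boils down to a range insert at the end of the destination; it is not the LLVM implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical stand-in for an append helper: one range insert at the end.
// A range insert lets the container see the whole source range at once.
template <typename Container, typename Range>
void appendRange(Container &Dst, const Range &Src) {
  Dst.insert(Dst.end(), std::begin(Src), std::end(Src));
}

// Old idiom: element-by-element copy through a back_inserter.
inline std::vector<int> viaCopy(std::vector<int> Dst,
                                const std::vector<int> &Src) {
  std::copy(Src.begin(), Src.end(), std::back_inserter(Dst));
  return Dst;
}

// New idiom: a single call that names destination and source directly.
inline std::vector<int> viaAppend(std::vector<int> Dst,
                                  const std::vector<int> &Src) {
  appendRange(Dst, Src);
  return Dst;
}

// Both idioms produce the same sequence.
inline void checkSameResult(const std::vector<int> &Dst,
                            const std::vector<int> &Src) {
  assert(viaCopy(Dst, Src) == viaAppend(Dst, Src));
}
```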
2025-04-16 19:30:01 -07:00
Florian Hahn
995fd47944
[LAA] Make sure MaxVF for Store-Load forward safe dep distances is pow2.
MaxVF computed in couldPreventStoreLoadForward may not be a power of 2,
as CommonStride may not be a power of 2.

This can cause crashes after 78777a20. Use bit_floor to make sure it is
a suitable power-of-2.

Fixes https://github.com/llvm/llvm-project/issues/134696.
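
What rounding down to a power of two does can be sketched with a hand-written helper. `bitFloor` and `clampMaxVF` are hypothetical names for this illustration; the loop is a portable approximation of what a bit_floor utility computes, and the sketch does not reproduce how LAA derives MaxVF from the stride.

```cpp
#include <cassert>
#include <cstdint>

// Largest power of two that is <= V, or 0 when V is 0.
constexpr uint64_t bitFloor(uint64_t V) {
  if (V == 0)
    return 0;
  uint64_t P = 1;
  while (P <= V / 2) // comparing against V / 2 avoids overflow in P * 2
    P *= 2;
  return P;
}

// Clamp a candidate vectorization factor to a power of two.
inline uint64_t clampMaxVF(uint64_t CandidateVF) {
  uint64_t P = bitFloor(CandidateVF);
  assert((P & (P - 1)) == 0 && "result must be zero or a power of two");
  return P;
}
```

For example, a candidate of 6 (as could arise from a stride of 3) is clamped to 4, while a candidate that is already a power of two, such as 8, is left unchanged.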
2025-04-12 20:05:37 +01:00
Ramkumar Ramachandra
fd6260f13b
[EquivClasses] Shorten members_{begin,end} idiom (#134373)
Introduce members() iterator-helper to shorten the members_{begin,end}
idiom. A previous attempt of this patch was #130319, which had to be
reverted due to unit-test failures when attempting to call members() on
the end iterator. In this patch, members() accepts either an ECValue or
an ElemTy, which is more intuitive and doesn't suffer from the same
issue.
2025-04-04 14:34:08 +01:00
Florian Hahn
32f24029c7
Reapply "[EquivalenceClasses] Replace findValue with contains (NFC)."
This reverts the revert commit 616f447fc84bdc7655117f1b303d895dc3b93e4d.

It includes updates to remaining users in Polly and Clang, to avoid
failures when building those projects.
2025-03-31 22:27:59 +01:00