420 Commits

Author SHA1 Message Date
Florian Hahn
a353e258ba
[LAA] Don't require Stride == 1/-1 for inbounds pointer AddRecs nowrap. (#113126)
If we have a pointer AddRec, the maximum increment is
2^(pointer-index-wdith - 1) - 1. This means that if incrementing the
AddRec wraps, the distance between the previously accessed location and
the wrapped location is > 2^(pointer-index-wdith - 1), i.e. if the GEP
for the AddRec is inbounds, this would be poison due to the object being
larger than half the pointer index type space. The poison would be
immediate UB when the memory access gets executed..

Similar reasoning can be applied for decrements.

PR: https://github.com/llvm/llvm-project/pull/113126
2024-11-05 22:45:56 +01:00
Ramkumar Ramachandra
d897ea37db
LAA: check nusw on GEP in place of inbounds (#112223)
With the introduction of the nusw flag in GEPNoWrapFlags, it should be
safe to weaken the check in LoopAccessAnalysis to just check the nusw
flag on the GEP, instead of inbounds.
2024-10-22 09:58:54 +01:00
Ramkumar Ramachandra
f719cfa868
LAA: be less conservative in isNoWrap (#112553)
isNoWrap has exactly one caller which handles Assume = true separately,
but too conservatively. Instead, pass Assume to isNoWrap, so it is
threaded into getPtrStride, which has the correct handling for the
Assume flag. Also note that the Stride == 1 check in isNoWrap is
incorrect: getPtrStride returns Strides == 1 or -1, except when
isNoWrapAddRec or Assume are true, assuming ShouldCheckWrap is true; we
can include the case of -1 Stride, and when isNoWrapAddRec is true. With
this change, passing Assume = true to getPtrStride could return a
non-unit stride, and we correctly handle that case as well.
2024-10-22 09:55:51 +01:00
Kazu Hirata
0614b3cfac
[Analysis] Simplify code with DenseMap::operator[] (NFC) (#111331) 2024-10-07 07:00:45 -07:00
Florian Hahn
dec4cfdb09
[LAA] Use loop guards when checking invariant accesses.
Apply loop guards to start and end pointers like done in other places to
improve results.
2024-10-04 12:23:13 +01:00
Benjamin Maxwell
50a1ab12ab
[LAA] Don't assume libcalls with output/input pointers can be vectorized (#108980)
LoopAccessAnalysis currently does not check/track aliasing from the
output pointers, but assumes vectorizing library calls with a mapping is
safe.

This can result in incorrect codegen if something like the following is
vectorized:

```
for(int i=0; i<N; i++) {
  // No aliasing between input and output pointers detected.
  sincos(cos_out[0], sin_out+i, cos_out+i);
}
```

Where for VF >= 2 `cos_out[1]` to `cos_out[VF-1]` is the cosine of the
original value of `cos_out[0]` not the updated value.
2024-09-23 16:05:55 +01:00
Florian Hahn
d43a80936d
Revert "[LAA] Remove loop-invariant check added in 234cc40adc61."
This reverts commit a80053322b765eec93951e21db490c55521da2d8.

The new asserts exposed an underlying issue where the expanded bounds
could wrap, causing the parts of the code to incorrectly determine that
accesses do not overlap.

Reproducer below based on @mstorsjo's test case.

opt -passes='print<access-info>'

target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"

define i32 @j(ptr %P, i32 %x, i32 %y) {
entry:
  %gep.P.4 = getelementptr inbounds nuw i8, ptr %P, i32 4
  %gep.P.8 = getelementptr inbounds nuw i8, ptr %P, i32 8
  br label %loop

loop:
  %1 = phi i32 [ %x, %entry ], [ %sel, %loop.latch ]
  %iv = phi i32 [ %y, %entry ], [ %iv.next, %loop.latch ]
  %gep.iv = getelementptr inbounds i64, ptr %gep.P.8, i32 %iv
  %l = load i32, ptr %gep.iv, align 4
  %c.1 = icmp eq i32 %l, 3
  br i1 %c.1, label %loop.latch, label %if.then

if.then:                                          ; preds = %for.body
  store i64 0, ptr %gep.iv, align 4
  %l.2 = load i32, ptr %gep.P.4
  br label %loop.latch

loop.latch:
  %sel = phi i32 [ %l.2, %if.then ], [ %1, %loop ]
  %iv.next = add nsw i32 %iv, 1
  %c.2 = icmp slt i32 %iv.next, %sel
  br i1 %c.2, label %loop, label %exit

exit:
  %res = phi i32 [ %iv.next, %loop.latch ]
  ret i32 %res
}
2024-08-27 11:55:47 +01:00
Florian Hahn
a80053322b
[LAA] Remove loop-invariant check added in 234cc40adc61.
234cc40adc61 introduced a loop-invariance check to limit the
compile-time impact of the newly added checks.

This patch removes the restriction and avoids extra compile-time impact
by sinking the check to exits where we would return an unknown
dependence. This notably reduces the amount the extra checks are
executed while not missing out on any improvements from them.

https://llvm-compile-time-tracker.com/compare.php?from=33e7cd6ff23f6c904314d17c68dc58168fd32d09&to=7c55e66d4f31ce8262b90c119a8e84e1f9515ff1&stat=instructions:u
2024-08-26 10:24:00 +01:00
Florian Hahn
d7c84d7b71
[LAA] Collect loop guards only once in MemoryDepChecker (NFCI).
This on its own gives small compile-time improvements in some configs
and enables using loop guards at more places in the future while keeping
compile-time impact low.

https://llvm-compile-time-tracker.com/compare.php?from=c44202574ff9a8c0632aba30c2765b134557435f&to=55ffc3dd920fa9af439fd39f8f9cc13509531420&stat=instructions:u
2024-08-21 08:28:52 +01:00
Nikita Popov
6a84af704f
[LAA] Use computeConstantDifference() (#103725)
Use computeConstantDifference() instead of casting getMinusSCEV() to
SCEVConstant. This can be much faster in some cases, because
computeConstantDifference() computes the result without creating new
SCEV expressions.

This improves LTO/ThinLTO compile-time for lencod by more than 10%.

I've verified that computeConstantDifference() does not produce worse
results than the previous code for anything in llvm-test-suite. This
required raising the iteration cutoff to 6. I ended up increasing it to
8 just to be on the safe side (for code outside llvm-test-suite), and
because this doesn't materially affect compile-time anyway (we'll almost
always bail out earlier).
2024-08-16 12:52:57 +02:00
Florian Hahn
edf46f365c
[SCEV] Use const SCEV * explicitly in more places.
Use const SCEV * explicitly in more places to prepare for
https://github.com/llvm/llvm-project/pull/91961. Split off as suggested.
2024-08-03 20:10:01 +01:00
Florian Hahn
844c188c79
[LAA] Refine stride checks for SCEVs during dependence analysis. (#99577)
Update getDependenceDistanceStrideAndSize to reason about different
combinations of strides directly and explicitly.

Update getPtrStride to return 0 for invariant pointers.

Then proceed by checking the strides.

If either source or sink are not strided by a constant (i.e. not a
non-wrapping AddRec) or invariant, the accesses may overlap
with earlier or later iterations and we cannot generate runtime
checks to disambiguate them.

Otherwise they are either loop invariant or strided. In that case, we
can generate a runtime check to disambiguate them.

If both are strided by constants, we proceed as previously.

This is an alternative to
https://github.com/llvm/llvm-project/pull/99239 and also replaces
additional checks if the underlying object is loop-invariant.

Fixes https://github.com/llvm/llvm-project/issues/87189.

PR: https://github.com/llvm/llvm-project/pull/99577
2024-07-26 13:10:16 +01:00
Ramkumar Ramachandra
3eaf9f7237
LAA: fix style after cursory reading (NFC) (#100447) 2024-07-25 18:08:12 +01:00
Ramkumar Ramachandra
2754c083cb
LAA: mark LoopInfo pointer const (NFC) (#100373) 2024-07-24 16:52:11 +01:00
Florian Hahn
19c9a1c2fd
[LAA] Include IndirectUnsafe in ::isPossiblyBackward.
Similarly to Unknown, IndirectUnsafe should also be considered possibly
backward, as it may be a backwards dependency e.g. via loading
different base pointers.

This also brings isPossiblyBackward in line with
Dependence::isSafeForVectorization. At the moment this can't be tested,
as it is not possible to write a test with an AddRec that is based on a
loop varying value. But this may change in the future and may cause
mis-compiles in the future.
2024-07-18 22:09:05 +01:00
Florian Hahn
3ccda93671
[LAA] Update pointer-bounds cache to also consider access type.
The same pointer may be accessed with different types and the bound
includes the size of the accessed type to compute the end. Update the
cache to correctly disambiguate between different accessed types.
2024-07-14 17:24:12 +01:00
Graham Hunter
22a7f6dcc4
Revert "[LV] Autovectorization for the all-in-one histogram intrinsic" (#98493)
Reverts llvm/llvm-project#91458 to deal with post-commit reviewer
requests.
2024-07-11 16:39:30 +01:00
Graham Hunter
1860fd049e
[LV] Autovectorization for the all-in-one histogram intrinsic (#91458)
This patch implements limited loop vectorization support for the 'all-in-one' histogram intrinsic. The feature is disabled by default, and when enabled will only vectorize if there are no other users of values in the gather-modify-scatter sequence.
2024-07-11 15:33:30 +01:00
Florian Hahn
5028dea652
[LAA] Only invalidate loops that require runtime checks (NFCI).
LAA doesn't keep references to IR outside the loop or references to
SCEVs that may be invalidated, unless runtime checks are needed (either
memory or SCEV predicates). For the current LAA users, it should be
sufficient to invalidate entries for loops that require runtime checks,
thus avoiding analyzing loops again unnecessarily.

This helps reduce compile-time, in particular when removing the
restrictions added in 234cc40adc6.

https://llvm-compile-time-tracker.com/compare.php?from=73894dba2cdbcc00678d0c13a6b61765675f60b4&to=05c6bdc41b5f63696ebeb7116325725fa94f66d6&stat=instructions:u
2024-07-06 22:14:01 +01:00
Florian Hahn
28be3f8ac5
[LAA] Cache pointer bounds expansions (NFCI).
This avoids expanding the same bounds multiple times, which helps reduce
the compile-time impact of removing the restrictions added in
234cc40adc6, notably -0.06% on stage1-O3 and -0.05% on both
stage1-ReleaseThinLTO and stage1-ReleaseLTO-g.

https://llvm-compile-time-tracker.com/compare.php?from=8b9ebc4bb86cf0979e05908cbb04336f2d01dda5&to=fabd36f96c31e47ea72653f5a404feaadfc7b5b5&stat=instructions:u
2024-07-04 10:00:05 +01:00
Nikita Popov
2d209d964a
[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902)
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...

`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
2024-06-27 16:38:15 +02:00
Kazu Hirata
1462605ab0
[Analysis] Use range-based for loops (NFC) (#96587) 2024-06-25 06:57:30 -07:00
Ramkumar Ramachandra
0f111ba790
LoopInfo: introduce Loop::getLocStr; unify debug output (#93051)
Introduce a Loop::getLocStr stolen from LoopVectorize's static function
getDebugLocString in order to have uniform debug output headers across
LoopVectorize, LoopAccessAnalysis, and LoopDistribute. The motivation
for this change is to have UpdateTestChecks recognize the headers and
automatically generate CHECK lines for debug output, with minimal
special-casing.
2024-06-25 13:12:15 +01:00
Ramkumar Ramachandra
5ae50698a0
LAA: strip unnecessary getUniqueCastUse (#92119)
733b8b2 ([LAA] Simplify identification of speculatable strides [nfc])
refactored getStrideFromPointer() to compute directly on SCEVs, and
return an SCEV expression instead of a Value. However, it left behind a
call to getUniqueCastUse(), which is completely unnecessary. Remove
this, showing a positive test update, and simplify the surrounding
program logic.
2024-06-24 22:49:02 +01:00
Ramkumar Ramachandra
18a8983c36
LAA: refactor analyzeLoop to return bool (NFC) (#93824)
Avoid wastefully setting CanVecMem in several places in analyzeLoop,
complicating the logic, to get the function to return a bool, and set
CanVecMem in the caller.
2024-06-11 19:39:02 +01:00
Florian Hahn
e949b54a5b
[LAA] Use PSE::getSymbolicMaxBackedgeTakenCount. (#93499)
Update LAA to use PSE::getSymbolicMaxBackedgeTakenCount which returns
the minimum of the countable exits.

When analyzing dependences and computing runtime checks, we need the
smallest upper bound on the number of iterations. In terms of memory
safety, it shouldn't matter if any uncomputable exits leave the loop,
as long as we prove that there are no dependences given the minimum of
the countable exits. The same should apply also for generating runtime
checks.

Note that this shifts the responsiblity of checking whether all exit
counts are computable or handling early-exits to the users of LAA.

Depends on https://github.com/llvm/llvm-project/pull/93498

PR: https://github.com/llvm/llvm-project/pull/93499
2024-06-04 22:23:30 +01:00
Florian Hahn
1880a7bf18
[LAA] Move getDependenceDistanceStrideAndSize to MemoryDepChecker (NFC).
This avoids unnecessarily passing a number of parameters, and avoids
needing to add extra parameters in the future.
2024-05-29 14:18:05 -07:00
Florian Hahn
b74f50a269
[LAA] Store reference to SymbolicStrides in MemoryDepChecker (NFC).
This reduces the need for explicitly passing it through multiple layers
of function calls.
2024-05-29 13:37:13 -07:00
Florian Hahn
234cc40adc
[LAA] Limit no-overlap check to at least one loop-invariant accesses.
Limit the logic added in https://github.com/llvm/llvm-project/pull/9230
to cases where either sink or source are loop-invariant, to avoid
compile-time increases. This is not needed for correctness.

I am working on follow-up changes to reduce the compile-time impact in
general to allow us to enable this again for any source/sink.

This should fix the compile-time regression introduced by this change:

* compile-time improvement with this change:
  https://llvm-compile-time-tracker.com/compare.php?from=4351787fb650da6d1bfb8d6e58753c90dcd4c418&to=b89010a2eb5f98494787c1c3b77f25208c59090c&stat=instructions:u

* compile-time improvement with original patch reverted on top of this
  change:
  https://llvm-compile-time-tracker.com/compare.php?from=b89010a2eb5f98494787c1c3b77f25208c59090c&to=19a1103fe68115cfd7d6472c6961f4fabe81a593&stat=instructions:u
2024-05-28 09:23:02 -07:00
Ramkumar Ramachandra
b6468766f7
[LAA] refactor program logic (NFC) (#92101)
Implement NFC improvements spotted during a cursory reading of
LoopAccessAnalysis.
2024-05-23 12:10:21 +01:00
Florian Hahn
1b377dbeb7
[LAA] Check accesses don't overlap early to determine NoDep (#92307)
Use getStartAndEndForAccess to compute the start and end of both src 
and sink (factored out to helper in bce3680f45b57f). If they do not
overlap (i.e. SrcEnd <= SinkStart || SinkEnd <= SrcStart), there is no
dependence, regardless of stride.

PR: https://github.com/llvm/llvm-project/pull/92307
2024-05-21 11:00:11 +01:00
Florian Hahn
bce3680f45
[LAA] Move logic to compute start and end of a pointer to helper (NFC).
This allows use at other places, in particular an updated version of
https://github.com/llvm/llvm-project/pull/92307.
2024-05-20 19:35:19 +01:00
Ramkumar Ramachandra
b6fa78d54c
[LAA] refactor sortPtrAccesses (NFC) (#92256)
Use the destructuring syntax in C++ and llvm::enumerate to make
sortPtrAccesses a little more readable.
2024-05-17 10:41:41 +01:00
Florian Hahn
179efe5abc
[LAA] Delay applying loop guards until after isSafeDependenceDistance.
Applying the loop guards to the distance may prevent
isSafeDependenceDistance from determining NoDep, unless loop guards are
also applied to the backedge-taken-count.

Instead of applying the guards to both Dist and the
backedge-taken-count, just apply them after handling
isSafeDependenceDistance and constant distances; there is no benefit to
applying the guards before then.

This fixes a regression flagged by @bjope due to
ecae3ed958481cba7d60868cf3504292f7f4fdf5.
2024-05-14 19:47:24 +01:00
Ramkumar Ramachandra
08536b0f9c
[LAA] refactor tryToCreateDiffCheck (NFC) (#92110)
tryToCreateDiffCheck has one caller, and exits early if CanUseDiffCheck
is false. Hence, we can get/set CanUseDiffCheck in the caller to avoid
wastefully calling tryToCreateDiffCheck. This patch is an NFC
simplification of program logic.
2024-05-14 16:19:55 +01:00
Florian Hahn
28767afd53
[LAA] Support backward dependences with non-constant distance. (#91525)
Following up to 933f49248, also update the code reasoning about
backwards dependences to support non-constant distances.

Update the code to use the signed minimum distance instead of a constant
distance

This means e checked the lower bound of the dependence distance and the
distance may be larger at runtime (and safe for vectorization). Whether
to classify it as Unknown or Backwards depends on the vector width and
LAA was updated to take TTI to get the maximum vector register width.

If the minimum dependence distance is larger than the max vector width,
we consider it as backwards-vectorizable. Otherwise we classify them as
Unknown, so we re-try with runtime checks.

PR: https://github.com/llvm/llvm-project/pull/91525
2024-05-10 11:47:13 +01:00
Florian Hahn
ecae3ed958
[LAA] Apply loop guards to dependence distance.
After supporting non-constant dependence distances in 933f49248bf,
applying information from loop guards can help further disambiguate
dependencies.
2024-05-09 18:12:55 +01:00
Florian Hahn
3219c0edb2
[LAA] Directly pass DepChecker to getSource/getDestination (NFC).
Instead of passing LoopAccessInfo only to fetch the MemoryDepChecker,
directly pass MemoryDepChecker. This simplifies the code and also allows
new uses in places where no LAI is available.
2024-05-05 21:16:20 +01:00
Florian Hahn
b54a78d69b
[LV,LAA] Don't vectorize loops with load and store to invar address.
Code checking stores to invariant addresses and reductions made an
incorrect assumption that the case of both a load & store to the same
invariant address does not need to be handled.

In some cases when vectorizing with runtime checks, there may be
dependences with a load and store to the same address, storing a
reduction value.

Update LAA to separately track if there was a store-store and a
load-store dependence with an invariant addresses.

Bail out early if there as a load-store dependence with invariant
address. If there was a store-store one, still apply the logic checking
if they all store a reduction.
2024-05-04 20:53:54 +01:00
Florian Hahn
82219e547b
[LAA] Pass maximum stride to isSafeDependenceDistance. (#90036)
As discussed in https://github.com/llvm/llvm-project/pull/88039, support
different strides with isSafeDependenceDistance by passing the maximum
of both strides.

isSafeDependenceDistance tries to prove that
    |Dist| > BackedgeTakenCount * Step
holds. Chosing the maximum stride computes the maximum range accesed by
the loop for all strides.

PR: https://github.com/llvm/llvm-project/pull/90036
2024-04-30 12:59:08 +01:00
Florian Hahn
933f49248b
[LAA] Support different strides & non constant dep distances using SCEV. (#88039)
Extend LoopAccessAnalysis to support different strides and as a
consequence non-constant distances between dependences using SCEV to
reason about the direction of the dependence.

In multiple places, logic to rule out dependences using the stride has
been updated to only be used if StrideA == StrideB, i.e. there's a
common stride.

We now also may bail out at multiple places where we may have to set
FoundNonConstantDistanceDependence. This is done when we need to bail
out and the distance is not constant to preserve original behavior.

Fixes https://github.com/llvm/llvm-project/issues/87336

PR: https://github.com/llvm/llvm-project/pull/88039
2024-04-25 21:38:07 +01:00
Florian Hahn
fe28a0e482
[LAA] Document reasoning in multiple places in isDependent (NFC). (#89381)
As suggested in https://github.com/llvm/llvm-project/pull/88039, add
extra documentation for reasoning in isDependent.

PR: https://github.com/llvm/llvm-project/pull/89381
2024-04-22 14:10:05 +01:00
Florian Hahn
cac4c14ecf
[LAA] Replace std::tuple with struct (NFCI).
As suggested in https://github.com/llvm/llvm-project/pull/88039, replace
the tuple with a struct, to make it easier to extend.
2024-04-10 10:28:43 +01:00
Florian Hahn
a3ad5faa32
[LAA] Fix typo IndidrectUnsafe -> IndirectUnsafe.
Fix type in textual analysis output.
2024-03-12 14:44:04 +00:00
Nikita Popov
cd7ea4ea65
[LAA] Drop alias scope metadata that is not valid across iterations (#79161)
LAA currently adds memory locations with their original AATags to AST.
However, scoped alias AATags may be valid only within one loop
iteration, while LAA reasons across iterations.

Fix this by determining which alias scopes are defined inside the loop,
and drop AATags that reference these scopes.

Fixes https://github.com/llvm/llvm-project/issues/79137.
2024-01-24 11:20:16 +01:00
Bruno De Fraine
656bf13004
[AST] Don't merge memory locations in AliasSetTracker (#65731)
This changes the AliasSetTracker to track memory locations instead of
pointers in its alias sets. The motivation for this is outlined in an RFC
posted on LLVM discourse:
https://discourse.llvm.org/t/rfc-dont-merge-memory-locations-in-aliassettracker/73336

In the data structures of the AST implementation, I made the choice to
replace the linked list of `PointerRec` entries (that had to go anyway)
with a simple flat vector of `MemoryLocation` objects, but for the
`AliasSet` objects referenced from a lookup table, I retained the
mechanism of a linked list, reference counting, forwarding, etc. The
data structures could be revised in a follow-up change.
2024-01-17 15:59:13 +01:00
Jannik Silvanus
7954c57124
[IR] Fix GEP offset computations for vector GEPs (#75448)
Vectors are always bit-packed and don't respect the elements' alignment
requirements. This is different from arrays. This means offsets of
vector GEPs need to be computed differently than offsets of array GEPs.

This PR fixes many places that rely on an incorrect pattern
that always relies on `DL.getTypeAllocSize(GTI.getIndexedType())`.
We replace these by usages of  `GTI.getSequentialElementStride(DL)`, 
which is a new helper function added in this PR.

This changes behavior for GEPs into vectors with element types for which
the (bit) size and alloc size is different. This includes two cases:

* Types with a bit size that is not a multiple of a byte, e.g. i1.
GEPs into such vectors are questionable to begin with, as some elements
  are not even addressable.
* Overaligned types, e.g. i16 with 32-bit alignment.

Existing tests are unaffected, but a miscompilation of a new test is fixed.

---------

Co-authored-by: Nikita Popov <github@npopov.com>
2024-01-04 10:08:21 +01:00
David Sherwood
49b0e6dcc2
[LoopVectorize] Enable hoisting of runtime checks by default (#71538)
With commit https://reviews.llvm.org/D152366 I introduced functionality
that permitted the hoisting of runtime memory checks from a vectorised
inner loop to the preheader of the next outer-most loop. This is useful
for benchmarks like SPEC2017's x264 where the inner loop is vectorised
and only has a small trip count. In such cases the runtime memory checks
become expensive and since the checks never fail in the case of x264 it
makes sense to do this. However, this behaviour was controlled by the
flag -hoist-runtime-checks which was off by default.

This patch enables this flag by default for all targets, since I believe
this is a generally beneficial thing to do. I have tested this with
SPEC2017 and I see 2.3% and 2.6% improvements with x264 on neoverse-v1
and neoverse-n1, respectively. Similarly, I saw slight improvements in
the overall geomean on both machines. The only other notable changes
were a 1% drop in the roms benchmark, which was compensated for by a 1%
improvement in fotonik3d.
2023-12-18 09:41:54 +00:00
Bruno De Fraine
c030778979 [AST] Switch to MemoryLocation in add method (NFC)
Pass MemoryLocation as one argument, instead of passing all its
parts separately.
2023-12-13 12:15:09 +01:00
David Sherwood
ceb02379a9
[LoopVectorize] Improve algorithm for hoisting runtime checks (#73515)
When attempting to hoist runtime checks out of a loop we currently avoid
creating pointer diff checks and prefer to do expanded range checks
instead. This gives us the opportunity to hoist runtime checks out of a
loop, since these checks are loop invariant. However, in some cases the
pointer diff checks would also be loop invariant and so will naturally
get hoisted. Therefore, since diff checks are cheaper so we should
prefer to use those instead.
2023-12-12 09:10:39 +00:00