442 Commits

Author SHA1 Message Date
Florian Hahn
01d0793a69
[LAA] Make Ptr argument optional in isNoWrap. (#127410)
Update isNoWrap to make the IR Ptr argument optional. This allows using
isNoWrap when dealing with things like pointer-selects, where a select
is translated to multiple pointer SCEV expressions, but there is no IR
value that can be used. We don't try to retrieve pointer values for the
pointer SCEVs and using info from the IR would not be safe. For example,
we cannot use inbounds, because the pointer may never be accessed.

PR: https://github.com/llvm/llvm-project/pull/127410
2025-02-19 14:51:19 +01:00
Ramkumar Ramachandra
6646b65082
[LAA] Rework and rename stripGetElementPtr (#125315)
The stripGetElementPtr function is mysteriously named, and calls into
another mysterious getGEPInductionOperand which does something
complicated with GEP indices. The real purpose of the badly-named
stripGetElementPtr function is to get a loop-variant GEP index, if there
is one. The getGEPInductionOperand is totally redundant, as stripping
off zeros from the end of GEP indices has no effect on computing the
loop-variant GEP index, as constant zeros are always loop-invariant.
Moreover, the GEP induction operand is simply the first non-zero index
from the end, which stripGetElementPtr returns when it finds that any of
the GEP indices are loop-variant: this is a completely unrelated value
to the GEP index that is loop-variant. The implicit assumption here is
that there is only ever one loop-variant index, and it is the first
non-zero one from the end.

The logic is unnecessarily complicated for what stripGetElementPtr wants
to achieve, and the header comments are confusing as well. Strip
getGEPInductionOperand, rework and rename stripGetElementPtr.
2025-02-18 10:25:47 +00:00
Florian Hahn
a8b177aa60 [LAA] Remove unneeded hasNoOverflow call (NFC).
The function already calls hasNoOverflow above.
2025-02-17 21:14:01 +01:00
Ramkumar Ramachandra
6d86a8a1a1
LAA: scope responsibility of isNoWrapAddRec (NFC) (#127479)
Free isNoWrapAddRec from the AddRec check, and rename it to isNoWrapGEP.
2025-02-17 16:58:09 +00:00
Florian Hahn
e080366a76 [LAA] Inline hasComputableBounds in only caller, simplify isNoWrap.
Inline hasComputableBounds into createCheckForAccess. This removes a
level of indirection and allows for passing the AddRec directly to
isNoWrap, removing the need to retrieve the AddRec for the pointer
again.

The early continue for invariant SCEVs now also applies to forked
pointers (i.e. when there's more than one entry in TranslatedPtrs) when
ShouldCheckWrap is true, as those trivially won't wrap.

The change is NFC otherwise. replaceSymbolicStrideSCEV is now called
earlier.
2025-02-16 19:56:13 +01:00
Florian Hahn
e60de25c4e [LAA] Replace symbolic strides for translated pointers earlier (NFC).
Move up replaceSymbolicStrideSCEV before isNoWrap. It needs to be called
after hasComputableBounds, as this may create an AddRec via PSE, which
replaceSymbolicStrideSCEV will look up.

This is in preparation for simplifying isNoWrap.
2025-02-15 19:44:39 +01:00
Florian Hahn
4664a4c66b [LAA] Use getPointer/setPointer in createCheckForAccess (NFC).
Use getPointer/setPointer to clarify we are accessing/modifying the
rurrent value.
2025-02-15 16:17:42 +01:00
Kazu Hirata
778001514f [Analysis] Fix a warning
This patch fixes:

  llvm/lib/Analysis/LoopAccessAnalysis.cpp:1530:9: error: unused
  variable 'Ty' [-Werror,-Wunused-variable]
2025-02-14 12:41:43 -08:00
Florian Hahn
9ad83f7fcf [LAA] Get pointer address space from AddRec (NFC).
Retrieve the address space from the pointer AddRec instead of the IR
pointer value, to prepare to make the IR pointer value optional.
2025-02-14 20:39:52 +01:00
Florian Hahn
044b52832a
[LAA] Perform checks for no-wrap separately from getPtrStride. (#126971)
Reorganize the code in isNoWrap to perform the no-wrap checks without
relying on getPtrStride directly. getPtrStride now uses isNoWrap.

The new structure allows deriving no-wrap in more cases in LAA, because
there are some cases where getPtrStride bails out early because it
cannot return a constant stride, but we can still prove no-wrap for the
pointer.

An example are AddRecs with non-ConstantInt strides with inbound GEPs,
in the improved test cases.

This enables vectorization with runtime checks in a few more cases.

PR: https://github.com/llvm/llvm-project/pull/126971
2025-02-14 20:06:37 +01:00
Florian Hahn
424fcc5df7 [LAA] Split off code to compute stride from AddRec for reuse (NFC).
Refactors to code to expose the core logic from getPtrStride to compute
the stride for a given AddRec.

Split off from https://github.com/llvm/llvm-project/pull/126971 as
suggested.
2025-02-13 22:06:12 +01:00
Ramkumar Ramachandra
8327c2cfdb
LAA: fix logic for MaxTargetVectorWidth (#125487)
Uses the fixed register width if scalable vectorization is not enabled
(via TargetTransformInfo::enableScalableVectorization) and improves
results if there are scalable vector registers, but they shouldn't be
used.
2025-02-13 11:40:05 +00:00
Ramkumar Ramachandra
db43dd7f4f
LAA: simplify LoopAccessInfoManager::clear (NFC) (#125488)
DenseMap::erase() doesn't invalidate the iterator.
2025-02-03 16:06:21 +00:00
Ramkumar Ramachandra
7444ccdd26
LAA: improve code in getStrideFromPointer (NFC) (#124780)
Strip dead code, inline a constant, and modernize style.
2025-01-31 20:06:25 +00:00
Ramkumar Ramachandra
3a4376b8f9
LAA: handle 0 return from getPtrStride correctly (#124539)
getPtrStride returns 0 when the PtrScev is loop-invariant, and this is
not an erroneous value: it returns std::nullopt to communicate that it
was not able to find a valid pointer stride. In analyzeLoop, we call
getPtrStride with a value_or(0) conflating the zero return value with
std::nullopt. Fix this, handling loop-invariant loads correctly.
2025-01-27 14:21:14 +00:00
David Sherwood
b7286dbef9
Reland "[LoopVectorize] Add support for reverse loops in isDereferenceableAndAlignedInLoop #96752" (#123616)
The last attempt failed a sanitiser build because we were
creating a reference to a null Predicates pointer in
isDereferenceableAndAlignedInLoop. This was exposed by
the unit test IsDerefReadOnlyLoop in
unittests/Analysis/LoadsTest.cpp. I fixed this by falling
back on getConstantMaxBackedgeTakenCount if Predicates is
null - see line 316 in llvm/lib/Analysis/Loads.cpp. There
are no other changes.
2025-01-27 11:59:38 +00:00
David Sherwood
a00938eedd
Revert "[LoopVectorize] Add support for reverse loops in isDereferenceableAndAlignedInLoop (#96752)" (#123057)
This reverts commit bfedf6460c2cad6e6f966b457d8d27084579dcd8.
2025-01-15 13:56:42 +00:00
David Sherwood
bfedf6460c
[LoopVectorize] Add support for reverse loops in isDereferenceableAndAlignedInLoop (#96752)
Currently when we encounter a negative step in the induction
variable isDereferenceableAndAlignedInLoop bails out because
the element size is signed greater than the step. This patch
adds support for negative steps in cases where we detect the
start address for the load is of the form base + offset. In
this case the address decrements in each iteration so we need
to calculate the access size differently. I have done this by
caling getStartAndEndForAccess from LoopAccessAnalysis.cpp.

The motivation for this patch comes from PR #88385 where a
reviewer requested reusing isDereferenceableAndAlignedInLoop,
but that PR itself does support reverse loops.

The changed test in LoopVectorize/X86/load-deref-pred.ll now
passes because previously we were calculating the total access
size incorrectly, whereas now it is 412 bytes and fits
perfectly into the alloca.
2025-01-15 12:47:43 +00:00
Ramkumar Ramachandra
8b4561467e
LAA: add missed swap when inverting src, sink (#122254)
When inverting source and sink on a negative induction step, the types
of the source and sink should also be swapped. This fixes a bug in the
code that follows, that computes properties based on these types. With
234cc40 ([LAA] Limit no-overlap check to at least one loop-invariant
accesses.), that code is guarded by a loop-invariant condition: however,
the commit did not add any new tests exercising the guarded code, and
hence the bugfix in this patch requires additional tests to exercise
that guarded codepath.
2025-01-13 13:07:19 +00:00
Ramkumar Ramachandra
17912f336b
LAA: refactor dependence class to prep for scaled strides (NFC) (#122113)
Rearrange the DepDistanceAndSizeInfo struct in preparation to scale
strides. getDependenceDistanceStrideAndSize now returns the data of
CommonStride, MaxStride, and clarifies when to retry with runtime
checks, in place of (unscaled) strides.
2025-01-09 16:05:17 +00:00
Nikita Popov
bc0976ed1f
[LAA] Strip non-inbounds offset in getPointerDiff() (NFC) (#118665)
I believe that this code doesn't care whether the offsets are known to
be inbounds a priori. For the same reason the change is not testable, as
the SCEV based fallback code will look through non-inbounds offsets
anyway. So make it clear that there is no special inbounds requirement
here.
2024-12-10 13:05:34 +01:00
Ramkumar Ramachandra
aa5cdcea39
LAA: improve code in a couple of routines (NFC) (#108092) 2024-11-28 16:15:45 +00:00
Florian Hahn
a353e258ba
[LAA] Don't require Stride == 1/-1 for inbounds pointer AddRecs nowrap. (#113126)
If we have a pointer AddRec, the maximum increment is
2^(pointer-index-wdith - 1) - 1. This means that if incrementing the
AddRec wraps, the distance between the previously accessed location and
the wrapped location is > 2^(pointer-index-wdith - 1), i.e. if the GEP
for the AddRec is inbounds, this would be poison due to the object being
larger than half the pointer index type space. The poison would be
immediate UB when the memory access gets executed..

Similar reasoning can be applied for decrements.

PR: https://github.com/llvm/llvm-project/pull/113126
2024-11-05 22:45:56 +01:00
Ramkumar Ramachandra
d897ea37db
LAA: check nusw on GEP in place of inbounds (#112223)
With the introduction of the nusw flag in GEPNoWrapFlags, it should be
safe to weaken the check in LoopAccessAnalysis to just check the nusw
flag on the GEP, instead of inbounds.
2024-10-22 09:58:54 +01:00
Ramkumar Ramachandra
f719cfa868
LAA: be less conservative in isNoWrap (#112553)
isNoWrap has exactly one caller which handles Assume = true separately,
but too conservatively. Instead, pass Assume to isNoWrap, so it is
threaded into getPtrStride, which has the correct handling for the
Assume flag. Also note that the Stride == 1 check in isNoWrap is
incorrect: getPtrStride returns Strides == 1 or -1, except when
isNoWrapAddRec or Assume are true, assuming ShouldCheckWrap is true; we
can include the case of -1 Stride, and when isNoWrapAddRec is true. With
this change, passing Assume = true to getPtrStride could return a
non-unit stride, and we correctly handle that case as well.
2024-10-22 09:55:51 +01:00
Kazu Hirata
0614b3cfac
[Analysis] Simplify code with DenseMap::operator[] (NFC) (#111331) 2024-10-07 07:00:45 -07:00
Florian Hahn
dec4cfdb09
[LAA] Use loop guards when checking invariant accesses.
Apply loop guards to start and end pointers like done in other places to
improve results.
2024-10-04 12:23:13 +01:00
Benjamin Maxwell
50a1ab12ab
[LAA] Don't assume libcalls with output/input pointers can be vectorized (#108980)
LoopAccessAnalysis currently does not check/track aliasing from the
output pointers, but assumes vectorizing library calls with a mapping is
safe.

This can result in incorrect codegen if something like the following is
vectorized:

```
for(int i=0; i<N; i++) {
  // No aliasing between input and output pointers detected.
  sincos(cos_out[0], sin_out+i, cos_out+i);
}
```

Where for VF >= 2 `cos_out[1]` to `cos_out[VF-1]` is the cosine of the
original value of `cos_out[0]` not the updated value.
2024-09-23 16:05:55 +01:00
Florian Hahn
d43a80936d
Revert "[LAA] Remove loop-invariant check added in 234cc40adc61."
This reverts commit a80053322b765eec93951e21db490c55521da2d8.

The new asserts exposed an underlying issue where the expanded bounds
could wrap, causing the parts of the code to incorrectly determine that
accesses do not overlap.

Reproducer below based on @mstorsjo's test case.

opt -passes='print<access-info>'

target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"

define i32 @j(ptr %P, i32 %x, i32 %y) {
entry:
  %gep.P.4 = getelementptr inbounds nuw i8, ptr %P, i32 4
  %gep.P.8 = getelementptr inbounds nuw i8, ptr %P, i32 8
  br label %loop

loop:
  %1 = phi i32 [ %x, %entry ], [ %sel, %loop.latch ]
  %iv = phi i32 [ %y, %entry ], [ %iv.next, %loop.latch ]
  %gep.iv = getelementptr inbounds i64, ptr %gep.P.8, i32 %iv
  %l = load i32, ptr %gep.iv, align 4
  %c.1 = icmp eq i32 %l, 3
  br i1 %c.1, label %loop.latch, label %if.then

if.then:                                          ; preds = %for.body
  store i64 0, ptr %gep.iv, align 4
  %l.2 = load i32, ptr %gep.P.4
  br label %loop.latch

loop.latch:
  %sel = phi i32 [ %l.2, %if.then ], [ %1, %loop ]
  %iv.next = add nsw i32 %iv, 1
  %c.2 = icmp slt i32 %iv.next, %sel
  br i1 %c.2, label %loop, label %exit

exit:
  %res = phi i32 [ %iv.next, %loop.latch ]
  ret i32 %res
}
2024-08-27 11:55:47 +01:00
Florian Hahn
a80053322b
[LAA] Remove loop-invariant check added in 234cc40adc61.
234cc40adc61 introduced a loop-invariance check to limit the
compile-time impact of the newly added checks.

This patch removes the restriction and avoids extra compile-time impact
by sinking the check to exits where we would return an unknown
dependence. This notably reduces the amount the extra checks are
executed while not missing out on any improvements from them.

https://llvm-compile-time-tracker.com/compare.php?from=33e7cd6ff23f6c904314d17c68dc58168fd32d09&to=7c55e66d4f31ce8262b90c119a8e84e1f9515ff1&stat=instructions:u
2024-08-26 10:24:00 +01:00
Florian Hahn
d7c84d7b71
[LAA] Collect loop guards only once in MemoryDepChecker (NFCI).
This on its own gives small compile-time improvements in some configs
and enables using loop guards at more places in the future while keeping
compile-time impact low.

https://llvm-compile-time-tracker.com/compare.php?from=c44202574ff9a8c0632aba30c2765b134557435f&to=55ffc3dd920fa9af439fd39f8f9cc13509531420&stat=instructions:u
2024-08-21 08:28:52 +01:00
Nikita Popov
6a84af704f
[LAA] Use computeConstantDifference() (#103725)
Use computeConstantDifference() instead of casting getMinusSCEV() to
SCEVConstant. This can be much faster in some cases, because
computeConstantDifference() computes the result without creating new
SCEV expressions.

This improves LTO/ThinLTO compile-time for lencod by more than 10%.

I've verified that computeConstantDifference() does not produce worse
results than the previous code for anything in llvm-test-suite. This
required raising the iteration cutoff to 6. I ended up increasing it to
8 just to be on the safe side (for code outside llvm-test-suite), and
because this doesn't materially affect compile-time anyway (we'll almost
always bail out earlier).
2024-08-16 12:52:57 +02:00
Florian Hahn
edf46f365c
[SCEV] Use const SCEV * explicitly in more places.
Use const SCEV * explicitly in more places to prepare for
https://github.com/llvm/llvm-project/pull/91961. Split off as suggested.
2024-08-03 20:10:01 +01:00
Florian Hahn
844c188c79
[LAA] Refine stride checks for SCEVs during dependence analysis. (#99577)
Update getDependenceDistanceStrideAndSize to reason about different
combinations of strides directly and explicitly.

Update getPtrStride to return 0 for invariant pointers.

Then proceed by checking the strides.

If either source or sink are not strided by a constant (i.e. not a
non-wrapping AddRec) or invariant, the accesses may overlap
with earlier or later iterations and we cannot generate runtime
checks to disambiguate them.

Otherwise they are either loop invariant or strided. In that case, we
can generate a runtime check to disambiguate them.

If both are strided by constants, we proceed as previously.

This is an alternative to
https://github.com/llvm/llvm-project/pull/99239 and also replaces
additional checks if the underlying object is loop-invariant.

Fixes https://github.com/llvm/llvm-project/issues/87189.

PR: https://github.com/llvm/llvm-project/pull/99577
2024-07-26 13:10:16 +01:00
Ramkumar Ramachandra
3eaf9f7237
LAA: fix style after cursory reading (NFC) (#100447) 2024-07-25 18:08:12 +01:00
Ramkumar Ramachandra
2754c083cb
LAA: mark LoopInfo pointer const (NFC) (#100373) 2024-07-24 16:52:11 +01:00
Florian Hahn
19c9a1c2fd
[LAA] Include IndirectUnsafe in ::isPossiblyBackward.
Similarly to Unknown, IndirectUnsafe should also be considered possibly
backward, as it may be a backwards dependency e.g. via loading
different base pointers.

This also brings isPossiblyBackward in line with
Dependence::isSafeForVectorization. At the moment this can't be tested,
as it is not possible to write a test with an AddRec that is based on a
loop varying value. But this may change in the future and may cause
mis-compiles in the future.
2024-07-18 22:09:05 +01:00
Florian Hahn
3ccda93671
[LAA] Update pointer-bounds cache to also consider access type.
The same pointer may be accessed with different types and the bound
includes the size of the accessed type to compute the end. Update the
cache to correctly disambiguate between different accessed types.
2024-07-14 17:24:12 +01:00
Graham Hunter
22a7f6dcc4
Revert "[LV] Autovectorization for the all-in-one histogram intrinsic" (#98493)
Reverts llvm/llvm-project#91458 to deal with post-commit reviewer
requests.
2024-07-11 16:39:30 +01:00
Graham Hunter
1860fd049e
[LV] Autovectorization for the all-in-one histogram intrinsic (#91458)
This patch implements limited loop vectorization support for the 'all-in-one' histogram intrinsic. The feature is disabled by default, and when enabled will only vectorize if there are no other users of values in the gather-modify-scatter sequence.
2024-07-11 15:33:30 +01:00
Florian Hahn
5028dea652
[LAA] Only invalidate loops that require runtime checks (NFCI).
LAA doesn't keep references to IR outside the loop or references to
SCEVs that may be invalidated, unless runtime checks are needed (either
memory or SCEV predicates). For the current LAA users, it should be
sufficient to invalidate entries for loops that require runtime checks,
thus avoiding analyzing loops again unnecessarily.

This helps reduce compile-time, in particular when removing the
restrictions added in 234cc40adc6.

https://llvm-compile-time-tracker.com/compare.php?from=73894dba2cdbcc00678d0c13a6b61765675f60b4&to=05c6bdc41b5f63696ebeb7116325725fa94f66d6&stat=instructions:u
2024-07-06 22:14:01 +01:00
Florian Hahn
28be3f8ac5
[LAA] Cache pointer bounds expansions (NFCI).
This avoids expanding the same bounds multiple times, which helps reduce
the compile-time impact of removing the restrictions added in
234cc40adc6, notably -0.06% on stage1-O3 and -0.05% on both
stage1-ReleaseThinLTO and stage1-ReleaseLTO-g.

https://llvm-compile-time-tracker.com/compare.php?from=8b9ebc4bb86cf0979e05908cbb04336f2d01dda5&to=fabd36f96c31e47ea72653f5a404feaadfc7b5b5&stat=instructions:u
2024-07-04 10:00:05 +01:00
Nikita Popov
2d209d964a
[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902)
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...

`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
2024-06-27 16:38:15 +02:00
Kazu Hirata
1462605ab0
[Analysis] Use range-based for loops (NFC) (#96587) 2024-06-25 06:57:30 -07:00
Ramkumar Ramachandra
0f111ba790
LoopInfo: introduce Loop::getLocStr; unify debug output (#93051)
Introduce a Loop::getLocStr stolen from LoopVectorize's static function
getDebugLocString in order to have uniform debug output headers across
LoopVectorize, LoopAccessAnalysis, and LoopDistribute. The motivation
for this change is to have UpdateTestChecks recognize the headers and
automatically generate CHECK lines for debug output, with minimal
special-casing.
2024-06-25 13:12:15 +01:00
Ramkumar Ramachandra
5ae50698a0
LAA: strip unnecessary getUniqueCastUse (#92119)
733b8b2 ([LAA] Simplify identification of speculatable strides [nfc])
refactored getStrideFromPointer() to compute directly on SCEVs, and
return an SCEV expression instead of a Value. However, it left behind a
call to getUniqueCastUse(), which is completely unnecessary. Remove
this, showing a positive test update, and simplify the surrounding
program logic.
2024-06-24 22:49:02 +01:00
Ramkumar Ramachandra
18a8983c36
LAA: refactor analyzeLoop to return bool (NFC) (#93824)
Avoid wastefully setting CanVecMem in several places in analyzeLoop,
complicating the logic, to get the function to return a bool, and set
CanVecMem in the caller.
2024-06-11 19:39:02 +01:00
Florian Hahn
e949b54a5b
[LAA] Use PSE::getSymbolicMaxBackedgeTakenCount. (#93499)
Update LAA to use PSE::getSymbolicMaxBackedgeTakenCount which returns
the minimum of the countable exits.

When analyzing dependences and computing runtime checks, we need the
smallest upper bound on the number of iterations. In terms of memory
safety, it shouldn't matter if any uncomputable exits leave the loop,
as long as we prove that there are no dependences given the minimum of
the countable exits. The same should apply also for generating runtime
checks.

Note that this shifts the responsiblity of checking whether all exit
counts are computable or handling early-exits to the users of LAA.

Depends on https://github.com/llvm/llvm-project/pull/93498

PR: https://github.com/llvm/llvm-project/pull/93499
2024-06-04 22:23:30 +01:00
Florian Hahn
1880a7bf18
[LAA] Move getDependenceDistanceStrideAndSize to MemoryDepChecker (NFC).
This avoids unnecessarily passing a number of parameters, and avoids
needing to add extra parameters in the future.
2024-05-29 14:18:05 -07:00
Florian Hahn
b74f50a269
[LAA] Store reference to SymbolicStrides in MemoryDepChecker (NFC).
This reduces the need for explicitly passing it through multiple layers
of function calls.
2024-05-29 13:37:13 -07:00