377 Commits

Author SHA1 Message Date
Florian Hahn
a3ad5faa32
[LAA] Fix typo IndidrectUnsafe -> IndirectUnsafe.
Fix type in textual analysis output.
2024-03-12 14:44:04 +00:00
Nikita Popov
cd7ea4ea65
[LAA] Drop alias scope metadata that is not valid across iterations (#79161)
LAA currently adds memory locations with their original AATags to AST.
However, scoped alias AATags may be valid only within one loop
iteration, while LAA reasons across iterations.

Fix this by determining which alias scopes are defined inside the loop,
and drop AATags that reference these scopes.

Fixes https://github.com/llvm/llvm-project/issues/79137.
2024-01-24 11:20:16 +01:00
Bruno De Fraine
656bf13004
[AST] Don't merge memory locations in AliasSetTracker (#65731)
This changes the AliasSetTracker to track memory locations instead of
pointers in its alias sets. The motivation for this is outlined in an RFC
posted on LLVM discourse:
https://discourse.llvm.org/t/rfc-dont-merge-memory-locations-in-aliassettracker/73336

In the data structures of the AST implementation, I made the choice to
replace the linked list of `PointerRec` entries (that had to go anyway)
with a simple flat vector of `MemoryLocation` objects, but for the
`AliasSet` objects referenced from a lookup table, I retained the
mechanism of a linked list, reference counting, forwarding, etc. The
data structures could be revised in a follow-up change.
2024-01-17 15:59:13 +01:00
Jannik Silvanus
7954c57124
[IR] Fix GEP offset computations for vector GEPs (#75448)
Vectors are always bit-packed and don't respect the elements' alignment
requirements. This is different from arrays. This means offsets of
vector GEPs need to be computed differently than offsets of array GEPs.

This PR fixes many places that rely on an incorrect pattern
that always relies on `DL.getTypeAllocSize(GTI.getIndexedType())`.
We replace these by usages of  `GTI.getSequentialElementStride(DL)`, 
which is a new helper function added in this PR.

This changes behavior for GEPs into vectors with element types for which
the (bit) size and alloc size is different. This includes two cases:

* Types with a bit size that is not a multiple of a byte, e.g. i1.
GEPs into such vectors are questionable to begin with, as some elements
  are not even addressable.
* Overaligned types, e.g. i16 with 32-bit alignment.

Existing tests are unaffected, but a miscompilation of a new test is fixed.

---------

Co-authored-by: Nikita Popov <github@npopov.com>
2024-01-04 10:08:21 +01:00
David Sherwood
49b0e6dcc2
[LoopVectorize] Enable hoisting of runtime checks by default (#71538)
With commit https://reviews.llvm.org/D152366 I introduced functionality
that permitted the hoisting of runtime memory checks from a vectorised
inner loop to the preheader of the next outer-most loop. This is useful
for benchmarks like SPEC2017's x264 where the inner loop is vectorised
and only has a small trip count. In such cases the runtime memory checks
become expensive and since the checks never fail in the case of x264 it
makes sense to do this. However, this behaviour was controlled by the
flag -hoist-runtime-checks which was off by default.

This patch enables this flag by default for all targets, since I believe
this is a generally beneficial thing to do. I have tested this with
SPEC2017 and I see 2.3% and 2.6% improvements with x264 on neoverse-v1
and neoverse-n1, respectively. Similarly, I saw slight improvements in
the overall geomean on both machines. The only other notable changes
were a 1% drop in the roms benchmark, which was compensated for by a 1%
improvement in fotonik3d.
2023-12-18 09:41:54 +00:00
Bruno De Fraine
c030778979 [AST] Switch to MemoryLocation in add method (NFC)
Pass MemoryLocation as one argument, instead of passing all its
parts separately.
2023-12-13 12:15:09 +01:00
David Sherwood
ceb02379a9
[LoopVectorize] Improve algorithm for hoisting runtime checks (#73515)
When attempting to hoist runtime checks out of a loop we currently avoid
creating pointer diff checks and prefer to do expanded range checks
instead. This gives us the opportunity to hoist runtime checks out of a
loop, since these checks are loop invariant. However, in some cases the
pointer diff checks would also be loop invariant and so will naturally
get hoisted. Therefore, since diff checks are cheaper so we should
prefer to use those instead.
2023-12-12 09:10:39 +00:00
Alexandros Lamprineas
3ad6d1cbe5
[LAA] Fix incorrect dependency classification. (#70819)
As shown in #70473, the following loop was not considered safe to
vectorize. When determining the memory access dependencies in
a loop which has negative iteration step, we invert the source and
sink of the dependence. Perhaps we should just invert the operands
to getMinusSCEV(). This way the dependency is not regarded to be
true, since the users of the `IsWrite` variables, which correspond to
each of the memory accesses, rely on program order and therefore
should not be swapped.

void vectorizable_Read_Write(int *A) {
  for (unsigned i = 1022; i >= 0; i--)
    A[i+1] = A[i] + 1;
}
2023-12-05 15:27:30 +00:00
Florian Hahn
17139f38e5
[LAA] Check HasSameSize before couldPreventStoreLoadForward.
After 9645267, TypeByteSize is 0 if both access do not have the same
size (i.e. HasSameSize will be false). This can cause an infinite loop
in couldPreventStoreLoadForward, if HasSameSize is not checked first.

So check HasSameSize first instead of after
couldPreventStoreLoadForward. Checking HasSameSize first is also
cheaper.
2023-11-27 10:10:41 +00:00
Florian Hahn
96452676e5
[LAA] Factor out logic to compute dependence distance. (NFCI)
This patch refactors the logic to compute the dependence distance,
stride, size and write info to a separate function. This limits the
scope of various A* and B* variables, which in turn makes it easier to
reason about their uses.

In particular this makes it more explicit why dropping the various
std::swaps as done in https://github.com/llvm/llvm-project/pull/70819/
is valid.
2023-11-23 22:08:09 +00:00
Florian Hahn
442fef6eea
Revert "mp"
Commit was pushed by accident.

This reverts commit 0e15f245557000038f27f2dc8926a65bf3d4ccaf.
2023-11-23 20:24:25 +00:00
Florian Hahn
0e15f24555
mp 2023-11-23 19:18:24 +00:00
Florian Hahn
1dbcaf2777
[LAA] Check if dependencies access loop-varying underlying objects.
This patch adds a new dependence kind UnsafeIndirect, for cases where at
least one of the memory access instructions may access a loop varying object,
e.g. the address of underlying object is loaded inside the loop, like A[B[i]].
We cannot determine direction or distance in those cases, and also are unable
to generate any runtime checks.

This fixes a miscompile, if we attempt to generate runtime checks for
unknown dependencies.

Note that in most cases we do not attempt to generate runtime checks for
unknown dependences, except if FoundNonConstantDistanceDependence is
true.

Fixes https://github.com/llvm/llvm-project/issues/69744.
2023-11-15 21:58:57 +00:00
Allen
46cb7e4eea
[LoopDist] Update the pragma info of loop distribute, NFC (#69825)
Base on D19403, the exact pragma of distribute is
  `#pragma clang loop distribute`
2023-10-28 17:47:46 +08:00
Kazu Hirata
9c5a5a421d [llvm] Stop including llvm/ADT/iterator_range.h (NFC)
Identified with misc-include-cleaner.
2023-10-22 15:41:18 -07:00
Kazu Hirata
3af0ff99b1 [llvm] Stop including llvm/ADT/DepthFirstIterator.h (NFC)
Identified with misc-include-cleaner.
2023-10-22 12:15:46 -07:00
Allen
48caa0723c
[LAA] Analyze pointers forked by a phi (#65834)
Given a function like the following: https://godbolt.org/z/T9c99fr88
```c
1161_noReadWrite(int *Preds) {
  for (int i = 0; i < LEN_1D-1; ++i) {
    if (Preds[i] != 0)
      b[i] = c[i] + 1;
    else
      a[i] = i * i;
  }
}
```

LLVM will optimize the IR to a single store by a phi instruction:

   ```llvm
      %1 = load ptr, ptr @a, align 64
      %2 = load ptr, ptr @b, align 64
      ...
    for.inc:
      %.sink = phi ptr [ %1, %if.then ], [ %2, %if.else ]
      %add.sink = phi double [ %add, %if.then ], [ %conv8, %if.else ]
%arrayidx7 = getelementptr inbounds double, ptr %.sink, i64 %indvars.iv
      store double %add.sink, ptr %arrayidx7, align 8
   ```

LAA is currently unable to analyze such IR, since ScalarEvolution will
return a SCEVUnknown for the forked pointer operand of the store.

This patch adds initial optional support for analyzing both
possibilities for the pointer and allowing LAA to generate runtime
checks for the bounds if required, refers to D108699, but here address
the phi node.

Fixes https://github.com/llvm/llvm-project/issues/64888

Reviewed By: huntergr-arm, fhahn
Differential Revision: https://reviews.llvm.org/D158965
2023-09-19 09:16:47 +08:00
Allen
76c6a8bd36
[LAA] Improve the output remark for LoopVectorize (#65832)
Don't report 'Use #pragma loop distribute(enable) to allow loop
distribution' when we already add #pragma clang loop distribute(enable)

Fixes https://github.com/llvm/llvm-project/issues/64637
2023-09-16 12:51:44 +08:00
David Sherwood
c02184f286 [LoopVectorize] Allow inner loop runtime checks to be hoisted above an outer loop
Suppose we have a nested loop like this:

  void foo(int32_t *dst, int32_t *src, int m, int n) {
    for (int i = 0; i < m; i++) {
      for (int j = 0; j < n; j++) {
        dst[(i * n) + j] += src[(i * n) + j];
      }
    }
  }

We currently generate runtime memory checks as a precondition for
entering the vectorised version of the inner loop. However, if the
runtime-determined trip count for the inner loop is quite small
then the cost of these checks becomes quite expensive. This patch
attempts to mitigate these costs by adding a new option to
expand the memory ranges being checked to include the outer loop
as well. This leads to runtime checks that can then be hoisted
above the outer loop. For example, rather than looking for a
conflict between the memory ranges:

1. &dst[(i * n)] -> &dst[(i * n) + n]
2. &src[(i * n)] -> &src[(i * n) + n]

we can instead look at the expanded ranges:

1. &dst[0] -> &dst[((m - 1) * n) + n]
2. &src[0] -> &src[((m - 1) * n) + n]

which are outer-loop-invariant. As with many optimisations there
is a trade-off here, because there is a danger that using the
expanded ranges we may never enter the vectorised inner loop,
whereas with the smaller ranges we might enter at least once.

I have added a HoistRuntimeChecks option that is turned off by
default, but can be enabled for workloads where we know this is
guaranteed to be of real benefit. In future, we can also use
PGO to determine if this is worthwhile by using the inner loop
trip count information.

When enabling this option for SPEC2017 on neoverse-v1 with the
flags "-Ofast -mcpu=native -flto" I see an overall geomean
improvement of ~0.5%:

SPEC2017 results (+ is an improvement, - is a regression):
520.omnetpp: +2%
525.x264: +2%
557.xz: +1.2%
...
GEOMEAN: +0.5%

I didn't investigate all the differences to see if they are
genuine or noise, but I know the x264 improvement is real because
it has some hot nested loops with low trip counts where I can
see this hoisting is beneficial.

Tests have been added here:

  Transforms/LoopVectorize/runtime-checks-hoist.ll

Differential Revision: https://reviews.llvm.org/D152366
2023-08-24 12:14:02 +00:00
Kazu Hirata
6e6014a260 [Analysis] Fix an unused variable warning
This patch fixes:

  llvm/lib/Analysis/LoopAccessAnalysis.cpp:2001:12: error: unused
  variable 'MinDepDistBytesOld' [-Werror,-Wunused-variable]
2023-08-16 10:09:40 -07:00
Michael Maitland
87ddd3a191 [LAA] Rename and fix semantics of MaxSafeDepDistBytes to MinDepDistBytes
`MaxSafeDepDistBytes` was not correct based on its name an semantics
in instances when there was a non-unit stride loop. For example,

```
for (int k = 0; k < len; k+=3) {
  a[k] = a[k+4];
  a[k+2] = a[k+6];
}
```

Here, the smallest dependence distance is 24 bytes, but only vectorizing 8 bytes
is safe. `MaxSafeVectorWidthInBits` reported the correct number of bits
that could be vectorized as 64 bits.

The semantics of of `MaxSafeDepDistBytes` should be:
  The smallest dependence distance in bytes in the loop. This may not be
  the same as the maximum number of bytes that are safe to operate on
  simultaneously.

The name of this variable should reflect those semantics and
its docstring should be updated accordingly, `MinDepDistBytes`.

A debug message that used `MaxSafeDepDistBytes` to signify to the user
how many bytes could be accessed in parallel is updated to use
`MaxSafeVectorWidthInBits` instead. That way, the same message if
communicated to the user, just in different units.

This patch makes sure that when `MinDepDistBytes` is modified in a way
that should impact `MaxSafeVectorWidthInBits`, that we update the latter
accordingly. This patch also clarifies why `MaxSafeVectorWidthInBits`
does not to be updated when `MinDepDistBytes` is (i.e. in the case of a
forward dependency).

Differential Revision: https://reviews.llvm.org/D156158
2023-08-16 09:53:35 -07:00
Bjorn Pettersson
91157a0b26 [LegacyPM] Drop unused includes in passes no longer supporting legacy PM 2023-08-13 16:46:57 +02:00
Johannes Doerfert
fa367d159a [IR] Mark llvm.assume as memory(inaccessiblemem: write)
It was `inaccessiblemem: readwrite` before, no need for the read.
No real benefit is expected but it can help debugging and other efforts.

Differential Revision: https://reviews.llvm.org/D156478
2023-07-31 13:44:52 -07:00
Florian Hahn
4d02b472f1
[LAA] Add assertion to check both Start and End are invariant (NFC).
Add extra assert to check invariant of RuntimePointerChecking::insert to
guard against subtle changes when extending the scope of LAA.
2023-07-24 15:17:18 +02:00
Nikita Popov
9cf5254878 [llvm] Remove some uses of isOpaqueOrPointeeTypeEquals() (NFC) 2023-07-18 11:18:31 +02:00
Kazu Hirata
4eb06e57f1 [LegacyPM] Remove LoopAccessLegacyAnalysis
Differential Revision: https://reviews.llvm.org/D153610
2023-06-23 00:36:39 -07:00
Florian Hahn
e48b1e87a3
[LV] Split off invariance check from isUniform (NFCI).
After 572cfa3fde5433, isUniform now checks VF based uniformity instead of
just invariance as before.

As follow-up cleanup suggested in D148841, separate the invariance check
out and update callers that currently check only for invariance.

This also moves the implementation of isUniform from LoopAccessAnalysis
to LoopVectorizationLegality, as LoopAccesAnalysis doesn't use the more
general isUniform.
2023-06-01 19:09:11 +01:00
Florian Hahn
3b912e269a
[LV] Bail out on loop-variant steps when rewriting SCEV exprs.
If the step is not loop-invariant, we cannot create a modified AddRec,
as the start needs to be loop-invariant. Mark those cases as
CannotAnalyze and bail out, to fix a crash.
2023-06-01 16:14:02 +01:00
Florian Hahn
572cfa3fde
[LV] Use SCEV for uniformity analysis across VF
This patch uses SCEV to check if a value is uniform across a given VF.

The basic idea is to construct SCEVs where the AddRecs of the loop are
adjusted to reflect the version in the vectorized loop (Step multiplied
by VF). We construct a SCEV for the value of the vector lane 0
(offset 0) compare it to the expressions for lanes 1 to the last vector
lane (VF - 1). If they are equal, consider the expression uniform.

While re-writing expressions, we also need to catch expressions we
cannot determine uniformity (e.g. SCEVUnknown).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D148841
2023-05-31 16:01:00 +01:00
Kazu Hirata
638112737e [Analysis] Remove unused function stripIntegerCast
The last use was removed by:

  commit d5b840131223f2ffef4e48ca769ad1eb7bb1869a
  Author: Philip Reames <preames@rivosinc.com>
  Date:   Thu May 11 08:10:49 2023 -0700
2023-05-29 12:19:14 -07:00
Philip Reames
733b8b2b49 [LAA] Simplify identification of speculatable strides [nfc]
Mostly just avoiding the need to keep both Value and SCEVs flowing through with consistent handling.  We can do everything in terms of SCEV - aside from the profitability heuristics which are now isolated in one spot.
2023-05-11 11:48:21 -07:00
Philip Reames
7fbfcc653f [LV/LAA] Use PSE to identify stride multiplies which simplify [mostly nfc]
LV/LAA will speculate that (some) strided access patterns have unit stride, and insert runtime checks if required.

LV cost models a multiply by such a stride as free.  We did this by keeping around the StrideSet structure, just to check if one of the operands were one of the strides we speculated.

We can instead just ask PredicatedScalarEvolution if either of the operands are one (after predicates are applied).  We get mostly the same result - PSE can prove it in more cases in theory - and simpler code.
2023-05-11 11:16:04 -07:00
Philip Reames
e41dce4d49 [LAA/LV] Simplify stride speculation logic [NFC] (try 2)
The original commit wasn't quite NFC, and this was caught by an arguably overly strong assert.  Specifically, I'd failed to strip off the integer cast off the SCEV before saving it in the map.  The result - other than a failed assert - is that we'd speculate on the casted unknown, not the unknown.  The only case I can think of where that might change behavior would be a sext(i1 load).  I doubt that case is interesting in practice, but it's good to be strictly NFC on this change regardless.

Original commit message follows..

The existing code makes it hard to tell that collectStridedAccess is really about identifying some loop invariant SCEV which is *profitable* to speculate is equal to one. The odd dual usage structure of Value and SCEV confuses this point.

We could choose to loosen the profitability analysis if desired. I'm not proposing doing so at this time as it exposes too many cases where the speculation is unprofitable.

Differential Revision: https://reviews.llvm.org/D147750
2023-05-11 10:19:23 -07:00
Philip Reames
dc0d00c5fc Revert "[LAA/LV] Simplify stride speculation logic [NFC]"
This reverts commit d5b840131223f2ffef4e48ca769ad1eb7bb1869a.  Running this through broader testing after rebasing is revealing a crash.  Reverting while I investigate.
2023-05-11 09:26:35 -07:00
Philip Reames
d5b8401312 [LAA/LV] Simplify stride speculation logic [NFC]
The existing code makes it hard to tell that collectStridedAccess is really about identifying some loop invariant SCEV which is *profitable* to speculate is equal to one. The odd dual usage structure of Value and SCEV confuses this point.

We could choose to loosen the profitability analysis if desired. I'm not proposing doing so at this time as it exposes too many cases where the speculation is unprofitable.

Differential Revision: https://reviews.llvm.org/D147750
2023-05-11 08:32:56 -07:00
Philip Reames
30cdb2ac7e [LAA] Add command line flag to disable unit stride speculation
This is purely so that we can expose and work through downstream codegen issues.  My intention is to see if we can get this disabled by default, but that requires fixing a bunch of downstream issues first.
2023-05-01 10:49:51 -07:00
Philip Reames
4343534a67 [LAA] Rework overflow checking in getPtrStride [nfc]
The previous code structure and comments were exceedingly confusing.  I have multiple times looked at this code and suspected a bug.  This time, I decided to take the time to reflow the code and comment out why it is correct.

The only suspect (to me) case left is that an underaligned access with a unit stride (in terms of the access type) might miss the undefined null pointer when wrapping.  This is unlikely to be an issue for C/C++ code with real page sizes, so I'm not bothering to fully convince myself whether that case is correct or not.
2023-05-01 10:21:02 -07:00
Philip Reames
89a44b0fae [LAA] Use early return [nfc] 2023-05-01 08:35:56 -07:00
Paul Osmialowski
9cf1881f8f [SCEV] Do not plant SCEV checks unnecessarily
The vectorisation analysis collects strides for loop invariant
pointers, which is wrong because they are not strided. We don't
need to generate SCEV checks (which are costly performancewise)
for such pointers, we just need to do the appropriate aliasing
checks.

This patch fixes the problem by changing getStrideFromPointer()
to treat loop invariant pointers as having no stride.

Originally proposed by David Sherwood with further suggestions
from Florian Hahn.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D146958
2023-04-25 21:47:14 +01:00
Philip Reames
0437f88b77 [LAA] Cleanup casting in replaceSymbolicStrideSCEV [nfc] 2023-04-06 09:13:55 -07:00
Philip Reames
2d79b71366 [LAA] Continue moving utilities to sole use to isolate symbolic stride reasoning [nfc] 2023-04-06 08:27:57 -07:00
Philip Reames
800a99c4f4 [LAA] Group implementation of stride speculation into one file [nfc]
These utilities are only used in one place, so move them there and make them static.
2023-04-05 20:39:08 -07:00
Bjorn Pettersson
951a980dc7 [Analysis] Make order of analysis executions more stable
When debugging and using debug-pass-manager (e.g. in regression tests)
we prefer a consistent order in which analysis passes are executed.
But when for example doing

  return MyClass(AM.getResult<LoopAnalysis>(F),
                 AM.getResult<DominatorTreeAnalysis>(F));

then the order in which LoopAnalysis and DominatorTreeAnalysis isn't
guaranteed, and might for example depend on which compiler that is
used when building LLVM.

I've not scanned the full source tree, but this fixes some occurances
of the above pattern found in lib/Analysis.

This problem was discussed briefly in review for D146206.
2023-03-17 09:33:16 +01:00
Bjorn Pettersson
81d6310da1 [LAA] Fix transitive analysis invalidation bug by implementing LoopAccessInfoManager::invalidate
The default invalidate method for analysis results is just looking
at the preserved state of the pass itself. It does not consider if
the analysis has an internal state that depend on other analyses.
Thus, we need to implement LoopAccessInfoManager::invalidate in order
to catch if LoopAccessAnalysis needs to be invalidated due to
transitive analyses such as AAManager is being invalidated. Otherwise
we might end up having references to an AAManager that is stale.

Fixes https://github.com/llvm/llvm-project/issues/61324

Differential Revision: https://reviews.llvm.org/D146206
2023-03-17 09:33:16 +01:00
Guillaume Chatelet
8fd5558b29 [NFC] Use TypeSize::geFixedValue() instead of TypeSize::getFixedSize()
This change is one of a series to implement the discussion from
https://reviews.llvm.org/D141134.
2023-01-11 16:49:38 +00:00
Fangrui Song
d4b6fcb32e [Analysis] llvm::Optional => std::optional 2022-12-14 07:32:24 +00:00
Nikita Popov
4de3184f07 [LAA] Use cross-iteration alias analysis
LAA analyzes cross-iteration memory dependencies, as such AA should
not make assumptions about equality of values inside the loop, as
they may come from different iterations.

Fix this by exposing the MayBeCrossIteration AA flag and enabling
it for LAA.

Differential Revision: https://reviews.llvm.org/D137958
2022-12-05 09:27:13 +01:00
Nikita Popov
e95ca5bb05 [AST] Make AliasSetTracker work on BatchAA
D138014 restricted AST to work on immutable IR. This means it is
also safe to use a single BatchAA instance for the entire AST
lifetime, instead of only batching parts of individual queries.

The primary motivation for this is not compile-time, but rather
having a central place to control cross-iteration AA, which will
be used by D137958.

Differential Revision: https://reviews.llvm.org/D137955
2022-12-05 08:12:26 +01:00
Benjamin Kramer
856f7937c7 Compress a few pairs using PointerIntPairs
Use the uniform structured bindings interface where possible. NFCI.
2022-12-04 16:55:16 +01:00
Kazu Hirata
19aff0f37d [Analysis] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 19:43:04 -08:00