Use getStartAndEndForAccess to compute the start and end of both src
and sink (factored out to a helper in bce3680f45b57f). If they do not
overlap (i.e. SrcEnd <= SinkStart || SinkEnd <= SrcStart), there is no
dependence, regardless of stride.
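To make the check concrete, here is a minimal sketch with plain byte offsets (the real code works on the SCEV bounds returned by getStartAndEndForAccess; the names here are mine):
```c++
#include <cstdint>

// Minimal sketch with plain byte offsets (the real code reasons about
// SCEV bounds): two accesses with half-open ranges [Start, End) cannot
// depend on each other if the ranges are disjoint, whatever the strides
// inside each range are.
bool accessRangesAreDisjoint(uint64_t SrcStart, uint64_t SrcEnd,
                             uint64_t SinkStart, uint64_t SinkEnd) {
  return SrcEnd <= SinkStart || SinkEnd <= SrcStart;
}
```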
PR: https://github.com/llvm/llvm-project/pull/92307
Applying the loop guards to the distance may prevent
isSafeDependenceDistance from determining NoDep, unless loop guards are
also applied to the backedge-taken-count.
Instead of applying the guards to both Dist and the
backedge-taken-count, just apply them after handling
isSafeDependenceDistance and constant distances; there is no benefit to
applying the guards before then.
This fixes a regression flagged by @bjope due to
ecae3ed958481cba7d60868cf3504292f7f4fdf5.
tryToCreateDiffCheck has one caller, and exits early if CanUseDiffCheck
is false. Hence, we can get/set CanUseDiffCheck in the caller to avoid
wastefully calling tryToCreateDiffCheck. This patch is an NFC
simplification of program logic.
Following up to 933f49248, also update the code reasoning about
backwards dependences to support non-constant distances.
Update the code to use the signed minimum distance instead of a constant
distance.
This means we only check the lower bound of the dependence distance; the
actual distance may be larger at runtime (and safe for vectorization).
Whether to classify it as Unknown or Backwards depends on the vector
width, so LAA was updated to take TTI to get the maximum vector register
width.
If the minimum dependence distance is larger than the max vector width,
we consider the dependence backwards-vectorizable. Otherwise we classify
it as Unknown, so we re-try with runtime checks.
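Roughly, the resulting classification looks like this (my simplification with hypothetical names, not the exact LAA code):
```c++
#include <cstdint>

enum class DepKind { Unknown, BackwardVectorizable };

// My simplification of the classification above (hypothetical names):
// MinDistanceBytes is the proven lower bound on the dependence distance,
// MaxVectorWidthBytes is derived from TTI's register width.
DepKind classifyMinDistance(uint64_t MinDistanceBytes,
                            uint64_t MaxVectorWidthBytes) {
  if (MinDistanceBytes > MaxVectorWidthBytes)
    return DepKind::BackwardVectorizable; // safe up to the register width
  return DepKind::Unknown; // give up statically, retry with runtime checks
}
```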
PR: https://github.com/llvm/llvm-project/pull/91525
Instead of passing LoopAccessInfo only to fetch the MemoryDepChecker,
directly pass MemoryDepChecker. This simplifies the code and also allows
new uses in places where no LAI is available.
Code checking stores to invariant addresses and reductions made an
incorrect assumption that the case of both a load & store to the same
invariant address does not need to be handled.
In some cases when vectorizing with runtime checks, there may be
dependences with a load and store to the same address, storing a
reduction value.
Update LAA to separately track whether there was a store-store and a
load-store dependence with an invariant address.
Bail out early if there was a load-store dependence with an invariant
address. If there was only a store-store one, still apply the logic
checking whether all stores store a reduction.
As discussed in https://github.com/llvm/llvm-project/pull/88039, support
different strides with isSafeDependenceDistance by passing the maximum
of both strides.
isSafeDependenceDistance tries to prove that
|Dist| > BackedgeTakenCount * Step
holds. Choosing the maximum stride computes the maximum range accessed by
the loop for all strides.
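Sketched with plain integers (the real check proves this over SCEVs; the names are mine):
```c++
#include <cstdint>
#include <cstdlib>

// Sketch with plain integers (the real check reasons about SCEVs): each
// access moves by at most MaxStride bytes per iteration, so over the
// whole loop it stays within BTC * MaxStride bytes of its start. A
// dependence distance larger than that range can never be reached.
bool distanceExceedsLoopRange(int64_t Dist, uint64_t BTC,
                              uint64_t MaxStride) {
  return (uint64_t)std::llabs(Dist) > BTC * MaxStride;
}
```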
PR: https://github.com/llvm/llvm-project/pull/90036
Extend LoopAccessAnalysis to support different strides and, as a
consequence, non-constant distances between dependences, using SCEV to
reason about the direction of the dependence.
In multiple places, logic to rule out dependences using the stride has
been updated to only be used if StrideA == StrideB, i.e. there's a
common stride.
We may now also bail out at multiple places where
FoundNonConstantDistanceDependence may have to be set. This is done when
we need to bail out and the distance is not constant, to preserve the
original behavior.
Fixes https://github.com/llvm/llvm-project/issues/87336
PR: https://github.com/llvm/llvm-project/pull/88039
LAA currently adds memory locations with their original AATags to AST.
However, scoped alias AATags may be valid only within one loop
iteration, while LAA reasons across iterations.
Fix this by determining which alias scopes are defined inside the loop,
and dropping AATags that reference those scopes.
Fixes https://github.com/llvm/llvm-project/issues/79137.
This changes the AliasSetTracker to track memory locations instead of
pointers in its alias sets. The motivation for this is outlined in an RFC
posted on LLVM discourse:
https://discourse.llvm.org/t/rfc-dont-merge-memory-locations-in-aliassettracker/73336
In the data structures of the AST implementation, I made the choice to
replace the linked list of `PointerRec` entries (that had to go anyway)
with a simple flat vector of `MemoryLocation` objects, but for the
`AliasSet` objects referenced from a lookup table, I retained the
mechanism of a linked list, reference counting, forwarding, etc. The
data structures could be revised in a follow-up change.
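A hedged sketch of the reshaped data structure (simplified stand-in types, not the real AliasSetTracker classes):
```c++
#include <cstdint>
#include <vector>

// Simplified stand-ins, not the real AliasSetTracker types: each alias
// set now holds a flat vector of memory locations (pointer + size)
// instead of a linked list of PointerRec entries keyed by pointer alone.
struct MemLocSketch {
  const void *Ptr;
  uint64_t Size;
};
struct AliasSetSketch {
  std::vector<MemLocSketch> Locations; // flat vector replaces PointerRec list
  unsigned RefCount = 0; // ref counting/forwarding retained as before
};
```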
Vectors are always bit-packed and don't respect the elements' alignment
requirements. This is different from arrays. This means offsets of
vector GEPs need to be computed differently than offsets of array GEPs.
This PR fixes many places that incorrectly rely on the pattern
`DL.getTypeAllocSize(GTI.getIndexedType())`. We replace these with uses
of `GTI.getSequentialElementStride(DL)`, a new helper function added in
this PR.
This changes behavior for GEPs into vectors with element types for which
the (bit) size and alloc size is different. This includes two cases:
* Types with a bit size that is not a multiple of a byte, e.g. i1.
GEPs into such vectors are questionable to begin with, as some elements
are not even addressable.
* Overaligned types, e.g. i16 with 32-bit alignment.
Existing tests are unaffected, but a miscompilation of a new test is fixed.
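To illustrate the two offset rules with concrete numbers (my example, assuming i16 overaligned to 32 bits):
```c++
#include <cstdint>

// My illustration of the two offset rules, assuming i16 overaligned to
// 32 bits: the alloc size is then 4 bytes, but vector elements stay
// bit-packed at their 2-byte type size.
uint64_t arrayElemOffset(uint64_t AllocSizeBytes, uint64_t Idx) {
  return AllocSizeBytes * Idx; // [8 x i16], align 4: offsets 0, 4, 8, ...
}
uint64_t vectorElemOffset(uint64_t TypeSizeBits, uint64_t Idx) {
  return TypeSizeBits * Idx / 8; // <8 x i16>: offsets 0, 2, 4, ...
}
```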
---------
Co-authored-by: Nikita Popov <github@npopov.com>
With commit https://reviews.llvm.org/D152366 I introduced functionality
that permitted the hoisting of runtime memory checks from a vectorised
inner loop to the preheader of the next outer-most loop. This is useful
for benchmarks like SPEC2017's x264 where the inner loop is vectorised
and only has a small trip count. In such cases the runtime memory checks
become expensive and since the checks never fail in the case of x264 it
makes sense to do this. However, this behaviour was controlled by the
flag -hoist-runtime-checks which was off by default.
This patch enables this flag by default for all targets, since I believe
this is a generally beneficial thing to do. I have tested this with
SPEC2017 and I see 2.3% and 2.6% improvements with x264 on neoverse-v1
and neoverse-n1, respectively. Similarly, I saw slight improvements in
the overall geomean on both machines. The only other notable changes
were a 1% drop in the roms benchmark, which was compensated for by a 1%
improvement in fotonik3d.
When attempting to hoist runtime checks out of a loop we currently avoid
creating pointer diff checks and prefer to do expanded range checks
instead. This gives us the opportunity to hoist runtime checks out of a
loop, since these checks are loop invariant. However, in some cases the
pointer diff checks would also be loop invariant and so will naturally
get hoisted. Therefore, since diff checks are cheaper, we should prefer
to use those instead.
As shown in #70473, the following loop was not considered safe to
vectorize. When determining the memory access dependencies in a loop
with a negative iteration step, we invert the source and sink of the
dependence. Instead, we should just invert the operands to
getMinusSCEV(). This way the dependence is no longer incorrectly
reported, since the users of the `IsWrite` variables, which correspond
to each of the memory accesses, rely on program order and therefore
should not be swapped.
```c
void vectorizable_Read_Write(int *A) {
  for (int i = 1022; i >= 0; i--)
    A[i+1] = A[i] + 1;
}
```
After 9645267, TypeByteSize is 0 if the two accesses do not have the same
size (i.e. HasSameSize will be false). This can cause an infinite loop
in couldPreventStoreLoadForward, if HasSameSize is not checked first.
So check HasSameSize first instead of after
couldPreventStoreLoadForward. Checking HasSameSize first is also
cheaper.
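A simplified sketch of the reordered logic (hypothetical names, not the in-tree code):
```c++
#include <cassert>
#include <cstdint>

enum class DepResult { Unknown, NeedsForwardCheck };

// Simplified sketch (hypothetical names): the size test comes first, so
// the forwarding check below never sees TypeByteSize == 0, which would
// make its loop over multiples of TypeByteSize run forever.
DepResult classifyAccessPair(bool HasSameSize, uint64_t TypeByteSize) {
  if (!HasSameSize)
    return DepResult::Unknown; // cheap bail-out; TypeByteSize is 0 here
  assert(TypeByteSize > 0 && "same-size accesses have a non-zero size");
  // Only now is it safe to run the store-load forwarding check, which
  // iterates over multiples of TypeByteSize.
  return DepResult::NeedsForwardCheck;
}
```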
This patch refactors the logic to compute the dependence distance,
stride, size and write info to a separate function. This limits the
scope of various A* and B* variables, which in turn makes it easier to
reason about their uses.
In particular this makes it more explicit why dropping the various
std::swaps as done in https://github.com/llvm/llvm-project/pull/70819/
is valid.
This patch adds a new dependence kind UnsafeIndirect, for cases where at
least one of the memory access instructions may access a loop-varying
object, e.g. when the address of the underlying object is loaded inside
the loop, as in A[B[i]].
We cannot determine direction or distance in those cases, and also are unable
to generate any runtime checks.
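For example (a hypothetical loop of the A[B[i]] shape described above):
```c++
// Hypothetical loop of the A[B[i]] shape described above: the address of
// the stored-to element depends on a value loaded inside the loop, so no
// direction, distance, or runtime check can be computed for the access.
void indirectStore(int *A, const int *B, int n) {
  for (int i = 0; i < n; ++i)
    A[B[i]] = i; // loop-varying, indirectly addressed store
}
```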
This fixes a miscompile, if we attempt to generate runtime checks for
unknown dependencies.
Note that in most cases we do not attempt to generate runtime checks for
unknown dependences, except if FoundNonConstantDistanceDependence is
true.
Fixes https://github.com/llvm/llvm-project/issues/69744.
Given a function like the following: https://godbolt.org/z/T9c99fr88
```c
void s1161_noReadWrite(int *Preds) {
for (int i = 0; i < LEN_1D-1; ++i) {
if (Preds[i] != 0)
b[i] = c[i] + 1;
else
a[i] = i * i;
}
}
```
LLVM will optimize the IR to a single store whose address is selected by
a phi instruction:
```llvm
%1 = load ptr, ptr @a, align 64
%2 = load ptr, ptr @b, align 64
...
for.inc:
%.sink = phi ptr [ %1, %if.then ], [ %2, %if.else ]
%add.sink = phi double [ %add, %if.then ], [ %conv8, %if.else ]
%arrayidx7 = getelementptr inbounds double, ptr %.sink, i64 %indvars.iv
store double %add.sink, ptr %arrayidx7, align 8
```
LAA is currently unable to analyze such IR, since ScalarEvolution will
return a SCEVUnknown for the forked pointer operand of the store.
This patch adds initial optional support for analyzing both
possibilities for the pointer and allowing LAA to generate runtime
checks for the bounds if required. This is similar to D108699, but here
it addresses the phi node.
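Conceptually (a hedged sketch with made-up names, not the LAA data structures), the forked pointer is tracked as its two possible bases:
```c++
// Hedged conceptual sketch (made-up names, not the LAA data structures):
// rather than bailing on the SCEVUnknown for %.sink, track both phi
// incomings and emit a [start, end) bound group for each candidate.
struct ForkedPointerBounds {
  const void *CandidateLowA, *CandidateHighA; // range based on @a
  const void *CandidateLowB, *CandidateHighB; // range based on @b
};
// The runtime checks then test every other access group against both
// candidate ranges, so the single store is covered whichever side the
// phi selects at runtime.
```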
Fixes https://github.com/llvm/llvm-project/issues/64888
Reviewed By: huntergr-arm, fhahn
Differential Revision: https://reviews.llvm.org/D158965
Suppose we have a nested loop like this:
```c
void foo(int32_t *dst, int32_t *src, int m, int n) {
  for (int i = 0; i < m; i++) {
    for (int j = 0; j < n; j++) {
      dst[(i * n) + j] += src[(i * n) + j];
    }
  }
}
```
We currently generate runtime memory checks as a precondition for
entering the vectorised version of the inner loop. However, if the
runtime-determined trip count for the inner loop is quite small
then the cost of these checks becomes quite expensive. This patch
attempts to mitigate these costs by adding a new option to
expand the memory ranges being checked to include the outer loop
as well. This leads to runtime checks that can then be hoisted
above the outer loop. For example, rather than looking for a
conflict between the memory ranges:
1. &dst[(i * n)] -> &dst[(i * n) + n]
2. &src[(i * n)] -> &src[(i * n) + n]
we can instead look at the expanded ranges:
1. &dst[0] -> &dst[((m - 1) * n) + n]
2. &src[0] -> &src[((m - 1) * n) + n]
which are outer-loop-invariant. As with many optimisations there
is a trade-off here, because there is a danger that by using the
expanded ranges we may never enter the vectorised inner loop,
whereas with the smaller ranges we might enter at least once.
I have added a HoistRuntimeChecks option that is turned off by
default, but can be enabled for workloads where we know this is
guaranteed to be of real benefit. In future, we can also use
PGO to determine if this is worthwhile by using the inner loop
trip count information.
When enabling this option for SPEC2017 on neoverse-v1 with the
flags "-Ofast -mcpu=native -flto" I see an overall geomean
improvement of ~0.5%:
SPEC2017 results (+ is an improvement, - is a regression):
520.omnetpp: +2%
525.x264: +2%
557.xz: +1.2%
...
GEOMEAN: +0.5%
I didn't investigate all the differences to see if they are
genuine or noise, but I know the x264 improvement is real because
it has some hot nested loops with low trip counts where I can
see this hoisting is beneficial.
Tests have been added here:
Transforms/LoopVectorize/runtime-checks-hoist.ll
Differential Revision: https://reviews.llvm.org/D152366
`MaxSafeDepDistBytes` was not correct based on its name and semantics
in instances where there was a non-unit stride loop. For example,
```c
for (int k = 0; k < len; k += 3) {
  a[k] = a[k+4];
  a[k+2] = a[k+6];
}
```
Here, the smallest dependence distance is 24 bytes, but only vectorizing 8 bytes
is safe. `MaxSafeVectorWidthInBits` correctly reported the safely
vectorizable width as 64 bits.
The semantics of `MaxSafeDepDistBytes` should be:
The smallest dependence distance in bytes in the loop. This may not be
the same as the maximum number of bytes that are safe to operate on
simultaneously.
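Applying that distinction to the loop above (my arithmetic and a hypothetical helper, not LAA code):
```c++
#include <cstdint>

// My arithmetic for the loop above (hypothetical helper, not LAA code):
// i32 elements (4 bytes), a stride of 3 elements per iteration, and a
// minimum dependence distance of 24 bytes.
uint64_t maxSafeWidthBits(uint64_t MinDepDistBytes, uint64_t StrideElems,
                          uint64_t TypeByteSize) {
  // Iterations fitting inside the dependence distance bound how many
  // elements may be processed simultaneously.
  uint64_t SafeIters = MinDepDistBytes / (StrideElems * TypeByteSize);
  return SafeIters * TypeByteSize * 8;
}
// maxSafeWidthBits(24, 3, 4) == 64 bits (8 bytes), matching the message,
// while MinDepDistBytes itself remains 24: the quantities differ.
```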
The variable is renamed to `MinDepDistBytes` to reflect those semantics,
and its docstring is updated accordingly.
A debug message that used `MaxSafeDepDistBytes` to signify to the user
how many bytes could be accessed in parallel is updated to use
`MaxSafeVectorWidthInBits` instead. That way, the same message is
communicated to the user, just in different units.
This patch makes sure that when `MinDepDistBytes` is modified in a way
that should impact `MaxSafeVectorWidthInBits`, we update the latter
accordingly. This patch also clarifies why `MaxSafeVectorWidthInBits`
does not need to be updated when `MinDepDistBytes` is (i.e. in the case
of a forward dependency).
Differential Revision: https://reviews.llvm.org/D156158
It was `inaccessiblemem: readwrite` before, no need for the read.
No real benefit is expected but it can help debugging and other efforts.
Differential Revision: https://reviews.llvm.org/D156478
After 572cfa3fde5433, isUniform now checks VF-based uniformity instead of
just invariance as before.
As follow-up cleanup suggested in D148841, separate the invariance check
out and update callers that currently check only for invariance.
This also moves the implementation of isUniform from LoopAccessAnalysis
to LoopVectorizationLegality, as LoopAccessAnalysis doesn't use the more
general isUniform.
If the step is not loop-invariant, we cannot create a modified AddRec,
as the start needs to be loop-invariant. Mark those cases as
CannotAnalyze and bail out, to fix a crash.
This patch uses SCEV to check if a value is uniform across a given VF.
The basic idea is to construct SCEVs where the AddRecs of the loop are
adjusted to reflect the version in the vectorized loop (Step multiplied
by VF). We construct a SCEV for the value of vector lane 0 (offset 0)
and compare it to the expressions for lanes 1 through the last vector
lane (VF - 1). If they are equal, the expression is considered uniform.
While rewriting expressions, we also need to catch expressions for which
we cannot determine uniformity (e.g. SCEVUnknown).
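As a toy model of the lane-rewriting idea (my simplification; it only covers affine expressions, which stand in for SCEV AddRecs):
```c++
#include <cstdint>

// Toy model covering only affine expressions value(iv) = Start + Step*iv,
// a stand-in for SCEV AddRecs; anything richer (e.g. SCEVUnknown) must
// conservatively be treated as not uniform.
struct Affine {
  int64_t Start, Step;
  bool operator==(const Affine &O) const {
    return Start == O.Start && Step == O.Step;
  }
};

// In the vector loop, lane L starts L steps ahead of lane 0 and the
// induction advances by Step * VF per vector iteration.
Affine laneExpr(Affine V, int64_t L, int64_t VF) {
  return {V.Start + L * V.Step, V.Step * VF};
}

// Uniform across VF iff the expressions of lanes 1..VF-1 equal lane 0's.
bool isUniformAcrossVF(Affine V, int64_t VF) {
  for (int64_t L = 1; L < VF; ++L)
    if (!(laneExpr(V, L, VF) == laneExpr(V, 0, VF)))
      return false;
  return true;
}
```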
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D148841
The last use was removed by:
commit d5b840131223f2ffef4e48ca769ad1eb7bb1869a
Author: Philip Reames <preames@rivosinc.com>
Date: Thu May 11 08:10:49 2023 -0700
Mostly just avoiding the need to keep both Value and SCEVs flowing through with consistent handling. We can do everything in terms of SCEV - aside from the profitability heuristics which are now isolated in one spot.
LV/LAA will speculate that (some) strided access patterns have unit stride, and insert runtime checks if required.
LV cost models a multiply by such a stride as free. We did this by keeping around the StrideSet structure, just to check if one of the operands were one of the strides we speculated.
We can instead just ask PredicatedScalarEvolution if either of the operands are one (after predicates are applied). We get mostly the same result - PSE can prove it in more cases in theory - and simpler code.
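A sketch of the simpler query (simplified; the in-tree code may differ, but PredicatedScalarEvolution::getSCEV is the public entry point used here):
```c++
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"

using namespace llvm;

// Simplified sketch (the in-tree code may differ): instead of consulting
// a saved StrideSet, ask PSE for the operand's SCEV with predicates
// applied and test whether it is the constant 1.
static bool isSpeculatedUnitStride(PredicatedScalarEvolution &PSE,
                                   Value *Op) {
  const SCEV *S = PSE.getSCEV(Op);
  if (const auto *C = dyn_cast<SCEVConstant>(S))
    return C->getAPInt().isOne();
  return false;
}
```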
The original commit wasn't quite NFC, and this was caught by an arguably overly strong assert. Specifically, I'd failed to strip the integer cast off the SCEV before saving it in the map. The result - other than a failed assert - is that we'd speculate on the casted unknown, not the unknown. The only case I can think of where that might change behavior would be a sext(i1 load). I doubt that case is interesting in practice, but it's good to be strictly NFC on this change regardless.
Original commit message follows..
The existing code makes it hard to tell that collectStridedAccess is really about identifying some loop invariant SCEV which is *profitable* to speculate is equal to one. The odd dual usage structure of Value and SCEV confuses this point.
We could choose to loosen the profitability analysis if desired. I'm not proposing doing so at this time as it exposes too many cases where the speculation is unprofitable.
Differential Revision: https://reviews.llvm.org/D147750
This reverts commit d5b840131223f2ffef4e48ca769ad1eb7bb1869a. Running this through broader testing after rebasing is revealing a crash. Reverting while I investigate.
The existing code makes it hard to tell that collectStridedAccess is really about identifying some loop invariant SCEV which is *profitable* to speculate is equal to one. The odd dual usage structure of Value and SCEV confuses this point.
We could choose to loosen the profitability analysis if desired. I'm not proposing doing so at this time as it exposes too many cases where the speculation is unprofitable.
Differential Revision: https://reviews.llvm.org/D147750
This is purely so that we can expose and work through downstream codegen issues. My intention is to see if we can get this disabled by default, but that requires fixing a bunch of downstream issues first.