llvm-project

Author	SHA1	Message	Date
Ramkumar Ramachandra	3a4376b8f9	LAA: handle 0 return from getPtrStride correctly (#124539 ) getPtrStride returns 0 when the PtrScev is loop-invariant, and this is not an erroneous value: it returns std::nullopt to communicate that it was not able to find a valid pointer stride. In analyzeLoop, we call getPtrStride with a value_or(0) conflating the zero return value with std::nullopt. Fix this, handling loop-invariant loads correctly.	2025-01-27 14:21:14 +00:00
David Sherwood	b7286dbef9	Reland "[LoopVectorize] Add support for reverse loops in isDereferenceableAndAlignedInLoop #96752 " (#123616 ) The last attempt failed a sanitiser build because we were creating a reference to a null Predicates pointer in isDereferenceableAndAlignedInLoop. This was exposed by the unit test IsDerefReadOnlyLoop in unittests/Analysis/LoadsTest.cpp. I fixed this by falling back on getConstantMaxBackedgeTakenCount if Predicates is null - see line 316 in llvm/lib/Analysis/Loads.cpp. There are no other changes.	2025-01-27 11:59:38 +00:00
David Sherwood	a00938eedd	Revert "[LoopVectorize] Add support for reverse loops in isDereferenceableAndAlignedInLoop (#96752 )" (#123057 ) This reverts commit bfedf6460c2cad6e6f966b457d8d27084579dcd8.	2025-01-15 13:56:42 +00:00
David Sherwood	bfedf6460c	[LoopVectorize] Add support for reverse loops in isDereferenceableAndAlignedInLoop (#96752 ) Currently when we encounter a negative step in the induction variable isDereferenceableAndAlignedInLoop bails out because the element size is signed greater than the step. This patch adds support for negative steps in cases where we detect the start address for the load is of the form base + offset. In this case the address decrements in each iteration so we need to calculate the access size differently. I have done this by calling getStartAndEndForAccess from LoopAccessAnalysis.cpp. The motivation for this patch comes from PR #88385 where a reviewer requested reusing isDereferenceableAndAlignedInLoop, but that PR itself does support reverse loops. The changed test in LoopVectorize/X86/load-deref-pred.ll now passes because previously we were calculating the total access size incorrectly, whereas now it is 412 bytes and fits perfectly into the alloca.	2025-01-15 12:47:43 +00:00
Ramkumar Ramachandra	8b4561467e	LAA: add missed swap when inverting src, sink (#122254 ) When inverting source and sink on a negative induction step, the types of the source and sink should also be swapped. This fixes a bug in the code that follows, that computes properties based on these types. With 234cc40 ([LAA] Limit no-overlap check to at least one loop-invariant accesses.), that code is guarded by a loop-invariant condition: however, the commit did not add any new tests exercising the guarded code, and hence the bugfix in this patch requires additional tests to exercise that guarded codepath.	2025-01-13 13:07:19 +00:00
Ramkumar Ramachandra	17912f336b	LAA: refactor dependence class to prep for scaled strides (NFC) (#122113 ) Rearrange the DepDistanceAndSizeInfo struct in preparation to scale strides. getDependenceDistanceStrideAndSize now returns the data of CommonStride, MaxStride, and clarifies when to retry with runtime checks, in place of (unscaled) strides.	2025-01-09 16:05:17 +00:00
Nikita Popov	bc0976ed1f	[LAA] Strip non-inbounds offset in getPointerDiff() (NFC) (#118665 ) I believe that this code doesn't care whether the offsets are known to be inbounds a priori. For the same reason the change is not testable, as the SCEV based fallback code will look through non-inbounds offsets anyway. So make it clear that there is no special inbounds requirement here.	2024-12-10 13:05:34 +01:00
Ramkumar Ramachandra	aa5cdcea39	LAA: improve code in a couple of routines (NFC) (#108092 )	2024-11-28 16:15:45 +00:00
Florian Hahn	a353e258ba	[LAA] Don't require Stride == 1/-1 for inbounds pointer AddRecs nowrap. (#113126 ) If we have a pointer AddRec, the maximum increment is 2^(pointer-index-width - 1) - 1. This means that if incrementing the AddRec wraps, the distance between the previously accessed location and the wrapped location is > 2^(pointer-index-width - 1), i.e. if the GEP for the AddRec is inbounds, this would be poison due to the object being larger than half the pointer index type space. The poison would be immediate UB when the memory access gets executed. Similar reasoning can be applied for decrements. PR: https://github.com/llvm/llvm-project/pull/113126	2024-11-05 22:45:56 +01:00
Ramkumar Ramachandra	d897ea37db	LAA: check nusw on GEP in place of inbounds (#112223 ) With the introduction of the nusw flag in GEPNoWrapFlags, it should be safe to weaken the check in LoopAccessAnalysis to just check the nusw flag on the GEP, instead of inbounds.	2024-10-22 09:58:54 +01:00
Ramkumar Ramachandra	f719cfa868	LAA: be less conservative in isNoWrap (#112553 ) isNoWrap has exactly one caller which handles Assume = true separately, but too conservatively. Instead, pass Assume to isNoWrap, so it is threaded into getPtrStride, which has the correct handling for the Assume flag. Also note that the Stride == 1 check in isNoWrap is incorrect: getPtrStride returns Strides == 1 or -1, except when isNoWrapAddRec or Assume are true, assuming ShouldCheckWrap is true; we can include the case of -1 Stride, and when isNoWrapAddRec is true. With this change, passing Assume = true to getPtrStride could return a non-unit stride, and we correctly handle that case as well.	2024-10-22 09:55:51 +01:00
Kazu Hirata	0614b3cfac	[Analysis] Simplify code with DenseMap::operator[] (NFC) (#111331 )	2024-10-07 07:00:45 -07:00
Florian Hahn	dec4cfdb09	[LAA] Use loop guards when checking invariant accesses. Apply loop guards to start and end pointers like done in other places to improve results.	2024-10-04 12:23:13 +01:00
Benjamin Maxwell	50a1ab12ab	[LAA] Don't assume libcalls with output/input pointers can be vectorized (#108980 ) LoopAccessAnalysis currently does not check/track aliasing from the output pointers, but assumes vectorizing library calls with a mapping is safe. This can result in incorrect codegen if something like the following is vectorized: ``` for(int i=0; i<N; i++) { // No aliasing between input and output pointers detected. sincos(cos_out[0], sin_out+i, cos_out+i); } ``` Where for VF >= 2 `cos_out[1]` to `cos_out[VF-1]` is the cosine of the original value of `cos_out[0]` not the updated value.	2024-09-23 16:05:55 +01:00
Florian Hahn	d43a80936d	Revert "[LAA] Remove loop-invariant check added in 234cc40adc61." This reverts commit a80053322b765eec93951e21db490c55521da2d8. The new asserts exposed an underlying issue where the expanded bounds could wrap, causing the parts of the code to incorrectly determine that accesses do not overlap. Reproducer below based on @mstorsjo's test case. opt -passes='print<access-info>' target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64" define i32 @j(ptr %P, i32 %x, i32 %y) { entry: %gep.P.4 = getelementptr inbounds nuw i8, ptr %P, i32 4 %gep.P.8 = getelementptr inbounds nuw i8, ptr %P, i32 8 br label %loop loop: %1 = phi i32 [ %x, %entry ], [ %sel, %loop.latch ] %iv = phi i32 [ %y, %entry ], [ %iv.next, %loop.latch ] %gep.iv = getelementptr inbounds i64, ptr %gep.P.8, i32 %iv %l = load i32, ptr %gep.iv, align 4 %c.1 = icmp eq i32 %l, 3 br i1 %c.1, label %loop.latch, label %if.then if.then: ; preds = %for.body store i64 0, ptr %gep.iv, align 4 %l.2 = load i32, ptr %gep.P.4 br label %loop.latch loop.latch: %sel = phi i32 [ %l.2, %if.then ], [ %1, %loop ] %iv.next = add nsw i32 %iv, 1 %c.2 = icmp slt i32 %iv.next, %sel br i1 %c.2, label %loop, label %exit exit: %res = phi i32 [ %iv.next, %loop.latch ] ret i32 %res }	2024-08-27 11:55:47 +01:00
Florian Hahn	a80053322b	[LAA] Remove loop-invariant check added in 234cc40adc61. 234cc40adc61 introduced a loop-invariance check to limit the compile-time impact of the newly added checks. This patch removes the restriction and avoids extra compile-time impact by sinking the check to exits where we would return an unknown dependence. This notably reduces how often the extra checks are executed while not missing out on any improvements from them. https://llvm-compile-time-tracker.com/compare.php?from=33e7cd6ff23f6c904314d17c68dc58168fd32d09&to=7c55e66d4f31ce8262b90c119a8e84e1f9515ff1&stat=instructions:u	2024-08-26 10:24:00 +01:00
Florian Hahn	d7c84d7b71	[LAA] Collect loop guards only once in MemoryDepChecker (NFCI). This on its own gives small compile-time improvements in some configs and enables using loop guards at more places in the future while keeping compile-time impact low. https://llvm-compile-time-tracker.com/compare.php?from=c44202574ff9a8c0632aba30c2765b134557435f&to=55ffc3dd920fa9af439fd39f8f9cc13509531420&stat=instructions:u	2024-08-21 08:28:52 +01:00
Nikita Popov	6a84af704f	[LAA] Use computeConstantDifference() (#103725 ) Use computeConstantDifference() instead of casting getMinusSCEV() to SCEVConstant. This can be much faster in some cases, because computeConstantDifference() computes the result without creating new SCEV expressions. This improves LTO/ThinLTO compile-time for lencod by more than 10%. I've verified that computeConstantDifference() does not produce worse results than the previous code for anything in llvm-test-suite. This required raising the iteration cutoff to 6. I ended up increasing it to 8 just to be on the safe side (for code outside llvm-test-suite), and because this doesn't materially affect compile-time anyway (we'll almost always bail out earlier).	2024-08-16 12:52:57 +02:00
Florian Hahn	edf46f365c	[SCEV] Use const SCEV * explicitly in more places. Use const SCEV * explicitly in more places to prepare for https://github.com/llvm/llvm-project/pull/91961. Split off as suggested.	2024-08-03 20:10:01 +01:00
Florian Hahn	844c188c79	[LAA] Refine stride checks for SCEVs during dependence analysis. (#99577 ) Update getDependenceDistanceStrideAndSize to reason about different combinations of strides directly and explicitly. Update getPtrStride to return 0 for invariant pointers. Then proceed by checking the strides. If either source or sink are not strided by a constant (i.e. not a non-wrapping AddRec) or invariant, the accesses may overlap with earlier or later iterations and we cannot generate runtime checks to disambiguate them. Otherwise they are either loop invariant or strided. In that case, we can generate a runtime check to disambiguate them. If both are strided by constants, we proceed as previously. This is an alternative to https://github.com/llvm/llvm-project/pull/99239 and also replaces additional checks if the underlying object is loop-invariant. Fixes https://github.com/llvm/llvm-project/issues/87189. PR: https://github.com/llvm/llvm-project/pull/99577	2024-07-26 13:10:16 +01:00
Ramkumar Ramachandra	3eaf9f7237	LAA: fix style after cursory reading (NFC) (#100447 )	2024-07-25 18:08:12 +01:00
Ramkumar Ramachandra	2754c083cb	LAA: mark LoopInfo pointer const (NFC) (#100373 )	2024-07-24 16:52:11 +01:00
Florian Hahn	19c9a1c2fd	[LAA] Include IndirectUnsafe in ::isPossiblyBackward. Similarly to Unknown, IndirectUnsafe should also be considered possibly backward, as it may be a backwards dependency e.g. via loading different base pointers. This also brings isPossiblyBackward in line with Dependence::isSafeForVectorization. At the moment this can't be tested, as it is not possible to write a test with an AddRec that is based on a loop varying value. But this may change in the future and may cause mis-compiles in the future.	2024-07-18 22:09:05 +01:00
Florian Hahn	3ccda93671	[LAA] Update pointer-bounds cache to also consider access type. The same pointer may be accessed with different types and the bound includes the size of the accessed type to compute the end. Update the cache to correctly disambiguate between different accessed types.	2024-07-14 17:24:12 +01:00
Graham Hunter	22a7f6dcc4	Revert "[LV] Autovectorization for the all-in-one histogram intrinsic" (#98493 ) Reverts llvm/llvm-project#91458 to deal with post-commit reviewer requests.	2024-07-11 16:39:30 +01:00
Graham Hunter	1860fd049e	[LV] Autovectorization for the all-in-one histogram intrinsic (#91458 ) This patch implements limited loop vectorization support for the 'all-in-one' histogram intrinsic. The feature is disabled by default, and when enabled will only vectorize if there are no other users of values in the gather-modify-scatter sequence.	2024-07-11 15:33:30 +01:00
Florian Hahn	5028dea652	[LAA] Only invalidate loops that require runtime checks (NFCI). LAA doesn't keep references to IR outside the loop or references to SCEVs that may be invalidated, unless runtime checks are needed (either memory or SCEV predicates). For the current LAA users, it should be sufficient to invalidate entries for loops that require runtime checks, thus avoiding analyzing loops again unnecessarily. This helps reduce compile-time, in particular when removing the restrictions added in 234cc40adc6. https://llvm-compile-time-tracker.com/compare.php?from=73894dba2cdbcc00678d0c13a6b61765675f60b4&to=05c6bdc41b5f63696ebeb7116325725fa94f66d6&stat=instructions:u	2024-07-06 22:14:01 +01:00
Florian Hahn	28be3f8ac5	[LAA] Cache pointer bounds expansions (NFCI). This avoids expanding the same bounds multiple times, which helps reduce the compile-time impact of removing the restrictions added in 234cc40adc6, notably -0.06% on stage1-O3 and -0.05% on both stage1-ReleaseThinLTO and stage1-ReleaseLTO-g. https://llvm-compile-time-tracker.com/compare.php?from=8b9ebc4bb86cf0979e05908cbb04336f2d01dda5&to=fabd36f96c31e47ea72653f5a404feaadfc7b5b5&stat=instructions:u	2024-07-04 10:00:05 +01:00
Nikita Popov	2d209d964a	[IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902 ) This is a helper to avoid writing `getModule()->getDataLayout()`. I regularly try to use this method only to remember it doesn't exist... `getModule()->getDataLayout()` is also a common (the most common?) reason why code has to include the Module.h header.	2024-06-27 16:38:15 +02:00
Kazu Hirata	1462605ab0	[Analysis] Use range-based for loops (NFC) (#96587 )	2024-06-25 06:57:30 -07:00
Ramkumar Ramachandra	0f111ba790	LoopInfo: introduce Loop::getLocStr; unify debug output (#93051 ) Introduce a Loop::getLocStr stolen from LoopVectorize's static function getDebugLocString in order to have uniform debug output headers across LoopVectorize, LoopAccessAnalysis, and LoopDistribute. The motivation for this change is to have UpdateTestChecks recognize the headers and automatically generate CHECK lines for debug output, with minimal special-casing.	2024-06-25 13:12:15 +01:00
Ramkumar Ramachandra	5ae50698a0	LAA: strip unnecessary getUniqueCastUse (#92119 ) 733b8b2 ([LAA] Simplify identification of speculatable strides [nfc]) refactored getStrideFromPointer() to compute directly on SCEVs, and return an SCEV expression instead of a Value. However, it left behind a call to getUniqueCastUse(), which is completely unnecessary. Remove this, showing a positive test update, and simplify the surrounding program logic.	2024-06-24 22:49:02 +01:00
Ramkumar Ramachandra	18a8983c36	LAA: refactor analyzeLoop to return bool (NFC) (#93824 ) Avoid wastefully setting CanVecMem in several places in analyzeLoop, complicating the logic, to get the function to return a bool, and set CanVecMem in the caller.	2024-06-11 19:39:02 +01:00
Florian Hahn	e949b54a5b	[LAA] Use PSE::getSymbolicMaxBackedgeTakenCount. (#93499 ) Update LAA to use PSE::getSymbolicMaxBackedgeTakenCount which returns the minimum of the countable exits. When analyzing dependences and computing runtime checks, we need the smallest upper bound on the number of iterations. In terms of memory safety, it shouldn't matter if any uncomputable exits leave the loop, as long as we prove that there are no dependences given the minimum of the countable exits. The same should apply also for generating runtime checks. Note that this shifts the responsibility of checking whether all exit counts are computable or handling early-exits to the users of LAA. Depends on https://github.com/llvm/llvm-project/pull/93498 PR: https://github.com/llvm/llvm-project/pull/93499	2024-06-04 22:23:30 +01:00
Florian Hahn	1880a7bf18	[LAA] Move getDependenceDistanceStrideAndSize to MemoryDepChecker (NFC). This avoids unnecessarily passing a number of parameters, and avoids needing to add extra parameters in the future.	2024-05-29 14:18:05 -07:00
Florian Hahn	b74f50a269	[LAA] Store reference to SymbolicStrides in MemoryDepChecker (NFC). This reduces the need for explicitly passing it through multiple layers of function calls.	2024-05-29 13:37:13 -07:00
Florian Hahn	234cc40adc	[LAA] Limit no-overlap check to at least one loop-invariant accesses. Limit the logic added in https://github.com/llvm/llvm-project/pull/9230 to cases where either sink or source are loop-invariant, to avoid compile-time increases. This is not needed for correctness. I am working on follow-up changes to reduce the compile-time impact in general to allow us to enable this again for any source/sink. This should fix the compile-time regression introduced by this change: * compile-time improvement with this change: https://llvm-compile-time-tracker.com/compare.php?from=4351787fb650da6d1bfb8d6e58753c90dcd4c418&to=b89010a2eb5f98494787c1c3b77f25208c59090c&stat=instructions:u * compile-time improvement with original patch reverted on top of this change: https://llvm-compile-time-tracker.com/compare.php?from=b89010a2eb5f98494787c1c3b77f25208c59090c&to=19a1103fe68115cfd7d6472c6961f4fabe81a593&stat=instructions:u	2024-05-28 09:23:02 -07:00
Ramkumar Ramachandra	b6468766f7	[LAA] refactor program logic (NFC) (#92101 ) Implement NFC improvements spotted during a cursory reading of LoopAccessAnalysis.	2024-05-23 12:10:21 +01:00
Florian Hahn	1b377dbeb7	[LAA] Check accesses don't overlap early to determine NoDep (#92307 ) Use getStartAndEndForAccess to compute the start and end of both src and sink (factored out to helper in bce3680f45b57f). If they do not overlap (i.e. SrcEnd <= SinkStart || SinkEnd <= SrcStart), there is no dependence, regardless of stride. PR: https://github.com/llvm/llvm-project/pull/92307	2024-05-21 11:00:11 +01:00
Florian Hahn	bce3680f45	[LAA] Move logic to compute start and end of a pointer to helper (NFC). This allows use at other places, in particular an updated version of https://github.com/llvm/llvm-project/pull/92307.	2024-05-20 19:35:19 +01:00
Ramkumar Ramachandra	b6fa78d54c	[LAA] refactor sortPtrAccesses (NFC) (#92256 ) Use the destructuring syntax in C++ and llvm::enumerate to make sortPtrAccesses a little more readable.	2024-05-17 10:41:41 +01:00
Florian Hahn	179efe5abc	[LAA] Delay applying loop guards until after isSafeDependenceDistance. Applying the loop guards to the distance may prevent isSafeDependenceDistance from determining NoDep, unless loop guards are also applied to the backedge-taken-count. Instead of applying the guards to both Dist and the backedge-taken-count, just apply them after handling isSafeDependenceDistance and constant distances; there is no benefit to applying the guards before then. This fixes a regression flagged by @bjope due to ecae3ed958481cba7d60868cf3504292f7f4fdf5.	2024-05-14 19:47:24 +01:00
Ramkumar Ramachandra	08536b0f9c	[LAA] refactor tryToCreateDiffCheck (NFC) (#92110 ) tryToCreateDiffCheck has one caller, and exits early if CanUseDiffCheck is false. Hence, we can get/set CanUseDiffCheck in the caller to avoid wastefully calling tryToCreateDiffCheck. This patch is an NFC simplification of program logic.	2024-05-14 16:19:55 +01:00
Florian Hahn	28767afd53	[LAA] Support backward dependences with non-constant distance. (#91525 ) Following up to 933f49248, also update the code reasoning about backwards dependences to support non-constant distances. Update the code to use the signed minimum distance instead of a constant distance. This means we check the lower bound of the dependence distance, and the distance may be larger at runtime (and safe for vectorization). Whether to classify it as Unknown or Backwards depends on the vector width and LAA was updated to take TTI to get the maximum vector register width. If the minimum dependence distance is larger than the max vector width, we consider it as backwards-vectorizable. Otherwise we classify them as Unknown, so we re-try with runtime checks. PR: https://github.com/llvm/llvm-project/pull/91525	2024-05-10 11:47:13 +01:00
Florian Hahn	ecae3ed958	[LAA] Apply loop guards to dependence distance. After supporting non-constant dependence distances in 933f49248bf, applying information from loop guards can help further disambiguate dependencies.	2024-05-09 18:12:55 +01:00
Florian Hahn	3219c0edb2	[LAA] Directly pass DepChecker to getSource/getDestination (NFC). Instead of passing LoopAccessInfo only to fetch the MemoryDepChecker, directly pass MemoryDepChecker. This simplifies the code and also allows new uses in places where no LAI is available.	2024-05-05 21:16:20 +01:00
Florian Hahn	b54a78d69b	[LV,LAA] Don't vectorize loops with load and store to invar address. Code checking stores to invariant addresses and reductions made an incorrect assumption that the case of both a load & store to the same invariant address does not need to be handled. In some cases when vectorizing with runtime checks, there may be dependences with a load and store to the same address, storing a reduction value. Update LAA to separately track if there was a store-store and a load-store dependence with an invariant address. Bail out early if there was a load-store dependence with invariant address. If there was a store-store one, still apply the logic checking if they all store a reduction.	2024-05-04 20:53:54 +01:00
Florian Hahn	82219e547b	[LAA] Pass maximum stride to isSafeDependenceDistance. (#90036 ) As discussed in https://github.com/llvm/llvm-project/pull/88039, support different strides with isSafeDependenceDistance by passing the maximum of both strides. isSafeDependenceDistance tries to prove that |Dist| > BackedgeTakenCount * Step holds. Choosing the maximum stride computes the maximum range accessed by the loop for all strides. PR: https://github.com/llvm/llvm-project/pull/90036	2024-04-30 12:59:08 +01:00
Florian Hahn	933f49248b	[LAA] Support different strides & non constant dep distances using SCEV. (#88039 ) Extend LoopAccessAnalysis to support different strides and as a consequence non-constant distances between dependences using SCEV to reason about the direction of the dependence. In multiple places, logic to rule out dependences using the stride has been updated to only be used if StrideA == StrideB, i.e. there's a common stride. We now also may bail out at multiple places where we may have to set FoundNonConstantDistanceDependence. This is done when we need to bail out and the distance is not constant to preserve original behavior. Fixes https://github.com/llvm/llvm-project/issues/87336 PR: https://github.com/llvm/llvm-project/pull/88039	2024-04-25 21:38:07 +01:00
Florian Hahn	fe28a0e482	[LAA] Document reasoning in multiple places in isDependent (NFC). (#89381 ) As suggested in https://github.com/llvm/llvm-project/pull/88039, add extra documentation for reasoning in isDependent. PR: https://github.com/llvm/llvm-project/pull/89381	2024-04-22 14:10:05 +01:00
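
Note on 3a4376b8f9 and 844c188c79: the pitfall those messages describe (a query that returns 0 for a loop-invariant pointer and std::nullopt when no valid stride is found, with a caller that collapses both through value_or(0)) can be sketched in isolation. This is a minimal, self-contained illustration only, not LLVM's code: PtrKind, queryStride, hasStrideConflated and hasStride are hypothetical names invented here, and the real getPtrStride takes different arguments.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical classification of a pointer, for illustration only.
enum class PtrKind { Unknown, Invariant, Strided };

// Hypothetical stand-in for a stride query:
//   std::nullopt -> no valid stride could be determined
//   0            -> the pointer is loop-invariant (a valid answer)
//   nonzero      -> constant stride
std::optional<int64_t> queryStride(PtrKind Kind, int64_t Stride = 1) {
  switch (Kind) {
  case PtrKind::Unknown:
    return std::nullopt;
  case PtrKind::Invariant:
    return 0;
  case PtrKind::Strided:
    return Stride;
  }
  return std::nullopt;
}

// Conflating check: value_or(0) maps "unknown" and "invariant" to the same
// value, so a loop-invariant pointer is treated as if the query had failed.
bool hasStrideConflated(PtrKind Kind) {
  return queryStride(Kind).value_or(0) != 0;
}

// Distinguishing check: only std::nullopt counts as failure, so a zero
// stride (loop-invariant pointer) is accepted.
bool hasStride(PtrKind Kind) {
  return queryStride(Kind).has_value();
}
```

With these definitions, hasStrideConflated rejects both the Unknown and the Invariant case, while hasStride rejects only Unknown, which is the distinction the fix restores.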

428 Commits