llvm-project

Author	SHA1	Message	Date
Min-Yih Hsu	f6315a9572	[AArch64][LoopIdiom] Disable LoopIdiomTransform when NoImplicitFloat is present (#87677) This behavior is aligned with both LoopVectorizer and SLPVectorizer.	2024-04-08 09:10:23 -07:00
David Sherwood	fca6992be1	[AArch64] Fix a minor issue with AArch64LoopIdiomTransform (#78136) I found another case where in the end block we could have a PHI that we deal with incorrectly. The two incoming values are unique - one of them is the induction variable and another one is a value defined outside the loop, e.g. `%final_val = phi i32 [ %inc, %while.body ], [ %d, %while.cond ]`. We won't correctly select between the two values in the new end block that we create, and so we will get the wrong result.	2024-01-17 14:30:06 +00:00
David Sherwood	ccaf9e0bc0	[AArch64] Enable AArch64 loop idiom transform pass (#77480) Following on from https://github.com/llvm/llvm-project/pull/72273 which added the new AArch64 loop idiom transformation pass, this patch enables the pass by default for AArch64.	2024-01-10 10:03:14 +00:00
David Sherwood	c7148467fc	[AArch64] Add an AArch64 pass for loop idiom transformations (#72273) We have added a new pass that looks for loops such as the following: ``` while (i != max_len) if (a[i] != b[i]) break; ... use index i ... ``` Although similar to a memcmp, this is slightly different because instead of returning the difference between the values of the first non-matching pair of bytes, it returns the index of the first mismatch. As such, we are not able to lower this to a memcmp call. The new pass can now spot such idioms and transform them into a specialised predicated loop that gives a significant performance improvement for AArch64. It is intended as a stop-gap solution until this can be handled by the vectoriser, which doesn't currently deal with early exits. This specialised loop makes use of a generic intrinsic that counts the trailing zero elements in a predicate vector. This was added in https://reviews.llvm.org/D159283 and for SVE we end up with brkb & incp instructions. Although we have added this pass only for AArch64, it was written in a generic way so that in theory it could be used by other targets. Currently the pass requires scalable vector support and needs to know the minimum page size for the target; however, it's possible to make it work for fixed-width vectors too. Also, the llvm.experimental.cttz.elts intrinsic used by the pass has generic lowering, but can be made efficient for targets with instructions similar to SVE's brkb, cntp and incp. The original version of the patch was posted on Phabricator (https://reviews.llvm.org/D158291). Patch co-authored by Kerry McLaughlin (@kmclaughlin-arm) and David Sherwood (@david-arm). See the original discussion on Discourse: https://discourse.llvm.org/t/aarch64-target-specific-loop-idiom-recognition/72383	2024-01-09 11:29:28 +00:00

4 Commits
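
For reference, the scalar idiom described in the #72273 commit message can be sketched in C roughly as follows. This is only an illustration of the source-level pattern, not code from the pass or its tests: the function name `first_mismatch`, the `start` parameter, the element and index types, and the explicit `++i` are all assumptions made to keep the example complete. The point it shows is that the loop yields the index of the first differing byte, whereas `memcmp` only reports the sign of the difference, which is why the loop cannot be lowered to a `memcmp` call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper illustrating the "first mismatch index" idiom.
 * Returns the index of the first position in [start, max_len) where
 * a[i] != b[i], or max_len if the two ranges match throughout.
 * Assumes start <= max_len and that both buffers hold max_len bytes. */
static uint32_t first_mismatch(const uint8_t *a, const uint8_t *b,
                               uint32_t start, uint32_t max_len) {
    assert(start <= max_len);
    uint32_t i = start;
    while (i != max_len) {
        if (a[i] != b[i])
            break;          /* early exit: i is the mismatch index */
        ++i;
    }
    return i;               /* the caller uses the index, not a difference */
}
```

Calling `first_mismatch` on the byte strings "loop idiom" and "loop ideal" with `start = 0` and `max_len = 10` returns 7, the position of the first differing byte.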