llvm-project

Author	SHA1	Message	Date
Vigneshwar Jayakumar	616af49f06	[AggressiveInstCombine] Allow load folding for root inst with multiple uses. (#176101 ) The load folding optimization was very conservative by requiring the root OR instruction to have a single use. This prevented the fold even when only the root had multiple uses. For example: %val = or i32 ... ; Assembles 4 bytes to i32 %use1 = call @foo(%val) %use2 = call @bar(%val)	2026-01-16 13:19:50 -06:00
Nikita Popov	2fe39f24b2	[AggressiveInstCombine] Avoid implicit truncation Cast char to unsigned char to match the unsigned ConstantInt constructor.	2026-01-14 10:44:47 +01:00
Yingwei Zheng	c7c6c0a45c	[AggressiveInstCombine] Fix memory location for alias analysis (#169953 ) When LOps.RootInsert comes after LI2, since we use LI2 as the new insert point, we should make sure the memory region accessed by LOps isn't modified. However, the original implementation passes the bit width `LOps.LoadSize` as the number of bytes to be accessed, causing BasicAA to return NoAlias: `a941e15074/llvm/lib/Analysis/BasicAliasAnalysis.cpp (L1658-L1667)` With `-aa-trace`, we get: ``` End ptr getelementptr inbounds nuw (i8, ptr @g, i64 4) @ LocationSize::precise(1), %gep1 = getelementptr i8, ptr %p, i64 4 @ LocationSize::precise(32) = NoAlias ``` This patch uses `getTypeStoreSize` to compute the correct access size for LOps. Instead of modifying the MemoryLocation for End (i.e., `LOps.RootInsert`), it also uses the computed base and AATag for correctness. Closes https://github.com/llvm/llvm-project/issues/169921.	2025-12-01 22:46:16 +08:00
David Green	6abbbca324	[AggressiveInstCombine] Match long high-half multiply (#168396 ) This patch adds recognition of a high-half multiply written out by parts, folding it into a single larger multiply. Considering a multiply made up of high and low parts, we can split the multiply into: x * y == (xh*T + xl) * (yh*T + yl), where `xh == x>>32`, `xl == x & 0xffffffff` and `T = 2^32`. This expands to xh*yh*T*T + xh*yl*T + xl*yh*T + xl*yl, which I find helpful to draw as [ xh*yh ] [ xh*yl ] [ xl*yh ] [ xl*yl ]. We are looking for the "high" half, which is xh*yh + xh*yl>>32 + xl*yh>>32 + carries. The carry makes this difficult and there are multiple ways of representing it. The ones we attempt to support here are: Carry: xh*yh + carry + lowsum; carry = lowsum < xh*yl ? 0x100000000 : 0; lowsum = xh*yl + xl*yh + (xl*yl>>32). Ladder: xh*yh + c2>>32 + c3>>32; c2 = xh*yl + (xl*yl >> 32); c3 = c2&0xffffffff + xl*yh. Carry4: xh*yh + carry + crosssum>>32 + (xl*yl + crosssum&0xffffffff) >> 32; crosssum = xh*yl + xl*yh; carry = crosssum < xh*yl ? 0x100000000 : 0. Ladder4: xh*yh + (xl*yh)>>32 + (xh*yl)>>32 + low>>32; low = (xl*yl)>>32 + (xl*yh)&0xffffffff + (xh*yl)&0xffffffff. They all start by matching `xh*yh` + 2 or 3 other operands. The bottom of the tree is `xh*yh`, `xh*yl`, `xl*yh` and `xl*yl`. Based on #156879 by @c-rhodes	2025-11-27 07:22:41 +00:00
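As an illustration of the kind of source code this fold targets, here is a minimal sketch of the "Ladder" shape for a 64-bit high-half multiply. The function name `mulhi_ladder` is hypothetical and not taken from the patch; the sketch only mirrors the formulas in the commit message.

```cpp
#include <cstdint>

// Hypothetical helper (not from the patch): high 64 bits of the unsigned
// 128-bit product x * y, computed from four 32x32 -> 64-bit partial products
// in the "Ladder" arrangement described above.
uint64_t mulhi_ladder(uint64_t x, uint64_t y) {
  const uint64_t mask = 0xffffffffu;
  const uint64_t xh = x >> 32, xl = x & mask;
  const uint64_t yh = y >> 32, yl = y & mask;

  const uint64_t xhyh = xh * yh;
  const uint64_t xhyl = xh * yl;
  const uint64_t xlyh = xl * yh;
  const uint64_t xlyl = xl * yl;

  // Neither sum can overflow 64 bits: each is at most
  // (2^32 - 1)^2 + (2^32 - 1) = (2^32 - 1) * 2^32.
  const uint64_t c2 = xhyl + (xlyl >> 32);
  const uint64_t c3 = (c2 & mask) + xlyh;

  return xhyh + (c2 >> 32) + (c3 >> 32);
}
```

On compilers that provide `unsigned __int128`, the result can be cross-checked against `(uint64_t)(((unsigned __int128)x * y) >> 64)`, which is roughly the single wide multiply the fold aims to produce.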
Mircea Trofin	52cb6e9d49	[ProfCheck][NFC] Make Function argument from branch weight setter optional (#166032 ) This picks up from #166028, making the `Function` argument optional: most cases don't need to provide it, but in e.g. InstCombine's case, where the instruction (select, branch) is not attached to a function yet, the function needs to be passed explicitly. Co-authored-by: Florian Hahn <flo@fhahn.com>	2025-11-05 07:40:37 -08:00
Orlando Cazalet-Hyams	411be14eab	[AggressiveInstCombine] Merge debug info on merged stores (#164449 ) A bit of debug info maintenance for #147540.	2025-10-22 14:36:21 +01:00
Jin Huang	bf34b2e2df	[profcheck] Add heuristical profile metadata for lowering table-based cttz. (#161898 ) When lowering a `table-based cttz` calculation to the `llvm.cttz` intrinsic, `AggressiveInstCombine` was not attaching profile metadata to the newly generated `select` instruction. This PR adds heuristic branch weights to the `select`. It uses a strong 100-to-1 probability favoring the `cttz` path over the zero-input case. This allows later passes to optimize code layout and branch prediction.	2025-10-08 19:50:28 -07:00
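For context, a table-based cttz in source code usually looks something like the sketch below, which uses the well-known 32-bit de Bruijn multiplication trick. The names `kDeBruijnTable` and `cttz_table` are illustrative assumptions, not identifiers from the patch.

```cpp
#include <cstdint>

// Lookup table for the de Bruijn constant 0x077CB531 (illustrative name).
static const uint8_t kDeBruijnTable[32] = {
    0,  1,  28, 2,  29, 14, 24, 3,  30, 22, 20, 15, 25, 17, 4,  8,
    31, 27, 13, 23, 21, 19, 16, 7,  26, 12, 18, 6,  11, 5,  10, 9};

// Count trailing zeros of v via table lookup. For v == 0 this returns the
// entry at index 0 (here 0), which differs from a plain cttz of zero; that
// is the zero-input case the select has to cover.
unsigned cttz_table(uint32_t v) {
  const uint32_t isolated = v & (0u - v); // keep only the lowest set bit
  const uint32_t index = static_cast<uint32_t>(isolated * 0x077CB531u) >> 27;
  return kDeBruijnTable[index];
}
```

Since a zero input is rare in this idiom, weighting the `select` heavily toward the cttz result is a plausible heuristic.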
Jin Huang	39f292ffa1	[profcheck] Add unknown branch weight for inlined memchr calls. (#160964 ) The memchr inliner creates new switch branches but was failing to add profile metadata. This patch fixes the issue by explicitly adding unknown branch weights to these branches. Issue [#147390](https://github.com/llvm/llvm-project/issues/147390)	2025-09-30 00:24:50 +00:00
Jin Huang	c4a134f591	[profcheck] Add unknown branch weights for inlined strcmp/strncmp (#160455 ) The strcmp/strncmp inliner creates new conditional branches but was failing to add profile metadata. This caused the ProfileVerifierPass to fail when profcheck is enabled. This patch fixes the issue by explicitly adding unknown branch weights to these branches. Issue #147390	2025-09-25 20:11:30 -07:00
Yingwei Zheng	c58e22e0fc	[AggressiveInstCombine] Refactor `foldLoadsRecursive` to use `m_ShlOrSelf` (#155176 ) This patch was a part of https://github.com/llvm/llvm-project/pull/154375. Two functional changes: 1. Allow matching other commuted patterns. 2. Allow combining loads even if there are multiple uses on a load. It is beneficial in practice.	2025-08-25 20:11:10 +08:00
Yingwei Zheng	84b31581f8	Revert "[PatternMatch] Add `m_[Shift]OrSelf` matchers." (#152953 ) Reverts llvm/llvm-project#152924 According to `f67668b586`, it is not an NFC change.	2025-08-11 09:35:16 +02:00
Yingwei Zheng	1c499351d6	[PatternMatch] Add `m_[Shift]OrSelf` matchers. (#152924 ) Address the comment https://github.com/llvm/llvm-project/pull/147414/files#r2228612726. As they are usually used to match integer packing patterns, it is enough to handle constant shamts.	2025-08-11 09:58:16 +08:00
David Green	8f968fe3ec	[AggressiveInstCombine] Make cttz fold more resilient to non-array geps (#150896 ) Similar to #150639, this fixes the AggressiveInstCombine fold that converts tables to cttz instructions when the gep types are not array types, i.e. `gep i16 @glob, i64 %idx` instead of `gep [64 x i16] @glob, i64 0, i64 %idx`.	2025-07-31 16:53:55 +01:00
Nikita Popov	b17f4d3366	[AggressiveInstCombine] Use AA during store merge (#149992 ) This is a small extension of #147540, resolving one of the FIXMEs. Instead of bailing out on any instruction that may read/write memory, use AA to check whether it can alias the stored parts. Do this using a crude check based on the underlying object only. This pattern occurs rarely in practice, but at the same time it also doesn't seem to add any compile-time cost, so it's probably worth handling.	2025-07-23 10:09:58 +02:00
Nikita Popov	07527596f3	[AggressiveInstCombine] Support store merge with non-consecutive parts (#149807 ) This is a minor extension of #147540, resolving one of the FIXMEs. If the collected parts contain some non-consecutive elements, we can still handle smaller ranges that are consecutive. This is not common in practice and mostly shows up when the same value is stored at two different offsets.	2025-07-22 10:15:04 +02:00
Nikita Popov	00d3b39f17	[AggressiveInstCombine] Implement store merge optimization (#147540 ) Merge multiple small stores that were originally extracted from one value into a single store. This is the store equivalent of the load merge optimization that AggressiveInstCombine already performs. This implementation is something of an MVP, with various generalizations possible. Fixes https://github.com/llvm/llvm-project/issues/147456.	2025-07-21 10:03:26 +02:00
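A minimal sketch of the source-level pattern behind this optimization, assuming a little-endian target for the merged form. Both function names are hypothetical and only illustrate the before/after shapes.

```cpp
#include <cstdint>
#include <cstring>

// Split form: four byte stores, each extracted from the same 32-bit value.
// This is the shape the store merge looks for (hypothetical example).
void store_parts(uint8_t *p, uint32_t v) {
  p[0] = static_cast<uint8_t>(v);
  p[1] = static_cast<uint8_t>(v >> 8);
  p[2] = static_cast<uint8_t>(v >> 16);
  p[3] = static_cast<uint8_t>(v >> 24);
}

// Merged form: one 4-byte store of the original value. This writes the same
// bytes as store_parts only on a little-endian target (assumption).
void store_merged(uint8_t *p, uint32_t v) {
  std::memcpy(p, &v, sizeof v);
}
```

The load merge mentioned in the message is the mirror image: several narrow loads combined with shifts and ORs into one wide value.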
Jeremy Morse	9eb0020555	[DebugInfo][RemoveDIs] Remove a swathe of debug-intrinsic code (#144389 ) Seeing how we can't generate any debug intrinsics any more: delete a variety of codepaths where they're handled. For the most part these are plain deletions, in others I've tweaked comments to remain coherent, or added a type to (what was) type-generic-lambdas. This isn't all the DbgInfoIntrinsic call sites but it's most of the simple scenarios. Co-authored-by: Nikita Popov <github@npopov.com>	2025-06-17 15:55:14 +01:00
Craig Topper	96336b2330	[AggressiveInstCombine] Improve popcount matching if the input has known zero bits (#142501 ) If the input has known zero bits, InstCombine may have simplified one of the expected And masks. Teach AggressiveInstCombine to use MaskedValueIsZero to recover these missing bits. Fixes #142042.	2025-06-03 12:56:53 -07:00
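For reference, the classic bit-twiddling population count has the shape sketched below; the function name `popcount_bits` is an assumption for illustration. Each step relies on a specific And mask, which is why a mask narrowed by an earlier simplification can stop the pattern from matching.

```cpp
#include <cstdint>

// Classic parallel bit count for a 32-bit value (illustrative sketch).
unsigned popcount_bits(uint32_t x) {
  x = x - ((x >> 1) & 0x55555555u);                 // 2-bit partial sums
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); // 4-bit partial sums
  x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 // 8-bit partial sums
  return static_cast<uint32_t>(x * 0x01010101u) >> 24; // add the four bytes
}
```

If, say, the upper bits of `x` are known to be zero, a mask such as `0x55555555` may already have been shrunk to cover only the live bits, so the matcher has to accept any mask that agrees with the expected one on bits that can be nonzero.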
Ramkumar Ramachandra	b40e4ceaa6	[ValueTracking] Make Depth last default arg (NFC) (#142384 ) Having a finite Depth (or recursion limit) for computeKnownBits is very limiting, but is currently a load-bearing necessity, as all KnownBits are recomputed on each call and there is no caching. As a prerequisite for an effort to remove the recursion limit altogether, either using a clever caching technique, or writing an easily-invalidable KnownBits analysis, make the Depth argument in APIs in ValueTracking uniformly the last argument with a default value. This would aid in removing the argument when the time comes, as many callers that currently pass 0 explicitly are now updated to omit the argument altogether.	2025-06-03 17:12:24 +01:00
Ramkumar Ramachandra	111ac69876	[AggressiveInstCombine] Check GEP nusw, not inbounds (#139708 )	2025-05-22 17:43:29 +01:00
David Green	671cef029f	[AggressiveInstcombine] Fold away shift in or reduction chain. (#137875 ) If we have `icmp eq or(a, shl(b)), 0` then the shift can be removed so long as it is nuw or nsw. It is still comparing that some bits are non-zero. https://alive2.llvm.org/ce/z/nhrBVX. This is also true of ne, and true for longer or chains.	2025-05-13 10:33:38 +01:00
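The reasoning can be checked exhaustively on 8-bit values. The sketch below covers only the `nuw` case (no set bits shifted out) and uses hypothetical helper names; it is a sanity check, not the pass's implementation.

```cpp
#include <cstdint>

// (a | (b << s)) == 0 evaluated in 8 bits.
bool is_zero_with_shift(uint8_t a, uint8_t b, unsigned s) {
  return (a | static_cast<uint8_t>(b << s)) == 0;
}

// The folded form without the shift.
bool is_zero_without_shift(uint8_t a, uint8_t b) { return (a | b) == 0; }

// True if shifting b left by s loses no set bits, i.e. the shl would be nuw.
bool shift_is_nuw(uint8_t b, unsigned s) { return ((b << s) >> 8) == 0; }

// Checks every 8-bit a, b and shift amount 0..7 where the shift is nuw.
bool fold_holds_for_all_nuw() {
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b)
      for (unsigned s = 0; s < 8; ++s) {
        const uint8_t a8 = static_cast<uint8_t>(a);
        const uint8_t b8 = static_cast<uint8_t>(b);
        if (shift_is_nuw(b8, s) &&
            is_zero_with_shift(a8, b8, s) != is_zero_without_shift(a8, b8))
          return false;
      }
  return true;
}
```

Without the flag the fold is unsound: with `a = 0`, `b = 0x80` and `s = 1` the shifted value wraps to zero even though `b` is nonzero.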
Stephen Tozer	19ee7ffdac	[AggrInstCombine][DebugInfo] Propagate DILocation for inlined memchr (#134808 ) When AggressiveInstCombine replaces a memchr with a switch instruction, it currently drops the DILocation for that memchr. This patch changes this, propagating the memchr DILocation to all the generated instructions that replace it. Found using https://github.com/llvm/llvm-project/pull/107279.	2025-04-08 18:54:35 +01:00
Zhenyang Xu	a98707e285	[AggressiveInstCombine] Merge consecutive loads of mixed sizes (#129263 ) Proof: https://alive2.llvm.org/ce/z/r7M-Sf Closes: #128134	2025-03-05 18:56:00 +08:00
Yingwei Zheng	a77346bad0	[IRBuilder] Refactor FMF interface (#121657 ) Up to now, the only way to set specified FMF flags in IRBuilder is to use `FastMathFlagGuard`. It makes the code ugly and hard to maintain. This patch introduces a helper class `FMFSource` to replace the original parameter `Instruction *FMFSource` in IRBuilder. To maximize the compatibility, it accepts an instruction or a specified FMF. This patch also removes the use of `FastMathFlagGuard` in some simple cases. Compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=f87a9db8322643ccbc324e317a75b55903129b55&to=9397e712f6010be15ccf62f12740e9b4a67de2f4&stat=instructions%3Au	2025-01-06 14:37:04 +08:00
Alex MacLean	56e944bede	[NFC] add anonymous namespace to a couple classes (#121511 ) This ensures these classes are visible only to the appropriate translation unit and allows for more optimizations.	2025-01-02 20:13:18 -08:00
Antonio Frighetto	f68b0e3699	[AggressiveInstCombine] Use APInt and avoid truncation when folding loads A miscompilation issue has been addressed with improved handling. Fixes: https://github.com/llvm/llvm-project/issues/118467.	2024-12-04 10:20:14 +01:00
Jay Foad	85c17e4092	[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706 ) Convert many instances of: Fn = Intrinsic::getOrInsertDeclaration(...); CreateCall(Fn, ...) to the equivalent CreateIntrinsic call.	2024-10-17 16:20:43 +01:00
Rahul Joshi	fa789dffb1	[NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752 ) Rename the function to reflect its correct behavior and to be consistent with `Module::getOrInsertFunction`. This is also in preparation of adding a new `Intrinsic::getDeclaration` that will have behavior similar to `Module::getFunction` (i.e, just lookup, no creation).	2024-10-11 05:26:03 -07:00
Jeremy Morse	96f37ae453	[NFC] Use initial-stack-allocations for more data structures (#110544 ) This replaces some of the most frequent offenders of using a DenseMap that cause a malloc, where the typical element-count is small enough to fit in an initial stack allocation. Most of these are fairly obvious, one to highlight is the collectOffset method of GEP instructions: if there's a GEP, of course it's going to have at least one offset, but every time we've called collectOffset we end up calling malloc as well for the DenseMap in the MapVector.	2024-09-30 23:15:18 +01:00
Stephen Tozer	40d6497a97	[DebugInfo] Transfer strcmp DILocation to generated inline code (#108531 ) When AggressiveInstCombine inlines a strcmp call, we currently copy the strcmp's DILocation only to the br instruction that jumps to the inline code. While this is roughly analogous to the original call, it leaves the generated code without any source location, which is precarious for a memory operation. This patch copies the strcmp call's DILocation to all the generated code. An alternative solution would be to generate a new DILocation with a line 0 location and an inlinedAt pointing to the original call location, but this would still give limited attribution to the generated code without traversing the DIE, whereas the submitted solution allows attribution with just the line table; even though it would be technically more accurate, pragmatically I believe that copying the call's location will be more useful for users.	2024-09-23 15:39:44 +01:00
Yingwei Zheng	62e9f40949	[PatternMatch] Use `m_SpecificCmp` matchers. NFC. (#100878 ) Compile-time improvement: http://llvm-compile-time-tracker.com/compare.php?from=13996378d81c8fa9a364aeaafd7382abbc1db83a&to=861ffa4ec5f7bde5a194a7715593a1b5359eb581&stat=instructions:u baseline: 803eaf29267c6aae9162d1a83a4a2ae508b440d3 ``` Top 5 improvements: stockfish/movegen.ll 2541620819 2538599412 -0.12% minetest/profiler.cpp.ll 431724935 431246500 -0.11% abc/luckySwap.c.ll 581173720 580581935 -0.10% abc/kitTruth.c.ll 2521936288 2519445570 -0.10% abc/extraUtilTruth.c.ll 1216674614 1215495502 -0.10% Top 5 regressions: openssl/libcrypto-shlib-sm4.ll 1155054721 1155943201 +0.08% openssl/libcrypto-lib-sm4.ll 1155054838 1155943063 +0.08% spike/vsm4r_vv.ll 1296430080 1297039258 +0.05% spike/vsm4r_vs.ll 1312496906 1313093460 +0.05% nuttx/lib_rand48.c.ll 126201233 126246692 +0.04% Overall: -0.02112308% ```	2024-07-29 10:04:06 +08:00
Yingwei Zheng	f58cfacfaf	[AggressiveInstCombine] Expand memchr with small constant strings (#98501 ) This patch converts memchr with a small constant string into a switch. It will reduce overhead of libcall and enable more folds (e.g., comparing the result with null). References: https://en.cppreference.com/w/c/string/byte/memchr	2024-07-17 00:25:36 +08:00
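Conceptually, the expansion turns a lookup in a small constant string into a switch over the searched byte. The sketch below shows the idea at source level for the string "abc"; `kStr` and both function names are hypothetical, and the real transform operates on LLVM IR rather than C++.

```cpp
#include <cstddef>
#include <cstring>

static const char kStr[] = "abc"; // small constant haystack (illustrative)

// Library-call form: memchr over the first 3 bytes of the constant string.
const char *memchr_abc_libcall(int c) {
  return static_cast<const char *>(std::memchr(kStr, c, 3));
}

// Switch form: one case per byte of the string, null otherwise.
// memchr compares the needle as an unsigned char, so the switch does too.
const char *memchr_abc_switch(int c) {
  switch (static_cast<unsigned char>(c)) {
  case 'a': return kStr;
  case 'b': return kStr + 1;
  case 'c': return kStr + 2;
  default:  return nullptr;
  }
}

// Compares both forms for every possible byte value.
bool forms_agree_on_all_bytes() {
  for (int c = 0; c < 256; ++c)
    if (memchr_abc_switch(c) != memchr_abc_libcall(c))
      return false;
  return true;
}
```

With the switch form, a later comparison of the result against null reduces to checking whether the default case was taken, which is the kind of follow-up fold the message refers to.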
Nikita Popov	74deadf196	[IRBuilder] Don't include Module.h (NFC) (#97159 ) This used to be necessary to fetch the DataLayout, but isn't anymore.	2024-06-29 15:05:04 +02:00
Nikita Popov	9df71d7673	[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919 ) Similar to https://github.com/llvm/llvm-project/pull/96902, this adds `getDataLayout()` helpers to Function and GlobalValue, replacing the current `getParent()->getDataLayout()` pattern.	2024-06-28 08:36:49 +02:00
Stephen Tozer	d75f9dd1d2	Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497 )" Reverts the above commit, as it updates a common header function and did not update all callsites: https://lab.llvm.org/buildbot/#/builders/29/builds/382 This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.	2024-06-24 18:00:22 +01:00
Stephen Tozer	6481dc5761	[IR][NFC] Update IRBuilder to use InsertPosition (#96497 ) Uses the new InsertPosition class (added in #94226) to simplify some of the IRBuilder interface, and removes the need to pass a BasicBlock alongside a BasicBlock::iterator, using the fact that we can now get the parent basic block from the iterator even if it points to the sentinel. This patch removes the BasicBlock argument from each constructor or call to setInsertPoint. This has no functional effect, but later on as we look to remove the `Instruction *InsertBefore` argument from instruction-creation (discussed [here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)), this will simplify the process by allowing us to deprecate the InsertPosition constructor directly and catch all the cases where we use instructions rather than iterators.	2024-06-24 17:27:43 +01:00
Franklin Zhang	1241e7692a	[AggressiveInstCombine] Fix strncmp inlining (#91204 ) Fix the issue that `char` constants are converted to `uint64_t` in the wrong way when doing the inlining.	2024-05-06 22:04:35 +08:00
Franklin Zhang	6b948705a0	[AggressiveInstCombine] Inline strcmp/strncmp (#89371 ) Inline calls to strcmp(s1, s2) and strncmp(s1, s2, N), where N and exactly one of s1 and s2 are constant. For example: ```c int res = strcmp(s, "ab"); ``` is converted to ```c int res = (int)s[0] - (int)'a'; if (res != 0) goto END; res = (int)s[1] - (int)'b'; if (res != 0) goto END; res = (int)s[2] - (int)'\0'; END: ``` Ported from a similar gcc feature [Inline strcmp with small constant strings](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809).	2024-05-03 13:24:38 +09:00
Yingwei Zheng	930996e9e4	[ValueTracking][NFC] Pass `SimplifyQuery` to `computeKnownFPClass` family (#80657 ) This patch refactors the interface of the `computeKnownFPClass` family to pass `SimplifyQuery` directly. The motivation of this patch is to compute known fpclass with `DomConditionCache`, which was introduced by https://github.com/llvm/llvm-project/pull/73662. With `DomConditionCache`, we can do more optimization with context-sensitive information. Example (extracted from [fmt/format.h](`e17bc67547/include/fmt/format.h (L3555-L3566)`)): ``` define float @test(float %x, i1 %cond) { %i32 = bitcast float %x to i32 %cmp = icmp slt i32 %i32, 0 br i1 %cmp, label %if.then1, label %if.else if.then1: %fneg = fneg float %x br label %if.end if.else: br i1 %cond, label %if.then2, label %if.end if.then2: br label %if.end if.end: %value = phi float [ %fneg, %if.then1 ], [ %x, %if.then2 ], [ %x, %if.else ] %ret = call float @llvm.fabs.f32(float %value) ret float %ret } ``` We can prove the signbit of `%value` is always zero. Then the fabs can be eliminated.	2024-02-06 02:30:12 +08:00
Nikita Popov	6c2fbc3a68	[IRBuilder] Add CreatePtrAdd() method (NFC) (#77582 ) This abstracts over the common pattern of creating a gep with i8 element type.	2024-01-12 14:21:21 +01:00
Mikael Holmen	ce0a750fe4	[AggressiveInstCombine] Ignore debug instructions when load combining (#70200 ) We previously included debug instructions when counting instructions when looking for loads to combine. This meant that the presence of debug instructions could affect optimization, as shown in the updated testcase. This fixes #69925.	2023-10-26 09:58:54 +02:00
Craig Topper	e9e458418f	[AggressiveInstCombine] Improve line breaks in comment. NFC The comments contain IR where some instructions don't fit in 80 columns. The extra part of the line was placed in front of the next IR instruction instead of on its own line.	2023-08-25 10:08:23 -07:00
Alexander Kornienko	0b779b0daa	Revert "[AggressiveInstCombine] Fold strcmp for short string literals" This reverts commit 5dde755188e34c0ba5304365612904476c8adfda, cbfcf90152de5392a36d0a0241eef25f5e159eef and 8981520b19f2d2fe3d2bc80cf26318ee6b5b7473 due to a miscompile introduced in 8981520b19f2d2fe3d2bc80cf26318ee6b5b7473 (see https://reviews.llvm.org/D154725#4568845 for details) Differential Revision: https://reviews.llvm.org/D157430	2023-08-08 22:53:45 +02:00
Maksim Kita	5dde755188	[AggressiveInstCombine][NFC] Fix typo AggressiveInstCombine fix typo in expandStrcmp method. Differential Revision: https://reviews.llvm.org/D156556	2023-08-07 21:51:44 +03:00
David Green	aa97f6b494	[AIC] Fix the sext cost operands in tryToFPToSat As pointed out in D125755 the operand of a call to getCastInstrCost had the Src and Dst the wrong way around. Differential Revision: https://reviews.llvm.org/D154841	2023-08-07 09:33:18 +01:00
Maksim Kita	cbfcf90152	[AggressiveInstCombine] Fold strcmp for short string literals with size 2 Fold strcmp for short string literals with size 2. Depends D155742. Differential Revision: https://reviews.llvm.org/D155743	2023-07-27 18:45:21 +03:00
Maksim Kita	8981520b19	[AggressiveInstCombine] Fold strcmp for short string literals Fold strcmp() against 1-char string literals. This designates AggressiveInstCombine as the pass for libcalls simplifications that may need to change the control flow graph. Fixes https://github.com/llvm/llvm-project/issues/58003. Differential Revision: https://reviews.llvm.org/D154725	2023-07-19 17:12:27 +02:00
Matt Arsenault	6640df94f9	ValueTracking: Remove CannotBeOrderedLessThanZero Replace the last user of CannotBeOrderedLessThanZero with new version. Makes assumes work in this case.	2023-07-11 20:42:18 -04:00
Youngsuk Kim	d22a236ae7	[llvm] Replace use of Type::getPointerTo() (NFC) Partial progress towards replacing in-tree uses of `Type::getPointerTo()`. If `getPointerTo()` is used solely to support an unnecessary bitcast, remove the bitcast. Reviewed By: barannikov88, nikic Differential Revision: https://reviews.llvm.org/D153307	2023-06-23 22:32:29 -04:00
bipmis	cbc50ba12e	[AggressiveInstCombine] Handle the nested GEP/BitCast scenario in Load Merge. This seems to be an issue currently where there are nested/chained GEP/BitCast Pointers. The patch generates a new GEP for the wider load to avoid dominance problems. Differential Revision: https://reviews.llvm.org/D150864	2023-05-24 10:36:11 +01:00

114 Commits