llvm-project

Author	SHA1	Message	Date
Usman Nadeem	8baa5bf499	[DFAJumpThreading] Try harder to avoid cycles in paths. (#169151 ) If a threading path has cycles within it then the transformation is not correct. This patch fixes a couple of cases that create such cycles. Fixes https://github.com/llvm/llvm-project/issues/166868	2025-11-22 10:59:27 -08:00
Hongyu Chen	affed57d36	[DFAJumpThreading] Add MaxOuterUseBlocks threshold (#163428 ) For every threadable path `B1 -> B2 -> ... -> Bn`, we need to insert phi nodes into every unduplicated successor of `Bi` if there are outer uses of duplicated definitions in `B_i`. To prevent the booming of phi nodes, this patch adds a threshold for the maximum number of unduplicated successors that may contain outer uses. This threshold makes sense especially when multi-target branches like switch/indirectbr/callbr are duplicated. Note that the O3 statistics in llvm-test-suite are not influenced.	2025-10-30 00:34:48 +08:00
Usman Nadeem	f275d2b057	[DFAJumpThreading] Don't insert existing edge to DomTree while unfolding (#163296 ) The edge `StartBlock -> EndBlock` already exists before unfolding. The instructions for `applyUpdates()` say that you are supposed not to insert an existing edge. Fixes issues reported by @mikaelholmen in https://github.com/llvm/llvm-project/pull/162802	2025-10-14 09:52:39 -07:00
Hongyu Chen	90cbc37905	[DFAJumpThreading] Verify dominator tree by option (#163334 ) Note that the test coverage misses the dominator tree verification. This patch controls verification by option, instead of using the EXPENSIVE_CHECKS macro.	2025-10-14 11:52:16 +00:00
Hongyu Chen	3340b245af	[DFAJumpThreading] Precompute value => successor mapping (#162824 ) Address some previous comments. Note that the value => successor mapping is invalid during DFA transformation unless we update it correctly.	2025-10-10 22:57:42 +08:00
Hongyu Chen	881a57b995	[DFAJumpThreading] Use a single lazy DTU (#162802 ) Resolves https://github.com/llvm/llvm-project/pull/162240#pullrequestreview-3309102803	2025-10-10 09:16:12 +00:00
Hongyu Chen	22a02b63e8	[DFAJumpThreading] Unify equivalent states (#162447 ) DFAJumpThreading determines the switch destination of a threadable path based on its next state and cannot reuse cloned blocks if their destinations differ. However, different states may lead to the same destination. This patch unifies equivalent states, thereby avoiding redundant duplication of cloned blocks.	2025-10-10 13:44:06 +08:00
Hongyu Chen	daf81a6c08	[DFAJumpThreading] Pretty print anonymous blocks (#162607 ) We previously printed the address of anonymous blocks, which makes no sense for debugging. This patch ensures consistency with the IR printer.	2025-10-09 08:29:36 +00:00
Hongyu Chen	ad00610831	[DFAJumpThreading][NFC] Clear cleanPhiNodes and phi-related code (#162423 )	2025-10-08 09:55:29 +00:00
Mircea Trofin	278a99e8e9	[DFAJT][profcheck] Propagate `select` -> `br` profile metadata (#162213 ) Issue #147390	2025-10-07 09:28:04 -07:00
Rahul Joshi	b256d0a7aa	[NFC][LLVM] Cleanup namespace usage in DFAJumpThreading.cpp (#162179 )	2025-10-07 07:57:34 -07:00
Mircea Trofin	95144b176f	[NFC][DFAJT] Place `cl::opt`s in the llvm namespace (#162212 ) Along the lines of #161240	2025-10-07 07:30:01 -07:00
Hongyu Chen	ed113e7904	[DFAJumpThreading] Update domtree lazily (#162240 )	2025-10-07 10:37:37 +00:00
Hongyu Chen	f0a787b55d	[DFAJumpThreading] Set MadeChanges only if threading happened (#162241 ) Threading may fail due to legality and profitability checks. This patch preserves the analysis if no threading happens.	2025-10-07 10:34:03 +00:00
Hongyu Chen	a81b6c6836	[DFAJumpThreading][NFC] Replace with const reference (#162238 )	2025-10-07 08:39:04 +00:00
Hongyu Chen	cf85ec54c1	[DFAJumpThreading] Constraint the number of cloned instructions (#161632 ) Duplicating blocks of threaded paths may cause a significant regression in IR size and slow down compile-time in later optimizations. This patch adds a coarse constraint on the number of duplicated instructions.	2025-10-07 12:33:42 +08:00
Hongyu Chen	a05e004b28	[DFAJumpThreading] Unfold select to the incoming block of phi user (#160987 ) Fixes #160250 We previously assumed the select to unfold is defined in the incoming block of phi user, as `isValidSelectInst` filters other cases at the initial stage. However, the selects not defined in the incoming block may occur after unfolding the arms of the unfolded select. This patch sinks the select into the incoming block of the phi user and unfolds it at the incoming block.	2025-10-01 14:06:08 +00:00
Bushev Dmitry	b3306cbb53	[DFAJumpThreading] Prevent pass from using too much memory. (#153193 ) The limit 'dfa-max-num-paths' that is used to control number of enumerated paths was not checked against inside getPathsFromStateDefMap. It may lead to large memory consumption for complex enough switch statements. Reland llvm/llvm-project#145482	2025-09-10 18:41:41 +08:00
Daniel Kuts	31199459f4	[DFAJumpThreading] Fix possible null dereference (#157461 ) Fixes #157450 --------- Co-authored-by: Nikita Popov <github@npopov.com>	2025-09-09 02:15:06 +08:00
Kazu Hirata	11b4f110e0	[llvm] Remove unused includes of SmallSet.h (NFC) (#154893 ) We just replaced SmallSet<T *, N> with SmallPtrSet<T *, N>, bypassing the redirection found in SmallSet.h. With that, we no longer need to include SmallSet.h in many files.	2025-08-22 10:33:46 -07:00
Kazu Hirata	07eb7b7692	[llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068 ) This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer element types: template <typename PointeeType, unsigned N> class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N> {}; We only have 140 instances that rely on this "redirection", with the vast majority of them under llvm/. Since relying on the redirection doesn't improve readability, this patch replaces SmallSet with SmallPtrSet for pointer element types.	2025-08-18 07:01:29 -07:00
XChy	df75b4b942	Revert "[DFAJumpThreading] Prevent pass from using too much memory." (#153075 ) Reverts llvm/llvm-project#145482	2025-08-12 04:26:47 +08:00
Bushev Dmitry	b5902924b2	[DFAJumpThreading] Prevent pass from using too much memory. (#145482 ) The limit 'dfa-max-num-paths' that is used to control number of enumerated paths was not checked against inside getPathsFromStateDefMap. It may lead to large memory consumption for complex enough switch statements.	2025-08-07 20:15:42 +03:00
Kazu Hirata	06cb7b1e14	[Transforms] Use llvm::append_range (NFC) (#133650 )	2025-03-30 12:21:59 -07:00
Kazu Hirata	d8b078d550	[Transforms] Use llvm::append_range (NFC) (#133607 )	2025-03-29 18:57:50 -07:00
Kazu Hirata	73dc2afd2c	[Transforms] Use Set::insert_range (NFC) (#132652 ) We can use Set::insert_range to collapse: for (auto Elem : Range) Set.insert(E); down to: Set.insert_range(Range); In some cases, we can further fold that into the set declaration.	2025-03-23 19:42:53 -07:00
Usman Nadeem	5fb57131b7	[DFAJumpThreading] Don't bail early after encountering unpredictable values (#119774 ) After #96127 landed, mshockwave reported that the pass was no longer threading SPEC2006/perlbench. After 96127 we started bailing out in `getStateDefMap` and rejecting the transformation because one of the unpredictable values was coming from inside the loop. There was no fundamental change in that function except that we started calling `Loop->contains(IncomingBB)` instead of `LoopBBs.count(IncomingBB)`. After some analysis I came to the conclusion that even before 96127 we would reject the transformation if we provided large enough limits on the path traversal (large enough so that LoopBBs contained blocks corresponding to that unpredictable value). In this patch I changed `getStateDefMap` to not terminate early on finding an unpredictable value, this is because `getPathsFromStateDefMap`, later, actually has checks to ensure that the final list of paths only have predictable values. As a result we can now partially thread functions like `negative6` in the tests that have some predictable paths. This patch does not really have any compile-time impact on the test suite without `-dfa-early-exit-heuristic=false` (early exit is enabled by default). Change-Id: Ie1633b370ed4a0eda8dea52650b40f6f66ef49a3	2024-12-25 01:29:01 -08:00
Kazu Hirata	94f9cbbe49	[Scalar] Remove unused includes (NFC) (#114645 ) Identified with misc-include-cleaner.	2024-11-02 08:32:26 -07:00
Usman Nadeem	d4a38c8ff5	[DFAJumpThreading] Handle select unfolding when user phi is not a direct successor (#109511 ) Previously the code assumed that the select instruction is defined in a block that is a direct predecessor of the block where the PHINode uses it. So, we were hitting an assertion when we tried to access the def block as an incoming block for the user phi node. This patch handles that case by using the correct end block and creating a new phi node that aggregates both the values of the select in that end block, and then using that new unfolded phi to overwrite the original user phi node. Fixes #106083 Change-Id: Ie471994cca232318f74a6e6438efa21e561c2dc0	2024-09-24 08:54:36 -07:00
Kazu Hirata	23a26e7120	[DFAJumpThreading] Avoid repeated hash lookups (NFC) (#107670 )	2024-09-07 08:22:21 -07:00
Usman Nadeem	b167ada896	[DFAJumpThreading] Rewrite the way paths are enumerated (#96127 ) I tried to add a limit to number of blocks visited in the paths() function but even with a very high limit the transformation coverage was being reduced. After looking at the code it seemed that the function was trying to create paths of the form `SwitchBB...DeterminatorBB...SwitchPredecessor`. This is inefficient because a lot of nodes in those paths (nodes before DeterminatorBB) would be irrelevant to the optimization. We only care about paths of the form `DeterminatorBB_Pred DeterminatorBB...SwitchBB`. This weeds out a lot of visited nodes. In this patch I have added a hard limit to the number of nodes visited and changed the algorithm for path calculation. Primarily I am traversing the use-def chain for the PHI nodes that define the state. If we have a hole in the use-def chain (no immediate predecessors) then I call the paths() function. I also had to the change the select instruction unfolding code to insert redundant one input PHIs to allow the use of the use-def chain in calculating the paths. The test suite coverage with this patch (including a limit on nodes visited) is as follows: Geomean diff: dfa-jump-threading.NumTransforms: +13.4% dfa-jump-threading.NumCloned: +34.1% dfa-jump-threading.NumPaths: -80.7% Compile time effect vs baseline (pass enabled by default) is mostly positive: https://llvm-compile-time-tracker.com/compare.php?from=ad8705fda25f64dcfeb6264ac4d6bac36bee91ab&to=5a3af6ce7e852f0736f706b4a8663efad5bce6ea&stat=instructions:u Change-Id: I0fba9e0f8aa079706f633089a8ccd4ecf57547ed	2024-08-10 12:13:53 -07:00
Kazu Hirata	34d48279b8	[llvm] Initialize SmallVector with ranges (NFC) (#100948 )	2024-07-28 22:08:12 -07:00
Sameer Sahasrabuddhe	e0ac087ff0	[LoopUnroll] Consider convergence control tokens when unrolling (#91715 ) - There is no restriction on a loop with controlled convergent operations when the relevant tokens are defined and used within the loop. - When a token defined outside a loop is used inside (also called a loop convergence heart), unrolling is allowed only in the absence of remainder or runtime checks. - When a token defined inside a loop is used outside, such a loop is said to be "extended". This loop can only be unrolled by also duplicating the extended part lying outside the loop. Such unrolling is disabled for now. - Clean up loop hearts: When unrolling a loop with a heart, duplicating the heart will introduce multiple static uses of a convergence control token in a cycle that does not contain its definition. This violates the static rules for tokens, and needs to be cleaned up into a single occurrence of the intrinsic. - Spell out the initializer for UnrollLoopOptions to improve readability. Original implementation [D85605] by Nicolai Haehnle <nicolai.haehnle@amd.com>.	2024-06-06 13:13:46 +05:30
XChy	7b5b5214a6	[DFAJumpThreading][NFC] Use const reference as range variable (#90342 ) Fixes #90286	2024-04-27 21:55:53 +08:00
XChy	c229f767e4	[DFAJumpThreading] Avoid exploring the paths that never come back (#85505 ) This patch does: - Preserve loop info when unfolding selects. - Reduce the search space for loop paths.	2024-04-27 16:10:16 +08:00
Usman Nadeem	c9325f8a2e	[DFAJumpThreading] Add an early exit heuristic for unpredictable values (#85015 ) Right now the algorithm does not exit on unpredictable values. It waits until all the paths have been enumerated to see if any of those paths have that value. Waiting this late leads to a lot of wasteful computation and higher compile time. In this patch I have added a heuristic that checks if the value comes from the same inner loops as the switch, if so, then it is likely that the value will also be seen on a threadable path and the code in `getStateDefMap()` return an empty map. I tested this on the llvm test suite and the only change in the number of threaded switches was in 7zip (before 23, after 18). In all of those cases the current algorithm was partially threading the loop because it was hitting a limit on the number of paths to be explored. On increasing this limit even the current algorithm finds paths where the unpredictable value is seen. Compile time(with pass enabled by default and this patch): https://llvm-compile-time-tracker.com/compare.php?from=8c5e9cf737138aba22a4a8f64ef2c5efc80dd7f9&to=42c75d888058b35c6d15901b34e36251d8f766b9&stat=instructions:u	2024-03-16 11:24:42 -07:00
XChy	6b53ada69a	[DFAJumpThreading] Early exit if switch is not in a loop (#85360 ) This patch prevents taking non-loop switch as candidate.	2024-03-15 23:00:13 +08:00
XChy	2c0fc0f37f	[DFAJumpThreading] Handle circular determinator (#78177 ) Fixes the buildbot failure in https://github.com/llvm/llvm-project/pull/78134#issuecomment-1892195197 When we meet the path with single `determinator`, the determinator actually takes itself as a predecessor. Thus, we need to let `Prev` be the determinator when `PathBBs` has only one element.	2024-01-15 17:52:53 -08:00
XChy	019ffbf324	[DFAJumpThreading] Extends the bitwidth of state from uint64_t to APInt (#78134 ) Fixes #78059	2024-01-15 18:24:18 +08:00
Kazu Hirata	03dc806b12	[Transforms] Use {DenseMap,SmallPtrSet}::contains (NFC)	2023-12-22 14:51:22 -08:00
XChy	c880fdc0f0	[DFAJumpThreading] Remove incoming StartBlock from all phis when unfolding select (#71082 ) Fixes #65222. When unfolding select into diamond-like control flow, we need to remove the StartBlock from all phis in EndBlock.	2023-11-04 03:32:20 +08:00
XChy	2fba4694d0	[DFAJumpThreading] Don't thread switch without multiple successors (#71060 ) Fixes #56882. Fixes #60254. When a switch has only one successor, it makes no sense to thread it, and computing its cost triggers a division-by-zero exception. This patch prevents that.	2023-11-02 22:22:45 +08:00
XChy	7fa41d8a8f	[DFAJumpThreading] Only unfold select coming from directly where it is defined (#70966 ) Fixes #64860. When a select instruction comes in by PHINode, the phi's incoming block for it can flow indirectly past other BasicBlock into it. In this case, we cannot unfold select to the phi's BB.	2023-11-02 21:25:54 +08:00
Bjorn Pettersson	a20f7efbc5	Remove several no longer needed includes. NFCI Mostly removing includes of InitializePasses.h and Pass.h in passes that no longer has support for the legacy PM.	2023-04-17 13:54:19 +02:00
Kazu Hirata	c83c4b58d1	[Transforms] Apply fixes from performance-for-range-copy (NFC)	2023-04-16 08:25:28 -07:00
Arthur Eubanks	7c3c981442	[Passes] Remove some legacy passes DFAJumpThreading JumpThreading LibCallsShrink LoopVectorize SLPVectorizer DeadStoreElimination AggressiveDCE CorrelatedValuePropagation IndVarSimplify These are part of the optimization pipeline, of which the legacy version is deprecated and being removed.	2023-03-10 17:17:00 -08:00
Kazu Hirata	5ea3155565	[llvm] Use llvm::find (NFC)	2022-10-16 16:21:00 -07:00
Daniil Fukalov	9c710ebbdb	[TTI] NFC: Reduce InstructionCost::getValue() usage... in order to propagate `InstructionCost` value upper. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D103406	2022-08-26 16:37:32 +03:00
Nuno Lopes	53dc0f1078	[NFC] Switch a few uses of undef to poison as placeholders for unreachble code	2022-07-03 14:34:03 +01:00
Philip Reames	f85c5079b8	Pipe potentially invalid InstructionCost through CodeMetrics Per the documentation in Support/InstructionCost.h, the purpose of an invalid cost is so that clients can change behavior on impossible to cost inputs. CodeMetrics was instead asserting that invalid costs never occurred. On a target with an incomplete cost model - e.g. RISCV - this means that transformations would crash on (falsely) invalid constructs - e.g. scalable vectors. While we certainly should improve the cost model - and I plan to do so in the near future - we also shouldn't be crashing. This violates the explicitly stated purpose of an invalid InstructionCost. I updated all of the "easy" consumers where bailouts were locally obvious. I plan to follow up with loop unroll in a following change. Differential Revision: https://reviews.llvm.org/D127131	2022-06-09 15:17:24 -07:00


67 Commits