llvm-project

Author SHA1 Message Date

Author	SHA1	Message	Date
Usman Nadeem	b167ada896	[DFAJumpThreading] Rewrite the way paths are enumerated (#96127 ) I tried to add a limit to number of blocks visited in the paths() function but even with a very high limit the transformation coverage was being reduced. After looking at the code it seemed that the function was trying to create paths of the form `SwitchBB...DeterminatorBB...SwitchPredecessor`. This is inefficient because a lot of nodes in those paths (nodes before DeterminatorBB) would be irrelevant to the optimization. We only care about paths of the form `DeterminatorBB_Pred DeterminatorBB...SwitchBB`. This weeds out a lot of visited nodes. In this patch I have added a hard limit to the number of nodes visited and changed the algorithm for path calculation. Primarily I am traversing the use-def chain for the PHI nodes that define the state. If we have a hole in the use-def chain (no immediate predecessors) then I call the paths() function. I also had to the change the select instruction unfolding code to insert redundant one input PHIs to allow the use of the use-def chain in calculating the paths. The test suite coverage with this patch (including a limit on nodes visited) is as follows: Geomean diff: dfa-jump-threading.NumTransforms: +13.4% dfa-jump-threading.NumCloned: +34.1% dfa-jump-threading.NumPaths: -80.7% Compile time effect vs baseline (pass enabled by default) is mostly positive: https://llvm-compile-time-tracker.com/compare.php?from=ad8705fda25f64dcfeb6264ac4d6bac36bee91ab&to=5a3af6ce7e852f0736f706b4a8663efad5bce6ea&stat=instructions:u Change-Id: I0fba9e0f8aa079706f633089a8ccd4ecf57547ed	2024-08-10 12:13:53 -07:00
Roman Lebedev	641a684fa0	[NFC] Port all DFAJumpThreading tests to `-passes=` syntax	2022-12-08 02:38:41 +03:00
Alexey Zhikhartsev	02077da7e7	Add jump-threading optimization for deterministic finite automata The current JumpThreading pass does not jump thread loops since it can result in irreducible control flow that harms other optimizations. This prevents switch statements inside a loop from being optimized to use unconditional branches. This code pattern occurs in the core_state_transition function of Coremark. The state machine can be implemented manually with goto statements resulting in a large runtime improvement, and this transform makes the switch implementation match the goto version in performance. This patch specifically targets switch statements inside a loop that have the opportunity to be threaded. Once it identifies an opportunity, it creates new paths that branch directly to the correct code block. For example, the left CFG could be transformed to the right CFG: ``` sw.bb sw.bb / \| \ / \| \ case1 case2 case3 case1 case2 case3 \ \| / / \| \ latch.bb latch.2 latch.3 latch.1 br sw.bb / \| \ sw.bb.2 sw.bb.3 sw.bb.1 br case2 br case3 br case1 ``` Co-author: Justin Kreiner @jkreiner Co-author: Ehsan Amiri @amehsan Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D99205	2021-07-27 14:34:04 -04:00

Usman Nadeem

b167ada896

[DFAJumpThreading] Rewrite the way paths are enumerated (#96127 )

I tried to add a limit to number of blocks visited in the paths()
function but even with a very high limit the transformation coverage was
being reduced.

After looking at the code it seemed that the function was trying to
create paths of the form
`SwitchBB...DeterminatorBB...SwitchPredecessor`. This is inefficient
because a lot of nodes in those paths (nodes before DeterminatorBB)
would be irrelevant to the optimization. We only care about paths of the
form `DeterminatorBB_Pred DeterminatorBB...SwitchBB`. This weeds out a
lot of visited nodes.

In this patch I have added a hard limit to the number of nodes visited
and changed the algorithm for path calculation. Primarily I am
traversing the use-def chain for the PHI nodes that define the state. If
we have a hole in the use-def chain (no immediate predecessors) then I
call the paths() function.

I also had to the change the select instruction unfolding code to insert
redundant one input PHIs to allow the use of the use-def chain in
calculating the paths.

The test suite coverage with this patch (including a limit on nodes
visited) is as follows:

    Geomean diff:
      dfa-jump-threading.NumTransforms: +13.4%
      dfa-jump-threading.NumCloned: +34.1%
      dfa-jump-threading.NumPaths: -80.7%

Compile time effect vs baseline (pass enabled by default) is mostly
positive:
https://llvm-compile-time-tracker.com/compare.php?from=ad8705fda25f64dcfeb6264ac4d6bac36bee91ab&to=5a3af6ce7e852f0736f706b4a8663efad5bce6ea&stat=instructions:u

Change-Id: I0fba9e0f8aa079706f633089a8ccd4ecf57547ed

2024-08-10 12:13:53 -07:00

Roman Lebedev

641a684fa0

[NFC] Port all DFAJumpThreading tests to -passes= syntax

2022-12-08 02:38:41 +03:00

Alexey Zhikhartsev

02077da7e7

Add jump-threading optimization for deterministic finite automata

The current JumpThreading pass does not jump thread loops since it can
result in irreducible control flow that harms other optimizations. This
prevents switch statements inside a loop from being optimized to use
unconditional branches.

This code pattern occurs in the core_state_transition function of
Coremark. The state machine can be implemented manually with goto
statements resulting in a large runtime improvement, and this transform
makes the switch implementation match the goto version in performance.

This patch specifically targets switch statements inside a loop that
have the opportunity to be threaded. Once it identifies an opportunity,
it creates new paths that branch directly to the correct code block.
For example, the left CFG could be transformed to the right CFG:

```
          sw.bb                        sw.bb
        /   |   \                    /   |   \
   case1  case2  case3          case1  case2  case3
        \   |   /                /       |       \
        latch.bb             latch.2  latch.3  latch.1
         br sw.bb              /         |         \
                           sw.bb.2     sw.bb.3     sw.bb.1
                            br case2    br case3    br case1
```

Co-author: Justin Kreiner @jkreiner
Co-author: Ehsan Amiri @amehsan

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D99205

2021-07-27 14:34:04 -04:00

3 Commits