15 Commits

Author SHA1 Message Date
Usman Nadeem
d4a38c8ff5
[DFAJumpThreading] Handle select unfolding when user phi is not a dir… (#109511)
…ect successor

Previously the code assumed that the select instruction is defined in a
block that is a direct predecessor of the block where the PHINode uses
it. So, we were hitting an assertion when we tried to access the def
block as an incoming block for the user phi node.

This patch handles that case by using the correct end block and creating
a new phi node that aggregates both the values of the select in that end
block, and then using that new unfolded phi to overwrite the original
user phi node.

Fixes #106083

Change-Id: Ie471994cca232318f74a6e6438efa21e561c2dc0
2024-09-24 08:54:36 -07:00
Usman Nadeem
b167ada896
[DFAJumpThreading] Rewrite the way paths are enumerated (#96127)
I tried to add a limit to number of blocks visited in the paths()
function but even with a very high limit the transformation coverage was
being reduced.

After looking at the code it seemed that the function was trying to
create paths of the form
`SwitchBB...DeterminatorBB...SwitchPredecessor`. This is inefficient
because a lot of nodes in those paths (nodes before DeterminatorBB)
would be irrelevant to the optimization. We only care about paths of the
form `DeterminatorBB_Pred DeterminatorBB...SwitchBB`. This weeds out a
lot of visited nodes.

In this patch I have added a hard limit to the number of nodes visited
and changed the algorithm for path calculation. Primarily I am
traversing the use-def chain for the PHI nodes that define the state. If
we have a hole in the use-def chain (no immediate predecessors) then I
call the paths() function.

I also had to the change the select instruction unfolding code to insert
redundant one input PHIs to allow the use of the use-def chain in
calculating the paths.

The test suite coverage with this patch (including a limit on nodes
visited) is as follows:

    Geomean diff:
      dfa-jump-threading.NumTransforms: +13.4%
      dfa-jump-threading.NumCloned: +34.1%
      dfa-jump-threading.NumPaths: -80.7%

Compile time effect vs baseline (pass enabled by default) is mostly
positive:
https://llvm-compile-time-tracker.com/compare.php?from=ad8705fda25f64dcfeb6264ac4d6bac36bee91ab&to=5a3af6ce7e852f0736f706b4a8663efad5bce6ea&stat=instructions:u

Change-Id: I0fba9e0f8aa079706f633089a8ccd4ecf57547ed
2024-08-10 12:13:53 -07:00
Usman Nadeem
c9325f8a2e
[DFAJumpThreading] Add an early exit heuristic for unpredictable values (#85015)
Right now the algorithm does not exit on unpredictable values. It
waits until all the paths have been enumerated to see if any of
those paths have that value. Waiting this late leads to a lot of
wasteful computation and higher compile time.

In this patch I have added a heuristic that checks if the value
comes from the same inner loops as the switch, if so, then it is
likely that the value will also be seen on a threadable path and
the code in `getStateDefMap()` return an empty map.

I tested this on the llvm test suite and the only change in the
number of threaded switches was in 7zip (before 23, after 18).
In all of those cases the current algorithm was partially threading
the loop because it was hitting a limit on the number of paths to
be explored. On increasing this limit even the current algorithm
finds paths where the unpredictable value is seen.

Compile time(with pass enabled by default and this patch):

https://llvm-compile-time-tracker.com/compare.php?from=8c5e9cf737138aba22a4a8f64ef2c5efc80dd7f9&to=42c75d888058b35c6d15901b34e36251d8f766b9&stat=instructions:u
2024-03-16 11:24:42 -07:00
XChy
43414e736c [DFAJumpThreading][NFC] Reduce tests 2024-01-16 12:25:08 +08:00
XChy
2c0fc0f37f
[DFAJumpThreading] Handle circular determinator (#78177)
Fixes the buildbot failure in
https://github.com/llvm/llvm-project/pull/78134#issuecomment-1892195197
When we meet the path with single `determinator`, the determinator
actually takes itself as a predecessor. Thus, we need to let `Prev` be
the determinator when `PathBBs` has only one element.
2024-01-15 17:52:53 -08:00
XChy
019ffbf324
[DFAJumpThreading] Extends the bitwidth of state from uint64_t to APInt (#78134)
Fixes #78059
2024-01-15 18:24:18 +08:00
XChy
c880fdc0f0
[DFAJumpThreading] Remove incoming StartBlock from all phis when unfolding select (#71082)
Fixes #65222.
When unfolding select into diamond-like control flow, we need to remove
the StartBlock from all phis in EndBlock.
2023-11-04 03:32:20 +08:00
XChy
2fba4694d0
[DFAJumpThreading] Don't thread switch without multiple successors (#71060)
Fixes #56882.
Fixes #60254.

When switch has only one successor, it make no sense to thread it. And
computing the cost of it brings div-by-zero exception. We prevent it in
this patch.
2023-11-02 22:22:45 +08:00
XChy
7fa41d8a8f
[DFAJumpThreading] Only unfold select coming from directly where it is defined (#70966)
Fixes #64860.
When a select instruction comes in by PHINode, the phi's incoming block
for it can flow indirectly past other BasicBlock into it. In this case,
we cannot unfold select to the phi's BB.
2023-11-02 21:25:54 +08:00
Nikita Popov
055fb7795a [Transforms] Convert some tests to opaque pointers (NFC)
These are all tests where conversion worked automatically, and
required no manual fixup.
2023-01-05 12:43:45 +01:00
Roman Lebedev
641a684fa0
[NFC] Port all DFAJumpThreading tests to -passes= syntax 2022-12-08 02:38:41 +03:00
Nuno Lopes
53dc0f1078 [NFC] Switch a few uses of undef to poison as placeholders for unreachble code 2022-07-03 14:34:03 +01:00
Alex Zhikhartsev
8b0d763474 [DFAJumpThreading] Relax analysis to handle unpredictable initial values
Responding to a feature request from the Rust community:

https://github.com/rust-lang/rust/issues/80630

    void foo(X) {
      for (...)
	switch (X)
	  case A
	    X = B
	  case B
	    X = C
    }

Even though the initial switch value is non-constant, the switch
statement can still be threaded: the initial value will hit the switch
statement but the rest of the state changes will proceed by jumping
unconditionally.

The early predictability check is relaxed to allow unpredictable values
anywhere, but later, after the paths through the switch statement have
been enumerated, no non-constant state values are allowed along the
paths. Any state value not along a path will be an initial switch value,
which can be safely ignored.

Differential Revision: https://reviews.llvm.org/D124394
2022-05-26 11:29:54 -04:00
Alexey Zhikhartsev
d5dc3964a7 [DFAJumpThreading] Determinator BB should precede switch-defining BB
Otherwise, it is possible that the state defined in the determinator
block defines the state for the next iteration of the loop, rather than
for the current one.

Fixes llvm-test-suite's
SingleSource/Regression/C/gcc-c-torture/execute/pr80421.c

Differential Revision: https://reviews.llvm.org/D115832
2021-12-24 10:27:03 -05:00
Alexey Zhikhartsev
02077da7e7 Add jump-threading optimization for deterministic finite automata
The current JumpThreading pass does not jump thread loops since it can
result in irreducible control flow that harms other optimizations. This
prevents switch statements inside a loop from being optimized to use
unconditional branches.

This code pattern occurs in the core_state_transition function of
Coremark. The state machine can be implemented manually with goto
statements resulting in a large runtime improvement, and this transform
makes the switch implementation match the goto version in performance.

This patch specifically targets switch statements inside a loop that
have the opportunity to be threaded. Once it identifies an opportunity,
it creates new paths that branch directly to the correct code block.
For example, the left CFG could be transformed to the right CFG:

```
          sw.bb                        sw.bb
        /   |   \                    /   |   \
   case1  case2  case3          case1  case2  case3
        \   |   /                /       |       \
        latch.bb             latch.2  latch.3  latch.1
         br sw.bb              /         |         \
                           sw.bb.2     sw.bb.3     sw.bb.1
                            br case2    br case3    br case1
```

Co-author: Justin Kreiner @jkreiner
Co-author: Ehsan Amiri @amehsan

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D99205
2021-07-27 14:34:04 -04:00