136 Commits

Author SHA1 Message Date
Robert Imschweiler
a8ea7f4580
Reapply: [AMDGPU][UnifyDivergentExitNodes][StructurizeCFG] Add support for callbr instruction with inline-asm (#152161) (#166195)
Reapply #152161 with fixed 'changed' flags.
2025-11-03 20:59:48 +01:00
Robert Imschweiler
af68efc9c4
Revert "[AMDGPU][UnifyDivergentExitNodes][StructurizeCFG] Add support for callbr instruction with inline-asm" (#166186)
Reverts llvm/llvm-project#152161

Need to revert to fix changed logic for the expensive checks.
2025-11-03 16:33:20 +00:00
Robert Imschweiler
332f9b5eee
[AMDGPU][UnifyDivergentExitNodes][StructurizeCFG] Add support for callbr instruction with inline-asm (#152161)
Finishes adding inline-asm callbr support for AMDGPU, started by
https://github.com/llvm/llvm-project/pull/149308.
2025-11-03 16:09:12 +01:00
Kazu Hirata
707bab651f
[llvm] Remove redundant typename (NFC) (#166087)
Identified with readability-redundant-typename.
2025-11-02 13:15:16 -08:00
Valery Pykhtin
fc2afbda36
[AMDGPU] Improve StructurizeCFG pass performance by using SSAUpdaterBulk. (#150937)
SSAUpdaterBulk replaces legacy SSAUpdater.
2025-10-14 20:07:10 +02:00
Vigneshwar Jayakumar
d9fa0de5c8
[StructurizeCFG] bug fix in zero cost hoist (#157969)
This fixes a bug where zero cost instruction was hoisted to nearest
common dominator but the hoisted instruction's operands didn't dominate
the common dominator causing poison values.
2025-09-15 14:29:32 -05:00
Vigneshwar Jayakumar
df96e09c1e
[StructurizeCFG] nested-if zerocost hoist bugfix (#155408)
When zero cost instructions are hoisted, the simplifyHoistedPhi function
was setting incoming phi values which were not dominating the use
causing runtime failure. This was set to poison by rebuildSSA function.
This commit fixes the issue.
2025-08-28 09:04:14 -05:00
Kazu Hirata
11b4f110e0
[llvm] Remove unused includes of SmallSet.h (NFC) (#154893)
We just replaced SmallSet<T *, N> with SmallPtrSet<T *, N>, bypassing
the redirection found in SmallSet.h.  With that, we no longer need to
include SmallSet.h in many files.
2025-08-22 10:33:46 -07:00
Kazu Hirata
07eb7b7692
[llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068)
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>.  Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:

  template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};

We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
2025-08-18 07:01:29 -07:00
Vigneshwar Jayakumar
56ae79a6ab
reland "[StructurizeCFG] Hoist and simplify zero-cost incoming else p… (#149744)
…hi values (#139605)"

This relands commit b11523b494b with the fix for llvm-buildbot failures
"clang-hip-vega20" and "openmp-offload-amdgpu-runtime-2". The reland
prevents hoisting the phi node which fixes the issue.

Original PR description:

The order of if and else blocks can introduce unnecessary VGPR copies.
Consider the case of an if-else block where the incoming phi from the
'Else block' only contains zero-cost instructions, and the 'Then' block
modifies some value. There would be no interference when coalescing
because only one value is live at any point before structurization.
However, in the structurized CFG, the Then value is live at 'Else' block
due to the path if→flow→else, leading to additional VGPR copies.

This patch addresses the issue by:
- Identifying PHI nodes with zero-cost incoming values from the Else
block and hoisting those values to the nearest common dominator of the
Then and Else blocks.
- Updating Flow PHI nodes by replacing poison entries (on the if→flow
edge) with the correct hoisted values.
2025-07-25 15:23:45 -05:00
Vigneshwar Jayakumar
25c3f64105
Revert "[StructurizeCFG] Hoist and simplify zero-cost incoming else phi values" (#148016)
reverting to fix Buildbot failures.
2025-07-10 13:06:38 -05:00
Vigneshwar Jayakumar
8d3f497eb8
[StructurizeCFG] Hoist and simplify zero-cost incoming else phi values (#139605)
The order of if and else blocks can introduce unnecessary VGPR copies.  
Consider the case of an if-else block where the incoming phi from the
'Else block' only contains zero-cost instructions, and the 'Then' block
modifies some value. There would be no interference when coalescing
because only one value is live at any point before structurization.
However, in the structurized CFG, the Then value is live at 'Else' block
due to the path if→flow→else, leading to additional VGPR copies.

This patch addresses the issue by:
- Identifying PHI nodes with zero-cost incoming values from the Else
block and hoisting those values to the nearest common dominator of the
Then and Else blocks.
- Updating Flow PHI nodes by replacing poison entries (on the if→flow
edge) with the correct hoisted values.
2025-07-10 12:03:04 -05:00
Emma Pilkington
7babf22461
[StructurizeCFG] Stop setting DebugLocs in flow blocks (#139088)
Flow blocks are generated code that don't really correspond to any
location in the source, so principally they should have empty DebugLocs.
Practically, setting these debug locs leads to redundant is_stmts being
generated after #108251, causing stepping test failures in the ROCm GDB
test suite.

Fixes SWDEV-502134
2025-05-09 14:22:14 -04:00
Shilei Tian
fc55ad4ceb
Revert "[StructurizeCFG] Refactor insertConditions. NFC. (#115476)" (#136370) 2025-04-19 08:04:48 -04:00
Kazu Hirata
d8b078d550
[Transforms] Use llvm::append_range (NFC) (#133607) 2025-03-29 18:57:50 -07:00
Kazu Hirata
673f4705a8
[llvm] Use *Set::insert_range (NFC) (#133353)
We can use *Set::insert_range to collapse:

  for (auto Elem : Range)
    Set.insert(E.first);

down to:

  Set.insert_range(llvm::make_first_range(Range));

In some cases, we can further fold that into the set declaration.
2025-03-27 20:44:20 -07:00
Valery Pykhtin
bb2e85f12f
[AMDGPU] Improve StructurizeCFG pass performance: avoid redundant DebugLoc map initialization. NFC. (#130568)
Previously, the TermDL (BB terminator → DebugLoc) map was initialized at
the start of processing each function's region, creating entries for the
entire function. This could be inefficient for large functions.

This patch improves performance by creating map entries only when
needed—when a terminator is being killed or when a flow block is
created. Additionally, entries are removed immediately after use,
preventing unnecessary map growth and ensuring DebugLocs are not
"retracked."

A mapless variant was also explored, but due to limited familiarity with
the structurizer, it was not pursued further.

In my cases, this change improves performance by 2-3×.
2025-03-11 07:19:53 +01:00
Matt Arsenault
2bada417c1
StructurizeCFG: Use poison instead of undef (#130459)
There are a surprising number of codegen changes from this.
2025-03-10 22:29:15 +07:00
Kazu Hirata
7c7cebf49d
[Scalar] Avoid repeated hash lookups (NFC) (#130463) 2025-03-09 00:48:17 -08:00
Pedro Lobo
9865296343
[StructurizeCFG] Use poison instead of undef as placeholder [NFC] (#119137) 2024-12-10 15:03:44 +00:00
Jay Foad
231e63d816
[StructurizeCFG] Refactor insertConditions. NFC. (#115476)
This just makes it more obvious that having Parent as the single
predecessor is a special case, instead of checking for it in the middle
of a loop that finds the nearest common dominator of multiple
predecessors.
2024-11-26 09:40:33 +00:00
Jay Foad
b535e4ecac
[StructurizeCFG] Remove one SSAUpdater::AddAvailableValue. NFCI. (#115472) 2024-11-08 17:20:29 +00:00
Jay Foad
107af4a62e
[StructurizeCFG] Introduce struct PredInfo. NFC. (#115457)
This just provides a neater encapsulation of the info about the
predicate for an edge, rather than ValueWeightPair aka std::pair.
2024-11-08 14:26:29 +00:00
Kazu Hirata
94f9cbbe49
[Scalar] Remove unused includes (NFC) (#114645)
Identified with misc-include-cleaner.
2024-11-02 08:32:26 -07:00
Ruiling, Song
54d31bde32
Reapply "StructurizeCFG: Optimize phi insertion during ssa reconstruction (#101301)" (#114347)
This reverts commit be40c723ce2b7bf2690d22039d74d21b2bd5b7cf.
2024-11-01 08:29:59 +08:00
Juan Manuel Martinez Caamaño
b40ff5ac2d
[AMDGPU][StructurizeCFG] Maintain branch MD_prof metadata (#109813)
Currently `StructurizeCFG` drops branch_weight metadata .
This metadata can be generated from user annotations in the source code
like:

```cpp
if (...) [[likely]] {
}
```
2024-09-25 13:15:23 +02:00
Kazu Hirata
a2f659c134
[StructurizeCFG] Avoid repeated hash lookups (NFC) (#107797) 2024-09-09 07:15:12 -07:00
Matt Arsenault
f86da4cb7d
StructurizeCFG: Add SkipUniformRegions pass parameter to new PM version (#102812)
Keep respecting the old cl::opt for now.
2024-08-12 15:13:15 +04:00
Yaxun (Sam) Liu
be40c723ce Revert "StructurizeCFG: Optimize phi insertion during ssa reconstruction (#101301)"
This reverts commit c62e2a2a4ed69d53a3c6ca5c24ee8d2504d6ba2b.

Since it caused regression in HIP buildbot:

https://lab.llvm.org/buildbot/#/builders/123/builds/3282
2024-08-08 11:59:39 -04:00
Ruiling, Song
c62e2a2a4e
StructurizeCFG: Optimize phi insertion during ssa reconstruction (#101301)
After investigating more while-break cases, I think we should try to
optimize
the way we reconstruct phi nodes. Previously, we reconstruct each phi 
nodes separately, but this is not optimal. For example:

```
header:
  %v.1 = phi float [ %v, %entry ], [ %v.2, %latch ]
  br i1 %cc, label %if, label %latch

if:
  %v.if = fadd float %v.1, 1.0 
  br i1 %cc2, label %latch, label %exit

latch:
  %v.2 = phi float [ %v.if, %if ], [ %v.1, %header ]
  br i1 %cc3, label %exit, label %header

exit:
  %v.3 = phi float [ %v.2, %latch ], [ %v.if, %if ]
```

For this case, we have different copies of value `v`, but there is at
most one copy of value `v` alive at any program point shown above.

The existing ssa reconstruction will use the incoming values from the 
old deleted phi. Below is a possible output after ssa reconstruction.

```
header:
  %v.1 = phi float [ %v, %entry ], [ %v.loop, %Flow1 ]
  br i1 %cc, label %if, label %flow

if:
  %v.if = fadd float %v.1, 1.0 
  br label %flow

flow:
  %v.exit.if = phi float [ %v.if, %if ], [ undef, %header ]
  %v.latch = phi float [ %v.if, %if ], [ %v.1, %header ]

latch:
  br label %flow1

flow1:
  %v.loop = phi float [ %v.latch, %latch ], [ undef, %Flow ]
  %v.exit = phi float [ %v.latch, %latch ], [ %v.exit.if, %Flow ]

exit:
  %v.3 = phi float [ %v.exit, %flow1 ]
```

If we look closely, in order to reconstruct `v.1` `v.2` `v.3`, we are 
having two simultaneous copies of `v` alive at `flow` and `flow1`.
We highly depend on register coalescer to coalesce them together.
But register coalescer may not always be able to coalesce them
because of the complexity in the chain of phi.

On the other side, now that we have only one copy of `v` alive at any 
program point before the transform, why not simplify the phi network
as much as we can? Look at the incoming values of these PHIs:
```
      header    if     latch
v.1:   --       --      v.2 
v.2:   v.1      v.if    --  
v.3:   --       v.if    v.2 
```
If we let them share the same incoming values for these three different
incoming blocks, then we would have only one copy of alive `v` at any 
program point after ssa reconstruction. Something like:

```
header:
  %v.1 = phi float [ %v, %entry ], [ %v.2, %Flow1 ]
  br i1 %cc, label %if, label %flow

if:
  %v.if = fadd float %v.1, 1.0 
  br label %flow

flow:
  %v.2 = phi float [ %v.if, %if ], [ %v.1, %header ]

latch:
  br label %flow1

flow1:
  ...

exit:
  %v.3 = phi float [ %v.2, %flow1 ]
```
2024-08-08 14:47:49 +08:00
Nikita Popov
9df71d7673
[IR] Add getDataLayout() helpers to Function and GlobalValue (#96919)
Similar to https://github.com/llvm/llvm-project/pull/96902, this adds
`getDataLayout()` helpers to Function and GlobalValue, replacing the
current `getParent()->getDataLayout()` pattern.
2024-06-28 08:36:49 +02:00
Ruiling, Song
ac24238002
[LowerSwitch] Don't let pass manager handle the dependency (#68662)
Some passes has limitation that only support simple terminators:
branch/unreachable/return. Right now, they ask the pass manager to add
LowerSwitch pass to eliminate `switch`. Let's manage such kind of pass
dependency by ourselves. Also add the assertion in the related passes.
2023-10-25 09:24:36 +08:00
Nuno Lopes
29d0b60430 [StructurizeCFG] Use poison instead of undef as placeholder [NFC]
These are used to create branch instructions. The condition is patched later
2023-07-22 13:23:39 +01:00
pvanhout
e4ea2d5919 [StructurizeCFG] Correctly depend on UniformityAnalysis
Small oversight in https://reviews.llvm.org/D145688 - the pass' dependency was not updated to reflect the change to UA.

Also, change DivergenceAnalysis to UniformityAnalysis in a comment. That way, StructurizeCFG only refers to UA and not DA anymore.
2023-03-14 11:25:22 +01:00
pvanhout
240e2cba67 [StructurizeCFG] Use UniformityAnalysis instead of DivergenceAnalysis
Depends on D145572

Reviewed By: foad, sameerds

Differential Revision: https://reviews.llvm.org/D145688
2023-03-13 08:31:20 +01:00
Juan Manuel MARTINEZ CAAMAÑO
96ad51e3eb [StructurizeCFG][DebugInfo] Avoid use-after-free
Reviewed By: dstuttard

Differential Revision: https://reviews.llvm.org/D137408
2022-11-04 13:39:49 +00:00
Juan Manuel MARTINEZ CAAMAÑO
256f8b06c6 [StructurizeCFG][DebugInfo] Maintain DILocations in the branches created by StructurizeCFG
Make StructurizeCFG preserve the debug locations of the branch instructions it introduces.

Differential Revision: https://reviews.llvm.org/D135967
2022-10-28 02:51:02 -05:00
Juan Manuel MARTINEZ CAAMAÑO
e9716c64ec [StructurizeCFG] Remove imposible case and replace by assert
In addition, replace outdated XFAIL test by a new one.

Differential Revision: https://reviews.llvm.org/D134439
2022-09-29 08:27:49 +00:00
Ruiling Song
a5676a3a7e StructurizeCFG: Set Undef for non-predecessors in setPhiValues()
During structurization process, we may place non-predecessor blocks
between the predecessors of a block in the structurized CFG. Take
the typical while-break case as an example:
```
 /---A(v=...)
 |  / \
 ^ B   C
 |  \ /|
 \---L |
     \ /
      E (r = phi (v:C)...)
```
After structurization, the CFG would be look like:
```
 /---A
 |   |\
 |   | C
 |   |/
 |   F1
 ^   |\
 |   | B
 |   |/
 |   F2
 |   |\
 |   | L
 \   |/
  \--F3
     |
     E
```
We can see that block B is placed between the predecessors(C/L) of E.
During phi reconstruction, to achieve the same sematics as before, we
are reconstructing the PHIs as:
  F1: v1 = phi (v:C), (undef:A)
  F3: r = phi (v1:F2), ...
But this is also saying that `v1` would be live through B, which is not
quite necessary. The idea in the change is to say the incoming value
from B is Undef for the PHI in E. With this change, the reconstructed
PHI would be:
  F1: v1 = phi (v:C), (undef:A)
  F2: v2 = phi (v1:F1), (undef:B)
  F3: r  = phi (v2:F2), ...

Reviewed by: sameerds

Differential Revision: https://reviews.llvm.org/D132450
2022-09-26 09:54:47 +08:00
Ruiling Song
40e9284f3c StructurizeCFG: prefer reduced number of live values
The instruction simplification will try to simplify the affected phis.
In some cases, this might extend the liveness of values. For example:

  BB0:
   | \
   | BB1
   | /
  BB2:phi (BB0, v), (BB1, undef)

The phi in BB2 will be simplified to v as v dominates BB2, but this is
increasing the number of active values in BB1. By setting CanUseUndef
to false, we will not simplify the phi in this way, this would help
register pressure. This is mandatory for the later change to help
reducing VGPR pressure for AMDGPU.

Reviewed by: foad, sameerds

Differential Revision: https://reviews.llvm.org/D132449
2022-09-26 09:54:47 +08:00
Kazu Hirata
6b1bc80188 [Scalar] Qualify auto in range-based for loops (NFC)
Identified with readability-qualified-auto.
2022-08-20 21:18:25 -07:00
Kazu Hirata
e20d210eef [llvm] Qualify auto (NFC)
Identified with readability-qualified-auto.
2022-08-07 23:55:27 -07:00
Brendon Cahoon
c945d88d2b Revert "[StructurizeCFG] Improve basic block ordering"
This reverts commit f1b05a0a2bbbea160002be709f8a1c59de366761.

Need to revert to due to issues identified with testing. The
transformation is incorrect for blocks that contain convergent
instructions.
2022-07-14 09:40:51 -05:00
Brendon Cahoon
f1b05a0a2b [StructurizeCFG] Improve basic block ordering
StructurizeCFG linearizes the successors of branching basic block
by adding Flow blocks to record the true/false path for branches
and back edges. This patch reduces the number of Phi values needed
to capture the control flow path by improving the basic block
ordering.

Previously, StructurizeCFG adds loop exit blocks outside of the
loop. StructurizeCFG sets a boolean value to indicate the path
taken, and all exit block live values extend to after the loop.
For loops with a large number of exits blocks, this creates a
huge number of values that are maintained, which increases
compilation time and register pressure. This is problem
especially with ASAN, which adds early exits to blocks with
unreachable instructions for each instrumented check in the loop.

In specific cases, this patch reduces the number of values needed
after the loop by moving the exit block into the loop. This is
done for blocks that have a single predecessor and single successor
by moving the block to appear just after the predecessor.

Differential Revision: https://reviews.llvm.org/D123231
2022-06-22 16:10:41 -05:00
Simon Moll
b8c2781ff6 [NFC] format InstructionSimplify & lowerCaseFunctionNames
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName".  This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.

This is the alternative to the less invasive clang-format only patch: D126783

Reviewed By: spatel, rengolin

Differential Revision: https://reviews.llvm.org/D126889
2022-06-09 16:10:08 +02:00
serge-sans-paille
59630917d6 Cleanup includes: Transform/Scalar
Estimated impact on preprocessor output line:
before: 1062981579
after:  1062494547

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120817
2022-03-03 07:56:34 +01:00
Jay Foad
0e74d75a29 [StructurizeCFG] Fix boolean not bug
D118623 added code to fold not-of-compare into a compare
with the inverted predicate, if the compare had no other
uses. This relies on accurate use lists in the IR but it
was run before setPhiValues, when some phi inputs are still
stored in a data structure on the side, instead of being
real uses in the IR. The effect was that a phi that should
be using the original compare result would now get an
inverted result instead.

Fix this by moving simplifyConditions after setPhiValues.

Differential Revision: https://reviews.llvm.org/D120312
2022-02-22 17:36:20 +00:00
Jay Foad
d2e5d3512b [StructurizeCFG] Clean up some boolean not instructions
In some cases StructurizeCFG inserts i1 xor instructions to invert
predicates. Add a quick loop to clean these up afterwards if we can get
away with modifying an existing compare instruction instead.
(StructurizeCFG is generally run late in the pipeline so instcombine
does not clean them up for us.)

Differential Revision: https://reviews.llvm.org/D118623
2022-02-01 09:35:37 +00:00
Kazu Hirata
5fc9e30985 [Scalar] Use range-based for loops (NFC) 2021-02-25 19:54:38 -08:00
Kazu Hirata
fb74e1e78a [Transforms/Scalar] Use range-based for loops (NFC) 2021-02-04 21:18:05 -08:00