338 Commits

Author SHA1 Message Date
Afanasyev Ivan
d0e5d6fd61
[CodeGen][CodeLayout] Fix segfault on access to deleted block in MBP. (#142357)
Problem 1: There is a typo which reassigns `BlockWorkList` to
`EHPadWorkList` on attempt to remove `RemBB` from work lists.

Problem 2: `Chain->UnscheduledPredecessors == 0` is an incorrect way to
check whether `RemBB` is enqueued or not. The root cause is a postponed
deletion of `WorkList` from already scheduled blocks in
`selectBestCandidateBlock`. Bug happens in the following scenario:

* `FunctionChain` is being processed with non-zero
`UnscheduledPredecessors`
* Block `B'` is added to the `BlockWorkList`
* Block `B'` is chosen as the best successor (`selectBestSuccessor`) for
some another block and added into `Chain`
* Block `B'` is removed by tail duplicator.

`RemovalCallback` erroneously won't erase `B'` from `BlockWorkList`,
because `UnscheduledPredecessors` value of `FunctionChain` is not zero
(and it is allowed to be non-zero).

Proposed solution is to always cleanup worklists on block deletion by
tail duplicator.
2025-06-23 23:04:22 +09:00
Alexis Engelke
61972054f3
[CodeGen] Limit number of analyzed predecessors
MachineBlockPlacement has quadratic runtime in the number of
predecessors: in some situation, for an edge, all predecessors of the
successor are considered.

Limit the number of considered predecessors to bound compile time for
large functions.

Pull Request: https://github.com/llvm/llvm-project/pull/142584
2025-06-20 11:23:00 +02:00
Kazu Hirata
5cfd81b0cc
[llvm] Use range constructors of *Set (NFC) (#137552) 2025-04-27 15:59:57 -07:00
Akshat Oke
a09fd9c653
[CodeGen][NPM] Port MachineBlockPlacementStats to NPM (#129853) 2025-04-17 15:24:46 +05:30
Kazu Hirata
ebd1667059
[CodeGen] Avoid repeated hash lookups (NFC) (#135584) 2025-04-14 01:03:06 -07:00
Akshat Oke
87916f8c32
[CodeGen][NPM] Port MachineBlockPlacement to NPM (#129828) 2025-03-14 10:31:53 +05:30
Kazu Hirata
ef94d8a0f2
[CodeGen] Avoid repeated hash lookups (NFC) (#129652) 2025-03-04 01:49:48 -08:00
Kazu Hirata
19a7fe03b4
[CodeGen] Avoid repeated hash lookups (NFC) (#123894) 2025-01-22 00:17:55 -08:00
Ryan Mansfield
67efbd0bf1
[LLVM] Fix various cl::desc typos and whitespace issues (NFC) (#121955) 2025-01-08 11:07:23 +01:00
Ellis Hoag
c9260e21d0
[CodeLayout] Do not rebuild chains with -apply-ext-tsp-for-size (#115934)
https://github.com/llvm/llvm-project/pull/109711 disables
`buildCFGChains()` when `-apply-ext-tsp-for-size` is used to improve
codesize. Tail merging can change the layout and normally requires
`buildCFGChains()` to be called again, but we want to prevent this when
optimizing for codesize. We saw slight size improvement on large
binaries with this change. If `-apply-ext-tsp-for-size` is not used,
this should be a NFC.
2024-11-18 09:16:09 -08:00
Ellis Hoag
b8d6659bff
[CodeLayout] Do not flip branch condition when using optsize (#114607)
* Do not use profile data when flipping a branch condition when
optimizing for size. This should improving outlining and ICF due to more
uniform instruction sequences.
* Refactor `optimizeBranches()` to use early `continue`s
* Use the correct debug location for `insertBranch()`
2024-11-12 09:50:29 -08:00
Ellis Hoag
6ab26eab4f
Check hasOptSize() in shouldOptimizeForSize() (#112626) 2024-10-28 09:45:03 -07:00
Ellis Hoag
cb5fbd2f60
[CodeLayout] Do not verify after assigning blocks (#111754)
Rather than invariantly running `F->verify()` when asserts are enabled,
run machine IR verification in LIT tests only.

Swap `CHECK-PERF` and `CHECK-SIZE` in `code_placement_ext_tsp_large.ll`.

Remove `={0,1,true,false}` from flags in tests.
2024-10-10 09:01:50 -07:00
spupyrev
9016f27c42
[CodeLayout] Size-aware machine block placement (#109711)
This is an implementation of a new "size-aware" machine block placement.
The
idea is to reorder blocks so that the number of fall-through jumps is
maximized.
Observe that profile data is ignored for the optimization, and it is
applied only
for instances with hasOptSize()=true.
This strategy has two benefits:
(i) it eliminates jump instructions, which results in smaller text size;
(ii) we avoid using profile data while reordering blocks, which yields
more
"uniform" functions, thus helping ICF and machine outliner/merger.

For large (mobile) apps, the size benefits of (i) and (ii) are roughly
the same,
combined providing up to 0.5% uncompressed and up to 1% compressed
savings size
on top of the current solution.

The optimization is turned off by default.
2024-10-02 10:48:08 -07:00
Ellis Hoag
fbec1c2a08
[NFC][CodeLayout] Remove unused parameter (#110145)
The `NodeCounts` parameter of `calcExtTspScore()` is unused, so remove
it.
Use `SmallVector` since arrays are expected to be small since they
represent MBBs.
2024-09-26 10:28:06 -07:00
Matt Arsenault
71ca9fcb8d
llvm-reduce: Don't print verifier failed machine functions (#109673)
This produces far too much terminal output, particularly for the
instruction reduction. Since it doesn't consider the liveness of of
the instructions it's deleting, it produces quite a lot of verifier
errors.
2024-09-24 22:32:53 +04:00
spupyrev
36dce5091e
[CodeLayout][NFC] Format and minor refactoring of MBP (#109729)
This PR has two (NFC) commits:
- clang-format MBP
- move a part of tail duplication and block aligning into helper
functions for better readability.
2024-09-24 08:36:57 -07:00
Alexis Engelke
862d822d83
[CodeGen] Don't renumber invalid domtree (#102427)
Machine block placement might remove nodes from the function but does
not update the dominator tree accordingly. Instead of renumbering (which
might crash due to accessing removed blocks), set the domtree to null to
make clear that it is invalid at this point.

Fixup of #102107.
2024-08-08 08:53:45 +02:00
Alexis Engelke
d871b2e0d0
[CodeGen] Use optimized domtree for MachineFunction (#102107)
The dominator tree gained an optimization to use block numbers instead
of a DenseMap to store blocks. Given that machine basic blocks already
have numbers, expose these via appropriate GraphTraits. For debugging,
block number epochs are added to MachineFunction -- this greatly helps
in finding uses of block numbers after RenumberBlocks().

In a few cases where dominator trees are preserved across renumberings,
the dominator tree is updated to use the new numbers.
2024-08-06 13:46:19 +02:00
Krzysztof Pszeniczny
5bae81ba9e
[CodeGen] Add an option to skip extTSP BB placement for huge functions. (#99310)
The extTSP-based basic block layout algorithm improves the performance
of the generated code, but unfortunately it has a super-linear time
complexity. This leads to extremely long compilation times for certain
relatively rare kinds of autogenerated code.

This patch adds an `-mllvm` flag to optionally restrict extTSP only to
functions smaller than a specified threshold. While commit
bcdc0477319a26fd8dcdde5ace3bdd6743599f44 added a knob to to limit the
maximum chain size, it's still possible that for certain huge functions
the number of chains is very large, leading to a quadratic behaviour in
ExtTSPImpl::mergeChainPairs.
2024-07-24 17:43:26 +02:00
John Brawn
f0bd705c9b
[CodeGen] Restore MachineBlockPlacement block ordering (#99351)
PR #91843 changed the algorithm used to find the next unplaced block so
that it iterates through the blocks in BlockFilter instead of iterating
through the blocks in the function and checking if they are in the block
filter. Unfortunately this sometimes results in a different block
ordering being chosen, as the order of blocks in BlockFilter comes from
the order in MachineLoopInfo, and in some cases this differs from the
order they are in the function. This can also give an end result that
has worse performance.

Fix this by making collectLoopBlockSet place blocks in its output in the
order that they are in the function.
2024-07-24 10:49:50 +01:00
paperchalice
099899961c
[CodeGen][NewPM] Port machine-block-freq to new pass manager (#98317)
- Add `MachineBlockFrequencyAnalysis`.
- Add `MachineBlockFrequencyPrinterPass`.
- Use `MachineBlockFrequencyInfoWrapperPass` in legacy pass manager.
- `LazyMachineBlockFrequencyInfo::print` is empty, drop it due to new
pass manager migration.
2024-07-12 15:45:01 +08:00
paperchalice
79d0de2ac3
[CodeGen][NewPM] Port machine-loops to new pass manager (#97793)
- Add `MachineLoopAnalysis`.
- Add `MachineLoopPrinterPass`.
- Convert to `MachineLoopInfoWrapperPass` in legacy pass manager.
2024-07-09 09:11:18 +08:00
paperchalice
d38b518e04
Reapply "[CodeGen][NewPM] Port machine-branch-prob to new pass manager" (#96858) (#96869)
This reverts commit ab58b6d58edf6a7c8881044fc716ca435d7a0156.
In `CodeGen/Generic/MachineBranchProb.ll`, `llc` crashed with dumped MIR
when targeting PowerPC. Move test to `llc/new-pm`, which is X86
specific.
2024-06-28 10:59:23 +08:00
paperchalice
ab58b6d58e
Revert "[CodeGen][NewPM] Port machine-branch-prob to new pass manager" (#96858)
Reverts llvm/llvm-project#96389
Some ppc bots failed.
2024-06-27 15:00:17 +08:00
paperchalice
73e46c2bb4
[CodeGen][NewPM] Port machine-branch-prob to new pass manager (#96389)
Like IR version `print<branch-prob>`, there is also a
`print<machine-branch-prob>`.
2024-06-27 14:04:51 +08:00
William Junda Huang
75882ed4c7
[Codegen] (NFC) Faster algorithm for MachineBlockPlacement (#91843)
In MachineBlockPlacement, the function getFirstUnplacedBlock is
inefficient because in most cases (for usual loop CFG), this function
fails to find a candidate, and its complexity becomes O(#(loops in
function) * #(blocks in function)). This makes the compilation of very
long functions slow. This update reduces it to O(k * #(blocks in
function)) where k is the maximum loop nesting depth, by iterating
through the BlockFilter instead.
2024-06-13 22:13:38 -04:00
paperchalice
4b24c2dfb5
[CodeGen][NewPM] Split MachinePostDominators into a concrete analysis result (#95113)
`MachinePostDominators` version of #94571.
2024-06-12 14:29:22 +08:00
Kazu Hirata
026a29e8b3
[Analysis, CodeGen, DebugInfo] Use StringRef::operator== instead of StringRef::equals (NFC) (#91304)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of
  53 under llvm/ in terms of their usage.

- The elimination of StringRef::equals brings StringRef closer to
  std::string_view, which has operator== but not equals.

- S == "foo" is more readable than S.equals("foo"), especially for
  !Long.Expression.equals("str") vs Long.Expression != "str".
2024-05-07 10:20:10 -07:00
Kazu Hirata
7d269a4841 [CodeGen] Use range-based for loops (NFC) 2024-02-03 21:43:01 -08:00
Freddy Ye
d102f8bda1
[MachineBlockPlacement][X86] Use max of MDAlign and TLIAlign to align Loops. (#71026)
This patch added backend consumption on a new loop metadata:
!1 = !{!"llvm.loop.align", i32 64}
which is generated from clang's new loop attribute:
[[clang::code_align()]]
clang patch: #70762
2023-11-21 14:06:32 +08:00
Kazu Hirata
f9306f6de3
[ADT] Rename llvm::erase_value to llvm::erase (NFC) (#70156)
C++20 comes with std::erase to erase a value from std::vector.  This
patch renames llvm::erase_value to llvm::erase for consistency with
C++20.

We could make llvm::erase more similar to std::erase by having it
return the number of elements removed, but I'm not doing that for now
because nobody seems to care about that in our code base.

Since there are only 50 occurrences of erase_value in our code base,
this patch replaces all of them with llvm::erase and deprecates
llvm::erase_value.
2023-10-24 23:03:13 -07:00
Matthias Braun
2e26d09106
BlockFrequencyInfo: Add PrintBlockFreq helper (#67512)
- Refactor the (Machine)BlockFrequencyInfo::printBlockFreq functions
into a `PrintBlockFreq()` function returning a `Printable` object. This
simplifies usage as it can be directly piped to a `raw_ostream` like
`dbgs() << PrintBlockFreq(MBFI, Freq) << '\n';`.
- Previously there was an interesting behavior where
`BlockFrequencyInfoImpl` stores frequencies both as a `Scaled64` number
and as an `uint64_t`. Most algorithms use the `BlockFrequency`
abstraction with the integers, the print function for basic blocks
printed the `Scaled64` number potentially showing higher accuracy than
was used by the algorithm. This changes things to only print
`BlockFrequency` values.
- Replace some instances of `dbgs() << Freq.getFrequency()` with the new
function.
2023-10-05 18:26:50 -07:00
Matthias Braun
5181156b37
Use BlockFrequency type in more places (NFC) (#68266)
The `BlockFrequency` class abstracts `uint64_t` frequency values. Use it
more consistently in various APIs and disable implicit conversion to
make usage more consistent and explicit.

- Use `BlockFrequency Freq` parameter for `setBlockFreq`,
`getProfileCountFromFreq` and `setBlockFreqAndScale` functions.
- Return `BlockFrequency` in `getEntryFreq()` functions.
- While on it change some `const BlockFrequency& Freq` parameters to
plain `BlockFreqency Freq`.
- Mark `BlockFrequency(uint64_t)` constructor as explicit.
- Add missing `BlockFrequency::operator!=`.
- Remove `uint64_t BlockFreqency::getMaxFrequency()`.
- Add `BlockFrequency BlockFrequency::max()` function.
2023-10-05 11:40:17 -07:00
Fangrui Song
6b8d04c23d [CodeLayout] Refactor std::vector uses, namespace, and EdgeCountT. NFC
* Place types and functions in the llvm::codelayout namespace
* Change EdgeCountT from pair<pair<uint64_t, uint64_t>, uint64_t> to a struct and utilize structured bindings.
  It is not conventional to use the "T" suffix for structure types.
* Remove a redundant copy in ChainT::merge.
* Change {ExtTSPImpl,CDSortImpl}::run to use return value instead of an output parameter
* Rename applyCDSLayout to computeCacheDirectedLayout: (a) avoid rare
  abbreviation "CDS" (cache-directed sort) (b) "compute" is more conventional
  for the specific use case
* Change the parameter types from std::vector to ArrayRef so that
  SmallVector arguments can be used.
* Similarly, rename applyExtTspLayout to computeExtTspLayout.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D159526
2023-09-21 13:13:03 -07:00
Arthur Eubanks
0a1aa6cda2
[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.

This matches other nearby enums.

For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
2023-09-14 14:10:14 -07:00
Guozhi Wei
1bcb6a3da2 [MBP] Enable duplicating return block to remove jump to return
Sometimes LLVM generates branch to return instruction, like PR63227.

It is because in function MachineBlockPlacement::canTailDuplicateUnplacedPreds
we avoid duplicating a BB into another already placed BB to prevent destroying
computed layout. But if the successor BB is a return block, duplicating it will
only reduce taken branches without hurt to any other branches.

Differential Revision: https://reviews.llvm.org/D153093
2023-06-21 18:54:31 +00:00
Akshay Khadse
43b38696aa Fix uninitialized class members
Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D148692
2023-04-20 11:18:34 +08:00
Akshay Khadse
8bf7f86d79 Fix uninitialized pointer members in CodeGen
This change initializes the members TSI, LI, DT, PSI, and ORE pointer feilds of the SelectOptimize class to nullptr.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D148303
2023-04-17 16:32:46 +08:00
Kazu Hirata
a585fa2637 [CodeGen] Use *{Set,Map}::contains (NFC) 2023-03-14 08:07:42 -07:00
Fangrui Song
1e6921131a Move global namespace cl::opt inside llvm:: 2023-02-14 00:09:44 -08:00
Mingming Liu
36e8e19337 [NFC][BlockPlacement]Add an option to renumber blocks based on function layout order.
Use case:
- When block layout is visualized after MBP pass, the basic blocks are labeled in layout order; meanwhile blocks could be numbered in a different order.
- As a result, it's hard to map between the graph and pass output. With this option on, the basic blocks are renumbered in function layout order.

This option is only useful when a function is to be visualized (i.e., when view options are on) to make it debugging only.

Use https://godbolt.org/z/5WTW36bMr as an example:
- As MBP pass output (shown in godbolt output window), `func2` is in a basic block numbered `2` (`bb.2`), and `func1` is in a basic block numbered `3` (`bb.3`);
    `bb.3` is a block with higher block frequency than `bb.2`, and `bb.3` is placed before `bb.2` in the functin layout.
- Use [1] to get the dot graph (graph uploaded in [2]), the blocks are re-numbered.
   - `func1` is in 'if.end' block, and labeled `1` in visualized dot; `func2` is in 'if.then' blocks, and labeled `3` --> the labeled number and bb number won't map.
   - [[ b5626ae975/llvm/lib/CodeGen/MachineBlockFrequencyInfo.cpp (L127) | DOTGraphTraits<MachineBlockFrequencyInfo *>::getNodeLabel ]] is where labeled numbers are based on function layout number, and [[ a8d93783f3/llvm/include/llvm/Support/GraphWriter.h (L209)
 | called by graph writer ]].
        So call 'MachineFunction::RenumberBlocks' would make labeled number (in dot graph) and block number (in pass output) consistent with each other.

[1] `./bin/clang++ -O3 -S -mllvm -view-block-layout-with-bfi=count -mllvm -view-bfi-func-name=_Z9func_loopv -mllvm -print-after=block-placement -mllvm  -filter-print-funcs=_Z9func_loopv test.c`

[2] {F25201785}

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D137467
2022-11-07 07:52:45 -08:00
spupyrev
8d5b694da1 extending code layout alg
The diff modifies ext-tsp code layout algorithm in the following ways:
(i) fixes merging of cold block chains (this is a port of D129397);
(ii) adjusts the cost model utilized for optimization;
(iii) adjusts some APIs so that the implementation can be used in BOLT; this is
a prerequisite for D129895.

The only non-trivial change is (ii). Here we introduce different weights for
conditional and unconditional branches in the cost model. Based on the new model
it is slightly more important to increase the number of "fall-through
unconditional" jumps, which makes sense, as placing two blocks with an
unconditional jump next to each other reduces the number of jump instructions in
the generated code. Experimentally, this makes a mild impact on the performance;
I've seen up to 0.2%-0.3% perf win on some benchmarks.

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D129893
2022-08-24 09:40:25 -07:00
Kazu Hirata
9e6d1f4b5d [CodeGen] Qualify auto variables in for loops (NFC) 2022-07-17 01:33:28 -07:00
Mingming Liu
1e67385d28 [MachineBlockPlacementStats] Added check for "-filter-print-funcs"
option to the machine-block-placement-stats.

Differential Revision: https://reviews.llvm.org/D128019
2022-06-16 21:59:54 -07:00
Mingming Liu
b7d09557f6 Revert "[MachineBlockPlacementStats] Add check for -filter-print-funcs option to machine-block-placement stats."
This reverts commit 46d45df4516e9a5bc43460429cd02cd04a85db1a. Going to
add differential revision link to commit message and re-commit.
2022-06-16 21:56:08 -07:00
Mingming Liu
46d45df451 [MachineBlockPlacementStats] Add check for -filter-print-funcs option to machine-block-placement stats. 2022-06-16 21:48:08 -07:00
serge-sans-paille
989f1c72e0 Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169

after:  1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681
2022-03-16 08:43:00 +01:00
Nico Weber
a278250b0f Revert "Cleanup codegen includes"
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
2022-03-10 07:59:22 -05:00
serge-sans-paille
7f230feeea Cleanup codegen includes
after:  1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169
2022-03-10 10:00:30 +01:00