Problem 1: There is a typo that reassigns `BlockWorkList` to
`EHPadWorkList` when attempting to remove `RemBB` from the work lists.
Problem 2: `Chain->UnscheduledPredecessors == 0` is an incorrect way to
check whether `RemBB` is enqueued. The root cause is the postponed
removal of already scheduled blocks from the `WorkList` in
`selectBestCandidateBlock`. The bug occurs in the following scenario:
* `FunctionChain` is being processed with non-zero
`UnscheduledPredecessors`
* Block `B'` is added to the `BlockWorkList`
* Block `B'` is chosen as the best successor (`selectBestSuccessor`) for
some other block and added to the `Chain`
* Block `B'` is removed by the tail duplicator.
`RemovalCallback` erroneously does not erase `B'` from `BlockWorkList`,
because the `UnscheduledPredecessors` value of `FunctionChain` is not zero
(and it is allowed to be non-zero).
The proposed solution is to always clean up the work lists when a block is
deleted by the tail duplicator.
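A minimal sketch of that cleanup, reusing the names above (`BlockWorkList`, `EHPadWorkList`, `RemBB`); the callback plumbing in MachineBlockPlacement.cpp is omitted, and the `llvm::erase` spelling is only an assumption about how the erasure is written:

```cpp
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/MachineBasicBlock.h"

using namespace llvm;

// Sketch: unconditionally drop the deleted block from both work lists; do not
// gate this on Chain->UnscheduledPredecessors, and make sure the second erase
// really targets EHPadWorkList (the original typo hit BlockWorkList twice).
static void eraseFromWorkLists(MachineBasicBlock *RemBB,
                               SmallVectorImpl<MachineBasicBlock *> &BlockWorkList,
                               SmallVectorImpl<MachineBasicBlock *> &EHPadWorkList) {
  llvm::erase(BlockWorkList, RemBB);
  llvm::erase(EHPadWorkList, RemBB);
}
```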
MachineBlockPlacement has quadratic runtime in the number of
predecessors: in some situations, for an edge, all predecessors of the
successor are considered.
Limit the number of considered predecessors to bound compile time for
large functions.
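A simplified stand-in for the bounded scan (the real check lives in hasBetterLayoutPredecessor, and the limit is exposed via a command-line option whose exact name is not spelled out here):

```cpp
#include "llvm/CodeGen/MachineBasicBlock.h"

using namespace llvm;

// Sketch: stop scanning a successor's predecessors once the limit is reached,
// trading a little layout precision for bounded compile time.
static bool scanPredecessorsBounded(const MachineBasicBlock &Succ,
                                    unsigned MaxPreds) {
  unsigned Seen = 0;
  for (const MachineBasicBlock *Pred : Succ.predecessors()) {
    if (++Seen > MaxPreds)
      return true; // Give up: conservatively assume a better pred exists.
    // ... real profitability checks on Pred would go here ...
    (void)Pred;
  }
  return false;
}
```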
Pull Request: https://github.com/llvm/llvm-project/pull/142584
https://github.com/llvm/llvm-project/pull/109711 disables
`buildCFGChains()` when `-apply-ext-tsp-for-size` is used to improve
codesize. Tail merging can change the layout and normally requires
`buildCFGChains()` to be called again, but we want to prevent this when
optimizing for codesize. We saw a slight size improvement on large
binaries with this change. If `-apply-ext-tsp-for-size` is not used,
this should be an NFC.
* Do not use profile data when flipping a branch condition when
optimizing for size. This should improve outlining and ICF due to more
uniform instruction sequences.
* Refactor `optimizeBranches()` to use early `continue`s (see the sketch below)
* Use the correct debug location for `insertBranch()`
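A hedged sketch of the early-continue shape referenced above; the names and the analyzeBranch-based skeleton are illustrative, not the actual `optimizeBranches()` body:

```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/TargetInstrInfo.h"

using namespace llvm;

// Guard clauses replace the old nested conditionals so each block's handling
// reads top to bottom.
static void optimizeBranchesSketch(MachineFunction &MF,
                                   const TargetInstrInfo &TII) {
  for (MachineBasicBlock &MBB : MF) {
    MachineBasicBlock *TBB = nullptr, *FBB = nullptr;
    SmallVector<MachineOperand, 4> Cond;
    if (TII.analyzeBranch(MBB, TBB, FBB, Cond))
      continue; // Terminators we cannot analyze.
    if (Cond.empty() || !TBB || !FBB)
      continue; // Nothing conditional to flip.
    // ... decide whether to flip the condition (ignoring profile data when
    // optimizing for size) and re-insert the branch with the terminator's
    // debug location ...
  }
}
```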
Rather than unconditionally running `F->verify()` when asserts are enabled,
run machine IR verification only in LIT tests.
Swap `CHECK-PERF` and `CHECK-SIZE` in `code_placement_ext_tsp_large.ll`.
Remove `={0,1,true,false}` from flags in tests.
This is an implementation of a new "size-aware" machine block placement. The
idea is to reorder blocks so that the number of fall-through jumps is
maximized.
Observe that profile data is ignored for this optimization, and it is applied
only for instances with hasOptSize()=true.
This strategy has two benefits:
(i) it eliminates jump instructions, which results in smaller text size;
(ii) we avoid using profile data while reordering blocks, which yields more
"uniform" functions, thus helping ICF and the machine outliner/merger.
For large (mobile) apps, the size benefits of (i) and (ii) are roughly the
same; combined, they provide up to 0.5% uncompressed and up to 1% compressed
size savings on top of the current solution.
The optimization is turned off by default.
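A rough sketch of how the gating might look, assuming the `-apply-ext-tsp-for-size` flag mentioned earlier in this log controls the feature; the variable name and the exact location of the check are assumptions:

```cpp
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/CommandLine.h"

using namespace llvm;

// Assumed spelling of the flag; the real option lives with the pass.
static cl::opt<bool> ApplyExtTspForSize(
    "apply-ext-tsp-for-size", cl::Hidden, cl::init(false),
    cl::desc("Use size-aware ext-tsp block placement"));

// Off by default; only functions compiled for size opt in, and profile data
// is deliberately not consulted by this path.
static bool shouldUseSizeAwarePlacement(const MachineFunction &MF) {
  return ApplyExtTspForSize && MF.getFunction().hasOptSize();
}
```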
The `NodeCounts` parameter of `calcExtTspScore()` is unused, so remove
it.
Use `SmallVector` since the arrays are expected to be small, as they
represent MBBs.
This produces far too much terminal output, particularly for the
instruction reduction. Since it doesn't consider the liveness of
the instructions it's deleting, it produces quite a lot of verifier
errors.
Machine block placement might remove nodes from the function but does
not update the dominator tree accordingly. Instead of renumbering (which
might crash due to accessing removed blocks), set the domtree to null to
make it clear that it is invalid at this point.
Fixup of #102107.
The dominator tree gained an optimization to use block numbers instead
of a DenseMap to store blocks. Given that machine basic blocks already
have numbers, expose these via appropriate GraphTraits. For debugging,
block number epochs are added to MachineFunction -- this greatly helps
in finding uses of block numbers after RenumberBlocks().
In a few cases where dominator trees are preserved across renumberings,
the dominator tree is updated to use the new numbers.
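An illustrative sketch (not the DominatorTree code itself) of the general idea: per-block data indexed by the existing block numbers instead of a pointer-keyed DenseMap, which is exactly the kind of index that goes stale after RenumberBlocks():

```cpp
#include "llvm/CodeGen/MachineFunction.h"
#include <vector>

using namespace llvm;

// Build a table indexed by MBB number; MF.getNumBlockIDs() bounds the numbers
// currently in use, and MBB.getNumber() gives each block's index.
static std::vector<unsigned> countSuccessorsByNumber(const MachineFunction &MF) {
  std::vector<unsigned> SuccCount(MF.getNumBlockIDs(), 0);
  for (const MachineBasicBlock &MBB : MF)
    SuccCount[MBB.getNumber()] = MBB.succ_size();
  return SuccCount;
}
```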
The extTSP-based basic block layout algorithm improves the performance
of the generated code, but unfortunately it has a super-linear time
complexity. This leads to extremely long compilation times for certain
relatively rare kinds of autogenerated code.
This patch adds an `-mllvm` flag to optionally restrict extTSP only to
functions smaller than a specified threshold. While commit
bcdc0477319a26fd8dcdde5ace3bdd6743599f44 added a knob to limit the
maximum chain size, it's still possible that for certain huge functions
the number of chains is very large, leading to a quadratic behaviour in
ExtTSPImpl::mergeChainPairs.
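A hypothetical sketch of such a knob; the actual flag name and threshold semantics in the patch may differ:

```cpp
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/Support/CommandLine.h"

using namespace llvm;

// Hypothetical spelling of the new -mllvm option.
static cl::opt<unsigned> ExtTspMaxFunctionSize(
    "ext-tsp-max-function-size", cl::Hidden, cl::init(0),
    cl::desc("Only run extTSP on functions with fewer blocks than this "
             "(0 means no limit)"));

// Huge functions fall back to the default layout, avoiding the quadratic
// behaviour of ExtTSPImpl::mergeChainPairs when there are many chains.
static bool useExtTspForFunction(const MachineFunction &MF) {
  return ExtTspMaxFunctionSize == 0 || MF.size() < ExtTspMaxFunctionSize;
}
```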
PR #91843 changed the algorithm used to find the next unplaced block so
that it iterates through the blocks in BlockFilter instead of iterating
through the blocks in the function and checking if they are in the block
filter. Unfortunately this sometimes results in a different block
ordering being chosen, as the order of blocks in BlockFilter comes from
the order in MachineLoopInfo, and in some cases this differs from the
order they are in the function. This can also give an end result that
has worse performance.
Fix this by making collectLoopBlockSet place blocks in its output in the
order that they are in the function.
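A simplified sketch of the fix, with illustrative names rather than the exact ones in MachineBlockPlacement.cpp: membership still comes from the loop, but the emission order comes from the function:

```cpp
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineLoopInfo.h"

using namespace llvm;

static SmallVector<MachineBasicBlock *, 16>
collectLoopBlocksInFunctionOrder(const MachineLoop &L, MachineFunction &MF) {
  // Membership test built from the loop's block list.
  SmallPtrSet<const MachineBasicBlock *, 16> InLoop(L.block_begin(),
                                                    L.block_end());
  // Emit blocks in function order, independent of MachineLoopInfo order.
  SmallVector<MachineBasicBlock *, 16> Ordered;
  for (MachineBasicBlock &MBB : MF)
    if (InLoop.count(&MBB))
      Ordered.push_back(&MBB);
  return Ordered;
}
```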
- Add `MachineBlockFrequencyAnalysis`.
- Add `MachineBlockFrequencyPrinterPass`.
- Use `MachineBlockFrequencyInfoWrapperPass` in legacy pass manager.
- `LazyMachineBlockFrequencyInfo::print` is empty; drop it as part of the new
pass manager migration.
This reverts commit ab58b6d58edf6a7c8881044fc716ca435d7a0156.
In `CodeGen/Generic/MachineBranchProb.ll`, `llc` crashed with dumped MIR
when targeting PowerPC. Move test to `llc/new-pm`, which is X86
specific.
In MachineBlockPlacement, the function getFirstUnplacedBlock is
inefficient because in most cases (for a typical loop CFG), this function
fails to find a candidate, and its complexity becomes O(#(loops in
function) * #(blocks in function)). This makes the compilation of very
long functions slow. This update reduces it to O(k * #(blocks in
function)) where k is the maximum loop nesting depth, by iterating
through the BlockFilter instead.
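A simplified sketch of the cheaper scan; the real getFirstUnplacedBlock takes more state, but the key change is iterating the filter rather than the whole function:

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/CodeGen/MachineBasicBlock.h"

using namespace llvm;

// Walk the (much smaller) block filter and return the first block that has
// not yet been placed into a chain; nullptr if everything is placed.
static MachineBasicBlock *
firstUnplacedInFilter(ArrayRef<MachineBasicBlock *> BlockFilter,
                      const SmallPtrSetImpl<const MachineBasicBlock *> &Placed) {
  for (MachineBasicBlock *MBB : BlockFilter)
    if (!Placed.count(MBB))
      return MBB;
  return nullptr;
}
```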
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
53 under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
This patch adds backend consumption of a new loop metadata:
!1 = !{!"llvm.loop.align", i32 64}
which is generated from clang's new loop attribute:
[[clang::code_align()]]
clang patch: #70762
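For reference, a small example of the attribute in source form; aligning the loop below to 64 bytes is what produces the metadata shown above:

```cpp
// The loop start is aligned to a 64-byte boundary.
void hot_loop(int *a, int n) {
  [[clang::code_align(64)]]
  for (int i = 0; i < n; ++i)
    a[i] += 1;
}
```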
C++20 comes with std::erase to erase a value from std::vector. This
patch renames llvm::erase_value to llvm::erase for consistency with
C++20.
We could make llvm::erase more similar to std::erase by having it
return the number of elements removed, but I'm not doing that for now
because nobody seems to care about that in our code base.
Since there are only 50 occurrences of erase_value in our code base,
this patch replaces all of them with llvm::erase and deprecates
llvm::erase_value.
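A tiny before/after usage example:

```cpp
#include "llvm/ADT/STLExtras.h"
#include <vector>

void dropTwos(std::vector<int> &V) {
  // Before: llvm::erase_value(V, 2);   (now deprecated)
  // After: matches the C++20 std::erase spelling.
  llvm::erase(V, 2);
}
```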
- Refactor the (Machine)BlockFrequencyInfo::printBlockFreq functions
into a `PrintBlockFreq()` function returning a `Printable` object. This
simplifies usage as it can be directly piped to a `raw_ostream` like
`dbgs() << PrintBlockFreq(MBFI, Freq) << '\n';`.
- Previously there was an interesting behavior where
`BlockFrequencyInfoImpl` stores frequencies both as a `Scaled64` number
and as a `uint64_t`. Most algorithms use the `BlockFrequency`
abstraction with the integers, but the print function for basic blocks
printed the `Scaled64` number, potentially showing higher accuracy than
was used by the algorithm. This changes things to only print
`BlockFrequency` values.
- Replace some instances of `dbgs() << Freq.getFrequency()` with the new
function.
The `BlockFrequency` class abstracts `uint64_t` frequency values. Use it
more consistently in various APIs and disable implicit conversion to
make usage more consistent and explicit.
- Use `BlockFrequency Freq` parameter for `setBlockFreq`,
`getProfileCountFromFreq` and `setBlockFreqAndScale` functions.
- Return `BlockFrequency` in `getEntryFreq()` functions.
- While at it, change some `const BlockFrequency &Freq` parameters to
plain `BlockFrequency Freq`.
- Mark `BlockFrequency(uint64_t)` constructor as explicit.
- Add missing `BlockFrequency::operator!=`.
- Remove `uint64_t BlockFrequency::getMaxFrequency()`.
- Add `BlockFrequency BlockFrequency::max()` function.
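A small usage sketch reflecting the API shape after these changes (explicit construction, pass-by-value parameters, and `operator!=`):

```cpp
#include "llvm/Support/BlockFrequency.h"

using namespace llvm;

// Implicit conversion from uint64_t no longer compiles, so frequencies are
// constructed explicitly and passed by value.
static bool isHotterThanEntry(BlockFrequency Freq, BlockFrequency EntryFreq) {
  if (Freq != BlockFrequency(0))
    return Freq.getFrequency() > EntryFreq.getFrequency();
  return false;
}
```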
* Place types and functions in the llvm::codelayout namespace
* Change EdgeCountT from pair<pair<uint64_t, uint64_t>, uint64_t> to a struct and utilize structured bindings.
It is not conventional to use the "T" suffix for structure types.
* Remove a redundant copy in ChainT::merge.
* Change {ExtTSPImpl,CDSortImpl}::run to use return value instead of an output parameter
* Rename applyCDSLayout to computeCacheDirectedLayout: (a) avoid the rare
abbreviation "CDS" (cache-directed sort); (b) "compute" is more conventional
for the specific use case
* Change the parameter types from std::vector to ArrayRef so that
SmallVector arguments can be used.
* Similarly, rename applyExtTspLayout to computeExtTspLayout.
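A sketch of the EdgeCountT change in isolation; the struct and field names here are illustrative, not necessarily the exact ones in CodeLayout.h:

```cpp
#include <cstdint>
#include <vector>

// Replaces the old pair<pair<uint64_t, uint64_t>, uint64_t> shape.
struct EdgeCountSketch {
  uint64_t Src;
  uint64_t Dst;
  uint64_t Count;
};

static uint64_t totalEdgeWeight(const std::vector<EdgeCountSketch> &Edges) {
  uint64_t Sum = 0;
  // Structured bindings read better than Edge.first.second-style access.
  for (const auto &[Src, Dst, Count] : Edges) {
    (void)Src;
    (void)Dst;
    Sum += Count;
  }
  return Sum;
}
```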
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D159526
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
Sometimes LLVM generates a branch to a return instruction, as in PR63227.
This is because, in MachineBlockPlacement::canTailDuplicateUnplacedPreds,
we avoid duplicating a BB into another already placed BB to prevent destroying
the computed layout. But if the successor BB is a return block, duplicating it
will only reduce taken branches without hurting any other branches.
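A simplified sketch of the relaxed check (not the actual canTailDuplicateUnplacedPreds code):

```cpp
#include "llvm/CodeGen/MachineBasicBlock.h"

using namespace llvm;

// Duplicating into an already placed predecessor is normally rejected to keep
// the computed layout intact, but a successor that just returns is cheap to
// copy and only removes a taken branch.
static bool mayDuplicateIntoPlacedPred(const MachineBasicBlock &Succ,
                                       bool PredAlreadyPlaced) {
  if (!PredAlreadyPlaced)
    return true;                // Layout not fixed yet: the usual case.
  return Succ.isReturnBlock();  // New exception for return blocks.
}
```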
Differential Revision: https://reviews.llvm.org/D153093
This change initializes the TSI, LI, DT, PSI, and ORE pointer fields of the SelectOptimize class to nullptr.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D148303
Use case:
- When block layout is visualized after MBP pass, the basic blocks are labeled in layout order; meanwhile blocks could be numbered in a different order.
- As a result, it's hard to map between the graph and pass output. With this option on, the basic blocks are renumbered in function layout order.
This option is only useful when a function is to be visualized (i.e., when view options are on), so it is intended for debugging only.
Use https://godbolt.org/z/5WTW36bMr as an example:
- As MBP pass output (shown in godbolt output window), `func2` is in a basic block numbered `2` (`bb.2`), and `func1` is in a basic block numbered `3` (`bb.3`);
`bb.3` is a block with higher block frequency than `bb.2`, and `bb.3` is placed before `bb.2` in the function layout.
- Use [1] to get the dot graph (graph uploaded in [2]), the blocks are re-numbered.
- `func1` is in the 'if.end' block, and labeled `1` in the visualized dot; `func2` is in the 'if.then' block, and labeled `3` --> the labeled number and bb number won't map.
- [[ b5626ae975/llvm/lib/CodeGen/MachineBlockFrequencyInfo.cpp (L127) | DOTGraphTraits<MachineBlockFrequencyInfo *>::getNodeLabel ]] is where the labeled numbers are derived from the function layout order, and it is [[ a8d93783f3/llvm/include/llvm/Support/GraphWriter.h (L209) | called by the graph writer ]].
So calling 'MachineFunction::RenumberBlocks' makes the labeled number (in the dot graph) and the block number (in the pass output) consistent with each other.
[1] `./bin/clang++ -O3 -S -mllvm -view-block-layout-with-bfi=count -mllvm -view-bfi-func-name=_Z9func_loopv -mllvm -print-after=block-placement -mllvm -filter-print-funcs=_Z9func_loopv test.c`
[2] {F25201785}
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D137467
The diff modifies the ext-tsp code layout algorithm in the following ways:
(i) fixes merging of cold block chains (this is a port of D129397);
(ii) adjusts the cost model utilized for optimization;
(iii) adjusts some APIs so that the implementation can be used in BOLT; this is
a prerequisite for D129895.
The only non-trivial change is (ii). Here we introduce different weights for
conditional and unconditional branches in the cost model. Based on the new model
it is slightly more important to increase the number of "fall-through
unconditional" jumps, which makes sense, as placing two blocks with an
unconditional jump next to each other reduces the number of jump instructions in
the generated code. Experimentally, this has a mild impact on performance;
I've seen up to a 0.2%-0.3% perf win on some benchmarks.
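An illustrative-only sketch of the cost-model idea in (ii); the weights below are made up, not the constants used in CodeLayout.cpp:

```cpp
// The contribution of a jump depends on whether it falls through and whether
// the branch is conditional, with "fall-through unconditional" weighted
// highest so adjacent placement removes the jump instruction entirely.
static double jumpScore(bool IsFallThrough, bool IsConditional,
                        double ExecutionCount) {
  double Weight;
  if (IsFallThrough)
    Weight = IsConditional ? 1.0 : 1.05; // favor unconditional fall-throughs
  else
    Weight = IsConditional ? 0.1 : 0.05; // taken jumps score far lower
  return Weight * ExecutionCount;
}
```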
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D129893
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang and many LLVM
tests; see the comments on https://reviews.llvm.org/D121169