llvm-project

Author	SHA1	Message	Date
Craig Topper	b7eee2c3fe	[CodeGen] Remove some implict conversions of MCRegister to unsigned by using(). NFC Many of these are indexing BitVectors or something where we can't using MCRegister and need the register number.	2025-01-19 13:18:04 -08:00
Akshat Oke	4f96fb5fb3	Reapply "Spiller: Detach legacy pass and supply analyses instead (#119181 )" (#122665 ) Makes Inline Spiller amenable to the new PM. This reapplies commit a531800344dc54e9c197a13b22e013f919f3f5e1 reverted because of two unused private members reported on sanitizer bots.	2025-01-13 14:14:13 +05:30
Akshat Oke	089555095b	Revert "Spiller: Detach legacy pass and supply analyses instead (#119… (#122426 ) …181)" This reverts commit a531800344dc54e9c197a13b22e013f919f3f5e1.	2025-01-10 12:23:07 +05:30
Akshat Oke	a531800344	Spiller: Detach legacy pass and supply analyses instead (#119181 ) Makes Inline Spiller amenable to the new PM.	2025-01-10 11:46:56 +05:30
Ryan Mansfield	67efbd0bf1	[LLVM] Fix various cl::desc typos and whitespace issues (NFC) (#121955 )	2025-01-08 11:07:23 +01:00
Matt Arsenault	93220e7e06	RegAllocGreedy: Fix use after free during last chance recoloring (#120697 ) Last chance recoloring can delete the current fixed interval during recursive assignment of interfering live intervals. Check if the virtual register value was assigned before attempting the unassignment, as is done in other scenarios. This relies on the fact that we do not recycle virtual register numbers. I have only seen this occur in error situations where the allocation will fail, but I think this can theoretically happen in working allocations. This feels very brute force, but I've spent over a week debugging this and this is what works without any lit regressions. The surprising piece to me was that unspillable live ranges may be spilled, and a number of tests rely on optimizations occurring on them. My other attempts to fixed this mostly revolved around not identifying unspillable live ranges as snippet copies. I've also discovered we're making some unproductive live range splits with subranges. If we avoid such splits, some of the unspillable copies disappear but mandating that be precise to fix a use after free doesn't sound right.	2025-01-06 23:12:55 +07:00
Matt Arsenault	11e482c4a3	RegAllocGreedy: Add dummy priority advisor for writing MIR tests (#121207 ) I regularly struggle reproducing failures in greedy due to changes in priority when resuming the allocation from MIR vs. a complete compilation starting at IR. That is, the fix in e0919b189bf2df4f97f22ba40260ab5153988b14 did not really fix the problem of the instruction distance mattering. Add a way to bypass all of the priority heuristics for MIR tests, by prioritizing only by virtual register number. Could also give this a more specific name, like PrioritizeLowVirtRegNumber	2025-01-02 23:04:44 +07:00
Matt Arsenault	e2cabd715b	RegAllocGreedy: Fix comment typo	2024-12-17 12:46:04 +07:00
Matt Arsenault	818bffcb1c	RegAlloc: Fix failure on undef use when all registers are reserved (#119647 ) Greedy and fast would hit different assertions on undef uses if all registers in a class were reserved.	2024-12-16 10:56:45 +09:00
Akshat Oke	2c7ece2e8c	[CodeGen][NewPM] Port LiveStacks analysis to NPM (#118778 )	2024-12-06 15:16:07 +05:30
Akshat Oke	d9b4bdbff5	[CodeGen][NewPM] Port LiveDebugVariables to NPM (#115468 ) The existing analysis was already a pimpl wrapper. I have extracted legacy pass logic to a LDVImpl wrapper named `LiveDebugVariables` which is the analysis::Result now. This controls whether to activate the LDV (depending on `-live-debug-variables` and DIsubprogram) itself. The legacy and new analysis only construct the LiveDebugVariables. VirtRegRewriter will test this.	2024-12-04 14:31:34 +05:30
Akshat Oke	b68340c835	[CodeGen][NewPM] Port SpillPlacement analysis to NPM (#116618 )	2024-11-29 16:55:40 +05:30
Akshat Oke	cac13606c2	[CodeGen][NewPM] Port EdgeBundles analysis to NPM (#116616 )	2024-11-22 16:51:50 +05:30
Kazu Hirata	735ab61ac8	[CodeGen] Remove unused includes (NFC) (#115996 ) Identified with misc-include-cleaner.	2024-11-12 23:15:06 -08:00
Akshat Oke	4e32d7236b	[NewPM][CodeGen] Port LiveRegMatrix to NPM (#109938 )	2024-10-22 15:28:04 +05:30
Akshat Oke	93802815ab	[NewPM][CodeGen] Port VirtRegMap to NPM (#109936 )	2024-10-22 15:15:56 +05:30
Bevin Hansson	1a65d95d00	[CodeGen][RAGreedy] Inform LiveDebugVariables about snippets spilled by InlineSpiller. (#109962 ) RAGreedy invokes InlineSpiller to spill a particular virtreg inline. When the spiller does this, it also identifies small, adjacent liveranges called snippets. These are also spilled or rematerialized in the process. However, the spiller does not inform RA that it has spilled these regs. This means that debug variable locations referencing these regs/ranges are lost. Mark any spilled regs which do not have a stack slot assigned to them as allocated to the slot being spilled to to tell LDV that those regs are located in that slot, even though the regs might no longer exist in the program after regalloc is finished. Also, inform RA about all of the regs which were replaced (spilled or rematted), not just the one that was requested so that it can properly manage the ranges of the debug vars.	2024-10-02 10:29:56 +02:00
Matt Arsenault	71ca9fcb8d	llvm-reduce: Don't print verifier failed machine functions (#109673 ) This produces far too much terminal output, particularly for the instruction reduction. Since it doesn't consider the liveness of of the instructions it's deleting, it produces quite a lot of verifier errors.	2024-09-24 22:32:53 +04:00
Christudasan Devadasan	15b41d207e	[CodeGen] change prototype of regalloc filter function (#93525 ) [CodeGen] Change the prototype of regalloc filter function Change the prototype of the filter function so that we can filter not just by RegClass. We need to implement more complicated filter based upon some other info associated with each register. Patch provided by: Gang Chen (gangc@amd.com)	2024-07-22 16:49:39 +05:30
Kazu Hirata	66cd2e0f9a	[CodeGen] Use range-based for loops (NFC) (#98706 )	2024-07-13 13:29:47 -07:00
paperchalice	099899961c	[CodeGen][NewPM] Port `machine-block-freq` to new pass manager (#98317 ) - Add `MachineBlockFrequencyAnalysis`. - Add `MachineBlockFrequencyPrinterPass`. - Use `MachineBlockFrequencyInfoWrapperPass` in legacy pass manager. - `LazyMachineBlockFrequencyInfo::print` is empty, drop it due to new pass manager migration.	2024-07-12 15:45:01 +08:00
paperchalice	abde52aa66	[CodeGen][NewPM] Port `LiveIntervals` to new pass manager (#98118 ) - Add `LiveIntervalsAnalysis`. - Add `LiveIntervalsPrinterPass`. - Use `LiveIntervalsWrapperPass` in legacy pass manager. - Use `std::unique_ptr` instead of raw pointer for `LICalc`, so destructor and default move constructor can handle it correctly. This would be the last analysis required by `PHIElimination`.	2024-07-10 19:34:48 +08:00
paperchalice	4010f894a1	[CodeGen][NewPM] Port `SlotIndexes` to new pass manager (#97941 ) - Add `SlotIndexesAnalysis`. - Add `SlotIndexesPrinterPass`. - Use `SlotIndexesWrapperPass` in legacy pass.	2024-07-09 12:09:11 +08:00
paperchalice	79d0de2ac3	[CodeGen][NewPM] Port `machine-loops` to new pass manager (#97793 ) - Add `MachineLoopAnalysis`. - Add `MachineLoopPrinterPass`. - Convert to `MachineLoopInfoWrapperPass` in legacy pass manager.	2024-07-09 09:11:18 +08:00
Alexis Engelke	739a960567	[RegAlloc] Don't call always-true ShouldAllocClass (#96296 ) Previously, there was at least one virtual function call for every allocated register. The only users of this feature are AMDGPU and RISC-V (RVV), other targets don't use this. To easily identify these cases, change the default functor to nullptr and don't call it for every allocated register.	2024-06-21 13:18:35 +02:00
paperchalice	837dc542b1	[CodeGen][NewPM] Split `MachineDominatorTree` into a concrete analysis result (#94571 ) Prepare for new pass manager version of `MachineDominatorTreeAnalysis`. We may need a machine dominator tree version of `DomTreeUpdater` to handle `SplitCriticalEdge` in some CodeGen passes.	2024-06-11 21:27:14 +08:00
Piyou Chen	32cb3c5508	[NFC][LLVM][CodeGen] Move LiveDebugVariables.h into llvm/include/llvm/CodeGen (#88374 ) This patch make `LiveDebugVariables` can be used by passes outside of `lib/CodeGen`. If we run a pass that occurs between the split register allocation pass without preserving this pass, it will be freed and recomputed until it encounters the next pass that needs LiveDebugVariables. However, `LiveDebugVariables` will raise an assertion due to the pass being freed without emitting a debug value. This is reason we need `LiveDebugVariables` to be available for passes outside of lib/Codegen.	2024-04-15 21:58:57 +08:00
David Green	303a7835ff	[GreedyRA] Improve RA for nested loop induction variables (#72093 ) Imagine a loop of the form: ``` preheader: %r = def header: bcc latch, inner inner1: .. inner2: b latch latch: %r = subs %r bcc header ``` It can be possible for code to spend a decent amount of time in the header<->latch loop, not going into the inner part of the loop as much. The greedy register allocator can prefer to spill _around_ %r though, adding spills around the subs in the loop, which can be very detrimental for performance. (The case I am looking at is actually a very deeply nested set of loops that repeat the header<->latch pattern at multiple different levels). The greedy RA will apply a preference to spill to the IV, as it is live through the header block. This patch attempts to add a heuristic to prevent that in this case for variables that look like IVs, in a similar regard to the extra spill weight that gets added to variables that look like IVs, that are expensive to spill. That will mean spills are more likely to be pushed into the inner blocks, where they are less likely to be executed and not as expensive as spills around the IV. This gives a 8% speedup in the exchange benchmark from spec2017 when compiled with flang-new, whilst importantly stabilising the scores to be less chaotic to other changes. Running ctmark showed no difference in the compile time. I've tried to run a range of benchmarking for performance, most of which were relatively flat not showing many large differences. One matrix multiply case improved 21.3% due to removing a cascading chains of spills, and some other knock-on effects happen which usually cause small differences in the scores.	2023-11-18 09:55:19 +00:00
Kazu Hirata	bafd35ca04	[llvm] Stop including llvm/ADT/SmallPtrSet.h (NFC) Identified with clangd.	2023-11-11 00:35:14 -08:00
weiguozhi	b6043f9867	[RA] Disable split around hint register if optimize for size (#68619 ) Split a virtual register with hint may generate COPY instructions in multiple cold basic blocks, and increase code size. So disable this split when the function is optimized for size.	2023-10-11 14:57:15 -07:00
Matthias Braun	2e26d09106	BlockFrequencyInfo: Add PrintBlockFreq helper (#67512 ) - Refactor the (Machine)BlockFrequencyInfo::printBlockFreq functions into a `PrintBlockFreq()` function returning a `Printable` object. This simplifies usage as it can be directly piped to a `raw_ostream` like `dbgs() << PrintBlockFreq(MBFI, Freq) << '\n';`. - Previously there was an interesting behavior where `BlockFrequencyInfoImpl` stores frequencies both as a `Scaled64` number and as an `uint64_t`. Most algorithms use the `BlockFrequency` abstraction with the integers, the print function for basic blocks printed the `Scaled64` number potentially showing higher accuracy than was used by the algorithm. This changes things to only print `BlockFrequency` values. - Replace some instances of `dbgs() << Freq.getFrequency()` with the new function.	2023-10-05 18:26:50 -07:00
Matthias Braun	5181156b37	Use BlockFrequency type in more places (NFC) (#68266 ) The `BlockFrequency` class abstracts `uint64_t` frequency values. Use it more consistently in various APIs and disable implicit conversion to make usage more consistent and explicit. - Use `BlockFrequency Freq` parameter for `setBlockFreq`, `getProfileCountFromFreq` and `setBlockFreqAndScale` functions. - Return `BlockFrequency` in `getEntryFreq()` functions. - While on it change some `const BlockFrequency& Freq` parameters to plain `BlockFreqency Freq`. - Mark `BlockFrequency(uint64_t)` constructor as explicit. - Add missing `BlockFrequency::operator!=`. - Remove `uint64_t BlockFreqency::getMaxFrequency()`. - Add `BlockFrequency BlockFrequency::max()` function.	2023-10-05 11:40:17 -07:00
Matt Arsenault	7252787dd9	RegAllocGreedy: Fix detection of lanes read by a bundle SplitKit creates questionably formed bundles of copies when it needs to copy a subset of live lanes and can't do it with a single subregister index. These are merely marked as part of a bundle, and don't start with a BUNDLE instruction. Queries for the slot index would give the first copy in the bundle, and we need to inspect the operands of all the other bundled copies. Also fix and simplify detection of read lane subsets. This causes some RISCV test regressions, but these look like accidentally beneficial splits. I don't see a subrange based reason to perform these splits. Avoids some really ugly regressions in a future patch. https://reviews.llvm.org/D146859	2023-10-01 11:37:48 +03:00
weiguozhi	31f81e96a4	[RA] Don't split a register generated from another split (#67351 ) Split a register generated from another split usually doesn't bring us too much benefit. It may also cause dead loop as pr67188 shows if the heuristic cost always satisfy the split condition. So prevent such splitting. It fixed pr67188.	2023-09-26 08:38:18 -07:00
Jay Foad	e0919b189b	[CodeGen] Renumber slot indexes before register allocation (#66334 ) RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate the length of a live range for its heuristics. Renumbering all slot indexes with the default instruction distance ensures that this estimate will be as accurate as possible, and will not depend on the history of how instructions have been added to and removed from SlotIndexes's maps. This also means that enabling -early-live-intervals, which runs the SlotIndexes analysis earlier, will not cause large amounts of churn due to different register allocator decisions.	2023-09-19 11:18:12 +01:00
Guozhi Wei	cbdccb30c2	[RA] Split a virtual register in cold blocks if it is not assigned preferred physical register If a virtual register is not assigned preferred physical register, it means some COPY instructions will be changed to real register move instructions. In this case we can try to split the virtual register in colder blocks, if success, the original COPY instructions can be deleted, and the new COPY instructions in colder blocks will be generated as register move instructions. It results in fewer dynamic register move instructions executed. The new test case split-reg-with-hint.ll gives an example, the hot path contains 24 instructions without this patch, now it is only 4 instructions with this patch. Differential Revision: https://reviews.llvm.org/D156491	2023-09-15 19:52:50 +00:00
Matt Arsenault	4d42e8b5d1	Reapply "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting" This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd. The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work around the underlying problem with SUBREG_TO_REG.	2023-07-31 20:15:45 -04:00
Vitaly Buka	a496c8be6e	Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting" And dependent commits. Details in D150388. This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c. This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e. This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826. This reverts commit b7836d856206ec39509d42529f958c920368166b. No conflicts in the code, few tests had conflicts in autogenerated CHECKs: llvm/test/CodeGen/Thumb2/mve-float32regloops.ll llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll Reviewed By: alexfh Differential Revision: https://reviews.llvm.org/D156381	2023-07-26 22:13:32 -07:00
Yashwant Singh	b7836d8562	[CodeGen]Allow targets to use target specific COPY instructions for live range splitting Replacing D143754. Right now the LiveRangeSplitting during register allocation uses TargetOpcode::COPY instruction for splitting. For AMDGPU target that creates a problem as we have both vector and scalar copies. Vector copies perform a copy over a vector register but only on the lanes(threads) that are active. This is mostly sufficient however we do run into cases when we have to copy the entire vector register and not just active lane data. One major place where we need that is live range splitting. Allowing targets to use their own copy instructions(if defined) will provide a lot of flexibility and ease to lower these pseudo instructions to correct MIR. - Introduce getTargetCopyOpcode() virtual function and use if to generate copy in Live range splitting. - Replace necessary MI.isCopy() checks with TII.isCopyInstr() in register allocator pipeline. Reviewed By: arsenm, cdevadas, kparzysz Differential Revision: https://reviews.llvm.org/D150388	2023-07-07 22:29:50 +05:30
Matt Arsenault	c3b27c236d	RegAllocGreedy: Fix assert with remarks on unassigned subregisters This tried to query the physical subregister on virtual registers if they were left unassigned.	2023-06-25 19:26:25 -04:00
Sergei Barannikov	b70b96c0f5	[RegAlloc] Simplify RegAllocEvictionAdvisor::canReassign (NFC) Use range-based for loops. The return type has been changed to bool because the method is only used in boolean contexts. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D152665	2023-06-16 05:39:56 +03:00
Sergei Barannikov	aa2d0fbc30	[MC] Add MCRegisterInfo::regunits for iteration over register units Reviewed By: foad Differential Revision: https://reviews.llvm.org/D152098	2023-06-16 05:39:50 +03:00
Matt Arsenault	ce6c36bab5	RegAllocGreedy: Don't use Register reference	2023-03-17 15:22:13 -04:00
Craig Topper	e72ca520bb	[CodeGen] Remove uses of Register::isPhysicalRegister/isVirtualRegister. NFC Use isPhysical/isVirtual methods. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D141715	2023-01-13 14:38:08 -08:00
serge-sans-paille	38818b60c5	Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part Use deduction guides instead of helper functions. The only non-automatic changes have been: 1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t), (uint8_t)) 2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase. 3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated. 4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that). Per reviewers' comment, some useless makeArrayRef have been removed in the process. This is a follow-up to https://reviews.llvm.org/D140896 that introduced the deduction guides. Differential Revision: https://reviews.llvm.org/D140955	2023-01-05 14:11:08 +01:00
Fangrui Song	67819a72c6	[CodeGen] llvm::Optional => std::optional	2022-12-13 09:06:36 +00:00
Kazu Hirata	998960ee1f	[CodeGen] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-02 20:36:08 -08:00
Kazu Hirata	20d764aff0	[llvm] Don't including SetVector.h (NFC) llvm/lib/ProfileData/RawMemProfReader.cpp uses SetVector without including SetVector.h, so this patch adds an appropriate #include there.	2022-09-17 12:36:43 -07:00
Matt Arsenault	63d1d37d35	RegAllocGreedy: Avoid overflowing priority bitfields The class priority is expected to be at most 5 bits before it starts clobbering bits used for other fields. Also clamp the instruction distance in case we have millions of instructions. AMDGPU was accidentally overflowing into the global priority bit in some cases. I think in principal we would have wanted this, but in the cases I've looked at, it had the counter intuitive effect and de-prioritized the large register tuple. Avoid using weird bit hack PPC uses for global priority. The AllocationPriority field is really 5 bits, and PPC was relying on overflowing this to 6-bits to forcibly set the global priority bit. Split this out as a separate flag to avoid having magic behavior for values above 31.	2022-09-15 10:38:40 -04:00
Aiden Grossman	eec183c171	[nfc] Refactor SlotIndex::getInstrDistance to better reflect actual functionality This patch refactors SlotIndex::getInstrDistance to SlotIndex::getApproxInstrDistance to better describe the actual functionality of this function. This patch also adds in some additional comments better documenting the assumptions that this function makes to increase clarity. Based on discussion on the LLVM Discourse: https://discourse.llvm.org/t/odd-behavior-in-slotindex-getinstrdistance/64934/5 Reviewed By: mtrofin, foad Differential Revision: https://reviews.llvm.org/D133386	2022-09-12 23:33:35 +00:00

1 2 3 4 5 ...

491 Commits