527 Commits

Author SHA1 Message Date
theRonShark
3a1079fa25
Revert "[RegAlloc] Relax the split constrain on MBB prolog" (#169990)
Reverts llvm/llvm-project#168259

breaks hip buildot
2025-11-29 08:01:23 -05:00
Luo Yuanke
9bae84b017
[RegAlloc] Relax the split constrain on MBB prolog (#168259)
https://reviews.llvm.org/D52052 is to prevent register split on the MBB
which have prolog instructions defining the exec register (or mask register
that activate the threads of a warp in GPU). The constrain seems too
strict, because 1) If the split is allowed, it may fit the free live range
of a physical register, and no spill will happen; 2) The register class of
register that is under splitting may not be the same to the register that
is defined in prolog, so there is no interference with the register being
defined in prolog. 
The current code has another small issue. The MBB->getFirstNonDebugInstr()
just skip debug instructions, but SA->getFirstSplitPoint(Number) would skip
label and phi instructions. This cause some MBB with label instruction
being taken as prolog.
This patch is to relax the split constrain on MMB with prolog by checking
if the register defined in prolog has the common register class with the
register being split. It allow the split if the register defined in prolog
is physical register or there is no common register class.

---------

Co-authored-by: Yuanke Luo <ykluo@birentech.com>
2025-11-29 07:27:19 +08:00
Sergei Barannikov
97a60aa37a
[CodeGen] Turn MCRegUnit into an enum class (NFC) (#167943)
This changes `MCRegUnit` type from `unsigned` to `enum class : unsigned`
and inserts necessary casts.
The added `MCRegUnitToIndex` functor is used with `SparseSet`,
`SparseMultiSet` and `IndexedMap` in a few places.

`MCRegUnit` is opaque to users, so it didn't seem worth making it a
full-fledged class like `Register`.

Static type checking has detected one issue in
`PrologueEpilogueInserter.cpp`, where `BitVector` created for
`MCRegister` is indexed by both `MCRegister` and `MCRegUnit`.

The number of casts could be reduced by using `IndexedMap` in more
places and/or adding a `BitVector` adaptor, but the number of casts *per
file* is still small and `IndexedMap` has limitations, so it didn't seem
worth the effort.

Pull Request: https://github.com/llvm/llvm-project/pull/167943
2025-11-16 20:46:44 +03:00
Craig Topper
388ef61250
[RegAllocGreedy] Use MCRegister instead of MCPhysReg. NFC (#167974) 2025-11-13 23:26:35 +00:00
Matt Arsenault
11b0cf8fbe
Greedy: Take hints from copy to physical subreg (#160467)
Previously this took hints from subregister extract of physreg,
like  %vreg.sub = COPY $physreg

This now also handles the rarer case:
  $physreg_sub = COPY %vreg

Also make an accidental bug here before explicit; this was
only using the superregister as a hint if it was already
in the copy, and not if using the existing assignment. There are
a handful of regressions in that case, so leave that extension
for a future change.
2025-10-02 23:28:38 +09:00
Matt Arsenault
f98735f126
Greedy: Use initializer list for recoloring candidates (NFC) (#160486) 2025-10-02 23:22:20 +09:00
Matt Arsenault
706b79002e
Greedy: Merge VirtRegMap queries into one use (NFC) (#160485) 2025-10-02 14:06:54 +00:00
Matt Arsenault
c4e1bca407
Greedy: Move physreg check when trying to recolor vregs (NFC) (#160484)
Instead of checking if the recoloring candidate is a virtual register,
avoid adding it to the candidates in the first place.
2025-10-02 22:24:49 +09:00
Matt Arsenault
3537e8abfa
RegAllocGreedy: Check if copied lanes are live in trySplitAroundHintReg (#160424)
For subregister copies, do a subregister live check instead of checking
the main range. Doesn't do much yet, the split analysis still does not
track live ranges.
2025-10-02 12:21:02 +00:00
Matt Arsenault
129394e3f2
Greedy: Make trySplitAroundHintReg try to match hints with subreg copies (#160294)
This is essentially the same patch as
116ca9522e89f1e4e02676b5bbe505e80c4d4933;
when trying to match a physreg hint, try to find a compatible physreg if
there is
a subregister copy. This has the slight difference of using getSubReg on
the hint
instead of getMatchingSuperReg (the other use should also use getSubReg
instead,
it's faster).

At the moment this turns out to have very little effect. The adjacent
code needs
better handling of subregisters, so continue adding this piecemeal. The
X86 test
shows a net reduction in real instructions, plus a few new kills.
2025-09-27 00:14:38 +09:00
Jay Foad
cecdff9283
Greedy: Simplify collectHintInfo using MachineOperands. NFCI. (#159724)
If a COPY uses Reg but only in an implicit operand then the new
implementation ignores it but the old implementation would have treated
it as a copy of Reg. Probably this case never occurs in practice. Other
than that, this patch is NFC.

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-09-22 11:48:52 +01:00
Matt Arsenault
116ca9522e
Greedy: Take copy hints involving subregisters (#159570)
Previously this would only accept full copy hints. This relaxes
this to accept some subregister copies. Specifically, this now
accepts:
  - Copies to/from physical registers if there is a compatible
    super register
  - Subreg-to-subreg copies

This has the potential to repeatedly add the same hint to the
hint vector, but not sure if that's a real problem.
2025-09-19 09:37:36 +09:00
Rahul Joshi
1fdf02ad5a
[LLVM][CodeGen] Add convenience accessors for MachineFunctionProperties (#140002)
Add per-property has<Prop>/set<Prop>/reset<Prop> functions to
MachineFunctionProperties.
2025-05-22 08:07:52 -07:00
Philip Reames
2bb2f8ab49
[CodeGen] Remove experimental deferred spilling from GreedyRegAlloc (#137850)
This experimental option was introduced in 2015 via commit 1192294, and
the target hook was added in 2020 via commit 99e865b6. There does not
appear to have ever been a use of this target hook in tree.

This code is complicating one of the most complicated and hard to
understand parts of our code base, and was an experiment introduced
nearly 10 years ago. Let's get rid of it.

Note that the idea described in the original patch is not neccessarily a
bad one, and we might return to it someday.
2025-05-01 08:11:51 -07:00
weiguozhi
b25b51eb63
[InlineSpiller] Check rematerialization before folding operand (#134015)
Current implementation tries to fold the operand before
rematerialization because it can reduce one register usage. But if there
is a physical register available we can still rematerialize it without
causing high register pressure.

This patch do this check to find the better choice. Then we can produce

    xorps %xmm1, %xmm1
    ucomiss %xmm1, %xmm0

instead of 

    ucomiss LCPI0_1(%rip), %xmm0
2025-04-28 09:52:03 -07:00
Abhishek Kaushik
5543d9ded7
[RegAlloc][NFC] Use std::move to avoid copy (#134533) 2025-04-10 14:45:02 +05:30
Kazu Hirata
e3a3f78f35
[CodeGen] Use llvm::append_range (NFC) (#133603) 2025-03-29 16:53:02 -07:00
Michael Maitland
00fabd21bc
[RISCV][RegAlloc] Add getCSRFirstUseCost for RISC-V (#131349)
This is based off of 63efd8e7e68bc.

The following table shows the percent change to the dynamic instruction
count when the function in this patch returns 0 (default) versus other
values.

| benchmark | % speedup 1 over 0 | % speedup 4 over 0 | % speedup 16
over 0 | % speedup 64 over 0 | % speedup 128 over 0 |
| --------------- | ---------------------- | --------------------- |
--------------------- | -------------------- | -------------------- |
| 500.perlbench_r | 0.001018570165 | 0.001049508358 | 0.001001106529 |
0.03382582818 | 0.03395354577 |
| 502.gcc_r | 0.02850551412 | 0.02170512371 | 0.01453021263 |
0.06011008637 | 0.1215691521 |
| 505.mcf_r | -0.00009506373338 | -0.00009090057642 | -0.0000860991497 |
-0.00005027849766 | 0.00001251173791 |
| 520.omnetpp_r | 0.2958940288 | 0.2959715925 | 0.2961141505 |
0.2959823497 | 0.2963124341 |
| 523.xalancbmk_r | -0.0327074721 | -0.01037021046 | -0.3226810542 |
0.02127133714 | 0.02765388389 |
| 525.x264_r | 0.0000001381714403 | -0.00000007041540345 |
-0.00000002156399465 | 0.0000002108993364 | 0.0000002463382874 |
| 531.deepsjeng_r | 0.00000000339777238 | 0.000000003874652714 |
0.000000003636212547 | 0.000000003874652714 | 0.000000003159332213 |
| 541.leela_r | 0.0009186059953 | -0.000424159199 | 0.0004984456879 |
0.274948447 | 0.8135521414 |
| 557.xz_r | -0.000000003547118854 | -0.00004896449559 |
-0.00004910691576 | -0.0000491109983 | -0.00004895599589 |
| geomean | 0.03265937388 | 0.03424232324 | -0.00107917442 |
0.07629116165 | 0.1439913192 |

The following table shows the percent change to the runtime when the
function in this patch returns 0 (default) versus other values.

| benchmark | % speedup 1 over 0 | % speedup 4 over 0 | % speedup 16
over 0 | % speedup 64 over 0 | %speedup 128 over 0 |
| --------------- | ------------------ | ------------------ |
------------------- | ------------------- | ------------------- |
| 500.perlbench_r | 0.1722356761 | 0.2269681109 | 0.2596825578 |
0.361573851 | 1.15041305 |
| 502.gcc_r | -0.548415855 | -0.06187002799 | -0.5553684674 |
-0.8876686237 | -0.4668665535 |
| 505.mcf_r | -0.8786414258 | -0.4150938441 | -1.035517726 |
-0.1860770377 | -0.01904825648 |
| 520.omnetpp_r | 0.4130256072 | 0.6595976188 | 0.897332171 |
0.6252625622 | 0.3869467278 |
| 523.xalancbmk_r | 1.318132014 | -0.003927574 | 1.025962975 |
1.090320253 | -0.789206202 |
| 525.x264_r | -0.03112871796 | -0.00167557587 | 0.06932423155 |
-0.1919840015 | -0.1203585732 |
| 531.deepsjeng_r | -0.259516072 | -0.01973455652 | -0.2723227894 |
-0.005417022257 | -0.02222388177 |
| 541.leela_r | -0.3497178495 | -0.3510447393 | 0.1274508001 |
0.6485542452 | 0.2880651727 |
| 557.xz_r | 0.7683565263 | -0.2197509447 | -0.0431183874 |
0.07518130872 | 0.5236853039 |
| geomean | 0.06506952742 | -0.0211865386 | 0.05072694648 | 0.1684530637
| 0.1020533557 |

I chose to set the value to 5 on RISC-V because it has improvement to
both the dynamic IC and the runtime and because it showed good results
empirically and had a similar effect as setting it to higher numbers.

I looked at some diff and it seems like this patch leads to two things:
1. Less spilling -- not spilling the CSR led to better register
allocation and helped us avoid spills down the line
2. Avoid spilling CSR but spill more on paths that static heuristics
estimate as cold.
2025-03-20 15:20:04 -04:00
Craig Topper
13cce8c0bc [CodeGen] Use Register::id() to avoid implicit cast. NFC 2025-03-02 22:33:26 -08:00
Matt Arsenault
1a114fa302
RegAlloc: Use new approach to handling failed allocations (#128469)
This fixes an assert after allocation failure.

Rather than collecting failed virtual registers and hacking
on the uses after the fact, directly hack on the uses and rewrite
the registers to the dummy assignment immediately.

Previously we were bypassing LiveRegMatrix and directly assigning
in the VirtRegMap. This resulted in inconsistencies where illegal
overlapping assignments were missing. Rather than try to hack in
some system to manage these in LiveRegMatrix (i.e. hacking around
cases with invalid iterators), avoid this by directly using the
physreg. This should also allow removal of special casing in
virtregrewriter for failed allocations.
2025-02-26 15:34:47 +07:00
Matt Arsenault
e160c35c9e
Reapply "RegAlloc: Fix verifier error after failed allocation (#119690)" (#128400)
Reapply "RegAlloc: Fix verifier error after failed allocation (#119690)"

This reverts commit 0c50054820799578be8f62b6fd2cc3fbc751c01e.

Reapply with more fixes to avoid expensive_checks failures. Make sure to
call splitSeparateComponents after shrinkToUses, and update the VirtRegMap
with the split registers. Also set undef on all physical register aliases to
the assigned register.

Move physreg handling. Not sure if necessary

Remove intervals from regunits. Not sure if necessary
2025-02-26 15:31:48 +07:00
Akshat Oke
fe13cb985c
[CodeGen][NewPM] Port RegAllocGreedy to NPM (#119540)
Leaving out NPM command line support for the next patch.
2025-02-26 12:11:22 +05:30
Matt Arsenault
b66ec64b5b
RegAllocGreedy: Remove unnecessary null register class check (#128487) 2025-02-24 22:56:54 +07:00
Matt Arsenault
3532651b6f RegAllocGreedy: Add braces 2025-02-24 17:08:42 +07:00
Craig Topper
228dbd254a [RegAllocGreedy] Use MCRegister instead of Register for functions that return a physical register.
The callers of these functions return the value as an MCRegister
so this removes some casts from Register to MCRegister.
2025-02-22 21:39:25 -08:00
Craig Topper
0bd66c4194 [RegAllocGreedy] Remove unnecessary conversion from MCRegister to Register. NFC 2025-02-22 16:20:19 -08:00
Craig Topper
6fe780ce63 [RegAllocGreedy] Use Register() instead of 0 for invalid Register. NFC 2025-02-22 16:20:19 -08:00
Matt Arsenault
0c50054820 Revert "RegAlloc: Fix verifier error after failed allocation (#119690)"
This reverts commit 34167f99668ce4d4d6a1fb88453a8d5b56d16ed5.

Different set of verifier errors appears after other regalloc failure
tests with EXPENSIVE_CHECKS.
2025-02-22 00:23:21 +07:00
Matt Arsenault
34167f9966
RegAlloc: Fix verifier error after failed allocation (#119690)
In some cases after reporting an allocation failure, this would fail
the verifier. It picks the first allocatable register and assigns it,
but didn't update the liveness appropriately. When VirtRegRewriter
relied on the liveness to set kill flags, it would incorrectly add
kill flags if there was another overlapping kill of the virtual
register.

We can't properly assign the register to an overlapping range, so
break the liveness of the failing register (and any other interfering
registers) instead. Give the virtual register dummy liveness by
effectively deleting all the uses by setting them to undef.

The edge case not tested here which I'm worried about is if the read
of the register is a def of a subregister. I've been unable to come up
with a test where this occurs.

https://reviews.llvm.org/D122616
2025-02-21 22:11:51 +07:00
Akshat Oke
557628dbe6
[CodeGen][NewPM] Port RegAllocPriorityAdvisor analysis to NPM (#118462)
Similar to #117309.

The advisor and logger are accessed through the provider, which is
served by the new PM. Legacy PM forwards calls to the provider.
New PM is a machine function analysis that lazily initializes the
provider.
2025-02-20 09:35:49 +05:30
Akshat Oke
519b53e65e
[CodeGen][NewPM] Port RegAllocEvictionAdvisor analysis to NPM (#117309)
Legacy pass used to provide the advisor, so this extracts that logic
into a provider class used by both analysis passes.

All three (Default, Release, Development) legacy passes
`*AdvisorAnalysis` are basically renamed to `*AdvisorProvider`, so the
actual legacy wrapper passes are `*AdvisorAnalysisLegacy`.

There is only one NPM analysis `RegAllocEvictionAnalysis` that switches
between the three providers in the `::run` method, to be cached by the
NPM.

Also adds `RequireAnalysis<RegAllocEvictionAnalysis>` to the optimized
target reg alloc codegen builder.
2025-02-18 18:55:06 +07:00
Matt Arsenault
43780f4f92 RegAllocGreedy: Use Register type 2025-02-13 20:49:27 +07:00
Akshat Oke
7b60e03d73
Reland "CodeGen][NewPM] Port MachineScheduler to NPM. (#125703)" (#126684)
`RegisterClassInfo` was supposed to be kept alive between pass runs,
which wasn't being done leading to recomputations increasing the compile
time.

Now the Impl class is a member of the legacy and new passes so that it
is not reconstructed on every pass run.

---------

Co-authored-by: Christudasan Devadasan <christudasan.devadasan@amd.com>
2025-02-12 18:54:39 +05:30
Akshat Oke
564b9b7f4d
Revert "CodeGen][NewPM] Port MachineScheduler to NPM. (#125703)" (#126268)
This reverts commit 5aa4979c47255770cac7b557f3e4a980d0131d69 while I
investigate what's causing the compile-time regression.
2025-02-08 15:36:48 +05:30
Christudasan Devadasan
5aa4979c47
CodeGen][NewPM] Port MachineScheduler to NPM. (#125703) 2025-02-05 12:17:59 +05:30
Akshat Oke
fe9a97ca38
[CodeGen][NewPM] Port RegisterCoalescer to NPM (#124698) 2025-02-03 13:41:51 +07:00
Craig Topper
b7eee2c3fe [CodeGen] Remove some implict conversions of MCRegister to unsigned by using(). NFC
Many of these are indexing BitVectors or something where we can't
using MCRegister and need the register number.
2025-01-19 13:18:04 -08:00
Akshat Oke
4f96fb5fb3
Reapply "Spiller: Detach legacy pass and supply analyses instead (#119181)" (#122665)
Makes Inline Spiller amenable to the new PM.

This reapplies commit a531800344dc54e9c197a13b22e013f919f3f5e1 reverted
because of two unused private members reported on sanitizer bots.
2025-01-13 14:14:13 +05:30
Akshat Oke
089555095b
Revert "Spiller: Detach legacy pass and supply analyses instead (#119… (#122426)
…181)"

This reverts commit a531800344dc54e9c197a13b22e013f919f3f5e1.
2025-01-10 12:23:07 +05:30
Akshat Oke
a531800344
Spiller: Detach legacy pass and supply analyses instead (#119181)
Makes Inline Spiller amenable to the new PM.
2025-01-10 11:46:56 +05:30
Ryan Mansfield
67efbd0bf1
[LLVM] Fix various cl::desc typos and whitespace issues (NFC) (#121955) 2025-01-08 11:07:23 +01:00
Matt Arsenault
93220e7e06
RegAllocGreedy: Fix use after free during last chance recoloring (#120697)
Last chance recoloring can delete the current fixed interval
during recursive assignment of interfering live intervals. Check
if the virtual register value was assigned before attempting the
unassignment, as is done in other scenarios. This relies on the fact
that we do not recycle virtual register numbers.

I have only seen this occur in error situations where the allocation
will fail, but I think this can theoretically happen in working
allocations.

This feels very brute force, but I've spent over a week debugging
this and this is what works without any lit regressions. The surprising
piece to me was that unspillable live ranges may be spilled, and
a number of tests rely on optimizations occurring on them. My other
attempts to fixed this mostly revolved around not identifying unspillable
live ranges as snippet copies. I've also discovered we're making some
unproductive live range splits with subranges. If we avoid such splits,
some of the unspillable copies disappear but mandating that be precise
to fix a use after free doesn't sound right.
2025-01-06 23:12:55 +07:00
Matt Arsenault
11e482c4a3
RegAllocGreedy: Add dummy priority advisor for writing MIR tests (#121207)
I regularly struggle reproducing failures in greedy due to changes
in priority when resuming the allocation from MIR vs. a complete
compilation starting at IR. That is, the fix in
e0919b189bf2df4f97f22ba40260ab5153988b14 did not really fix the
problem of the instruction distance mattering.

Add a way to bypass all of the priority heuristics for MIR tests,
by prioritizing only by virtual register number. Could also
give this a more specific name, like PrioritizeLowVirtRegNumber
2025-01-02 23:04:44 +07:00
Matt Arsenault
e2cabd715b RegAllocGreedy: Fix comment typo 2024-12-17 12:46:04 +07:00
Matt Arsenault
818bffcb1c
RegAlloc: Fix failure on undef use when all registers are reserved (#119647)
Greedy and fast would hit different assertions on undef uses if all
registers in a class were reserved.
2024-12-16 10:56:45 +09:00
Akshat Oke
2c7ece2e8c
[CodeGen][NewPM] Port LiveStacks analysis to NPM (#118778) 2024-12-06 15:16:07 +05:30
Akshat Oke
d9b4bdbff5
[CodeGen][NewPM] Port LiveDebugVariables to NPM (#115468)
The existing analysis was already a pimpl wrapper.

I have extracted legacy pass logic to a LDVImpl wrapper named
`LiveDebugVariables` which is the analysis::Result now. This controls
whether to activate the LDV (depending on `-live-debug-variables` and
DIsubprogram) itself.

The legacy and new analysis only construct the LiveDebugVariables.

VirtRegRewriter will test this.
2024-12-04 14:31:34 +05:30
Akshat Oke
b68340c835
[CodeGen][NewPM] Port SpillPlacement analysis to NPM (#116618) 2024-11-29 16:55:40 +05:30
Akshat Oke
cac13606c2
[CodeGen][NewPM] Port EdgeBundles analysis to NPM (#116616) 2024-11-22 16:51:50 +05:30
Kazu Hirata
735ab61ac8
[CodeGen] Remove unused includes (NFC) (#115996)
Identified with misc-include-cleaner.
2024-11-12 23:15:06 -08:00