518 Commits

Author SHA1 Message Date
Jeffrey Byrnes
122e79c35f
[MISched] Advance HazardRec past stalls before calling EmitInstruction (#182977)
There are three calls to bumpCycle in bumpNode. Prior to the first call,
we calculate NextCycle as the next cycle in which all of a given
instruction's required hardware resources (as defined by the SchedModel)
are available. Any gap between this calculated NextCycle and CurrCycle
measures stalls that must occur before we can schedule the given
instruction.

The second and third call handle adjustments that occur during or after
issuing of the instruction (e.g. if the number of microops exceeds the
issue width).

According to the documentation of HazardRec->EmitInstruction, we should
call this method when an instruction is emitted: "This callback is
invoked when an instruction is emitted, to advance the hazard state."

In the context of bumpNode, this implies that it should be called after
we bumpCycle for stalls that must occur before issue of the
instructions, but before those that occur during or after. This PR moves
the placement to do that.

In practice, this affects schedulers that use both the SchedModel and
HazardRec. Suppose we have instructions A, B and C, and partial schedule
AB. Also, suppose instruction A exclusively holds ProcResource X for 2
cycles, and B uses ProcResource X, and there is a HazardRec hazard
between B and C which requires 1 cycle stall.

Currently, we call HazardRec->EmitInstruction on B before we call
HazardRec->AdvanceCycle for the stall between A->B. Then, when deciding
whether to schedule C, HazardRec sees that a cycle has already occurred
after B, so we do not need to stall.

After this change, we call HazardRec->EmitInstruction on B after we call
HazardRec->AdvanceCycle for the stall between A->B. So, HazardRec
accurately places the stall cycle between A and B. Then, when deciding
whether to schedule C, HazardRec accurately sees that no cycles have
occurred after B, so we do need to stall for 1 cycle.
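
The effect of the ordering can be reproduced with a toy model. This is a minimal sketch, not LLVM code: `ToyHazardRec` and `hazardSeenBeforeC` are hypothetical names, and the recognizer is reduced to a counter of cycles elapsed since the last emitted instruction.

```cpp
#include <cassert>

// Hypothetical stand-in for a hazard recognizer: it only remembers how many
// cycles have elapsed since the most recently emitted instruction.
struct ToyHazardRec {
  unsigned CyclesSinceEmit = 0;

  void emitInstruction() { CyclesSinceEmit = 0; }
  void advanceCycle() { ++CyclesSinceEmit; }

  // True if an instruction that needs RequiredGap cycles after the last
  // emitted instruction would still have to wait.
  bool hasHazard(unsigned RequiredGap) const {
    return CyclesSinceEmit < RequiredGap;
  }
};

// Schedules B after A, where B waits ResourceStalls cycles for ProcResource X,
// then reports whether C (which needs a 1-cycle gap after B) is seen as
// hazardous. EmitBeforeStalls selects the old placement of the emit call.
bool hazardSeenBeforeC(bool EmitBeforeStalls, unsigned ResourceStalls) {
  ToyHazardRec HR;
  HR.emitInstruction(); // A
  if (EmitBeforeStalls)
    HR.emitInstruction(); // B, old placement: before the stall cycles
  for (unsigned I = 0; I < ResourceStalls; ++I)
    HR.advanceCycle(); // stall cycles between A and B
  if (!EmitBeforeStalls)
    HR.emitInstruction(); // B, new placement: after the stall cycles
  return HR.hasHazard(/*RequiredGap=*/1);
}
```

With one resource stall, `hazardSeenBeforeC(true, 1)` reports no hazard (the stall is wrongly attributed to the time after B), while `hazardSeenBeforeC(false, 1)` reports that C must still wait.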
2026-02-25 15:24:54 -08:00
Rahul Joshi
26f962465e
[LLVM][CodeGen] Remove pass initialization calls from pass constructors (#173061)
- Remove pass initialization calls from pass constructors.
- For some passes, add the initialization to `initializeCodeGen` or
`initializeGlobalISel`.
- Remove redundant initializations from llc and X86 target for some
passes.
2026-01-21 08:44:51 -08:00
Tony Linthicum
15b9109bc7
Make MachineBlockFrequencyInfo a required pass for the MachineScheduler pass. (#176172)
This is needed to support functionality in the AMDGPU scheduler. Various
passes have been modified to preserve MBFI to ensure that this change
does not introduce new invocations of MBFI. Some targets have passes
reordered, but there are no new runs of MBFI.
2026-01-15 20:26:51 +00:00
Sergei Barannikov
ef9a02ce02
[CodeGen] Use VirtRegOrUnit where appropriate (NFCI) (#167730)
Use it in `printVRegOrUnit()`, `getPressureSets()`/`PSetIterator`,
and in functions/classes dealing with register pressure.

Static type checking revealed several bugs, mainly in MachinePipeliner.
I'm not very familiar with this pass, so I left a bunch of FIXMEs.

There is one bug in `findUseBetween()` in RegisterPressure.cpp, also
annotated with a FIXME.
2025-11-13 10:26:58 +00:00
Min-Yih Hsu
6d4e75cc93
[MISched][NFC] Rename isUnbufferedGroup to isReservedGroup (#166439)
In both ScheduleDAGInstrs and MachineScheduler, we refer to `BufferSize = 0`
as _reserved_ and `BufferSize = 1` as _unbuffered_. This convention
stems from the fact that we set `SUnit::hasReservedResource` to true when
any of the SUnit's consumed resources has BufferSize equal to zero, and set
`SUnit::isUnbuffered` to true when any of its consumed resources has
BufferSize equal to one.

However, `SchedBoundary::isUnbufferedGroup` doesn't really follow this
convention: it returns true when the resource in question is a
`ProcResGroup` and its BufferSize equals **zero** rather than one.
This can be confusing for the reader. This patch renames the
function to `isReservedGroup`, in line with the convention mentioned
above.

NFC.
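
The naming convention can be summarised in a few lines. This is an illustrative sketch only: `ToyResourceKind`, `classifyBufferSize` and `isReservedGroupToy` are hypothetical names, and every BufferSize other than zero or one is lumped into `Other` as a simplification.

```cpp
#include <cassert>

// Hypothetical classification following the convention described above.
enum class ToyResourceKind { Reserved, Unbuffered, Other };

inline ToyResourceKind classifyBufferSize(int BufferSize) {
  if (BufferSize == 0)
    return ToyResourceKind::Reserved;   // "reserved"
  if (BufferSize == 1)
    return ToyResourceKind::Unbuffered; // "unbuffered"
  return ToyResourceKind::Other;        // simplification: everything else
}

// Toy version of the renamed predicate: true for a resource group whose
// BufferSize is zero, which is why "reserved" is the consistent name.
inline bool isReservedGroupToy(bool IsProcResGroup, int BufferSize) {
  return IsProcResGroup &&
         classifyBufferSize(BufferSize) == ToyResourceKind::Reserved;
}
```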
2025-11-04 16:21:37 -08:00
Kazu Hirata
b82bde695e
[Analysis, CodeGen] Use "= default" (NFC) (#166024)
Identified with modernize-use-equals-default.
2025-11-01 23:20:11 -07:00
Rahul Joshi
2a4f5b2751
[NFC][LLVM][CodeGen] Namespace related cleanups (#162999) 2025-10-13 07:54:50 -07:00
Timm Bäder
5284c83a8f Revert "[MachineScheduler] Convert some of the debug prints into using LDBG. NFC (#161997)"
This reverts commit a7414796c0854a9e6f649d922a58aa63147ae2e4.

This breaks builds:
 3355 |            << SchedModel->getResourceName(CurrZone.getZoneCritResIdx()) << "\n";
      |               ~~~~~~~~~~  ^
/home/buildbot/workspace/bolt-aarch64-ubuntu-clang/llvm-project/llvm/lib/CodeGen/MachineScheduler.cpp:3358:51: error: no member named 'getResourceName' in 'llvm::TargetSchedModel'
 3358 |     LDBG() << "  RemainingLimit: " << SchedModel->getResourceName(OtherCritIdx)
      |                                       ~~~~~~~~~~  ^
2 errors generated.

E.g. https://lab.llvm.org/buildbot/#/builders/128/builds/7522
2025-10-05 09:04:27 +02:00
Min-Yih Hsu
a7414796c0
[MachineScheduler] Convert some of the debug prints into using LDBG. NFC (#161997)
These lines are heavily skewed and hard to read. Use the new LDBG
there instead.

NFC.
2025-10-04 23:19:20 -07:00
Jonas Paulsson
eaff28c93e
[MachineScheduler] Turn SU->isScheduled check into an assert in pickNode() (#160145)
It is unnecessary and confusing to have a do/while loop that checks
SU->isScheduled as this should never be true.

ScheduleDAGMI::updateQueues() is always called after pickNode() and it
sets isScheduled on the SU. Turn this into an assertion instead.
2025-09-24 10:28:35 +02:00
Ruiling, Song
451912a24a
[MachineScheduler] Make cluster check more efficient (#150884) 2025-08-01 16:00:42 +08:00
Harrison Hao
8c14d3f44f
[MISched] Use SchedRegion in overrideSchedPolicy and overridePostRASchedPolicy (#149297)
This patch updates `overrideSchedPolicy` and `overridePostRASchedPolicy`
to take a `SchedRegion` parameter instead of just `NumRegionInstrs`. This
provides access to both the instruction range and the parent
`MachineBasicBlock`, which enables looking up function-level attributes.

With this change, targets can select post-RA scheduling direction per
function using a function attribute. For example:

```cpp
void overridePostRASchedPolicy(MachineSchedPolicy &Policy,
                               const SchedRegion &Region) const {
  const Function &F = Region.RegionBegin->getMF()->getFunction();
  Attribute Attr = F.getFnAttribute("amdgpu-post-ra-direction");
  ...
}
```
2025-07-22 15:55:12 +08:00
Kazu Hirata
43ab5bb921
[CodeGen] Use std::tie to implement a comparison functor (NFC) (#146252)
std::tie clearly expresses the intent while slightly shortening the
code.
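
For reference, the general shape of a `std::tie`-based comparison functor is sketched below. `ToyRecord` and its fields are hypothetical and are not the struct touched by this commit.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical record with three keys, compared lexicographically.
struct ToyRecord {
  int Base;
  int Offset;
  unsigned Id;
};

struct ToyRecordLess {
  bool operator()(const ToyRecord &A, const ToyRecord &B) const {
    // std::tie builds tuples of references; tuple operator< compares the
    // fields left to right, so no hand-written if/else chain is needed.
    return std::tie(A.Base, A.Offset, A.Id) <
           std::tie(B.Base, B.Offset, B.Id);
  }
};
```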
2025-06-29 08:25:53 -07:00
Ruiling, Song
0487db1f13
MachineScheduler: Improve instruction clustering (#137784)
The existing way of managing clustered nodes was to add weak edges
between neighbouring cluster nodes, which forms a sort of ordered
queue. This is later recorded as `NextClusterPred` or
`NextClusterSucc` in `ScheduleDAGMI`.

But the instructions may not be picked in the exact order of the
queue. For example, suppose we have a queue of cluster nodes A B C. During
scheduling, node B might be picked first, and then it is very likely
that we only cluster B and C for top-down scheduling (leaving A alone).

Another issue is:
```
   if (!ReorderWhileClustering && SUa->NodeNum > SUb->NodeNum)
      std::swap(SUa, SUb);
   if (!DAG->addEdge(SUb, SDep(SUa, SDep::Cluster)))
```
may break the cluster queue.

For example, we want to cluster nodes (order as in `MemOpRecords`): 1 3
2. Normally 1(SUa) will be a pred of 3(SUb). But when it comes to (3, 2),
since 3(SUa) > 2(SUb), we reorder the two nodes, which makes 2 a
pred of 3. This makes both 1 and 2 preds of 3, but there is no
edge between 1 and 2. Thus we get a broken cluster chain.
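
The broken chain can be reproduced with a small stand-alone model of the swap logic. This is a sketch under simplifying assumptions: node numbers stand in for `SUnit::NodeNum`, `buildToyClusterEdges` is a hypothetical helper, and each edge is stored as a plain (pred, succ) pair rather than an `SDep`.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Builds (pred, succ) cluster edges for neighbouring entries of Order,
// applying the same swap as the snippet above when reordering is disabled.
std::set<std::pair<unsigned, unsigned>>
buildToyClusterEdges(const std::vector<unsigned> &Order,
                     bool ReorderWhileClustering) {
  std::set<std::pair<unsigned, unsigned>> Edges;
  for (std::size_t I = 0; I + 1 < Order.size(); ++I) {
    unsigned SUa = Order[I];
    unsigned SUb = Order[I + 1];
    if (!ReorderWhileClustering && SUa > SUb)
      std::swap(SUa, SUb);
    Edges.insert(std::make_pair(SUa, SUb)); // SUa becomes a pred of SUb
  }
  return Edges;
}
```

For the order 1 3 2 with reordering disabled, the result is the edges (1, 3) and (2, 3): both 1 and 2 are preds of 3, and nothing links 1 and 2.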

To fix both issues, we introduce an unordered set in the change. This
could help improve clustering in some hard cases.

One key reason the change causes so many test check changes is that the
cluster candidates are no longer ordered, so the candidates might be
picked in a different order from before.

The most affected targets are: AMDGPU, AArch64, RISCV.

For RISCV, it seems to me most are just minor instruction reordering; I
don't see an obvious regression.

For AArch64, some combining of ldr into ldp is affected, with two cases
regressed and two improved. The deeper reason is that the machine
scheduler cannot cluster them well either before or after the change,
and the load combine algorithm that runs later is also not smart enough.

For AMDGPU, some cases use more v_dual instructions while some are
regressed. It seems less critical. It seems that test `v_vselect_v32bf16`
gets more buffer_load instructions claused.
2025-06-05 15:28:04 +08:00
Pengcheng Wang
f393986b53
[MISched] Add templates for creating custom schedulers (#141935)
We rename `createGenericSchedLive` and `createGenericSchedPostRA`
to `createSchedLive` and `createSchedPostRA`, and add a template
parameter `Strategy` which is the generic implementation by default.

This can simplify some code for targets that have custom scheduler
strategy.
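
The pattern of a factory template with a defaulted strategy parameter looks roughly like this. It is a self-contained sketch with hypothetical `Toy*` types, not the actual LLVM declarations or signatures.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical strategy hierarchy standing in for scheduling strategies.
struct ToyStrategy {
  virtual ~ToyStrategy() = default;
  virtual std::string name() const = 0;
};
struct ToyGenericStrategy : ToyStrategy {
  std::string name() const override { return "generic"; }
};
struct ToyTargetStrategy : ToyStrategy {
  std::string name() const override { return "target"; }
};

// Hypothetical scheduler object that owns its strategy.
struct ToyScheduler {
  std::unique_ptr<ToyStrategy> Impl;
};

// The generic strategy is the default template argument, so existing callers
// keep their behaviour while targets can plug in their own strategy type.
template <typename Strategy = ToyGenericStrategy>
ToyScheduler createToySched() {
  return ToyScheduler{std::unique_ptr<ToyStrategy>(new Strategy())};
}
```

A target would then write `createToySched<ToyTargetStrategy>()` instead of constructing the scheduler and strategy by hand.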
2025-06-03 11:37:40 +08:00
Ruiling, Song
3e47d8deba
MachineScheduler: Reset next cluster candidate for each node (#139513)
When a node is picked, we should reset its next cluster candidate to
null before releasing its successors/predecessors.
2025-05-28 14:53:46 +08:00
Kazu Hirata
d1cd68881a
[llvm] Use llvm::is_sorted (NFC) (#140399) 2025-05-17 14:29:35 -07:00
Cullen Rhodes
cdde6a650a
[MISched] Add statistics for heuristics (#137981)
When diagnosing scheduling issues it can be useful to know which
heuristics are driving the scheduler. This adds pre-RA and post-RA
statistics for all heuristics.
2025-05-09 10:06:35 +01:00
Cullen Rhodes
ddfdecbd00
[MISched] Add statistics to quantify scheduling (#138090)
When diagnosing scheduler issues it can be useful to know how scheduling
changes the order of instructions, particularly for large functions when
it's not trivial to figure out from the debug output by looking at the
scheduling unit (SU) IDs.

This adds pre-RA and post-RA statistics to track 1) the number of
instructions that remain in source order after scheduling and 2) the
total number of instructions scheduled, to compare 1) against.
2025-05-07 07:47:16 +01:00
Cullen Rhodes
8ea5eacea2
[MISched] Fix off-by-one error in debug output with -misched-cutoff=<n> flag (#137988)
This flag instructs the scheduler to stop scheduling after N
instructions, but in the debug output it appears as if it's scheduling
N+1 instructions, e.g.

$ llc -misched-cutoff=10 -debug-only=machine-scheduler example.ll 2>&1 | grep "^Scheduling SU" | wc -l
11

as it calls pickNode before calling checkSchedLimit.
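
The off-by-one can be modelled in isolation. The sketch below is an assumption-laden toy, not the scheduler's real loop: `ToyCutoffScheduler` and `countDebugLines` are hypothetical, each `pickNode` call stands for one "Scheduling SU" debug line, and the check-first ordering is shown only for contrast, not as a description of how the patch fixes the output.

```cpp
#include <cassert>

// Hypothetical model: pickNode() emits one debug line, checkSchedLimit()
// returns false once Cutoff instructions have been scheduled.
struct ToyCutoffScheduler {
  unsigned Cutoff;
  unsigned NumScheduled;
  unsigned DebugLines;

  explicit ToyCutoffScheduler(unsigned Cutoff)
      : Cutoff(Cutoff), NumScheduled(0), DebugLines(0) {}

  void pickNode() { ++DebugLines; }

  bool checkSchedLimit() {
    if (NumScheduled == Cutoff)
      return false;
    ++NumScheduled;
    return true;
  }
};

// Counts debug lines for a region of NumInstrs instructions, with the limit
// check either after the pick (reported behaviour) or before it.
unsigned countDebugLines(unsigned Cutoff, unsigned NumInstrs,
                         bool CheckLimitFirst) {
  ToyCutoffScheduler S(Cutoff);
  for (unsigned I = 0; I < NumInstrs; ++I) {
    if (CheckLimitFirst) {
      if (!S.checkSchedLimit())
        break;
      S.pickNode();
    } else {
      S.pickNode(); // prints before the limit is consulted
      if (!S.checkSchedLimit())
        break;
    }
  }
  return S.DebugLines;
}
```

With a cutoff of 10 and 20 available instructions, picking first yields 11 lines and checking first yields 10.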
2025-05-06 11:12:23 +01:00
Philip Reames
f2ecd86e34
[Analysis] Remove implicit LocationSize conversion from uint64_t (#133342)
This change removes the uint64_t constructor on LocationSize
preventing implicit conversion, and fixes up the using APIs to adapt to
the change. Note that I'm adding a couple of explicit conversion points
on routines where passing in a fixed offset as an integer seems likely
to have well understood semantics.

We had an unfortunate case which arose if you tried to pass a TypeSize
value to a parameter of LocationSize type. We'd find the implicit
conversion path through TypeSize -> uint64_t -> LocationSize which works
just fine for fixed values, but loses information and fails assertions
if the TypeSize was scalable. This change breaks the first link in that
implicit conversion chain since that seemed to be the easier one.
2025-04-18 07:46:31 -07:00
Min-Yih Hsu
9bfb4b8fb1
[MachineScheduler] Add more debug prints w.r.t hazards and pending SUnits (#134328)
While we already have some detailed debug messages on the candidate
selection process -- which selects a SUnit from the Available queue, we
didn't say much about why a SUnit was _not_ moved from Pending queue to
Available queue in the first place, which is just as important as why we
scheduled a node IMHO. Therefore, I added some debug prints for this
very purpose.

I decided to print these extra messages by default (instead of being
guarded by command line like `-misched-detail-resource-booking`) because
we have been printing some of the hazard remarks, so I thought we might
as well print these new messages -- which are mostly about hazard -- by
default.
2025-04-08 10:31:05 -07:00
Kazu Hirata
40c65e8589
[CodeGen] Avoid repeated hash lookups (NFC) (#129821) 2025-03-04 22:17:00 -08:00
Craig Topper
6ca2a9f2df
[CodeGen] Use Register in SDep interface. NFC (#129734) 2025-03-04 12:26:28 -08:00
Lucas Ramirez
03677f63a7
[MachineScheduler] Optional scheduling of single-MI regions (#129704)
Following 15e295d the machine scheduler no longer filters out single-MI
regions when emitting regions to schedule. While this has no functional
impact at the moment, it generally has a negative compile-time impact
(see #128739).

Since no target other than AMDGPU needs this behavior, this
introduces an off-by-default flag to `ScheduleDAGInstrs` to control
whether such regions are going to be scheduled, effectively reverting
15e295d for all targets but AMDGPU (currently the only target enabling
this flag).
2025-03-04 17:46:44 +01:00
Akshat Oke
3aab3fe56f
[NPM][NFC] Chain PreservedAnalyses methods (#129505) 2025-03-04 10:23:01 +05:30
chrisPyr
71f4c7dabe
[NFC]Make file-local cl::opt global variables static (#126486)
#125983
2025-03-03 13:46:33 +07:00
Lucas Ramirez
15e295d30a
[MachineScheduler][AMDGPU] Allow scheduling of single-MI regions (#128739)
The MI scheduler skips regions containing a single MI during scheduling.
This can prevent targets that perform multi-stage scheduling and move
MIs between regions during some stages from reasoning correctly about the
entire IR, since some MIs will not be assigned to a region at the
beginning.

This makes the machine scheduler no longer skip single-MI regions. Only
a few unit tests are affected (mainly those which check for the
scheduler's debug output).
2025-02-27 11:27:07 +01:00
Philip Reames
49c3120127 [MachineSched] Add a first valid reason [nfc]
For debugging, distinguish the first valid candidate encountered and
a preference decision driven by node number.
2025-02-24 16:04:30 -08:00
Christopher Di Bella
309e3ca081 Revert "[CodeGen] Remove static member function Register::isPhysicalRegister. NFC"
This reverts commit 5fadb3d680909ab30b37eb559f80046b5a17045e.
2025-02-20 22:06:21 +00:00
Craig Topper
5fadb3d680 [CodeGen] Remove static member function Register::isPhysicalRegister. NFC
Prefer the nonstatic member by converting unsigned to Register instead.
2025-02-20 10:49:53 -08:00
Cullen Rhodes
df62441336
[MISched][NFC] Remove unused heuristic NextDefUse from enum (#125879)
Heuristic was removed in 46533e614b78 due to being ineffective.
2025-02-13 08:46:51 +00:00
Akshat Oke
7b60e03d73
Reland "CodeGen][NewPM] Port MachineScheduler to NPM. (#125703)" (#126684)
`RegisterClassInfo` was supposed to be kept alive between pass runs,
but this wasn't being done, which led to recomputations that increased
compile time.

Now the Impl class is a member of the legacy and new passes so that it
is not reconstructed on every pass run.

---------

Co-authored-by: Christudasan Devadasan <christudasan.devadasan@amd.com>
2025-02-12 18:54:39 +05:30
Akshat Oke
564b9b7f4d
Revert "CodeGen][NewPM] Port MachineScheduler to NPM. (#125703)" (#126268)
This reverts commit 5aa4979c47255770cac7b557f3e4a980d0131d69 while I
investigate what's causing the compile-time regression.
2025-02-08 15:36:48 +05:30
Cullen Rhodes
1cf909208e
[MISched] Small debug improvements (#125072)
Changes:
1. Fix inconsistencies in register pressure set printing. "Max Pressure"
   printing is inconsistent with "Bottom Pressure" and "Top Pressure".
   For the former, register class begins on the same line vs newline for
   latter. Also for the former, the first register class is on the same
   line, but subsequent register classes are newline separated. That's
   removed so all are on the same line.

   Before:
     Max Pressure: FPR8=1
     GPR32=14
     Top Pressure:
     GPR32=2
     Bottom Pressure:
     FPR8=7
     GPR32=17

   After:
     Max Pressure: FPR8=1 GPR32=14
     Top Pressure: GPR32=2
     Bottom Pressure: FPR8=7 GPR32=17

2. After scheduling an instruction, don't print pressure diff if there
   isn't one. Also s/UpdateRegP/UpdateRegPressure. E.g.,

   Before:
     UpdateRegP: SU(3) %0:gpr64common = ADDXrr %58:gpr64common, gpr64
                 to
     UpdateRegP: SU(4) %393:gpr64sp = ADDXri %58:gpr64common, 390, 12
                 to GPR32 -1

   After:
     UpdateRegPressure: SU(4) %393:gpr64sp = ADDXri %58:gpr64common, 12
                        to GPR32 -1
3. Don't print excess pressure sets if there are none.
2025-02-05 09:14:51 +00:00
Christudasan Devadasan
5aa4979c47
CodeGen][NewPM] Port MachineScheduler to NPM. (#125703) 2025-02-05 12:17:59 +05:30
Christudasan Devadasan
68e7df395e
[CodeGen][MachineScheduler] Remove the unimplemented print method. (#125702) 2025-02-05 12:10:12 +05:30
Christudasan Devadasan
a47c35a699
[CodeGen] Move MISched target hooks into TargetMachine (#125700)
The createSIMachineScheduler & createPostMachineScheduler
target hooks are currently placed in the PassConfig interface.
Move them to TargetMachine so that both the legacy and
the new pass manager can effectively use them.
2025-02-05 11:41:37 +05:30
Craig Topper
9e6494c0fb
[CodeGen] Rename RegisterMaskPair to VRegMaskOrUnit. NFC (#123799)
This holds a physical register unit or virtual register and mask.

While I was here I've used emplace_back and removed an unneeded use of a
template.
2025-01-22 09:11:22 -08:00
Pengcheng Wang
da71203e6f
[MISched] Unify the way to specify scheduling direction (#119518)
For pre-ra scheduling, we use two options `-misched-topdown` and
`-misched-bottomup` to force the direction.

While for post-ra scheduling, we use `-misched-postra-direction`
with enumerated values (`topdown`, `bottomup` and `bidirectional`).

This is not unified and adds some mental burden. Here we replace
these two options `-misched-topdown` and `-misched-bottomup` with
`-misched-prera-direction` with the same enumerated values.

To avoid the condition of `getNumOccurrences() > 0`, we add a new
enum value `Unspecified` and make it the default initial value.

These options are hidden, so we needn't keep the compatibility.
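
The role of the `Unspecified` value can be shown with a toy resolver. This is a sketch only: `ToyDirection`, `ToyPolicy` and `applyDirection` are hypothetical names, and the two booleans are a simplified stand-in for a scheduling policy.

```cpp
#include <cassert>

// Hypothetical mirror of the enumerated option values described above.
enum class ToyDirection { Unspecified, TopDown, BottomUp, Bidirectional };

// Simplified policy: both flags false means bidirectional scheduling.
struct ToyPolicy {
  bool OnlyTopDown;
  bool OnlyBottomUp;
};

// Unspecified leaves the target's default untouched, so no separate
// "was the option given on the command line?" query is needed.
ToyPolicy applyDirection(ToyPolicy TargetDefault, ToyDirection Option) {
  ToyPolicy P = TargetDefault;
  switch (Option) {
  case ToyDirection::Unspecified:
    break;
  case ToyDirection::TopDown:
    P.OnlyTopDown = true;
    P.OnlyBottomUp = false;
    break;
  case ToyDirection::BottomUp:
    P.OnlyTopDown = false;
    P.OnlyBottomUp = true;
    break;
  case ToyDirection::Bidirectional:
    P.OnlyTopDown = false;
    P.OnlyBottomUp = false;
    break;
  }
  return P;
}
```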
2024-12-12 11:24:07 +08:00
Pengcheng Wang
920495c959
[MISched] Compare right next cluster node (#116584)
We support bottom-up and bidirectional postra scheduling now, but we
only compare against the successive next cluster node, as if we were
doing topdown scheduling. This makes load/store clustering and macro
fusions wrong.

This patch makes sure that we can get the right cluster node by the
scheduling direction.
2024-12-10 14:44:02 +08:00
Pengcheng Wang
db9057edca
[Sched] Skip MemOp with unknown size when clustering (#118443)
In #83875, we changed the type of `Width` to `LocationSize`. To get
the clsuter bytes, we use `LocationSize::getValue()` to calculate
the value.

But when `Width` is an unknown size `LocationSize`, an assertion
"Getting value from an unknown LocationSize!" will be triggered.

This patch simply skips MemOp with unknown size to fix this issue
and keep the logic the same as before.

This issue was found when implementing software pipeliner for
RISC-V in #117546. The pipeliner may clone some memory operations
with `BeforeOrAfterPointer` size.
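
The guard amounts to checking that a size is known before reading it. A minimal sketch, assuming a toy size type: `ToyLocationSize`, `ToyMemOp` and `knownClusterBytes` are hypothetical, and summing widths is a simplification of the real clustering logic.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical size type: getValue() asserts when the size is unknown,
// mirroring the assertion quoted above.
struct ToyLocationSize {
  bool Known;
  std::uint64_t Value;

  bool hasValue() const { return Known; }
  std::uint64_t getValue() const {
    assert(Known && "Getting value from an unknown ToyLocationSize!");
    return Value;
  }
};

struct ToyMemOp {
  ToyLocationSize Width;
};

// Sums the widths of the memory operations, skipping those with unknown
// size so that getValue() is never called on them.
std::uint64_t knownClusterBytes(const std::vector<ToyMemOp> &Ops) {
  std::uint64_t Bytes = 0;
  for (const ToyMemOp &Op : Ops) {
    if (!Op.Width.hasValue())
      continue; // unknown size: leave it out of clustering
    Bytes += Op.Width.getValue();
  }
  return Bytes;
}
```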
2024-12-05 20:14:58 +08:00
Pengcheng Wang
3618c9930f
[MISched] Use right boundary when trying latency heuristics (#116592)
We may do bottom-up or bidirectional scheduling, but previously we
assumed we were doing top-down scheduling, which may cause some issues.
2024-11-27 14:46:05 +08:00
Pengcheng Wang
5a1f239df5
[MISched] Add a hook to override PostRA scheduling policy (#115455)
PostRA scheduling supports different directions now, but we can
only specify it via command line options.

This patch adds a new hook `overridePostRASchedPolicy` for targets
to override PostRA scheduling policy.

Note that some options like tracking register pressure won't take
effect in PostRA scheduling.
2024-11-12 18:14:57 +08:00
Pengcheng Wang
ee1608dd8e
[CodeGen][MISched] Set DumpDirection after initPolicy (#115112)
Previously we set the dump direction according to command line
options, but we may override the scheduling direction in `initPolicy`
and this results in mismatch between dump and actual policy.

Here we simply set the dump direction after initializing the policy.
2024-11-08 11:45:36 +08:00
Matt Arsenault
71ca9fcb8d
llvm-reduce: Don't print verifier failed machine functions (#109673)
This produces far too much terminal output, particularly for the
instruction reduction. Since it doesn't consider the liveness of
the instructions it's deleting, it produces quite a lot of verifier
errors.
2024-09-24 22:32:53 +04:00
Stephen Tozer
3d08ade7bd
[ExtendLifetimes] Implement llvm.fake.use to extend variable lifetimes (#86149)
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for `this` pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:

https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850

This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).

Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
2024-08-29 17:53:32 +01:00
Kazu Hirata
8d1b17b662
[CodeGen] Construct SmallVector with ArrayRef (NFC) (#101841) 2024-08-04 00:41:29 -07:00
paperchalice
abde52aa66
[CodeGen][NewPM] Port LiveIntervals to new pass manager (#98118)
- Add `LiveIntervalsAnalysis`.
- Add `LiveIntervalsPrinterPass`.
- Use `LiveIntervalsWrapperPass` in legacy pass manager.
- Use `std::unique_ptr` instead of raw pointer for `LICalc`, so
destructor and default move constructor can handle it correctly.

This would be the last analysis required by `PHIElimination`.
2024-07-10 19:34:48 +08:00
paperchalice
4010f894a1
[CodeGen][NewPM] Port SlotIndexes to new pass manager (#97941)
- Add `SlotIndexesAnalysis`.
- Add `SlotIndexesPrinterPass`.
- Use `SlotIndexesWrapperPass` in legacy pass.
2024-07-09 12:09:11 +08:00