The MI scheduler skips regions containing a single MI during scheduling.
This can prevent targets that perform multi-stage scheduling and move
MIs between regions during some stages to reason correctly about the
entire IR, since some MIs will not be assigned to a region at the
beginning.
This makes the machine scheduler no longer skip single-MI regions. Only
a few unit tests are affected (mainly those which check for the
scheduler's debug output).
`RegisterClassInfo` was supposed to be kept alive between pass runs,
which wasn't being done leading to recomputations increasing the compile
time.
Now the Impl class is a member of the legacy and new passes so that it
is not reconstructed on every pass run.
---------
Co-authored-by: Christudasan Devadasan <christudasan.devadasan@amd.com>
Changes:
1. Fix inconsistencies in register pressure set printing. "Max Pressure"
printing is inconsistent with "Bottom Pressure" and "Top Pressure".
For the former, register class begins on the same line vs newline for
latter. Also for the former, the first register class is on the same
line, but subsequent register classes are newline separated. That's
removed so all are on the same line.
Before:
Max Pressure: FPR8=1
GPR32=14
Top Pressure:
GPR32=2
Bottom Pressure:
FPR8=7
GPR32=17
After:
Max Pressure: FPR8=1 GPR32=14
Top Pressure: GPR32=2
Bottom Pressure: FPR8=7 GPR32=17
2. After scheduling an instruction, don't print pressure diff if there
isn't one. Also s/UpdateRegP/UpdateRegPressure. E.g.,
Before:
UpdateRegP: SU(3) %0:gpr64common = ADDXrr %58:gpr64common, gpr64
to
UpdateRegP: SU(4) %393:gpr64sp = ADDXri %58:gpr64common, 390, 12
to GPR32 -1
After:
UpdateRegPressure: SU(4) %393:gpr64sp = ADDXri %58:gpr64common, 12
to GPR32 -1
3. Don't print excess pressure sets if there are none.
The createSIMachineScheduler & createPostMachineScheduler
target hooks are currently placed in the PassConfig interface.
Moving it out to TargetMachine so that both legacy and
the new pass manager can effectively use them.
For pre-ra scheduling, we use two options `-misched-topdown` and
`-misched-bottomup` to force the direction.
While for post-ra scheduling, we use `-misched-postra-direction`
with enumerated values (`topdown`, `bottomup` and `bidirectional`).
This is not unified and adds some mental burdens. Here we replace
these two options `-misched-topdown` and `-misched-bottomup` with
`-misched-prera-direction` with the same enumerated values.
To avoid the condition of `getNumOccurrences() > 0`, we add a new
enum value `Unspecified` and make it the default initial value.
These options are hidden, so we needn't keep the compatibility.
We support bottom-up and bidirectonal postra scheduling now, but we
only compare successive next cluster node as if we are doing topdown
scheduling. This makes load/store clustering and macro fusions wrong.
This patch makes sure that we can get the right cluster node by the
scheduling direction.
In #83875, we changed the type of `Width` to `LocationSize`. To get
the clsuter bytes, we use `LocationSize::getValue()` to calculate
the value.
But when `Width` is an unknown size `LocationSize`, an assertion
"Getting value from an unknown LocationSize!" will be triggered.
This patch simply skips MemOp with unknown size to fix this issue
and keep the logic the same as before.
This issue was found when implementing software pipeliner for
RISC-V in #117546. The pipeliner may clone some memory operations
with `BeforeOrAfterPointer` size.
PostRA scheduling supports different directions now, but we can
only specify it via command line options.
This patch adds a new hook `overridePostRASchedPolicy` for targets
to override PostRA scheduling policy.
Note that some options like tracking register pressure won't take
effect in PostRA scheduling.
Previously we set the dump direction according to command line
options, but we may override the scheduling direction in `initPolicy`
and this results in mismatch between dump and actual policy.
Here we simply set the dump direction after initializing the policy.
This produces far too much terminal output, particularly for the
instruction reduction. Since it doesn't consider the liveness of of
the instructions it's deleting, it produces quite a lot of verifier
errors.
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for this pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:
https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850
This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).
Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
- Add `LiveIntervalsAnalysis`.
- Add `LiveIntervalsPrinterPass`.
- Use `LiveIntervalsWrapperPass` in legacy pass manager.
- Use `std::unique_ptr` instead of raw pointer for `LICalc`, so
destructor and default move constructor can handle it correctly.
This would be the last analysis required by `PHIElimination`.
Since `raw_string_ostream` doesn't own the string buffer, it is
desirable (in terms of memory safety) for users to directly reference
the string buffer rather than use `raw_string_ostream::str()`.
Work towards TODO comment to remove `raw_string_ostream::str()`.
Prepare for new pass manager version of `MachineDominatorTreeAnalysis`.
We may need a machine dominator tree version of `DomTreeUpdater` to
handle `SplitCriticalEdge` in some CodeGen passes.
I had some trouble understanding why `removeReady` removed nodes from
the Pending queue, since my intuition told me that the Pending queue did
not represent a node that was ready. I took a deeper look and found that
pickOnlyNode and pickNodeFromQueue only picked nodes from the Available
queue too.
I found that need to nodes from the Available and Pending queues that
correspond to the opposite direction that we ended up choosing from
(IsTopNode vs !IsTopNode).
It took me a little longer than I would have liked to understand this
fact, so I figured that I would add a comment in the code that makes it
clear for future readers.
Machine scheduler will suppress register pressure when the scheduling
window is too small, but now it doesn't consider i64 register type,
and this MR extends it into i64 register type, so architecture like
RISCV64 that only supports i64 interger register will have the same
behavior like RISCV32.
This PR is stacked on #76186.
This PR keeps the default strategy as top-down since that is what
existing targets expect. It can be enabled using
`-misched-postra-direction=bidirectional`.
It is up to targets to decide whether they would like to enable this
option for themselves.
This is another part of #70452 which makes getMemOperandsWithOffsetWidth
use a LocationSize for Width, as opposed to the unsigned it currently
uses. The advantages on it's own are not super high if
getMemOperandsWithOffsetWidth usually uses known sizes, but if the
values can come from an MMO it can help be more accurate in case they
are Unknown (and in the future, scalable).
There is the possibility that the bottom-up direction will lead to
performance improvements on certain targets, as this is certainly the case for
the pre-regalloc GenericScheduler. This patch will give people the
opportunity to experiment for their sub-targets. However, this patch
keeps the top-down approach as the default for the PostGenericScheduler
since that is what subtargets expect today.
This is a precommit to supporting post reg-alloc bottom up scheduling.
We'd like to have post-ra scheduling direction that can be different from
pre-ra direction. The current dumpSchedule function is changed in this
patch to support the fact that the post-ra and pre-ra directions will
depend on different command line options.
b1ae461a5358932851de42b66ffde8748da51a83 renamed Cycle ->
ReleaseAtCycle.
7e09239e24b339f45f63a670e2e831150826bf70 was committed without rebasing
but used the old Cycle syntax.
This caused a build failure when
7e09239e24b339f45f63a670e2e831150826bf70 was squash-and-merged. This
patch fixes this problem.
TargetSchedule.td explicitly allows the usage of a ProcResource for zero
cycles, in order to represent that the ProcResource must be available
but is not consumed by the instruction. On the other hand,
ResourceSegments explicitly does not allow for a zero sized interval. In
order to remedy this, this patch handles the special case of when there
is an empty interval usage of a resource by not adding an empty
interval.
We ran into this issue downstream, but it makes sense to have
this upstream since it is explicitly allowed by TargetSchedule.td.
Reordering based on the sort order of the MemOpInfo array was disabled
in <https://reviews.llvm.org/D72706>. However, it's not clear this is
desirable for al targets. It also makes it more difficult to compare the
incremental benefit of enabling load clustering in the selectiondag
scheduler as well was the machinescheduler, as the sdag scheduler does
seem to allow this reordering.
This patch adds a parameter that can control the behaviour on a
per-target basis.
Split out from #73789.
The member functions of ScheduleDAGMI are called back from
PostMachineScheduler::runOnMachineFunction, instead of
MachineScheduler::runOnMachineFunction.
These are picked up from getMemOperandsWithOffsetWidth but weren't then
being passed through to shouldClusterMemOps, which forces backends to
collect the information again if they want to use the kind of heuristics
typically used for the similar shouldScheduleLoadsNear function (e.g.
checking the offset is within 1 cache line).
This patch just adds the parameters, but doesn't attempt to use them.
There is potential to use them in the current PPC and AArch64
shouldClusterMemOps implementation, and I intend to use the offset in
the heuristic for RISC-V. I've left these for future patches in the
interest of being as incremental as possible.
As noted in the review and in an inline FIXME, an ElementCount-style abstraction may later be used to condense these two parameters to one argument. ElementCount isn't quite suitable as it doesn't support negative offsets.
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
This commit as previously reverted since it missed renaming that came
down after rebasing. This version of the commit fixes those problems.
Differential Revision: https://reviews.llvm.org/D158568
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
This commit as previously reverted since it missed renaming that came
down after rebasing. This version of the commit fixes those problems.
Differential Revision: https://reviews.llvm.org/D158568
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
Differential Revision: https://reviews.llvm.org/D158568