This provides the `disable-schedmodel-in-sched-mi` flag. Using this, we
will disable the SchedModel / Itineraries during scheduling. This has
the effect of not using any latency / hardware resource information for
scheduling decisions.
We have the `schedmodel` flag, but this disables the `SchedModel` for
all passes. This allows disabling only for scheduling while preserving
the behavior of other passes (e.g. MachineLICM). This is conceptually
similar to other flags like `enable-aa-sched-mi`
Once we get to SelectionDAG the IR should not be changing anymore, so we
can use BatchAAResults rather than AAResults to cache AA queries.
This should be a NFC change for targets that enable AA during codegen
(such as AArch64), but also give a nice compile-time improvement in some
cases. See:
https://github.com/llvm/llvm-project/pull/123787#issuecomment-2606797041
Note: This follows Nikita's suggestion on #123787.
We want special handing for IGLP instructions in the scheduler but they
should still be treated like they have side effects by other passes. Add
a target hook to the ScheduleDAGInstrs DAG builder so that we have more
control over this.
This replaces some of the most frequent offenders of using a DenseMap that
cause a malloc, where the typical element-count is small enough to fit in
an initial stack allocation.
Most of these are fairly obvious, one to highlight is the collectOffset
method of GEP instructions: if there's a GEP, of course it's going to have
at least one offset, but every time we've called collectOffset we end up
calling malloc as well for the DenseMap in the MapVector.
Since `raw_string_ostream` doesn't own the string buffer, it is
desirable (in terms of memory safety) for users to directly reference
the string buffer rather than use `raw_string_ostream::str()`.
Work towards TODO comment to remove `raw_string_ostream::str()`.
By default the scheduling info of instructions into a BUNDLE are given a
latency of 0 as they operate on the implicit register of the bundle.
This modifies that for AArch64 so that the latency is adjusted to use
the latency from the instruction in the bundle instead. This essentially
assumes that the bundled instructions are executed in a single cycle,
which for AArch64 is probably OK considering they are mostly used for
MOVPFX bundles, where this can help create slightly better scheduling
especially for in-order cores.
Change the scheduler's physical register dependency tracking from
registers-and-their-aliases to regunits. This has a couple of advantages
when subregisters are used:
- The dependency tracking is more accurate and creates fewer useless
edges in the dependency graph. An AMDGPU example, edited for clarity:
SU(0): $vgpr1 = V_MOV_B32 $sgpr0
SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1
SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 $vgpr0_vgpr1, 0, 0
There is a data dependency on $vgpr1 from SU(0) to SU(1) and from
SU(1) to SU(2). But the old dependency tracking code also added a
useless edge from SU(0) to SU(2) because it thought that SU(0)'s def
of $vgpr1 aliased with SU(2)'s use of $vgpr0_vgpr1.
- On targets like AMDGPU that make heavy use of subregisters, each
register can have a huge number of aliases - it can be quadratic in
the size of the largest defined register tuple. There is a much lower
bound on the number of regunits per register, so iterating over
regunits is faster than iterating over aliases.
The LLVM compile-time tracker shows a tiny overall improvement of 0.03%
on X86. I expect a larger compile-time improvement on targets like
AMDGPU.
Recommit after fixing AggressiveAntiDepBreaker in D156880.
Differential Revision: https://reviews.llvm.org/D156552
Change the scheduler's physical register dependency tracking from
registers-and-their-aliases to regunits. This has a couple of advantages
when subregisters are used:
- The dependency tracking is more accurate and creates fewer useless
edges in the dependency graph. An AMDGPU example, edited for clarity:
SU(0): $vgpr1 = V_MOV_B32 $sgpr0
SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1
SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 $vgpr0_vgpr1, 0, 0
There is a data dependency on $vgpr1 from SU(0) to SU(1) and from
SU(1) to SU(2). But the old dependency tracking code also added a
useless edge from SU(0) to SU(2) because it thought that SU(0)'s def
of $vgpr1 aliased with SU(2)'s use of $vgpr0_vgpr1.
- On targets like AMDGPU that make heavy use of subregisters, each
register can have a huge number of aliases - it can be quadratic in
the size of the largest defined register tuple. There is a much lower
bound on the number of regunits per register, so iterating over
regunits is faster than iterating over aliases.
The LLVM compile-time tracker shows a tiny overall improvement of 0.03%
on X86. I expect a larger compile-time improvement on targets like
AMDGPU.
Differential Revision: https://reviews.llvm.org/D156552
This patch replaces the uses of PointerUnion.is function by llvm::isa,
PointerUnion.get function by llvm::cast, and PointerUnion.dyn_cast by
llvm::dyn_cast_if_present. This is according to the FIXME in
the definition of the class PointerUnion.
This patch does not remove them as they are being used in other
subprojects.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D148449
LLVM_DUMP_METHOD includes ATTRIBUTE_NOINLINE. operator<< isn't
what we normally consider a dump method so it should be ok to inline.
This fixes a warning from gcc that some other declaration for some
other class was inline but this one is noinline. Seems like a bogus
warning from gcc really.
This is a helper function to very slightly simplify many calls to
MachineInstruction::getOperandNo.
Differential Revision: https://reviews.llvm.org/D143250
Add an extra command line option to `llc` that allows checking at what cycle an instruction has been scheduled by the machine scheduler.
Differential Revision: https://reviews.llvm.org/D141289
Some uses of std::make_pair and the std::pair's first/second members
in the ScheduleDAGInstrs.[cpp|h] files were replaced with using of the
vector's emplace_back along with structure bindings from C++17.
This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for if a load
instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.
Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query upfront. This
should have the same net benefit, but has the possible disadvantage of
making this AA query nonlazy.
Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
This removes a condition and the corresponding FIXME comment, because
the Hexagon assertion it refers to has apparently been fixed, probably
by D76134.
NFCI. This just gives targets the opportunity to adjust latencies that
were set to 0 by the generic code because they involve "implicit pseudo"
operands.
Differential Revision: https://reviews.llvm.org/D112306
Avoid several crashes when DBG_INSTR_REF and DBG_PHI instructions are fed
to the instruction scheduler. DBG_INSTR_REFs should be treated like
DBG_LABELs, and just ignored for the purpose of scheduling [0].
DBG_PHIs however behave much more like DBG_VALUEs: they refer to register
operands, and if some register defs get shuffled around during instruction
scheduling, there's a risk that the debug instr will refer to the wrong
value. There's already a facility for updating DBG_VALUEs to reflect this;
add DBG_PHI to the list of instructions that it will update.
[0] Suboptimal, but it's what instr scheduling does right now.
Differential Revision: https://reviews.llvm.org/D106663
This is a cleanup patch -- we're now able to support all flavours of
variable location in instruction referencing mode. This patch updates
various tests for debug instructions to be broader: numerous code paths
try to ignore debug isntructions, and they now have to ignore the
additional DBG_PHI and DBG_INSTR_REFs that we can generate.
A small amount of rework happens for LiveDebugVariables: as we don't need
to track live intervals through regalloc any more, we can get away with
unlinking debug instructions before regalloc, then re-inserting them after.
Note that this isn't (yet) true of DBG_VALUE_LISTs, they still have to go
through live interval tracking.
In SelectionDAG, add a helper lambda that emits half-formed DBG_INSTR_REFs
for arguments in instr-ref mode, DBG_VALUE otherwise. This is one of the
final locations where DBG_VALUEs are emitted for vreg arguments.
X86InstrInfo now un-sets the debug instr number on SUB instructions that
get mutated into CMP instructions. As the instruction no longer computes a
subtraction, we can't use it for variable locations.
Differential Revision: https://reviews.llvm.org/D88898
Pseudo probe are currently given a slot index like other regular instructions. This affects register pressure and lifetime weight computation because of enlarged lifetime length with pseudo probe instructions. As a consequence, program could get different code generated w/ and w/o pseudo probes. I'm closing the gap by excluding pseudo probes from stack index and downstream register allocation related passes.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D100334
Both FastRegAlloc and LiveDebugVariables/greedy need to cope with
DBG_INSTR_REFs. None of them actually need to take any action, other than
passing DBG_INSTR_REFs through: variable location information doesn't refer
to any registers at this stage.
LiveDebugVariables stashes the instruction information in a tuple, then
re-creates it later. This is only necessary as the register allocator
doesn't expect to see any debug instructions while it's working. No
equivalence classes or interval splitting is required at all!
No changes are needed for the fast register allocator, as it just ignores
debug instructions. The test added checks that both of them preserve
DBG_INSTR_REFs.
This also expands ScheduleDAGInstrs.cpp to treat DBG_INSTR_REFs the same as
DBG_VALUEs when rescheduling instructions around. The current movement of
DBG_VALUEs around is less than ideal, but it's not a regression to make
DBG_INSTR_REFs subject to the same movement.
Differential Revision: https://reviews.llvm.org/D85757
If the end instruction of the scheduling region was a DBG_VALUE, the
uses of the debug instruction were tracked as if they were real
uses. This would then hit the deadDefHasNoUse assertion in
addVRegDefDeps if the only use was the debug instruction.
This reverts commit 0345d88de654259ae90494bf9b015416e2cccacb.
Google internal backend uses EntrySU, we are looking into removing
dependency on it.
Differential Revision: https://reviews.llvm.org/D88018
This temporarily reverts commit 7019cea26dfef5882c96f278c32d0f9c49a5e516.
It seems that, for some targets, there are instructions with a lot of memory operands (probably more than would be expected). This causes a lot of buildbots to timeout and notify failed builds. While investigations are ongoing to find out why this happens, revert the changes.
Summary:
To support all targets, the mayAlias member function needs to support instructions with multiple operands.
This revision also changes the order of the emitted instructions in some test cases.
Reviewers: efriedma, hfinkel, craig.topper, dmgreen
Reviewed By: efriedma
Subscribers: MatzeB, dmgreen, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80161
This allows targets to know exactly which operands are contributing to
the dependency, which is required for targets with per-operand
scheduling models.
Differential Revision: https://reviews.llvm.org/D77135
This reverts the AMDGPU DAG mutation implemented in D72487 and gives
a more general way of adjusting BUNDLE operand latency.
It also replaces FixBundleLatencyMutation with adjustSchedDependency
callback in the AMDGPU, fixing not only successor latencies but
predecessors' as well.
Differential Revision: https://reviews.llvm.org/D72535
If an instruction had multiple subregister defs, and one of them was
undef, this would improperly conclude all other lanes are
killed. There could still be other defs of those read-undef lanes in
other operands. This would improperly remove register uses from
CurrentVRegUses, so the visitation of later operands would not find
the necessary register dependency. This would also mean this would
fail or not depending on how different subregister def operands were
ordered.
On an undef subregister def, scan the instruction for other
subregister defs and avoid killing those.
This possibly should be deferring removing anything from
CurrentVRegUses until the entire instruction has been processed
instead.
llvm-svn: 372362