The `RPTarget`'s way of determining whether VGPRs are beneficial to save
and whether the target has been reached w.r.t. VGPR usage currently
assumes, if `CombinedVGPRSavings` is true, that free slots in one VGPR
RC can always be used for the other. Implicitly, this makes the
rematerialization stage (only current user of `RPTarget`) follow a
different occupancy calculation than the "regular one" that the
scheduler uses, one that assumes that ArchVGPR/AGPR usage can be
balanced perfectly and at no cost, which is untrue in general. This
ultimately yields suboptimal rematerialization decisions that require
cross-VGPR-RC copies unnecessarily.
This fixes that, making the `RPTarget`'s internal model of occupancy
consistent with the regular one. The `CombinedVGPRSavings` flag is
removed, and a form of cross-VGPR-RC saving implemented only for unified
RFs, which is where it makes the most sense. Only when the amount of
free VGPRs in a given VGPR RC (ArchVPGR or AGPR) is lower than the
excess VGPR usage in the other VGPR RC does the `RPTarget` consider that
a pressure reduction in the former will be beneficial to the latter.
Adds new entries in the GCNPressure array for AVGPR pressure. In this
PR, AVGPR pressure is added to pure VGPR pressure under all the pressure
queries, so it is NFC.
Separating out this pseudo RC will help us make more informed decisions
in future work. This RC can be assigned as either VGPR or AGPR, so
tracking them as pure VGPR pressure is not accurate.
This adds the `GCNRPTarget` class which models a register pressure
target (i.e., maximum number of SGPRs/VGPRS) that one can track register
savings against. The only current use of this class is in the
scheduler's rematerialization stage. It replaces the more ad-hoc (and
now deleted) `ExcessRP` class which used to serve the same purpose.
This is only NFC~ish because `GCNRPTarget` tracks VGPR usage more
accurately than `ExcessRP` used to. To estimate required combined VGPR
savings we now additionally take into account the number of available
VGPRs in both banks (ArchVGPR and AGPR) at the time where the RP target
is created, whereas we used to only consider explicit savings made from
the starting RP. This makes VGPR savings estimations more accurate in
cases where we allow for savings in one VGPR bank to help towards
reducing pressure in another VGPR bank (see
`GCNRPTarget::CombineVGPRSavings`). This is the cause for unit test
changes.
This NFC simplifies the `GCNRegPressure::RegKind` enum so that instead
of containing a pair of values for each type of register (one for
non-tuple registers and one for tuple registers of that type) it only
contains one value representing all registers of that type.
The `GCNRegPressure::Value` array is still sized as before, though all
elements corresponding to tuple-kinds now start after all elements
corresponding to non-tuple-kinds instead of the two being interleaved.
This allows to simplify the `GCNRegPressure::inc` logic, eliminating the
switch entirely.
This adds the ability to use the GCNRPTrackers during scheduling. These trackers have several advantages over the generic trackers: 1. global live-thru trackers, 2. subregister based RP deltas, and 3. flexible vreg -> PressureSet mappings.
This feature is off-by-default to ease with the roll-out process. In particular, when using the optional trackers, the scheduler will still maintain the generic trackers leading to unnecessary compile time.
For speculative RP queries, recede may calculate inaccurate masks for subreg uses. Previously, the calculation would look at any live lane for the use at the position of the MI in the LIS. This also adds lanes for any subregs which are live at but not used by the instruction. By constraining against the getSubRegIndexLaneMask for the operand's subreg, we are sure to not pick up on these extra lanes.
For current clients of recede, this is not an issue. This is because 1. the current clients do not violate the program order in the LIS, and 2. the change to RP is based on the difference between previous mask and new mask. Since current clients are not exposed to this issue, this patch is sort of NFC.
Co-authored-by: Valery Pykhtin Valery.Pykhtin@amd.com
Change-Id: Iaed80271226b2587297e6fb78fe081afec1a9275
- Add `LiveIntervalsAnalysis`.
- Add `LiveIntervalsPrinterPass`.
- Use `LiveIntervalsWrapperPass` in legacy pass manager.
- Use `std::unique_ptr` instead of raw pointer for `LICalc`, so
destructor and default move constructor can handle it correctly.
This would be the last analysis required by `PHIElimination`.
Add live-through register set printing, assuming live-through register
is in live-in and live-out sets, has no redefinitions but may have uses
in the block.
Fixed:
1. Maximum register pressure calculation at the instruction level.
Previously max RP included both def and use of registers of an
instruction. Now maximum RP includes _uses_ and _early-clobber defs_.
2. Uses were incorrectly tracked and this resulted in a mismatch of
live-in set reported by LiveIntervals and tracked live reg set when the
beginning of the block is reached.
Interface has changed, moveMaxPressure becomes deprecated and
getMaxPressure, resetMaxPressure functions are added. reset function
seem now more consistent.
Using GCNDownwardRPTracker or GCNUpwardRPTracker the pass collects register pressure values for a function and prints these values next to instructions. Output can be used to generate Filecheck rules in mir tests.
The function makes liveness tests for the entire live register set for every instruction it passes by.
This becomes very slow on high RP regions such as ASAN enabled code.
Instead only uses of last tracked instruction should be tested and this greatly improves compilation time.
This patch revealed few bugs in SIFormMemoryClauses and PreRARematStage::sinkTriviallyRematInsts which should
be fixed first.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D136267
The function makes liveness tests for the entire live register set for every instruction it passes by.
This becomes very slow on high RP regions such as ASAN enabled code.
Instead only uses of last tracked instruction should be tested and this greatly improves compilation time.
This patch revealed few bugs in SIFormMemoryClauses and PreRARematStage::sinkTriviallyRematInsts which should
be fixed first.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D136267
The problem with GCNDownwardRPTracker::advanceBeforeNext is that it doesn't allow to get register pressure
after the last instruction in a MBB.
However when we track RP through the boundary of a MBB we need the state that is after the last instruction
of the MBB and before the first instruction of the successor MBB. Currently we stop traking RP in the state
'at' the last instruction of the MBB which is incorrect.
This patch fixes 27 lit tests with EXPENSIVE_CHECKS enabled.
Reviewed By: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D136927
This reverts commit fecf067db40ffa1a6d5d665769c90cd29547f502.
The change introduces 420 test failures with EXPENSIVE_CHECK in AMDGPU which I don't want to disable.
Going to fix the failures and recommit the check.
This would check if passed in live-ins registers match those calculated using LIS.
This check currently breaks 420 lit tests when enabled.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D136860
The combined vector register classes with both
VGPRs and AGPRs are currently unallocatable.
This patch turns them into allocatable as a
prerequisite to enable copy between VGPR and
AGPR registers during regalloc.
Also, added the missing AV register classes from
192b to 1024b.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D109300
DBG_VALUES placed between memory instructions would change
codegen. Skip over these and re-insert them after the bundle instead
of giving up on bundling.
We are using countPopulation on a LaneBitmask to determine
a number of registers it covers. This is the assumption which
does not necessarily need to be true. It is not changed but
factored into a single call SIRegisterInfo::getNumCoveredRegs().
Some other places are cleaned up with respect to assumptions
about subreg indexes values and tablegen behavior.
Differential Revision: https://reviews.llvm.org/D74177
Summary:
Added file headers for files which implement iterative lightweight scheduling
strategies. Which is basically an exercise which I undertook in order to get
used to LLVM development process.
Reviewers: arsenm, vpykhtin, cdevadas
Reviewed By: vpykhtin
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73417
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).
Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor
Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&
Depends on D65919
Reviewers: arsenm, bogner, craig.topper, RKSimon
Reviewed By: arsenm
Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65962
llvm-svn: 369041
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Summary:
This is a follow-up to r335942.
- Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
- Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
- Merge R600Subtarget::Generation and GCNSubtarget::Generation into
AMDGPUSubtarget::Generation.
Reviewers: arsenm, jvesely
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D49037
llvm-svn: 336851
Because we create a new kind of debug instruction, DBG_LABEL, we need to
check all passes which use isDebugValue() to check MachineInstr is debug
instruction or not. When expelling debug instructions, we should expel
both DBG_VALUE and DBG_LABEL. So, I create a new function,
isDebugInstr(), in MachineInstr to check whether the MachineInstr is
debug instruction or not.
This patch has no new test case. I have run regression test and there is
no difference in regression test.
Differential Revision: https://reviews.llvm.org/D45342
Patch by Hsiangkai Wang.
llvm-svn: 331844
See r331124 for how I made a list of files missing the include.
I then ran this Python script:
for f in open('filelist.txt'):
f = f.strip()
fl = open(f).readlines()
found = False
for i in xrange(len(fl)):
p = '#include "llvm/'
if not fl[i].startswith(p):
continue
if fl[i][len(p):] > 'Config':
fl.insert(i, '#include "llvm/Config/llvm-config.h"\n')
found = True
break
if not found:
print 'not found', f
else:
open(f, 'w').write(''.join(fl))
and then looked through everything with `svn diff | diffstat -l | xargs -n 1000 gvim -p`
and tried to fix include ordering and whatnot.
No intended behavior change.
llvm-svn: 331184
Headers/Implementation files should be named after the class they
declare/define.
Also eliminated an `#include "llvm/CodeGen/LiveIntervalAnalysis.h"` in
favor of `class LiveIntarvals;`
llvm-svn: 320546
LLVM Coding Standards:
Function names should be verb phrases (as they represent actions), and
command-like function should be imperative. The name should be camel
case, and start with a lower case letter (e.g. openFile() or isFoo()).
Differential Revision: https://reviews.llvm.org/D40416
llvm-svn: 319168