CopyHints set has been collecting duplicates of a register with
increasing weight and then deduplicated with HintedRegs set. Let's stop
collecting duplicates at the first place.
Change the spill weight calculations for `optsize` functions to remove
the block frequency multiplier. For those functions, we do not want to
consider the runtime cost of spilling, only the codesize cost.
I built a large app with the basic and greedy (default) register
allocator enabled.
| Regalloc Type | Uncompressed Size Delta | Compressed Size Delta |
| - | - | - |
| Basic | -303.8 KiB (-0.23%) | -232.0 KiB (-0.39%) |
| Greedy | 159.1 KiB (0.12%) | 130.1 KiB (0.22%) |
Since I only saw a size win with the basic register allocator, I decided
to only change the behavior for that type.
This time with 100% more building unit tests. Original commit message follows.
[NFC] Switch a number of DenseMaps to SmallDenseMaps for speedup (#109417)
If we use SmallDenseMaps instead of DenseMaps at these locations,
we get a substantial speedup because there's less spurious malloc
traffic. Discovered by instrumenting DenseMap with some accounting
code, then selecting sites where we'll get the most bang for our buck.
If we use SmallDenseMaps instead of DenseMaps at these locations,
we get a substantial speedup because there's less spurious malloc
traffic. Discovered by instrumenting DenseMap with some accounting
code, then selecting sites where we'll get the most bang for our buck.
Fixes#99396
The result of `VirtRegAuxInfo::weightCalcHelper` can be influenced by
x87 excess precision, which can result in slightly different register
choices when the compiler is hosted on x86_64 or i386. This leads to
different object file output when cross-compiling to i386, or native.
Similar to 7af3432e22b0, we need to add a `volatile` qualifier to the
local `Weight` variable to force it onto the stack, and avoid the excess
precision. Define `stack_float_t` in `MathExtras.h` for this purpose,
and use it.
Fixes#82659
There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411.
Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact.
After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
This is necessary for RegAllocGreedy support for memory folding inline
asm that uses "rm" constraints.
Thanks to @qcolombet for the suggestion.
Link: https://github.com/llvm/llvm-project/issues/20571
`MachineInstr.h` is a commonly included file and this includes
`llvm/ADT/SmallSet.h` for one function `getUsedDebugRegs()`, which is
used only in one place.
According to `ClangBuildAnalyzer` (run solely on building LLVM, no other
projects) the second most expensive template to instantiate is the
`SmallSet::insert` method used in the `inline` implementation in
`getUsedDebugRegs()`:
```
**** Templates that took longest to instantiate:
554239 ms: std::unordered_map<int, int> (2826 times, avg 196 ms)
521187 ms: llvm::SmallSet<llvm::Register, 4>::insert (930 times, avg 560
ms)
...
```
By removing this method and putting its implementation in the one call
site we greatly reduce the template instantiation time and reduce the
number of includes.
When copying the implementation, I removed a check on `MO.getReg()` as
this is checked within `MO.isVirtual()`.
Differential Revision: https://reviews.llvm.org/D157720
This reverts commit a496c8be6e638ae58bb45f13113dbe3a4b7b23fd.
The workaround in c26dfc81e254c78dc23579cf3d1336f77249e1f6 should work
around the underlying problem with SUBREG_TO_REG.
And dependent commits.
Details in D150388.
This reverts commit 825b7f0ca5f2211ec3c93139f98d1e24048c225c.
This reverts commit 7a98f084c4d121244ef7286bc6503b6a181d446e.
This reverts commit b4a62b1fa546312d882fa12dfdcd015177d66826.
This reverts commit b7836d856206ec39509d42529f958c920368166b.
No conflicts in the code, few tests had conflicts in autogenerated CHECKs:
llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll
Reviewed By: alexfh
Differential Revision: https://reviews.llvm.org/D156381
Replacing D143754. Right now the LiveRangeSplitting during register allocation uses
TargetOpcode::COPY instruction for splitting. For AMDGPU target that creates a
problem as we have both vector and scalar copies. Vector copies perform a copy over
a vector register but only on the lanes(threads) that are active. This is mostly sufficient
however we do run into cases when we have to copy the entire vector register and
not just active lane data. One major place where we need that is live range splitting.
Allowing targets to use their own copy instructions(if defined) will provide a lot of
flexibility and ease to lower these pseudo instructions to correct MIR.
- Introduce getTargetCopyOpcode() virtual function and use if to generate copy in Live range
splitting.
- Replace necessary MI.isCopy() checks with TII.isCopyInstr() in register allocator pipeline.
Reviewed By: arsenm, cdevadas, kparzysz
Differential Revision: https://reviews.llvm.org/D150388
The first member of the pair should be unsigned instead of Register
because it is the hint type, 0 for simple (target independent) hints and
other values for target dependent hints.
Differential Revision: https://reviews.llvm.org/D146646
This is a helper function to very slightly simplify many calls to
MachineInstruction::getOperandNo.
Differential Revision: https://reviews.llvm.org/D143250
This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for if a load
instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.
Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query upfront. This
should have the same net benefit, but has the possible disadvantage of
making this AA query nonlazy.
Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.
We need to reuse them for the ML regalloc eviction advisor, as we
'explode' the weight calculation into sub-features.
Differential Revision: https://reviews.llvm.org/D116074
Statepoint instruction is known to have a variable and big number of operands.
It is possible that Register Allocator will split live intervals in the way that all
physical registers are occupied by "zero-length" live intervals which are marked
as not-spillable.
While intervals are marked as not-spillable in the moment of creation when they are
really zero-length it is possible that in future as part of re-materialization there will
need for physical register between def and use of such tiny interval (the use is not
related to this interval at all).
As all physical registers are assigned to not-spillable intervals there is not avaialbe
registers and RA reports an error.
The idea of the fix is avoid marking tiny live intervals where there is a use in statepoint
instruction in var args section. Such interval may be perfectly spilled and folded to
operand of statepoint.
Reviewers: reames, dantrushin, qcolombet, dsanders, dmgreen
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D98766
We currently have problems with the way that low overhead loops are
specified, with LR being spilled between the t2LoopDec and the t2LoopEnd
forcing the entire loop to be reverted late in the backend. As they will
eventually become a single instruction, this patch introduces a
t2LoopEndDec which is the combination of the two, combined before
registry allocation to make sure this does not fail.
Unfortunately this instruction is a terminator that produces a value
(and also branches - it only produces the value around the branching
edge). So this needs some adjustment to phi elimination and the register
allocator to make sure that we do not spill this LR def around the loop
(needing to put a spill after the terminator). We treat the loop very
carefully, making sure that there is nothing else like calls that would
break it's ability to use LR. For that, this adds a
isUnspillableTerminator to opt in the new behaviour.
There is a chance that this could cause problems, and so I have added an
escape option incase. But I have not seen any problems in the testing
that I've tried, and not reverting Low overhead loops is important for
our performance. If this does work then we can hopefully do the same for
t2WhileLoopStart and t2DoLoopStart instructions.
This patch also contains the code needed to convert or revert the
t2LoopEndDec in the backend (which just needs a subs; bne) and the code
pre-ra to create them.
Differential Revision: https://reviews.llvm.org/D91358
It's never null - the reason it's modeled as a pointer is because the
pass can't init it in its ctor. Passing by ref simplifies the code, too,
as the null checks were unnecessary complexity.
Differential Revision: https://reviews.llvm.org/D89171
It is only used in weightCalcHelper, and cleared upon its finishing its
job there.
The patch further cleans up style guide discrepancies, and simplifies
CopyHint by removing duplicate 'IsPhys' information (it's what the Reg
field would report).
All the state of VRAI is allocator-wide, so we can avoid creating it
every time we need it. In addition, the normalization function is
allocator-specific. In a next change, we can simplify that design in
favor of just having it as a virtual member.
Differential Revision: https://reviews.llvm.org/D88499
Also renamed the fields to follow style guidelines.
Accessors help with readability - weight mutation, in particular,
is easier to follow this way.
Differential Revision: https://reviews.llvm.org/D87725
When we calculate the weight of a live-interval, add some code to
check if the original live-interval was markied as not spillable and
if so, progagate that information down to the new interval.
Previously we would just recompute a weight for the new interval,
thus, we could in theory just spill live-intervals marked as not
spillable by just splitting them. That goes against the spirit of
a non-spillable live-interval.
E.g., previously we could do:
v1 = // v1 must not be spilled
...
= v1
Split:
v1 = // v1 must not be spilled
...
v2 = v1 // v2 can be spilled
...
v3 = v2 // v3 can be spilled
= v3
There's no test case for that one as we would need to split a
non-spillable live-interval without using LiveRangeEdit to see this
happening.
RegAlloc inserts non-spillable intervals only as part of the spilling
mechanism, thus at this point the intervals are not splittable anymore.
On top of that, RegAlloc uses the LiveRangeEdit API, which already
properly propagate that information.
In other words, this could only happen if a target was to mark
a live-interval as not spillable before register allocation and
split it without using LRE, e.g., through
LiveIntervals::splitSeparateComponent.
Skip debug instructions before calling functions not expecting them.
In particular, LIS.getInstructionIndex(*mi) would fail if mi was a debg instr.
Differential Revision: https://reviews.llvm.org/D76129
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).
Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor
Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&
Depends on D65919
Reviewers: arsenm, bogner, craig.topper, RKSimon
Reviewed By: arsenm
Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65962
llvm-svn: 369041
Summary:
As part of this, define DenseMapInfo for MCRegister (and Register while I'm at it)
Depends on D65599
Reviewers: arsenm
Subscribers: MatzeB, qcolombet, jvesely, wdng, nhaehnle, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65605
llvm-svn: 367719
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Finally all targets are enabling multiple regalloc hints, so the hook to
disable this can now be removed.
NFC.
Review: Simon Pilgrim
https://reviews.llvm.org/D52316
llvm-svn: 343851
This patch makes sure that a register is only hinted once to RA. In extreme
cases the same register can otherwise be hinted numerous times and cause a
compile time slowdown.
Review: Simon Pilgrim
https://reviews.llvm.org/D52826
llvm-svn: 343686
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.
In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.
Differential Revision: https://reviews.llvm.org/D43624
llvm-svn: 332240
Because we create a new kind of debug instruction, DBG_LABEL, we need to
check all passes which use isDebugValue() to check MachineInstr is debug
instruction or not. When expelling debug instructions, we should expel
both DBG_VALUE and DBG_LABEL. So, I create a new function,
isDebugInstr(), in MachineInstr to check whether the MachineInstr is
debug instruction or not.
This patch has no new test case. I have run regression test and there is
no difference in regression test.
Differential Revision: https://reviews.llvm.org/D45342
Patch by Hsiangkai Wang.
llvm-svn: 331844
Headers/Implementation files should be named after the class they
declare/define.
Also eliminated an `#include "llvm/CodeGen/LiveIntervalAnalysis.h"` in
favor of `class LiveIntarvals;`
llvm-svn: 320546
MachineRegisterInfo used to allow just one regalloc hint per virtual
register. This patch extends this to a vector of regalloc hints, which is
filled in by common code with sorted copy hints. Such hints will make for
more ID copies that can be removed.
NB! This improvement is currently (and hopefully temporarily) *disabled* by
default, except for SystemZ. The only reason for this is the big impact this
has on tests, which has unfortunately proven unmanageable. It was a long
while since all the tests were updated and just waiting for review (which
didn't happen), but now targets have to enable this themselves
instead. Several targets could get a head-start by downloading the tests
updates from the Phabricator review. Thanks to those who helped, and sorry
you now have to do this step yourselves.
This should be an improvement generally for any target!
The target may still create its own hint, in which case this has highest
priority and is stored first in the vector. If it has target-type, it will
not be recomputed, as per the previous behaviour.
The temporary hook enableMultipleCopyHints() will be removed as soon as all
targets return true.
Review: Quentin Colombet, Ulrich Weigand.
https://reviews.llvm.org/D38128
llvm-svn: 319754
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).
llvm-svn: 318490
This header includes CodeGen headers, and is not, itself, included by
any Target headers, so move it into CodeGen to match the layering of its
implementation.
llvm-svn: 317647
This fixes bugzilla 26810
https://bugs.llvm.org/show_bug.cgi?id=26810
This is intended to prevent sequences like:
movl %ebp, 8(%esp) # 4-byte Spill
movl %ecx, %ebp
movl %ebx, %ecx
movl %edi, %ebx
movl %edx, %edi
cltd
idivl %esi
movl %edi, %edx
movl %ebx, %edi
movl %ecx, %ebx
movl %ebp, %ecx
movl 16(%esp), %ebp # 4 - byte Reload
Such sequences are created in 2 scenarios:
Scenario #1:
vreg0 is evicted from physreg0 by vreg1
Evictee vreg0 is intended for region splitting with split candidate physreg0 (the reg vreg0 was evicted from)
Region splitting creates a local interval because of interference with the evictor vreg1 (normally region spliiting creates 2 interval, the "by reg" and "by stack" intervals. Local interval created when interference occurs.)
one of the split intervals ends up evicting vreg2 from physreg1
Evictee vreg2 is intended for region splitting with split candidate physreg1
one of the split intervals ends up evicting vreg3 from physreg2 etc.. until someone spills
Scenario #2
vreg0 is evicted from physreg0 by vreg1
vreg2 is evicted from physreg2 by vreg3 etc
Evictee vreg0 is intended for region splitting with split candidate physreg1
Region splitting creates a local interval because of interference with the evictor vreg1
one of the split intervals ends up evicting back original evictor vreg1 from physreg0 (the reg vreg0 was evicted from)
Another evictee vreg2 is intended for region splitting with split candidate physreg1
one of the split intervals ends up evicting vreg3 from physreg2 etc.. until someone spills
As compile time was a concern, I've added a flag to control weather we do cost calculations for local intervals we expect to be created (it's on by default for X86 target, off for the rest).
Differential Revision: https://reviews.llvm.org/D35816
Change-Id: Id9411ff7bbb845463d289ba2ae97737a1ee7cc39
llvm-svn: 316295
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
This is mostly a mechanical change to make TargetInstrInfo API take
MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator)
when the argument is expected to be a valid MachineInstr. This is a
general API improvement.
Although it would be possible to do this one function at a time, that
would demand a quadratic amount of churn since many of these functions
call each other. Instead I've done everything as a block and just
updated what was necessary.
This is mostly mechanical fixes: adding and removing `*` and `&`
operators. The only non-mechanical change is to split
ARMBaseInstrInfo::getOperandLatencyImpl out from
ARMBaseInstrInfo::getOperandLatency. Previously, the latter took a
`MachineInstr*` which it updated to the instruction bundle leader; now,
the latter calls the former either with the same `MachineInstr&` or the
bundle leader.
As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.
Note: I updated WebAssembly, Lanai, and AVR (despite being
off-by-default) since it turned out to be easy. I couldn't run tests
for AVR since llc doesn't link with it turned on.
llvm-svn: 274189