We must also track the super sources of a copy, otherwise we introduce a sort of subtle bug.
Consider:
1. DEF r0:r1
2. USE r1
3. r6:r9 = COPY r10:r13
4. r14:15 = COPY r0:r1
5. USE r6
6.. r1:4 = COPY r6:9
BackwardCopyPropagateBlock processes the instructions from bottom up. After processing 6., we will have propagatable copy for r1-r4 and r6-r9. After 5., we invalidate and erase the propagatble copy for r1-r4 and r6 but not for r7-r9.
The issue is that when processing 3., data structures still say we have valid copies for dest regs r7-r9 (from 6.). The corresponding defs for these registers in 6. are r1:r4, which we mark as registers to invalidate. When invalidating, we find the copy that corresponds to r1 is 4. (this was added when processing 4.), and we say that r1 now maps to unpropagatable copies. Thus, when we process 2., we do not have a valid copy, but when we process 1. we do -- because the mapped copy for subregister r0 was never invalidated.
The net result is to propagate the copy from 4. to 1., and replace DEF r0:r1 with DEF r14:r15. Then, we have a use before def in 2.
The main issue is that we have an inconsitent state between which def regs and which src regs are valid. When processing 5., we mark all the defs in 6. as invalid, but only the subreg use as invalid. Either we must only invalidate the individual subreg for both uses and defs, or the super register for both.
Differential Revision: https://reviews.llvm.org//D157564
Change-Id: I99d5e0b1a0d735e8ea3bd7d137b6464690aa9486
Revert D152502 and instead optimize away copy from undefs, but clear the undef flag on the original copy.
Apparently, not optimizing the COPY can cause performance issues in some cases.
Fixes SWDEV-405813, SWDEV-405899
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D153838
I don't think we can safely remove the second COPY as redundant in such cases.
The first COPY (which has undef src) may be lowered to a KILL instruction instead, resulting in no COPY being emitted at all.
Testcase is X86 so it's in the same place as other testcases for this function, but this was initially spotted on AMDGPU with the following:
```
renamable $vgpr24 = PRED_COPY undef renamable $vgpr25, implicit $exec
renamable $vgpr24 = PRED_COPY killed renamable $vgpr25, implicit $exec
```
The second COPY waas removed as redundant, and the first one was lowered to a KILL (= removed too), causing $vgpr24 to not have $vgpr25's value.
Fixes SWDEV-401507
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D152502
This change initializes the members TSI, LI, DT, PSI, and ORE pointer feilds of the SelectOptimize class to nullptr.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D148303
In this example:
```
$d14 = COPY killed $d18
$s0 = MI $s28
```
$s28 is a sub-register of $d14. However, $d18 does not have
sub-registers and thus cannot be forwarded. Previously, this resulted
in $noreg being substituted in place of the use of $s28, which later
led to an assertion failure.
Fixes https://github.com/llvm/llvm-project/issues/60908, a regression
that was introduced in D141747.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D146930
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
D125335 makes regsOverlap skip following control flow, which is not entended
in the original code.
Differential Revision: https://reviews.llvm.org/D128039
treated as Copy instruction in MCP.
This is then used in AArch64 to remove copy instructions after taildup
ran in machine block placement
Differential Revision: https://reviews.llvm.org/D125335
Change the implementation of isForwardableRegClassCopy so that it
does not rely on getMinimalPhysRegClass. Instead, iterate over all
classes looking for any that satisfy a required property.
NFCI on current upstream targets, but this copes better with
downstream AMDGPU changes where some new smaller classes have been
introduced, which was breaking regclass equality tests in the old
code like:
if (UseDstRC != CrossCopyRC && CopyDstRC == CrossCopyRC)
Differential Revision: https://reviews.llvm.org/D121903
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
On some AMDGPU subtargets, copying to and from AGPR registers using another
AGPR register is not possible. A intermediate VGPR register is needed for AGPR
to AGPR copy. This is an issue when machine copy propagation forwards a
COPY $agpr, replacing a COPY $vgpr which results in $agpr = COPY $agpr. It is
removing a cross class copy that may have been optimized by previous passes and
potentially creating an unoptimized cross class copy later on.
To avoid this issue, check CrossCopyRegClass if a different register class will
be needed for the copy. If so then avoid forwarding the copy when the
destination does not match the desired register class and if the original copy
already matches the desired register class.
Issue seen while attempting to optimize another AGPR to AGPR issue:
Live-ins: $agpr0
$vgpr0 = COPY $agpr0
$agpr1 = V_ACCVGPR_WRITE_B32 $vgpr0
$agpr2 = COPY $vgpr0
$agpr3 = COPY $vgpr0
$agpr4 = COPY $vgpr0
After machine-cp:
$vgpr0 = COPY $agpr0
$agpr1 = V_ACCVGPR_WRITE_B32 $vgpr0
$agpr2 = COPY $agpr0
$agpr3 = COPY $agpr0
$agpr4 = COPY $agpr0
Machine-cp propagated COPY $agpr0 to replace $vgpr0 creating 3 AGPR to AGPR
copys. Later this creates a cross-register copy from AGPR->VGPR->AGPR for each
copy when the prior VGPR->AGPR copy was already optimal.
Reviewed By: lkail, rampitec
Differential Revision: https://reviews.llvm.org/D108011
Fixes bugs [[ https://bugs.llvm.org/show_bug.cgi?id=50580 | 50580 ]] and [[ https://bugs.llvm.org/show_bug.cgi?id=49446 | 49446 ]]
When compiling with -g "DBG_VALUE <reg>" instructions are added in the MIR, if such a instruction is inserted between instructions that use <reg> then MachineCopyPropagation invalidates <reg> , this causes some copies to not be propagated and causes differences in code generation (ex bugs 50580 and 49446 ). DBG_VALUE instructions should be ignored since they don't actually modify the register.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D104394
Previous crashes caused by this patch were the result of machine
subregisters being incorrectly handled in updateDbgUsersToReg; this has
been fixed by using RegUnits to determine overlapping registers, instead
of using the register values directly.
Differential Revision: https://reviews.llvm.org/D101523
This reverts commit 7ca26c5fa2df253878cab22e1e2f0d6f1b481218.
This patch modifies updateDbgUsersToReg to properly handle
DBG_VALUE_LIST instructions, by replacing the hard-coded operand indices
(i.e. getOperand(0)) with the more general getDebugOperandsForReg(), and
updating the register for all matching operands.
Differential Revision: https://reviews.llvm.org/D101523
Previously if the source match we asserted that the destination
matched. But GPR <-> mask register copies on X86 can violate this
since we use the same K-registers for multiple sizes.
Fixes this ISPC issue https://github.com/ispc/ispc/issues/1851
Differential Revision: https://reviews.llvm.org/D86507
In MachineCopyPropagation::BackwardPropagatableCopy(),
a check is added for multiple destination registers.
The copy propagation is avoided if the copied destination register
is the same register as another destination on the same instruction.
A new test is added. This used to fail on ARM like this:
error: unpredictable instruction, RdHi and RdLo must be different
umull r9, r9, lr, r0
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D82638
This decreases the time consumed by the pass [during RawSpeed unity build]
by 25% (0.0586 s -> 0.04388 s).
While that isn't really impressive overall, that wasn't the goal here.
The memory results here are noticeable.
The baseline results are:
```
total runtime: 55.65s.
calls to allocation functions: 19754254 (354960/s)
temporary memory allocations: 4951609 (88974/s)
peak heap memory consumption: 239.13MB
peak RSS (including heaptrack overhead): 463.79MB
total memory leaked: 198.01MB
```
While with this patch the results are:
```
total runtime: 55.37s.
calls to allocation functions: 19068237 (344403/s) # -3.47 %
temporary memory allocations: 4261772 (76974/s) # -13.93 % (!!!)
peak heap memory consumption: 239.13MB
peak RSS (including heaptrack overhead): 463.73MB
total memory leaked: 198.01MB
```
So we get rid of *a lot* of temporary allocations.
Using `SmallSet<8>` makes sense to me because at least here
for x86 BdVer2, the size of that set is *never* more than 3,
over all of llvm test-suite + RawSpeed.
The story might be different on other targets,
not sure if it will ever justify whole DenseSet,
but if it does SmallDenseSet might be a compromise.
Fix assertion error
```
bool llvm::MachineOperand::isRenamable() const: Assertion `Register::isPhysicalRegister(getReg()) && "isRenamable should only be checked on physical registers"' failed.
```
by checking if the register is 0 before invoking `isRenamable`.
Summary:
This patch mainly do such transformation
```
$R0 = OP ...
... // No read/clobber of $R0 and $R1
$R1 = COPY $R0 // $R0 is killed
```
Replace $R0 with $R1 and remove the COPY, we have
```
$R1 = OP ...
```
This transformation can also expose more opportunities for existing
copy elimination in MCP.
Differential Revision: https://reviews.llvm.org/D67794
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles touches affected_files header
342380 95 3604 llvm/include/llvm/ADT/STLExtras.h
314730 234 1345 llvm/include/llvm/InitializePasses.h
307036 118 2602 llvm/include/llvm/ADT/APInt.h
213049 59 3611 llvm/include/llvm/Support/MathExtras.h
170422 47 3626 llvm/include/llvm/Support/Compiler.h
162225 45 3605 llvm/include/llvm/ADT/Optional.h
158319 63 2513 llvm/include/llvm/ADT/Triple.h
140322 39 3598 llvm/include/llvm/ADT/StringRef.h
137647 59 2333 llvm/include/llvm/Support/Error.h
131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
In MachineCopyPropagation, when propagating the source of a copy into
the operand of a later instruction, bail if a destination overlaps
(partly defines) the copy source. If the instruction where the
substitution is happening is also a copy, allowing the propagation
confuses the tracking mechanism.
Differential Revision: https://reviews.llvm.org/D69953
Change-Id: Ic570754f878f2d91a4a50a9bdcf96fbaa240726d
Summary:
After tailduplication, we have redundant copies. We can remove these
copies in machine-cp if it's safe to, i.e.
```
$reg0 = OP ...
... <<< No read or clobber of $reg0 and $reg1
$reg1 = COPY $reg0 <<< $reg0 is killed
...
<RET>
```
will be transformed to
```
$reg1 = OP ...
...
<RET>
```
Differential Revision: https://reviews.llvm.org/D65267
llvm-svn: 371359
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).
Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor
Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&
Depends on D65919
Reviewers: arsenm, bogner, craig.topper, RKSimon
Reviewed By: arsenm
Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65962
llvm-svn: 369041
MCP currently uses changeDebugValuesDefReg / collectDebugValues to find
debug users of a register, however those functions assume that all
DBG_VALUEs immediately follow the specified instruction, which isn't
necessarily true. This is going to become very often untrue when we turn
off CodeGenPrepare::placeDbgValues.
Instead of calling changeDebugValuesDefReg on an instruction to change its
debug users, in this patch we instead collect DBG_VALUEs of copies as we
iterate over insns, and update the debug users of copies that are made
dead. This isn't a non-functional change, because MCP will now update
DBG_VALUEs that aren't immediately after a copy, but refer to the same
register. I've hijacked the regression test for PR38773 to test for this
new behaviour, an entirely new test seemed overkill.
Differential Revision: https://reviews.llvm.org/D56265
llvm-svn: 368835
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Recommits r342942, which was reverted in r343189, with a fix for an
issue where we would propagate unsafely if we defined only the upper
part of a register.
Original message:
Change the copy tracker to keep a single map of register units
instead of 3 maps of registers. This gives a very significant
compile time performance improvement to the pass. I measured a
30-40% decrease in time spent in MCP on x86 and AArch64 and much
more significant improvements on out of tree targets with more
registers.
llvm-svn: 344942
When MachineCopyPropagation eliminates a dead 'copy', its associated debug information becomes invalid. as the recorded register has been removed. It causes the debugger to display wrong variable value.
Differential Revision: https://reviews.llvm.org/D52614
llvm-svn: 343445
It seems to have broken several targets, see comments on the llvm-commits thread.
> Change the copy tracker to keep a single map of register units instead
> of 3 maps of registers. This gives a very significant compile time
> performance improvement to the pass. I measured a 30-40% decrease in
> time spent in MCP on x86 and AArch64 and much more significant
> improvements on out of tree targets with more registers.
>
> Differential Revision: https://reviews.llvm.org/D52374
llvm-svn: 343189
Change the copy tracker to keep a single map of register units instead
of 3 maps of registers. This gives a very significant compile time
performance improvement to the pass. I measured a 30-40% decrease in
time spent in MCP on x86 and AArch64 and much more significant
improvements on out of tree targets with more registers.
Differential Revision: https://reviews.llvm.org/D52374
llvm-svn: 342942
Instead of updating the CopyTracker's maps each time we come across a
RegMask, defer checking for this kind of interference until we're
actually trying to propagate a copy. This avoids the need to
repeatedly iterate over maps in the cases where we don't end up doing
any work.
This is a slight compile time improvement for MachineCopyPropagation
as is, but it also enables a much bigger improvement that I'll follow
up with soon.
Differential Revision: https://reviews.llvm.org/D52370
llvm-svn: 342940
This is a bit easier to follow than handling the copy and src maps
directly in the pass, and will make upcoming changes to how this is
done easier to follow.
llvm-svn: 342703