- Use computeMaxCallFrameSize() in PEI::calculateCallFrameInfo() instead of duplicating the code.
- Set AdjustsStack in FinalizeISel instead of in computeMaxCallFrameSize().
Imagine a loop of the form:
```
preheader:
%r = def
header:
bcc latch, inner
inner1:
..
inner2:
b latch
latch:
%r = subs %r
bcc header
```
It can be possible for code to spend a decent amount of time in the
header<->latch loop, not going into the inner part of the loop as much.
The greedy register allocator can prefer to spill _around_ %r though,
adding spills around the subs in the loop, which can be very detrimental
for performance. (The case I am looking at is actually a very deeply
nested set of loops that repeat the header<->latch pattern at multiple
different levels).
The greedy RA will apply a preference to spill to the IV, as it is live
through the header block. This patch attempts to add a heuristic to
prevent that in this case for variables that look like IVs, in a similar
regard to the extra spill weight that gets added to variables that look
like IVs, that are expensive to spill. That will mean spills are more
likely to be pushed into the inner blocks, where they are less likely to
be executed and not as expensive as spills around the IV.
This gives a 8% speedup in the exchange benchmark from spec2017 when
compiled with flang-new, whilst importantly stabilising the scores to be
less chaotic to other changes. Running ctmark showed no difference in
the compile time. I've tried to run a range of benchmarking for
performance, most of which were relatively flat not showing many large
differences. One matrix multiply case improved 21.3% due to removing a
cascading chains of spills, and some other knock-on effects happen which
usually cause small differences in the scores.
This is a preparatory patch for extending DWARFDebugLine to properly
parse line number programs with maximum_operations_per_instruction > 1
for VLIW targets.
Add some scaffolding for handling op-index in line number programs, and
add printouts for that in the table. As this affects a lot of tests,
this is done in a separate commit to get a cleaner review for the actual
op-index implementation.
Verbose printouts are not present in many tests, and adding op-index to
those will require a bit more code changes, so that is done in the
actual implementation patch.
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D152535
X86's CMOV conversion transforms CMOV instructions into control flow between
blocks, meaning the value is computed by a PHI rather than a "real" machine
instruction. In instruction-referencing mode, we need to transfer the
instruction label between the old CMOV and the new PHI instruction to mark
where the variable value is computed.
There's an extra complication in that memory operands can be unfolded from the
CMOV and sunk into the new blocks -- the test checks both scenarios where the
instruction number has to hop between instructions.
This omission exposed by Dexter testing.
Reviewed By: Orlando
Differential Revision: https://reviews.llvm.org/D145565
X86's CMOV conversion transforms CMOV instructions into control flow between
blocks, meaning the value is computed by a PHI rather than a "real" machine
instruction. In instruction-referencing mode, we need to transfer the
instruction label between the old CMOV and the new PHI instruction to mark
where the variable value is computed.
There's an extra complication in that memory operands can be unfolded from the
CMOV and sunk into the new blocks -- the test checks both scenarios where the
instruction number has to hop between instructions.
This omission exposed by Dexter testing.
Reviewed By: Orlando
Differential Revision: https://reviews.llvm.org/D145565
D63932 added a module flag to indicate that we are executing the regular
LTO post merge pipeline, so that GlobalDCE could perform more aggressive
optimization for Dead Virtual Function Elimination. This caused issues
trying to reuse bitcode that had already been through the LTO pipeline
(see context in D139816).
Instead support this by passing down a parameter flag to the
GlobalDCEPass constructor, which is the more usual way for indicating
this information.
Most test changes are to remove incidental uses of this flag. Of the 2
real uses, llvm/test/LTO/ARM/lto-linking-metadata.ll is now obsolete and
removed in this patch, and the virtual-functions-visibility-post-lto.ll
test is updated to use the regular LTO default pipeline where this
parameter is set to true.
Differential Revision: https://reviews.llvm.org/D153655
This caused compiler assertions, see comment on
https://reviews.llvm.org/D150107.
This also reverts the dependent follow-up change:
> [X86] Remove patterns for ADD/AND/OR/SUB/XOR/CMP with immediate 8 and optimize during MC lowering, NFCI
>
> This is follow-up of D150107.
>
> In addition, the function `X86::optimizeToFixedRegisterOrShortImmediateForm` can be
> shared with project bolt and eliminates the code in X86InstrRelaxTables.cpp.
>
> Differential Revision: https://reviews.llvm.org/D150949
This reverts commit 2ef8ae134828876ab3ebda4a81bb2df7b095d030 and
5586bc539acb26cb94e461438de01a5080513401.
This is follow-up of D150107.
In addition, the function `X86::optimizeToFixedRegisterOrShortImmediateForm` can be
shared with project bolt and eliminates the code in X86InstrRelaxTables.cpp.
Differential Revision: https://reviews.llvm.org/D150949
Some metadata prettyprinting, including variable prettyprinting and
debug line info comments, is currently only supported for `DBG_VALUE`.
This allows `DBG_INSTR_REF` can be printed in the same way.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D150620
This is a follow-up to b71edfaa4ec3c998aadb35255ce2f60bba2940b0
since I forgot the lit.local.cfg files in that one.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: barannikov88, kwk
Differential Revision: https://reviews.llvm.org/D150762
In rare situations involving AVX intrinsics, it seems LLVM can be coaxed
into generating copies to arguments that look like this:
$xmm0 = VMOVAPSrr $xmm1, implicit-def $ymm0
CALL64 @something ymm0
This particular form of copy implicitly zeros the upper lanes of ymm0,
hence there's an implicit-def for the register in the copy. The X86
implementation of describeLoadedValue doesn't attempt to describe this sort
of copy which causes the generic implementation in
TargetInstrInfo::describeLoadedValue to fire an assertion saying it
expected the target hook to handle it.
Play it safe in the generic implementation and return the "no location /
value" return value, rather than asserting.
Differential Revision: https://reviews.llvm.org/D148626
For example, if you have a chain of inlined funtions like this:
1 #include <stdlib.h>
2 int g1 = 4, g2 = 6;
3
4 static inline void bar(int q) {
5 if (q > 5)
6 abort();
7 }
8
9 static inline void foo(int q) {
10 bar(q);
11 }
12
13 int main() {
14 foo(g1);
15 foo(g2);
16 return 0;
17 }
with optimizations you could end up with a single abort call for the two
inlined instances of foo(). When merging the locations for those inlined
instances you would previously end up with a 0:0 location in main().
Leaving out that inlined chain from the location for the abort call
could make troubleshooting difficult in some cases.
This patch changes DILocation::getMergedLocation() to try to handle such
cases. The function is rewritten to first find a common starting point
for the two locations (same subprogram and inlined-at location), and
then in reverse traverses the inlined-at chain looking for matches in
each subprogram. For each subprogram, the merge function will find the
nearest common scope for the two locations, and matching line and
column (or set them to 0 if not matching).
In the example above, you will for the abort call get a location in
bar() at 6:5, inlined in foo() at 10:3, inlined in main() at 0:0 (since
the two inlined functions are on different lines, but in the same
scope).
I have not seen anything in the DWARF standard that would disallow
inlining a non-zero location at 0:0 in the inlined-at function, and both
LLDB and GDB seem to accept these locations (with D142552 needed for
LLDB to handle cases where the file, line and column number are all 0).
One incompatibility with GDB is that it seems to ignore 0-line locations
in some cases, but I am not aware of any specific issue that this patch
produces related to that.
With x86-64 LLDB (trunk) you previously got:
frame #0: 0x00007ffff7a44930 libc.so.6`abort
frame #1: 0x00005555555546ec a.out`main at merge.c:0
and will now get:
frame #0: 0x[...] libc.so.6`abort
frame #1: 0x[...] a.out`main [inlined] bar(q=<unavailable>) at merge.c:6:5
frame #2: 0x[...] a.out`main [inlined] foo(q=<unavailable>) at merge.c:10:3
frame #3: 0x[...] a.out`main at merge.c:0
and with x86-64 GDB (11.1) you will get:
(gdb) bt
#0 0x00007ffff7a44930 in abort () from /lib64/libc.so.6
#1 0x00005555555546ec in bar (q=<optimized out>) at merge.c:6
#2 foo (q=<optimized out>) at merge.c:10
#3 0x00005555555546ec in main ()
Reviewed By: aprantl, dblaikie
Differential Revision: https://reviews.llvm.org/D142556
The two 3c LEA cases:
lea D(base, index,1) -> lea D(,index,2)
lea D(r13/rbp, index) -> lea D(,r13/rbp,2) // D maybe zero
Current take 2 instructions to transform. We can do a bit better by
using LEA w.o a base if base == index and scale == 1.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D141980
With D68945, more DBG_VALUEs were created without the indirect operand,
instead relying on OP_deref to accomplish the same effect.
At the time, however, we were not able to handle arbitrary expressions
in combination with OP_LLVM_entry_value, so D71416 prevented the use of
such operation in the presence of expressions.
As per the comment in DIExpression::isValid, "we support only entry
values of a simple register location." As such, a simple deref operation
should be supported. In fact, D80345 added support for indirect
DBG_VALUEs.
Taken the patches above into consideration, this commit relaxes the
restrictions on which expressions are allowed for entry value
candidates: the expression must be either empty or a single dereference
operator.
This patch is useful for D141381, which adds support for storing the
address of ABI-indirect parameters on the stack.
Depends on D142160
Differential Revision: https://reviews.llvm.org/D142654
When finding call-site argument locations, don't consider registers to be
location candidates if they will be clobbered between the copy to/from them
and call site. Doing so would present overwritten register values as entry
values in called functions.
This patch adds a collection of register units defined as we walk back from
the call site, and prevents the acceptance of a call-site parameter
location if it will be clobbered on that path.
Fixes https://github.com/llvm/llvm-project/issues/57444
Differential Revision: https://reviews.llvm.org/D141279
Add a flag state (and a MIR key) to MachineFunctions indicating whether they
contain instruction referencing debug-info or not. Whether DBG_VALUEs or
DBG_INSTR_REFs are used needs to be determined by LiveDebugValues at least, and
using the current optimisation level as a proxy is proving unreliable.
Test updates are purely adding the flag to tests, in a couple of cases it
involves separating out VarLocBasedLDV/InstrRefBasedLDV tests into separate
files, as they can no longer share the same input.
Differential Revision: https://reviews.llvm.org/D141387
Following support from the previous patches in this stack being added for
variadic DBG_INSTR_REFs to exist, this patch modifies LiveDebugValues to
handle those instructions. Support already exists for DBG_VALUE_LISTs, which
covers most of the work needed to handle these instructions; this patch only
modifies the transferDebugInstrRef function to correctly track them.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D133927
Prior to this patch, variadic DIExpressions (i.e. ones that contain
DW_OP_LLVM_arg) could only be created by salvaging debug values to create
stack value expressions, resulting in a DBG_VALUE_LIST being created. As of
the previous patch in this patch stack, DBG_INSTR_REF's syntax has been
changed to match DBG_VALUE_LIST in preparation for supporting variadic
expressions. This patch adds some minor changes needed to allow variadic
expressions that aren't stack values to exist, and allows variadic expressions
that are trivially reduceable to non-variadic expressions to be handled
similarly to non-variadic expressions.
Reviewed by: jmorse
Differential Revision: https://reviews.llvm.org/D133926
This patch makes two notable changes to the MIR debug info representation,
which result in different MIR output but identical final DWARF output (NFC
w.r.t. the full compilation). The two changes are:
* The introduction of a new MachineOperand type, MO_DbgInstrRef, which
consists of two unsigned numbers that are used to index an instruction
and an output operand within that instruction, having a meaning
identical to first two operands of the current DBG_INSTR_REF
instruction. This operand is only used in DBG_INSTR_REF (see below).
* A change in syntax for the DBG_INSTR_REF instruction, shuffling the
operands to make it resemble DBG_VALUE_LIST instead of DBG_VALUE,
and replacing the first two operands with a single MO_DbgInstrRef-type
operand.
This patch is the first of a set that will allow DBG_INSTR_REF
instructions to refer to multiple machine locations in the same manner
as DBG_VALUE_LIST.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D129372
Currently the instruction referencing live debug values has 3 separate
places where we iterate over all known locations to find the best machine
location for a set of values at a given point in the program. Each of these
places has an implementation of this check, and one of them has slightly
different logic to the others. This patch moves the check for the "quality"
of a machine location into a separate function, which also avoids repeatedly
calling expensive functions, giving a slight performance improvement.
Differential Revision: https://reviews.llvm.org/D140412
Fix a test using invalid MLIR using different VRegs for the tied operands
of ADD64rr, which happened to trigger an assertion after my latest
changes.
Also attempting to adjust the MLRegalloc tests to the adjusted regalloc
(though I don't have a 100% working setup for them even without my
changes)
This stops reporting CostPerUse 1 for `R8`-`R15` and `XMM8`-`XMM31`.
This was previously done because instruction encoding require a REX
prefix when using them resulting in longer instruction encodings. I
found that this regresses the quality of the register allocation as the
costs impose an ordering on eviction candidates. I also feel that there
is a bit of an impedance mismatch as the actual costs occure when
encoding instructions using those registers, but the order of VReg
assignments is not primarily ordered by number of Defs+Uses.
I did extensive measurements with the llvm-test-suite wiht SPEC2006 +
SPEC2017 included, internal services showed similar patterns. Generally
there are a log of improvements but also a lot of regression. But on
average the allocation quality seems to improve at a small code size
regression.
Results for measuring static and dynamic instruction counts:
Dynamic Counts (scaled by execution frequency) / Optimization Remarks:
Spills+FoldedSpills -5.6%
Reloads+FoldedReloads -4.2%
Copies -0.1%
Static / LLVM Statistics:
regalloc.NumSpills mean -1.6%, geomean -2.8%
regalloc.NumReloads mean -1.7%, geomean -3.1%
size..text mean +0.4%, geomean +0.4%
Static / LLVM Statistics:
mean -2.2%, geomean -3.1%) regalloc.NumSpills
mean -2.6%, geomean -3.9%) regalloc.NumReloads
mean +0.6%, geomean +0.6%) size..text
Static / LLVM Statistics:
regalloc.NumSpills mean -3.0%
regalloc.NumReloads mean -3.3%
size..text mean +0.3%, geomean +0.3%
Differential Revision: https://reviews.llvm.org/D133902
This patch builds on prior support patches to enable support for
variadic debug values in InstrRefLDV, allowing DBG_VALUE_LISTs to
have their ranges extended.
Differential Revision: https://reviews.llvm.org/D128212
In the InstrRefBasedImpl for LiveDebugValues, we attempt to propagate
debug values through basic blocks in part by checking to see whether all
a variable's incoming debug values to a BB "agree", i.e. whether their
properties match and they refer to the same underlying value.
Prior to this patch, the check for agreement between incoming values
relied on exact equality, which meant that a VPHI and a Def DbgValue
that referred to the same underlying value would be seen as disagreeing.
This patch changes this behaviour to treat them as referring to the same
value, allowing the shared value to propagate into the BB.
Differential Revision: https://reviews.llvm.org/D125953
There are two different senses in which a block can be "address-taken".
There can be a BlockAddress involved, which means we need to map the
IR-level value to some specific block of machine code. Or there can be
constructs inside a function which involve using the address of a basic
block to implement certain kinds of control flow.
Mixing these together causes a problem: if target-specific passes are
marking random blocks "address-taken", if we have a BlockAddress, we
can't actually tell which MachineBasicBlock corresponds to the
BlockAddress.
So split this into two separate bits: one for BlockAddress, and one for
the machine-specific bits.
Discovered while trying to sort out related stuff on D102817.
Differential Revision: https://reviews.llvm.org/D124697
This is a follow-up patch to D130999. In the test, the MIR contains an
unreachable MBB but the code attempts to look it up in MLocs. This
patch fixes this issue by checking for the default-constructed value.
rdar://97226240
Differential Revision: https://reviews.llvm.org/D131453
The testcase was delta-reduced from an LTO build with sanitizer
coverage and the MIR tail duplication pass caused a machine basic
block to become unreachable in MIR. This caused the MBB to be invisible
to the reverse post-order traversal used to initialize the MBB <->
RPONumber lookup tables.
rdar://97226240
Differential Revision: https://reviews.llvm.org/D130999
This is a re-apply of D123599, which was reverted in 4fe2ab5279408, now
with a more appropriate assertion. Original commit message follow:
InstrRefBasedLDV can track and describe variable values that are spilt to
the stack -- however it does not current describe the size of the value on
the stack. This can cause uninitialized bytes to be read from the stack if
a small register is spilt for a larger variable, or theoretically on
big-endian machines if a large value on the stack is used for a small
variable.
Fix this by using DW_OP_deref_size to specify the amount of data to load
from the stack, if there's any possibility for ambiguity. There are a few
scenarios where this can be omitted (such as when using DW_OP_piece and a
non-DW_OP_stack_value location), see deref-spills-with-size.mir for an
explicit table of inputs flavours and output expressions.
Differential Revision: https://reviews.llvm.org/D123599
This reverts commit a15b66e76d1ecff625a4bbb4a46ff83a43138f49.
This causes linker to crash at assertion: `Assertion failed: !Expr->isComplex(), file C:\b\s\w\ir\cache\builder\src\third_party\llvm\llvm\lib\CodeGen\LiveDebugValues\InstrRefBasedImpl.cpp, line 907`.
InstrRefBasedLDV can track and describe variable values that are spilt to
the stack -- however it does not current describe the size of the value on
the stack. This can cause uninitialized bytes to be read from the stack if
a small register is spilt for a larger variable, or theoretically on
big-endian machines if a large value on the stack is used for a small
variable.
Fix this by using DW_OP_deref_size to specify the amount of data to load
from the stack, if there's any possibility for ambiguity. There are a few
scenarios where this can be omitted (such as when using DW_OP_piece and a
non-DW_OP_stack_value location), see deref-spills-with-size.mir for an
explicit table of inputs flavours and output expressions.
Differential Revision: https://reviews.llvm.org/D123599
This was reverted twice, in 987cd7c3ed75b and 13815e8cbf8d4. The latter
stemed from not accounting for rare register classes in a pre-allocated
array, and the former from an array not being completely initialized,
leading to asan complaining.
This was applied in fda4305e53784, reverted in 13815e8cbf8d49, the problem
was that fp80 X86 registers that were spilt to the stack aren't expected by
LiveDebugValues. It pre-allocates a position number for all register sizes
that can be spilt, and 80 bits isn't exactly common.
The solution is to scan the register classes to find any unrecognised
register sizes, adn pre-allocate those position numbers, avoiding a later
assertion.
This reverts commit fda4305e5378478051be225248bfe9c1d401d938.
Green dragon has spotted a problem -- it's understood, but might be fiddly
to fix, reverting in the meantime.
DBG_PHI instructions can refer to stack slots, to indicate that multiple
values merge together on control flow joins in that slot. This is fine --
however the slot might be merged at a later date with a slot of a different
size. In doing so, we lose information about the size the eliminated PHI.
Later analysis passes have to guess.
Improve this by attaching an optional "bit size" operand to DBG_PHI, which
only gets added for stack slots, to let us know how large a size the value
on the stack is.
Differential Revision: https://reviews.llvm.org/D124184