This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for this pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:
https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850
This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).
Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
RFC:
https://discourse.llvm.org/t/rfc-extend-machine-value-type-from-uint8-t-to-uint16-t/80274
compile-time-tracker:
https://llvm-compile-time-tracker.com/compare.php?from=4b9fab591916eec9fd1942f37afe3b137b564089&to=177d28247efe5a4d59a8d8150b4daf01e4f57d74&stat=wall-time
Currently 208 out of 256 MVTs are used, it will be run out soon, so
ultimately we need to extend the original `MVT::SimpleValueType` from
`uint8_t` to `uint16_t` to accomodate more types.
The `MatcherTable` uses `unsigned char` for encoding the matcher code,
so the extended MVTs are no longer fit into the table, thus we need to
use VBR to encode them as we do on others that are wider than 8 bits.
The statistics below shows the difference of "Total Array size" of the
matcher table that appears in every files:
```
Table Before After Change(%)
WebAssemblyGenDAGISel.inc 23576 23775 0.844
NVPTXGenDAGISel.inc 173498 173498 0
RISCVGenDAGISel.inc 2179121 2369929 8.756
AVRGenDAGISel.inc 2754 2754 0
PPCGenDAGISel.inc 163315 163617 0.185
MipsGenDAGISel.inc 47280 47447 0.353
SystemZGenDAGISel.inc 56243 56461 0.388
AArch64GenDAGISel.inc 467893 487830 4.261
MSP430GenDAGISel.inc 8069 8069 0
LoongArchGenDAGISel.inc 78928 79131 0.257
XCoreGenDAGISel.inc 3432 3432 0
BPFGenDAGISel.inc 3733 3733 0
VEGenDAGISel.inc 65174 66456 1.967
LanaiGenDAGISel.inc 2067 2067 0
X86GenDAGISel.inc 628787 636987 1.304
ARMGenDAGISel.inc 170968 171036 0.040
HexagonGenDAGISel.inc 155764 155764 0
SparcGenDAGISel.inc 5762 5798 0.625
AMDGPUGenDAGISel.inc 504356 504463 0.021
R600GenDAGISel.inc 29785 29785 0
```
The statistics below shows the runtime peak memory usage by compiling a
simple C program:
`/bin/time -v clang -target $TARGET -O3 -c test.c`
```
int test(int a) {
return a * 3;
}
```
```
Target Before(kbytes) After(kbytes) Change(%)
wasm64 110172 110088 -0.076
nvptx64 109784 109980 0.179
riscv64 114020 113656 -0.319
avr 110352 110068 -0.257
ppc64 112612 112476 -0.120
mips64 113588 113668 0.070
systemz 110860 110760 -0.090
aarch64 113704 113432 -0.239
msp430 110284 110200 -0.076
loongarch64 111052 110756 -0.267
xcore 108340 108020 -0.295
bpf 110620 110708 0.080
ve 110960 110920 -0.036
lanai 110180 109960 -0.200
x86_64 113640 113304 -0.296
arm64 113540 113172 -0.324
hexagon 114620 114684 0.056
sparc 110412 110136 -0.250
amdgcn 118164 117144 -0.863
r600 111200 110508 -0.622
```
This is only used by x86 and only used in the AsmPrinter module pass. I
think implementing this by looking at the underlying IR types instead
of the selected instructions is a pretty horrifying implementation,
but it's still available in the AsmPrinter.
This is https://reviews.llvm.org/D123933 resurrected.
I still don't know what the point of emitting _fltused is, but this
approach of looking at the IR types probably isn't the right way to
do this in the first place. If the intent is report any FP instructions,
this will miss any implicitly introduced ones during codegen. Also don't
know why just unconditionally emitting it isn't an option.
The last review mentioned the ARMs might want to emit this, but I'm
not going to go fix that. If someone wants to emit this on ARM, they
can move this to a common helper or analysis somewhere.
Since `raw_string_ostream` doesn't own the string buffer, it is
desirable (in terms of memory safety) for users to directly reference
the string buffer rather than use `raw_string_ostream::str()`.
Work towards TODO comment to remove `raw_string_ostream::str()`.
This duplicates the documentation from the declaration, but has
gotten out of sync. There was a mention of a DenseMap that has
been gone for 14 years. Since this declaration is a default
implementation of a virtual function, its unlikely to be looked at.
Just remove it.
UI++ in the loop might appear to indicate that the loop modifies the
container in some way (deletion or insertion), but the loop just
examines the container.
Previously this assumed that `LLVM_ENABLE_ABI_BREAKING_CHECKS` would
always be enabled in this case, if it's not `TTI` does not exist.
Introduced in 7652a59407018c057cdc1163c9f64b5b6f0954eb
- Fix build with `EXPENSIVE_CHECKS`
- Remove unused `PassName::ID` to resolve warning
- Mark `~SelectionDAGISel` virtual so AArch64 backend can work properly
This reverts commit de37c06f01772e02465ccc9f538894c76d89a7a1 to
de37c06f01772e02465ccc9f538894c76d89a7a1
It still breaks EXPENSIVE_CHECKS build. Sorry.
Port selection dag isel to new pass manager.
Only `AMDGPU` and `X86` support new pass version. `-verify-machineinstrs` in new pass manager belongs to verify instrumentation, it is enabled by default.
This is the major rename patch that prior patches have built towards.
The DPValue class is being renamed to DbgVariableRecord, which reflects
the updated terminology for the "final" implementation of the RemoveDI
feature. This is a pure string substitution + clang-format patch. The
only manual component of this patch was determining where to perform
these string substitutions: `DPValue` and `DPV` are almost exclusively
used for DbgRecords, *except* for:
- llvm/lib/target, where 'DP' is used to mean double-precision, and so
appears as part of .td files and in variable names. NB: There is a
single existing use of `DPValue` here that refers to debug info, which
I've manually updated.
- llvm/tools/gold, where 'LDPV' is used as a prefix for symbol
visibility enums.
Outside of these places, I've applied several basic string
substitutions, with the intent that they only affect DbgRecord-related
identifiers; I've checked them as I went through to verify this, with
reasonable confidence that there are no unintended changes that slipped
through the cracks. The substitutions applied are all case-sensitive,
and are applied in the order shown:
```
DPValue -> DbgVariableRecord
DPVal -> DbgVarRec
DPV -> DVR
```
Following the previous rename patches, it should be the case that there
are no instances of any of these strings that are meant to refer to the
general case of DbgRecords, or anything other than the DPValue class.
The idea behind this patch is therefore that pure string substitution is
correct in all cases as long as these assumptions hold.
This patch changes DPValue::filter to be a non-member method
filterDbgVars. There are two reasons for this: firstly, the name of
DPValue is about to change to DbgVariableRecord, which will result in
every `for` loop that uses DPValue::filter to require a line break. This
is a small thing, but it makes the rename patch more difficult to
review, and is just generally more awkward for what is a fairly common
loop. Secondly, the intent is to later break up the DPValue class into
subclasses, at which point it would be better to have a non-member
function that allows template arguments for the cases we want to filter
with greater specificity.
As part of the effort to rename the DbgRecord classes, this patch
renames the widely-used functions that operate on DbgRecords but refer
to DbgValues or DPValues in their names to refer to DbgRecords instead;
all such functions are defined in one of `BasicBlock.h`,
`Instruction.h`, and `DebugProgramInstruction.h`.
This patch explicitly does not change the names of any comments or
variables, except for where they use the exact name of one of the
renamed functions. The reason for this is reviewability; this patch can
be trivially examined to determine that the only changes are direct
string substitutions and any results from clang-format responding to the
changed line lengths. Future patches will cover renaming variables and
comments, and then renaming the classes themselves.
This restores commit c7fdd8c11e54585dc9d15d63de9742067e0506b9.
Previously reverted in f010b1bef4dda2c7082cbb41dbabf1f149cce306.
LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:
1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
control intrinsics.
2. Model token values as untyped virtual registers in MIR.
The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.
The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
This reverts commit c7fdd8c11e54585dc9d15d63de9742067e0506b9.
Reason: Broke the sanitizer buildbots. See the comments at
https://github.com/llvm/llvm-project/pull/71785
for more information.
Original commit 79889734b940356ab3381423c93ae06f22e772c9.
Perviously reverted in commit a2afcd5721869d1d03c8146bae3885b3385ba15e.
LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:
1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
control intrinsics.
2. Model token values as untyped virtual registers in MIR.
The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.
The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:
1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
control intrinsics.
2. Model token values as untyped virtual registers in MIR.
The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.
The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
Patch 1 of 3 to add llvm.dbg.label support to the RemoveDIs project. The
patch stack adds a new base class
-> 1. Add DbgRecord base class for DPValue and the not-yet-added
DPLabel class.
2. Add the DPLabel class.
3. Enable dbg.label conversion and add support to passes.
Patches 1 and 2 are NFC.
In the near future we also will rename DPValue to DbgVariableRecord and
DPLabel to DbgLabelRecord, at which point we'll overhaul the function
names too. The name DPLabel keeps things consistent for now.
We record the usage of each `Predicate` and sort them by usage.
For the top 8 `Predicate`s, we will emit a `PC_CheckPredicateN` to
save one byte.
Overall this reduces the llc binary size with all in-tree targets by
about 61K.
This is a recommit of 1a57927, which was reverted in bc98c31.
The CI failures occurred when doing expensive checks (with option
`LLVM_ENABLE_EXPENSIVE_CHECKS` being ON).
The key point here is that we need stable sorting result in the
test, but doing expensive checks uncovered the non-determinism of
`llvm::sort`. So `llvm::sort` is changed to `llvm::stable_sort`
in this revised patch.
And we use `llvm::MapVector` to keep insertion order.
We record the usage of each `Predicate` and sort them by usage.
For the top 8 `Predicate`s, we will emit a `PC_CheckPredicateN` to
save one byte.
Overall this reduces the llc binary size with all in-tree targets by
about 61K.
We record the usage of each `PatternPredicate` and sort them by
usage.
For the top 8 `PatternPredicate`s, we will emit a
`OPC_CheckPatternPredicateN` to save one byte.
The old `OPC_CheckPatternPredicate2` is renamed to
`OPC_CheckPatternPredicateTwoByte`.
Overall this reduces the llc binary size with all in-tree targets by
about 93K.
We record the usage of each `ComplexPat` and sort the `ComplexPat`s
by usage.
For the top 8 `ComplexPat`s, we will emit a `OPC_CheckComplexPatN`
to save one byte.
Overall this reduces the llc binary size with all in-tree targets by
about 89K.
This follows on from #76708, allowing
`cast<ConstantSDNode>(N)->getZExtValue()` to be replaced with just
`N->getAsZextVal();`
Introduced via `git grep -l "cast<ConstantSDNode>\(.*\).*getZExtValue" |
xargs sed -E -i
's/cast<ConstantSDNode>\((.*)\)->getZExtValue/\1->getAsZExtVal/'` and
then using `git clang-format` on the result.
The change is fairly mechanical:
1. Factor code from `FastISel::selectIntrinsicCall`, which converts
debug intrinsics into debug instructions, into functions (NFC).
2. Call those functions for DPValues attached to instructions too.
The test updates look the same as other RemoveDIs changes: re-run the
tests with `--try-experimental-debuginfo-iterators`, which checks the
output is identical using the new debug info format (if it has been
enabled in the cmake configuration).
Depends on #76941 (otherwise some modified tests spuriously fail).
This helper function shortens examples like
`cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();` to
`Node->getConstantOperandVal(1);`.
Implemented with:
`git grep -l
"cast<ConstantSDNode>\(.*->getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i
's/cast<ConstantSDNode>\((.*)->getOperand\((.*)\)\)->getZExtValue\(\)/\1->getConstantOperandVal(\2)/`
and `git grep -l
"cast<ConstantSDNode>\(.*\.getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i
's/cast<ConstantSDNode>\((.*)\.getOperand\((.*)\)\)->getZExtValue\(\)/\1.getConstantOperandVal(\2)/'`.
With a couple of simple manual fixes needed. Result then processed by
`git clang-format`.
The followed byte of `OPC_EmitRegister` is a MVT type, which is
usually i32 or i64.
We add `OPC_EmitRegisterI32` and `OPC_EmitRegisterI64` so that we
can reduce one byte.
Overall this reduces the llc binary size with all in-tree targets by
about 10K.
The CheckInteger routine called from TableGen-generated selection logic
uses getSExtValue - which will abort if the underlying APInt does not
fit into an int64_t.
This case is now triggered by the SystemZ back-end since i128 is a legal
type on certain machines. While we do not have any regular instructions
that take 128-bit immediates (like most other platforms), there are
patterns in the .td files that recognize an i128 "xor ..., -1" as a
"not".
These patterns cause code to be generated that calls the CheckInteger
routine on some i128-valued integer, which may trigger the assert.
Fix by using trySExtValue instead.
Fixes https://github.com/llvm/llvm-project/issues/75710
This is a boring mechanical update to support DPValues that look like
dbg.declares in SelectionDAG.
The tests will become "live" once #74090 lands (see for more info).
There are a lot of operations to move current node to parent and
then move to another child.
So `OPC_MoveSibling` and its space-optimized forms are added to do
this "move to sibling" operations.
These new operations will be generated when optimizing matcher in
`ContractNodes`. Currently `MoveParent+MoveChild` will be optimized
to `MoveSibling` and sequences `MoveParent+RecordChild+MoveChild`
will be transformed into `MoveSibling+RecordNode`.
Overall this reduces the llc binary size with all in-tree targets by
about 30K.
If there is only one bit set in EmitNodeInfo, then we can encode it
implicitly to save one byte.
Overall this reduces the llc binary size with all in-tree targets by
about 168K.
The most common type is i32 or i64 so we add `OPC_CheckChildTypeI32`
and `OPC_CheckChildTypeI64` to save one byte.
Overall this reduces the llc binary size with all in-tree targets by
about 70K.
These new opcodes implicitly indicate the RecNo.
The old `OPC_EmitCopyToReg2` is renamed to `OPC_EmitCopyToRegTwoByte`.
Overall this reduces the llc binary size with all in-tree targets by
about 33K (most are from RISCV target).