- The direct use case (in [1]) is to add `llvm::IntervalMap` [2] and the allocator required by IntervalMap ctor [3]
to class `InstrProfSymtab` as owned members. The allocator class doesn't have a move-assignment operator;
and it's going to take much effort to implement move-assignment operator for the allocator class such that the
enclosing class is movable.
- There is only one use of compiler-generated move-assignment operator in the repo, which is in
CoverageMappingReader.cpp. Luckily it's possible to use std::unique_ptr<InstrProfSymtab> instead, so did the change.
[1] https://github.com/llvm/llvm-project/pull/66825
[2] 4c2f68840e/llvm/include/llvm/ADT/IntervalMap.h (L936)
[3] 4c2f68840e/llvm/include/llvm/ADT/IntervalMap.h (L1041)
Also, Let `NumConditions` `uint16_t`.
It is smarter to handle the ID as signed.
Narrowing to `int16_t` will reduce costs of handling byvalue. (See also
#81221 and #81227)
External behavior doesn't change. They below handle values as internal
values plus 1.
* `-dump-coverage-mapping`
* `CoverageMappingReader.cpp`
* `CoverageMappingWriter.cpp`
Introduce `mcdc::DecisionParameters` and `mcdc::BranchParameters` and make
sure them not initialized as zero.
FIXME: Could we make `CoverageMappingRegion` as a smart tagged union?
They can be also used in `clang`.
Introduce the lightweight header instead of `CoverageMapping.h`.
This includes for now:
* `mcdc::ConditionID`
* `mcdc::Parameters`
## Motivation
Since we don't need the metadata sections at runtime, we can somehow
offload them from memory at runtime. Initially, I explored [debug info
correlation](https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113),
which is used for PGO with value profiling disabled. However, it
currently only works with DWARF and it's be hard to add such artificial
debug info for every function in to CodeView which is used on Windows.
So, offloading profile metadata sections at runtime seems to be a
platform independent option.
## Design
The idea is to use new section names for profile name and data sections
and mark them as metadata sections. Under this mode, the new sections
are non-SHF_ALLOC in ELF. So, they are not loaded into memory at runtime
and can be stripped away as a post-linking step. After the process
exits, the generated raw profiles will contains only headers + counters.
llvm-profdata can be used correlate raw profiles with the unstripped
binary to generate indexed profile.
## Data
For chromium base_unittests with code coverage on linux, the binary size
overhead due to instrumentation reduced from 64M to 38.8M (39.4%) and
the raw profile files size reduce from 128M to 68M (46.9%)
```
$ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped
FILE SIZE VM SIZE
-------------- --------------
+121% +30.4Mi +121% +30.4Mi .text
[NEW] +14.6Mi [NEW] +14.6Mi __llvm_prf_data
[NEW] +10.6Mi [NEW] +10.6Mi __llvm_prf_names
[NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts
+95% +1.75Mi +95% +1.75Mi .eh_frame
+108% +400Ki +108% +400Ki .eh_frame_hdr
+9.5% +211Ki +9.5% +211Ki .rela.dyn
+9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro
+5.0% +87.3Ki +5.0% +87.3Ki .rodata
[ = ] 0 +13% +47.0Ki .bss
+40% +1.78Ki +40% +1.78Ki .got
+12% +1.49Ki +12% +1.49Ki .gcc_except_table
[ = ] 0 +65% +1.23Ki .relro_padding
+62% +1.20Ki [ = ] 0 [Unmapped]
+13% +448 +19% +448 .init_array
+8.8% +192 [ = ] 0 [ELF Section Headers]
+0.0% +136 +0.0% +80 [7 Others]
+0.1% +96 +0.1% +96 .dynsym
+1.2% +96 +1.2% +96 .rela.plt
+1.5% +80 +1.2% +64 .plt
[ = ] 0 -99.2% -3.68Ki [LOAD #5 [RW]]
+195% +64.0Mi +194% +64.0Mi TOTAL
$ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped
FILE SIZE VM SIZE
-------------- --------------
+121% +30.4Mi +121% +30.4Mi .text
[NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts
+95% +1.75Mi +95% +1.75Mi .eh_frame
+108% +400Ki +108% +400Ki .eh_frame_hdr
+9.5% +211Ki +9.5% +211Ki .rela.dyn
+9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro
+5.0% +87.3Ki +5.0% +87.3Ki .rodata
[ = ] 0 +13% +47.0Ki .bss
+40% +1.78Ki +40% +1.78Ki .got
+12% +1.49Ki +12% +1.49Ki .gcc_except_table
+13% +448 +19% +448 .init_array
+0.1% +96 +0.1% +96 .dynsym
+1.2% +96 +1.2% +96 .rela.plt
+1.2% +64 +1.2% +64 .plt
+2.9% +64 [ = ] 0 [ELF Section Headers]
+0.0% +40 +0.0% +40 .data
+1.2% +32 +1.2% +32 .got.plt
+0.0% +24 +0.0% +8 [5 Others]
[ = ] 0 -22.9% -872 [LOAD #5 [RW]]
-74.5% -1.44Ki [ = ] 0 [Unmapped]
[ = ] 0 -76.5% -1.45Ki .relro_padding
+118% +38.8Mi +117% +38.8Mi TOTAL
```
A few things to note:
1. llvm-profdata doesn't support filter raw profiles by binary id yet,
so when a raw profile doesn't belongs to the binary being digested by
llvm-profdata, merging will fail. Once this is implemented,
llvm-profdata should be able to only merge raw profiles with the same
binary id as the binary and discard the rest (with mismatched/missing
binary id). The workflow I have in mind is to have scripts invoke
llvm-profdata to get all binary ids for all raw profiles, and
selectively choose the raw pnrofiles with matching binary id and the
binary to llvm-profdata for merging.
2. Note: In COFF, currently they are still loaded into memory but not
used. I didn't do it in this patch because I noticed that `.lcovmap` and
`.lcovfunc` are loaded into memory. A separate patch will address it.
3. This should works with PGO when value profiling is disabled as debug
info correlation currently doing, though I haven't tested this yet.
Part 1 of 3. This includes the LLVM back-end processing and profile
reading/writing components. compiler-rt changes are included.
Differential Revision: https://reviews.llvm.org/D138846
Now that llvm::support::endianness has been renamed to
llvm::endianness, we can use the shorter form. This patch replaces
support::endianness with llvm::endianness.
Now that llvm::support::endianness has been renamed to
llvm::endianness, we can use the shorter form. This patch replaces
support::endianness::{big,little,native} with
llvm::endianness::{big,little,native}.
This reverts commit 32db121b29f78e4c41116b2a8f1c730f9522b202 and subsequent commits.
This causes time regression on llvm-cov even with debug info correlation off.
This seems to cause Clang to crash, see comments on the code review. Reverting
until the problem can be investigated.
> Part 1 of 3. This includes the LLVM back-end processing and profile
> reading/writing components. compiler-rt changes are included.
>
> Differential Revision: https://reviews.llvm.org/D138846
This reverts commit a50486fd736ab2fe03fcacaf8b98876db77217a7.
Part 1 of 3. This includes the LLVM back-end processing and profile
reading/writing components. compiler-rt changes are included.
Differential Revision: https://reviews.llvm.org/D138846
Debug info correlation is an option in InstrProfiling pass, which is used by
both IR instrumentation and front-end instrumentation. So, Clang coverage can
also benefits the binary size saving from it.
Reviewed By: ellis
Differential Revision: https://reviews.llvm.org/D157913
The current llvm-cov error message for kind `coveragemap_error::malforme`, just gives the issue kind without any reason for what caused the issue. This patch is aimed at improving the llvm-cov error message to help identify what caused the issue.
Reviewed By: MaskRay
Close: https://github.com/llvm/llvm-project/pull/65264
`llvm-cov convert-for-testing` only functions properly when the input binary contains a single source file. When the binary has multiple source files, a `Malformed coverage data` error will occur when the generated .covmapping is read back. This is because the testing format lacks support for indicating the size of its file records, and current implementation just assumes there's only one record in it. This patch fixes this problem by introducing a new testing format version.
Changes to the code:
- Add a new format version. The version number is stored in the the last 8 bytes of the orignial magic number field to be backward-compatible.
- Output a LEB128 number before the file records section to indicate its size in the new version.
- Change the format parsing code correspondingly.
- Update the document to formalize the testing format.
- Additionally, fix the bug when converting COFF binaries.
Reviewed By: phosek, gulfem
Differential Revision: https://reviews.llvm.org/D156611
`llvm-cov convert-for-testing` is used to build the .covmapping files used in its regression tests. However the current implementation only works when there's only one source file in the mapping information data. If there are more than 1 source files, `llvm-cov convert-for-testing` can still produce a .covmapping file, but when read it back, `llvm-cov` will report:
```
error: Failed to load coverage: 'main.covmapping': Malformed coverage data
```
This is because the output .covmapping file doesn't have any mark to indicate the boundary between file records and function records, and current implementation jsut assume there's only one file record in the .covmapping file.
Changes to the code:
- Make `llvm-cov convert-for-testing` output a LEB128 number before file records to indicate its size.
- Change the testing format parsing code correspondingly.
- Update existing .covmapping files.
Reviewed By: gulfem
Differential Revision: https://reviews.llvm.org/D156611
This makes parsing for build IDs in the markup filter slightly more
permissive, in line with fromHex.
It also removes the distinction between missing build ID and empty build
ID; empty build IDs aren't a useful concept, since their purpose is to
uniquely identify a binary. This removes a layer of indirection wherever
build IDs are obtained.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D147485
Use deduction guides instead of helper functions.
The only non-automatic changes have been:
1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).
Per reviewers' comment, some useless makeArrayRef have been removed in the process.
This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.
Differential Revision: https://reviews.llvm.org/D140955
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
This improves consistency with other places (e.g. llvm::compression::decompress,
llvm::object::Decompressor::decompress, llvm-objcopy).
Note: when zstd::uncompress was added, we noticed that the API `ZSTD_decompress`
is fine while the zlib API `uncompress` is a misnomer.
It's more natural to use uint8_t * (std::byte needs C++17 and llvm has
too much uint8_t *) and most callers use uint8_t * instead of char *.
The functions are recently moved into `llvm::compression::zlib::`, so
downstream projects need to make adaption anyway.
* Refactor compression namespaces across the project, making way for a possible
introduction of alternatives to zlib compression.
Changes are as follows:
* Relocate the `llvm::zlib` namespace to `llvm::compression::zlib`.
Reviewed By: MaskRay, leonardchan, phosek
Differential Revision: https://reviews.llvm.org/D128953
We already remove dots from collected paths and path mappings. This
makes it difficult to match paths inside the profile which contain
dots. For example, we would never match /path/to/../file.c because
the collected path is always be normalized to /path/file.c. This
change enables dot removal for paths inside the profile to address
the issue.
Differential Revision: https://reviews.llvm.org/D123164
We already remove dots from collected paths and path mappings. This
makes it difficult to match paths inside the profile which contain
dots. For example, we would never match /path/to/../file.c because
the collected path is always be normalized to /path/file.c. This
change enables dot removal for paths inside the profile to address
the issue.
Differential Revision: https://reviews.llvm.org/D122750
Most notably,
llvm/Object/Binary.h no longer includes llvm/Support/MemoryBuffer.h
llvm/Object/MachOUniversal*.h no longer include llvm/Object/Archive.h
llvm/Object/TapiUniversal.h no longer includes llvm/Object/TapiFile.h
llvm-project preprocessed size:
before: 1068185081
after: 1068324320
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119457
If profile data is malformed for any kind of reason, we generate
an error that only reports "malformed instrumentation profile data"
without any further information. This patch extends InstrProfError
class to receive an optional error message argument, so that we can
do better error reporting.
Differential Revision: https://reviews.llvm.org/D108942
Function Records are required to be aligned on 8 bytes. This is enforced for each
records except the first, when one relies on the default alignment within an
std::string. There's no such guarantee, and indeed on 32 bits for some
implementation of std::string this is not enforced.
Provide a portable implementation based on llvm's MemoryBuffer.
Differential Revision: https://reviews.llvm.org/D104745
When making compilation relocatable, for example in distributed
compilation scenarios, we want to set compilation dir to a relative
value like `.` but this presents a problem when generating reports
because if the file path is relative as well, for example `..`, you
may end up writing files outside of the output directory.
This change introduces a flag that allows overriding the compilation
directory that's stored inside the profile with a different value that
is absolute.
Differential Revision: https://reviews.llvm.org/D100232
CommandLine.h is indirectly included in ~50% of TUs when building
clang, and VirtualFileSystem.h is large.
(Already remarked by jhenderson on D70769.)
No behavior change.
Differential Revision: https://reviews.llvm.org/D100957
We currently always store absolute filenames in coverage mapping. This
is problematic for several reasons. It poses a problem for distributed
compilation as source location might vary across machines. We are also
duplicating the path prefix potentially wasting space.
This change modifies how we store filenames in coverage mapping. Rather
than absolute paths, it stores the compilation directory and file paths
as given to the compiler, either relative or absolute. Later when
reading the coverage mapping information, we recombine relative paths
with the working directory. This approach is similar to handling
ofDW_AT_comp_dir in DWARF.
Finally, we also provide a new option, -fprofile-compilation-dir akin
to -fdebug-compilation-dir which can be used to manually override the
compilation directory which is useful in distributed compilation cases.
Differential Revision: https://reviews.llvm.org/D95753
We currently always store absolute filenames in coverage mapping. This
is problematic for several reasons. It poses a problem for distributed
compilation as source location might vary across machines. We are also
duplicating the path prefix potentially wasting space.
This change modifies how we store filenames in coverage mapping. Rather
than absolute paths, it stores the compilation directory and file paths
as given to the compiler, either relative or absolute. Later when
reading the coverage mapping information, we recombine relative paths
with the working directory. This approach is similar to handling
ofDW_AT_comp_dir in DWARF.
Finally, we also provide a new option, -fprofile-compilation-dir akin
to -fdebug-compilation-dir which can be used to manually override the
compilation directory which is useful in distributed compilation cases.
Differential Revision: https://reviews.llvm.org/D95753
This is an enhancement to LLVM Source-Based Code Coverage in clang to track how
many times individual branch-generating conditions are taken (evaluate to TRUE)
and not taken (evaluate to FALSE). Individual conditions may comprise larger
boolean expressions using boolean logical operators. This functionality is
very similar to what is supported by GCOV except that it is very closely
anchored to the ASTs.
Differential Revision: https://reviews.llvm.org/D84467
llvm-cov reports a poor error message when the -arch specifier is
missing or invalid, and a binary has multiple slices. Make the error
message more specific.
(This version of the patch avoids using llvm::none_of -- the way I used
the utility caused compile errors on many bots, possibly because the
wrong overload of `none_of` was selected.)
rdar://40312677
This reverts commit b81d4bfb44c14575130bb06c047728b69c3213aa.
It's causing some bots to fail to build due to: "error: no matching
function for call to ‘__iterator_category".