Big archive writer operation is not currently supported so mark these tests XFAIL for now.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D122949
Summary:
when run "llvm-ar cr" on AIX OS , it created a gnu archive, it is not desirable in aix OS.
instead of creating a gnu archive, the patch will print a unsupport message for llvm-ar big archive write operation in AIX OS.
after implement the big archive operation, I will revert the XFAIL: AIX " and "--format=gnu" test cases in the patch.
Reviewer : James Henderson, Jinsong Ji
Differential Revision: https://reviews.llvm.org/D122746
https://reviews.llvm.org/D116787
This reverts commit 33b3c86afab06ad61d46456c85c0b565cfff8287.
New change: fixed build failures:
- in stabs-sorted:restore the the ERR-KEY statements, which were accidentally deleted during refactoring
- in ObjDumper.h/MachODumper.cpp: refactor so that current dumpers which didn't provide an impl that accept a SymCom still works
Sometimes we would like to run post-processing repeatedly on the original sample profile for tuning. In order to avoid regenerating the original profile from scratch every time, this change adds the support of reading in the original profile (called symbolized profile) and running the post-processor on it.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D121655
Summary:
Address post-commit review comments in the https://reviews.llvm.org/D82549, including
changed file name from llvm/test/tools/llvm-readobj/XCOFF/xcoff-auxiliary-header.test --> llvm/test/tools/llvm-readobj/XCOFF/auxiliary-header.test
replaced macro define by using lambda function.
added a helper function to reduce the duplicated check and print error code.
Reviewer : James Henderson
Differential Revision: https://reviews.llvm.org/D116220
Fix#54456: `objcopy --only-keep-debug` produces a linked image with invalid
empty dynamic section. llvm-objdump -p currently reports an error which seems
excessive.
```
% llvm-readelf -l a.out
llvm-readelf: warning: 'a.out': no valid dynamic table was found
...
```
Follow the spirit of llvm-readelf -l (D64472) and report a warning instead.
This allows later files to be dumped despite warnings for an input file, and
improves objdump compatibility in that the exit code is now 0 instead of 1.
```
% llvm-objdump -p a.out # new behavior
...
Program Header:
llvm-objdump: warning: 'a.out': invalid empty dynamic section
% objdump -p a.out
...
Dynamic Section:
```
Reviewed By: jhenderson, raj.khem
Differential Revision: https://reviews.llvm.org/D122505
Re-commit of 32e8b550e5439c7e4aafa73894faffd5f25d0d05
This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.
The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:
```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
mov x29, sp
.cfi_def_cfa w29, 16
...
```
after the `stp` is executed the (initial) rule for the CFA still says
the CFA is in the `sp`, even though it's already offset by 16 bytes
A correct unwind info could look like:
```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov x29, sp
.cfi_def_cfa w29, 16
...
```
Having this information precise up to an instruction is useful for
sampling profilers that would like to get a stack backtrace. The end
goal (towards this patch is just a step) is to have fully working
`-fasynchronous-unwind-tables`.
Reviewed By: danielkiss, MaskRay
Differential Revision: https://reviews.llvm.org/D111411
Use `ProfileSummaryBuilder::DefaultCutoffs` for llvm-profdata detailed summary printing for Instr profile.
Differential Revision: https://reviews.llvm.org/D122210
Complete pseudo probes decoding can result in large memory usage. In practice only a small porting of the decoded probes are used in profile generation. I'm changing the full decoding mode to be decoding for profiled functions only, though we still do a full scan of the .pseudoprobe section due to a missing table-of-content but we don't have to build the in-memory data structure for functions not sampled.
To build the in-memory data structure for profiled functions only, I'm rewriting the previous non-recursive probe decoding logic to be recursive. This is easy to read and maintain.
I also have to change the previous representation of unsymbolized context from probe-based stack to address-based stack since the profiled functions are unknown yet by the time of virtual unwinding. The address-based stack will be converted to probe-based stack after virtual unwinding and on-demand probe decoding.
I'm seeing 20GB memory is saved for one of our internal large service.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D121643
NVPTX does not support generating binary files, which is required for
these tests.
The majority of tests in 'DebugInfo/Generic' also require emitting
object files, so they all are disabled for NVPTX.
Differential Revision: https://reviews.llvm.org/D121996
Update the FileCheck patterns in a test case to prevent a path name
containing the `@` character from causing it to fail unnecessarily,
e.g. during a Jenkins CI job.
This reverts commit 0d362c90d335509c57c0fbd01ae1829e2b9c3765.
Reason: Causes the MSan buildbot to fail (see comments on
https://reviews.llvm.org/D121179 for more information
To ease profile annotation, each of the callsites in a function can be
annotated with profile data - "IR metadata format for MemProf" [1]. This
patch extends the on-disk serialized record format to store the debug
information for allocation callsites incl inline frames. This change is
incompatible with the existing format i.e. indexed profiles must be
regenerated, raw profiles are unaffected.
[1] https://groups.google.com/g/llvm-dev/c/aWHsdMxKAfE/m/WtEmRqyhAgAJ
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D121179
The strippable Swift reflection sections contain subtractor relocations
that need to be applied. There are two situations we need to support.
1) Both symbols used in the relocation come from the .o file (for
example, one symbol lives in __swift5_fieldmd and the second in
__swift5_reflstr).
2) One symbol comes from th .o file and the second from the main
binary (for example, __swift5_fieldmd and __swift5_typeref).
Differential Revision: https://reviews.llvm.org/D120574
We add to ensure that we are observing the correct callstack order in memprof
during symbolization. There was some confusion whether the order of
DIFrame objects were reversed but in reality the leaf function is at
index 0 so no code changes are required.
Differential Revision: https://reviews.llvm.org/D121759
STB_GNU_UNIQUE is like STB_GLOBAL with extra semantics:
* gold and ld.lld: changed to STB_GLOBAL if --no-gnu-unique is specified
* glibc: unique even with dlopen `RTLD_LOCAL`, implies DF_1_NODELETE
Therefore, I think it makes sense for --weaken-symbol/--weaken-symbols/--weaken
to change STB_GNU_UNIQUE symbols.
binutils 2.39 will have the same behavior: https://sourceware.org/bugzilla/show_bug.cgi?id=28926
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D120638
Template functions share the same lines in source files, so the common
container of lines' properties cannot be used to calculate the coverage
statistics of individual functions.
> cat tmpl.cpp
template <int N> int test() { return N; }
int main() { return test<1>() + test<2>(); }
> clang++ --coverage tmpl.cpp -o tmpl
> ./tmpl
> llvm-cov gcov tmpl.cpp -f
...
Function '_Z4testILi1EEiv'
Lines executed:100.00% of 1
Function '_Z4testILi2EEiv'
Lines executed:-nan% of 0
...
> llvm-cov-patched gcov tmpl.cpp -f
...
Function '_Z4testILi1EEiv'
Lines executed:100.00% of 1
Function '_Z4testILi2EEiv'
Lines executed:100.00% of 1
...
Differential Revision: https://reviews.llvm.org/D121390
Early adoption of new technologies or adjusting certain code generation/IR optimization thresholds
is often available through some cl::opt options (which have unstable surfaces).
Specifying such an option twice will lead to an error.
```
% clang -c a.c -mllvm -disable-binop-extract-shuffle -mllvm -disable-binop-extract-shuffle
clang (LLVM option parsing): for the --disable-binop-extract-shuffle option: may only occur zero or one times!
% clang -c a.c -mllvm -hwasan-instrument-reads=0 -mllvm -hwasan-instrument-reads=0
clang (LLVM option parsing): for the --hwasan-instrument-reads option: may only occur zero or one times!
% clang -c a.c -mllvm --scalar-evolution-max-arith-depth=32 -mllvm --scalar-evolution-max-arith-depth=16
clang (LLVM option parsing): for the --scalar-evolution-max-arith-depth option: may only occur zero or one times!
```
The option is specified twice, because there is sometimes a global setting and
a specific file or project may need to override (or duplicately specify) the
value.
The error is contrary to the common practice of getopt/getopt_long command line
utilities that let the last option win and the `getLastArg` behavior used by
Clang driver options. I have seen such errors for several times. I think the
error just makes users inconvenient, while providing very little value on
discouraging production usage of unstable surfaces (this goal is itself
controversial, because developers might not want to commit to a stable surface
too early, or there is just some subtle codegen toggle which is infeasible to
have a driver option). Therefore, I suggest we drop the diagnostic, at least
before the diagnostic gets sufficiently better support for the overridding needs.
Removing the error is a degraded error checking experience. I think this error
checking behavior, if desirable, should be enabled explicitly by tools. Users
preferring the behavior can figure out a way to do so.
Reviewed By: jhenderson, rnk
Differential Revision: https://reviews.llvm.org/D120455
Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve its value when we have calls so that it gets saved and restored
around the calls.
But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame and the above approach makes it difficult to track the
return address when the CFI information is emitted during the frame lowering,
due to the involvment of understanding the control flow.
This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.
And doing the above poses an issue that now the return instruction uses undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction through the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the
`S_SETPC_B64_return` during the `expandPostRAPseudo()`.
As an added benefit, this patch simplifies overall return instruction handling.
Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.
Reviewed By: arsenm, ronlieb
Differential Revision: https://reviews.llvm.org/D114652
CS nested profile has a benefit over the CS flat profile that is to speed up the build while achieve an on-par performance. I'm turning it on by default for CSSPGO.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D121142
This adds JSON output to llvm-remark-size-diff.
The goal here is to make it easy for external tools to consume output from
llvm-remark-size-diff. These tools could be used for automated size analysis.
(E.g. in CI).
To specify JSON output, use `--report_style=json`. JSON output can be
pretty-printed via `--pretty`.
With automation in mind, the schema looks like this:
```
"Files": {
"A": <filename_a>
"B": <filename_b>
},
"InBoth": [
{
"FunctionName": <function name>,
"InstCount": [
<count_in_a>,
<count_in_b>
],
"StackSize": [
<count_in_a>,
<count_in_b>
]
},
...
]
"OnlyInA": [
{
"FunctionName": <function name>,
"InstCount": [
<count_in_a>,
0
],
"StackSize": [
<count_in_a>,
0
]
},
...
]
"OnlyInB": [
{
"FunctionName": <function name>,
"InstCount": [
0,
<count_in_b>
],
"StackSize": [
0,
<count_in_b>
]
},
...
]
```
A few notes:
- Filenames are included, because tools may want to combine many outputs
together in some way (a big JSON file, a big CSV, or something.)
- Counts are represented as [a, b] so that a diff can be calculated via b - a.
The original counts may be useful for size analysis (e.g. was this function
extremely large before?) and so both are preserved.
- `OnlyInA` and `OnlyInB` have a 0 for one of the counts always. This is to
make it easier for tools to share code between `OnlyInA`, `OnlyInB`, and
`InBoth`.
Differential Revision: https://reviews.llvm.org/D121173
This ELF note is aarch64 and Android-specific. It specifies to the
dynamic loader that specific work should be scheduled to enable MTE
protection of stack and heap regions.
Current synthesis of the ".note.android.memtag" ELF note is done in the
Android build system. We'd like to move that to the compiler, and this
is the first step.
Reviewed By: MaskRay, jhenderson
Differential Revision: https://reviews.llvm.org/D119381
Previously we used sra+add+xor if ADDCARRY is supported. This changes
to sra+xor+sub is SUBCARRY is available.
This is consistent with the recent change to the default expansion
in LegalizeDAG.
Differential Revision: https://reviews.llvm.org/D121039
We can't just split by space, that's not going to give us the same
argv we'd have gotten from the shell, it could be in a string,
we must actually parse that as argv.
When running llvm-bitcode-strip we want to remove the __LLVM
segment as well as the __bundle section when there are no other
sections in the segment.
Differential Revision: https://reviews.llvm.org/D120927
This diff adds functionality to the llvm-bitcode-strip tool for
stripping of LLVM bitcode sections.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D120669
This patch filters out callstack frames which can't be symbolized or if
the frames belong to the runtime. Symbolization may not be possible if
debug information is unavailable or if the addresses are from a shared
library. For now we only support optimization of the main binary which
is statically linked to the compiler runtime.
Differential Revision: https://reviews.llvm.org/D120860
It caused builds to assert with:
(StackSize == 0 && "We already have the CFA offset!"),
function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624.
when targeting iOS. See comment on the code review for reproducer.
> This patch rearranges emission of CFI instructions, so the resulting
> DWARF and `.eh_frame` information is precise at every instruction.
>
> The current state is that the unwind info is emitted only after the
> function prologue. This is fine for synchronous (e.g. C++) exceptions,
> but the information is generally incorrect when the program counter is
> at an instruction in the prologue or the epilogue, for example:
>
> ```
> stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
> mov x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> after the `stp` is executed the (initial) rule for the CFA still says
> the CFA is in the `sp`, even though it's already offset by 16 bytes
>
> A correct unwind info could look like:
> ```
> stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
> .cfi_def_cfa_offset 16
> mov x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> Having this information precise up to an instruction is useful for
> sampling profilers that would like to get a stack backtrace. The end
> goal (towards this patch is just a step) is to have fully working
> `-fasynchronous-unwind-tables`.
>
> Reviewed By: danielkiss, MaskRay
>
> Differential Revision: https://reviews.llvm.org/D111411
This reverts commit 32e8b550e5439c7e4aafa73894faffd5f25d0d05.
Remove the executable name from the test match as this will have
a `.exe` suffix on windows.
Reviewed By: drodriguez
Differential Revision: https://reviews.llvm.org/D121000
Add the -o flag to specify an output path for llvm-bitcode-strip.
This matches the interface to the Xcode bitcode_strip tool.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D120731
body_start was never used, resulting in the first filtered line to be
skipped.
Fixes the --filter option introduced in D117694.
Differential Revision: https://reviews.llvm.org/D119704
Based off Agner and AMD SoG tables, the XOP VPHADD/VPHSUB unary horizontal ops are as fast as basic arithmetic ops, not the slower SSSE3 binary horizontal add/sub ops. This also matches what the bdver2 model already lists.
Noticed while investigating reduction add optimizations.
Starting from Windows SDK for Windows 11 (10.0.22000.x), all the system
libraries (.lib files) contain a section with the '/<XFGHASHMAP>/' name.
This looks like the libraries are built with control flow guard enabled:
https://docs.microsoft.com/en-us/cpp/build/reference/guard-enable-control-flow-guard?view=msvc-170
To let the LLVM tools (llvm-ar, llvm-lib) work with these libraries,
this patch just skips the section offset check for sections with the
'/<XFGHASHMAP>/' name.
Closes: llvm/llvm-project#53814
Signed-off-by: Pavel Samolysov <pavel.samolysov@intel.com>
Reviewed By: jhenderson, thieta
Differential Revision: https://reviews.llvm.org/D120645