This is an ongoing series of commits that are reformatting our Python
code. Reformatting is done with `black` (23.1.0).
If you end up having problems merging this commit because you have made
changes to a python file, the best way to handle that is to run `git
checkout --ours <yourfile>` and then reformat it with black.
RFC: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Differential revision: https://reviews.llvm.org/D151460
These tests were being tested against a version of libipt from last
year. We just updated libipt to top of tree and many errors broke
because the new version of libipt emits more events than the older one,
which is fine.
`./bin/lldb-dotest -p TestTrace` passes
The per-PSB packet decoding logic was wrong because it was assuming that pt_insn_get_sync_offset was being udpated after every PSB. Silly me, that is not true. It returns the offset of the PSB packet after invoking pt_insn_sync_forward regardless of how many PSBs are visited later. Instead, I'm now following the approach described in https://github.com/intel/libipt/blob/master/doc/howto_libipt.md#parallel-decode for parallel decoding, which is basically what we need.
A nasty error that happened because of this is that when we had two PSBs (A and B), the following was happening
1. PSB A was processed all the way up to the end of the trace, which includes PSB B.
2. PSB B was then processed until the end of the trace.
The instructions emitted by step 2. were also emitted as part of step 1. so our trace had duplicated chunks. This problem becomes worse when you many PSBs.
As part of making sure this diff is correct, I added some other features that are very useful.
- Added a "synchronization point" event to the TraceCursor, so we can inspect when PSBs are emitted.
- Removed the single-thread decoder. Now the per-cpu decoder and single-thread decoder use the same code paths.
- Use the query decoder to fetch PSBs and timestamps. It turns out that the pt_insn_sync_forward of the instruction decoder can move past several PSBs (this means that we could skip some TSCs). On the other hand, the pt_query_sync_forward method doesn't skip PSBs, so we can get more accurate sync events and timing information.
- Turned LibiptDecoder into PSBBlockDecoder, which decodes single PSB blocks. It is the fundamental processing unit for decoding.
- Added many comments, asserts and improved error handling for clarity.
- Improved DecodeSystemWideTraceForThread so that a TSC is emitted always before a cpu change event. This was a bug that was annoying me before.
- SplitTraceInContinuousExecutions and FindLowestTSCInTrace are now using the query decoder, which can identify precisely each PSB along with their TSCs.
- Added an "only-events" option to the trace dumper to inspect only events.
I did extensive testing and I think we should have an in-house testing CI. The LLVM buildbots are not capable of supporting testing post-mortem traces of hundreds of megabytes. I'll leave that for later, but at least for now the current tests were able to catch most of the issues I encountered when doing this task.
A sample output of a program that I was single stepping is the following. You can see that only one PSB is emitted even though stepping happened!
```
thread #1: tid = 3578223
0: (event) trace synchronization point [offset = 0x0xef0]
a.out`main + 20 at main.cpp:29:20
1: 0x0000000000402479 leaq -0x1210(%rbp), %rax
2: (event) software disabled tracing
3: 0x0000000000402480 movq %rax, %rdi
4: (event) software disabled tracing
5: (event) software disabled tracing
6: 0x0000000000402483 callq 0x403bd4 ; std::vector<int, std::allocator<int>>::vector at stl_vector.h:391:7
7: (event) software disabled tracing
a.out`std::vector<int, std::allocator<int>>::vector() at stl_vector.h:391:7
8: 0x0000000000403bd4 pushq %rbp
9: (event) software disabled tracing
10: 0x0000000000403bd5 movq %rsp, %rbp
11: (event) software disabled tracing
```
This is another trace of a long program with a few PSBs.
```
(lldb) thread trace dump instructions -E -f thread #1: tid = 3603082
0: (event) trace synchronization point [offset = 0x0x80]
47417: (event) software disabled tracing
129231: (event) trace synchronization point [offset = 0x0x800]
146747: (event) software disabled tracing
246076: (event) software disabled tracing
259068: (event) trace synchronization point [offset = 0x0xf78]
259276: (event) software disabled tracing
259278: (event) software disabled tracing
no more data
```
Differential Revision: https://reviews.llvm.org/D131630
Thanks to rnofenko@fb.com for coming up with these changes.
This diff adds support for passing units in the trace size inputs. For example,
it's now possible to specify 64KB as the trace size, instead of the
problematic 65536. This makes the user experience a bit friendlier.
Differential Revision: https://reviews.llvm.org/D129613
We want to include events with metadata, like context switches, and this
requires the API to handle events with payloads (e.g. information about
such context switches). Besides this, we want to support multiple
similar events between two consecutive instructions, like multiple
context switches. However, the current implementation is not good for this because
we are defining events as bitmask enums associated with specific
instructions. Thus, we need to decouple instructions from events and
make events actual items in the trace, just like instructions and
errors.
- Add accessors in the TraceCursor to know if an item is an event or not
- Modify from the TraceDumper all the way to DecodedThread to support
- Renamed the paused event to disabled.
- Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs.
event items. Fortunately this simplified many things
- As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
Eliminate boilerplate of having each test manually assign to `mydir` by calling
`compute_mydir` in lldbtest.py.
Differential Revision: https://reviews.llvm.org/D128077
As discusses offline with @jj10305, we are updating some naming used throughout the code, specially in the json schema
- traceBuffer -> iptTrace
- core -> cpu
Differential Revision: https://reviews.llvm.org/D127817
This updates the documentation of the gdb-remote protocol, as well as the help messages, to include the new --per-core-tracing option.
Differential Revision: https://reviews.llvm.org/D124640
In order to support quick arbitrary access to instructions in the trace, we need
each instruction to have an id. It could be an index or any other value that the
trace plugin defines.
This will be useful for reverse debugging or for creating callstacks, as each
frame will need an instruction id associated with them.
I've updated the `thread trace dump instructions` command accordingly. It now
prints the instruction id instead of relative offset. I've also added a new --id
argument that allows starting the dump from an arbitrary position.
Differential Revision: https://reviews.llvm.org/D122254
There's a bug caused when a process is relaunched: the target, which
doesn't change, keeps the Trace object from the previous process, which
is already defunct, and causes segmentation faults when it's attempted
to be used.
A fix is to clean up the Trace object when the target is disposing of
the previous process during relaunches.
A way to reproduce this:
```
lldb a.out
b main
r
process trace start
c
r
process trace start
```
Differential Revision: https://reviews.llvm.org/D122176
D104422 added the interface for TraceCursor, which is the main way to traverse instructions in a trace. This diff implements the corresponding cursor class for Intel PT and deletes the now obsolete code.
Besides that, the logic for the "thread trace dump instructions" was adapted to use this cursor (pretty much I ended up moving code from Trace.cpp to TraceCursor.cpp). The command by default traverses the instructions backwards, and if the user passes --forwards, then it's not forwards. More information about that is in the Options.td file.
Regarding the Intel PT cursor. All Intel PT cursors for the same thread share the same DecodedThread instance. I'm not yet implementing lazy decoding because we don't need it. That'll be for later. For the time being, the entire thread trace is decoded when the first cursor for that thread is requested.
Differential Revision: https://reviews.llvm.org/D105531
This adds a basic SB API for creating and stopping traces.
Note: This doesn't add any APIs for inspecting individual instructions. That'd be a more complicated change and it might be better to enhande the dump functionality to output the data in binary format. I'll leave that for a later diff.
This also enhances the existing tests so that they test the same flow using both the command interface and the SB API.
I also did some cleanup of legacy code.
Differential Revision: https://reviews.llvm.org/D103500
When dumping the traced instructions in a for loop, like this one
4: for (int a = 0; a < n; a++)
5: do something;
there might be multiple LineEntry objects for line 4, but with different address ranges. This was causing the dump command to dump something like this:
```
a.out`main + 11 at main.cpp:4
[1] 0x0000000000400518 movl $0x0, -0x8(%rbp)
[2] 0x000000000040051f jmp 0x400529 ; <+28> at main.cpp:4
a.out`main + 28 at main.cpp:4
[3] 0x0000000000400529 cmpl $0x3, -0x8(%rbp)
[4] 0x000000000040052d jle 0x400521 ; <+20> at main.cpp:5
```
which is confusing, as main.cpp:4 appears twice consecutively.
This diff fixes that issue by making the line entry comparison strictly about the line, column and file name. Before it was also comparing the address ranges, which we don't need because our output is strictly about what the user sees in the source.
Besides, I've noticed that the logic that traverses instructions and calculates symbols and disassemblies had too much coupling, and made my changes harder to implement, so I decided to decouple it. Now there are two methods for iterating over the instruction of a trace. The existing one does it on raw load addresses, but the one provides a SymbolContext and an InstructionSP, and does the calculations efficiently (not as efficient as possible for now though), so the caller doesn't need to care about these details. I think I'll be using that iterator to reconstruct the call stacks.
I was able to fix a test with this change.
Differential Revision: https://reviews.llvm.org/D100740
This implements the interactive trace start and stop methods.
This diff ended up being much larger than I anticipated because, by doing it, I found that I had implemented in the beginning many things in a non optimal way. In any case, the code is much better now.
There's a lot of boilerplate code due to the gdb-remote protocol, but the main changes are:
- New tracing packets: jLLDBTraceStop, jLLDBTraceStart, jLLDBTraceGetBinaryData. The gdb-remote packet definitions are quite comprehensive.
- Implementation of the "process trace start|stop" and "thread trace start|stop" commands.
- Implementaiton of an API in Trace.h to interact with live traces.
- Created an IntelPTDecoder for live threads, that use the debugger's stop id as checkpoint for its internal cache.
- Added a functionality to stop the process in case "process tracing" is enabled and a new thread can't traced.
- Added tests
I have some ideas to unify the code paths for post mortem and live threads, but I'll do that in another diff.
Differential Revision: https://reviews.llvm.org/D91679
Depends on D90490.
The stop command is simple and invokes the new method Trace::StopTracingThread(thread).
On the other hand, the start command works by delegating its implementation to a CommandObject provided by the Trace plugin. This is necessary because each trace plugin needs different options for this command. There's even the chance that a Trace plugin can't support live tracing, but instead supports offline decoding and analysis, which means that "thread trace dump instructions" works but "thread trace start" doest. Because of this and a few other reasons, it's better to have each plugin provide this implementation.
Besides, I'm using the GetSupportedTraceType method introduced in D90490 to quickly infer what's the trace plug-in that works for the current process.
As an implementation note, I moved CommandObjectIterateOverThreads to its header so that I can use it from the IntelPT plugin. Besides, the actual start and stop logic for intel-pt is not part of this diff.
Reviewed By: clayborg
Differential Revision: https://reviews.llvm.org/D90729