To better ensure that bytecode `@update` implementations return a 0/1
value (see https://github.com/llvm/llvm-project/pull/181199), this
changes the Python -> formatter bytecode compiler to require that Python
`update` methods be declared to return `bool`.
A declaration like this will be a compiler error:
```py
def update(self):
# implementation...
```
When a scripted frame provider calls back into the thread's frame
machinery (e.g. via HandleCommand or EvaluateExpression), two problems
arise:
1. GetStackFrameList() re-enters the SyntheticStackFrameList
construction, causing infinite recursion.
2. ClearStackFrames() tries to read-lock the StackFrameList's
shared_mutex that is already write-locked by GetFramesUpTo,
causing a deadlock.
This patch fixes those issues by tracking when a provider is actively
fetching frames via a per-host-thread map (m_provider_frames_by_thread)
keyed by HostThread. The map is pushed/popped in
SyntheticStackFrameList::FetchFramesUpTo before calling into the
provider. GetStackFrameList() checks it to route re-entrant calls:
- The provider's own host thread gets the parent frame list, preventing
circular dependency when get_frame_at_index calls back into
GetFrameAtIndex.
- The private state thread also gets the parent frame list, preventing
deadlock when a provider calls EvaluateExpression (which needs the
private state thread to process events).
- Other host threads proceed normally and block on the frame list mutex
until the provider finishes, getting the correct synthetic result.
ClearStackFrames() returns early if any provider is active, since the
frame state is shared and tearing it down while a provider is
mid-construction is both unnecessary and unsafe.
rdar://171558394
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
The `num_temps` was introduced under the mistaken understanding of the
index values given to the `pick` op. I thought the `pick` index was from
the top of the stack, but it's from the bottom. The model of indexing
from the bottom of the stack has the benefit of simplifying the
compiler.
Replace "compile" with "assemble" in formatter_bytecode. This is in
preparation for the addition of a Python to formatter bytecode compiler.
It will be more clear to have one meaning for "compile".
Add the ability to generate a C source file, which is in addition to the
existing functionality of generating binary.
An example of the generated source:
```c
#ifdef __APPLE__
#define FORMATTER_SECTION "__DATA_CONST,__lldbformatters"
#else
#define FORMATTER_SECTION ".lldbformatters"
#endif
__attribute__((used, section(FORMATTER_SECTION)))
unsigned char _Account_synthetic[] =
// version
"\x01"
// remaining record size
"\x15"
// type name size
"\x07"
// type name
"Account"
// flags
"\x00"
// sig_get_num_children
"\x02"
// program size
"\x02"
// program
"\x20\x01"
// sig_get_child_at_index
"\x04"
// program size
"\x06"
// program
"\x02\x20\x00\x23\x11\x60"
;
```
Changes `formatter_bytecode.compile_file` to return a `BytecodeSection`
value. The `BytecodeSection` holds the data that needs to be emitted to
an `__lldbformatters` section.
The `BytecodeSection` currently provides `write_binary`, but will be
updated in a follow up commit to include `write_source` which will allow
the data to be emitted as C source code, or Swift source code. This will
make it easier to integrate into build systems, as it's easier to get
data into a binary via source code, than as a raw binary file.
Updates formatter_bytecode.py to support compilation and disassembly for
synthetic formatters, in other words support for multiple functions
(signatures).
This includes a number of other changes:
* String parsing and encoding have bugs fixed
* CLI args are updated, primarily to support an output file
* Added uleb encoding/decoding support
This work is a prelude the ongoing work of a Python to formatter
bytecode compiler. The python compiler to emit assembly, and this module
(formatter_bytecode) will compile it into binary bytecode.
This revert #181334 and its follow-up PRs (including #181488, #181492,
#181493, #181494 and #181498) as well as Ismail's documentation changes
(#181594, #181717). The original commit causes a test failure in CI
(https://github.com/llvm/llvm-project/issues/181938) but the more I look
at the patch, the more I'm convinced it was not ready to land. It will
be easier to iterate on the feedback by re-landing this than by using
post-commit review.
## Summary
Based on discussion from
[RFC](https://discourse.llvm.org/t/rfc-python-callback-for-source-file-resolution/83545),
this PR adds a new `SymbolLocatorScripted` plugin that allows Python
scripts to implement custom symbol and source file resolution logic.
This enables downstream users to build custom symbol servers, source
file remapping, and build artifact resolution entirely in Python.
### Changes
- Adds `LocateSourceFile()` to the SymbolLocator plugin interface,
called during source path resolution with a fully loaded `ModuleSP`, so
the plugin has access to the module's UUID, file paths, and symbols.
- Adds `SymbolLocatorScripted` plugin that delegates all four
SymbolLocator methods (`LocateExecutableObjectFile`,
`LocateExecutableSymbolFile`, `DownloadObjectAndSymbolFile`,
`LocateSourceFile`) to a user-provided Python class.
- Adds `ScriptedSymbolLocatorPythonInterface` to bridge C++ calls to
Python, with proper GIL management and error handling.
- Results for `LocateSourceFile` are cached per (module UUID, source
file) pair.
- The Python class is configured via: `settings set
plugin.symbol-locator.scripted.script-class module.ClassName`
### Python class interface
```python
class MyLocator:
def __init__(self, exe_ctx, args): ...
def locate_source_file(self, module, original_source_file):
...
def locate_executable_object_file(self, module_spec): ...
def locate_executable_symbol_file(self, module_spec,
default_search_paths): ...
def download_object_and_symbol_file(self, module_spec,
force_lookup, copy_executable): ...
```
### Test plan
```
Added TestScriptedSymbolLocator.py with 3 test cases:
- test_locate_source_file — verifies the locator resolves source
files, receives a valid SBModule with UUID, and remaps paths correctly
- test_locate_source_file_none_fallthrough — verifies returning
None falls through to default LLDB resolution, and that having no script
class set works normally
- test_invalid_script_class — verifies graceful handling of
invalid class names without crashing
```
Co-authored-by: Rahul Reddy Chamala <rachamal@fb.com>
`GetSyntheticValue` in synthetic providers which need to operate on raw
root values, but will often want to use the synthetic value of children,
or nested children.
* Provide a minimal, working example
* Document the instance variables
* Remove mention of `thread.SetScriptedFrameProvider` (which doesn't exist)
* add missing `@staticmethod` annotation
* fix rendering of bullet-pointed lists
When the crashing thread was submitted to a libdispatch queue, the
`Triggered by Thread:` string looks like:
```
Triggered by Thread: 0, Dispatch Queue: com.apple.main-thread
```
Trying to run the `crashlog` command on such crashlog file fails with:
```
(lldb) crashlog -i '/tmp/lldb.crash'
Traceback (most recent call last):
File "Git/llvm-worktrees/main/builds/release/lib/python3.13/site-packages/lldb/macosx/crashlog.py", line 1444, in __call__
SymbolicateCrashLogs(debugger, shlex.split(command), result, True)
~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "Git/llvm-worktrees/main/builds/release/lib/python3.13/site-packages/lldb/macosx/crashlog.py", line 1888, in SymbolicateCrashLogs
load_crashlog_in_scripted_process(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
debugger, crashlog_path, options, result
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "Git/llvm-worktrees/main/builds/release/lib/python3.13/site-packages/lldb/macosx/crashlog.py", line 1495, in load_crashlog_in_scripted_process
crashlog = CrashLogParser.create(debugger, crashlog_path, options).parse()
File "Git/llvm-worktrees/main/builds/release/lib/python3.13/site-packages/lldb/macosx/crashlog.py", line 1043, in parse
self.parsers[self.parse_mode](line)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
File "Git/llvm-worktrees/main/builds/release/lib/python3.13/site-packages/lldb/macosx/crashlog.py", line 1121, in parse_normal
self.crashlog.crashed_thread_idx = int(line[20:].strip().split()[0])
~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '0,'
```
This patch strips the possibly trailing comma in when parsing this
string.
Tested locally that this fixes the issue. LLDB test suggestions welcome
This patch adds `get_priority()` support to synthetic frame providers to
enable priority-based selection when multiple providers match a thread.
This is the first step toward supporting frame provider chaining for
visualizing coroutines, Swift async tasks, and et al.
Priority ordering follows Unix nice convention where lower numbers
indicate higher priority (0 = highest). Providers without explicit
priority return `std::nullopt`, which maps to UINT32_MAX (lowest
priority), ensuring backward compatibility with existing providers.
The implementation adds `GetPriority()` as a virtual method to
`SyntheticFrameProvider` base class, implements it through the scripting
interface hierarchy (`ScriptedFrameProviderInterface` and
`ScriptedFrameProviderPythonInterface`), and updates
`Thread::GetStackFrameList()` to sort applicable providers by priority
before attempting to load them.
Python frame providers can now specify priority:
```python
@staticmethod
def get_priority():
return 10 # Or return None for default priority.
```
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
This pr fixes#167388 .
## Description
This pr adds new method `GetArchName` to `SBTarget` so that no need to
parse triple to get arch name in client code.
## Testing
### All from `TestTargetAPI.py`
run test with
```
./build/bin/lldb-dotest -v -p TestTargetAPI.py
```
<details>
<summary>existing tests (without newly added)</summary>
<img width="1425" height="804" alt="image"
src="https://github.com/user-attachments/assets/617e4c69-5c6b-44c4-9aeb-b751a47e253c"
/>
</details>
<details>
<summary>existing tests (with newly added)</summary>
<img width="1422" height="778" alt="image"
src="https://github.com/user-attachments/assets/746990a1-df88-4348-a090-224963d3c640"
/>
</details>
### Only `test_get_arch_name`
run test with
```
./build/bin/lldb-dotest -v -p TestTargetAPI.py -f test_get_arch_name_dwarf -f test_get_arch_name_dwo -f test_get_arch_name_dsym lldb/test/API/python_api/target
```
<details>
<summary>only newly added</summary>
<img width="1422" height="778" alt="image"
src="https://github.com/user-attachments/assets/fcaafa5d-2622-4171-acee-e104ecee0652"
/>
</details>
---------
Signed-off-by: Nikita B <n2h9z4@gmail.com>
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
This patch extends ScriptedFrame to work with real (non-scripted)
threads,
enabling frame providers to synthesize frames for native processes.
Previously, ScriptedFrame only worked within
ScriptedProcess/ScriptedThread
contexts. This patch decouples ScriptedFrame from ScriptedThread,
allowing
users to augment or replace stack frames in real debugging sessions for
use
cases like custom calling conventions, reconstructing corrupted frames
from
core files, or adding diagnostic frames.
Key changes:
- ScriptedFrame::Create() now accepts ThreadSP instead of requiring
ScriptedThread, extracting architecture from the target triple rather
than ScriptedProcess.arch
- Added SBTarget::RegisterScriptedFrameProvider() and
ClearScriptedFrameProvider() APIs, with Target storing a
SyntheticFrameProviderDescriptor template for new threads
- Added "target frame-provider register/clear" commands for CLI access
- Thread class gains LoadScriptedFrameProvider(),
ClearScriptedFrameProvider(),
and GetFrameProvider() methods for per-thread frame provider management
- New SyntheticStackFrameList overrides FetchFramesUpTo() to lazily
provide
frames from either the frame provider or the real stack
This enables practical use of the SyntheticFrameProvider infrastructure
in
real debugging workflows.
rdar://161834688
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
This patch implements the base and python interface for the
ScriptedFrameProvider class.
This is necessary to call python APIs from the ScriptedFrameProvider
that will come in a follow-up.
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
The Tkinter module was renamed to tkinter in Python 3.0.
https://docs.python.org/2/library/tkinter.htmlhttps://docs.python.org/3/library/tkinter.html
Rest of it appears to work when imported inside of LLDB:
```
$ ./bin/lldb /tmp/test.o
(lldb) target create "/tmp/test.o"
Current executable set to '/tmp/test.o' (x86_64).
(lldb) b main
Breakpoint 1: where = test.o`main + 8 at test.c:1:18, address = 0x0000000000001131
(lldb) run
Process 121572 launched: '/tmp/test.o' (x86_64)
Process 121572 stopped
* thread #1, name = 'test.o', stop reason = breakpoint 1.1
frame #0: 0x0000555555555131 test.o`main at test.c:1:18
-> 1 int main() { int a = 1; char b = '?'; return 0; }
(lldb) command script import <...>/llvm-project/lldb/examples/python/lldbtk.py
(lldb) tk-
Available completions:
tk-process -- For more information run 'help tk-process'
tk-target -- For more information run 'help tk-target'
tk-variables -- For more information run 'help tk-variables'
(lldb) tk-process
(lldb) tk-target
(lldb) tk-variables
```
This patch introduces a new scripting affordance in lldb:
`ScriptedFrame`.
This allows user to produce mock stackframes in scripted threads and
scripted processes from a python script.
With this change, StackFrame can be synthetized from different sources:
- Either from a dictionary containing a load address, and a frame index,
which is the legacy way.
- Or by creating a ScriptedFrame python object.
One particularity of synthezising stackframes from the ScriptedFrame
python object, is that these frame have an optional PC, meaning that
they don't have a report a valid PC and they can act as shells that just
contain static information, like the frame function name, the list of
variables or registers, etc. It can also provide a symbol context.
rdar://157260006
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
In 88f409194, we changed the way the crashlog scripted process was
launched since the previous approach required to parse the file twice,
by stopping at entry, setting the crashlog object in the middle of the
scripted process launch and resuming it.
Since then, we've introduced SBScriptObject which allows to pass any
arbitrary python object accross the SBAPI boundary to another scripted
affordance.
This patch make sure of that to include the parse crashlog object into
the scripted process launch info dictionary, which eliviates the need to
stop at entry.
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
This changes the example command added in
https://github.com/llvm/llvm-project/pull/145793 so that the fdis
program does not have to be a single program name.
Doing so also means we can run the test on Windows where the program
needs to be "python.exe script_name".
I've changed "fdis set" to treat the rest of the command as the program.
Then store that as a list to be passed to subprocess. If we just use a
string, Python will think that "python.exe foo" is the name of an actual
program instead of a program and an argument to it.
This will still break if the paths have spaces in, but I'm trying to do
just enough to fix the test here without rewriting all the option
handling.
LLDB uses the LLVM disassembler to determine the size of instructions and
to do the actual disassembly. Currently, if the LLVM disassembler can't
disassemble an instruction, LLDB will ignore the instruction size, assume
the instruction size is the minimum size for that device, print no useful
opcode, and print nothing for the instruction.
This patch changes this behavior to separate the instruction size and
"can't disassemble". If the LLVM disassembler knows the size, but can't
dissasemble the instruction, LLDB will use that size. It will print out
the opcode, and will print "<unknown>" for the instruction. This is much
more useful to both a user and a script.
The impetus behind this change is to clean up RISC-V disassembly when
the LLVM disassembler doesn't understand all of the instructions.
RISC-V supports proprietary extensions, where the TD files don't know
about certain instructions, and the disassembler can't disassemble them.
Internal users want to be able to disassemble these instructions.
With llvm-objdump, the solution is to pipe the output of the disassembly
through a filter program. This patch modifies LLDB's disassembly to look
more like llvm-objdump's, and includes an example python script that adds
a command "fdis" that will disassemble, then pipe the output through a
specified filter program. This has been tested with crustfilt, a sample
filter located at https://github.com/quic/crustfilt .
Changes in this PR:
- Decouple "can't disassemble" with "instruction size".
DisassemblerLLVMC::MCDisasmInstance::GetMCInst now returns a bool for
valid disassembly, and has the size as an out paramter.
Use the size even if the disassembly is invalid.
Disassemble if disassemby is valid.
- Always print out the opcode when -b is specified.
Previously it wouldn't print out the opcode if it couldn't disassemble.
- Print out RISC-V opcodes the way llvm-objdump does.
Code for the new Opcode Type eType16_32Tuples by Jason Molenda.
- Print <unknown> for instructions that can't be disassembled, matching
llvm-objdump, instead of printing nothing.
- Update max riscv32 and riscv64 instruction size to 8.
- Add example "fdis" command script.
- Added disassembly byte test for x86 with known and unknown instructions.
- Added disassembly byte test for riscv32 with known and unknown instructions,
with and without filtering.
- Added test from Jason Molenda to RISC-V disassembly unit tests.
This patch addresses 2 issues:
1. It makes registers available on non-crashed threads all the time
2. It fixes arm64 registers parsing for registers that don't use the `x`
prefix (`fp` -> `x29` / `lr` -> `x30`)
---------
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
This patch adds support to the haswell sub-architecture (x86_64h) to
scripted processes.
rdar://147208252
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
This patch makes interactive mode as the default when using the crashlog
command. It replaces the existing `-i|--interactive` flag with a new
`-m|--mode` option, that can either be `interactive` or `batch`.
By default, when the option is not explicitely set by the user, the
interactive mode is selected, however, lldb will fallback to batch mode
if the command interpreter is not interactive or if stdout is not a tty.
This also adds some railguards to prevent users from using interactive
only options with the batch mode and updates the tests accordingly.
rdar://97801509
Differential Revision: https://reviews.llvm.org/D141658
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
lldb's history log file is written to at the end of a debugging session.
As a result, the log does not contain commands run during the current
session.
This extends the `fzf_history` to include the output of `session
history`.
Adds a `fzf_history` to the examples directory.
This python command invokes [fzf](https://github.com/junegunn/fzf) to
select from lldb's command history.
Tighter integration is available on macOS, via commands for copy and
paste. The user's chosen history entry back is pasted into the lldb
console (via AppleScript). By pasting it, users have the opportunity to
edit it before running it. This matches how fzf's history search works.
Without copy and paste, the user's chosen history entry is printed to
screen and then run automatically.
This fixes a typo when creating a target from the crashlog script and
that we were not able to find a valid architecture from the crash
report.
rdar://137344016
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
This PR adds a proof-of-concept for a bytecode designed to ship and
run LLDB data formatters. More motivation and context can be found in
the formatter-bytecode.rst file and on discourse.
https://discourse.llvm.org/t/a-bytecode-for-lldb-data-formatters/82696
Relanding with a fix for a case-sensitive path.
This PR adds a proof-of-concept for a bytecode designed to ship and
run LLDB data formatters. More motivation and context can be found in
the formatter-bytecode.rst file and on discourse.
https://discourse.llvm.org/t/a-bytecode-for-lldb-data-formatters/82696
Relanding with a fix for a case-sensitive path.
The newly added
test/API/functionalities/scripted_process_empty_memory_region/dummy_scripted_process.py
imports
examples/python/templates/scripted_process.py
which only has register definitions for x86_64 and arm64.
Only run this test on those two architectures for now.
If your arguments or option values are of a type that naturally uses one
of our common completion mechanisms, you will get completion for free.
But if you have your own custom values or if you want to do fancy things
like have `break set -s foo.dylib -n ba<TAB>` only complete on symbols
in foo.dylib, you can use this new mechanism to achieve that.