llvm-project/lldb/test/API/functionalities/disassembler-variables/TestVariableAnnotationsDisassembler.py
Abdullah Mohammad Amin 8ec4db5e17
Stateful variable-location annotations in Disassembler::PrintInstructions() (follow-up to #147460) (#152887)
**Context**  
Follow-up to
[#147460](https://github.com/llvm/llvm-project/pull/147460), which added
the ability to surface register-resident variable locations.
This PR moves the annotation logic out of `Instruction::Dump()` and into
`Disassembler::PrintInstructions()`, and adds lightweight state tracking
so we only print changes at range starts and when variables go out of
scope.

---

## What this does

While iterating the instructions for a function, we maintain a “live
variable map” keyed by `lldb::user_id_t` (the `Variable`’s ID) to
remember each variable’s last emitted location string. For each
instruction:

- **New (or newly visible) variable** → print `name = <location>` once
at the start of its DWARF location range, cache it.
- **Location changed** (e.g., DWARF range switched to a different
register/const) → print the updated mapping.
- **Out of scope** (was tracked previously but not found for the current
PC) → print `name = <undef>` and drop it.

This produces **concise, stateful annotations** that highlight variable
lifetime transitions without spamming every line.

---

## Why in `PrintInstructions()`?

- Keeps `Instruction` stateless and avoids changing the
`Instruction::Dump()` virtual API.
- Makes it straightforward to diff state across instructions (`prev →
current`) inside the single driver loop.

---

## How it works (high-level)

1. For the current PC, get in-scope variables via
`StackFrame::GetInScopeVariableList(/*get_parent=*/true)`.
2. For each `Variable`, query
`DWARFExpressionList::GetExpressionEntryAtAddress(func_load_addr,
current_pc)` (added in #144238).
3. If the entry exists, call `DumpLocation(..., eDescriptionLevelBrief,
abi)` to get a short, ABI-aware location string (e.g., `DW_OP_reg3 RBX →
RBX`).
4. Compare against the last emitted location in the live map:
   - If not present → emit `name = <location>` and record it.
   - If different → emit updated mapping and record it.
5. After processing current in-scope variables, compute the set
difference vs. the previous map and emit `name = <undef>` for any that
disappeared.

Internally:
- We respect file↔load address translation already provided by
`DWARFExpressionList`.
- We reuse the ABI to map LLVM register numbers to arch register names.

---

## Example output (x86_64, simplified)

```
->  0x55c6f5f6a140 <+0>:  cmpl   $0x2, %edi                                                         ; argc = RDI, argv = RSI
    0x55c6f5f6a143 <+3>:  jl     0x55c6f5f6a176            ; <+54> at d_original_example.c:6:3
    0x55c6f5f6a145 <+5>:  pushq  %r15
    0x55c6f5f6a147 <+7>:  pushq  %r14
    0x55c6f5f6a149 <+9>:  pushq  %rbx
    0x55c6f5f6a14a <+10>: movq   %rsi, %rbx
    0x55c6f5f6a14d <+13>: movl   %edi, %r14d
    0x55c6f5f6a150 <+16>: movl   $0x1, %r15d                                                        ; argc = R14
    0x55c6f5f6a156 <+22>: nopw   %cs:(%rax,%rax)                                                    ; i = R15, argv = RBX
    0x55c6f5f6a160 <+32>: movq   (%rbx,%r15,8), %rdi
    0x55c6f5f6a164 <+36>: callq  0x55c6f5f6a030            ; symbol stub for: puts
    0x55c6f5f6a169 <+41>: incq   %r15
    0x55c6f5f6a16c <+44>: cmpq   %r15, %r14
    0x55c6f5f6a16f <+47>: jne    0x55c6f5f6a160            ; <+32> at d_original_example.c:5:10
    0x55c6f5f6a171 <+49>: popq   %rbx                                                               ; i = <undef>
    0x55c6f5f6a172 <+50>: popq   %r14                                                               ; argv = RSI
    0x55c6f5f6a174 <+52>: popq   %r15                                                               ; argc = RDI
    0x55c6f5f6a176 <+54>: xorl   %eax, %eax
    0x55c6f5f6a178 <+56>: retq  
```

Only transitions are shown: the start of a location, changes, and
end-of-lifetime.

---

## Scope & limitations (by design)

- Handles **simple locations** first (registers, const-in-register cases
surfaced by `DumpLocation`).
- **Memory/composite locations** are out of scope for this PR.
- Annotations appear **only at range boundaries** (start/change/end) to
minimize noise.
- Output is **target-independent**; register names come from the target
ABI.

## Implementation notes

- All annotation printing now happens in
`Disassembler::PrintInstructions()`.
- Uses `std::unordered_map<lldb::user_id_t, std::string>` as the live
map.
- No persistent state across calls; the map is rebuilt while walking
instruction by instruction.
- **No changes** to the `Instruction` interface.

---

## Requested feedback

- Placement and wording of the `<undef>` marker.
- Whether we should optionally gate this behind a setting (currently
always on when disassembling with an `ExecutionContext`).
- Preference for immediate inclusion of tests vs. follow-up patch.

---

Thanks for reviewing! Happy to adjust behavior/format based on feedback.

---------

Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
Co-authored-by: Adrian Prantl <adrian.prantl@gmail.com>
2025-08-28 10:41:38 -07:00

109 lines
4.6 KiB
Python
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

from lldbsuite.test.lldbtest import *
from lldbsuite.test.decorators import *
import lldb
import os
import re
class TestVariableAnnotationsDisassembler(TestBase):
def _build_obj(self, obj_name: str) -> str:
# Let the Makefile build all .os (pattern rule). Then grab the one we need.
self.build()
obj = self.getBuildArtifact(obj_name)
self.assertTrue(os.path.exists(obj), f"missing object: {obj}")
return obj
def _create_target(self, path):
target = self.dbg.CreateTarget(path)
self.assertTrue(target, f"failed to create target for {path}")
return target
def _disassemble_verbose_symbol(self, symname):
self.runCmd(f"disassemble -n {symname} -v", check=True)
return self.res.GetOutput()
def test_d_original_example_O1(self):
obj = self._build_obj("d_original_example.o")
target = self._create_target(obj)
out = self._disassemble_verbose_symbol("main")
print(out)
self.assertIn("argc = ", out)
self.assertIn("argv = ", out)
self.assertIn("i = ", out)
self.assertNotIn("<decoding error>", out)
@no_debug_info_test
def test_regs_int_params(self):
obj = self._build_obj("regs_int_params.o")
target = self._create_target(obj)
out = self._disassemble_verbose_symbol("regs_int_params")
print(out)
self.assertRegex(out, r"\ba\s*=\s*(DW_OP_reg5\b|RDI\b)")
self.assertRegex(out, r"\bb\s*=\s*(DW_OP_reg4\b|RSI\b)")
self.assertRegex(out, r"\bc\s*=\s*(DW_OP_reg1\b|RDX\b)")
self.assertRegex(out, r"\bd\s*=\s*(DW_OP_reg2\b|RCX\b)")
self.assertRegex(out, r"\be\s*=\s*(DW_OP_reg8\b|R8\b)")
self.assertRegex(out, r"\bf\s*=\s*(DW_OP_reg9\b|R9\b)")
self.assertNotIn("<decoding error>", out)
@no_debug_info_test
def test_regs_fp_params(self):
obj = self._build_obj("regs_fp_params.o")
target = self._create_target(obj)
out = self._disassemble_verbose_symbol("regs_fp_params")
print(out)
self.assertRegex(out, r"\ba\s*=\s*(DW_OP_reg17\b|XMM0\b)")
self.assertRegex(out, r"\bb\s*=\s*(DW_OP_reg18\b|XMM1\b)")
self.assertRegex(out, r"\bc\s*=\s*(DW_OP_reg19\b|XMM2\b)")
self.assertRegex(out, r"\bd\s*=\s*(DW_OP_reg20\b|XMM3\b)")
self.assertRegex(out, r"\be\s*=\s*(DW_OP_reg21\b|XMM4\b)")
self.assertRegex(out, r"\bf\s*=\s*(DW_OP_reg22\b|XMM5\b)")
self.assertNotIn("<decoding error>", out)
@no_debug_info_test
def test_regs_mixed_params(self):
obj = self._build_obj("regs_mixed_params.o")
target = self._create_target(obj)
out = self._disassemble_verbose_symbol("regs_mixed_params")
print(out)
self.assertRegex(out, r"\ba\s*=\s*(DW_OP_reg5\b|RDI\b)")
self.assertRegex(out, r"\bb\s*=\s*(DW_OP_reg4\b|RSI\b)")
self.assertRegex(out, r"\bx\s*=\s*(DW_OP_reg17\b|XMM0\b|DW_OP_reg\d+\b)")
self.assertRegex(out, r"\by\s*=\s*(DW_OP_reg18\b|XMM1\b|DW_OP_reg\d+\b)")
self.assertRegex(out, r"\bc\s*=\s*(DW_OP_reg1\b|RDX\b)")
self.assertRegex(out, r"\bz\s*=\s*(DW_OP_reg19\b|XMM2\b|DW_OP_reg\d+\b)")
self.assertNotIn("<decoding error>", out)
@no_debug_info_test
def test_live_across_call(self):
obj = self._build_obj("live_across_call.o")
target = self._create_target(obj)
out = self._disassemble_verbose_symbol("live_across_call")
print(out)
self.assertRegex(out, r"\bx\s*=\s*(DW_OP_reg5\b|RDI\b)")
self.assertIn("call", out)
self.assertRegex(out, r"\br\s*=\s*(DW_OP_reg0\b|RAX\b|DW_OP_reg\d+\b)")
self.assertNotIn("<decoding error>", out)
@no_debug_info_test
def test_loop_reg_rotate(self):
obj = self._build_obj("loop_reg_rotate.o")
target = self._create_target(obj)
out = self._disassemble_verbose_symbol("loop_reg_rotate")
print(out)
self.assertRegex(out, r"\bn\s*=\s*(DW_OP_reg\d+\b|R[A-Z0-9]+)")
self.assertRegex(out, r"\bseed\s*=\s*(DW_OP_reg\d+\b|R[A-Z0-9]+)")
self.assertRegex(out, r"\bk\s*=\s*(DW_OP_reg\d+\b|R[A-Z0-9]+)")
self.assertRegex(out, r"\bj\s*=\s*(DW_OP_reg\d+\b|R[A-Z0-9]+)")
self.assertRegex(out, r"\bi\s*=\s*(DW_OP_reg\d+\b|R[A-Z0-9]+)")
self.assertNotIn("<decoding error>", out)
@no_debug_info_test
def test_seed_reg_const_undef(self):
obj = self._build_obj("seed_reg_const_undef.o")
target = self._create_target(obj)
out = self._disassemble_verbose_symbol("main")
print(out)
self.assertRegex(out, r"\b(i|argc)\s*=\s*(DW_OP_reg\d+\b|R[A-Z0-9]+)")
self.assertNotIn("<decoding error>", out)