llvm-project/lldb/test/API/functionalities/thread/concurrent_events/TestConcurrentBatchedBreakpointStepOver.py
Bar Soloveychik b3c4d44c44
[lldb] Batch breakpoint step-over for threads stopped at the same BP (#183412)
When multiple threads are stopped at the same breakpoint, LLDB currently
steps each thread over the breakpoint one at a time. Each step requires
disabling the breakpoint, single-stepping one thread, and re-enabling
it, resulting in N disable/enable cycles and N individual vCont packets
for N threads. This is a common scenario for hot breakpoints in
multithreaded programs and scales poorly.

This patch batches the step-over so that all threads at the same
breakpoint site are stepped together in a single vCont packet, with the
breakpoint disabled once at the start and re-enabled once after the last
thread finishes.

At the top of WillResume, any leftover StepOverBreakpoint plans from a
previous cycle are popped with their re-enable side effect suppressed
via SetReenabledBreakpointSite, giving a clean slate.
SetupToStepOverBreakpointIfNeeded then creates fresh plans for all
threads that still need to step over a breakpoint, and these are grouped
by breakpoint address.
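
The grouping step can be pictured with a small Python model. This is only a sketch: the real logic lives in LLDB's C++ code, and the function name `group_threads_by_breakpoint`, the thread IDs, and the addresses below are all made up for illustration.

```python
from collections import defaultdict


def group_threads_by_breakpoint(stopped_at):
    """Group thread IDs by the breakpoint address they must step over.

    stopped_at maps a thread ID to a breakpoint address. It is a
    hypothetical stand-in for the threads that received a fresh
    step-over plan in this resume cycle.
    """
    groups = defaultdict(list)
    for tid, addr in stopped_at.items():
        groups[addr].append(tid)
    return dict(groups)


# Illustrative values only: two threads share one site, a third is elsewhere.
groups = group_threads_by_breakpoint(
    {0x1A01: 0x401136, 0x1A02: 0x401136, 0x1A03: 0x401200}
)
for addr, tids in groups.items():
    print(hex(addr), [hex(t) for t in tids])
```

Only groups with more than one thread benefit from batching; a single-thread group behaves like the old one-at-a-time path.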

For groups with multiple threads, each plan is set to defer its
re-enable through SetDeferReenableBreakpointSite. Instead of re-enabling
the breakpoint directly when a plan completes, it calls
ThreadFinishedSteppingOverBreakpoint, which decrements a per-address
tracking count. The breakpoint is only re-enabled when the count reaches
zero.
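
The deferred re-enable behaves like a reference count per breakpoint address. The class below is a minimal Python model of that idea, not LLDB's implementation; `ReenableTracker` and its method names are hypothetical.

```python
class ReenableTracker:
    """Hypothetical model of the per-address tracking count."""

    def __init__(self):
        self.pending = {}    # breakpoint address -> threads still stepping
        self.reenabled = []  # addresses re-enabled so far, in order

    def register_thread(self, addr):
        """Record one more thread stepping over the site at addr."""
        self.pending[addr] = self.pending.get(addr, 0) + 1

    def thread_finished(self, addr):
        """Decrement the count; re-enable the site only when it hits zero."""
        self.pending[addr] -= 1
        if self.pending[addr] == 0:
            del self.pending[addr]
            self.reenabled.append(addr)
            return True
        return False


tracker = ReenableTracker()
for _ in range(3):
    tracker.register_thread(0x401136)

# Only the last thread to finish triggers the re-enable.
results = [tracker.thread_finished(0x401136) for _ in range(3)]
print(results)  # [False, False, True]
```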

All threads in the largest group are resumed together in a single
batched vCont packet. If some threads don't complete their step in one
cycle, the pop-and-recreate logic naturally re-batches the remaining
threads on the next WillResume call.
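
As a rough illustration of picking the largest group and stepping it in one packet, consider the Python sketch below. The helper names are hypothetical, the tie-breaking rule (first group wins) is an assumption, and the packet is simplified: a real vCont packet may use a different thread-ID syntax and carry actions for other threads as well.

```python
def pick_batch(groups):
    """Return (address, thread IDs) of the largest group.

    On a tie, max() keeps the first group it sees; that choice is an
    assumption of this sketch, not something the patch specifies.
    """
    return max(groups.items(), key=lambda item: len(item[1]))


def vcont_step_packet(tids):
    """Build a simplified vCont payload with one step action per thread."""
    return "vCont" + "".join(";s:{:x}".format(tid) for tid in tids)


# Illustrative values only.
groups = {0x401136: [0x1A01, 0x1A02, 0x1A03], 0x401200: [0x1A04]}
addr, tids = pick_batch(groups)
packet = vcont_step_packet(tids)
print(packet)  # vCont;s:1a01;s:1a02;s:1a03
```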

For 10 threads at the same breakpoint, this reduces the operation from
10 z0/Z0 pairs and 10 vCont packets to 1 z0 + 1 Z0 and a few
progressively smaller batched vCont packets.
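
The arithmetic can be checked with a small Python model. The function names are hypothetical, and the schedule of how many threads finish per round (`[6, 3, 1]`) is invented for illustration; actual batch sizes depend on scheduling.

```python
def unbatched_counts(num_threads):
    """One disable, one enable, and one vCont per thread."""
    return {"z0": num_threads, "Z0": num_threads, "vCont": num_threads}


def batched_counts(num_threads, finished_per_round):
    """Count packets when all remaining threads are stepped together.

    finished_per_round is an assumed schedule of how many threads
    complete their step in each resume cycle.
    """
    remaining = num_threads
    batch_sizes = []
    for finished in finished_per_round:
        if remaining == 0:
            break
        batch_sizes.append(remaining)  # one vCont covering every remaining thread
        remaining -= min(finished, remaining)
    if remaining:
        raise ValueError("schedule leaves threads unstepped")
    return {"z0": 1, "Z0": 1, "vCont": len(batch_sizes), "batch_sizes": batch_sizes}


before = unbatched_counts(10)
after = batched_counts(10, finished_per_round=[6, 3, 1])
print(before)
print(after)
```

Under this assumed schedule the batches shrink from 10 to 4 to 1 threads, so three vCont packets replace ten, and the breakpoint is toggled once in each direction instead of ten times.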

EDIT:
We tried to merge this PR twice. The first time, the test was flaky, so
we had to revert. The second time, we broke two tests on a Windows
builder:
https://lab.llvm.org/buildbot/#/builders/141/builds/15798

The tests failed because the cleanup code in `WillResume` was popping
**ALL** `StepOverBreakpoint` plans, including non-deferred ones from
incomplete single-steps.
The sequence was:
1) Multiple threads hit the same breakpoint. One thread's breakpoint
condition evaluates to false, so it needs to auto-continue.
2) A `StepOverBreakpoint` plan is created for that thread
(non-deferred).
3) On the next `WillResume`, the cleanup pops that non-deferred plan.
4) Now the `StopOthers` scan finds no thread with a `StopOthers()` plan,
so `thread_to_run` stays null.
5) The else branch runs, calling `SetupToStepOverBreakpointIfNeeded` on
**ALL** threads, including the thread that legitimately hit the
breakpoint with a true condition.
6) That thread gets a new `StepOverBreakpoint` plan pushed, which
overwrites its breakpoint stop reason with trace when the step
completes.

The error `trace (2) != breakpoint (3)` confirms this: the thread that
should have reported breakpoint as its stop reason instead reports
trace, because an unwanted `StepOverBreakpoint` plan was pushed onto it
and completed.

The newly added code fixes this by popping only plans that have
`GetDeferReenableBreakpointSite() == true`.
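
The difference between the old and new cleanup can be modeled in a few lines of Python. This is a sketch of the filtering rule only: `StepOverPlan`, `split_stale_plans`, and the `only_deferred` flag are hypothetical names, and the real code operates on LLDB's C++ thread plan stack.

```python
from dataclasses import dataclass


@dataclass
class StepOverPlan:
    tid: int
    defer_reenable: bool  # stands in for GetDeferReenableBreakpointSite()


def split_stale_plans(plans, only_deferred):
    """Return (popped, kept) for the cleanup at the top of WillResume.

    only_deferred=False models the broken cleanup that popped every plan;
    only_deferred=True models the fix.
    """
    popped, kept = [], []
    for plan in plans:
        if plan.defer_reenable or not only_deferred:
            popped.append(plan)
        else:
            kept.append(plan)
    return popped, kept


# Threads 1 and 2 were batched; thread 3 has a non-deferred plan from an
# incomplete single-step (for example, an auto-continue after a false condition).
plans = [StepOverPlan(1, True), StepOverPlan(2, True), StepOverPlan(3, False)]

old_popped, old_kept = split_stale_plans(plans, only_deferred=False)
new_popped, new_kept = split_stale_plans(plans, only_deferred=True)
print([p.tid for p in old_kept])  # []
print([p.tid for p in new_kept])  # [3]
```

With the fix, thread 3 keeps its plan, so the later scan still finds a thread to run and no extra plan is pushed onto the thread that legitimately hit the breakpoint.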

Co-authored-by: Bar Soloveychik <barsolo@fb.com>
2026-03-02 10:46:23 -08:00


"""
Test that the batched breakpoint step-over optimization activates when
multiple threads hit the same breakpoint. Verifies that the optimization
reduces breakpoint toggle operations compared to stepping one at a time.
"""
import os
import re

from lldbsuite.test.decorators import *
from lldbsuite.test.concurrent_base import ConcurrentEventsBase
from lldbsuite.test.lldbtest import TestBase


@skipIfWindows
class ConcurrentBatchedBreakpointStepOver(ConcurrentEventsBase):
    @skipIf(triple="^mips")
    @skipIf(archs=["aarch64"])
    def test(self):
        """Test that batched breakpoint step-over reduces breakpoint
        toggle operations when multiple threads hit the same breakpoint."""
        self.build()

        num_threads = 10

        # Enable logging to capture optimization messages and GDB packets.
        lldb_logfile = self.getBuildArtifact("lldb-log.txt")
        self.runCmd("log enable lldb step break -f {}".format(lldb_logfile))

        gdb_logfile = self.getBuildArtifact("gdb-remote-log.txt")
        self.runCmd("log enable gdb-remote packets -f {}".format(gdb_logfile))

        # Run with breakpoint threads.
        self.do_thread_actions(num_breakpoint_threads=num_threads)

        self.assertTrue(os.path.isfile(lldb_logfile), "lldb log file not found")
        with open(lldb_logfile, "r") as f:
            lldb_log = f.read()

        # Verify the optimization activated by looking for "Registered thread"
        # messages, which indicate threads were grouped for batching.
        registered_matches = re.findall(
            r"Registered thread 0x[0-9a-fA-F]+ stepping over "
            r"breakpoint at (0x[0-9a-fA-F]+)",
            lldb_log,
        )
        self.assertGreater(
            len(registered_matches),
            0,
            "Expected batched breakpoint step-over optimization to be "
            "used (no 'Registered thread' messages found in log).",
        )
        thread_bp_addr = registered_matches[0]

        # Verify all threads completed their step-over.
        completed_count = lldb_log.count("Completed step over breakpoint plan.")
        self.assertGreaterEqual(
            completed_count,
            num_threads,
            "Expected at least {} 'Completed step over breakpoint plan.' "
            "messages (one per thread), but got {}.".format(
                num_threads, completed_count
            ),
        )

        # Count z0/Z0 packets for the thread breakpoint address.
        # z0 = remove (disable) software breakpoint.
        # Z0 = set (enable) software breakpoint.
        # Strip the "0x" prefix and leading zeros to match the GDB packet
        # format (which uses lowercase hex without "0x" prefix).
        bp_addr_hex = thread_bp_addr[2:].lstrip("0") if thread_bp_addr else ""

        z0_count = 0  # disable packets
        Z0_count = 0  # enable packets
        initial_Z0_seen = False
        max_vcont_step_threads = 0  # largest number of s: actions in one vCont

        self.assertTrue(os.path.isfile(gdb_logfile), "gdb-remote log file not found")
        with open(gdb_logfile, "r") as f:
            for line in f:
                if "send packet: $" not in line:
                    continue

                # Match z0,<addr> (disable) or Z0,<addr> (enable).
                m = re.search(r"send packet: \$([Zz])0,([0-9a-fA-F]+),", line)
                if m and m.group(2) == bp_addr_hex:
                    if m.group(1) == "Z":
                        if not initial_Z0_seen:
                            initial_Z0_seen = True
                        else:
                            Z0_count += 1
                    else:
                        z0_count += 1

                # Count step actions in vCont packets to detect batching.
                # A batched vCont looks like: vCont;s:tid1;s:tid2;...
                vcont_m = re.search(r"send packet: \$vCont((?:;[^#]+)*)", line)
                if vcont_m:
                    actions = vcont_m.group(1)
                    step_count = len(re.findall(r";s:", actions))
                    if step_count > max_vcont_step_threads:
                        max_vcont_step_threads = step_count

        # With the optimization, fewer breakpoint toggles should occur.
        # Without optimization we'd see num_threads z0 and num_threads Z0.
        # With batching, even partial, we expect fewer toggles.
        self.assertLess(
            z0_count,
            num_threads,
            "Expected fewer than {} breakpoint disables (z0) due to "
            "batching, but got {}.".format(num_threads, z0_count),
        )
        self.assertLess(
            Z0_count,
            num_threads,
            "Expected fewer than {} breakpoint re-enables (Z0) due to "
            "batching, but got {}.".format(num_threads, Z0_count),
        )

        # Verify at least one batched vCont packet contained multiple
        # step actions, proving threads were stepped together.
        self.assertGreater(
            max_vcont_step_threads,
            1,
            "Expected at least one batched vCont packet with multiple "
            "step actions (s:), but the maximum was {}.".format(max_vcont_step_threads),
        )