Investigating some of the biggest slow downs during tests, the biggest
offender is 'wait_for_stopped' requiring a negative assertion around the
'stopped' event.
It currently waits for a negative predicate to fail before continuing.
This means it must wait for the full DEFAULT_TIMEOUT (50s) before the
test is allowed to continue.
To mitigate this, I added a new `collect_events` helper that will wait
for the given event to occur with the DEFAULT_TIMEOUT, then wait for a
quiet period (0.25s) before returning.
This greatly reduces the amount of idle waiting during tests.
Additionally, looking a the performance of individual test files,
`TestDAP_launch` is the slowest overall test. No individual test is that
slow, but the fact it has so many tests in the same file results in the
test harness waiting for that one file to finish.
To mitigate that, I split `TestDAP_launch` into individual test files
that run in parallel, reducing the runtime locally from over 2mins to
~5s.
… concurrency." (#165688)""
This reverts commit 17dbd8690e36f8e514fb47f4418f78420d0fc019.
This was causing timeouts on the premerge runners. Reverting for now
until the timeouts trigger within lit and/or we have a better testing
strategy for this.
This reverts commit f205be095609aa61dfac3ae729406e0af2dcd15f.
This new select mechanism has exposed the fact that the resources
the Arm Linux bot has can vary a lot. We do limit it to a low number
of parallel tests but in this case, I think it's write performance
somewhere.
Reland the changes since they work elsewhere, and disable lldb-dap
tests on Arm Linux while I fix our buildbot.
We currently use a background thread to read the DAP output. This means
the test thread and the background thread can race at times and we may
have inconsistent timing due to these races.
To improve the consistency I've removed the reader thread and instead
switched to using the `selectors` module that wraps `select` in a
platform independent way.
Various DAP tests are specifying their own timeouts, with values ranging
from "1" to "20". Most of them seem arbitrary, but some come with a
comment.
The performance characters of running these tests in CI are
unpredictable (they generally run much slower than developers expect)
and really not something we can make assumptions about. I suspect these
timeouts are a contributing factor to the flakiness of the DAP tests.
This PR unifies the timeouts around a central value in the DAP server.
Fixes#162523
This test sometimes times out in CI runs:
/usr/bin/python3 /home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/gha/actions-runner/_work/llvm-project/llvm-project/build/./lib --env LLVM_INCLUDE_DIR=/home/gha/actions-runner/_work/llvm-project/llvm-project/build/include --env LLVM_TOOLS_DIR=/home/gha/actions-runner/_work/llvm-project/llvm-project/build/./bin --libcxx-include-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/include/c++/v1 --libcxx-include-target-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/include/x86_64-unknown-linux-gnu/c++/v1 --libcxx-library-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/./lib/x86_64-unknown-linux-gnu --arch x86_64 --build-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/lldb-test-build.noindex --lldb-module-cache-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/gha/actions-runner/_work/llvm-project/llvm-project/build/./bin/lldb --compiler /home/gha/actions-runner/_work/llvm-project/llvm-project/build/./bin/clang --dsymutil /home/gha/actions-runner/_work/llvm-project/llvm-project/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/./bin --lldb-obj-root /home/gha/actions-runner/_work/llvm-project/llvm-project/build/tools/lldb --lldb-libs-dir /home/gha/actions-runner/_work/llvm-project/llvm-project/build/./lib --cmake-build-type Release /home/gha/actions-runner/_work/llvm-project/llvm-project/lldb/test/API/tools/lldb-dap/module -p TestDAP_module.py
--
Exit Code: -9
Timeout: Reached timeout of 1200 seconds
...
https://github.com/llvm/llvm-project/actions/runs/17940597997/job/51016081497?pr=139293
The logs say that one test passed, but the printed communication logs relate to
that test too. So I can't tell if that one failed or a different part did.
Given that I've had to disable multiple parts of DAP tests before, I'm just going to
skip the whole file. It's not pretty, but it'll save contributors hassle.
See #137660.
Originally commited in 362b9d78b4ee9107da2b5e90b3764b0f0fa610fe and then
reverted in cb63b75e32a415c9bfc298ed7fdcd67e8d9de54c.
This re-lands a subset of the changes to
dap_server.py/DebugCommunication and addresses the python3.10
compatibility issue.
This includes less type annotations since those were the reason for the
failures on that specific version of python.
I've done additional testing on python3.8, python3.10 and python3.13 to
further validate these changes.
In DebugCommunication, we currently are using 2 thread to drive
lldb-dap. At the moment, they make an attempt at only synchronizing the
`recv_packets` between the reader thread and the main test thread. Other
stateful properties of the debug session are not guarded by a
locks/mutex.
To mitigate this, I am moving any state updates to the main thread
inside the `_recv_packet` method to ensure that between calls to
`_recv_packet` the state does not change out from under us in a test.
This does mean the precise timing of events has changed slightly as a
result and I've updated the existing tests that fail for me locally with
this new behavior.
I think this should result in overall more predictable behavior, even if
the test is slow due to the host workload or architecture differences.
---------
Co-authored-by: Ebuka Ezike <yerimyah1@gmail.com>
Attempt to improve tests by synchronously waiting for breakpoints to
resolve. Not sure if it will fix all the tests but I think it should
make the tests more stable
This is more straight forward refactor of the startup sequence that
reverts parts of ba29e60f9a2222bd5e883579bb78db13fc5a7588. Unlike my
previous attempt, I ended up removing the pending request queue and not
including an `AsyncReqeustHandler` because I don't think we actually
need that at the moment.
The key is that during the startup flow there are 2 parallel operations
happening in the DAP that have different triggers.
* The `initialize` request is sent and once the response is received the
`launch` or `attach` is sent.
* When the `initialized` event is recieved the `setBreakpionts` and
other config requests are made followed by the `configurationDone`
event.
I moved the `initialized` event back to happen in the `PostRun` of the
`launch` or `attach` request handlers. This ensures that we have a valid
target by the time the configuration calls are made. I added also added
a few extra validations that to the `configurationeDone` handler to
ensure we're in an expected state.
I've also fixed up the tests to match the new flow. With the other
additional test fixes in 087a5d2ec7897cd99d3787820711fec76a8e1792 I
think we've narrowed down the main source of test instability that
motivated the startup sequence change.
Adding an assert that the 'continue' request succeeds caused a number of
tests to fail. This showed a number of tests that were not specifying if
they should be stopped or not at key points in the test. This is likely
contributing to these tests being flaky since the debugger is not in the
expected state.
Additionally, I spent a little time trying to improve the readability of
the dap_server.py and lldbdap_testcase.py.
The module event indicates that some information about a module has
changed. The event is supported by the Emacs and Visual Studio DAP
clients. This PR adds support for emitting the event from lldb-dap.
Fixes#137058
The don't currently work (and they're also not particularly useful,
since all of the remote stuff happens inside lldb).
This saves us from annotating tests one by one.
assertEquals is a deprecated alias for assertEqual and has been removed
in Python 3.12. This wasn't an issue previously because we used a
vendored version of the unittest module. Now that we use the built-in
version this gets updated together with the Python version used to run
the test suite.
Rename lldb-vscode to lldb-dap. This change is largely mechanical. The
following substitutions cover the majority of the changes in this
commit:
s/VSCODE/DAP/
s/VSCode/DAP/
s/vscode/dap/
s/g_vsc/g_dap/
Discourse RFC:
https://discourse.llvm.org/t/rfc-rename-lldb-vscode-to-lldb-dap/74075/