tedwoodward eb6da944af
[lldb] Improve disassembly of unknown instructions (#145793)
LLDB uses the LLVM disassembler to determine the size of instructions and
to do the actual disassembly. Currently, if the LLVM disassembler can't
disassemble an instruction, LLDB will ignore the instruction size, assume
the instruction size is the minimum size for that device, print no useful
opcode, and print nothing for the instruction.

This patch changes this behavior to separate the instruction size and
"can't disassemble". If the LLVM disassembler knows the size, but can't
dissasemble the instruction, LLDB will use that size. It will print out
the opcode, and will print "<unknown>" for the instruction. This is much
more useful to both a user and a script.

The impetus behind this change is to clean up RISC-V disassembly when
the LLVM disassembler doesn't understand all of the instructions.
RISC-V supports proprietary extensions, where the TD files don't know
about certain instructions, and the disassembler can't disassemble them.
Internal users want to be able to disassemble these instructions.

With llvm-objdump, the solution is to pipe the output of the disassembly
through a filter program. This patch modifies LLDB's disassembly to look
more like llvm-objdump's, and includes an example python script that adds
a command "fdis" that will disassemble, then pipe the output through a
specified filter program. This has been tested with crustfilt, a sample
filter located at https://github.com/quic/crustfilt .

Changes in this PR:
- Decouple "can't disassemble" with "instruction size".
  DisassemblerLLVMC::MCDisasmInstance::GetMCInst now returns a bool for
    valid disassembly, and has the size as an out paramter.
  Use the size even if the disassembly is invalid.
  Disassemble if disassemby is valid.

- Always print out the opcode when -b is specified.
  Previously it wouldn't print out the opcode if it couldn't disassemble.

- Print out RISC-V opcodes the way llvm-objdump does.
  Code for the new Opcode Type eType16_32Tuples by Jason Molenda.

- Print <unknown> for instructions that can't be disassembled, matching
  llvm-objdump, instead of printing nothing.

- Update max riscv32 and riscv64 instruction size to 8.

- Add example "fdis" command script.

- Added disassembly byte test for x86 with known and unknown instructions.
- Added disassembly byte test for riscv32 with known and unknown instructions,
  with and without filtering.
- Added test from Jason Molenda to RISC-V disassembly unit tests.
2025-07-14 21:50:22 -05:00
..

LLVM Documentation
==================

LLVM's documentation is written in reStructuredText, a lightweight
plaintext markup language (file extension `.rst`). While the
reStructuredText documentation should be quite readable in source form, it
is mostly meant to be processed by the Sphinx documentation generation
system to create HTML pages which are hosted on <https://llvm.org/docs/> and
updated after every commit. Manpage output is also supported, see below.

If you instead would like to generate and view the HTML locally, install
Sphinx <http://sphinx-doc.org/> and then do:

    cd <build-dir>
    cmake -DLLVM_ENABLE_SPHINX=true -DSPHINX_OUTPUT_HTML=true <src-dir>
    make -j3 docs-llvm-html
    $BROWSER <build-dir>/docs/html/index.html

The mapping between reStructuredText files and generated documentation is
`docs/Foo.rst` <-> `<build-dir>/docs//html/Foo.html` <-> `https://llvm.org/docs/Foo.html`.

If you are interested in writing new documentation, you will want to read
`SphinxQuickstartTemplate.rst` which will get you writing documentation
very fast and includes examples of the most important reStructuredText
markup syntax.

Manpage Output
===============

Building the manpages is similar to building the HTML documentation. The
primary difference is to use the `man` makefile target, instead of the
default (which is `html`). Sphinx then produces the man pages in the
directory `<build-dir>/docs/man/`.

    cd <build-dir>
    cmake -DLLVM_ENABLE_SPHINX=true -DSPHINX_OUTPUT_MAN=true <src-dir>
    make -j3 docs-llvm-man
    man -l <build-dir>/docs/man/FileCheck.1

The correspondence between .rst files and man pages is
`docs/CommandGuide/Foo.rst` <-> `<build-dir>/docs//man/Foo.1`.
These .rst files are also included during HTML generation so they are also
viewable online (as noted above) at e.g.
`https://llvm.org/docs/CommandGuide/Foo.html`.

Checking links
==============

The reachability of external links in the documentation can be checked by
running:

    cd llvm/docs/
    sphinx-build -b linkcheck . _build/lintcheck/
    # report will be generated in _build/lintcheck/output.txt

Doxygen page Output
==============

Install doxygen <https://www.doxygen.nl/download.html> and dot2tex <https://dot2tex.readthedocs.io/en/latest>.

    cd <build-dir>
    cmake -DLLVM_ENABLE_DOXYGEN=On <llvm-top-src-dir>
    make doxygen-llvm # for LLVM docs
    make doxygen-clang # for clang docs

It will generate html in

    <build-dir>/docs/doxygen/html # for LLVM docs
    <build-dir>/tools/clang/docs/doxygen/html # for clang docs