Implement the detection of tail calls performed with untrusted link
register, which violates the assumption made on entry to every function.
Unlike other pauth gadgets, detection of this one involves some amount
of guessing which branch instructions should be checked as tail calls.
After a label in a function without CFG information, use a reasonably
pessimistic estimation of register state (assume that any register that
can be clobbered in this function was actually clobbered) instead of the
most pessimistic "all registers are unsafe". This is the same estimation
as used by the dataflow variant of the analysis when the preceding
instruction is not known for sure.
Without this, leaf functions without CFG information are likely to have
false positive reports about non-protected return instructions, as
1) LR is unlikely to be signed and authenticated in a leaf function and
2) LR is likely to be used by a return instruction near the end of the
function and
3) the register state is likely to be reset at least once during the
linear scan through the function
Instead of refusing to analyze an instruction completely when it is
unreachable according to the CFG reconstructed by BOLT, use pessimistic
assumption of register state when possible. Nevertheless, unreachable
basic blocks found in optimized code likely means imprecise CFG
reconstruction, thus report a warning once per function.
Some instruction-printing code used under LLVM_DEBUG does not handle CFI
instructions well. While CFI instructions seem to be harmless for the
correctness of the analysis results, they do not convey any useful
information to the analysis either, so skip them early.
Implement the detection of authentication instructions whose results can
be inspected by an attacker to know whether authentication succeeded.
As the properties of output registers of authentication instructions are
inspected, add a second set of analysis-related classes to iterate over
the instructions in reverse order.
Clarify the semantics of `getAuthenticatedReg` and remove a redundant
`isAuthenticationOfReg` method, as combined auth+something instructions
(such as `retaa` on AArch64) should be handled carefully, especially
when searching for authentication oracles: usually, such instructions
cannot be authentication oracles and only some of them actually write an
authenticated pointer to a register (such as "ldra x0, [x1]!").
Use `std::optional<MCPhysReg>` returned type instead of plain MCPhysReg
and returning `getNoRegister()` as a "not applicable" indication.
Document a few existing methods, add information about preconditions.
Remove `getAffectedRegisters` and `setOverwritingInstrs` methods from
the base `Report` class. Instead, rename the `Report` class to
`Diagnostic` and make it always represent the brief version of the
report, which is kept unchanged since initially found. Throughout its
life-cycle, an instance of `Diagnostic` is first wrapped into
`PartialReport<ReqT>` together with an optional request for extra
details. Then, on the second run of the analysis, it is re-wrapped into
`FinalReport` together with the requested detailed information.
* use more flexible `ArrayRef<T>` and `StringRef` types instead of
`const std::vector<T> &` and `const std::string &`, correspondingly,
for function arguments
* return plain `const SrcState &` instead of `ErrorOr<const SrcState &>`
from `SrcSafetyAnalysis::getStateBefore`, as absent state is not
handled gracefully by any caller
Implement the detection of signing oracles. In this patch, a signing
oracle is defined as a sign instruction that accepts a "non-protected"
pointer, but for a slightly different definition of "non-protected"
compared to control flow instructions.
A second BitVector named TrustedRegs is added to the register state
computed by the data-flow analysis. The difference between a
"safe-to-dereference" and a "trusted" register states is that to make
an unsafe register trusted by authentication, one has to make sure
that the authentication succeeded. For example, on AArch64 without
FEAT_PAuth2 and FEAT_EPAC, an authentication instruction produces an
invalid pointer on failure, so that subsequent memory access triggers
an error, but re-signing such pointer would "fix" the signature.
Note that while a separate "trusted" register state may be redundant
depending on the specific semantics of auth and sign operations, it is
still important to check signing operations: while code like this
resign:
autda x0, x1
pacda x0, x2
ret
is probably safe provided `autda` generates an error on authentication
failure, this function
sign_anything:
pacda x0, x1
ret
is inherently unsafe.
Support simple analysis of the functions for which BOLT is unable to
reconstruct the CFG. This patch is inspired by the approach implemented
by Kristof Beyls in the original prototype of gadget scanner, but a
CFG-unaware counterpart of the data-flow analysis is implemented
instead of separate version of gadget detector, as multiple gadget kinds
are detected now.
Scanning functions without CFG information as well as the detection of
authentication oracles requires introducing more classes related to
register state analysis. To make the future code easier to understand,
rename several classes beforehand.
To detect authentication oracles, one has to query the properties of
*output* operands of authentication instructions *after* the instruction
is executed - this requires adding another analysis that iterates over
the instructions in reverse order, and a corresponding state class.
As the main difference of the existing `State` class is that it stores
the properties of source register operands of the instructions before
the instruction's execution, rename it to `SrcState` and
`PacRetAnalysis` to `SrcSafetyAnalysis`.
Apply minor adjustments to the debug output along the way.
In addition to authenticated pointers, consider the contents of a
register safe if it was
* written by PC-relative address computation
* updated by an arithmetic instruction whose input address is safe
In preparation for implementing support for detection of non-protected
call instructions, refine the definition of state which is computed for
each register by data-flow analysis.
Explicitly marking the registers which are known to be trusted at
function entry is crucial for finding non-protected calls. In addition,
it fixes less-common false negatives for pac-ret, such as `ret x1` in
`f_nonx30_ret_non_auted` test case.
In preparation for adding more gadget kinds to detect, streamline
issue reporting.
Rename classes representing issue reports. In particular, rename
`Annotation` base class to `Report`, as it has nothing to do with
"annotations" in `MCPlus` terms anymore. Remove references to "return
instructions" from variable names and report messages, use generic
terms instead. Rename NonPacProtectedRetAnalysis to PAuthGadgetScanner.
Remove `GeneralDiagnostic` as a separate class, make `GenericReport`
(former `GenDiag`) store `std::string Text` directly. Remove unused
`operator=` and `operator==` methods, as `Report`s are created on the
heap and referenced via `shared_ptr`s.
Introduce `GadgetKind` class - currently, it only wraps a `const char *`
description to display to the user. This description is intended to be
a per-gadget-kind constant (or a few hard-coded constants), so no need
to store it to `std::string` field in each report instance. To handle
both free-form `GenericReport`s and statically-allocated messages
without unnecessary overhead, move printing of the report header to the
base class (and take the message argument as a `StringRef`).