This changes VirtualDataExtractor's GetByteSize to return the virtual
byte size of the buffer (external users only understand the data
contents in terms of the virtual sizes & offsets). The checked methods
in DataExtractor verify that they are not reading off the end of the
buffer, usually via the BytesLeft() method. There are a couple of
external callers of BytesLeft(), but it is predominantly an internal
API. I have BytesLeft() use the physical size of the buffer, not the
virtual size, for the benefit of the DataExtractor methods (and to
avoid duplicating all of them down in VirtualDataExtractor).
Another problem is that we call SetData on DataExtractorSPs (e.g. see
the ObjectFile ctor) with the DataBuffer it already has, an offset of 0,
and the GetByteSize. That is a no-op for a DataExtractor that is already
using that DataBuffer, but SetData would try to use that length as a
physical size and truncate the buffer that the DataExtractor would accept.
I added VirtualDataExtractor subclass methods for the SetData overloads
that detect (1) data being added to an uninitialized DataExtractor,
(2) the same data / offset / length as currently being used being set
on the DataExtractor (a no-op), or (3) a genuine change of the data
source, or of the offset / length. We're not ready to handle this final
case today, so I added asserts to catch it in debug builds, and then I
clear the LookupTable and add a no-op entry so this extractor will
behave like a plain DataExtractor, because I don't know what better to
do. If we genuinely need to handle this case, and
I'm pretty sure we don't need to, I'd have to assume that we're taking a
subset of the original data source (an offset & length), so we'd need to
update all of the LookupTable entries to reflect the new offsets, and
remove entries that are no longer referring to the subsetted range. I'll
leave that until there's any evidence it's actually needed.
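The three SetData cases can be illustrated with a minimal sketch. The
names here (Entry, VirtualView, SetDataKind, Adopt) are hypothetical
stand-ins, not the LLDB classes, and the identity table installed in
case (1) is an assumption made only to keep the sketch small. The real
change asserts in case (3); the sketch leaves that out so the fallback
path can be exercised.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical lookup table entry: virtual range -> physical offset.
struct Entry {
  uint64_t virtual_base;
  uint64_t size;
  uint64_t physical_offset;
};

enum class SetDataKind { Initialize, NoOp, Reset };

// Hypothetical stand-in for an extractor with a virtual lookup table.
struct VirtualView {
  const uint8_t *buffer = nullptr;
  uint64_t offset = 0;
  uint64_t virtual_size = 0; // What a GetByteSize()-style query reports.
  std::vector<Entry> table;

  // Take the new source and install a single identity ("no-op") entry,
  // so lookups behave like those of a plain extractor.
  void Adopt(const uint8_t *b, uint64_t off, uint64_t len) {
    buffer = b;
    offset = off;
    virtual_size = len;
    table.assign(1, Entry{0, len, 0});
  }

  SetDataKind SetData(const uint8_t *new_buffer, uint64_t new_offset,
                      uint64_t new_length) {
    assert(new_buffer != nullptr && "sketch only models real buffers");
    if (buffer == nullptr) {
      // (1) Uninitialized extractor: just take the data.
      Adopt(new_buffer, new_offset, new_length);
      return SetDataKind::Initialize;
    }
    if (new_buffer == buffer && new_offset == offset &&
        new_length == virtual_size) {
      // (2) Same buffer, offset, and (virtual) length: keep the table.
      return SetDataKind::NoOp;
    }
    // (3) Genuinely different source or range: drop the translation and
    // fall back to plain behavior.
    Adopt(new_buffer, new_offset, new_length);
    return SetDataKind::Reset;
  }
};
```

The point of case (2) is that the length passed in is the virtual byte
size, so comparing it against the current virtual size (rather than
treating it as a physical size) is what keeps the table intact.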
rdar://148939795
This fixes the following error on 32-bit platforms when compiling with Clang:
llvm-project/lldb/source/Utility/VirtualDataExtractor.cpp:211:55: error: non-constant-expression cannot be narrowed from type 'SizeType' (aka 'unsigned long long') to 'size_t' (aka 'unsigned int') in initializer list [-Wc++11-narrowing]
211 | return {m_start + static_cast<size_t>(entry->data), entry->size};
| ^~~~~~~~~~~
llvm-project/lldb/source/Utility/VirtualDataExtractor.cpp:211:55: note: insert an explicit cast to silence this issue
211 | return {m_start + static_cast<size_t>(entry->data), entry->size};
| ^~~~~~~~~~~
| static_cast<size_t>( )
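For context, a minimal sketch of the kind of fix the diagnostic asks
for. Range and MakeRange are hypothetical names, not the code in
VirtualDataExtractor.cpp. Brace-initializing a size_t from a
non-constant 64-bit value is a narrowing conversion where size_t is 32
bits wide, so each such operand needs an explicit cast.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical pair of size_t values, brace-initialized on return.
struct Range {
  size_t offset;
  size_t size;
};

Range MakeRange(size_t start, uint64_t data, uint64_t size) {
  // The casts truncate silently on 32-bit targets, so check the values
  // fit first (checked in debug builds only).
  assert(data <= SIZE_MAX && size <= SIZE_MAX);
  // Without the second cast, `size` would be narrowed implicitly inside
  // the braces, which is ill-formed when size_t is smaller than uint64_t.
  return {start + static_cast<size_t>(data), static_cast<size_t>(size)};
}
```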
We have many places where an ObjectFile subclass will take the
DataExtractor representing the entire binary and create a subsection of
it in a new DataExtractor for processing. For instance, an object file
might have symbol table entries with offsets into the string table. A
common code pattern is to create a DataExtractor representing the string
table, and then pull out the C strings based on those offsets from
the string table DataExtractor.
When code does this, it creates a new DataExtractor, copies the
Endianness and Wordsize from the original, copies the DataBufferSP from
the original, and specifies a new offset and length into the DataBuffer.
However, if the binary is actually stored in a VirtualDataExtractor,
this code pattern loses the correct virtual-to-physical table
translation and will not work correctly. This new method simplifies this
common pattern, and correctly takes a subset of a VirtualDataExtractor.
The current implementation only allows a subset of a
VirtualDataExtractor that is contained within a single virtual entry
(LookupTable entry) and returns a DataExtractor with the correct offsets
calculated from the LookupTable. If we need a VirtualDataExtractor to
create a Subset DataExtractor representing multiple separate virtual
ranges of data, we'll need to copy over the LookupTable entries that
cover all the bytes, and update them to be relative to the new
VirtualDataExtractor. It's a bit of work, and it's not needed right now,
so I'm not tackling that.
I am working on a larger PR which needs this new method. This PR
contains a unit test that uses it.
rdar://148939795
Introduce VirtualDataExtractor, a DataExtractor subclass that enables
reading data at virtual addresses by translating them to physical buffer
offsets using a lookup table. The lookup table maps virtual address
ranges to physical offsets and enforces boundaries to prevent reads from
crossing entry limits.
The new class inherits from DataExtractor, overriding GetData and
PeekData to provide transparent virtual address translation for most of
the DataExtractor methods. The exceptions are the unchecked methods, which
bypass GetData and PeekData and are therefore overridden as well.
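The translation described above can be sketched roughly as follows.
Entry and Translate are hypothetical names, and the linear scan is an
assumption for brevity; the real lookup table may be organized
differently. The sketch shows the two checks that matter: finding the
entry that contains the virtual address, and refusing a read that would
run past the end of that entry.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one lookup table entry: the virtual range
// [virtual_base, virtual_base + size) is backed by `size` bytes starting
// at physical_offset in the underlying buffer.
struct Entry {
  uint64_t virtual_base;
  uint64_t size;
  uint64_t physical_offset;
};

// Translates a read of `length` bytes at `virtual_addr` into a physical
// buffer offset. Returns false if no entry contains the address, or if
// the read would run past the end of the containing entry.
bool Translate(const std::vector<Entry> &table, uint64_t virtual_addr,
               uint64_t length, uint64_t &physical_out) {
  for (const Entry &e : table) {
    if (virtual_addr < e.virtual_base)
      continue;
    const uint64_t delta = virtual_addr - e.virtual_base;
    if (delta >= e.size)
      continue;
    if (length > e.size - delta)
      return false; // Read would cross the entry boundary.
    physical_out = e.physical_offset + delta;
    return true;
  }
  return false;
}

// Usage: two virtual ranges packed back to back in the physical buffer.
inline void TranslateExample() {
  const std::vector<Entry> table = {{0x1000, 0x10, 0x0},
                                    {0x2000, 0x20, 0x10}};
  uint64_t physical = 0;
  assert(Translate(table, 0x2008, 8, physical) && physical == 0x18);
  assert(!Translate(table, 0x100c, 8, physical)); // Crosses first entry.
}
```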