- In both `emitTable` and the generated `decodeInstruction` function
increment the pointer to the decoder op as a part of the switch
statement instead of later on in each case.
- Avoid dereferencing the end() iterator to get the end pointer, instead
calculate it explicitly
- Fixes a regression introduced in
https://github.com/llvm/llvm-project/pull/136220.
- The windows build failure shows the following call stack:
```
| Exception Code: 0x80000003
| #0 0x00007ff74bc05897 std::_Vector_const_iterator<class std::_Vector_val<struct std::_Simple_types<unsigned char>>>::operator*(void) const C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.37.32822\include\vector:52:0
| #1 0x00007ff74bbd3d64 `anonymous namespace'::DecoderEmitter::emitTable D:\buildbot\llvm-worker\clang-cmake-x86_64-avx512-win\llvm\llvm\utils\TableGen\DecoderEmitter.cpp:852:0
```
- Move the code generated by DecoderEmitter to anonymous namespace.
- Move AMDGPU's usage of this code from header file to .cpp file.
Note, we get build errors like "call to function 'decodeInstruction'
that is neither visible in the template definition nor found by
argument-dependent lookup" if we do not change AMDGPU.
- Add command line option `num-to-skip-size` to parameterize the size of
`NumToSkip` bytes in the decoder table. Default value will be 2, and
targets that need larger size can use 3.
- Keep all existing targets, except AArch64, to use size 2, and change
AArch64 to use size 3 since it run into the "disassembler decoding table
too large" error with size 2.
- Following is a rough reduction in size for the decoder tables by
switching to size 2.
```
Target Old Size New Size % Reduction
================================================
AArch64 153254 153254 0.00
AMDGPU 471566 412805 12.46
ARC 5724 5061 11.58
ARM 84936 73831 13.07
AVR 1497 1306 12.76
BPF 2172 1927 11.28
CSKY 10064 8692 13.63
Hexagon 47967 41965 12.51
Lanai 1108 982 11.37
LoongArch 24446 21621 11.56
MSP430 4200 3716 11.52
Mips 36330 31415 13.53
PPC 31897 28098 11.91
RISCV 37979 32790 13.66
Sparc 8331 7252 12.95
SystemZ 36722 32248 12.18
VE 48296 42873 11.23
XCore 2590 2316 10.58
Xtensa 3827 3316 13.35
```
- Create a new stack scope only in the fallthrough case.
- For the non-fallthrough cases, any fixup entries will naturally be
added to the existing scope without needing to copy them manually.
- Verified that the generated `GenDisassembler` files are identical with
and without this change.
- Add helper functions to insert ULEB128 encoded value and NumToSkip.
- Use ArrayRef<> instead of const vector references as function
arguments.
- Return `OpHasCompleteDecoder` by value instead of by reference.
- Use range for loops.
- Remove {} around single line if/else bodies.
- In `emitSoftFailTableEntry`, unconditionally emit the Positive and
Negative mask values, instead of explicitly emitting a 0 byte when
the mask is not needed.
Combine the StartBits, EndBits, and FieldVals vectors into a single
vector of a struct that contains all 3 pieces of information.
Instead of storing EndBits, we store NumBits since that's what the users
want.
I've removed the BitNo variable as it was easy to construct calculate
from StartBit. I've also removed Num in favor of Islands.size().
This reduces the amount of space needed for vectors of bit_value_t
and allows the user of memset.
Also reorder the enum values so BIT_FALSE is 0 and BIT_TRUE is 1.
Instead of returning the number of bytes emitted, just take the iterator
by reference so the increments in emitULEB128 will update the copy in
the caller.
Also pass the iterator by reference to emitNumToSkip so we don't need a
separate I += 3 in the caller.
Rename `indent` to `Indent` and `o` to `OS`.
Rename `Indentation` to `Indent`.
Remove unused argument from `emitPredicateMatch`.
Change `Indent` argument to `emitBinaryParser` to by value.
Add PrintError and family overload that accepts a print function. This
avoids constructing potentially long strings for passing into these
print functions.
In a previous PR #81716, a new decoder function was added to
llvm/lib/Target/AArch64/Disassembler/AArch64Disassembler.cpp. During
code review it was suggested that, as most of the decoder functions were
very similar in structure, that they be refactored into a single,
templated function. I have added the refactored function, removed the
definitions of the replaced functions, and replaced the references to
the replaced functions in AArch64Disassembler.cpp and
llvm/lib/Target/AArch64/AArch64RegisterInfo.td. To reduce the number of
duplicate references in AArch64RegisterInfo.td, I have also made a small
change to llvm/utils/TableGen/DecoderEmitter.cpp.
1. Remove 'AllModes' and 'DefaultMode' suffixes for DecoderTables under
default HwMode.
2. Introduce a less aggressive suppression for HwMode DecoderTable, only
reduce necessary tables duplications. This allows encodings under
different HwModes to retain the original DecoderNamespace.
3. Change 'suppress-per-hwmode-duplicates' command option from bool type
to enum type, allowing users to choose what level of suppression to use.
Refactor of the llvm-tblgen source into:
- a "Basic" library, which contains the bare minimum utilities to build
`llvm-min-tablegen`
- a "Common" library which contains all of the helpers for TableGen
backends. Such helpers can be shared by more than one backend, and even
unit tested (e.g. CodeExpander is, maybe we can add more over time)
Fixes#80647
Many decodeULEB128/decodeSLEB128 users need to increment the pointer.
Add helpers to simplify this common pattern. We don't add `end` and
`error` parameters at present because many users don't need them.
Pull Request: https://github.com/llvm/llvm-project/pull/85739
The decoder emitter is showing some signs of age. This patch makes a few
kinds of clean-ups:
- Use ranged-for more widely, including using enumerate() for those
loops maintaining a loop index along with the items.
- Reduce the number of arguments to fieldFromInsn (removes an out
reference parameter: CodingStandards). The insn_t argument to insnWithID
can/should probably be removed soon too since modern C++ allows us to
return a local container without a copy.
- Use raw strings for the large emitted code segments. This enhances
both readability and modifiability.
DecoderEmitter and CodeEmitterGen perform repeated linear walks over the
entire instruction list. This patch eliminates two more such walks.
The eliminated traversals visit every instruction merely to determine
whether the target has variable length encodings. For a target with
variable length encodings, the original any_of will terminate quickly.
But all targets other than M68k use fixed length encodings and thus
any_of must visit the entire instruction list.
Currently the decoder and encoder emitters will crash if DefaultMode is
used within an EncodingByHwMode. As can be done today for
RegInfoByHwMode and ValueTypeByHwMode, this patch adds support for this
usage in EncodingByHwMode:
let EncodingInfos =
EncodingByHwMode<[ModeA, DefaultMode], [EncA, EncDefault]>;
Currently the DecoderEmitter spends a fair amount of cycles performing
repeated linear walks over the entire instruction list. This patch
eliminates one such walk during HwMode collection for EncodingInfos.
The eliminated traversal visits every instruction and then every
EncodingInfos entry for that instruction merely to collect all
referenced HwModes. That information already happens to be present in
the HwModeSelects created during the one-time construction of
CodeGenHwModes. We instead traverse the HwModeSelects, collecting each
one referenced as an encoding select. This set is a small constant in
size and does not generally grow with the size of the instruction set.
Currently, for per-HwMode encoding/decoding, those instructions that do
not have a HwMode override are duplicated into the decoder tables for
all HwModes. This includes inducing multiple tables for instructions
that are otherwise unrelated (e.g., different namespace with no
overrides at all).
This patch adds support to suppress instruction and table duplicates.
TableGen option "-gen-disassembler --suppress-per-hwmode-duplicates"
enables the suppression (off by default).
For one downstream backend with a complicated ISA and major
cross-generation encoding differences, this eliminates ~32000 duplicate
table entries at the time of this patch.
There are legitimate reasons to suppress or not suppress duplicates. If
there are relatively few non-overridden related instructions, it can be
convenient to pull them into the per-mode tables (only need to decode
the per-mode tables, slightly simpler decode function in disassembler).
On the other hand, in some backends, the opposite is true or the size is
too large to tolerate any duplication in the first place. We let the
user decide which makes sense.
This is currently off by default, though there is no reason it couldn't
be enabled by default. Any existing backends downstream using the
per-HwMode feature will function as before. Turning on the feature
requires minor modifications to their disassembler due to more/less
tables and naming.
Today, if any instruction uses EncodingInfos/EncodingByHwMode to
override the default encoding, the opcode field of the decoder table is
generated incorrectly. This causes failed disassemblies and other
problems.
Specifically, the main correctness issue is that the EncodingID is
inadvertently stored in the table rather than the actual opcode. This is
caused by having set up the IndexOfInstruction map incorrectly during
the loop to populate NumberedEncodings-- which is then propagated around
when OpcMap is set up with a bad EncodingIDAndOpcode.
Instead, do away with IndexOfInstruction altogether and use opcode value
queried from CodeGenTarget::getInstrIntValue to set up OpcMap. This
itself exposed another problem where emitTable was using the decoded
opcode to index into NumberedEncodings. Instead pass in the
EncodingIDAndOpcode vector, and create the reverse mapping from Opcode
to EncodingID, which is then used to index NumberedEncodings.
This problem is not currently exposed upstream since no in-tree targets
yet use the per-HwMode feature. It does show up in at least two
downstream targets.
This patch removes a couple of redundant buffer copies in emitTable for
setting up calls to decodeULEB128. Instead, provide the Table.data
buffer directly to the calls-- where decodeULEB128 does its own buffer
overflow checking.
Factor out 7 explicit loops to emit ULEB128 bytes into emitULEB128. Also
factor out 4 copies of 24-bit numtoskip emission into emitNumToSkip.
The functionality is already covered by existing unit tests and by
virtue of most of the in-tree back-ends exercising the decoder emitter.
Both OPC_ExtractField and OPC_CheckField are currently defined to take
an unsigned 8-bit start value. On some architectures with long
instruction words, this value can silently overflow, resulting in a bad
decoder table.
This patch changes each to take a ULE128B-encoded start value instead.
Additionally, a range assertion is added for the 8-bit length to
prominently notify a user in case that field ever overflows.
This problem isn't currently exposed upstream since all in-tree targets
use small instruction words (i.e., bitwidth <= 64 bits). It does show up
in at least one downstream target with instructions > 64 bits long.
Co-authored-by: Jason Eckhardt <jeckhardt@nvidia.com>