9 Commits

Author SHA1 Message Date
Sergei Barannikov
60bdf09654
[TableGen][DecoderEmitter] Rework table construction/emission (#155889)
### Current state

We have FilterChooser class, which can be thought of as a **tree of
encodings**. Tree nodes are instances of FilterChooser itself, and come
in two types:

* A node containing single encoding that has *constant* bits in the
specified bit range, a.k.a. singleton node.
* A node containing only child nodes, where each child represents a set
of encodings that have the same *constant* bits in the specified bit
range.

Either of these nodes can have an additional child, which represents a
set of encodings that have some *unknown* bits in the same bit range.

As can be seen, the **data structure is very high level**.

The encoding tree represented by FilterChooser is then converted into a
finite-state machine (FSM), represented as **byte array**. The
translation is straightforward: for each node of the tree we emit a
sequence of opcodes that check encoding bits and predicates for each
encoding. For a singleton node we also emit a terminal "decode" opcode.

The translation is done in one go, and this has negative consequences:

* We miss optimization opportunities.
* We have to use "fixups" when encoding transitions in the FSM since we
don't know the size of the data we want to jump over in advance. We have
to emit the data first and then fix up the location of the jump. This
means the fixup size has to be large enough to encode the longest jump,
so **most of the transitions are encoded inefficiently**.
* Finally, when converting the FSM into human readable form, we have to
**decode the byte array we've just emitted**. This is also done in one
go, so we **can't do any pretty printing**.

### This PR

We introduce an intermediary data structure, decoder tree, that can be
thought as **AST of the decoder program**.
This data structure is **low level** and as such allows for optimization
and analysis.
It resolves all the issues listed above. We now can:
* Emit more optimal opcode sequences.
* Compute the size of the data to be emitted in advance, avoiding
fixups.
* Do pretty printing.

Serialization is done by a new class, DecoderTableEmitter, which
converts the AST into a FSM in **textual form**, streamed right into the
output file.

### Results
* The new approach immediately resulted in 12% total table size savings
across all in-tree targets, without implementing any optimizations on
the AST. Many tables observe ~20% size reduction.
* The generated file is much more readable.
* The implementation is arguably simpler and more straightforward (the
diff is only +150~200 lines, which feels rather small for the benefits
the change gives).
2025-09-20 01:58:53 +00:00
Sergei Barannikov
7f4c297e94
[TableGen][CodeGen] Remove feature string from HwMode (#157600)
`Predicates` and `Features` fields serve the same purpose. They should
be kept in sync, but not all predicates are based on features. This
resulted in introducing dummy features for that only reason.

This patch removes `Features` field and changes TableGen emitters to use
`Predicates` instead.

Historically, predicates were written with the assumption that the
checking code will be used in `SelectionDAGISel` subclasses, meaning
they will have access to the subclass variables, such as `Subtarget`.
There are no such variables in the generated
`GenSubtargetInfo::getHwModeSet()`, so we need to provide them. This can
be achieved by subclassing `HwModePredicateProlog`, see an example in
`Hexagon.td`.
2025-09-10 12:39:47 +03:00
Sergei Barannikov
2a586a8118
[TableGen][DecoderEmitter] Remove dead OPC_Fail (#155229)
It can never be reached. It could be reached if we emitted an opcode
that could fall outside the outermost scope, but emission of all such
opcodes is guarded by `!isOutermostScope()`.

That also means we never add fixups to the outermost scope, so avoid
pushing an entry for it onto the stack.
2025-08-25 19:15:35 +03:00
Sergei Barannikov
6ae0d9591e
[TableGen][DecoderEmitter] Print the size of the decoder tables (#155139)
So we can see the changes in table sizes after making changes to
DecoderEmitter by simply running `grep DecoderTable`.

Also, remove an unnecessary terminating 0 from the end of the tables.
2025-08-24 09:09:31 +03:00
Jason Eckhardt
2ed0aacf97
[TableGen] Fixes for per-HwMode decoding problem (#82201)
Today, if any instruction uses EncodingInfos/EncodingByHwMode to
override the default encoding, the opcode field of the decoder table is
generated incorrectly. This causes failed disassemblies and other
problems.

Specifically, the main correctness issue is that the EncodingID is
inadvertently stored in the table rather than the actual opcode. This is
caused by having set up the IndexOfInstruction map incorrectly during
the loop to populate NumberedEncodings-- which is then propagated around
when OpcMap is set up with a bad EncodingIDAndOpcode.

Instead, do away with IndexOfInstruction altogether and use opcode value
queried from CodeGenTarget::getInstrIntValue to set up OpcMap. This
itself exposed another problem where emitTable was using the decoded
opcode to index into NumberedEncodings. Instead pass in the
EncodingIDAndOpcode vector, and create the reverse mapping from Opcode
to EncodingID, which is then used to index NumberedEncodings.

This problem is not currently exposed upstream since no in-tree targets
yet use the per-HwMode feature. It does show up in at least two
downstream targets.
2024-02-19 13:14:22 +08:00
Craig Topper
81a150656b [TableGen][RISCV][Hexagon][LoongArch] Add a list of Predicates to HwMode.
Use the predicate condition instead of checkFeatures in *GenDAGISel.inc.

This makes the code similar to isel pattern predicates.

checkFeatures is still used by code created by SubtargetEmitter so
we can't remove the string. Backends need to be careful to keep
the string and predicates in sync, but I don't think that's a big issue.

I haven't measured it, but this should be a compile time improvement
for isel since we don't have to do any of the string processing that's
inside checkFeatures.

Reviewed By: kparzysz

Differential Revision: https://reviews.llvm.org/D146012
2023-03-14 13:00:38 -07:00
Paul C. Anagnostopoulos
93b4f85382 Update TableGen test files to use the new '...' range punctuation. 2020-09-12 16:26:32 -04:00
James Molloy
9948fe6997 [TableGen] Fix crash when using HwModes in CodeEmitterGen
When an instruction has an encoding definition for only a subset of
the available HwModes, ensure we just avoid generating an encoding
rather than crash.

llvm-svn: 374150
2019-10-09 09:15:34 +00:00
James Molloy
88a5fbfcea [TableGen] Support encoding per-HwMode
Much like ValueTypeByHwMode/RegInfoByHwMode, this patch allows targets
to modify an instruction's encoding based on HwMode. When the
EncodingInfos field is non-empty the Inst and Size fields of the Instruction
are ignored and taken from EncodingInfos instead.

As part of this promote getHwMode() from TargetSubtargetInfo to MCSubtargetInfo.

This is NFC for all existing targets - new code is generated only if targets
use EncodingByHwMode.

llvm-svn: 372320
2019-09-19 13:39:54 +00:00