
LLVM TableGen

The purpose of TableGen is to generate complex output files from information in source files that are significantly easier to write than the output files would be, and also easier to maintain and modify over time.

The information is coded in a declarative style involving classes and records, which are then processed by TableGen.

class Hello <string _msg> {
  string msg = !strconcat("Hello ", _msg);
}

def HelloWorld: Hello<"world!"> {}

Running llvm-tblgen on this input with no backend selected prints the parsed classes and records:

------------- Classes -----------------
class Hello<string Hello:_msg = ?> {
  string msg = !strconcat("Hello ", Hello:_msg);
}
------------- Defs -----------------
def HelloWorld {        // Hello
  string msg = "Hello world!";
}

Try this example on Compiler Explorer.
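
To make the class-to-record step concrete, here is a toy Python model of what the example above does. Instantiating def HelloWorld: Hello<"world!"> binds the template argument _msg to "world!" and folds the !strconcat initializer into a concrete string. This is a sketch under simplifying assumptions (every field is a string, and unresolved initializers are modeled as Python functions of the template arguments), not how TableGen is implemented. ClassDef, Record, instantiate and print_record are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical toy model, not TableGen's real implementation: each field
# initializer is a function of the template arguments, standing in for an
# unresolved expression such as !strconcat("Hello ", _msg).
Init = Callable[[Dict[str, str]], str]


@dataclass
class ClassDef:
    name: str
    template_args: List[str]
    fields: Dict[str, Init]


@dataclass
class Record:
    name: str
    superclasses: List[str]
    values: Dict[str, str] = field(default_factory=dict)


def instantiate(def_name: str, cls: ClassDef, arg_values: List[str]) -> Record:
    """Bind template arguments positionally, then resolve every field."""
    if len(arg_values) != len(cls.template_args):
        raise ValueError(f"{cls.name} expects {len(cls.template_args)} argument(s)")
    args = dict(zip(cls.template_args, arg_values))
    rec = Record(def_name, [cls.name])
    for fname, init in cls.fields.items():
        rec.values[fname] = init(args)
    return rec


def print_record(rec: Record) -> str:
    """Render a record roughly like the default llvm-tblgen output.
    Every field is printed as 'string' because this toy only has strings."""
    lines = [f"def {rec.name} {{  // {' '.join(rec.superclasses)}"]
    lines += [f'  string {k} = "{v}";' for k, v in rec.values.items()]
    lines.append("}")
    return "\n".join(lines)


# class Hello <string _msg> { string msg = !strconcat("Hello ", _msg); }
hello = ClassDef("Hello", ["_msg"], {"msg": lambda a: "Hello " + a["_msg"]})

# def HelloWorld: Hello<"world!"> {}
hello_world = instantiate("HelloWorld", hello, ["world!"])
print(print_record(hello_world))
```

Real TableGen also handles, among other things, inheritance from several classes, let overrides and many more value types. The model only illustrates that a def is a class with its template arguments substituted and its expressions resolved to concrete values.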

The internalized records are passed on to various backends, which extract information from a subset of the records and generate one or more output files.

These output files are typically .inc files that are #included into C++ source code, but a backend can produce any type of file its developer needs.
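
As an illustration of the backend stage, here is a minimal sketch of a generator written in Python rather than C++. In-tree backends are C++ code built on the TableGen library. This sketch instead assumes the record dump produced by llvm-tblgen --dump-json, and it relies on only two parts of that format: the "!instanceof" map from class names to the defs derived from them, and each def's field values stored under the field name. The JSON below is hand-written and abbreviated, and it includes a second def, HelloLLVM, that is not in the example. The function emit_hello_inc and the HELLO_MSG macro are hypothetical.

```python
import json

# Hand-written, abbreviated stand-in for the output of
# `llvm-tblgen --dump-json hello.td`. Only "!instanceof" and the
# per-def field values are used below.
DUMP = """
{
  "!instanceof": {"Hello": ["HelloWorld", "HelloLLVM"]},
  "HelloWorld": {"!name": "HelloWorld", "!superclasses": ["Hello"],
                 "msg": "Hello world!"},
  "HelloLLVM":  {"!name": "HelloLLVM", "!superclasses": ["Hello"],
                 "msg": "Hello LLVM!"}
}
"""


def emit_hello_inc(records: dict) -> str:
    """Hypothetical backend: select the defs derived from class Hello and
    emit one X-macro line per def, ready to be #included from C++."""
    out = ["// Automatically generated file, do not edit!"]
    # A backend looks only at the subset of records it cares about.
    for name in sorted(records["!instanceof"].get("Hello", [])):
        msg = records[name]["msg"]
        # JSON string escaping is close enough to C for this ASCII-only sketch.
        out.append(f"HELLO_MSG({name}, {json.dumps(msg)})")
    return "\n".join(out) + "\n"


print(emit_hello_inc(json.loads(DUMP)), end="")
```

The X-macro shape is one common pattern for generated .inc files. The C++ code that includes the file defines HELLO_MSG beforehand to decide what each entry expands to, for example an enum member or a table row.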

Resources for learning the language:

Writing TableGen backends:

TableGen in MLIR:

Useful tools: