Historically, the main example of *very* large string tables used the `EmitCharArray` to work around MSVC limitations with string literals, but that was switched (without removing the API) in order to consolidate on a nicer emission primitive. While this large string table in `IntrinsicsImpl.inc` seems to compile correctly on MSVC without the work around in `EmitCharArray` (and that this PR adds back to the nicer emission path), other users have repeatedly hit this MSVC limitation as you can see in the discussion on PR https://github.com/llvm/llvm-project/pull/120534. This PR teaches the string offset table emission to look at the size of the table and switch to the char array emission strategy when the table becomes too large. This work around does have the downside of making compile times worse for large string tables, but that appears unavoidable until we can identify known good MSVC versions and switch to requiring them for all LLVM users. It also reduces searchability of the generated string table -- I looked at emitting a comment with each string but it is tricky because the escaping rules for an inline comment are different from those of of a string literal, and there's no real way to turn the string literal into a comment. While improving the output in this way, also clean up the output to not emit an extraneous empty string at the end of the string table, and update the `StringTable` class to not look for that. It isn't actually used by anything and is wasteful. This PR also switches the `IntrinsicsImpl.inc` string tables over to the new `StringTable` runtime abstraction. I didn't want to do this until landing the MSVC workaround in case it caused even this example to start hitting the MSVC bug, but I wanted to switch here so that I could simplify the API for emitting the string table with the workaround present. With the two different emission strategies, its important to use a very exact syntax and that seems better encapsulated in the API. Last but not least, the `SDNodeInfoEmitter` is updated, including its tests to match the new output. This PR should unblock landing https://github.com/llvm/llvm-project/pull/120534 and letting us switch all of Clang's builtins to use string tables. That PR has all the details motivating the overall effort. Follow-up patches will try to consolidate the remaining users onto the single interface, but those at least were easy to separate into follow-ups and keep this PR somewhat smaller.
LLVM TableGen
The purpose of TableGen is to generate complex output files based on information from source files that are significantly easier to code than the output files would be, and also easier to maintain and modify over time.
The information is coded in a declarative style involving classes and records, which are then processed by TableGen.
class Hello <string _msg> {
string msg = !strconcat("Hello ", _msg);
}
def HelloWorld: Hello<"world!"> {}
------------- Classes -----------------
class Hello<string Hello:_msg = ?> {
string msg = !strconcat("Hello ", Hello:_msg);
}
------------- Defs -----------------
def HelloWorld { // Hello
string msg = "Hello world!";
}
Try this example on Compiler Explorer.
The internalized records are passed on to various backends, which extract information from a subset of the records and generate one or more output files.
These output files are typically .inc files for C++, but may be any type of file that the backend developer needs.
Resources for learning the language:
- TableGen Overview
- Programmer's reference guide
- Tutorial
- Tools for Learning LLVM TableGen
- Lessons in TableGen (video), slides
- Improving Your TableGen Descriptions (video), slides
Writing TableGen backends:
- TableGen Backend Developer's Guide
- How to write a TableGen backend (video), slides, also available as a notebook.
TableGen in MLIR:
Useful tools: