Chandler Carruth f4de28a63c
[StrTable] Switch intrinsics to StringTable and work around MSVC (#123548)
Historically, the main example of *very* large string tables used the
`EmitCharArray` to work around MSVC limitations with string literals,
but that was switched (without removing the API) in order to consolidate
on a nicer emission primitive.

While this large string table in `IntrinsicsImpl.inc` seems to compile
correctly on MSVC without the work around in `EmitCharArray` (and that
this PR adds back to the nicer emission path), other users have
repeatedly hit this MSVC limitation as you can see in the discussion on
PR https://github.com/llvm/llvm-project/pull/120534. This PR teaches the
string offset table emission to look at
the size of the table and switch to the char array emission strategy
when the table becomes too large.

This work around does have the downside of making compile times worse
for large string tables, but that appears unavoidable until we can
identify known good MSVC versions and switch to requiring them for all
LLVM users. It also reduces searchability of the generated string table
-- I looked at emitting a comment with each string but it is tricky
because the escaping rules for an inline comment are different from
those of of a string literal, and there's no real way to turn the string
literal into a comment.

While improving the output in this way, also clean up the output to not
emit an extraneous empty string at the end of the string table, and
update the `StringTable` class to not look for that. It isn't actually
used by anything and is wasteful.

This PR also switches the `IntrinsicsImpl.inc` string tables over to the
new `StringTable` runtime abstraction. I didn't want to do this until
landing the MSVC workaround in case it caused even this example to start
hitting the MSVC bug, but I wanted to switch here so that I could
simplify the API for emitting the string table with the workaround
present. With the two different emission strategies, its important to
use a very exact syntax and that seems better encapsulated in the API.

Last but not least, the `SDNodeInfoEmitter` is updated, including its
tests to match the new output.

This PR should unblock landing
https://github.com/llvm/llvm-project/pull/120534 and letting us switch
all of
Clang's builtins to use string tables. That PR has all the details
motivating the overall effort.

Follow-up patches will try to consolidate the remaining users onto the
single interface, but those at least were easy to separate into
follow-ups and keep this PR somewhat smaller.
2025-01-28 00:17:04 -08:00
..
2023-02-17 00:32:46 +09:00

LLVM TableGen

The purpose of TableGen is to generate complex output files based on information from source files that are significantly easier to code than the output files would be, and also easier to maintain and modify over time.

The information is coded in a declarative style involving classes and records, which are then processed by TableGen.

class Hello <string _msg> {
  string msg = !strconcat("Hello ", _msg);
}

def HelloWorld: Hello<"world!"> {}
------------- Classes -----------------
class Hello<string Hello:_msg = ?> {
  string msg = !strconcat("Hello ", Hello:_msg);
}
------------- Defs -----------------
def HelloWorld {        // Hello
  string msg = "Hello world!";
}

Try this example on Compiler Explorer.

The internalized records are passed on to various backends, which extract information from a subset of the records and generate one or more output files.

These output files are typically .inc files for C++, but may be any type of file that the backend developer needs.

Resources for learning the language:

Writing TableGen backends:

TableGen in MLIR:

Useful tools: