138 Commits

Author SHA1 Message Date
Sergei Barannikov
ec860d1b87
[TableGen][DecoderEmitter] Refactor emitTableEntries (NFCI) (#155100)
* Inline two small functions so that `emitTableEntries()` calls itself
directly rather than through other functions.
* Peel the last iteration of the loop as it is special.

This should make the code easier to follow.
2025-08-24 12:39:19 +03:00
Sergei Barannikov
6ae0d9591e
[TableGen][DecoderEmitter] Print the size of the decoder tables (#155139)
So we can see the changes in table sizes after making changes to
DecoderEmitter by simply running `grep DecoderTable`.

Also, remove an unnecessary terminating 0 from the end of the tables.
2025-08-24 09:09:31 +03:00
Sergei Barannikov
1ab3042318 [TableGen][DecoderEmitter] Fix indentation in generated code (NFC)
`MCD::OPC_SoftFail` case in the generated `decodeInstruction()` was
overindented, except for the closing brace, which was underindented.
2025-08-24 07:17:34 +03:00
Sergei Barannikov
ee55efc711
[TableGen][DecoderEmitter] Repurpose Filter class (#155065)
There was a lot of confusion about the responsibilities of Filter and
FilterChooser. They created instances of each other and called each
other's methods. Some of the methods had similar names and did similar
things.

This change moves most of the Filter members to FilterChooser and turns
Filter into a supplementary class with short lifetime. FilterChooser
constructs an array of (candidate) Filters, chooses the best performing
one, and applies it to the given set of encodings, creating inferior
FilterChoosers as necessary. The Filter array is then destroyed. All
responsibility for generating the decoder table now lies with
FilterChooser.
2025-08-23 09:01:24 +03:00
Sergei Barannikov
68964f5dad [TableGen][DecoderEmitter] Small refactoring (NFC)
A few changes extracted from #155065 to make it smaller.
2025-08-23 06:39:45 +03:00
Sergei Barannikov
98262e5bfe
[TableGen][DecoderEmitter] Fix broken AdditionalEncoding support (#155057)
We didn't have tests for AdditionalEncoding and none of the in-tree
targets use this functionality, so I inadvertently broke it in #154288.
2025-08-23 02:48:59 +00:00
Sergei Barannikov
8aba413497
[TableGen][DecoderEmitter] Extract a couple of methods (NFC) (#155044)
Extract `findBestFilter() const` searching for the best filter and
move calls to `recurse()` out of it to a single place.

Extract `dump()` as well; it is useful for debugging.
2025-08-22 23:21:45 +00:00
Sergei Barannikov
539259d6e3 [TableGen][DecoderEmitter] Remove unused move constructor (NFC)
Also delete no-op destructor declaration.
2025-08-23 02:00:43 +03:00
Sergei Barannikov
4028896d4b
[TableGen][DecoderEmitter] Move a function to InstructionEncoding (NFC) (#155038) 2025-08-22 22:37:15 +00:00
Sergei Barannikov
0d6ca2f969
[TableGen][DecoderEmitter] Fix decoder reading bytes past instruction (#154916)
See the added test. Before this change the decoder would first read
the second byte, despite the fact that there are 1-byte instructions
that could match:

```
/* 0 */       MCD::OPC_ExtractField, 8, 8,  // Inst{15-8} ...
/* 3 */       MCD::OPC_FilterValue, 0, 4, 0, // Skip to: 11
/* 7 */       MCD::OPC_Decode, 186, 2, 0, // Opcode: I16_0, DecodeIdx: 0
/* 11 */      MCD::OPC_FilterValue, 1, 4, 0, // Skip to: 19
/* 15 */      MCD::OPC_Decode, 187, 2, 0, // Opcode: I16_1, DecodeIdx: 0
/* 19 */      MCD::OPC_FilterValue, 2, 4, 0, // Skip to: 27
/* 23 */      MCD::OPC_Decode, 188, 2, 0, // Opcode: I16_2, DecodeIdx: 0
/* 27 */      MCD::OPC_ExtractField, 0, 1,  // Inst{0} ...
/* 30 */      MCD::OPC_FilterValue, 0, 4, 0, // Skip to: 38
/* 34 */      MCD::OPC_Decode, 189, 2, 1, // Opcode: I8_0, DecodeIdx: 1
/* 38 */      MCD::OPC_FilterValueOrFail, 1,
/* 40 */      MCD::OPC_Decode, 190, 2, 1, // Opcode: I8_1, DecodeIdx: 1
/* 44 */      MCD::OPC_Fail,
```

There are no changes in the generated files. The only in-tree target
that uses variable length decoder is M68k, which is free of decoding
conflicts that could result in the decoder doing OOB access.

This also fixes the misaligned "Decoding Conflict" dump;
prettified example output is shown in the second test.
2025-08-23 00:51:47 +03:00
Sergei Barannikov
6a7ade03d1
[TableGen][DecoderEmitter] Remove redundant variable (NFC) (#154880)
`NumFiltered` is the number of elements in all vectors in a map.
It is only ever compared to 1, which is equivalent to checking if the map
contains exactly one vector with exactly one element.
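
The equivalence can be illustrated with a minimal sketch. `FilteredMap`, `countFiltered` and `hasSingleFilteredEncoding` are hypothetical names, not the ones used in DecoderEmitter, and the sketch assumes the map never holds an empty vector:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in: field value -> IDs of encodings with that value.
using FilteredMap = std::map<uint64_t, std::vector<unsigned>>;

// Counter style: total number of elements in all vectors of the map.
inline std::size_t countFiltered(const FilteredMap &M) {
  std::size_t N = 0;
  for (const auto &[Value, IDs] : M) {
    (void)Value;
    assert(!IDs.empty() && "assumption of this sketch: no empty vectors");
    N += IDs.size();
  }
  return N;
}

// Direct style: exactly one vector holding exactly one element.
inline bool hasSingleFilteredEncoding(const FilteredMap &M) {
  return M.size() == 1 && M.begin()->second.size() == 1;
}
```

Under the stated assumption, `countFiltered(M) == 1` and `hasSingleFilteredEncoding(M)` agree for every map, so the running counter carries no extra information.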
2025-08-22 04:42:06 +00:00
Sergei Barannikov
418fb50301
[TableGen][DecoderEmitter] Calculate encoding bits once (#154026)
Parse the `Inst` and `SoftField` fields once and store them in
`InstructionEncoding` so that we don't parse them every time
`getMandatoryEncodingBits()` is called.
2025-08-22 05:19:35 +03:00
Rahul Joshi
4eeeb8a01e
[NFC][MC][Decoder] Fix off-by-one indentation in generated code (#154855) 2025-08-21 17:20:05 -07:00
Sergei Barannikov
c74afaac6c
[TableGen][DecoderEmitter] Use KnownBits for filters/encodings (NFCI) (#154691)
`KnownBits` is faster and smaller than `std::vector<BitValue>`.
It is also more convenient to use.
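
As a rough illustration of why a pair of bit masks is more compact than a vector of per-bit values, here is a simplified stand-in limited to 64 bits. `llvm::KnownBits` itself stores `Zero` and `One` as `APInt`; the `SimpleKnownBits` type and the helper functions below are hypothetical and exist only for this sketch:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Simplified stand-in for llvm::KnownBits, limited to 64 bits.
struct SimpleKnownBits {
  uint64_t Zero = 0; // bits known to be 0
  uint64_t One = 0;  // bits known to be 1

  bool isKnown(unsigned Bit) const { return ((Zero | One) >> Bit) & 1; }
};

// Parse an MSB-first pattern: '0' and '1' are known bits,
// any other character marks an unknown bit.
inline SimpleKnownBits parseBits(const std::string &Pattern) {
  assert(Pattern.size() <= 64 && "sketch supports at most 64 bits");
  SimpleKnownBits KB;
  for (char C : Pattern) {
    KB.Zero <<= 1;
    KB.One <<= 1;
    if (C == '0')
      KB.Zero |= 1;
    else if (C == '1')
      KB.One |= 1;
  }
  return KB;
}

// Two encodings can be told apart if some bit is known to be 0 in one
// of them and known to be 1 in the other.
inline bool areDistinguishable(const SimpleKnownBits &A,
                               const SimpleKnownBits &B) {
  return (A.Zero & B.One) != 0 || (A.One & B.Zero) != 0;
}
```

With this representation, questions such as "is this bit known?" or "do these two encodings differ in a known bit?" become a few bitwise operations instead of a loop over a vector.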
2025-08-22 01:37:47 +03:00
Sergei Barannikov
33f6b10c17
[TableGen][DecoderEmitter] Resolve a FIXME in emitDecoder (#154649)
As the FIXME says, we might generate the wrong code to decode an
instruction if it had an operand with no encoding bits. An example is
M68k's `MOV16ds` that is defined as follows:

```
dag OutOperandList = (outs MxDRD16:$dst);
dag InOperandList = (ins SRC:$src);
list<Register> Uses = [SR];
string AsmString = "move.w\t$src, $dst";
dag Inst = (descend { 0, 1, 0, 0, 0, 0, 0, 0, 1, 1 },
            (descend { 0, 0, 0 }, (operand "$dst", 3)));
```

The `$src` operand is not encoded, but what we see in the decoder is:
```C++
    tmp = fieldFromInstruction(insn, 0, 3);
    if (!Check(S, DecodeDR16RegisterClass(MI, tmp, Address, Decoder)))
    { return MCDisassembler::Fail; }
    if (!Check(S, DecodeSRCRegisterClass(MI, insn, Address, Decoder)))
    { return MCDisassembler::Fail; }
    return S;
```

This calls DecodeSRCRegisterClass passing it `insn` instead of the value
of a field that doesn't exist. DecodeSRCRegisterClass has an
unconditional llvm_unreachable inside it.

The new decoder looks like:
```C++
    tmp = fieldFromInstruction(insn, 0, 3);
    if (!Check(S, DecodeDR16RegisterClass(MI, tmp, Address, Decoder)))
    { return MCDisassembler::Fail; }
    return S;
```

We're still not disassembling this instruction right, but at least we no
longer have to provide a weird operand decoder method that accepts
instruction bits instead of operand bits.

See #154477 for the origins of the FIXME.
2025-08-21 22:22:16 +00:00
Rahul Joshi
22f8693248
[NFC][MC][Decoder] Extract fixed pieces of decoder code into new header file (#154802)
Extract fixed functions generated by decoder emitter into a new
MCDecoder.h header.
2025-08-21 15:06:43 -07:00
Sergei Barannikov
2421929ca6
[TableGen][DecoderEmitter] Infer encoding's HasCompleteDecoder earlier (NFCI) (#154644)
If an encoding has a custom decoder, the decoder is assumed to be
"complete" (always succeed) if hasCompleteDecoder field is true. We
determine this when constructing InstructionEncoding.

If the decoder for an encoding is *generated*, it always succeeds if
none of the operand decoders can fail. The latter is determined based on
the value of operands' DecoderMethod/hasCompleteDecoder. This happens
late, at table construction time, making the code harder to follow.

This change moves this logic to the InstructionEncoding constructor.
2025-08-21 21:35:30 +00:00
Sergei Barannikov
b96d5c2452
[TableGen][DecoderEmitter] Outline InstructionEncoding constructor (NFC) (#154673)
It is going to grow, so it makes sense to move its definition
out of class. Instead, inline `populateInstruction()` into it.
Also, rename a couple of methods to better convey their meaning.
2025-08-21 06:08:57 +00:00
Sergei Barannikov
46343ca374
[TableGen][DecoderEmitter] Add DecoderMethod to InstructionEncoding (NFC) (#154477)
We used to abuse Operands list to store instruction encoding's
DecoderMethod there. Let's store it in the InstructionEncoding class
instead, where it belongs.
2025-08-20 21:59:59 +00:00
Sergei Barannikov
19ac1ff56e
[TableGen][DecoderEmitter] Factor populateFixedLenEncoding (NFC) (#154511)
Also drop the debug code under `#if 0` and a seemingly outdated comment.
2025-08-20 11:34:59 +00:00
Sergei Barannikov
9ae0bd2c9f
[TableGen][DecoderEmitter] Move Operands to InstructionEncoding (NFCI) (#154456)
This is where they belong; there is no need to maintain a separate map
keyed by encoding ID.
`populateInstruction()` has been made a member of `InstructionEncoding`
and is now called from the constructor.
2025-08-20 07:10:34 +03:00
Sergei Barannikov
8666ffdd15 [TableGen][DecoderEmitter] Rename some variables (NFC)
And change references to pointers, to make the future diff smaller.
2025-08-20 04:55:07 +03:00
Sergei Barannikov
6462223853 [TableGen] Make ParseOperandName method const (NFC)
Also change its name to start with a lowercase letter and update
the doxygen comment to conform to the coding standard.
2025-08-20 03:21:15 +03:00
Sergei Barannikov
803edce6f7
[TableGen][DecoderEmitter] Analyze encodings once (#154309)
Follow-up to #154288.

With HwModes involved, we used to analyze the same encoding multiple
times (unless `-suppress-per-hwmode-duplicates=O2` is specified). This
affected the build time and made the statistics inaccurate.

From the point of view of the generated code, this is an NFC.
2025-08-19 23:17:12 +00:00
Sergei Barannikov
07a6323c32
[TableGen][DecoderEmitter] Turn EncodingAndInst into a class (NFC) (#154230)
The class will get more methods in follow-up patches.
2025-08-20 01:29:26 +03:00
Sergei Barannikov
56ce40bc73
[TableGen][DecoderEmitter] Stop duplicating encodings (NFC) (#154288)
When HwModes are involved, we can duplicate an instruction encoding that
does not belong to any HwMode multiple times. We can do better by
mapping HwMode to a list of encoding IDs it contains. (That is,
duplicate IDs instead of encodings.)

The encodings that were duplicated are still processed multiple times
(e.g., we call an expensive populateInstruction() on each instance).
This is going to be fixed in subsequent patches.
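
The idea of duplicating IDs instead of encodings could look roughly like the sketch below. `EncodingSketch` and `groupEncodingIDsByHwMode` are hypothetical names and the record is heavily simplified; this is not the DecoderEmitter code:

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified encoding record. HwMode is empty when the
// encoding does not belong to any particular HwMode.
struct EncodingSketch {
  std::string Name;
  std::optional<unsigned> HwMode;
};

// Map each HwMode ID to the IDs (indices) of the encodings it contains.
// Mode-independent encodings are shared by ID rather than copied.
inline std::map<unsigned, std::vector<unsigned>>
groupEncodingIDsByHwMode(const std::vector<EncodingSketch> &Encodings,
                         const std::vector<unsigned> &HwModeIDs) {
  std::map<unsigned, std::vector<unsigned>> IDsByMode;
  for (unsigned Mode : HwModeIDs)
    IDsByMode.try_emplace(Mode); // every mode gets an entry, even if empty
  for (unsigned ID = 0; ID != Encodings.size(); ++ID) {
    const EncodingSketch &E = Encodings[ID];
    if (E.HwMode) {
      assert(IDsByMode.count(*E.HwMode) && "unknown HwMode ID");
      IDsByMode[*E.HwMode].push_back(ID);
    } else {
      for (auto &Entry : IDsByMode)
        Entry.second.push_back(ID);
    }
  }
  return IDsByMode;
}
```

Each encoding is stored once in `Encodings`; only its small integer ID appears in several per-mode lists.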
2025-08-19 09:02:22 +00:00
Sergei Barannikov
cded128009
[TableGen][DecoderEmitter] Extract encoding parsing into a method (NFC) (#154271)
Call it from the constructor so that we can make `run` method `const`.
Turn a couple of related functions into methods as well.
2025-08-19 06:35:59 +00:00
Sergei Barannikov
6c3a0ab51a [TableGen][DecoderEmitter] Shorten a few variable names (NFC)
These "Numbered"-prefixed names were more confusing than helpful.
2025-08-19 08:05:02 +03:00
Sergei Barannikov
f84ce1e1d0 [TableGen][DecoderEmitter] Extract a couple of loop invariants (NFC) 2025-08-19 07:47:15 +03:00
Sergei Barannikov
c8c2218c00
[TableGen][DecoderEmitter] Synthesize decoder table name in emitTable (#154255)
Previously, HW mode name was appended to decoder namespace name when
enumerating encodings, and then emitTable appended the bit width to it
to form the final table name. Let's do this all in one place.
A nice side effect is that this allows us to avoid having to deal with
std::string.

The changes in the tests are caused by the different order of tables.
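
A minimal sketch of composing the table name in one place by streaming the pieces, so that no intermediate string has to be built by callers. The exact name format (in particular the `_` separator before the HW mode name) is an assumption made for illustration, and `emitTableName` / `tableName` are hypothetical helpers:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <string_view>

// Stream "DecoderTable" + namespace + optional "_" + HW mode name + width.
// The format is an assumption of this sketch, not taken from the commit.
inline void emitTableName(std::ostream &OS, std::string_view Namespace,
                          std::string_view HwModeName, unsigned BitWidth) {
  assert(BitWidth != 0 && "a decoder table needs a non-zero bit width");
  OS << "DecoderTable" << Namespace;
  if (!HwModeName.empty())
    OS << '_' << HwModeName;
  OS << BitWidth;
}

// Convenience wrapper for experimenting with the sketch.
inline std::string tableName(std::string_view Namespace,
                             std::string_view HwModeName, unsigned BitWidth) {
  std::ostringstream OS;
  emitTableName(OS, Namespace, HwModeName, BitWidth);
  return OS.str();
}
```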
2025-08-19 06:19:54 +03:00
Sergei Barannikov
61a859bf6f Use llvm::copy instead of append_range to work around MacOS build failure 2025-08-19 01:43:22 +03:00
Sergei Barannikov
0cd4ae9be0
Reland "[TableGen][DecoderEmitter] Store HW mode ID instead of name (NFC) (#154052)" (#154212)
This reverts commit 5612dc533a9222a0f5561b2ba7c897115f26673f.

Reland with MacOS build fixed.
2025-08-18 22:28:20 +00:00
Shubham Sandeep Rastogi
5612dc533a Revert "[TableGen][DecoderEmitter] Store HW mode ID instead of name (NFC) (#154052)"
This reverts commit b20bbd48e8b1966731a284b4208e048e060e97c2.

Reverted due to greendragon failures:

20:34:43  In file included from /Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/llvm/utils/TableGen/DecoderEmitter.cpp:14:
20:34:43  In file included from /Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/llvm/utils/TableGen/Common/CodeGenHwModes.h:14:
20:34:43  In file included from /Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/llvm/include/llvm/ADT/DenseMap.h:20:
20:34:43  In file included from /Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/llvm/include/llvm/ADT/STLExtras.h:21:
20:34:43  In file included from /Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/llvm/include/llvm/ADT/Hashing.h:53:
20:34:43  In file included from /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/algorithm:1913:
20:34:43  In file included from /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/chrono:746:
20:34:43  In file included from /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/__chrono/convert_to_tm.h:19:
20:34:43  In file included from /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/__chrono/statically_widen.h:17:
20:34:43  In file included from /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/__format/concepts.h:17:
20:34:43  In file included from /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/__format/format_parse_context.h:15:
20:34:43  In file included from /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/string_view:1027:
20:34:43  In file included from /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/functional:515:
20:34:43  In file included from /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/__functional/boyer_moore_searcher.h:26:
20:34:43  /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/vector:1376:19: error: object of type 'llvm::const_set_bits_iterator_impl<llvm::SmallBitVector>' cannot be assigned because its copy assignment operator is implicitly deleted
20:34:43              __mid =  __first;
20:34:43                    ^
20:34:43  /Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/llvm/utils/TableGen/DecoderEmitter.cpp:2404:13: note: in instantiation of function template specialization 'std::vector<unsigned int>::assign<llvm::const_set_bits_iterator_impl<llvm::SmallBitVector>, 0>' requested here
20:34:43    HwModeIDs.assign(BV.set_bits_begin(), BV.set_bits_end());
20:34:43              ^
20:34:43  /Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/llvm/include/llvm/ADT/BitVector.h:35:21: note: copy assignment operator of 'const_set_bits_iterator_impl<llvm::SmallBitVector>' is implicitly deleted because field 'Parent' is of reference type 'const llvm::SmallBitVector &'
20:34:43    const BitVectorT &Parent;
20:34:43                      ^
20:34:43  1 warning and 1 error generated.
2025-08-18 14:36:54 -07:00
Sergei Barannikov
13dd65096b [TableGen][DecoderEmitter] Rename some variables for clarity (NFC) 2025-08-19 00:16:56 +03:00
Sergei Barannikov
b20bbd48e8
[TableGen][DecoderEmitter] Store HW mode ID instead of name (NFC) (#154052)
This simplifies code a bit.
2025-08-18 22:53:09 +03:00
Sergei Barannikov
bad02e38c8
[TableGen][DecoderEmitter] Avoid using a sentinel value (#153986)
`NO_FIXED_SEGMENTS_SENTINEL` has a value that is actually a valid field
encoding and so it cannot be used as a sentinel.
Replace the sentinel with a new member variable, `VariableFC`, that
contains the value previously stored in `FilterChooserMap` with
`NO_FIXED_SEGMENTS_SENTINEL` key.
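
The general pattern can be sketched as follows. `FilterSketch` and `ChooserID` are hypothetical simplifications (the real map stores FilterChooser objects); only the names `FilterChooserMap` and `VariableFC` come from the commit message:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical handle standing in for a sub-FilterChooser.
using ChooserID = unsigned;

struct FilterSketch {
  // Keyed by the actual field value; every uint64_t is a legal key,
  // so no key can safely be reserved as a sentinel.
  std::map<uint64_t, ChooserID> FilterChooserMap;
  // Chooser for encodings that have no fixed bits in the field.
  std::optional<ChooserID> VariableFC;

  void addFixed(uint64_t FieldValue, ChooserID ID) {
    bool Inserted = FilterChooserMap.emplace(FieldValue, ID).second;
    assert(Inserted && "duplicate field value");
    (void)Inserted;
  }

  void setVariable(ChooserID ID) {
    assert(!VariableFC && "variable chooser already set");
    VariableFC = ID;
  }
};
```

Because the "variable" entry lives outside the map, a field value equal to any would-be sentinel can coexist with it without ambiguity.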
2025-08-18 08:25:17 +03:00
Sergei Barannikov
9ddc043538 [TableGen] Use structured binding in one more place (NFC) 2025-08-18 06:15:44 +03:00
Sergei Barannikov
6947fb4556 [TableGen] Use structured binding in one place (NFC) 2025-08-17 23:50:23 +03:00
Sergei Barannikov
a10773c864
[TableGen][DecoderEmitter] Remove EncodingIDAndOpcode struct (NFC) (#154028)
Most of the time we don't need the instruction opcode. There is no need to
carry it around all the time; we can easily get it by other means.
Rename affected variables accordingly.

Part of an effort to simplify DecoderEmitter code.
2025-08-17 20:13:48 +00:00
Sergei Barannikov
ea4325f174
[TableGen][DecoderEmitter] Improve conflicts dump (#154001)
* Print filter stack in non-reversed order.
* Print encoding name to the right of encoding bits to deal with
alignment issues.
* Use the correct bit width when printing encoding bits.

Example of old output:
```
		01000100........
		01000...........
		0100............
		................
	tADDhirr 000000000000000001000100________
	tADDrSP 000000000000000001000100_1101___
	tADDspr 0000000000000000010001001____101
```

New output:
```
    ................
    0100............
    01000...........
    01000100........
    01000100________  tADDhirr
    01000100_1101___  tADDrSP
    010001001____101  tADDspr
```
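
A formatter producing this layout might be sketched as below. `formatConflict` is a hypothetical helper, not the function used in DecoderEmitter; it assumes the filter stack is passed outermost first and that every pattern already has `BitWidth` characters:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Print the filter stack (outermost filter first), then each conflicting
// encoding as "<bits>  <name>" so that the bit columns line up regardless
// of name length.
inline std::string formatConflict(
    const std::vector<std::string> &FilterStack,
    const std::vector<std::pair<std::string, std::string>> &Encodings,
    unsigned BitWidth) {
  const std::string Indent(4, ' ');
  std::string Out;
  for (const std::string &Filter : FilterStack) {
    assert(Filter.size() == BitWidth && "filter has the wrong width");
    Out += Indent + Filter + '\n';
  }
  for (const auto &[Bits, Name] : Encodings) {
    assert(Bits.size() == BitWidth && "encoding has the wrong width");
    Out += Indent + Bits + "  " + Name + '\n';
  }
  return Out;
}
```

Putting the variable-length name after the fixed-width bit string is what removes the alignment problem visible in the old output.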
2025-08-17 06:42:25 +00:00
Sergei Barannikov
05f1673e75 [TableGen] Make a function static (NFC)
Also, modernize the return value to std::optional.
2025-08-17 09:31:28 +03:00
Sergei Barannikov
05827e7ccb [TableGen][DecoderEmitter] Dump conflicts earlier
Dump a conflict as soon as we discover it; there is no need to wait
until we start building the decoder table.
This improves the debugging experience.
2025-08-17 08:20:31 +03:00
Sergei Barannikov
fc6024d895
[TableGen][DecoderEmitter] Shrink lifetime of Filters vector (NFC) (#153998)
Only one element of the `Filters` vector (see `BestIndex`) is used
outside the method that fills it. Localize the vector to the method,
replacing the member variable with the only used element.

Part of an effort to simplify DecoderEmitter code.
2025-08-17 04:02:16 +00:00
Sergei Barannikov
7bb73455f7
[TableGen][DecoderEmitter] Add helpers for working with scopes (NFC) (#153979)
Part of an effort to simplify DecoderEmitter code.
2025-08-16 21:49:17 +00:00
Sergei Barannikov
3acb679bda [TableGen] Remove redundant variable (NFC) 2025-08-16 23:11:53 +03:00
Sergei Barannikov
56681c94f3
[TableGen][DecoderEmitter] Compute bit attribute once (NFC) (#153530)
Pull the logic to compute bit attributes from `filterProcessor()` to its
caller to avoid recomputing them on the second call.
2025-08-15 13:28:38 +03:00
Sergei Barannikov
a73403ba8a [TableGen] Use empty() instead of size() == 0 (NFC) 2025-08-14 06:36:24 +03:00
Sergei Barannikov
6abb6264ea [TableGen] Declare loop induction variables in the loop header (NFC) 2025-08-14 05:48:16 +03:00
Sergei Barannikov
8f3254aa4a
[TableGen][DecoderEmitter] Return insn_t / std::vector<Islands> by value (NFC) (#153354)
The containers passed by reference are always empty on entry to the
functions that fill them. Return them by value instead and let the
compiler do the return value optimization.
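
The two calling conventions can be compared in a small sketch. `IslandSketch`, `getIslands` and `getIslandsOutParam` are hypothetical; here an "island" is simply a maximal run of `'0'`/`'1'` characters in a pattern string, indexed by string position:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical island: a maximal run of known ('0' or '1') characters.
struct IslandSketch {
  unsigned Start;
  unsigned NumBits;
};

// Return-by-value style: the result is built locally and returned.
inline std::vector<IslandSketch> getIslands(const std::string &Pattern) {
  std::vector<IslandSketch> Islands;
  unsigned I = 0, E = static_cast<unsigned>(Pattern.size());
  while (I != E) {
    if (Pattern[I] != '0' && Pattern[I] != '1') {
      ++I;
      continue;
    }
    unsigned Start = I;
    while (I != E && (Pattern[I] == '0' || Pattern[I] == '1'))
      ++I;
    Islands.push_back({Start, I - Start});
  }
  return Islands; // eligible for NRVO; otherwise moved, not copied
}

// Out-parameter style, kept for comparison: the caller has to provide
// an empty vector, which is an implicit precondition.
inline void getIslandsOutParam(const std::string &Pattern,
                               std::vector<IslandSketch> &Islands) {
  assert(Islands.empty() && "out-parameter must be empty on entry");
  Islands = getIslands(Pattern);
}
```

Returning by value removes the "must be empty on entry" precondition and lets the call site read as `auto Islands = getIslands(Pattern);`.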
2025-08-13 07:09:13 +00:00
Sergei Barannikov
1ffc38ca49
[TableGen][DecoderEmitter] Remove unused variables (NFC) (#153262) 2025-08-12 20:21:01 +00:00