126 Commits

Author SHA1 Message Date
Rahul Joshi
4eeeb8a01e
[NFC][MC][Decoder] Fix off-by-one indentation in generated code (#154855) 2025-08-21 17:20:05 -07:00
Sergei Barannikov
c74afaac6c
[TableGen][DecoderEmitter] Use KnownBits for filters/encodings (NFCI) (#154691)
`KnownBits` is faster and smaller than `std::vector<BitValue>`.
It is also more convenient to use.
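
As a rough illustration of why a pair of bit masks can be more compact than a vector of per-bit values, here is a minimal sketch. `SimpleKnownBits`, `parsePattern`, and `mayOverlap` are hypothetical names invented for this example; they are not `llvm::KnownBits` and do not reproduce its API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a known-bits value (not llvm::KnownBits).
// A bit is known-0 if set in Zero, known-1 if set in One, otherwise unknown.
struct SimpleKnownBits {
  uint32_t Zero = 0;
  uint32_t One = 0;
  unsigned Width = 0;
};

// Parse a pattern such as "0100_1__", most significant bit first.
// Any character other than '0' or '1' is treated as an unknown bit.
inline SimpleKnownBits parsePattern(const std::string &Pattern) {
  assert(Pattern.size() <= 32 && "sketch supports at most 32 bits");
  SimpleKnownBits KB;
  KB.Width = static_cast<unsigned>(Pattern.size());
  for (unsigned I = 0; I < KB.Width; ++I) {
    unsigned Bit = KB.Width - 1 - I;
    if (Pattern[I] == '0')
      KB.Zero |= uint32_t(1) << Bit;
    else if (Pattern[I] == '1')
      KB.One |= uint32_t(1) << Bit;
  }
  return KB;
}

// Two patterns can match the same instruction word unless some bit is
// known-0 in one of them and known-1 in the other.
inline bool mayOverlap(const SimpleKnownBits &A, const SimpleKnownBits &B) {
  return ((A.Zero & B.One) | (A.One & B.Zero)) == 0;
}
```

With this representation, a check across all bits is a couple of bitwise operations rather than a loop over a vector.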
2025-08-22 01:37:47 +03:00
Sergei Barannikov
33f6b10c17
[TableGen][DecoderEmitter] Resolve a FIXME in emitDecoder (#154649)
As the FIXME says, we might generate the wrong code to decode an
instruction if it had an operand with no encoding bits. An example is
M68k's `MOV16ds` that is defined as follows:

```
dag OutOperandList = (outs MxDRD16:$dst);
dag InOperandList = (ins SRC:$src);
list<Register> Uses = [SR];
string AsmString = "move.w\t$src, $dst";
dag Inst = (descend { 0, 1, 0, 0, 0, 0, 0, 0, 1, 1 },
            (descend { 0, 0, 0 }, (operand "$dst", 3)));
```

The `$src` operand is not encoded, but what we see in the decoder is:
```C++
    tmp = fieldFromInstruction(insn, 0, 3);
    if (!Check(S, DecodeDR16RegisterClass(MI, tmp, Address, Decoder)))
    { return MCDisassembler::Fail; }
    if (!Check(S, DecodeSRCRegisterClass(MI, insn, Address, Decoder)))
    { return MCDisassembler::Fail; }
    return S;
```

This calls DecodeSRCRegisterClass passing it `insn` instead of the value
of a field that doesn't exist. DecodeSRCRegisterClass has an
unconditional llvm_unreachable inside it.

The new decoder looks like:
```C++
    tmp = fieldFromInstruction(insn, 0, 3);
    if (!Check(S, DecodeDR16RegisterClass(MI, tmp, Address, Decoder)))
    { return MCDisassembler::Fail; }
    return S;
```

We're still not disassembling this instruction right, but at least we no
longer have to provide a weird operand decoder method that accepts
instruction bits instead of operand bits.
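
For readers unfamiliar with the helper used in the snippets above, the following is a simplified sketch of what a field extractor of this shape does. It is an assumption-laden stand-in fixed to 32-bit words, not the code the emitter actually generates.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for a field extractor (assumption: 32-bit instruction
// words only). Returns NumBits bits of Insn starting at bit StartBit.
inline uint32_t fieldFromInstruction(uint32_t Insn, unsigned StartBit,
                                     unsigned NumBits) {
  assert(NumBits > 0 && StartBit + NumBits <= 32 && "field out of range");
  // Avoid shifting by the full width, which would be undefined behavior.
  uint32_t Mask = NumBits == 32 ? ~uint32_t(0) : (uint32_t(1) << NumBits) - 1;
  return (Insn >> StartBit) & Mask;
}
```

An operand decoder is meant to receive a value like this, which is why passing the whole `insn` to `DecodeSRCRegisterClass` for an operand with no encoding bits was suspicious.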

See #154477 for the origins of the FIXME.
2025-08-21 22:22:16 +00:00
Rahul Joshi
22f8693248
[NFC][MC][Decoder] Extract fixed pieces of decoder code into new header file (#154802)
Extract fixed functions generated by decoder emitter into a new
MCDecoder.h header.
2025-08-21 15:06:43 -07:00
Sergei Barannikov
2421929ca6
[TableGen][DecoderEmitter] Infer encoding's HasCompleteDecoder earlier (NFCI) (#154644)
If an encoding has a custom decoder, the decoder is assumed to be
"complete" (always succeed) if hasCompleteDecoder field is true. We
determine this when constructing InstructionEncoding.

If the decoder for an encoding is *generated*, it always succeeds if
none of the operand decoders can fail. The latter is determined based on
the value of operands' DecoderMethod/hasCompleteDecoder. This happens
late, at table construction time, making the code harder to follow.

This change moves this logic to the InstructionEncoding constructor.
2025-08-21 21:35:30 +00:00
Sergei Barannikov
b96d5c2452
[TableGen][DecoderEmitter] Outline InstructionEncoding constructor (NFC) (#154673)
It is going to grow, so it makes sense to move its definition
out of class. Instead, inline `populateInstruction()` into it.
Also, rename a couple of methods to better convey their meaning.
2025-08-21 06:08:57 +00:00
Sergei Barannikov
46343ca374
[TableGen][DecoderEmitter] Add DecoderMethod to InstructionEncoding (NFC) (#154477)
We used to abuse the Operands list to store an instruction encoding's
DecoderMethod. Let's store it in the InstructionEncoding class
instead, where it belongs.
2025-08-20 21:59:59 +00:00
Sergei Barannikov
19ac1ff56e
[TableGen][DecoderEmitter] Factor populateFixedLenEncoding (NFC) (#154511)
Also drop the debug code under `#if 0` and a seemingly outdated comment.
2025-08-20 11:34:59 +00:00
Sergei Barannikov
9ae0bd2c9f
[TableGen][DecoderEmitter] Move Operands to InstructionEncoding (NFCI) (#154456)
This is where they belong, no need to maintain a separate map keyed by
encoding ID.
`populateInstruction()` has been made a member of `InstructionEncoding`
and is now called from the constructor.
2025-08-20 07:10:34 +03:00
Sergei Barannikov
8666ffdd15 [TableGen][DecoderEmitter] Rename some variables (NFC)
And change references to pointers, to make the future diff smaller.
2025-08-20 04:55:07 +03:00
Sergei Barannikov
6462223853 [TableGen] Make ParseOperandName method const (NFC)
Also change its name to start with a lowercase letter and update
the doxygen comment to conform to the coding standard.
2025-08-20 03:21:15 +03:00
Sergei Barannikov
803edce6f7
[TableGen][DecoderEmitter] Analyze encodings once (#154309)
Follow-up to #154288.

With HwModes involved, we used to analyze the same encoding multiple
times (unless `-suppress-per-hwmode-duplicates=O2` is specified). This
affected the build time and made the statistics inaccurate.

From the point of view of the generated code, this is an NFC.
2025-08-19 23:17:12 +00:00
Sergei Barannikov
07a6323c32
[TableGen][DecoderEmitter] Turn EncodingAndInst into a class (NFC) (#154230)
The class will get more methods in follow-up patches.
2025-08-20 01:29:26 +03:00
Sergei Barannikov
56ce40bc73
[TableGen][DecoderEmitter] Stop duplicating encodings (NFC) (#154288)
When HwModes are involved, we can duplicate an instruction encoding that
does not belong to any HwMode multiple times. We can do better by
mapping HwMode to a list of encoding IDs it contains. (That is,
duplicate IDs instead of encodings.)

The encodings that were duplicated are still processed multiple times
(e.g., we call an expensive populateInstruction() on each instance).
This is going to be fixed in subsequent patches.
2025-08-19 09:02:22 +00:00
Sergei Barannikov
cded128009
[TableGen][DecoderEmitter] Extract encoding parsing into a method (NFC) (#154271)
Call it from the constructor so that we can make `run` method `const`.
Turn a couple of related functions into methods as well.
2025-08-19 06:35:59 +00:00
Sergei Barannikov
6c3a0ab51a [TableGen][DecoderEmitter] Shorten a few variable names (NFC)
These "Numbered"-prefixed names were more confusing than helpful.
2025-08-19 08:05:02 +03:00
Sergei Barannikov
f84ce1e1d0 [TableGen][DecoderEmitter] Extract a couple of loop invariants (NFC) 2025-08-19 07:47:15 +03:00
Sergei Barannikov
c8c2218c00
[TableGen][DecoderEmitter] Synthesize decoder table name in emitTable (#154255)
Previously, HW mode name was appended to decoder namespace name when
enumerating encodings, and then emitTable appended the bit width to it
to form the final table name. Let's do this all in one place.
A nice side effect is that this allows us to avoid having to deal with
std::string.

The changes in the tests are caused by the different order of tables.
2025-08-19 06:19:54 +03:00
Sergei Barannikov
61a859bf6f Use llvm::copy instead of append_range to work around MacOS build failure 2025-08-19 01:43:22 +03:00
Sergei Barannikov
0cd4ae9be0
Reland "[TableGen][DecoderEmitter] Store HW mode ID instead of name (NFC) (#154052)" (#154212)
This reverts commit 5612dc533a9222a0f5561b2ba7c897115f26673f.

Reland with MacOS build fixed.
2025-08-18 22:28:20 +00:00
Shubham Sandeep Rastogi
5612dc533a Revert "[TableGen][DecoderEmitter] Store HW mode ID instead of name (NFC) (#154052)"
This reverts commit b20bbd48e8b1966731a284b4208e048e060e97c2.

Reverted due to greendragon failures:

20:34:43  In file included from /Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/llvm/utils/TableGen/DecoderEmitter.cpp:14:
20:34:43  In file included from /Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/llvm/utils/TableGen/Common/CodeGenHwModes.h:14:
20:34:43  In file included from /Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/llvm/include/llvm/ADT/DenseMap.h:20:
20:34:43  In file included from /Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/llvm/include/llvm/ADT/STLExtras.h:21:
20:34:43  In file included from /Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/llvm/include/llvm/ADT/Hashing.h:53:
20:34:43  In file included from /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/algorithm:1913:
20:34:43  In file included from /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/chrono:746:
20:34:43  In file included from /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/__chrono/convert_to_tm.h:19:
20:34:43  In file included from /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/__chrono/statically_widen.h:17:
20:34:43  In file included from /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/__format/concepts.h:17:
20:34:43  In file included from /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/__format/format_parse_context.h:15:
20:34:43  In file included from /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/string_view:1027:
20:34:43  In file included from /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/functional:515:
20:34:43  In file included from /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/__functional/boyer_moore_searcher.h:26:
20:34:43  /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/vector:1376:19: error: object of type 'llvm::const_set_bits_iterator_impl<llvm::SmallBitVector>' cannot be assigned because its copy assignment operator is implicitly deleted
20:34:43              __mid =  __first;
20:34:43                    ^
20:34:43  /Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/llvm/utils/TableGen/DecoderEmitter.cpp:2404:13: note: in instantiation of function template specialization 'std::vector<unsigned int>::assign<llvm::const_set_bits_iterator_impl<llvm::SmallBitVector>, 0>' requested here
20:34:43    HwModeIDs.assign(BV.set_bits_begin(), BV.set_bits_end());
20:34:43              ^
20:34:43  /Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake/llvm-project/llvm/include/llvm/ADT/BitVector.h:35:21: note: copy assignment operator of 'const_set_bits_iterator_impl<llvm::SmallBitVector>' is implicitly deleted because field 'Parent' is of reference type 'const llvm::SmallBitVector &'
20:34:43    const BitVectorT &Parent;
20:34:43                      ^
20:34:43  1 warning and 1 error generated.
2025-08-18 14:36:54 -07:00
Sergei Barannikov
13dd65096b [TableGen][DecoderEmitter] Rename some variables for clarity (NFC) 2025-08-19 00:16:56 +03:00
Sergei Barannikov
b20bbd48e8
[TableGen][DecoderEmitter] Store HW mode ID instead of name (NFC) (#154052)
This simplifies code a bit.
2025-08-18 22:53:09 +03:00
Sergei Barannikov
bad02e38c8
[TableGen][DecoderEmitter] Avoid using a sentinel value (#153986)
`NO_FIXED_SEGMENTS_SENTINEL` has a value that is actually a valid field
encoding and so it cannot be used as a sentinel.
Replace the sentinel with a new member variable, `VariableFC`, that
contains the value previously stored in `FilterChooserMap` with
`NO_FIXED_SEGMENTS_SENTINEL` key.
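
The general idea can be sketched as follows. This is a hypothetical illustration: `ChooserTable`, its members, and the lookup fallback are invented for the example and do not mirror the real `FilterChooserMap` or `VariableFC` types.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical sketch; names do not mirror DecoderEmitter's real classes.
struct ChooserTable {
  // Keyed by concrete field value; every uint64_t is a legal key here,
  // so no key value is free to act as a sentinel.
  std::map<uint64_t, std::string> ByFieldValue;
  // The entry formerly stored under the sentinel key gets its own member.
  std::optional<std::string> Variable;

  // Assumed lookup policy for this sketch: exact match first, then the
  // separate "variable" entry, then nothing.
  const std::string *lookup(uint64_t FieldValue) const {
    auto It = ByFieldValue.find(FieldValue);
    if (It != ByFieldValue.end())
      return &It->second;
    return Variable ? &*Variable : nullptr;
  }
};
```

Because the special entry no longer shares the key space, a field whose value happens to equal the old sentinel cannot collide with it.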
2025-08-18 08:25:17 +03:00
Sergei Barannikov
9ddc043538 [TableGen] Use structured binding in one more place (NFC) 2025-08-18 06:15:44 +03:00
Sergei Barannikov
6947fb4556 [TableGen] Use structured binding in one place (NFC) 2025-08-17 23:50:23 +03:00
Sergei Barannikov
a10773c864
[TableGen][DecoderEmitter] Remove EncodingIDAndOpcode struct (NFC) (#154028)
Most of the time we don't need instruction opcode. There is no need to
carry it around all the time, we can easily get it by other means.
Rename affected variables accordingly.

Part of an effort to simplify DecoderEmitter code.
2025-08-17 20:13:48 +00:00
Sergei Barannikov
ea4325f174
[TableGen][DecoderEmitter] Improve conflicts dump (#154001)
* Print filter stack in non-reversed order.
* Print encoding name to the right of encoding bits to deal with
alignment issues.
* Use the correct bit width when printing encoding bits.

Example of old output:
```
		01000100........
		01000...........
		0100............
		................
	tADDhirr 000000000000000001000100________
	tADDrSP 000000000000000001000100_1101___
	tADDspr 0000000000000000010001001____101
```

New output:
```
    ................
    0100............
    01000...........
    01000100........
    01000100________  tADDhirr
    01000100_1101___  tADDrSP
    010001001____101  tADDspr
```
2025-08-17 06:42:25 +00:00
Sergei Barannikov
05f1673e75 [TableGen] Make a function static (NFC)
Also, modernize the return value to std::optional.
2025-08-17 09:31:28 +03:00
Sergei Barannikov
05827e7ccb [TableGen][DecoderEmitter] Dump conflicts earlier
Dump a conflict as soon as we discover it, no need to wait until
we start building the decoder table.
This improves debugging experience.
2025-08-17 08:20:31 +03:00
Sergei Barannikov
fc6024d895
[TableGen][DecoderEmitter] Shrink lifetime of Filters vector (NFC) (#153998)
Only one element of the `Filters` vector (see `BestIndex`) is used
outside the method that fills it. Localize the vector to the method,
replacing the member variable with the only used element.

Part of an effort to simplify DecoderEmitter code.
2025-08-17 04:02:16 +00:00
Sergei Barannikov
7bb73455f7
[TableGen][DecoderEmitter] Add helpers for working with scopes (NFC) (#153979)
Part of an effort to simplify DecoderEmitter code.
2025-08-16 21:49:17 +00:00
Sergei Barannikov
3acb679bda [TableGen] Remove redundant variable (NFC) 2025-08-16 23:11:53 +03:00
Sergei Barannikov
56681c94f3
[TableGen][DecoderEmitter] Compute bit attribute once (NFC) (#153530)
Pull the logic to compute bit attributes from `filterProcessor()` to its
caller to avoid recomputing them on the second call.
2025-08-15 13:28:38 +03:00
Sergei Barannikov
a73403ba8a [TableGen] Use empty() instead of size() == 0 (NFC) 2025-08-14 06:36:24 +03:00
Sergei Barannikov
6abb6264ea [TableGen] Declare loop induction variables in the loop header (NFC) 2025-08-14 05:48:16 +03:00
Sergei Barannikov
8f3254aa4a
[TableGen][DecoderEmitter] Returns insn_t / std::vector<Islands> by value (NFC) (#153354)
The containers passed by reference are always empty on entry to the
functions that fill them. Return them by value instead and let the
compiler do the return value optimization.
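
A minimal sketch of the pattern, assuming a simplified notion of an "island" as a run of known bits. `Island` and `findIslands` here are hypothetical and much simpler than the emitter's own code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified island: a run of consecutive known bits.
struct Island {
  unsigned StartBit;
  unsigned NumBits;
};

// For simplicity Pattern[0] is bit 0, and '_' marks an unknown bit.
// The result is built locally and returned by value; the compiler can
// elide or move the copy, so no out-parameter is needed.
inline std::vector<Island> findIslands(const std::string &Pattern) {
  std::vector<Island> Islands;
  unsigned I = 0;
  const unsigned N = static_cast<unsigned>(Pattern.size());
  while (I < N) {
    if (Pattern[I] == '_') {
      ++I;
      continue;
    }
    unsigned Start = I;
    while (I < N && Pattern[I] != '_')
      ++I;
    Islands.push_back({Start, I - Start});
  }
  return Islands;
}
```

Callers no longer need to create an empty container first and can write `auto Islands = findIslands(Pattern);` directly.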
2025-08-13 07:09:13 +00:00
Sergei Barannikov
1ffc38ca49
[TableGen][DecoderEmitter] Remove unused variables (NFC) (#153262) 2025-08-12 20:21:01 +00:00
Sergei Barannikov
2f9f92ad01
[TableGen] Use getValueAsOptionalDef to simplify code (NFC) (#153170) 2025-08-12 17:44:01 +03:00
Rahul Joshi
633728f3b5
[NFC][TableGen][DecoderEmitter] Eliminate indent for a few functions (#148718)
Eliminate the `indent` argument for functions which are always called
with `indent(0)`.
2025-07-14 15:23:41 -07:00
Rahul Joshi
23b4f4eb9b
[NFC][TableGen] Change DecoderEmitter insertBits to use integer types only (#147613)
The `insertBits` templated function generated by DecoderEmitter is
called with variable `tmp` of type `TmpType` which is:

```
using TmpType = std::conditional_t<std::is_integral<InsnType>::value, InsnType, uint64_t>;
```

That is, `TmpType` is always an integral type. Change the generated
`insertBits` to be valid only for integer types, and eliminate the
unused `insertBits` function from `DecoderUInt128` in
AMDGPUDisassembler.h

Additionally, drop some of the requirements `InsnType` must support as
they no longer seem to be required.
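
A sketch of what an integer-only `insertBits` might look like is given below. The signature and the OR-into-field behavior are assumptions made for illustration; the generated function may differ in detail.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Illustrative sketch only; the signature and semantics are assumptions.
// ORs the low NumBits bits of Bits into Field at position StartBit.
template <typename IntType>
inline void insertBits(IntType &Field, IntType Bits, unsigned StartBit,
                       unsigned NumBits) {
  // Reject non-integer instruction types at compile time.
  static_assert(std::is_integral<IntType>::value,
                "IntType must be an integral type");
  constexpr unsigned Width = sizeof(IntType) * 8;
  assert(NumBits > 0 && StartBit + NumBits <= Width && "field out of range");
  // Work in uint64_t so that narrow types do not hit promotion surprises.
  const uint64_t Mask =
      NumBits == 64 ? ~uint64_t(0) : (uint64_t(1) << NumBits) - 1;
  Field = static_cast<IntType>(
      static_cast<uint64_t>(Field) |
      ((static_cast<uint64_t>(Bits) & Mask) << StartBit));
}
```

Restricting the template this way means a wide-integer wrapper class no longer has to provide its own `insertBits`.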
2025-07-09 08:56:07 -07:00
Rahul Joshi
5f2e88a125
[NFC][TableGen] Rename CodeGenTarget instruction accessors (#146767)
Rename `getXYZInstructionsByEnumValue()` to just `getXYZInstructions`
and drop the `ByEnumValue` in the name.
2025-07-07 08:01:14 -07:00
Rahul Joshi
d7b8b65e23
[LLVM][TableGen][DecoderEmitter] Add wrapper struct for bit_value_t (#146248)
Add a convenience wrapper struct for the `bit_value_t` enum type to host
various constructors, query, and printing support. Also refactor related
code in several places. In `getBitsField`, use `llvm::append_range` and
`SmallVector::append()` and eliminate manual loops. Eliminate
`emitNameWithID` and instead use the `operator <<` that does the same
thing as this function. Have `BitValue::getValue()` (replacement for
`Value`) return std::optional<> instead of -1 for unset bits. Terminate
with a fatal error when a decoding conflict is encountered.
2025-07-01 07:36:17 -07:00
Rahul Joshi
92b50959da
[NFC][TableGen] Capitalize to in UseFnTableInDecodetoMCInst. (#146419) 2025-06-30 16:12:15 -07:00
Rahul Joshi
ed5f8f238d
[LLVM][DecoderEmitter] Add option to use function table in decodeToMCInst (#144814)
Add option `use-fn-table-in-decode-to-mcinst` to use a table of function
pointers instead of a switch case in the generated `decodeToMCInst`
function.

When the number of switch cases in this function is large, the generated
code takes a long time to compile in release builds. Using a table of
function pointers instead improves the compile time significantly (~3x
speedup in compiling the code in a downstream target). This option will
allow targets to opt into this mode if they desire for better build
times.
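
The difference between the two dispatch styles can be sketched as below. Everything in this block (`Status`, `DecodeFn`, `decodeFn0`, `decodeFn1`, `decodeWithTable`) is hypothetical and far smaller than a real generated `decodeToMCInst`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical miniature of table-based dispatch; not the generated code.
enum class Status { Fail, Success };
using DecodeFn = Status (*)(unsigned Insn, unsigned &Out);

// Each former switch case becomes a small standalone function.
inline Status decodeFn0(unsigned Insn, unsigned &Out) {
  Out = Insn & 0x7;
  return Status::Success;
}
inline Status decodeFn1(unsigned Insn, unsigned &Out) {
  Out = (Insn >> 3) & 0x7;
  return Status::Success;
}

// Dispatch through a table of function pointers indexed by decoder index.
inline Status decodeWithTable(std::size_t Idx, unsigned Insn, unsigned &Out) {
  static constexpr std::array<DecodeFn, 2> Table = {decodeFn0, decodeFn1};
  if (Idx >= Table.size())
    return Status::Fail;
  return Table[Idx](Insn, Out);
}
```

The optimizer then works on many small functions instead of one very large switch, which is the plausible source of the compile-time improvement described above.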

Tested with `check-llvm-mc` with the option enabled by default.
2025-06-24 18:49:05 -07:00
Rahul Joshi
376b71442d
[NFC][TableGen][DecoderEmitter] Use structured binding in range for loop (#144890)
Also assign variable names to different elements of `OpMap` for better
readability, and eliminate `NumberedEncodingsRef` as `std::vector` will
automatically get converted to an `ArrayRef`.
2025-06-20 06:41:48 -07:00
Rahul Joshi
816ab1af0d
[NFCI][TableGen][DecoderEmitter] Cull Op handling when possible (#142974)
TryDecode/CheckPredicate/SoftFail MCD ops are not used by many targets.
Track the set of opcodes that were emitted and emit code for handling
TryDecode/CheckPredicate/SoftFail ops when decoding only if they were
emitted. This is purely eliminating dead code in the generated
`decodeInstruction` function.

This results in the following reduction in the size of the Disassembler
.so files with an x86_64 release build on Linux:

```
Target                                                   Old Size        New Size  %  reduction
build/lib/libLLVMAArch64Disassembler.so.21.0git             256656          256656          0.00
build/lib/libLLVMAMDGPUDisassembler.so.21.0git              813000          808168          0.59
build/lib/libLLVMARCDisassembler.so.21.0git                  44816           43536          2.86
build/lib/libLLVMARMDisassembler.so.21.0git                 281744          278808          1.04
build/lib/libLLVMAVRDisassembler.so.21.0git                  36040           34496          4.28
build/lib/libLLVMBPFDisassembler.so.21.0git                  26248           23168         11.73
build/lib/libLLVMCSKYDisassembler.so.21.0git                 55960           53632          4.16
build/lib/libLLVMHexagonDisassembler.so.21.0git             115952          113416          2.19
build/lib/libLLVMLanaiDisassembler.so.21.0git                24360           21008         13.76
build/lib/libLLVMLoongArchDisassembler.so.21.0git            58584           56168          4.12
build/lib/libLLVMM68kDisassembler.so.21.0git                 57264           53880          5.91
build/lib/libLLVMMSP430Disassembler.so.21.0git               28896           28440          1.58
build/lib/libLLVMMipsDisassembler.so.21.0git                123128          120568          2.08
build/lib/libLLVMPowerPCDisassembler.so.21.0git              80656           78096          3.17
build/lib/libLLVMRISCVDisassembler.so.21.0git               154080          150200          2.52
build/lib/libLLVMSparcDisassembler.so.21.0git                42040           39568          5.88
build/lib/libLLVMSystemZDisassembler.so.21.0git              97056           94552          2.58
build/lib/libLLVMVEDisassembler.so.21.0git                   83944           81352          3.09
build/lib/libLLVMWebAssemblyDisassembler.so.21.0git          25280           25280          0.00
build/lib/libLLVMX86Disassembler.so.21.0git                2920624         2920624          0.00
build/lib/libLLVMXCoreDisassembler.so.21.0git                48320           44288          8.34
build/lib/libLLVMXtensaDisassembler.so.21.0git               42248           35840         15.17
```
2025-06-17 06:21:21 -07:00
Jay Foad
39ad3151e0
[TableGen] Use default member initializers. NFC. (#144349)
Automated with clang-tidy -fix -checks=-*,modernize-use-default-member-init
2025-06-16 15:26:47 +01:00
Rahul Joshi
7005a76638
[NFC][TableGen] Print DecodeIdx for DecodeOps in DecoderEmitter (#142963)
Print DecodeIdx associated with Decode MCD ops in the generated decoder
tables. This can help in debugging decode failures by first mapping the
Op -> DecodeIdx and then inspecting the code in `decodeToMCInst`
associated with that DecodeIdx.
2025-06-05 21:57:26 -07:00
Rahul Joshi
e53ccb78e4
[LLVM][MC] Introduce OrFail variants of MCD ops (#138614)
Introduce `OrFail` variants for all MCD Decoder Ops that have
`NumToSkip` encoded with them. This is intended to capture the common
case of jumps to the end of the decoder table, which has an `OP_Fail` at
the end. Using the `OrFail` variants of these ops avoids encoding the
`NumToSkip` jump offset for these cases, resulting in a reduction in the
size of the decoder tables (from 5 - 17%). Additionally, for the AArch64
target, the table size reduces enough to switch to using a 2-byte
`NumToSkip` encoding instead of the existing 3 bytes, resulting in a net 30%
reduction in the size of the decoder table.

The total reduction in the size of the decoder tables for different
targets is as follows (computed using the following command: `for i in
*.inc; do echo -n ``basename $i: ``; grep "MCD::OPC_Fail," $i | awk
'{sum += $2} END { print sum}'; done`)

```
Target         Old Size   New Size   % Reduction
================================================
AArch64           153268     106987       30.20
AMDGPU            412056     340856       17.28
ARC                 5061       4605        9.01
ARM                73831      60847       17.59
AVR                 1306       1158       11.33
BPF                 1927       1795        6.85
CSKY                8692       6922       20.36
Hexagon            41965      34759       17.17
Lanai                982        924        5.91
LoongArch          21629      20035        7.37
M68k               13461      11689       13.16
MSP430              3716       3384        8.93
Mips               31415      25771       17.97
PPC                28931      24771       14.38
RISCV              34800      28352       18.53
Sparc               7432       6236       16.09
SystemZ            32248      29716        7.85
VE                 42873      36923       13.88
XCore               2316       2196        5.18
Xtensa              3443       2793       18.88
```
2025-06-05 06:17:50 -07:00