Add MIR2Vec support to the llvm-ir2vec tool, enabling embedding generation for Machine IR alongside the existing LLVM IR functionality.
(This is an initial integration; Other entity/triplet gen for vocab generation would follow as separate patches)
Handling opcodes in embedding computation.
- Revamped MIR Vocabulary with four sections - `Opcodes`, `Common Operands`, `Physical Registers`, and `Virtual Registers`
- Operands broadly fall into 3 categories -- the generic MO types that are common across architectures, physical and virtual register classes. We handle these categories separately in MIR2Vec. (Though we have same classes for both physical and virtual registers, their embeddings vary).
Implement MIR2Vec embedder for generating vector representations of Machine IR instructions, basic blocks, and functions. This patch introduces changes necessary to *embed* machine opcodes. Machine operands would be handled incrementally in the upcoming patches.
Refactor MIRVocabulary to improve opcode lookup and add Section enum for better organization. This is useful for embedder lookups (next patches)
(Tracking issue - #141817)
This PR introduces the initial infrastructure and vocabulary necessary for generating embeddings for MIR (discussed briefly in the earlier IR2Vec RFC - https://discourse.llvm.org/t/rfc-enhancing-mlgo-inlining-with-ir2vec-embeddings). The MIR2Vec embeddings are useful in driving target specific optimizations that work on MIR like register allocation.
(Tracking issue - #141817)