llvm-project

Author	SHA1	Message	Date
S. VenkataKeerthy	9e0d3bcbf2	[IR2Vec] Restrict caching only to Flow-Aware computation (#162559 ) Removed all the caching maps (BB, Inst) in `Embedder` as we don't want to cache embeddings in general. Our earlier experiments on Symbolic embeddings show recomputation of embeddings is cheaper than cache lookups. OTOH, Flow-Aware embeddings would benefit from instruction level caching, as computing the embedding for an instruction would depend on the embeddings of other instructions in a function. So, retained instruction embedding caching logic only for Flow-Aware computation. This also necessitates an `invalidate` method that would clean up the cache when the embeddings would become invalid due to transformations.	2025-10-09 15:59:52 -07:00
S. VenkataKeerthy	33e6a9ae41	[IR2Vec] Added fixme for cyclic dependency in Flow-Aware embedding computation (#162522 )	2025-10-08 12:13:15 -07:00
S. VenkataKeerthy	3491738601	[NFC][IR2Vec] Reinitialize Function Vectors (#162165 )	2025-10-06 15:26:27 -07:00
S. VenkataKeerthy	79d1524bde	[NFC][IR2Vec] Moving `parseVocabSection()` to `VocabStorage` (#161711 )	2025-10-02 16:35:12 -07:00
Kazu Hirata	ac0e99e191	[Analysis] Fix a warning This patch fixes: llvm/lib/Analysis/IR2Vec.cpp:289:14: error: unused variable 'allSameDim' [-Werror,-Wunused-variable]	2025-10-01 20:06:15 -07:00
S. VenkataKeerthy	ed1d9548b5	[IR2Vec] Refactor vocabulary to use section-based storage (#158376 ) Refactored IR2Vec vocabulary and introduced IR (semantics) agnostic `VocabStorage` - `Vocabulary` has-a `VocabStorage` - `Vocabulary` deals with LLVM IR specific entities. This would help in efficient reuse of parts of the logic for MIR. - Storage uses a section-based approach instead of a flat vector, improving organization and access patterns.	2025-10-01 17:13:13 -07:00
S. VenkataKeerthy	52b1850759	[IR2Vec] Add support for Cmp predicates in vocabulary and embeddings (#156952 ) Comparison predicates (equal, not equal, greater than, etc.) provide important semantic information about program behavior. Previously, IR2Vec only captured that a comparison was happening but not what kind of comparison it was. This PR extends the IR2Vec vocabulary to include comparison predicates (ICmp and FCmp) as part of the embedding space. Following are the changes: 1. Expand the vocabulary slot layout to include predicate entries after opcodes, types, and operands 2. Add methods to handle predicate embedding lookups and conversions 3. Update the embedder implementations to include predicate information when processing CmpInst instructions 4. Update test files to include the new predicate entries in the vocabulary (Tracking issues: #141817, #141833)	2025-10-01 16:39:22 -07:00
Mircea Trofin	96c1201504	[nfc][ir2vec] Remove `Valid` field (#157132 ) It is tied to the vocab having been set. Checking that vector's `empty` is sufficient. Less state to track (for a maintainer)	2025-09-07 11:26:23 -07:00
S. VenkataKeerthy	5877baf016	[NFC][IR2Vec] Initialize Embedding vectors with zeros by default (#155690 ) Initialize `Embedding` vectors with zeros by default when only size is provided.	2025-09-04 13:31:11 -07:00
S. VenkataKeerthy	27f06dd6a7	[NFC][IR2Vec] Change getSlotIndex parameter from Value* to Value& (#155700 )	2025-08-29 16:54:04 -07:00
S. VenkataKeerthy	45c5498573	[IR2Vec] Refactor vocabulary to use canonical type IDs (#155323 ) Refactor IR2Vec vocabulary to use canonical type IDs, improving the embedding representation for LLVM IR types. The previous implementation used raw Type::TypeID values directly in the vocabulary, which led to redundant entries (e.g., all float variants mapped to "FloatTy" but had separate slots). This change improves the vocabulary by: 1. Making the type representation more consistent by properly canonicalizing types 2. Reducing vocabulary size by eliminating redundant entries 3. Improving the embedding quality by ensuring similar types share the same representation (Tracking issue - #141817)	2025-08-29 14:56:56 -07:00
S. VenkataKeerthy	3f8081d350	[IR2Vec] Make IR2VecCategory externally visible and reuse in llvm-ir2vec cl options (#153089 ) Consolidate IR2Vec option categories to use a single shared category across the library and tool. With this change the cl options defined in IR2Vec.cpp are visible in llvm-ir2vec tool. This is necessary as we use the same options in the tool.	2025-08-29 11:49:11 -07:00
S. VenkataKeerthy	cdf5f4770b	[NFC] Fix warning in IR2Vec Embedder creation in printer pass (#155917 ) Fixes the warning `default label in switch which covers all enumeration values [-Wcovered-switch-default]`	2025-08-28 15:25:17 -07:00
S. VenkataKeerthy	0da0289e46	[IR2Vec] Add support for flow-aware embeddings (#152613 ) This patch introduces support for Flow-Aware embeddings in IR2Vec, which capture data flow information in addition to symbolic representations.	2025-08-28 12:55:44 -07:00
S. VenkataKeerthy	61a45d20cf	[IR2Vec][NFC] Add helper methods for numeric ID mapping in Vocabulary (#149212 ) Add helper methods to IR2Vec's Vocabulary class for numeric ID mapping and vocabulary size calculation. These APIs will be useful in triplet generation for `llvm-ir2vec` tool (See #149214). (Tracking issue - #141817)	2025-07-17 13:40:51 -07:00
S. VenkataKeerthy	fad0fbc937	[NFC][IR2Vec] Fix warnings on MSVC compilation (#148911 )	2025-07-15 10:54:00 -07:00
S. VenkataKeerthy	ec90786ad1	[NFC][IR2Vec] Exposing helpers in IR2Vec Vocabulary (#147841 ) Minor refactoring IR2Vec vocabulary. This would help in upcoming PRs related to the IR2Vec tool. (Tracking issue - #141817)	2025-07-14 16:38:50 -07:00
S. VenkataKeerthy	8ae8b50d36	[NFC][IR2Vec] Minor refactoring of opcode access in vocabulary (#147585 ) Refactored IR2Vec vocabulary handling to improve code organization and error handling. This would help in upcoming PRs related to the IR2Vec tool. (Tracking issue - #141817)	2025-07-14 16:35:24 -07:00
Kazu Hirata	a73aa721d6	[Analysis] Fix a warning This patch fixes: llvm/lib/Analysis/IR2Vec.cpp:280:3: error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default]	2025-07-14 13:18:52 -07:00
S. VenkataKeerthy	e86bd05bdc	[IR2Vec] Restructuring Vocabulary (#145119 ) This PR restructures the vocabulary. * String based look-ups are removed. Vocabulary is changed from a map to vector. (#141832) * Grouped all the vocabulary related methods under a single class - `ir2vec::Vocabulary`. This replaces `IR2VecVocabResult`. * `ir2vec::Vocabulary` effectively abstracts out the _layout_ and other internal details of the vector structure. Exposes necessary APIs for accessing the Vocabulary. These changes ensure that _all_ known opcodes and types are present in the vocabulary. We have retained the original operands. This can be extended going forward. (Tracking issue - #141817)	2025-07-14 11:07:29 -07:00
S. VenkataKeerthy	119292c40b	[IR2Vec] Add out-of-place arithmetic operators to Embedding class (#145118 ) This PR adds out-of-place arithmetic operators (`+`, `-`, `*`) to the `Embedding` class in IR2Vec, complementing the existing in-place operators (`+=`, `-=`, `*=`). Tests have been added to verify the functionality of these new operators. (Tracking issue - #141817)	2025-07-01 12:09:54 -07:00
S. VenkataKeerthy	0a69c83421	[NFC][IR2Vec] Remove unreachable code and simplify invalid mode test (#146459 ) The code following `llvm_unreachable` is optimized out in Release builds. In this case, `Embedder::create` does not seem to return `nullptr`, causing the `CreateInvalidMode` test to break. Hence removing `llvm_unreachable`.	2025-06-30 20:31:47 -07:00
S. VenkataKeerthy	9438048816	[IR2Vec] Simplifying creation of Embedder (#143999 ) This change simplifies the API by removing the error handling complexity. - Changed `Embedder::create()` to return `std::unique_ptr<Embedder>` directly instead of `Expected<std::unique_ptr<Embedder>>` - Updated documentation and tests to reflect the new API - Added death test for invalid IR2Vec kind in debug mode - In release mode, simply returns nullptr for invalid kinds instead of creating an error (Tracking issue - #141817)	2025-06-30 18:24:08 -07:00
Kazu Hirata	56739f5866	[Analysis] Fix a warning This patch fixes: llvm/lib/Analysis/IR2Vec.cpp:296:2: error: extra ';' outside of a function is incompatible with C++98 [-Werror,-Wc++98-compat-extra-semi]	2025-06-30 14:18:03 -07:00
S. VenkataKeerthy	0745eb501d	[IR2Vec] Scale embeddings once in vocab analysis instead of repetitive scaling (#143986 ) Changes to scale opcodes, types and args once in `IR2VecVocabAnalysis` so that we can avoid scaling each time while computing embeddings. This PR refactors the vocabulary to explicitly define 3 sections---Opcodes, Types, and Arguments---used for computing Embeddings. (Tracking issue - #141817 ; partly fixes - #141832)	2025-06-30 23:09:19 +02:00
S. VenkataKeerthy	e29bb9a038	[IR2Vec] Consider only reachable BBs and non-debug instructions (#143476 ) Changes to consider only BBs that are reachable from the entry block. Similarly, we skip debug instructions while computing the embeddings. (Tracking issue - #141817)	2025-06-17 10:57:52 -07:00
S. VenkataKeerthy	09c54c2e9e	[IR2Vec] Minor vocab changes and exposing weights (#143200 ) This PR changes some asserts in Vocab to hard checks that emit error and exposes flags and constructor to help in unit tests. (Tracking issue - #141817)	2025-06-13 10:43:22 -07:00
S. VenkataKeerthy	32649e017e	[IR2Vec] Exposing Embedding as a data type wrapped around std::vector<double> (#143197 ) Currently `Embedding` is `std::vector<double>`. This PR makes it a data type wrapped around `std::vector<double>` to overload basic arithmetic operators and expose comparison operations. It _simplifies_ the usage here and in the passes where operations on `Embedding` would be performed. (Tracking issue - #141817)	2025-06-10 15:12:16 -07:00
S. VenkataKeerthy	741136a8ac	[NFC][IR2Vec] Removing Dimension from `Embedder::Create` (#142486 ) This PR removes the necessity to know the dimension of the embeddings while invoking `Embedder::Create`. Having the `Dimension` parameter introduces complexities in downstream consumers. (Tracking issue - #141817)	2025-06-02 15:05:11 -07:00
S. VenkataKeerthy	494c82e709	[IR2Vec] Support for lazy computation of BB Embeddings (#142033 ) This PR exposes interfaces to compute embeddings at BB level. This would be necessary for delta patching the embeddings in MLInliner (#141836). (Tracking issue - #141817)	2025-05-29 15:29:55 -07:00
S. VenkataKeerthy	e74b45e078	[IR2Vec] Adding unit tests (#141873 ) This PR adds unit tests for IR2Vec (Tracking issue - #141817)	2025-05-29 13:35:29 -07:00
S. VenkataKeerthy	3581e9bb4c	[NFC][IR2Vec] Refactoring for Stateless Embedding Computation (#141811 ) Currently, users have to invoke two APIs: `computeEmbeddings()` followed by getters to access the embeddings. This PR refactors the code to reduce this stateful access of APIs. Users can now directly invoke getters; Internally, getters would compute the embeddings.	2025-05-28 12:19:02 -07:00
Kazu Hirata	0918361d8b	[Analysis] Remove unused includes (NFC) (#141319 ) These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.	2025-05-23 23:59:56 -07:00
Kazu Hirata	30a9d9d25b	[Analysis] Fix warnings This patch fixes: llvm/lib/Analysis/IR2Vec.cpp:76:3: error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default] llvm/lib/Analysis/IR2Vec.cpp:218:12: error: unused variable 'Dim' [-Werror,-Wunused-variable]	2025-05-22 10:12:25 -07:00
S. VenkataKeerthy	58ab005d8d	Adding IR2Vec as an analysis pass (#134004 ) This PR introduces IR2Vec as an analysis pass. The changes include: - Logic for generating Symbolic encodings. - 75D learned vocabulary. - lit tests. Here is the link to the RFC - https://discourse.llvm.org/t/rfc-enhancing-mlgo-inlining-with-ir2vec-embeddings Acknowledgements: contributors - https://github.com/IITH-Compilers/IR2Vec/graphs/contributors --------- Co-authored-by: svkeerthy <venkatakeerthy@google.com> Co-authored-by: Mircea Trofin <mtrofin@google.com>	2025-05-22 09:50:21 -07:00

35 Commits