35 Commits

Author SHA1 Message Date
S. VenkataKeerthy
9e0d3bcbf2
[IR2Vec] Restrict caching only to Flow-Aware computation (#162559)
Removed all the caching maps (BB, Inst) in `Embedder` as we don't want
to cache embeddings in general. Our earlier experiments on Symbolic
embeddings show recomputation of embeddings is cheaper than cache
lookups.

OTOH, Flow-Aware embeddings would benefit from instruction level
caching, as computing the embedding for an instruction would depend on
the embeddings of other instructions in a function. So, retained
instruction embedding caching logic only for Flow-Aware computation.
This also necessitates an `invalidate` method that would clean up the
cache when the embeddings would become invalid due to transformations.
2025-10-09 15:59:52 -07:00
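The instruction-level caching and `invalidate` behaviour described in #162559 above could look roughly like the sketch below. This is not the LLVM implementation: `Inst`, `FlowAwareCacheSketch`, `computedCount`, and the 0.5 weighting are hypothetical names and values chosen for illustration, and the sketch assumes the dependencies between instructions are acyclic.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are not the LLVM types.
using Embedding = std::vector<double>;

struct Inst {
  double Seed;                    // stand-in for a vocabulary lookup
  std::vector<const Inst *> Defs; // instructions this one depends on
};

class FlowAwareCacheSketch {
  std::unordered_map<const Inst *, Embedding> InstCache;
  std::size_t Computed = 0; // counts real computations, for illustration

public:
  // Returns the embedding of I, computing it (and the embeddings it depends
  // on) only on the first request. Assumes an acyclic dependency graph.
  const Embedding &get(const Inst &I) {
    auto It = InstCache.find(&I);
    if (It != InstCache.end())
      return It->second;
    Embedding E{I.Seed};
    for (const Inst *D : I.Defs) {
      assert(D && "null dependency");
      E[0] += 0.5 * get(*D)[0]; // arbitrary weight, illustration only
    }
    ++Computed;
    return InstCache.emplace(&I, std::move(E)).first->second;
  }

  // Drops every cached embedding, e.g. after a transformation changed the IR.
  void invalidate() { InstCache.clear(); }

  std::size_t computedCount() const { return Computed; }
};
```

Requesting the same instruction twice computes it once; after `invalidate()`, the next request recomputes from scratch.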
S. VenkataKeerthy
33e6a9ae41
[IR2Vec] Added fixme for cyclic dependency in Flow-Aware embedding computation (#162522) 2025-10-08 12:13:15 -07:00
S. VenkataKeerthy
3491738601
[NFC][IR2Vec] Reinitialize Function Vectors (#162165) 2025-10-06 15:26:27 -07:00
S. VenkataKeerthy
79d1524bde
[NFC][IR2Vec] Moving parseVocabSection() to VocabStorage (#161711) 2025-10-02 16:35:12 -07:00
Kazu Hirata
ac0e99e191
[Analysis] Fix a warning
This patch fixes:

  llvm/lib/Analysis/IR2Vec.cpp:289:14: error: unused variable
  'allSameDim' [-Werror,-Wunused-variable]
2025-10-01 20:06:15 -07:00
S. VenkataKeerthy
ed1d9548b5
[IR2Vec] Refactor vocabulary to use section-based storage (#158376)
Refactored IR2Vec vocabulary and introduced IR (semantics) agnostic `VocabStorage`
- `Vocabulary` *has-a* `VocabStorage`
- `Vocabulary` deals with LLVM IR specific entities. This would help in efficient reuse of parts of the logic for MIR.
- Storage uses a section-based approach instead of a flat vector, improving organization and access patterns.
2025-10-01 17:13:13 -07:00
S. VenkataKeerthy
52b1850759
[IR2Vec] Add support for Cmp predicates in vocabulary and embeddings (#156952)
Comparison predicates (equal, not equal, greater than, etc.) provide important semantic information about program behavior. Previously, IR2Vec only captured that a comparison was happening but not what kind of comparison it was. This PR extends the IR2Vec vocabulary to include comparison predicates (ICmp and FCmp) as part of the embedding space.

Following are the changes:
1. Expand the vocabulary slot layout to include predicate entries after opcodes, types, and operands
2. Add methods to handle predicate embedding lookups and conversions
3. Update the embedder implementations to include predicate information when processing CmpInst instructions
4. Update test files to include the new predicate entries in the vocabulary

(Tracking issues: #141817, #141833)
2025-10-01 16:39:22 -07:00
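The slot layout described in #156952 above, with predicate entries placed after opcodes, types, and operands, can be illustrated with a small index calculation. The section sizes and the names `Section`, `SectionSizes`, and `slotIndex` are assumptions made for this sketch; they are not taken from the LLVM sources.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Order of the sections in the flat layout, as listed in the commit message.
enum class Section : std::size_t { Opcodes, Types, Operands, Predicates };

// Hypothetical section sizes, chosen only to make the arithmetic visible.
const std::array<std::size_t, 4> SectionSizes = {4, 3, 2, 5};

// Maps (section, id within the section) to a position in the flat layout by
// adding up the sizes of all sections that come before it.
inline std::size_t slotIndex(Section S, std::size_t LocalID) {
  const std::size_t Sec = static_cast<std::size_t>(S);
  assert(LocalID < SectionSizes[Sec] && "id out of range for section");
  std::size_t Offset = 0;
  for (std::size_t I = 0; I < Sec; ++I)
    Offset += SectionSizes[I];
  return Offset + LocalID;
}
```

With these made-up sizes, the first predicate lands at index 4 + 3 + 2 = 9, so appending a new section at the end leaves the indices of the earlier sections unchanged.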
Mircea Trofin
96c1201504
[nfc][ir2vec] Remove Valid field (#157132)
It is tied to the vocab having been set. Checking that vector's
`empty` is sufficient. Less state to track (for a maintainer).
2025-09-07 11:26:23 -07:00
S. VenkataKeerthy
5877baf016
[NFC][IR2Vec] Initialize Embedding vectors with zeros by default (#155690)
Initialize `Embedding` vectors with zeros by default when only size is provided.
2025-09-04 13:31:11 -07:00
S. VenkataKeerthy
27f06dd6a7
[NFC][IR2Vec] Change getSlotIndex parameter from Value* to Value& (#155700) 2025-08-29 16:54:04 -07:00
S. VenkataKeerthy
45c5498573
[IR2Vec] Refactor vocabulary to use canonical type IDs (#155323)
Refactor IR2Vec vocabulary to use canonical type IDs, improving the embedding representation for LLVM IR types.

The previous implementation used raw Type::TypeID values directly in the vocabulary, which led to redundant entries (e.g., all float variants mapped to "FloatTy" but had separate slots). This change improves the vocabulary by:

1. Making the type representation more consistent by properly canonicalizing types
2. Reducing vocabulary size by eliminating redundant entries
3. Improving the embedding quality by ensuring similar types share the same representation

(Tracking issue - #141817)
2025-08-29 14:56:56 -07:00
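The canonicalization idea in #155323 above can be sketched as a many-to-one mapping from raw type IDs to a smaller set of canonical IDs, so that the vocabulary needs one slot per canonical ID. The two enums below are deliberately tiny, hypothetical subsets and do not reproduce `Type::TypeID` or the enum used in LLVM.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical raw type IDs: several float variants have distinct values.
enum class RawTypeID { Half, BFloat, Float, Double, Integer, Pointer, Void };
constexpr std::size_t NumRawTypes = 7;

// Hypothetical canonical IDs: one entry per vocabulary slot.
enum class CanonicalTypeID { FloatTy, IntegerTy, PointerTy, VoidTy };
constexpr std::size_t NumCanonicalTypes = 4;

// Collapses all float variants onto a single canonical ID.
constexpr CanonicalTypeID canonicalize(RawTypeID T) {
  switch (T) {
  case RawTypeID::Half:
  case RawTypeID::BFloat:
  case RawTypeID::Float:
  case RawTypeID::Double:
    return CanonicalTypeID::FloatTy;
  case RawTypeID::Integer:
    return CanonicalTypeID::IntegerTy;
  case RawTypeID::Pointer:
    return CanonicalTypeID::PointerTy;
  case RawTypeID::Void:
    return CanonicalTypeID::VoidTy;
  }
  return CanonicalTypeID::VoidTy; // not reached for valid enumerators
}
```

In this toy setup the type section shrinks from 7 slots to 4, and `Half` and `Double` share one embedding.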
S. VenkataKeerthy
3f8081d350
[IR2Vec] Make IR2VecCategory externally visible and reuse in llvm-ir2vec cl options (#153089)
Consolidate IR2Vec option categories to use a single shared category across the library and tool.

With this change, the cl options defined in IR2Vec.cpp are visible in the llvm-ir2vec tool. This is necessary because the tool uses the same options.
2025-08-29 11:49:11 -07:00
S. VenkataKeerthy
cdf5f4770b
[NFC] Fix warning in IR2Vec Embedder creation in printer pass (#155917)
Fixes the warning `default label in switch which covers all enumeration
values [-Wcovered-switch-default]`
2025-08-28 15:25:17 -07:00
S. VenkataKeerthy
0da0289e46
[IR2Vec] Add support for flow-aware embeddings (#152613)
This patch introduces support for Flow-Aware embeddings in IR2Vec, which capture data flow information in addition to symbolic representations.
2025-08-28 12:55:44 -07:00
S. VenkataKeerthy
61a45d20cf
[IR2Vec][NFC] Add helper methods for numeric ID mapping in Vocabulary (#149212)
Add helper methods to IR2Vec's Vocabulary class for numeric ID mapping and vocabulary size calculation. These APIs will be useful in triplet generation for the `llvm-ir2vec` tool (see #149214).

(Tracking issue - #141817)
2025-07-17 13:40:51 -07:00
S. VenkataKeerthy
fad0fbc937
[NFC][IR2Vec] Fix warnings on MSVC compilation (#148911) 2025-07-15 10:54:00 -07:00
S. VenkataKeerthy
ec90786ad1
[NFC][IR2Vec] Exposing helpers in IR2Vec Vocabulary (#147841)
Minor refactoring of the IR2Vec vocabulary. This would help in upcoming PRs related to the IR2Vec tool.

(Tracking issue - #141817)
2025-07-14 16:38:50 -07:00
S. VenkataKeerthy
8ae8b50d36
[NFC][IR2Vec] Minor refactoring of opcode access in vocabulary (#147585)
Refactored IR2Vec vocabulary handling to improve code organization and error handling. This would help in upcoming PRs related to the IR2Vec tool.

(Tracking issue - #141817)
2025-07-14 16:35:24 -07:00
Kazu Hirata
a73aa721d6
[Analysis] Fix a warning
This patch fixes:

  llvm/lib/Analysis/IR2Vec.cpp:280:3: error: default label in switch
  which covers all enumeration values
  [-Werror,-Wcovered-switch-default]
2025-07-14 13:18:52 -07:00
S. VenkataKeerthy
e86bd05bdc
[IR2Vec] Restructuring Vocabulary (#145119)
This PR restructures the vocabulary. 

* String based look-ups are removed. Vocabulary is changed from a map to vector. (#141832)
* Grouped all the vocabulary related methods under a single class - `ir2vec::Vocabulary`. This replaces `IR2VecVocabResult`.
* `ir2vec::Vocabulary` effectively abstracts out the _layout_ and other internal details of the vector structure. Exposes necessary APIs for accessing the Vocabulary. 

These changes ensure that _all_ known opcodes and types are present in the vocabulary. We have retained the original operands. This can be extended going forward. 

(Tracking issue - #141817)
2025-07-14 11:07:29 -07:00
S. VenkataKeerthy
119292c40b
[IR2Vec] Add out-of-place arithmetic operators to Embedding class (#145118)
This PR adds out-of-place arithmetic operators (`+`, `-`, `*`) to the `Embedding` class in IR2Vec, complementing the existing in-place operators (`+=`, `-=`, `*=`). 

Tests have been added to verify the functionality of these new operators.

(Tracking issue - #141817)
2025-07-01 12:09:54 -07:00
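The relationship between the in-place and out-of-place operators named in #145118 above can be sketched as follows. This is a simplified stand-in, not the LLVM `Embedding` class: the public `Data` member is hypothetical, and treating `*` and `*=` as scaling by a `double` is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for an embedding wrapped around std::vector<double>.
struct Embedding {
  std::vector<double> Data;

  // In-place operators modify the left-hand side.
  Embedding &operator+=(const Embedding &RHS) {
    assert(Data.size() == RHS.Data.size() && "dimension mismatch");
    for (std::size_t I = 0; I < Data.size(); ++I)
      Data[I] += RHS.Data[I];
    return *this;
  }
  Embedding &operator-=(const Embedding &RHS) {
    assert(Data.size() == RHS.Data.size() && "dimension mismatch");
    for (std::size_t I = 0; I < Data.size(); ++I)
      Data[I] -= RHS.Data[I];
    return *this;
  }
  Embedding &operator*=(double Factor) {
    for (double &V : Data)
      V *= Factor;
    return *this;
  }

  // Out-of-place operators copy the left-hand side and reuse the in-place
  // versions, leaving both operands untouched.
  Embedding operator+(const Embedding &RHS) const {
    Embedding Result(*this);
    Result += RHS;
    return Result;
  }
  Embedding operator-(const Embedding &RHS) const {
    Embedding Result(*this);
    Result -= RHS;
    return Result;
  }
  Embedding operator*(double Factor) const {
    Embedding Result(*this);
    Result *= Factor;
    return Result;
  }
};
```

Implementing the out-of-place forms in terms of the in-place ones keeps the element-wise logic in one place.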
S. VenkataKeerthy
0a69c83421
[NFC][IR2Vec] Remove unreachable code and simplify invalid mode test (#146459)
The code following `llvm_unreachable` is optimized out in Release builds. In this case, `Embedder::create` does not seem to return `nullptr`, causing the `CreateInvalidMode` test to break. Hence, this removes `llvm_unreachable`.
2025-06-30 20:31:47 -07:00
S. VenkataKeerthy
9438048816
[IR2Vec] Simplifying creation of Embedder (#143999)
This change simplifies the API by removing the error handling complexity. 

- Changed `Embedder::create()` to return `std::unique_ptr<Embedder>` directly instead of `Expected<std::unique_ptr<Embedder>>`
- Updated documentation and tests to reflect the new API
- Added death test for invalid IR2Vec kind in debug mode
- In release mode, simply returns nullptr for invalid kinds instead of creating an error

(Tracking issue - #141817)
2025-06-30 18:24:08 -07:00
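The API shape described in #143999 above, a factory that returns `std::unique_ptr` directly and yields `nullptr` for an invalid kind, might be sketched like this. The class names, the `Kind` enum, and the free function `createEmbedder` are hypothetical; only the return-type change and the `nullptr` behaviour come from the commit message.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical kinds and classes, for illustration only.
enum class Kind { Symbolic, FlowAware };

class Embedder {
public:
  virtual ~Embedder() = default;
  virtual std::string name() const = 0;
};

class SymbolicSketch final : public Embedder {
public:
  std::string name() const override { return "symbolic"; }
};

class FlowAwareSketch final : public Embedder {
public:
  std::string name() const override { return "flow-aware"; }
};

// Returns the embedder directly; callers check for nullptr instead of
// unwrapping an error object.
std::unique_ptr<Embedder> createEmbedder(Kind K) {
  switch (K) {
  case Kind::Symbolic:
    return std::make_unique<SymbolicSketch>();
  case Kind::FlowAware:
    return std::make_unique<FlowAwareSketch>();
  }
  return nullptr; // reached only for a value outside the enumerators
}
```

The switch has no `default` label, so a compiler can still warn when a new enumerator is added without a matching case.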
Kazu Hirata
56739f5866
[Analysis] Fix a warning
This patch fixes:

  llvm/lib/Analysis/IR2Vec.cpp:296:2: error: extra ';' outside of a
  function is incompatible with C++98
  [-Werror,-Wc++98-compat-extra-semi]
2025-06-30 14:18:03 -07:00
S. VenkataKeerthy
0745eb501d
[IR2Vec] Scale embeddings once in vocab analysis instead of repetitive scaling (#143986)
Scale opcodes, types and args once in `IR2VecVocabAnalysis` to avoid rescaling each time embeddings are computed. This PR refactors the vocabulary to explicitly define three sections (Opcodes, Types, and Arguments) used for computing embeddings.

(Tracking issue - #141817 ; partly fixes - #141832)
2025-06-30 23:09:19 +02:00
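The "scale once" idea in #143986 above can be sketched as applying the per-section weights when the vocabulary is built, so that computing an instruction embedding afterwards is plain addition. The types, function names, and weights below are assumptions for illustration and do not mirror `IR2VecVocabAnalysis`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Embedding = std::vector<double>;
using SectionVec = std::vector<Embedding>;

// Hypothetical three-section vocabulary: opcodes, types, arguments.
struct Vocab {
  SectionVec Opcodes, Types, Args;
};

// Multiplies every entry of one section by its weight.
inline void scaleSection(SectionVec &S, double Weight) {
  for (Embedding &E : S)
    for (double &V : E)
      V *= Weight;
}

// Applies the weights a single time, when the vocabulary is built.
inline Vocab scaleOnce(Vocab Raw, double OpcW, double TypeW, double ArgW) {
  scaleSection(Raw.Opcodes, OpcW);
  scaleSection(Raw.Types, TypeW);
  scaleSection(Raw.Args, ArgW);
  return Raw;
}

// Per-instruction work is now only lookups and additions, with no scaling.
inline Embedding embedInst(const Vocab &V, std::size_t Opc, std::size_t Ty,
                           std::size_t Arg) {
  assert(Opc < V.Opcodes.size() && Ty < V.Types.size() && Arg < V.Args.size());
  Embedding E = V.Opcodes[Opc];
  assert(E.size() == V.Types[Ty].size() && E.size() == V.Args[Arg].size());
  for (std::size_t I = 0; I < E.size(); ++I)
    E[I] += V.Types[Ty][I] + V.Args[Arg][I];
  return E;
}
```

The multiplications happen once per vocabulary entry instead of once per embedded instruction.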
S. VenkataKeerthy
e29bb9a038
[IR2Vec] Consider only reachable BBs and non-debug instructions (#143476)
Consider only BBs that are reachable from the entry block. Similarly, skip debug instructions while computing the embeddings.

(Tracking issue - #141817)
2025-06-17 10:57:52 -07:00
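The filtering described in #143476 above amounts to a traversal from the entry block that ignores debug instructions. The sketch below models a function as a vector of blocks with successor indices; `Instr`, `Block`, and `embedReachable` are hypothetical, and a single `double` stands in for an embedding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy IR: one double per instruction stands in for an embedding.
struct Instr {
  double Value;
  bool IsDebug;
};

struct Block {
  std::vector<Instr> Insts;
  std::vector<std::size_t> Succs; // indices of successor blocks
};

// Sums the non-debug instructions of every block reachable from block 0,
// which plays the role of the entry block.
inline double embedReachable(const std::vector<Block> &Fn) {
  if (Fn.empty())
    return 0.0;
  std::vector<bool> Seen(Fn.size(), false);
  std::vector<std::size_t> Worklist;
  Worklist.push_back(0);
  Seen[0] = true;
  double Sum = 0.0;
  while (!Worklist.empty()) {
    const std::size_t BB = Worklist.back();
    Worklist.pop_back();
    for (const Instr &I : Fn[BB].Insts)
      if (!I.IsDebug)
        Sum += I.Value;
    for (std::size_t S : Fn[BB].Succs) {
      assert(S < Fn.size() && "successor index out of range");
      if (!Seen[S]) {
        Seen[S] = true;
        Worklist.push_back(S);
      }
    }
  }
  return Sum;
}
```

The `Seen` vector also keeps the traversal from looping on back edges.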
S. VenkataKeerthy
09c54c2e9e
[IR2Vec] Minor vocab changes and exposing weights (#143200)
This PR changes some asserts in Vocab to hard checks that emit errors, and exposes flags and a constructor to help in unit tests.

(Tracking issue - #141817)
2025-06-13 10:43:22 -07:00
S. VenkataKeerthy
32649e017e
[IR2Vec] Exposing Embedding as a data type wrapped around std::vector<double> (#143197)
Currently `Embedding` is `std::vector<double>`. This PR makes it a data type wrapped around `std::vector<double>` to overload basic arithmetic operators and expose comparison operations. It _simplifies_ the usage here and in the passes where operations on `Embedding` would be performed.

(Tracking issue - #141817)
2025-06-10 15:12:16 -07:00
S. VenkataKeerthy
741136a8ac
[NFC][IR2Vec] Removing Dimension from Embedder::Create (#142486)
This PR removes the necessity to know the dimension of the embeddings while invoking `Embedder::Create`. Having the `Dimension` parameter introduces complexities in downstream consumers. 

(Tracking issue - #141817)
2025-06-02 15:05:11 -07:00
S. VenkataKeerthy
494c82e709
[IR2Vec] Support for lazy computation of BB Embeddings (#142033)
This PR exposes interfaces to compute embeddings at BB level. This would be necessary for delta patching the embeddings in MLInliner (#141836).

(Tracking issue - #141817)
2025-05-29 15:29:55 -07:00
S. VenkataKeerthy
e74b45e078
[IR2Vec] Adding unit tests (#141873)
This PR adds unit tests for IR2Vec

(Tracking issue - #141817)
2025-05-29 13:35:29 -07:00
S. VenkataKeerthy
3581e9bb4c
[NFC][IR2Vec] Refactoring for Stateless Embedding Computation (#141811)
Currently, users have to invoke two APIs: `computeEmbeddings()` followed
by getters to access the embeddings. This PR refactors the code to
reduce this *stateful* access of APIs. Users can now directly invoke
getters; internally, the getters compute the embeddings.
2025-05-28 12:19:02 -07:00
Kazu Hirata
0918361d8b
[Analysis] Remove unused includes (NFC) (#141319)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-05-23 23:59:56 -07:00
Kazu Hirata
30a9d9d25b
[Analysis] Fix warnings
This patch fixes:

  llvm/lib/Analysis/IR2Vec.cpp:76:3: error: default label in switch
  which covers all enumeration values
  [-Werror,-Wcovered-switch-default]

  llvm/lib/Analysis/IR2Vec.cpp:218:12: error: unused variable 'Dim'
  [-Werror,-Wunused-variable]
2025-05-22 10:12:25 -07:00
S. VenkataKeerthy
58ab005d8d
Adding IR2Vec as an analysis pass (#134004)
This PR introduces IR2Vec as an analysis pass. The changes include:
- Logic for generating Symbolic encodings.
- 75D learned vocabulary.
- lit tests.

Here is the link to the RFC -
https://discourse.llvm.org/t/rfc-enhancing-mlgo-inlining-with-ir2vec-embeddings

Acknowledgements: contributors -
https://github.com/IITH-Compilers/IR2Vec/graphs/contributors

---------

Co-authored-by: svkeerthy <venkatakeerthy@google.com>
Co-authored-by: Mircea Trofin <mtrofin@google.com>
2025-05-22 09:50:21 -07:00