[SPGO] Use std::hash instead of MD5 to avoid run time regression in llvm-profgen (#180581)

https://github.com/llvm/llvm-project/pull/66164 changed the hashing in `SampleContextFrame` from `std::hash` to `MD5`, which is used in a very hot function (`ContextTrieNode::getOrCreateChildContext()`) in llvm-profgen. This causes an over 2x run time regression when running llvm-profgen with the CSSPGO pre-inliner enabled, since the MD5 computation takes about three times as long as the Murmur hash in the std library.

An llvm-profgen run time comparison follows:

```
$ time llvm-profgen -binary $BINARY --perfscript $SAMPLES --populate-profile-symbol-list --show-density --output=XXX

# MD5 hash
real    105m31.644s
user    104m51.334s
sys     0m35.033s

# std::hash
real    46m0.340s
user    45m17.998s
sys     0m38.420s
```

This patch recovers the run time regression in llvm-profgen, and perf testing on our internal services shows neutral results.
2026-02-09 11:41:40 -10:00 · 2026-02-09 11:41:40 -10:00 · 37c3241d23
commit 37c3241d23
parent fbed673f52
1 changed files with 7 additions and 1 deletions
--- a/llvm/include/llvm/ProfileData/SampleProf.h
+++ b/llvm/include/llvm/ProfileData/SampleProf.h
@@ -522,7 +522,13 @@ struct SampleContextFrame {
  }

  uint64_t getHashCode() const {
-    uint64_t NameHash = Func.getHashCode();
+    // Context frame hash is heavily used in llvm-profgen context-sensitive
+    // pre-inliner. Use a lightweight hashing here to avoid speed regression.
+    uint64_t NameHash = 0;
+    if (Func.isStringRef())
+      NameHash = std::hash<std::string>{}(Func.str());
+    else
+      NameHash = Func.getHashCode();
    uint64_t LocId = Location.getHashCode();
    return NameHash + (LocId << 5) + LocId;
  }
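
For readers who want to try the hash combination outside of LLVM, here is a minimal standalone sketch of the `getHashCode()` logic above. `ToyLocation` and `ToyFrame` are hypothetical stand-ins for illustration only, not the real LLVM types, and the location hash shown is an assumption rather than the actual `LineLocation` implementation. The value produced by `std::hash` is implementation-defined, so only relationships between hashes are meaningful, not specific numbers.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <string_view>

// Hypothetical stand-in for a call-site location (not the real LLVM type).
struct ToyLocation {
  uint32_t LineOffset;
  uint32_t Discriminator;

  // Assumed packing of the two fields into one 64-bit value.
  uint64_t getHashCode() const {
    return (static_cast<uint64_t>(LineOffset) << 32) | Discriminator;
  }
};

// Hypothetical stand-in for SampleContextFrame, holding a plain string name.
struct ToyFrame {
  std::string FuncName;
  ToyLocation Location;

  uint64_t getHashCode() const {
    // Cheap, non-cryptographic hash of the function name.
    uint64_t NameHash = std::hash<std::string_view>{}(FuncName);
    uint64_t LocId = Location.getHashCode();
    // Same combination as in the diff: NameHash + LocId * 33 (mod 2^64).
    return NameHash + (LocId << 5) + LocId;
  }
};
```

Hashing through `std::string_view` here avoids building a temporary `std::string`; the C++ standard guarantees that `std::hash<std::string_view>` and `std::hash<std::string>` agree for equal contents. Whether that matters for the real `FunctionId` type is not something this sketch establishes.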