[SPGO] Use std::hash instead of MD5 to avoid run time regression in llvm-profgen (#180581)

https://github.com/llvm/llvm-project/pull/66164 changed the hashing in
`SampleContextFrame` from `std::hash` to `MD5`. This hash is computed in
a very hot function (ContextTrieNode::getOrCreateChildContext()) in
llvm-profgen. The change causes a more than 2x run time regression when
running llvm-profgen with the CSSPGO preinliner enabled, since the MD5
computation costs about three times as much as the Murmur hash in the
std library. An llvm-profgen run time comparison follows:

```
$ time llvm-profgen -binary $BINARY --perfscript $SAMPLES --populate-profile-symbol-list --show-density --output=XXX

# MD5 hash
real    105m31.644s
user    104m51.334s
sys     0m35.033s

# std::hash
real    46m0.340s
user    45m17.998s
sys     0m38.420s
```

This patch recovers the run time regression in llvm-profgen, and perf
testing in our internal services shows neutral results.
HighW4y2H3ll 2026-02-09 11:41:40 -10:00 committed by GitHub
parent fbed673f52
commit 37c3241d23

@@ -522,7 +522,13 @@ struct SampleContextFrame {
   }
   uint64_t getHashCode() const {
-    uint64_t NameHash = Func.getHashCode();
+    // Context frame hash is heavily used in llvm-profgen context-sensitive
+    // pre-inliner. Use a lightweight hashing here to avoid speed regression.
+    uint64_t NameHash = 0;
+    if (Func.isStringRef())
+      NameHash = std::hash<std::string>{}(Func.str());
+    else
+      NameHash = Func.getHashCode();
     uint64_t LocId = Location.getHashCode();
     return NameHash + (LocId << 5) + LocId;
   }
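
For readers without an LLVM checkout, here is a minimal standalone sketch of the hashing scheme in the diff above. `Loc` and `Frame` are hypothetical stand-ins for `LineLocation` and `SampleContextFrame`, and the way `Loc::getHashCode()` packs its fields is an assumption made for illustration, not the LLVM definition. Only the `std::hash<std::string>` call and the `NameHash + (LocId << 5) + LocId` combining step are taken from the patch.

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical stand-in for LineLocation (not the LLVM type).
struct Loc {
  uint32_t LineOffset;
  uint32_t Discriminator;

  // Assumed packing for illustration: discriminator in the high 32 bits,
  // line offset in the low 32 bits.
  uint64_t getHashCode() const {
    return (static_cast<uint64_t>(Discriminator) << 32) | LineOffset;
  }
};

// Hypothetical stand-in for SampleContextFrame with a string function name.
struct Frame {
  std::string Func;
  Loc Location;

  uint64_t getHashCode() const {
    // Lightweight name hash from the standard library, as in the patch.
    uint64_t NameHash = std::hash<std::string>{}(Func);
    uint64_t LocId = Location.getHashCode();
    // (LocId << 5) + LocId equals LocId * 33 in wrapping 64-bit arithmetic.
    return NameHash + (LocId << 5) + LocId;
  }
};
```

Within a single process, two frames with the same name and location hash to the same value. Two frames with the same name differ by exactly 33 times the difference of their location ids (modulo 2^64). The `std::hash` values themselves are implementation-defined, so they should not be assumed stable across standard libraries.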