ld.lld SHF_MERGE|SHF_STRINGS duplicate elimination is computation heavy
and utilitizes llvm::xxHash64, a simplified version of XXH64.
Externally many sources confirm that a new variant XXH3 is much faster.
I have picked a few hash implementations and computed the
proportion of time spent on hashing in the overall link time (a debug
build of clang 16 on a machine using AMD Zen 2 architecture):
* llvm::xxHash64: 3.63%
* official XXH64 (`#define XXH_VECTOR XXH_SCALAR`): 3.53%
* official XXH3_64bits (`#define XXH_VECTOR XXH_SCALAR`): 1.21%
* official XXH3_64bits (default, essentially `XXH_SSE2`): 1.22%
* this patch llvm::xxh3_64bits: 1.19%
The remaining part of lld remains unchanged. Consequently, a lower ratio
indicates that hashing is faster. Therefore, it is evident that XXH3 from xxhash
is significantly faster than both the official version and our llvm::xxHash64.
(
string length: count
1-3: 393434
4-8: 2084056
9-16: 2846249
17-128: 5598928
129-240: 1317989
241-: 328058
)
This patch adds heavily simplified https://github.com/Cyan4973/xxHash,
taking account of many simplification ideas from Devin Hussey's xxhash-clean.
Important x86-64 optimization ideas:
* Make XXH3_len_129to240_64b and XXH3_hashLong_64b noinline
* Unroll XXH3_len_17to128_64b
* __restrict does not affect Clang code generation
Beside SHF_MERGE|SHF_STRINGS duplicate elimination, llvm/ADT/StringMap.h
StringMapImpl::LookupBucketFor and a few places in lld can potentially be
accelerated by switching to llvm::xxh3_64bits.
Link: https://github.com/llvm/llvm-project/issues/63750
Reviewed By: serge-sans-paille
Differential Revision: https://reviews.llvm.org/D154812
This is quite silly, but casting to uintptr_t seems like the easiest
option to quiet ubsan.
llvm/lib/Support/xxhash.cpp:107:12: runtime error: applying non-zero offset 8 to null pointer
#0 0x7fe3660404c0 in llvm::xxHash64(llvm::StringRef) llvm/lib/Support/xxhash.cpp:107:12
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636