Line ending policies were changed in the parent, dccebddb3b80. To make
it easier to resolve downstream merge conflicts after line-ending
policies are adjusted this is a separate whitespace-only commit. If you
have merge conflicts as a result, you can simply `git add --renormalize
-u && git merge --continue` or `git add --renormalize -u && git rebase
--continue` - depending on your workflow.
In 362da640dd18 "[SandboxIR][Bench] Test RAUW (#107440)" we started
using std::stringstream, but didn't include `<sstream>` for the
definition. This is resulting in a build failure on windows.
This patch adds a new benchmark suite for SandboxIR. It measures the
performance of some of the most commonly used API functions and compares
it against LLVM IR.
Change the "fixed encoding" table used for encoding intrinsic
type signature to use 16-bit encoding as opposed to 32-bit.
This results in both space and time improvements. For space,
the total static storage size (in bytes) of this info reduces by 50%:
- Current = 14193*4 (Fixed table) + 16058 + 3 (Long Table) = 72833
- New size = 14193*2 (Fixed table) + 19879 + 3 (Long Table) = 48268.
- Reduction = 50.9%
For time, with the added benchmark, we see a 7.3% speedup in
`GetIntrinsicInfoTableEntries` benchmark. Actual output of the
benchmark in included in the GitHub MR.
Change formatv() to validate that the number of arguments passed matches
number of replacement fields in the format string, and that the replacement
indices do not contain holes.
To support cases where this cannot be guaranteed, introduce a formatv()
overload that allows disabling validation with a bool flag as its first argument.
Rework `IntrinsicEmitter::EmitIntrinsicToBuiltinMap` for improved
peformance as well as refactor the code.
Performance:
- Current generated code does a linear search on the TargetPrefix,
followed by a binary search on the builtin names for that
target's builtins.
- Improve the performance of this code in 2 ways:
(a) Use binary search on the target prefix to lookup the builtin
table for the target.
(b) Improve the (common) case of when all builtins for a target
share a common prefix. Check this common prefix first, and
then do the binary search in the builtin table using the builtin
name with the common prefix removed. This should help
both data size (by creating a smaller static string table) and
runtime (by reducing the cost of binary search on smaller
strings).
Refactor:
- Use range based for loops for iterating over maps.
- Use formatv() and C++ raw string literals to simplify the emission
code.
- Change the generated `getIntrinsicForClangBuiltin` and
`getIntrinsicForMSBuiltin` to take a `StringRef` instead of
`const char *` for the prefix.
Compared to the generic scalar code, using Arm NEON instructions yields
a ~11x speedup: 31 vs 339.5 ms to hash 1 GiB of random data on the Apple
M1.
This follows the upstream implementation closely, with some
simplifications made:
- Removed workarounds for suboptimal codegen on older GCC
- Removed instruction reordering barriers which seem to have a
negligible impact according to my measurements
- We do not support WebAssembly's mostly NEON-compatible API
- There is no configurable mixing of SIMD and scalar code; according to
the upstream comments, this is only relevant for smaller Cortex cores
which can dispatch relatively few NEON micro-ops per cycle.
This commit intends to use only standard ACLE intrinsics and datatypes,
so it should build with all supported versions of GCC, Clang and MSVC.
This feature is enabled by default when targeting AArch64, but the
`LLVM_XXH_USE_NEON=0` macro can be set to explicitly disable it.
XXH3 is used for ICF, string deduplication and computing the UUID in
ld64.lld; this commit results in a -1.77% +/- 0.59% speed improvement
for a `--threads=8` link of Chromium.framework.
This patch pulls google/benchmark v1.4.1 into the LLVM tree so that any
project could use it for benchmark generation. A dummy benchmark is
added to `llvm/benchmarks/DummyYAML.cpp` to validate the correctness of
the build process.
The current version does not utilize LLVM LNT and LLVM CMake
infrastructure, but that might be sufficient for most users. Two
introduced CMake variables:
* `LLVM_INCLUDE_BENCHMARKS` (`ON` by default) generates benchmark
targets
* `LLVM_BUILD_BENCHMARKS` (`OFF` by default) adds generated
benchmark targets to the list of default LLVM targets (i.e. if `ON`
benchmarks will be built upon standard build invocation, e.g. `ninja` or
`make` with no specific targets)
List of modifications:
* `BENCHMARK_ENABLE_TESTING` is disabled
* `BENCHMARK_ENABLE_EXCEPTIONS` is disabled
* `BENCHMARK_ENABLE_INSTALL` is disabled
* `BENCHMARK_ENABLE_GTEST_TESTS` is disabled
* `BENCHMARK_DOWNLOAD_DEPENDENCIES` is disabled
Original discussion can be found here:
http://lists.llvm.org/pipermail/llvm-dev/2018-August/125023.html
Reviewed by: dberris, lebedev.ri
Subscribers: ilya-biryukov, ioeric, EricWF, lebedev.ri, srhines,
dschuff, mgorny, krytarowski, fedor.sergeev, mgrang, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D50894
llvm-svn: 340809