This is a new attempt of https://reviews.llvm.org/D159481, this time as
GitHub PR.
`GenericOptionValue::compare()` should return `true` for a match.
- `OptionValueBase::compare()` always returns `false` and shouldn't
match anything.
- `OptionValueCopy::compare()` returns `false` if not `Valid` which
corresponds to no match.
Also adding some tests.
Add an option in `SpecialCaseList` to use Globs instead of Regex to match patterns. `GlobPattern` was extended in https://reviews.llvm.org/D153587 to support brace expansions which allows us to use patterns like `*/src/foo.{c,cpp}`. It turns out that most patterns only take advantage of `*` so using Regex was overkill and required lots of escaping in practice. This often led to bugs due to forgetting to escape special characters.
Since this would be a breaking change, we temporarily support Regex by default and use Globs when `#!special-case-list-v2` is the first line in the file. Users should switch to the glob format described in https://llvm.org/doxygen/classllvm_1_1GlobPattern.html. For example, `(abc|def)` should become `{abc,def}`.
See discussion in https://reviews.llvm.org/D152762 and https://discourse.llvm.org/t/use-glob-instead-of-regex-for-specialcaselists/71666.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D154014
This revision supports --print-supported-extensions,
it prints out all of the extensions and corresponding version supported.
Reviewed By: craig.topper, kito-cheng
Differential Revision: https://reviews.llvm.org/D146054
Extend `GlobPattern` to support brace expansions, e.g., `foo.{c,cpp}` as discussed in https://reviews.llvm.org/D152762#4425203.
The high level change was to turn `Tokens` into a list that gets larger when we see a new brace expansion term. Then in `GlobPattern::match()` we must check against each token group.
This is a breaking change since `{` will no longer match a literal without escaping. However, `\{` will match the literal `{` before and after this change. Also, from a brief survey of LLVM, it seems that `GlobPattern` is mostly used for symbol and path matching, which likely won't need `{` in their patterns.
See https://github.com/devongovett/glob-match#syntax for a nice glob reference.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D153587
This corrects a Darwin build failure due to a missing stub and an
environment-specific Linux test failure due to an overly restrictive
test regex. This also backs out of the previous fix attempt; %p is
intended to be printed and parsed with the semantics of #.
When the environment LLVM_ENABLE_SYMBOLIZER_MARKUP is set, if
llvm-symbolizer fails or is disabled, this change will print a backtrace
in llvm-symbolizer markup instead of falling back to in-process
symbolization mechanisms.
This allows llvm-symbolizer to be run on the output later to produce a
high quality backtrace, even for fully-stripped LLVM utilities.
Reviewed By: mcgrathr
Differential Revision: https://reviews.llvm.org/D139750
Unlike isa<> and cast<>, current implementation of dyn_cast<> fails
to process a std::unique_ptr to a class supporting LLVM RTTI:
// A and B support LLVM RTTI
class A {...}
class B: public A {...}
void foo() {
...
auto V = std::make_unique<A>();
auto VB = dyn_cast<B>(std::move(V));
if (VB)
...
...
}
Reviewed By: lattner, bzcheeseman
Differential Revision: https://reviews.llvm.org/D147991
In a Unicode name was stored in a way that caused
a medial hyphen to be at the end of a a chunk, it would not
be properly ignored by the loose matching algorithm.
For example if `LEFT-TO-RIGHT OVERRIDE` was stored as
`LEFT-` [...], the `-` would not be ignored.
The generators now ensures nodes are not cut accross
medial hyphen boundaries.
Fixes#64161
Differential Revision: https://reviews.llvm.org/D156518
The current implementation has two primary issues:
* Matching `a*a*a*b` against `aaaaaa` has exponential complexity.
* BitVector harms data cache and is inefficient for literal matching.
and a minor issue that `\` at the end may cause an out of bounds access
in `StringRef::operator[]`.
Switch to an O(|S|*|P|) greedy algorithm instead: factor the pattern
into segments split by '*'. The segment is matched sequentianlly by
finding the first occurrence past the end of the previous match. This
algorithm is used by lots of fnmatch implementations, including musl and
NetBSD's.
In addition, `optional<StringRef> Exact, Suffix, Prefix` wastes space.
Instead, match the non-metacharacter prefix against the haystack, then
match the pattern with the rest.
In practice `*suffix` style patterns are less common and our new
algorithm is fast enough, so don't bother storing the non-metacharacter
suffix.
Note: brace expansions (D153587) can leverage the `matchOne` function.
Differential Revision: https://reviews.llvm.org/D156046
This restores D132311, which was reverted in
29c841ce93e087fa4e0c5f3abae94edd460bc24a (Sep 2022) due to certain files
not buildable with GCC 7.3.0. The previous attempt was reverted by
6cd9608fb37ca2418fb44b57ec955bb5efe10689 (Dec 2020).
This time, GCC 7.3.0 has existing build errors for a long time due to
structured bindings for many files, e.g.
```
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9098:13: error: cannot decompose class type ‘std::pair<llvm::Value*, const llvm::SCEV*>’: both it and it
s base class ‘std::pair<llvm::Value*, const llvm::SCEV*>’ have non-static data members
for (auto [_, Stride] : Legal->getLAI()->getSymbolicStrides()) {
^~~~~~~~~~~
```
... and also some `error: duplicate initialization of` instances due to llvm/Transforms/IPO/Attributor.h.
---
GCC 7.5.0 has a bug that, without this change, certain `SmallVector` with a `std::pair` element type like `SmallVector<std::pair<Instruction * const, Info>, 0> X;` lead to spurious
```
/tmp/opt/gcc-7.5.0/include/c++/7.5.0/type_traits:878:48: error: constructor required before non-static data member for ‘...’ has been parsed
```
Switching to std::is_trivially_{copy/move}_constructible fixes the error.
Similar to D156016 for MapVector.
This brings back commit fae7b98c221b5b28797f7b56b656b6b819d99f27 with a
fix to llvm/unittests/Support/ThreadPool.cpp's `_WIN32` code path.
Some ISAInfo unittest base on experimental-zihintntl, but zihintntl will become none-experimental. This patch use another experimental extension zicond to replace zihintntl.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D155673
ld.lld SHF_MERGE|SHF_STRINGS duplicate elimination is computation heavy
and utilitizes llvm::xxHash64, a simplified version of XXH64.
Externally many sources confirm that a new variant XXH3 is much faster.
I have picked a few hash implementations and computed the
proportion of time spent on hashing in the overall link time (a debug
build of clang 16 on a machine using AMD Zen 2 architecture):
* llvm::xxHash64: 3.63%
* official XXH64 (`#define XXH_VECTOR XXH_SCALAR`): 3.53%
* official XXH3_64bits (`#define XXH_VECTOR XXH_SCALAR`): 1.21%
* official XXH3_64bits (default, essentially `XXH_SSE2`): 1.22%
* this patch llvm::xxh3_64bits: 1.19%
The remaining part of lld remains unchanged. Consequently, a lower ratio
indicates that hashing is faster. Therefore, it is evident that XXH3 from xxhash
is significantly faster than both the official version and our llvm::xxHash64.
(
string length: count
1-3: 393434
4-8: 2084056
9-16: 2846249
17-128: 5598928
129-240: 1317989
241-: 328058
)
This patch adds heavily simplified https://github.com/Cyan4973/xxHash,
taking account of many simplification ideas from Devin Hussey's xxhash-clean.
Important x86-64 optimization ideas:
* Make XXH3_len_129to240_64b and XXH3_hashLong_64b noinline
* Unroll XXH3_len_17to128_64b
* __restrict does not affect Clang code generation
Beside SHF_MERGE|SHF_STRINGS duplicate elimination, llvm/ADT/StringMap.h
StringMapImpl::LookupBucketFor and a few places in lld can potentially be
accelerated by switching to llvm::xxh3_64bits.
Link: https://github.com/llvm/llvm-project/issues/63750
Reviewed By: serge-sans-paille
Differential Revision: https://reviews.llvm.org/D154812
According to the spec, Zce is an alias for Zca, Zcb, Zcmp, and Zcmt.
If F is enabled on RV32 it also includes Zcf.
This patch adds the Zce and the implication rule which unfortunately
requires custom handling for adding Zcf.
I've also made all the Zc* extensions imply Zca.
I've also added an error for Zcf without RV32.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D153742
A follow-up to D135841. This patch returns the virtual path for directories from `RedirectingFileSystem`. This ensures the contents of `Path` are the same as the contents of `FS->getRealPath(Path)`. This also means we can drop the workaround in Clang's module map canonicalization, where we couldn't use the real path for a directory if it resolved to a different `DirectoryEntry`. In addition to that, we can also avoid introducing new workaround for a bug triggered by the newly introduced test case.
Reviewed By: benlangmuir
Differential Revision: https://reviews.llvm.org/D135849
In preparation for removing the `#include "llvm/ADT/StringExtras.h"`
from the header to source file of `llvm/Support/Error.h`, first add in
all the missing includes that were previously included transitively
through this header.
This is fixing all files missed in b0abd4893fa1.
Differential Revision: https://reviews.llvm.org/D154543
We're in favor of the llvm::writeToOutput API, and all
writeFileAtomically usages have been migrated to writeToOutput.
Differential Revision: https://reviews.llvm.org/D153740
This was discussed somewhat in D148315. As it stands, we require in
RISCVISAInfo::parseArchString (used for e.g. -march parsing in Clang)
that extensions are given in the order of z, then s, then x prefixed
extensions (after the standard single-letter extensions). However, we
recently (in D148315) moved to that order from z/x/s as the canonical
ordering was changed in the spec. In addition, recent GCC seems to
require z* extensions before s*.
My recollection of the history here is that we thought keeping -march as
close to the rules for ISA naming strings as possible would simplify
things, as there's an existing spec to point to. My feeling is that now
we've had incompatible changes, and an incompatibility with GCC there's
no real benefit to sticking to this restriction, and it risks making it
much more painful than it needs to be to copy a -march= string between
GCC and Clang.
This patch removes all ordering restrictions so you can freely mix x/s/z
extensions.
To be very explicit, this doesn't change our behaviour when emitting a
canonically ordered extension string (e.g. in build attributes). We of
course sort according to the canonical order (as we understand it) in
that case.
Differential Revision: https://reviews.llvm.org/D149246
In preparation for removing the `#include "llvm/ADT/StringExtras.h"`
from the header to source file of `llvm/Support/Error.h`, first add in
all the missing includes that were previously included transitively
through this header.
We already had an error for Zcmt though it appears to be untested
Add similar one for Zcmp along with tests for both.
Factor the code to share the strings as much as possible.
Reviewed By: VincentWu
Differential Revision: https://reviews.llvm.org/D153159
`TrigramIndex` was added back in https://reviews.llvm.org/D27188 as an optimization to make `SpecialCaseList::match()` faster. I've found that `TrigramIndex` actually makes the function slower and it has no functional use, so we can remove it.
I grabbed the list of queries passed to `SpecialCaseList::match()` on a random very large file (`AArch64ISelLowering.cpp`) and measured the runtime to call `match()` on all of them with [this line](8e1f820bb4/llvm/lib/Support/SpecialCaseList.cpp (L64)) disabled and then enabled.
```
$ hyperfine --warmup 3 'GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=1 build/unittests/Support/SupportTests' 'GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=0 build/unittests/Support/SupportTests'
Benchmark 1: GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=1 build/unittests/Support/SupportTests
Time (mean ± σ): 575.9 ms ± 20.3 ms [User: 573.1 ms, System: 2.7 ms]
Range (min … max): 555.5 ms … 620.0 ms 10 runs
Benchmark 2: GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=0 build/unittests/Support/SupportTests
Time (mean ± σ): 283.4 ms ± 6.7 ms [User: 280.3 ms, System: 3.0 ms]
Range (min … max): 277.0 ms … 294.9 ms 10 runs
Summary
'GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=0 build/unittests/Support/SupportTests' ran
2.03 ± 0.09 times faster than 'GTEST_FILTER="SpecialCaseListTest.Large" USE_TRIGRAMS=1 build/unittests/Support/SupportTests'
```
Using `perf` I found that most of the runtime in `TrigramIndex::isDefinitelyOut()` comes from a division operation that seems to come from `std::unordered_map`: 8e1f820bb4/llvm/include/llvm/Support/TrigramIndex.h (L62)
Removing `TrigramIndex` will make it easier to potentially switch to using `GlobPattern` instead of a full regex for `SpecialCaseList`. See discussion in https://reviews.llvm.org/D152762 for details.
Reviewed By: MaskRay, #sanitizers, vitalybuka
Differential Revision: https://reviews.llvm.org/D153171
Casting.cpp in llvm unittests includes "llvm/IR/User.h" which depends on
intrinsic_gen if using module because it needs to build IR module including
`Attributes.h`.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D153048
TaskQueue was added several years ago as part of D48240.
There are currently no uses of this class anywhere in LLVM and I don't see
any patch that plans to use this class, so it doesn't seem useful to keep
compiling and testing this class at the moment.
The code itself is fine, so if we actually end up having a use for this code,
then I think it's perfectly fine to just re-commit this class then.
Differential revision: https://reviews.llvm.org/D86338
Balanced Partitioning tests, added by 1794532bb942, currently
hang on ARM, what causes check-all to never finish.
Issue #63168
Differential Revision: https://reviews.llvm.org/D147812
In https://reviews.llvm.org/D147812 I created
`BalancedPartitioningTest.cpp` and inadvertently drastically increased the
number of files needed to compile `SupportTests`. Instead lets move the
`BPFunctionNode` test to `unittests/ProfileData` so we can remove the
extra dependency.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D152325
In [0] we described an algorithm called //BalancedPartitioning// (bp) to consume function traces [1] and compute a function order that reduces the number of page faults during startup.
This patch adds the `order` command to the `llvm-profdata` tool which uses bp to output a function order that can be passed to the linker via `--symbol-ordering-file=`.
Special thanks to Sergey Pupyrev and Julian Mestre for designing this balanced partitioning algorithm.
[0] https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
[1] https://reviews.llvm.org/D147287
Reviewed By: spupyrev
Differential Revision: https://reviews.llvm.org/D147812
Similar to what we do with ConstantRanges, also test 1-bit values
in exhaustive tests, as these often expose special conditions.
This would have exposed the assertion failure fixed in D151788
earlier.
Implement precise nuw/nsw support in the KnownBits implementation,
replacing the rather crude handling in ValueTracking.
Differential Revision: https://reviews.llvm.org/D151208
These where previously missing. Even in the case where overflow is
indeterminate we can still deduce some of the low/high bits.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D150102
`abs` preserves the lowest set bit, so if we know the lowest set bit,
set it in the output.
As well, implement the case where the operand is known negative.
Reviewed By: foad, RKSimon
Differential Revision: https://reviews.llvm.org/D150100
We can more precisely determine the upper bits doing `MaxNum /
MinDenum` as opposed to only using the MSB.
As well, if the `exact` flag is set, we can sometimes determine some
of the low-bits.
Differential Revision: https://reviews.llvm.org/D150094
Can figure out some of the upper bits (similiar to `udiv`) if we know
the sign of the inputs.
As well, if we have the `exact` flag we can sometimes determine some
low-bits.
Differential Revision: https://reviews.llvm.org/D150093
Do not assert if the bit width is larger than 64 bits. This case
is currently hidden from the IR layer by other checks, but gets
exposed with future changes.
The implementations for shifts were suboptimal in the case where
the max shift amount was >= bitwidth. In that case we should still
use the usual code clamped to BitWidth-1 rather than just giving up
entirely.
Additionally, there was an implementation bug where the known zero
bits for the individual shift amounts were not set in the shl/lshr
implementations. I think after these changes, we'll be able to drop
some of the code in ValueTracking which *also* evaluates all possible
shift amounts and has been papering over this issue.
For the "all poison" case I've opted to return an unknown value for
now. It would be better to return zero, but this has fairly
substantial test fallout, so I figured it's best to not mix it into
this change. (The "correct" return value would be a conflict, but
given that a lot of our APIs assert conflict-freedom, that's probably
not the best idea to actually return.)
Differential Revision: https://reviews.llvm.org/D150587
Align the way we perform exhaustive tests for KnownBits with what
we do for ConstantRange. Test each case separately by specifying
a function on KnownBits and one on APInts. Additionally, specify
a callback that determines which cases are supposed to be optimal,
rather than only correct. Unlike the ConstantRange case there is
a well-defined, unique notion of optimality for KnownBits.
If a failure occurs, print out the inputs, computed result and
exact result. Adjust the printing function to produce the output
in a format that is meaningful for KnownBits, i.e. print the
actual known bits, using ? to signify unknowns and ! to signify
conflicts.
PerThreadBumpPtrAllocator allows separating allocations by thread id.
That makes allocations race free. It is possible because
ThreadPoolExecutor class creates threads, keeps them until
the destructor of ThreadPoolExecutor is called, and assigns ids
to the threads. Thus PerThreadBumpPtrAllocator should be used with only
threads created by ThreadPoolExecutor. This allocator is useful when
thread safe BumpPtrAllocator is needed.
Reviewed By: MaskRay, dexonsmith, andrewng
Differential Revision: https://reviews.llvm.org/D142318