f4719c4d2cda ("Add support for GNU Hurd in Path.inc and other places")
made llvm use an internal __f_type name for the f_type field (which it
is not supposed to since accessing double-underscore names is explicitly
not supported by standards). In glibc 2.39 this field was renamed to
f_type so application can now access the field as the standard says.
LLVM-based tools often use the `llvm::vfs::FileSystem` instrastructure
to access the file system. This patch adds new kind of a VFS that
performs lightweight tracing of file system operations on an underlying
VFS. This is supposed to aid in investigating file system traffic
without resorting to instrumentation on the operating system level.
There will be follow-up patches that integrate this into Clang and its
dependency scanner.
This patch changes replaceAllocation to a static function while moving
the declaration to SmallVector.cpp. Note that:
- replaceAllocation is used only within SmallVector.cpp.
- replaceAllocation doesn't access any class members.
- Rename `Align` field in ReplacementItem/FmtAlign to `Width` to
accurately reflect its use.
- Change both `Width` and `Index` in ReplacementItem to 32-bit int
instead of size_t (as 64-bits seems excessive in this context).
- Eliminate the use of `Empty` ReplacementType, and use the
existing std::optional<> instead to indicate that.
- Eliminate some boilerplate type code in formatv().
- Eliminate the loop in `splitLiteralAndReplacement`. The existing
code will never loop back.
- Directly use constructor instead of std::make_pair.
The purpose is to save an extra memset in both cases:
1. When `int64_t(val) < 0`, zeroing out is redundant as the subsequent
for-loop will initialize to `val .. 0xFFFFF ....`. Instead we should
only create an uninitialized buffer, and transform the slow for-loop
into a memset to initialize the higher words to `0xFF`.
2. In the other case, first we create an uninitialized array (`new
int64_t[]`) and _then_ we zero it out with `memset`. But this can be
combined in one operation with `new int64_t[]()`, which
default-initializes the array.
On one example where use of APInt was heavy, this improved compile time
by 1%.
If the uint64_t constructor is used, assert that the value is actually a
signed or unsigned N-bit integer depending on whether the isSigned flag
is set. Provide an implicitTrunc flag to restore the previous behavior,
where the argument is silently truncated instead.
In this commit, implicitTrunc is enabled by default, which means that
the new assertions are disabled and no actual change in behavior occurs.
The plan is to flip the default once all places violating the assertion
have been fixed. See #80309 for the scope of the necessary changes.
The primary motivation for this change is to avoid incorrectly specified
isSigned flags. A recurring problem we have is that people write
something like `APInt(BW, -1)` and this works perfectly fine -- until
the code path is hit with `BW > 64`. Most of our i128 specific
miscompilations are caused by variants of this issue.
The cost of the change is that we have to specify the correct isSigned
flag (and make sure there are no excess bits) for uses where BW is
always <= 64 as well.
Suppresses the copyright banner for `ml64` compiling BLAKE3 assembly
sources with MSVC and Ninja on Windows:
```
[157/3758] Building ASM_MASM object lib\Support\BLAKE3\CMa...upportBlake3.dir\blake3_avx512_x86-64_windows_msvc.asm.obj
Microsoft (R) Macro Assembler (x64) Version 14.41.34120.0
Copyright (C) Microsoft Corporation. All rights reserved.
Assembling: C:\path\to\llvm-project\llvm\lib\Support\BLAKE3\blake3_avx512_x86-64_windows_msvc.asm
```
is now just:
```
Assembling: C:\path\to\llvm-project\llvm\lib\Support\BLAKE3\blake3_avx512_x86-64_windows_msvc.asm
```
We can suppress that last line with `/quiet` in more recent versions of
`ml64` (from MSVC 2022 17.6) but it is not supported by all potential
MASM compilers.
Change formatv() to validate that the number of arguments passed matches
number of replacement fields in the format string, and that the replacement
indices do not contain holes.
To support cases where this cannot be guaranteed, introduce a formatv()
overload that allows disabling validation with a bool flag as its first argument.
S.substr(N) is simpler than S.slice(N, StringRef::npos) and
S.slice(N, S.size()). Also, substr is probably better recognizable
than slice thanks to std::string_view::substr.
ConstantFolding behaves differently depending on host's `HAS_IEE754_FLOAT128`.
LLVM should not change the behavior depending on host configurations.
This reverts commit 14c7e4a1844904f3db9b2dc93b722925a8c66b27.
(llvmorg-20-init-3262-g14c7e4a18449 and llvmorg-20-init-3498-g001e423ac626)
This is a reland of (#96287). This patch attempts to reduce the reverted
patch's clang compile time by removing #includes of float128.h and
inlining convertToQuad functions instead.
- Move raw_ostream << operators for `ModRef` and `MemoryEffects` to a
new ModRef.cpp file under llvm/Support (instead of AliasAnalysis.cpp)
- This enables calling these operators from `Core` files like
Instructions.cpp (for instance for debugging). Currently, they live in
`LLVMAnalysis` which cannot be linked with `Core`.
- When an unterminated open { is detected in the format string, instead
of asserting and ignoring the error, replace that string with another to
indicate the error, and remove the assert as well.
- This will make the error evident in both assert and release builds and
make observing the error more convenient (as several uses of this
function are in TableGen and it is often built in release mode even in
debug builds)
This is a reland of #96287. This change makes tests in logf128.ll ignore
the sign of NaNs for negative value tests and moves an #include <cmath>
to be blocked behind #ifndef _GLIBCXX_MATH_H.
In order to guarantee that extracting 64 bits doesn't require more than
2 words, the word size would need to be 64 bits or more. If the word
size was smaller than 64, like 32, you may need to read 3 words to get
64 bits.
https://reviews.llvm.org/D70447 switched to `ManagedStatic` to work
around race conditions in MSVC runtimes and the MinGW runtime.
However, `ManagedStatic` is not suitable for other platforms.
However, this workaround is not suitable for other platforms (#66974).
lld::fatal calls exitLld(1), which calls `llvm_shutdown` to destroy
`ManagedStatic` objects. The worker threads will finish and invoke TLS
destructors (glibc `__call_tls_dtors`). This can lead to race conditions
if other threads attempt to access TLS objects that have already been
destroyed.
While lld's early exit mechanism needs more work, I believe Parallel.cpp
should avoid this pitfall as well.
Pull Request: https://github.com/llvm/llvm-project/pull/102989
since C23 this macro is defined by float.h, which clang implements in
it's float.h since #96659 landed.
However, regcomp.c in LLVMSupport happened to define it's own macro with
that name, leading to problems when bootstrapping. This change renames
the offending macro.
It's barely testable - the test does exercise the code, but wouldn't
fail on an empty implementation. It would cause a memory leak though
(because the error handle wouldn't be unwrapped/reowned) which could be
detected by asan and other leak detectors.
Since functions exported from DLLs are type-erased, before this patch I
was seeing the new Clang 19 warning `-Wcast-function-type-mismatch`.
This happens when building LLVM on Windows.
Following discussion in
593f708118 (commitcomment-143905744)
This PR adds `f8E4M3` type to APFloat.
`f8E3M4` type follows IEEE 754 convention
```c
f8E3M4 (IEEE 754)
- Exponent bias: 3
- Maximum stored exponent value: 6 (binary 110)
- Maximum unbiased exponent value: 6 - 3 = 3
- Minimum stored exponent value: 1 (binary 001)
- Minimum unbiased exponent value: 1 − 3 = −2
- Precision specifies the total number of bits used for the significand (mantissa),
including implicit leading integer bit = 4 + 1 = 5
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs
Additional details:
- Max exp (unbiased): 3
- Min exp (unbiased): -2
- Infinities (+/-): S.111.0000
- Zeros (+/-): S.000.0000
- NaNs: S.111.{0,1}⁴ except S.111.0000
- Max normal number: S.110.1111 = +/-2^(6-3) x (1 + 15/16) = +/-2^3 x 31 x 2^(-4) = +/-15.5
- Min normal number: S.001.0000 = +/-2^(1-3) x (1 + 0) = +/-2^(-2)
- Max subnormal number: S.000.1111 = +/-2^(-2) x 15/16 = +/-2^(-2) x 15 x 2^(-4) = +/-15 x 2^(-6)
- Min subnormal number: S.000.0001 = +/-2^(-2) x 1/16 = +/-2^(-2) x 2^(-4) = +/-2^(-6)
```
Related PRs:
- [PR-97179](https://github.com/llvm/llvm-project/pull/97179) [APFloat]
Add support for f8E4M3 IEEE 754 type
Compared to the generic scalar code, using Arm NEON instructions yields
a ~11x speedup: 31 vs 339.5 ms to hash 1 GiB of random data on the Apple
M1.
This follows the upstream implementation closely, with some
simplifications made:
- Removed workarounds for suboptimal codegen on older GCC
- Removed instruction reordering barriers which seem to have a
negligible impact according to my measurements
- We do not support WebAssembly's mostly NEON-compatible API
- There is no configurable mixing of SIMD and scalar code; according to
the upstream comments, this is only relevant for smaller Cortex cores
which can dispatch relatively few NEON micro-ops per cycle.
This commit intends to use only standard ACLE intrinsics and datatypes,
so it should build with all supported versions of GCC, Clang and MSVC.
This feature is enabled by default when targeting AArch64, but the
`LLVM_XXH_USE_NEON=0` macro can be set to explicitly disable it.
XXH3 is used for ICF, string deduplication and computing the UUID in
ld64.lld; this commit results in a -1.77% +/- 0.59% speed improvement
for a `--threads=8` link of Chromium.framework.
This PR implements `raw_socket_stream::read`, which overloads the base
class `raw_fd_stream::read`. `raw_socket_stream::read` provides a way to
timeout the underlying `::read`. The timeout functionality was not added
to `raw_fd_stream::read` to avoid needlessly increasing compile times
and allow for convenient code reuse with `raw_socket_stream::accept`,
which also requires timeout functionality. This PR supports the module
build daemon and will help guarantee it never becomes a zombie process.
This PR adds `f8E4M3` type to APFloat.
`f8E4M3` type follows IEEE 754 convention
```c
f8E4M3 (IEEE 754)
- Exponent bias: 7
- Maximum stored exponent value: 14 (binary 1110)
- Maximum unbiased exponent value: 14 - 7 = 7
- Minimum stored exponent value: 1 (binary 0001)
- Minimum unbiased exponent value: 1 − 7 = −6
- Precision specifies the total number of bits used for the significand (mantisa),
including implicit leading integer bit = 3 + 1 = 4
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs
Additional details:
- Max exp (unbiased): 7
- Min exp (unbiased): -6
- Infinities (+/-): S.1111.000
- Zeros (+/-): S.0000.000
- NaNs: S.1111.{001, 010, 011, 100, 101, 110, 111}
- Max normal number: S.1110.111 = +/-2^(7) x (1 + 0.875) = +/-240
- Min normal number: S.0001.000 = +/-2^(-6)
- Max subnormal number: S.0000.111 = +/-2^(-6) x 0.875 = +/-2^(-9) x 7
- Min subnormal number: S.0000.001 = +/-2^(-6) x 0.125 = +/-2^(-9)
```
Related PRs:
- [PR-97118](https://github.com/llvm/llvm-project/pull/97118) Add f8E4M3
IEEE 754 type to mlir
The removal of StringRef::equals in
3fa409f2318ef790cc44836afe9a72830715ad84 broke the
[Solaris/sparcv9](https://lab.llvm.org/buildbot/#/builders/13/builds/724)
and
[Solaris/amd64](https://lab.llvm.org/staging/#/builders/94/builds/5176)
buildbots:
```
In file included from /vol/llvm/src/llvm-project/git/llvm/lib/Support/Path.cpp:1200:
/vol/llvm/src/llvm-project/git/llvm/lib/Support/Unix/Path.inc:519:18: error: no member named 'equals' in 'llvm::StringRef'
519 | return !fstype.equals("nfs");
| ~~~~~~ ^
```
Fixed by switching to `operator!=` instead.
Tested on sparcv9-sun-solaris2.11 and amd64-pc-solaris2.11.