The existing options for bin→dec float conversion are all based on the
Ryū algorithm, which generates 9 output digits at a time using a table
lookup. For users who can't afford the space cost of the table, the
table-lookup subroutine is replaced with one that computes the needed
table entry on demand, but the algorithm is otherwise unmodified.
The performance problem with computing table entries on demand is that
now you need to calculate a power of 10 for every block of 9 digits you
output. But if you're calculating a custom power of 10 anyway, it's
simpler to compute just one and multiply the _whole_ mantissa by it.
This patch adds a header file alongside `float_dec_converter.h`, which
replaces the whole Ryū system instead of just the table-lookup routine,
implementing this alternative simpler algorithm. The result is accurate
enough to satisfy (minimally) the accuracy demands of IEEE 754-2019 even
in 128-bit long double. The new float128 test cases demonstrate this by
testing the cases closest to the 39-digit rounding boundary.
In my tests of generating 39 output digits (the maximum number supported
by this algorithm), this code is also both faster and smaller than the
USE_DYADIC_FLOAT version of the existing Ryū code.
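To make the "one power of 10 for the whole mantissa" idea concrete, here is a minimal C sketch for doubles. It is not the patch's code. The patch has to keep its computed power of 10 accurate enough for IEEE 754, while this sketch sidesteps that with exact `unsigned __int128` arithmetic (a GCC/Clang extension). It only covers the small exponent range where the product fits in 128 bits, and it leaves out choosing k and rounding. `scaled_digits` and `decompose` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef unsigned __int128 u128; /* GCC/Clang extension */

/* Hypothetical helper: floor(m * 2^e * 10^k), computed exactly by building a
 * single power of 10 and multiplying the whole mantissa by it. Only valid in
 * the small range where the product fits in 128 bits. */
static u128 scaled_digits(uint64_t m, int e, int k) {
  assert(m < ((uint64_t)1 << 53)); /* double mantissa, implicit bit included */
  assert(e <= 0 && e > -128);      /* shift amount must stay below 128 */
  assert(k >= 0 && k <= 22);       /* keeps m * 10^k below 2^127 */
  u128 pow10 = 1;
  for (int i = 0; i < k; ++i)
    pow10 *= 10;                   /* the one custom power of 10 */
  u128 prod = (u128)m * pow10;     /* whole mantissa times 10^k */
  return prod >> (-e);             /* apply 2^e, truncating toward zero */
}

/* Splits a finite, normal, positive double into m * 2^e. */
static void decompose(double x, uint64_t *m, int *e) {
  uint64_t bits;
  memcpy(&bits, &x, sizeof bits);
  int biased = (int)((bits >> 52) & 0x7FF);
  assert(biased != 0 && biased != 0x7FF); /* normal numbers only */
  *m = (bits & (((uint64_t)1 << 52) - 1)) | ((uint64_t)1 << 52);
  *e = biased - 1075;              /* 1023 bias + 52 fraction bits */
}
```

With k chosen so the result has the desired number of digits, the returned integer is the digit string: for 0.1 and k = 20 it is 10000000000000000555, the first 20 decimal places of the double nearest to 0.1.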
I normally run cmake with LIBC_CMAKE_VERBOSE_LOGGING set to ON so I
can debug build issues more easily. One effect of this is that I can see
which tests/entrypoints are skipped on my machine. This patch fixes the
skipped tests and entrypoints that were easy to fix. These were:
libc.src.pthread.pthread_spin_destroy
libc.src.pthread.pthread_spin_init
libc.src.pthread.pthread_spin_lock
libc.src.pthread.pthread_spin_trylock
libc.src.pthread.pthread_spin_unlock
(entrypoints were just missing)
libc.src.wchar.btowc
(I forgot to finish it)
libc.test.src.sys.statvfs.linux.statvfs_test
libc.test.src.sys.statvfs.linux.fstatvfs_test
(Incorrect includes for rmdir, needed some cleanup)
libc.test.integration.src.unistd.execve_test
(wrong dep for errno)
libc.test.src.math.smoke.fmaf_test
(add_fp_unittest doesn't support flags)
libc.test.src.stdio.scanf_core.converter_test
(needed to be moved away from string_reader, further cleanup needed)
Scanf parsing reads the longest possibly valid prefix for a given
conversion. Then, it performs the conversion on that string. In the case
of "0xZ" with a hex conversion (either "%x" or "%i") the longest
possibly valid prefix is "0x", which makes it the "input item" (per the
standard). The sequence "0x" is not a "matching sequence" for a hex
conversion, meaning it results in a matching failure, and parsing ends.
This is because, to learn that no valid digit follows "0x", scanf must
read the 'Z'. It can only push back one character (the 'Z'), so it is
left having consumed an invalid sequence.
(inspired by a thread on the libc-coord mailing list:
https://www.openwall.com/lists/libc-coord/2024/10/15/1, see 7.32.6.2 in
the standard for more details.)
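The rule can be modeled with a small, hypothetical `scan_hex` helper (not LLVM libc's scanf code). It reports how many characters stay consumed and whether the input item is a matching sequence, assuming a plain `%x` conversion with no leading whitespace, sign, or field width, and ignoring overflow.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a %x conversion: how many characters end up in the
 * input item, and whether that item is a matching sequence. */
typedef struct {
  size_t consumed;      /* characters that stay consumed after pushback */
  bool matched;         /* false means a matching failure */
  unsigned long value;  /* meaningful only when matched; overflow ignored */
} hex_scan;

static hex_scan scan_hex(const char *s) {
  assert(s != NULL);
  hex_scan r = {0, false, 0};
  size_t i = 0;
  /* "0x" is a possibly valid prefix, so it becomes part of the input item. */
  if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
    i = 2;
  size_t digits_start = i;
  while (isxdigit((unsigned char)s[i])) {
    int c = tolower((unsigned char)s[i]);
    r.value = r.value * 16 + (unsigned long)(isdigit(c) ? c - '0' : c - 'a' + 10);
    ++i;
  }
  /* The terminating character (e.g. 'Z') is pushed back; the "0x" before it
   * cannot be, so it stays consumed. */
  r.consumed = i;
  r.matched = i > digits_start;
  return r;
}
```

With this model, "0xZ" leaves 2 characters consumed and is a matching failure, while "0Z" matches the single digit "0".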
Dyadic floats were an existing option for float to string conversion,
but the code had become stale. This patch fixes it up and adds proper
config options and test support. Because of the test changes, this is a
follow-up to #110759.
The sprintf tests have a macro named "ASSERT_STREQ_LEN", which was used
in about half of the tests. This patch converts every test that can use
that macro to do so. It also enables long double tests for %e and %g,
which were never finished. There's still some work to do to enable long
double testing for formats other than the Intel 80-bit format, but that
can land in a follow-up.
The `#ifdef LIBC_COPT_FLOAT_TO_STR_REDUCED_PRECISION` lines are for a
followup patch.
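The pattern such a macro captures is checking sprintf's return value and the resulting string in one step. The C sketch below uses a hypothetical `CHECK_STREQ_LEN` built on `assert`; the real `ASSERT_STREQ_LEN` lives in LLVM libc's test framework and its exact definition may differ.

```c
#include <assert.h>
#include <stdio.h>  /* snprintf, used alongside the macro */
#include <string.h>

/* Hypothetical macro in the spirit of ASSERT_STREQ_LEN: check both the value
 * sprintf returned and the string it wrote. `expected` must be a string
 * literal so that sizeof gives its length plus the terminating NUL. */
#define CHECK_STREQ_LEN(written, actual, expected)    \
  do {                                                \
    assert((written) == (int)(sizeof(expected) - 1)); \
    assert(strcmp((actual), (expected)) == 0);        \
  } while (0)
```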
Summary:
The GPU handling for a lot of `FILE *` functions just forwards the call
to the host via RPC. This test checks for implementation-defined
behavior, which sometimes passes and sometimes doesn't, so we disable it
here to keep the test on standard semantics.
We do this forwarding primarily for interop with the host if the user is
compiling from an offloading language (e.g. CUDA).
This patch adds the %m conversion to printf, which prints
strerror(errno). The rationale is explained below. This patch also
updates the docs, tests, and build system to accommodate it.
The POSIX specification for syslog says it uses the same format as
printf, but adds %m which prints the error message string for the
current value of errno. For ease of implementation, it's standard
practice for libc implementers to just add %m to printf instead of
creating a separate parser for syslog.
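A toy formatter shows why adding %m to printf is cheap: it is just one more case in the same conversion switch. `mini_format` below is hypothetical and supports only `%s`, `%d`, `%%`, and `%m`, with no flags or widths. It reads `errno` once at the start, so that later library calls made while formatting cannot change the message.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Appends one character, truncating like snprintf; *len counts every char. */
static void put(char *out, size_t n, size_t *len, char c) {
  if (*len + 1 < n)
    out[*len] = c;
  ++*len;
}

static void put_str(char *out, size_t n, size_t *len, const char *s) {
  for (; *s; ++s)
    put(out, n, len, *s);
}

/* Hypothetical toy formatter: %m is handled by the same parser as %s/%d. */
static size_t mini_format(char *out, size_t n, const char *fmt, ...) {
  assert(out != NULL && n > 0);
  int saved_errno = errno;  /* capture before anything below can change it */
  size_t len = 0;
  va_list ap;
  va_start(ap, fmt);
  for (const char *p = fmt; *p; ++p) {
    if (*p != '%') {
      put(out, n, &len, *p);
      continue;
    }
    ++p;  /* look at the conversion character */
    switch (*p) {
    case 'm': put_str(out, n, &len, strerror(saved_errno)); break;
    case 's': put_str(out, n, &len, va_arg(ap, const char *)); break;
    case 'd': {
      char num[16];
      snprintf(num, sizeof num, "%d", va_arg(ap, int));
      put_str(out, n, &len, num);
      break;
    }
    case '%': put(out, n, &len, '%'); break;
    default:  /* unsupported conversion or a trailing '%': stop here */
      goto done;
    }
  }
done:
  va_end(ap);
  out[len < n ? len : n - 1] = '\0';
  return len;
}
```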
In patch #105293, tests for vfscanf were added, meant to be identical to
the fscanf tests. Unfortunately, the author forgot to rename the target
file causing an occasional test flake where one test writes to the file
while the other is trying to read it. This patch fixes the issue by
renaming the target test file for the vfscanf test.
Previously, if INT_MIN was passed as a wildcard width to a printf
conversion, the parser would attempt to negate it to get the positive
width (and set the left justify flag), but the negation would overflow
and the width would be treated as 0. This patch corrects the issue by
treating a width of INT_MIN as identical to -INT_MAX.
This patch also updates the docs to explain this behavior and adds b to
the list of int conversions.
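The width handling described above can be sketched as follows; `resolve_star_width` and `width_spec` are hypothetical names, not the parser's actual code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

typedef struct {
  int width;
  bool left_justify;
} width_spec;

/* Hypothetical helper for a '*' width argument. A negative width means
 * "left-justify with width -arg"; INT_MIN is handled as -INT_MAX because
 * -INT_MIN is not representable in int. */
static width_spec resolve_star_width(int arg) {
  width_spec w = {arg, false};
  if (arg < 0) {
    w.left_justify = true;
    w.width = (arg == INT_MIN) ? INT_MAX : -arg;
  }
  assert(w.width >= 0);  /* the resolved width is never negative */
  return w;
}
```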
Summary:
We can enable the sscanf function on the GPU now. This required adding
the configs to the scanf list so that the GPU build didn't do float
conversions.
Summary:
This patch implements the `printf` family of functions on the GPU using
the new variadic support. This patch adapts the old handling in the
`rpc_fprintf` placeholder, but adds an extra RPC call to get the size of
the buffer to copy. This prevents the GPU from needing to parse the
string. While it's theoretically possible for the pass to know the size
of the struct, it's prohibitively difficult to do while maintaining ABI
compatibility with NVIDIA's varargs.
Depends on https://github.com/llvm/llvm-project/pull/96015.
The previous printf macro test for 64 bit octal used the number 0123,
which is not large enough to ensure that the macro is actually reading a
64 bit number. This patch enlarges the number, and also makes sure the
return value of sprintf is correct for the macro tests.
Fixes #93711.
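The point of a larger test value is that it needs more than 32 bits, so a macro that only reads 32 bits would fail the comparison. A sketch of such a check in C, with a hypothetical helper `octal64_matches` that compares both the output and the return value of `snprintf`; the value used here is illustrative, not the one in the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical check: formats v with PRIo64 and verifies both the text and
 * the character count that snprintf reports. */
static bool octal64_matches(uint64_t v, const char *expected) {
  char buf[32];  /* a uint64_t needs at most 22 octal digits plus NUL */
  int written = snprintf(buf, sizeof buf, "%" PRIo64, v);
  assert(written >= 0);
  return written == (int)strlen(expected) && strcmp(buf, expected) == 0;
}
```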
This patch implements the ``fdopen`` function. Because ``fdopen``
internally calls ``fcntl``, the implementation of ``fcntl`` has been
moved to ``__support/OSUtil``, where it is available as an internal
utility function.
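The patch text doesn't say what `fdopen` uses `fcntl` for. One plausible use, assumed here, is POSIX's requirement that the requested mode be allowed by the descriptor's file access mode, which `fcntl(fd, F_GETFL)` exposes. The hypothetical `fdopen_mode_allowed` sketches that check:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check of the kind fdopen could use fcntl for: the mode string
 * must not ask for access that the descriptor's open flags do not grant. */
static bool fdopen_mode_allowed(int fd, const char *mode) {
  assert(mode != NULL && mode[0] != '\0');
  int flags = fcntl(fd, F_GETFL);
  if (flags < 0)
    return false;  /* not an open descriptor (EBADF) */
  int acc_mode = flags & O_ACCMODE;
  bool plus = strchr(mode, '+') != NULL;
  bool wants_read = mode[0] == 'r' || plus;
  bool wants_write = mode[0] == 'w' || mode[0] == 'a' || plus;
  if (wants_read && acc_mode == O_WRONLY)
    return false;
  if (wants_write && acc_mode == O_RDONLY)
    return false;
  return true;
}
```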
Use a seek offset that fits within the file size.
This was missed in presubmit because the FILE-based stdio tests aren't
run in overlay mode, and fullbuild is not tested in presubmit.
WRITE_SIZE == 11, so using a value of 42 for offseto would cause the
expression `WRITE_SIZE - offseto` to evaluate to -31 as an unsigned
64-bit integer (18446744073709551585ULL).
Fixes #86928
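The wraparound is easy to reproduce. This sketch mirrors the test expression with hypothetical helpers and `WRITE_SIZE` set to 11 as above, then shows a guarded variant:

```c
#include <assert.h>
#include <stdint.h>

#define WRITE_SIZE 11  /* value quoted above */

/* Mirrors `WRITE_SIZE - offseto` in 64-bit unsigned arithmetic: when the
 * offset is past the end, the subtraction wraps modulo 2^64 instead of
 * going negative. */
static uint64_t remaining_unchecked(uint64_t offset) {
  return (uint64_t)WRITE_SIZE - offset;
}

/* A guarded variant: callers must pick an offset within the file size. */
static uint64_t remaining_checked(uint64_t offset) {
  assert(offset <= WRITE_SIZE);
  return (uint64_t)WRITE_SIZE - offset;
}
```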
In patch #82461, the sprintf tests were made to use UINTMAX_WIDTH, which
isn't defined on all systems. This patch changes it to
sizeof(uintmax_t)*CHAR_BIT, which is more portable.
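A sketch of the portable replacement. `UINTMAX_BITS` and `uintmax_top_bit` are hypothetical names; the expression equals the true width only when `uintmax_t` has no padding bits, which holds on mainstream platforms.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Stand-in for UINTMAX_WIDTH. Assumes uintmax_t has no padding bits. */
#define UINTMAX_BITS (sizeof(uintmax_t) * CHAR_BIT)

/* Returns a value with only the most significant bit of uintmax_t set,
 * computed from the bit count rather than from UINTMAX_WIDTH. */
static uintmax_t uintmax_top_bit(void) {
  assert(UINTMAX_BITS >= 64);  /* C guarantees uintmax_t is at least 64 bits */
  return (uintmax_t)1 << (UINTMAX_BITS - 1);
}
```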
SYS_rename may be unavailable on architectures such as aarch64 and
riscv.
rename can be implemented in terms of SYS_rename, SYS_renameat, or
SYS_renameat2. I don't have a full picture of the history here, but it
seems that SYS_renameat might also be unavailable on some platforms.
`man 2 rename` mentions that SYS_renameat2 was added in Linux 3.15. We
don't need to support kernel versions older than that.
Link: #84980
Link: #85068
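For illustration, a rename built directly on `SYS_renameat2` through glibc's generic `syscall()` wrapper might look like this. It assumes a Linux target and is not LLVM libc's internal syscall machinery; `rename_via_renameat2` is a hypothetical name. Passing `AT_FDCWD` for both directory descriptors and `flags = 0` gives plain `rename` semantics, and the fallback to `rename()` only applies when the headers lack `SYS_renameat2`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>        /* AT_FDCWD */
#include <stdio.h>        /* rename fallback */
#include <sys/syscall.h>  /* SYS_renameat2 */
#include <unistd.h>       /* syscall */

/* Hypothetical rename on top of renameat2 with both paths resolved relative
 * to the current working directory. Returns 0 on success, -1 with errno set
 * on failure. */
static int rename_via_renameat2(const char *oldpath, const char *newpath) {
  assert(oldpath != NULL && newpath != NULL);
#ifdef SYS_renameat2
  /* flags = 0: no RENAME_NOREPLACE/RENAME_EXCHANGE, i.e. plain rename. */
  return (int)syscall(SYS_renameat2, AT_FDCWD, oldpath, AT_FDCWD, newpath, 0);
#else
  return rename(oldpath, newpath);
#endif
}
```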
Summary:
Currently we print `null` for the null pointer in a `%s` expression.
Although it's not defined by the standard, other implementations choose
to use `(null)` to indicate this. We also currently print `(nullptr)` so
I think it's more consistent to use parens in both cases.
This patch adds the r, R, k, and K conversion specifiers to printf, with
accompanying tests. They can be disabled with the
LIBC_COPT_PRINTF_DISABLE_FIXED_POINT flag, and are also gated on
automatic detection of fixed-point support.
Summary:
This is a massive patch because it reworks the entire build and
everything that depends on it. This is not split up because various bots
would fail otherwise. I will attempt to describe the necessary changes
here.
This patch completely reworks how the GPU build is built and targeted.
Previously, we used a standard runtimes build and handled both NVPTX and
AMDGPU in a single build via multi-targeting. This added a lot of
divergence in the build system and prevented us from doing various
things like building for the CPU / GPU at the same time, or exporting
the startup libraries or running tests without a full rebuild.
The new approach is to handle the GPU builds as strict cross-compiling
runtimes. The first step required
https://github.com/llvm/llvm-project/pull/81557 to allow the `LIBC`
target to build for the GPU without touching the other targets. This
means that the GPU uses all the same handling as the other builds in
`libc`.
The new expected way to build the GPU libc is with
`LLVM_LIBC_RUNTIME_TARGETS=amdgcn-amd-amdhsa;nvptx64-nvidia-cuda`.
The second step was reworking how we generated the embedded GPU library
by moving it into the library install step. Where we previously had one
`libcgpu.a` we now have `libcgpu-amdgpu.a` and `libcgpu-nvptx.a`. This
patch includes the necessary clang / OpenMP changes to make that not
break the bots when this lands.
We unfortunately still require that the NVPTX target has an `internal`
target for tests. This is because the NVPTX target needs to do LTO for
the provided version (The offloading toolchain can handle it) but cannot
use it for the native toolchain which is used for making tests.
This approach is vastly superior in every way, allowing us to treat the
GPU as a standard cross-compiling target. We can now install the GPU
utilities to do things like use the offload tests and other fun things.
Some utilities need to be built with `--target=${LLVM_HOST_TRIPLE}` as
well. I think this is a fine workaround, as we will always assume that
the GPU `libc` is a cross-build with a functioning host.
Depends on https://github.com/llvm/llvm-project/pull/81557
Having libc_errno outside of the namespace causes versioning issues when
trying to link the tests against LLVM-libc. Most of this patch is just
moving libc_errno inside the namespace in tests. This isn't necessary in
the function implementations since those are already inside the
namespace.
Much of unistd involves modifying files. The tests for these functions
need to use libc_make_test_file_path which didn't exist when they were
first implemented. This patch adds most of unistd to the Bazel build
along with the corresponding tests. Tests that modify directories had to
be disabled since Bazel doesn't seem to handle them properly.
This patch provides specific test macros to deal with `errno`.
This will help abstract away the differences between unit tests and
integration/hermetic tests in #79319. In one case we use `libc_errno`,
which is a struct; in the other case we deal directly with `errno`.
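One way to get that abstraction is to route every check through a single macro for "the errno being tested". The C sketch below is hypothetical (`TEST_ERRNO`, `CHECK_ERRNO_EQ`, and `CHECK_ERRNO_SUCCESS` are made-up names, not LLVM libc's macros); a unit-test build could redirect `TEST_ERRNO` to another int lvalue, while hermetic builds keep the real `errno`.

```c
#include <assert.h>
#include <errno.h>

/* Which errno the checks read is a single build-time choice. */
#ifndef TEST_ERRNO
#define TEST_ERRNO errno
#endif

/* Check the current errno, then reset it so later checks start clean. */
#define CHECK_ERRNO_EQ(expected)      \
  do {                                \
    assert(TEST_ERRNO == (expected)); \
    TEST_ERRNO = 0;                   \
  } while (0)

#define CHECK_ERRNO_SUCCESS() CHECK_ERRNO_EQ(0)
```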
The Ryu algorithm is very fast with its table, but that table grows too
large for long doubles. This patch adds a method of calculating the
digits of long doubles using just wide integers and fast modulo
operations. This results in significant performance improvements over
the previous int calc mode, while using a similar amount of peak memory.
It will be slow in some %e/%g cases, but reasonably fast for %f with no
loss of accuracy.
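As an illustration of extracting decimal digits from a wide integer with a few large-modulus operations, here is a sketch that converts an `unsigned __int128` (a GCC/Clang extension) to decimal by peeling off 19-digit blocks with `% 10^19`. It is not the patch's algorithm; it only shows the basic digit-extraction step. `u128_to_dec` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef unsigned __int128 u128; /* GCC/Clang extension */

/* Hypothetical helper: writes the decimal form of v to out and returns its
 * length. Each wide % and / by 10^19 yields 19 digits at once; the
 * remaining work happens in 64-bit arithmetic. */
static size_t u128_to_dec(u128 v, char *out, size_t n) {
  assert(out != NULL && n >= 40);  /* 2^128 - 1 has 39 digits, plus NUL */
  const uint64_t block = 10000000000000000000ULL;  /* 10^19 fits in 64 bits */
  uint64_t blocks[3];  /* 2^128 < 10^57, so three blocks suffice */
  int nb = 0;
  do {
    blocks[nb++] = (uint64_t)(v % block);
    v /= block;
  } while (v != 0);
  size_t len = 0;
  for (int i = nb - 1; i >= 0; --i) {
    char tmp[20];
    int k = 0;
    uint64_t b = blocks[i];
    do {
      tmp[k++] = (char)('0' + b % 10);
      b /= 10;
    } while (b != 0);
    if (i != nb - 1)
      while (k < 19)
        tmp[k++] = '0';  /* inner blocks are zero-padded to 19 digits */
    while (k > 0)
      out[len++] = tmp[--k];  /* tmp holds digits least significant first */
  }
  out[len] = '\0';
  return len;
}
```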