This PR adds the following basic math functions for BFloat16 type along
with the tests:
- bf16fma
- bf16fmaf
- bf16fmal
- bf16fmaf128
---------
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
The PR is going to improve the readability for the files under
`llvm-project/libc/src/wchar` directory.
---------
Co-authored-by: Jin Huang <jingold@google.com>
Use LIBC_ERRNO_MODE_SYSTEM_INLINE instead as the default for the "public
packaging" (i.e. release mode) of an overlay build. The Bazel build has
already switched to use it by default in
5ccc734fa0355f971f8f515457a0bece33ab6642. This should be a safe change,
as LIBC_ERRNO_MODE_SYSTEM_INLINE works a drop-in (but simpler)
LIBC_ERRNO_MODE_SYSTEM replacement. Remove the associated code paths and
config settings.
Fixes issue #143454.
This PR adds the following basic math functions for BFloat16 type along
with the tests:
- bf16div
- bf16divf
- bf16divl
- bf16divf128
---------
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
This PR adds the following basic math functions for BFloat16 type along
with the tests:
- fmaximumbf16
- fmaximum_magbf16
- fmaximum_mag_numbf16
- fmaximum_numbf16
- fminimumbf16
- fminimum_magbf16
- fminimum_mag_numbf16
- fminimum_numbf16
---------
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
Summary:
This moves the waiting to be done inside of the `try_lock` routine
instead. This makes the logic much simpler since it's just a single loop
on a load. We should have the same effect here, and since we don't care
about this being a generic interface it shouldn't matter that it waits
abit. Still wait free since it's guaranteed to make progress
*eventually*.
This PR adds the following basic math functions for BFloat16 type along
with the tests:
- bf16mul
- bf16mulf
- bf16mull
- bf16mulf128
---------
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
Summary:
This patch introduces a lock-free stack used to store a fixed number of
slabs. Instead of going directly through RPC memory, we instead can
consult the cache and use that. Currently, this means that ~64 MiB of
memory will remain in-use if the user completely fills the cache.
However, because we always fully destroy the object, the chunk size can
be reset so they can be fully reused.
This greatly improves performance in cases where the user has previously
accessed malloc, lowering the difference between an implementation that
does not free slabs at all and one that does.
We can also skip the expensive zeroing step if the old chunk size was
smaller than the previous one. Smaller chunk sizes need a larger
bitfield, and because we know for a fact that the number of users
remaining in this slab is zero thanks to the reference counting we can
guarantee that the bitfield is all zero like when it was initialized.
This PR adds implements following basic math functions for BFloat16 type
along with the tests:
- bf16add
- bf16addf
- bf16addl
- bf16addf128
- bf16sub
- bf16subf
- bf16subl
- bf16subf128
---------
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
This PR implements the following basic math functions for BFloat16 type
along with the tests:
- ceilbf16
- floorbf16
- roundbf16
- roundevenbf16
- truncbf16
---------
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
This removes an extraneous ',' in the generated dlfcn header.
This also adds `__restrict` to `dladdr`'s declaration per POSIX. Another
fix is made: the C standard `restrict` keyword is removed from
dlinfo.cpp/dlinfo.h (but note that dlfcn.yaml still annotates
`__restrict` for dlinfo's decl).
Addressed TODO to template the StringConverter pop functions to have a
single implementation (combine popUTF8 and popUTF32 into a single
templated pop function)
The iswalpha function is only working for characters in the ascii range
right now, which isn't ideal. Also it was returning a bool when the
function is supposed to return an int.
When single-threaded mode is selected, all instances of the keyword
`LIBC_THREAD_LOCAL` are stubbed out, similar to how it currently works
on the GPU. This allows baremetal builds to avoid using thread_local.
However, libcxx uses shared headers, so we need to be careful there.
Thankfully, there is already an option to disable multithreading in
libcxx, so a flag is added such that single-threaded mode is propagated
down to libc.
A initial commit for #97929, this adds a stub implementation for
`dladdr` and includes the definition for the `DL_info` type used as one
of its arguments.
While the `dladdr` implementation relies on dynamic linker support, this
patch will add its prototype in the generated `dlfcn.h` header so that
it can be used by downstream platforms that have their own `dladdr`
implementation.
An initial commit for #149911, this adds a stub implementation for
dlinfo and the enums list of `RTLD_DI_*` values.
While the dlinfo implementation relies on dynamic linker support, this
patch will add its prototype in the generated dlfcn.h header so that it
can be used by downstream platforms that have their own dlinfo
implementation.
POSIX says "If no such wide-character code is found, the current token
extends to the end of the wide-character string pointed to by ws1, and
subsequent searches for a token shall return a null pointer", but the
current implementation only returns nullptr the first time. This failed
an existing bionic test when I tried to switch over to llvm-libc
wcstok().
Summary:
Wave 64 mode touches the upper limit of this, which had an off-by-one
error. This caused it to return the same leader which gave an invalid
view of memory.
Part of https://github.com/llvm/llvm-project/issues/145349. Required to
allow `atexit` to work. As part of `HermeticTestUtils.cpp`, there is a
reference to `atexit()`, which eventually instantiates an instance of a
Mutex.
Instead of copying the implementation from
`libc/src/__support/threads/gpu/mutex.h`, we allow platforms to select
an implementation based on configurations, allowing the GPU and
single-threaded baremetal platforms to share an implementation. This can
be configured or overridden.
Later, when the threading API is more complete, we can add an option to
support multithreading (or set it as the default), but having
single-threading (in tandem) is in line with other libraries for
embedded devices.
This patch removes some sched.h includes, preferring the proxy headers.
This patch does not clean them all up as some proxy type headers need to
be added, like for `struct sched_param`.
Summary:
This patch changes the slab search to start at the number of allocated
bits. Previously we would randomly search, but this gives very good
performance when doing nothing but allocating, which is a common
configuration. This will degrade performance when mixing malloc and free
close to eachother as this is more likely to fail when the counter
starts decreasing.