Summary:
This patch changes the Linux build to use wide reads in the memory
operations by default. These memory functions may now read outside the
bounds explicitly allowed by the current function. While this is
technically undefined behavior in the standard, plenty of C library
implementations do it. It will not cause a segmentation fault on Linux
as long as the read does not cross a page boundary, and because we are
only *reading* memory it should not have atomic effects.
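The page-boundary condition can be illustrated with a small sketch. The helper names and the 4096-byte default page size are assumptions made for illustration; they are not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers, not part of the patch.

// Returns true when reading `width` bytes starting at `addr` stays inside a
// single page, so the read cannot touch an unmapped neighbouring page.
inline bool wide_read_stays_in_page(std::uintptr_t addr, std::size_t width,
                                    std::size_t page_size = 4096) {
  // The mask trick below only works for power-of-two page sizes.
  assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
  assert(width != 0 && width <= page_size);
  const std::uintptr_t offset = addr & (page_size - 1);
  return offset + width <= page_size;
}

// Rounds `addr` down to a multiple of `width` (a power of two). A read of
// `width` bytes from the rounded address never crosses a page boundary as
// long as `width` divides the page size.
inline std::uintptr_t align_down(std::uintptr_t addr, std::size_t width) {
  assert(width != 0 && (width & (width - 1)) == 0);
  return addr & ~static_cast<std::uintptr_t>(width - 1);
}
```

An 8-byte read at offset 4088 of a page is safe under this check, while the same read at offset 4089 would spill into the next page.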
This PR adds the following basic math functions for BFloat16 type along
with the tests:
- fromfpbf16
- fromfpxbf16
- ufromfpbf16
- ufromfpxbf16
---------
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
This PR adds the following basic math functions for BFloat16 type along
with the tests:
- bf16fma
- bf16fmaf
- bf16fmal
- bf16fmaf128
---------
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
Use LIBC_ERRNO_MODE_SYSTEM_INLINE instead as the default for the "public
packaging" (i.e. release mode) of an overlay build. The Bazel build has
already switched to use it by default in
5ccc734fa0355f971f8f515457a0bece33ab6642. This should be a safe change,
as LIBC_ERRNO_MODE_SYSTEM_INLINE works as a drop-in (but simpler)
LIBC_ERRNO_MODE_SYSTEM replacement. Remove the associated code paths and
config settings.
Fixes issue #143454.
This PR adds the following basic math functions for BFloat16 type along
with the tests:
- bf16div
- bf16divf
- bf16divl
- bf16divf128
---------
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
This PR adds the following basic math functions for BFloat16 type along
with the tests:
- fmaximumbf16
- fmaximum_magbf16
- fmaximum_mag_numbf16
- fmaximum_numbf16
- fminimumbf16
- fminimum_magbf16
- fminimum_mag_numbf16
- fminimum_numbf16
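
For reference, the C23 semantics behind three of these variants can be sketched on plain `float`. This is only an illustration of the NaN and signed-zero rules; the `sketch_` names are hypothetical and this is not the BFloat16 implementation added by the PR.

```cpp
#include <cmath>
#include <limits>

// Hypothetical float versions, for illustration only.

// fmaximum: NaN in either operand propagates; +0 is treated as greater
// than -0.
inline float sketch_fmaximum(float x, float y) {
  if (std::isnan(x) || std::isnan(y))
    return std::numeric_limits<float>::quiet_NaN();
  if (x == y) // also true for +0 vs -0, so pick the one without a sign bit
    return std::signbit(x) ? y : x;
  return x > y ? x : y;
}

// fmaximum_num: a single NaN operand is ignored; NaN only if both are NaN.
inline float sketch_fmaximum_num(float x, float y) {
  if (std::isnan(x))
    return y;
  if (std::isnan(y))
    return x;
  return sketch_fmaximum(x, y);
}

// fmaximum_mag: compare magnitudes first, fall back to fmaximum on a tie
// (or when a NaN makes both comparisons false).
inline float sketch_fmaximum_mag(float x, float y) {
  const float ax = std::fabs(x), ay = std::fabs(y);
  if (ax > ay)
    return x;
  if (ay > ax)
    return y;
  return sketch_fmaximum(x, y);
}
```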
---------
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
This PR adds the following basic math functions for BFloat16 type along
with the tests:
- bf16mul
- bf16mulf
- bf16mull
- bf16mulf128
---------
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
This PR implements the following basic math functions for BFloat16 type
along with the tests:
- bf16add
- bf16addf
- bf16addl
- bf16addf128
- bf16sub
- bf16subf
- bf16subl
- bf16subf128
---------
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
This PR implements the following basic math functions for BFloat16 type
along with the tests:
- ceilbf16
- floorbf16
- roundbf16
- roundevenbf16
- truncbf16
---------
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
Part of #145349. Requires #145358. Required by #146863. Once the mutex
has been implemented, we can register functions to be called for exit
with `atexit`.
Part of https://github.com/llvm/llvm-project/issues/145349. Required to
allow `atexit` to work. As part of `HermeticTestUtils.cpp`, there is a
reference to `atexit()`, which eventually instantiates an instance of a
Mutex.
Instead of copying the implementation from
`libc/src/__support/threads/gpu/mutex.h`, we allow platforms to select
an implementation based on configurations, allowing the GPU and
single-threaded baremetal platforms to share an implementation. This can
be configured or overridden.
Later, when the threading API is more complete, we can add an option to
support multithreading (or set it as the default), but having
single-threading (in tandem) is in line with other libraries for
embedded devices.
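A minimal sketch of the idea, assuming a compile-time switch: the macro `SKETCH_MUTEX_SINGLE_THREADED` and every name in the `sketch` namespace are hypothetical and do not reflect the actual configuration option or the class in `libc/src/__support/threads/`.

```cpp
namespace sketch {

enum class MutexError { NONE, BUSY };

// Hypothetical mutex for platforms with a single thread of execution.
// Nothing can contend for the lock, so it only tracks state and never blocks.
class SingleThreadedMutex {
  bool locked = false;

public:
  MutexError lock() {
    locked = true;
    return MutexError::NONE;
  }
  MutexError try_lock() {
    if (locked)
      return MutexError::BUSY;
    locked = true;
    return MutexError::NONE;
  }
  MutexError unlock() {
    locked = false;
    return MutexError::NONE;
  }
};

// Hypothetical configuration knob: platforms pick an implementation here
// instead of carrying their own copy of the class.
#ifndef SKETCH_MUTEX_SINGLE_THREADED
#define SKETCH_MUTEX_SINGLE_THREADED 1
#endif

#if SKETCH_MUTEX_SINGLE_THREADED
using Mutex = SingleThreadedMutex;
#else
#error "A multithreaded mutex is not part of this sketch."
#endif

} // namespace sketch
```

With this shape, code such as an `atexit` callback list only refers to `sketch::Mutex` and does not need to know which platform it runs on.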
This PR implements fabsbf16 math function for BFloat16 type along with
the tests.
---------
Signed-off-by: krishna2803 <kpandey81930@gmail.com>
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
Co-authored-by: OverMighty <its.overmighty@gmail.com>
Implemented barrier synchronization for pthreads
- Uses condition variables internally for platform independence
(platform-specific work is handled by the condition variable
implementation)
- Does NOT currently handle barrierattr pshared; this is a goal for a
future patch
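
The condition-variable approach can be sketched as follows. This uses the C++ standard library rather than the libc-internal primitives, and the class name is hypothetical; it is an illustration of the technique, not the code in the patch.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical barrier built on a mutex and a condition variable.
class SketchBarrier {
  std::mutex m;
  std::condition_variable cv;
  const unsigned expected;      // number of threads that must arrive
  unsigned waiting = 0;         // threads that arrived in the current cycle
  unsigned long generation = 0; // bumped each time the barrier releases

public:
  explicit SketchBarrier(unsigned count) : expected(count) {}

  // Blocks until `expected` threads have called wait(). Returns true for
  // exactly one thread per cycle, similar in spirit to
  // PTHREAD_BARRIER_SERIAL_THREAD.
  bool wait() {
    std::unique_lock<std::mutex> lock(m);
    if (++waiting == expected) {
      // Last arrival: reset for reuse and release everyone else.
      waiting = 0;
      ++generation;
      cv.notify_all();
      return true;
    }
    // The generation counter guards against spurious wakeups and lets the
    // barrier be reused immediately for the next cycle.
    const unsigned long my_generation = generation;
    cv.wait(lock, [&] { return generation != my_generation; });
    return false;
  }
};
```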
Reverts llvm/llvm-project#146226
The MPFR test uses `mpfr_asinpi` which requires MPFR 4.2.0 or later, but
the Buildbots are running an older version of MPFR.
See https://lab.llvm.org/buildbot/#/builders/104/builds/27743 for
example.
I said I was going to revert the PR until we have a workaround for older
versions of MPFR, but then I forgot and I just disabled the entrypoints
which doesn't fix the Buildbot builds.
The function is implemented using the following Taylor series, generated
with [python-sympy](https://www.sympy.org/en/index.html). It is very
accurate for $$|x| \in [0, 0.5]$$ and has been verified using Geogebra.
Range reduction is used for the remaining range (0.5, 1].
$$
\frac{\arcsin(x)}{\pi} \approx
\begin{aligned}[t]
& 0.318309886183791x \\
& + 0.0530516476972984x^3 \\
& + 0.0238732414637843x^5 \\
& + 0.0142102627760621x^7 \\
& + 0.00967087327815336x^9 \\
& + 0.00712127941391293x^{11} \\
& + 0.00552355646848375x^{13} \\
& + 0.00444514782463692x^{15} \\
& + 0.00367705242846804x^{17} \\
& + 0.00310721681820837x^{19} + O(x^{21})
\end{aligned}
$$
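As a rough illustration, the series can be evaluated in `double` with Horner's scheme. The coefficients are copied from the expansion above. The reduction used for (0.5, 1] is the common identity asin(x) = π/2 − 2·asin(√((1 − x)/2)), which is an assumption here, since the exact reduction used by the patch is not stated. All names are hypothetical.

```cpp
#include <cmath>
#include <limits>

namespace sketch {

// Coefficients of x^1, x^3, ..., x^19 from the series above.
constexpr double ASINPI_COEFFS[10] = {
    0.318309886183791,   0.0530516476972984,  0.0238732414637843,
    0.0142102627760621,  0.00967087327815336, 0.00712127941391293,
    0.00552355646848375, 0.00444514782463692, 0.00367705242846804,
    0.00310721681820837};

// Evaluates the odd polynomial as x * P(x^2) using Horner's scheme.
// Intended for |x| <= 0.5.
inline double asinpi_poly(double x) {
  const double x2 = x * x;
  double acc = 0.0;
  for (int i = 9; i >= 0; --i)
    acc = acc * x2 + ASINPI_COEFFS[i];
  return acc * x;
}

// asin(x) / pi on [-1, 1]; returns NaN outside the domain.
inline double asinpi(double x) {
  const double ax = std::fabs(x);
  if (!(ax <= 1.0))
    return std::numeric_limits<double>::quiet_NaN();
  // Assumed range reduction: asinpi(x) = 0.5 - 2 * asinpi(sqrt((1 - x) / 2)),
  // which maps (0.5, 1] onto [0, 0.5).
  const double r = ax <= 0.5
                       ? asinpi_poly(ax)
                       : 0.5 - 2.0 * asinpi_poly(std::sqrt((1.0 - ax) / 2.0));
  return std::copysign(r, x);
}

} // namespace sketch
```

At x = 0.5 the first omitted term is of order x^21, which is below 1e-8, so the truncated series is already far more precise than a 16-bit result needs.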
## Geogebra graph

Closes #132210
Implemented an internal multi-byte to wide character string conversion
function, public functions, and tests.
---------
Co-authored-by: Sriya Pratipati <sriyap@google.com>
Summary:
This patch adds all the new f16 math functions to the GPU build. These
should all pass except exp2m1f16 on AMDGPU for some reason. I'll
investigate that later.