This PR adds the following basic math functions for BFloat16 type along
with the tests:
- fromfpbf16
- fromfpxbf16
- ufromfpbf16
- ufromfpxbf16
---------
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
This PR adds the following basic math functions for BFloat16 type along
with the tests:
- bf16fma
- bf16fmaf
- bf16fmal
- bf16fmaf128
---------
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
This PR adds the following basic math functions for BFloat16 type along
with the tests:
- bf16div
- bf16divf
- bf16divl
- bf16divf128
---------
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
This PR adds the following basic math functions for BFloat16 type along
with the tests:
- fmaximumbf16
- fmaximum_magbf16
- fmaximum_mag_numbf16
- fmaximum_numbf16
- fminimumbf16
- fminimum_magbf16
- fminimum_mag_numbf16
- fminimum_numbf16
---------
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
This PR adds the following basic math functions for BFloat16 type along
with the tests:
- bf16mul
- bf16mulf
- bf16mull
- bf16mulf128
---------
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
This PR implements the following basic math functions for BFloat16 type
along with the tests:
- ceilbf16
- floorbf16
- roundbf16
- roundevenbf16
- truncbf16
---------
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
Part of https://github.com/llvm/llvm-project/issues/145349. Required to
allow `atexit` to work. As part of `HermeticTestUtils.cpp`, there is a
reference to `atexit()`, which eventually instantiates an instance of a
Mutex.
Instead of copying the implementation from
`libc/src/__support/threads/gpu/mutex.h`, we allow platforms to select
an implementation based on configurations, allowing the GPU and
single-threaded baremetal platforms to share an implementation. This can
be configured or overridden.
Later, when the threading API is more complete, we can add an option to
support multithreading (or set it as the default), but having
single-threading (in tandem) is in line with other libraries for
embedded devices.
This PR implements fabsbf16 math function for BFloat16 type along with
the tests.
---------
Signed-off-by: krishna2803 <kpandey81930@gmail.com>
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
Co-authored-by: OverMighty <its.overmighty@gmail.com>
Summary:
This patch adds all the new f16 math functions to the GPU build. These
should all pass except exp2m1f16 on AMDGPU for some reason. I'll
investigate that later.
Summary:
Currently we dispatch the writing mode off of a runtime enum passed in
by the constructor. This causes very unfortunate codegen for the GPU
targets where we get worst-case codegen because of the unused function
pointer for `sprintf`. Instead, this patch moves all of this to a
template so it can be masked out. This results in no dynamic stack and
uses 60 VGPRs instead of 117. It also compiles about 5x as fast.
Update string_utils' string_length to work with char* or wchar_t*, so that it
may be reusable when implementing wmemchr, wcspbrk, wcsrchr, wcsstr.
Link: #121183
Link: #124027
Co-authored-by: Nick Desaulniers <ndesaulniers@google.com>
---------
Co-authored-by: Tristan Ross <tristan.ross@midstall.com>
Summary:
This is a holdover from when these targets were merged. They're
basically the same but there's no reason they should be treated as
identical. I think we will live with a little duplication.