For some reason cmake usually turns this path into an absolute path, but
sometimes
it doesn't do it and stays as a relative path, which means the command
will fail.
Specify it as an absolute path always so this doesn't happen any more.
1. WEOF is defined as a `wint_t` by the C standard. On certain
architectures, the test won't compile on `-Wall`. This fixes it.
2. If `testSubnormalRange` fails, it will spit out way too many error
messages, which overwhelms my test environment. I reduce this to 1k for
now.
This is required for #145349
Duplicated str_to_integer.h and modified it to work with widechars.
A future patch will implement the public functions (wcstol, wcstoll,
etc) by calling this internal function.
This PR enables support for BFloat16 type in LLVM libc along with
support for testing BFloat16 functions via MPFR.
---------
Signed-off-by: krishna2803 <kpandey81930@gmail.com>
Signed-off-by: Krishna Pandey <kpandey81930@gmail.com>
Co-authored-by: OverMighty <its.overmighty@gmail.com>
Summary:
The allocator interface is supposed to have 16 byte alignment (to keep
it consistent with the CPU allocator. We could probably drop this to 8
if desires.) But this was not enforced because the number of bytes used
for the bitfield sometimes resulted in alignment of 8 instead of 16.
Explicitly align the number of bytes to be a multiple of 16 even if
unused.
Summary:
This patch uses the actual allocator interface to implement
`aligned_alloc`. We do this by simply rounding up the amount allocated.
Because of how index calculation works, any offset within an allocated
pointer will still map to the same chunk, so we can just adjust
internally and it will free all the same.
Summary:
This avoids a ptrtoint by just using the clang builtin. This is clang
specific but only clang can compile GPU code anyway so I do not bother
with a fallback.
Summary:
Now that we have `malloc` we can implement `realloc` efficiently. This
uses the known chunk sizes to avoid unnecessary allocations. We just
return nullptr for NVPTX. I'd remove the list for the entrypoint but
then the libc++ code would stop working. When someone writes the NVPTX
support this will be trivial.
Summary:
In the GPU allocator we reinterpret cast from a void pointer. We know
that an actual object was constructed there according to the C++ object
model, but to make it fully standards compliant we need to 'launder' it
to forward that information to the compiler. Add this function and call
it as appropriate.
This implementation has been compiled with the [pigweed toolchain](https://pigweed.dev/toolchain.html) and tested on:
- Raspberry Pi Pico 2 with the following options\
`--target=armv8m.main-none-eabi`
`-march=armv8m.main+fp+dsp`
`-mcpu=cortex-m33`
- Raspberry Pi Pico with the following options\
`--target=armv6m-none-eabi`
`-march=armv6m`
`-mcpu=cortex-m0+`
They both compile down to a little bit more than 200 bytes and are between 2 and 10 times faster than byte per byte copies.
For best performance the following options can be set in the `libc/config/baremetal/arm/config.json`
```
{
"codegen": {
"LIBC_CONF_KEEP_FRAME_POINTER": {
"value": false
}
},
"general": {
"LIBC_ADD_NULL_CHECKS": {
"value": false
}
}
}
```
Summary:
This patch adds all the new f16 math functions to the GPU build. These
should all pass except exp2m1f16 on AMDGPU for some reason. I'll
investigate that later.