This is required in hermetic testing downstream. It is not complete, and
will not work on hardware, however it runs on QEMU, and can report a
pass/fail on our tests.
These variants require a different exception table that requires a bit
of initialisation.
This allows us to enable testing for these variants downstream.
In order to run hermetic tests downstream (#145349), we need startup
code. This is based on the crt0 that is present in Arm Toolchain:
5446a3cbf4/arm-software/embedded/llvmlibc-support/crt0.
Note that some tests do fail due to lack of hardware setup, which will
be implemented at a later time.
Most of the CMake/structure is a copy from the existing code in both
`libc/startup/linux` and `libc/startup/gpu`. It is required to build an
object file to be linked with.
Also add header files for init/fini such that crt0 can use the provided
symbols.
---------
Co-authored-by: Michael Jones <michaelrj@google.com>
Summary:
The default behavior for LTO on other targets does not specify the
number of LTO partitions. Recent changes made this default to 8 on
AMDGPU which had some issues with the `libc` project. The option to
disable this is HIP only so I think for now we should restrict this just
to HIP.
I'm definitely on board with getting some more parallelism here, but I
think it should probably be restricted to just offloading languages. The
new driver goes through the `--target=amdgcn-amd-amdhsa` for its output,
which means we'd need to forward the default somehow.
Summary:
There's an extern weak symbol for this, we should just factor these into
a more common interface. Stub them temporarily to make the bots happy.
PTXAS does not handle extern weak.
The call chain to `Mutex:lock` can be polluted by stack protector. For
completely safe, let's postpone the main TLS tearing down to a separate
phase.
fix#108030
Summary:
I fixed the errors here recently so I can actually use these. This
shouldn't impact much, just should hopefully make the code generated
slightly better.
Summary:
This patch implements 'getenv'. I was torn on how to implement this,
since realistically we only have access to this environment pointer in
the "loader" interface. An alternative would be to use an RPC call every
time, but I think that's overkill for what this will be used for. A
better solution is just to emit a common `DataEnvironment` that contains
all of the host visible resources to initialize. Right now this is the
`env_ptr`, `clock_freq`, and `rpc_client`.
I did this by making the `app.h` interface that Linux uses more general,
could possibly move that into a separate patch, but I figured it's
easier to see with the usage.
Summary:
The `nvlink-wrapper` can do LTO now, which means we can still create
some LLVM-IR without needing an architecture. In the case that we try to
invoke `nvlink` internally, that will still fail. This patch simply
defers the error until later so we can use `--lto-emit-llvm` to get the
IR without specifying an architecture.
Summary:
We want to provide the `crt1.o` object inside of the current build so
that other projects can use it for tests. Because it's currently an
object library (as is also required for the dependencies for the
hermetic tests) this cannot be renamed and is not emitted by default.
Here we simply add an executable that emits bitcode so that CMake will
give it a name.
This PR fix several build errors on aarch64 targets when building with
gcc:
- uninitialized values leading to `Werrors`
- undefined builtin functions
- glibc header pollution
Otherwise the startup objects will fail to link since they were cross compiled,
but the linker is not informed of the intent to cross compile, which results in
linker errors when the host architecture does not match the target
architecture.
Summary:
The AMDGPU target needs an external clock symbol so the driver can set
the frequency with the correct value. This was left over from the
previous implementation and I forgot to remove it when actually
implementing the timing utilities.
Summary:
This is a massive patch because it reworks the entire build and
everything that depends on it. This is not split up because various bots
would fail otherwise. I will attempt to describe the necessary changes
here.
This patch completely reworks how the GPU build is built and targeted.
Previously, we used a standard runtimes build and handled both NVPTX and
AMDGPU in a single build via multi-targeting. This added a lot of
divergence in the build system and prevented us from doing various
things like building for the CPU / GPU at the same time, or exporting
the startup libraries or running tests without a full rebuild.
The new appraoch is to handle the GPU builds as strict cross-compiling
runtimes. The first step required
https://github.com/llvm/llvm-project/pull/81557 to allow the `LIBC`
target to build for the GPU without touching the other targets. This
means that the GPU uses all the same handling as the other builds in
`libc`.
The new expected way to build the GPU libc is with
`LLVM_LIBC_RUNTIME_TARGETS=amdgcn-amd-amdhsa;nvptx64-nvidia-cuda`.
The second step was reworking how we generated the embedded GPU library
by moving it into the library install step. Where we previously had one
`libcgpu.a` we now have `libcgpu-amdgpu.a` and `libcgpu-nvptx.a`. This
patch includes the necessary clang / OpenMP changes to make that not
break the bots when this lands.
We unfortunately still require that the NVPTX target has an `internal`
target for tests. This is because the NVPTX target needs to do LTO for
the provided version (The offloading toolchain can handle it) but cannot
use it for the native toolchain which is used for making tests.
This approach is vastly superior in every way, allowing us to treat the
GPU as a standard cross-compiling target. We can now install the GPU
utilities to do things like use the offload tests and other fun things.
Some certain utilities need to be built with
`--target=${LLVM_HOST_TRIPLE}` as well. I think this is a fine
workaround as we
will always assume that the GPU `libc` is a cross-build with a
functioning host.
Depends on https://github.com/llvm/llvm-project/pull/81557
Summary:
Currently, doing `ninja install` will fail in fullbuild mode due to the
startup utilities not being built by default. This was hidden previously
by the fact that if tests were run, it would build the startup utilities
and thus they would be present.
This patch solves this issue by making the `libc-startup` target a
dependncy on the final library. Furthermore we simply factor out the
library install directory into the base CMake directory next to the
include directory handling. This change makes the `crt` files get
installed in `lib/x86_64-unknown-linu-gnu` instead of just `lib`.
This fixes an error I had where doing a runtimes failed to install its
libraries because the install step always errored.
```
AppProperties app;
```
is marked as a weak symbol in header now. One can just use `&app !=
nullptr` to check if `app` is defined. There is no need to define it for
overlay mode.
For some reasons, we are using `-fpie`
(libc/cmake/modules/LLVMLibCObjectRules.cmake:31) without supporting it.
According to @lntue, some of the hermetic tests are broken without
proper PIE support. This patch implements basic relocations support for
PIE.
* separate initialization routines into _start and do_start for all
architectures.
* lift do_start as a separate object library to avoid code duplication.
* (addtionally) address the problem of building hermetic libc with
-fstack-pointer-*
The `crt1.o` is now a merged result of three components:
```
___
|___ x86_64
| |_______ start.cpp.o <- _start (loads process initial stack and aligns stack pointer)
| |_______ tls.cpp.o <- init_tls, cleanup_tls, set_thread_pointer (TLS related routines)
|___ do_start.cpp.o <- do_start (sets up global variables and invokes the main function)
```
As part of startup refactoring, this patch adds a function to merge
multiple objects into a single relocatable object:
cc -r obj1.o obj2.o -o obj.o
A relocatable object is an object file that is not fully linked into an
executable or a shared library. It is an intermediate file format that
can be passed into the linker.
A crt object can have arch-specific code and arch-agnostic code. To
reduce code cohesion, the implementation is splitted into multiple
units. As a result, we need to merge them into a single relocatable
object.
__stack_chk_fail should be provided by libc.a, not startup files.
Add __stack_chk_fail to existing linux and arm entrypoints. On Windows
(when
not targeting MinGW), it seems that the corresponding function
identifier is
__security_check_cookie, so no entrypoint is added for Windows.
Baremetal
targets also ought to be compileable with `-fstack-protector*`
There is no common header for this prototype, since calls to
__stack_chk_fail
are meant to be inserted by the compiler upon function return when
compiled
`-fstack-protector*`.
This patch lifts aux vector related definitions to app.h. Because
startup's refactoring is in progress, this patch still contains
duplicated changes. This problem will be addressed very soon in an
incoming patch.
If a function is declared with stack-protector, the compiler may try to
load the TLS. However, inside certain runtime functions, TLS may not be
available. This patch disables stack protectors for such routines to fix
the problem.
Closes#74487.
Summary:
We call the global constructors by function pointer. For whatever reason
the NVPTX architecture relies very specifically on the arguments to the
function pointer invocation matching what the function is implemented
as. This is problematic as most of these constructors are generated
with no arguments. This patch removes the extended arguments that GNU
and LLVM use for the constructors optionally so that it can support the
common case.