FreeBSD doesn't do anything wrong here; it just happens to define and
use a struct thread in its own headers. The problems arise because here
in LLVM we have using namespace llvm prior to including system headers,
which is bad practice for precisely this reason. If we instead play by
the rules and defer our using namespace llvm until after we've included
the system headers then we no longer need this hack.
This hack is particularly problematic because, as of 9093ba9f7ee5
("[Support] Include Support/thread.h before api implementations
(#111175)"), it is conditional on __FreeBSD__: on non-FreeBSD platforms
Threading.inc can reference anything in Support/thread.h, and doing so
only causes errors on FreeBSD, which is precisely what happened in
64be34c562a2 ("Enable using threads on z/OS (#171847)").
Deferring the using namespace llvm until after Threading.inc is
included may introduce build failures on untested platforms, where
unqualified identifiers will need to be replaced with qualified ones by
prepending llvm::.
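
The name clash can be reproduced in miniature. The sketch below is an illustration only: `demo` stands in for `llvm`, the global `struct thread` stands in for the one declared by FreeBSD's headers, and all function names are hypothetical.

```cpp
// Stand-in for a system header that declares its own global `struct thread`.
struct thread {
  int id;
};

// Hypothetical stand-in for namespace llvm, which has its own `thread`.
namespace demo {
struct thread {
  int tag;
};
inline int hardwareConcurrency() { return 4; }
} // namespace demo

// Stand-in for code in Threading.inc: no using-directive is active yet, so
// `thread` can only mean ::thread and demo names must be qualified.
inline int beforeUsing() {
  thread Sys{7};
  return Sys.id + demo::hardwareConcurrency();
}

using namespace demo; // deferred until after the "system" declarations

// From here on an unqualified `thread` would be ambiguous between ::thread
// and demo::thread, so both are spelled out explicitly.
inline int afterUsing() {
  ::thread Sys{1};
  demo::thread Ours{2};
  return Sys.id + Ours.tag + hardwareConcurrency();
}
```

Anything that sits above the using-directive pays the cost of writing `demo::` explicitly, which is the class of build fix the paragraph above anticipates.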
z/OS 3.1 enables TLS support (limited to compile time constant
initializers). To enable building with thread support, we need to update
the code to handle the difference in definition of pthread_t. It is a
struct on z/OS, not an integer. The existing code assumes that pthread_t
is an integer. This usually happens when checking to see if pthread_t is
null or not.
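
One portable way to avoid treating `pthread_t` as an integer is sketched below. This is not necessarily what the patch does; `ThreadSlot` and its members are hypothetical names, and the approach (tracking presence separately and comparing with `pthread_equal`) is an assumption chosen for illustration.

```cpp
#include <optional>
#include <pthread.h>

// Holds an optional thread id without assuming pthread_t is an integer or
// pointer: "no thread" is modelled by an empty optional, not by 0/nullptr.
class ThreadSlot {
  std::optional<pthread_t> Id;

public:
  void set(pthread_t T) { Id = T; }
  void clear() { Id.reset(); }
  bool hasThread() const { return Id.has_value(); }

  // pthread_equal is the only portable way to compare two pthread_t values.
  bool isCurrentThread() const {
    return Id.has_value() && pthread_equal(*Id, pthread_self()) != 0;
  }
};
```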
In Parallel.cpp, there was a variable `Backoff` defined as TLS. The
initializer for this requires C++ initialization which isn't supported
on z/OS. The variable isn't actually used (see declaration of local var
with same name inside the loop) so deleting it solved the build failure
this was causing.
This PR reduces outliers in terms of runtime performance, by asking the
OS to prefetch memory-mapped input files in advance, as early as
possible. I have implemented the Linux aspect; however, I have only
tested this on Windows 11 version 24H2, with an active security stack
enabled. The machine is an AMD Threadripper PRO 3975WX 32c/64t with 128
GB of RAM and a Samsung 990 PRO SSD.
I have used an Unreal Engine-based game to profile the link times. Here's
a quick summary of the input data:
```
Summary
--------------------------------------------------------------------------------
4,169 Input OBJ files (expanded from all cmd-line inputs)
26,325,429,114 Size of all consumed OBJ files (non-lazy), in bytes
9 PDB type server dependencies
0 Precomp OBJ dependencies
350,516,212 Input debug type records
18,146,407,324 Size of all input debug type records, in bytes
15,709,427 Merged TPI records
4,747,187 Merged IPI records
56,408 Output PDB strings
23,410,278 Global symbol records
45,482,231 Module symbol records
1,584,608 Public symbol records
```
In normal conditions - meaning all the pages are already in RAM - this
PR has no noticeable effect:
```
>hyperfine "before\lld-link.exe @Game.exe.rsp" "with_pr\lld-link.exe @Game.exe.rsp"
Benchmark 1: before\lld-link.exe @Game.exe.rsp
Time (mean ± σ): 29.689 s ± 0.550 s [User: 259.873 s, System: 37.936 s]
Range (min … max): 29.026 s … 30.880 s 10 runs
Benchmark 2: with_pr\lld-link.exe @Game.exe.rsp
Time (mean ± σ): 29.594 s ± 0.342 s [User: 261.434 s, System: 62.259 s]
Range (min … max): 29.209 s … 30.171 s 10 runs
Summary
with_pr\lld-link.exe @Game.exe.rsp ran
1.00 ± 0.02 times faster than before\lld-link.exe @Game.exe.rsp
```
However, in production conditions we're typically working with the
Unreal Engine Editor, with external DCC tools like Maya, Houdini; we have
several instances of Visual Studio open, VSCode with Rust analyzer, etc.
All this means that between code change iterations, most of the input
OBJ files might already have been evicted from the Windows RAM cache.
Consequently, in the following test, I've simulated the worst case
condition by evicting all data from RAM with
[RAMMap64](https://learn.microsoft.com/en-us/sysinternals/downloads/rammap)
(i.e. `RAMMap64.exe -E[wsmt0]` with a 5-sec sleep at the end to ensure
the System thread actually has time to evict the pages):
```
>hyperfine -p cleanup.bat "before\lld-link.exe @Game.exe.rsp" "with_pr\lld-link.exe @Game.exe.rsp"
Benchmark 1: before\lld-link.exe @Game.exe.rsp
Time (mean ± σ): 48.124 s ± 1.770 s [User: 269.031 s, System: 41.769 s]
Range (min … max): 46.023 s … 50.388 s 10 runs
Benchmark 2: with_pr\lld-link.exe @Game.exe.rsp
Time (mean ± σ): 34.192 s ± 0.478 s [User: 263.620 s, System: 40.991 s]
Range (min … max): 33.550 s … 34.916 s 10 runs
Summary
with_pr\lld-link.exe @Game.exe.rsp ran
1.41 ± 0.06 times faster than before\lld-link.exe @Game.exe.rsp
```
This is similar to the work done in MachO in
https://github.com/llvm/llvm-project/pull/157917
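
For readers unfamiliar with the technique, here is a minimal POSIX sketch of prefetching a memory-mapped input. It is not the code from this PR: `MappedFile`, `mapAndPrefetch` and `unmapFile` are hypothetical names, and `posix_madvise` with `POSIX_MADV_WILLNEED` is assumed as the hint (on Windows the comparable facility is `PrefetchVirtualMemory`).

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct MappedFile {
  const char *Data = nullptr;
  size_t Size = 0;
};

// Maps Path read-only and asks the kernel to start paging it in right away,
// so the reads overlap with whatever the caller does next.
inline bool mapAndPrefetch(const char *Path, MappedFile &Out) {
  int FD = ::open(Path, O_RDONLY);
  if (FD < 0)
    return false;
  struct stat St;
  if (::fstat(FD, &St) != 0 || St.st_size <= 0) {
    ::close(FD);
    return false;
  }
  size_t Size = static_cast<size_t>(St.st_size);
  void *P = ::mmap(nullptr, Size, PROT_READ, MAP_PRIVATE, FD, 0);
  ::close(FD); // the mapping stays valid after the descriptor is closed
  if (P == MAP_FAILED)
    return false;
  // Purely advisory: a failure here only means no prefetch happens.
  (void)::posix_madvise(P, Size, POSIX_MADV_WILLNEED);
  Out.Data = static_cast<const char *>(P);
  Out.Size = Size;
  return true;
}

inline void unmapFile(MappedFile &F) {
  if (F.Data)
    ::munmap(const_cast<char *>(F.Data), F.Size);
  F = MappedFile();
}
```

Issuing the hint as soon as the file is mapped, rather than when it is first parsed, is what gives the OS time to bring cold pages in.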
Remove the requirement that the jobserver "fifo" is actually a named
pipe. Named pipes are essentially stateless, and therefore carry a high
risk of a killed process leaving the server with no tokens left, and no
clear way to reclaim them. Therefore, multiple jobserver implementations
use FUSE instead:
- [nixos-jobserver](https://github.com/NixOS/nixpkgs/pull/314888) (WIP)
uses a simple file on FUSE
- [steve](https://gitweb.gentoo.org/proj/steve.git) uses a character
device via CUSE
- [guildmaster](https://codeberg.org/amonakov/guildmaster) uses a
character device via CUSE
This is compatible with GNU make and Ninja, since they do not check the
file type, and seems to be the only solution that can achieve state
tracking while preserving compatibility.
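
As a rough illustration of why the file type does not matter to a client, the sketch below talks to the jobserver path using only `open`, `read` and `write`. The function names are hypothetical and this is not the LLVM implementation; it only assumes the usual convention that a token is one byte that is read to acquire a slot and written back to release it.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Opens the jobserver path for reading and writing. Deliberately no
// S_ISFIFO check: a FUSE file or CUSE character device works the same way.
inline int openJobserver(const char *Path) {
  return ::open(Path, O_RDWR | O_CLOEXEC);
}

// Acquires one job slot by reading a single token byte.
inline bool acquireToken(int FD, char &Token) {
  ssize_t N;
  do {
    N = ::read(FD, &Token, 1);
  } while (N < 0 && errno == EINTR);
  return N == 1;
}

// Releases the slot by writing the same token byte back.
inline bool releaseToken(int FD, char Token) {
  ssize_t N;
  do {
    N = ::write(FD, &Token, 1);
  } while (N < 0 && errno == EINTR);
  return N == 1;
}
```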
CC @amonakov
---------
Signed-off-by: Michał Górny <mgorny@gentoo.org>
This PR introduces a new mechanism for enforcing a sandbox around
filesystem reads coming from the compiler. A fatal error is raised
whenever the `llvm::sys::fs` or `llvm::MemoryBuffer::getFile*()` APIs get
used directly instead of going through the "blessed" virtual interface
of `llvm::vfs::FileSystem`.
In this PR I'm changing the way we provide the missing functions like
strnlen() on z/OS from the separate header file to a wrapper around the
system headers that declare these functions. This will be less
intrusive.
---------
Co-authored-by: Zibi Sarbinowski <zibi@ca.ibm.com>
This patch introduces support for the jobserver protocol to control
parallelism for device offloading tasks.
When running a parallel build with a modern build system like `make -jN`
or `ninja -jN`, each Clang process might also be configured to use
multiple threads for its own tasks (e.g., via `--offload-jobs=4`). This
can lead to an explosion of threads (N * 4), causing heavy system load,
CPU contention, and ultimately slowing down the entire build.
This patch allows Clang to act as a cooperative client of the build
system's jobserver. It extends the `--offload-jobs` option to accept the
value 'jobserver'. With the recent addition of jobserver support to the
Ninja build system, this functionality now benefits users of both Make
and Ninja.
When `--offload-jobs=jobserver` is specified, Clang's thread pool will:
1. Parse the MAKEFLAGS environment variable to find the jobserver
details.
2. Before dispatching a task, acquire a job slot from the jobserver. If
none are available, the worker thread will block.
3. Release the job slot once the task is complete.
This ensures that the total number of active offload tasks across all
Clang processes does not exceed the limit defined by the parent build
system, leading to more efficient and controlled parallel builds.
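
Step 1 above can be sketched as follows. This is a simplified illustration, not the parser added by the patch: `JobserverAuth` and `parseJobserverAuth` are hypothetical names, only the `--jobserver-auth=fifo:PATH` and `--jobserver-auth=R,W` spellings are handled, and the last occurrence is taken to win, as GNU make's documentation describes.

```cpp
#include <cstdio>
#include <optional>
#include <sstream>
#include <string>

struct JobserverAuth {
  enum Kind { Pipe, Fifo };
  Kind Mode = Pipe;
  std::string Path; // meaningful when Mode == Fifo
  int ReadFD = -1;  // meaningful when Mode == Pipe
  int WriteFD = -1; // meaningful when Mode == Pipe
};

// Scans a MAKEFLAGS-style string for --jobserver-auth=... and returns the
// last well-formed occurrence, or nullopt if there is none.
inline std::optional<JobserverAuth>
parseJobserverAuth(const std::string &MakeFlags) {
  static const std::string Key = "--jobserver-auth=";
  static const std::string FifoPrefix = "fifo:";
  std::optional<JobserverAuth> Result;
  std::istringstream Words(MakeFlags);
  std::string Word;
  while (Words >> Word) {
    if (Word.compare(0, Key.size(), Key) != 0)
      continue; // some other flag
    const std::string Value = Word.substr(Key.size());
    JobserverAuth Auth;
    if (Value.compare(0, FifoPrefix.size(), FifoPrefix) == 0) {
      Auth.Mode = JobserverAuth::Fifo;
      Auth.Path = Value.substr(FifoPrefix.size());
      if (Auth.Path.empty())
        continue;
    } else if (std::sscanf(Value.c_str(), "%d,%d", &Auth.ReadFD,
                           &Auth.WriteFD) != 2 ||
               Auth.ReadFD < 0 || Auth.WriteFD < 0) {
      continue; // malformed or disabled descriptor pair
    }
    Result = Auth; // later occurrences override earlier ones
  }
  return Result;
}
```

A caller would typically pass the value of `getenv("MAKEFLAGS")` and fall back to a fixed thread count when the result is empty.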
Implementation:
- A new library, `llvm/Support/Jobserver`, is added to provide a
platform-agnostic client for the jobserver protocol, with backends for
Unix (FIFO) and Windows (semaphores).
- `llvm/Support/ThreadPool` and `llvm/Support/Parallel` are updated with
a `jobserver_concurrency` strategy to integrate this logic.
- The Clang driver and linker-wrapper are modified to recognize the
'jobserver' argument and enable the new thread pool strategy.
- New unit and integration tests are added to validate the feature.
Add MappedFileRegionArena, which can serve as a file-system-backed
persistent memory allocator. The allocator works like a
BumpPtrAllocator and is designed to be thread safe and process safe.
The implementation relies on the file system being POSIX compliant and
doesn't work on all file systems. If the file system supports a lazy tail
(it doesn't allocate disk space if the tail of a large file is not used),
the user has more flexibility to declare a larger capacity.
The allocator works by using an atomically updated bump pointer at a
location that can be customized by the user. The atomic pointer points to
the next available space to allocate, and the allocator will
resize/truncate the file to the current usage once all clients have
closed the allocator.
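
The core of an atomically updated bump pointer can be sketched in a few lines. This is an in-memory illustration under assumptions, not the MappedFileRegionArena code: `bumpAllocate` is a hypothetical name, offsets are returned instead of pointers, and the 8-byte alignment is chosen arbitrarily.

```cpp
#include <atomic>
#include <cstdint>
#include <optional>

// Reserves Size bytes (rounded up to 8) from a region of Capacity bytes and
// returns the offset of the reservation, or nullopt if it does not fit.
// In a file-backed arena the atomic would live inside the mapped region so
// that every thread and process sees the same bump pointer.
inline std::optional<uint64_t> bumpAllocate(std::atomic<uint64_t> &Bump,
                                            uint64_t Capacity, uint64_t Size) {
  constexpr uint64_t Align = 8;
  if (Size > Capacity)
    return std::nullopt;
  const uint64_t Rounded = (Size + Align - 1) & ~(Align - 1);
  uint64_t Old = Bump.load(std::memory_order_relaxed);
  do {
    // Re-checked on every retry because Old is refreshed when the CAS fails.
    if (Rounded > Capacity || Old > Capacity - Rounded)
      return std::nullopt;
  } while (!Bump.compare_exchange_weak(Old, Old + Rounded,
                                       std::memory_order_acq_rel,
                                       std::memory_order_relaxed));
  return Old;
}
```

A compare-exchange loop is used here instead of a plain `fetch_add` so that a failed allocation never moves the bump pointer past the capacity.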
Windows implementation contributed by: @hjyamauchi
Add a parameter to the file lock API to allow exclusive file locks. Both
Unix and Windows support locking a file exclusively for writing by one
process, and LLVM OnDiskCAS uses an exclusive file lock to coordinate CAS
creation.
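
On the Unix side, a shared/exclusive switch can be expressed with POSIX record locks roughly as below. This is a sketch only: `LockKind`, `lockWholeFile` and `unlockWholeFile` are hypothetical names and do not reproduce the signature added to `llvm::sys::fs`.

```cpp
#include <cerrno>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

enum class LockKind { Shared, Exclusive };

// Applies Type (F_RDLCK, F_WRLCK or F_UNLCK) to the whole file, blocking
// until the lock can be granted. Returns 0 on success or an errno value.
inline int applyWholeFileLock(int FD, short Type) {
  struct flock Lock;
  std::memset(&Lock, 0, sizeof(Lock));
  Lock.l_type = Type;
  Lock.l_whence = SEEK_SET;
  Lock.l_start = 0;
  Lock.l_len = 0; // zero length means "through end of file"
  int RC;
  do {
    RC = ::fcntl(FD, F_SETLKW, &Lock);
  } while (RC == -1 && errno == EINTR);
  return RC == -1 ? errno : 0;
}

// Exclusive maps to a write lock, Shared to a read lock.
inline int lockWholeFile(int FD, LockKind Kind) {
  return applyWholeFileLock(FD, Kind == LockKind::Exclusive ? F_WRLCK
                                                            : F_RDLCK);
}

inline int unlockWholeFile(int FD) { return applyWholeFileLock(FD, F_UNLCK); }
```

Note that these locks are per process, so two descriptors in the same process never block each other; the coordination is between processes.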
When an llvm tool crashes (e.g. from a segmentation fault),
SignalHandler will re-raise the signal. The effect is that crash reports
now contain SignalHandler in the stack trace. The crash reports are
still useful, but the presence of SignalHandler can confuse tooling and
automation that deduplicate or analyze crash reports.
rdar://150464802
This patch is part of a series that adds origin-tracking to the debugify
source location coverage checks, allowing us to report symbolized stack
traces of the point where missing source locations appear.
This patch adds a pair of new functions in `signals.h` that can be used
to collect and symbolize stack traces respectively. This has major
implementation overlap with the existing stack trace
collection/symbolizing methods, but the existing functions are
specialized for dumping a stack trace to stderr when LLVM crashes, while
these new functions are meant to be called repeatedly during the
execution of the program, and therefore we need a separate set of
functions.
#143514 broke the `clang-solaris11-sparcv9` bot; from what I can tell
that’s Solaris and according to `SolarisTargetInfo::getOSDefines`, the
macro `__sun__` should be defined on Solaris, so check for that and
don’t try to query the terminal size if it is defined.
Not sure this is the best solution but hopefully it fixes the bot.
On Unix systems, we were trying to determine the terminal width using
the `COLUMNS` environment variable. Unfortunately, `COLUMNS` is not
exported by all shells and thus not available on some systems.
We were previously using `ioctl()` for this; fall back to doing so if `COLUMNS`
does not exist or does not store a positive integer.
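
The lookup order described above can be sketched like this. It is an illustration rather than the code in Process.inc: `terminalColumns` is a hypothetical name and the upper bound on `COLUMNS` is an arbitrary sanity limit.

```cpp
#include <cstdlib>
#include <sys/ioctl.h>
#include <unistd.h>

// Returns the terminal width for FD, or 0 if it cannot be determined.
inline unsigned terminalColumns(int FD) {
  // 1. Honour COLUMNS when it holds a positive integer.
  if (const char *Env = std::getenv("COLUMNS")) {
    char *End = nullptr;
    long Value = std::strtol(Env, &End, 10);
    if (End != Env && *End == '\0' && Value > 0 && Value <= 65535)
      return static_cast<unsigned>(Value);
  }
  // 2. Otherwise ask the terminal driver, if FD is a terminal at all.
  if (::isatty(FD)) {
    struct winsize WS;
    if (::ioctl(FD, TIOCGWINSZ, &WS) == 0 && WS.ws_col > 0)
      return WS.ws_col;
  }
  return 0;
}
```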
This essentially reverts a3eb3d3d92d037fe3c9deaad87f6fc42fe9ea766 and
parts of https://reviews.llvm.org/D61326.
For more information, see #139499.
Fixes #139499.
In Unix/DynamicLibrary.inc, it was already known that Cygwin required
use of `RTLD_DEFAULT` as the `Handle` parameter to `DLSym` to search all
modules for a symbol. Unfortunately, RTLD_DEFAULT is defined as NULL, so
the existing checks of the `Process` handle meant `DLSym` would never be
called on Cygwin. Use the existing `&Invalid` sentinel instead of
`nullptr` for the `Process` handle.
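
The pitfall is easy to see in isolation. In the sketch below all names are hypothetical: `SearchAllModules` stands in for an `RTLD_DEFAULT` that is defined as NULL, and `Invalid` mirrors the idea of a sentinel whose address can never be a real handle.

```cpp
// A dedicated object whose address serves as the "no handle" marker.
static char Invalid;

// Stand-in for RTLD_DEFAULT on a platform where it is defined as NULL.
static void *const SearchAllModules = nullptr;

// Old check: a null handle is treated as "nothing to search", so a null
// RTLD_DEFAULT never reaches the symbol lookup.
inline bool shouldLookUpOld(void *Process) { return Process != nullptr; }

// New check: only the sentinel means "nothing to search"; a null handle is
// a legitimate value and is passed through to the lookup.
inline bool shouldLookUpNew(void *Process) { return Process != &Invalid; }
```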
This commit resolves a potential issue of working with uninitialized
memory when querying the CPU's affinity. The man page of
`sched_getaffinity` does not guarantee that the memory will be fully
overwritten, so this change should ensure that issues are avoided.
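
A minimal, Linux-specific sketch of the defensive pattern follows. `availableCPUs` is a hypothetical name and the function is not the LLVM code; it only shows the mask being cleared before the query.

```cpp
#include <sched.h>

// Returns the number of CPUs the calling thread may run on, or -1 on error.
inline int availableCPUs() {
  cpu_set_t Set;
  // Clear the mask first so that any bits the kernel does not write are
  // well defined instead of uninitialized.
  CPU_ZERO(&Set);
  if (sched_getaffinity(0, sizeof(Set), &Set) != 0)
    return -1;
  return CPU_COUNT(&Set);
}
```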
In case of an error, the Dl_info struct may have been left
uninitialized, so it is not safe to use its members.
In one error case, initialize dli_sname to nullptr explicitly, so that
the later check against nullptr is guaranteed to be safe.
Cygwin types sometimes do not match Linux exactly. Like in this case:
```
In file included from /h/projects/llvm-project/llvm/include/llvm/Support/Error.h:23,
from /h/projects/llvm-project/llvm/include/llvm/Support/FileSystem.h:34,
from /h/projects/llvm-project/llvm/lib/Support/Signals.cpp:22:
/h/projects/llvm-project/llvm/include/llvm/Support/Format.h: In instantiation of ‘llvm::format_object<Ts>::format_object(const char*, const Ts& ...) [with Ts = {int, char [4096]}]’:
/h/projects/llvm-project/llvm/include/llvm/Support/Format.h:126:10: required from ‘llvm::format_object<Ts ...> llvm::format(const char*, const Ts& ...) [with Ts = {int, char [4096]}]’
/h/projects/llvm-project/llvm/lib/Support/Unix/Signals.inc:850:19: required from here
/h/projects/llvm-project/llvm/include/llvm/Support/Format.h:106:34: error: no matching function for call to ‘std::tuple<int, char [4096]>::tuple(const int&, const char [4096])’
106 | : format_object_base(fmt), Vals(vals...) {
| ^~~~~~~~~~~~~
```
Casting here is safe and solves the issue.
Let the actual syscall error if the file doesn't exist. This produces
a more standard "no such file or directory" phrasing of the error
message and avoids an extra step.
The same antipattern appears in the Windows code; we should probably
fix that one too.
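
In general terms, the pattern being recommended looks like the sketch below. `openForRead` is a hypothetical helper, not the function touched by this change; the point is simply that there is no separate existence check before the call.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <system_error>
#include <unistd.h>

// Opens Path for reading and reports failure through the errno set by
// open() itself; no exists()/access() pre-check is performed.
inline std::error_code openForRead(const char *Path, int &FD) {
  do {
    FD = ::open(Path, O_RDONLY | O_CLOEXEC);
  } while (FD < 0 && errno == EINTR);
  if (FD < 0)
    return std::error_code(errno, std::generic_category());
  return std::error_code();
}
```

Besides saving a syscall, this avoids the window in which the file could disappear between the check and the open.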
Prefers the page size to come from the AUX vector, because `getpagesize`
was removed in POSIX.1-2001. Also throws in a couple of asserts to ensure
the page size is a valid value.
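
A sketch of that preference order, under assumptions: `pageSize` is a hypothetical name, `getauxval(AT_PAGESZ)` is used only on Linux, and `sysconf(_SC_PAGESIZE)` is the fallback. The actual change may order or guard these differently.

```cpp
#include <cassert>
#include <unistd.h>
#if defined(__linux__)
#include <sys/auxv.h>
#endif

// Returns the system page size in bytes.
inline long pageSize() {
  long Size = 0;
#if defined(__linux__)
  // The kernel passes the page size to every process in the AUX vector.
  Size = static_cast<long>(::getauxval(AT_PAGESZ));
#endif
  if (Size <= 0)
    Size = ::sysconf(_SC_PAGESIZE);
  assert(Size > 0 && "page size must be positive");
  assert((Size & (Size - 1)) == 0 && "page size must be a power of two");
  return Size;
}
```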
Otherwise, the handler "swallows" the signal and the process continues
to execute. While this use case is peculiar, ignoring these signals
entirely seems more odd.
Closes #124652.
This header was introduced in 536736995b, but it appears that including
only `fcntl.h` should be enough.
Hopefully, this patch will not cause build issues on other Unix
platforms.
Prevents avoidable memory leaks.
Looks like exchange added in aa1333a91f8d8a060bcf5b14aa32a6e8bab74e8c
didn't take "continue" into account.
```
==llc==2150782==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 10 byte(s) in 1 object(s) allocated from:
#0 0x5f1b0f9ac14a in strdup llvm-project/compiler-rt/lib/asan/asan_interceptors.cpp:593:3
#1 0x5f1b1768428d in FileToRemoveList llvm-project/llvm/lib/Support/Unix/Signals.inc:105:55
```
These are unneeded even on AIX, PURE_WINDOWS, and ZOS (per #104706)
* HAVE_ERRNO_H: introduced by 1a93330ffa2ae2aa0b49461f05e6f0d51e8443f8 (2009) but unneeded.
The guarded ABI is unconditionally used by lldb.
* HAVE_FCNTL_H
* HAVE_FENV_H
* HAVE_SYS_STAT_H
Pull Request: https://github.com/llvm/llvm-project/pull/123087
The system call `__CELQTBCK()` is used to build a backtrace like
on other systems. The collected information is the address of the PC,
the address of the entry point (EP), the difference between both
addresses (+EP), the dynamic storage area (DSA aka the stack
pointer), and the function name.
The system call is described here:
https://www.ibm.com/docs/en/zos/3.1.0?topic=cwicsa6a-celqtbck-also-known-as-celqtbck-64-bit-traceback-service
This header was included after the implementations to work around an
issue with FreeBSD. However, this causes some issues once
dllexport/explicit visibility attributes are added to the headers on
Windows, since the definitions need to see the declarations for the
attributes to apply.
This is part of the work to enable LLVM_BUILD_LLVM_DYLIB and plugins on
Windows.
---------
Co-authored-by: Tom Stellard <tstellar@redhat.com>