This header was included after the implementations to work around an
issue with FreeBSD, however, , this causes some issues when
dllexport\explicit visibility
attributes will be added to the headers on Windows, since the
definitions need to see the declarations for the attributes to apply.
This is part of the work to enable LLVM_BUILD_LLVM_DYLIB and plugins on
windows.
---------
Co-authored-by: Tom Stellard <tstellar@redhat.com>
I'm on a system that has have_pthread_getname_np defined but set to 0.
The correct behavior here is not to use the function, but presently the
macros will attempt to use a non-existing function.
This previously worked before
https://github.com/llvm/llvm-project/pull/106486/files
I was profiling our compiler and noticed that `llvm::get_threadid` was
at the top of the hotlist, taking up a surprising 5% (7 seconds) in the
profile trace. It seems that computing this on MacOS systems is
non-trivial, so cache the result in a thread_local.
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
The threading library does not recognize AIX and always returns `-1` for
number of physical cores on AIX. This PR teaches the library to
recognize AIX and obtain the correct value for the number of physical
cores.
This change is focussed on simplifying `Support/Host.h` to only do
target detection. In this case, this function is close in usage to
existing functions in `Support/Threading.h`, so I moved it into there.
The function is also renamed to `llvm::get_physical_cores()` to match
the style of threading's functions.
The big change here is that now if you have threading disabled,
`llvm::get_physical_cores()` will return -1, as if it had not been able
to work out the right info. This is due to how Threading.cpp includes
OS-specific code/headers. This seems ok, as if threading is disabled,
LLVM should not need to know the number of physical cores.
Differential Revision: https://reviews.llvm.org/D137836
Apply clang-format on llvm/lib/Support/Windows/ and llvm/lib/Support/Unix/ since .inc files in these folders aren't picked up by default. Eventually we need to add this extension in the monorepo .clang-format file.
Differential Revision: https://reviews.llvm.org/D138714
This reverts commit 9969ceb36b440eaafa17c486f29a69c7a7da3b3b.
On Windows:
lld-link: error: undefined symbol: int __cdecl computeHostNumPhysicalCores(void)
>>> referenced by LLVMSupport.lib(Support.Host.obj):(int __cdecl llvm::sys::getHostNumPhysicalCores(void))
On Apple Silicon Macs, using a Darwin thread priority of PRIO_DARWIN_BG seems to
map directly to the QoS class Background. With this priority, the thread is
confined to efficiency cores only, which makes background indexing take forever.
Introduce a new ThreadPriority "Low" that sits in the middle between Background
and Default, and maps to QoS class "Utility" on Mac. Make this new priority the
default for indexing. This makes the thread run on all cores, but still lowers
priority enough to keep the machine responsive, and not interfere with
user-initiated actions.
I didn't change the implementations for Windows and Linux; on these systems,
both ThreadPriority::Background and ThreadPriority::Low map to the same thread
priority. This could be changed as a followup (e.g. by using SCHED_BATCH for Low
on Linux).
See also https://github.com/clangd/clangd/issues/1119.
Reviewed By: sammccall, dgoldman
Differential Revision: https://reviews.llvm.org/D124715
FreeBSD's condvar.h (included by user.h in Threading.inc) uses a "struct
thread" that conflicts with llvm::thread if both are visible when it's
included.
So this moves our #include after the FreeBSD code.
This adds a new llvm::thread class with the same interface as std::thread
except there is an extra constructor that allows us to set the new thread's
stack size. On Darwin even the default size is boosted to 8MB to match the main
thread.
It also switches all users of the older C-style `llvm_execute_on_thread` API
family over to `llvm::thread` followed by either a `detach` or `join` call and
removes the old API.
Moved definition of DefaultStackSize into the .cpp file to hopefully
fix the build on some (GCC-6?) machines.
This adds a new llvm::thread class with the same interface as std::thread
except there is an extra constructor that allows us to set the new thread's
stack size. On Darwin even the default size is boosted to 8MB to match the main
thread.
It also switches all users of the older C-style `llvm_execute_on_thread` API
family over to `llvm::thread` followed by either a `detach` or `join` call and
removes the old API.
This retrieves CPU affinity via FreeBSD's cpuset(2) API, and makes LLVM
respect affinity settings configured by the user via the cpuset(1)
command.
In particular, this allows to reduce the number of threads used on
machines with high core counts, which can interact badly with
parallelized build systems. This is particularly noticable with lld,
which spawns lots of threads even for linking e.g. hello_world!
This fix is related to PR48193, but does not adress the more fundamental
problem, which is that LLVM by default grabs as many CPUs and/or threads
as possible.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D92271
sched_getaffinity (Linux specific) has been available
* in glibc since 2002-08-08 (commit 972e719e8154eec5f543b027e2a08dfa285d55d5)
* in musl since the initial check-in.
Before this patch, it wasn't possible to extend the ThinLTO threads to all SMT/CMT threads in the system. Only one thread per core was allowed, instructed by usage of llvm::heavyweight_hardware_concurrency() in the ThinLTO code. Any number passed to the LLD flag /opt:lldltojobs=..., or any other ThinLTO-specific flag, was previously interpreted in the context of llvm::heavyweight_hardware_concurrency(), which means SMT disabled.
One can now say in LLD:
/opt:lldltojobs=0 -- Use one std::thread / hardware core in the system (no SMT). Default value if flag not specified.
/opt:lldltojobs=N -- Limit usage to N threads, regardless of usage of heavyweight_hardware_concurrency().
/opt:lldltojobs=all -- Use all hardware threads in the system. Equivalent to /opt:lldltojobs=$(nproc) on Linux and /opt:lldltojobs=%NUMBER_OF_PROCESSORS% on Windows. When an affinity mask is set for the process, threads will be created only for the cores selected by the mask.
When N > number-of-hardware-threads-in-the-system, the threads in the thread pool will be dispatched equally on all CPU sockets (tested only on Windows).
When N <= number-of-hardware-threads-on-a-CPU-socket, the threads will remain on the CPU socket where the process started (only on Windows).
Differential Revision: https://reviews.llvm.org/D75153
The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket.
== Background ==
Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, an hyper-thread if SMT is active; a core otherwise. There's a limit of 32-bit processors on older 32-bit versions of Windows, which later was raised to 64-processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is represented by the sizeof(void*). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.
By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.
This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.
== The problem ==
The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, that original intention is lost, only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std:🧵:hardware_concurrency() -- which can only return processors from the current "processor group".
== The changes in this patch ==
To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).
When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above to the maximum number of threads will have no effect, the maximum will be used instead.
The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.
When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.
Differential Revision: https://reviews.llvm.org/D71775
This reverts commit 40668abca4d307e02b33345cfdb7271549ff48d0.
This causes clang tests to fail, as stacksize=0 is being explicitly passed and
is no longer a no-op.
This roughly mimics `std::thread(...).detach()` except it allows to
customize the stack size. Required for https://reviews.llvm.org/D50993.
I've decided against reusing the existing `llvm_execute_on_thread` because
it's not obvious what to do with the ownership of the passed
function/arguments:
1. If we pass possibly owning functions data to `llvm_execute_on_thread`,
we'll lose the ability to pass small non-owning non-allocating functions
for the joining case (as it's used now). Is it important enough?
2. If we use the non-owning interface in the new use case, we'll force
clients to transfer ownership to the spawned thread manually, but
similar code would still have to exist inside
`llvm_execute_on_thread(_async)` anyway (as we can't just pass the same
non-owning pointer to pthreads and Windows implementations, and would be
forced to wrap it in some structure, and deal with its ownership.
Patch by Dmitry Kozhevnikov!
Differential Revision: https://reviews.llvm.org/D51103
Summary:
We have a multi-platform thread priority setting function(last piece
landed with D58683), I wanted to make this available to all llvm community,
there seem to be other users of such functionality with portability fixmes:
lib/Support/CrashRecoveryContext.cpp
tools/clang/tools/libclang/CIndex.cpp
Reviewers: gribozavr, ioeric
Subscribers: krytarowski, jfb, kristina, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59130
llvm-svn: 358494
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
LLVM_ON_WIN32 is set exactly with MSVC and MinGW (but not Cygwin) in
HandleLLVMOptions.cmake, which is where _WIN32 defined too. Just use the
default macro instead of a reinvented one.
See thread "Replacing LLVM_ON_WIN32 with just _WIN32" on llvm-dev and cfe-dev.
No intended behavior change.
This moves over all uses of the macro, but doesn't remove the definition
of it in (llvm-)config.h yet.
llvm-svn: 331127
Summary:
Due to some android peculiarities, in some build configurations
(statically linked executables targeting older releases) we could detect
the presence of these functions (because they are present in libc.a,
where check_library_exists searches), but then fail to build because the
headers did not include the definition.
This attempts to remedy that by upgrading the check_library_exists to
check_symbol_exists, which will check that the function is declared too.
I am hoping that a more thorough check will make the messy #ifdef we
have accumulated in the code obsolete, so I optimistically try to remove
them.
Reviewers: zturner, kparzysz, danalbert
Subscribers: srhines, mgorny, krytarowski, llvm-commits
Differential Revision: https://reviews.llvm.org/D45359
llvm-svn: 330251
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
Do not include <sys/user.h> on NetBSD. It's dead file and will be removed.
No need to include <sys/sysctl.h> in this code context on NetBSD.
llvm-svn: 296973
pthread_self() returns a pthread_t, but we were setting it to
an int. It seems the cast to int when calling sysctl is still
the correct thing to do, though.
llvm-svn: 296892