42 Commits

Author SHA1 Message Date
Kazu Hirata
15cb5ebed7 [Support] Use llvm::popcount (NFC)
This should fix builds on Windows.
2023-02-12 13:39:18 -08:00
Alexandre Ganea
e66500c774 [Support] On Windows 11 and Windows Server 2022, fix an affinity mask issue on large core count machines
Before Windows 11 and Windows Server 2022, only one 'processor group' is assigned by default to a starting process, then the program is responsible for dispatching its own threads on more 'processor groups'. That is what 8404aeb56a73ab24f9b295111de3b37a37f0b841 was doing, allowing LLVM tools to automatically use all hardware threads in the machine.

After Windows 11 and Windows Server 2022, the OS takes care of that. This has an adverse effect reported in #56618 which is that using `GetProcessAffinityMask()` API in some edge cases seems buggy now. That API is used to detect if an affinity mask was set, and adjust accordingly the available threads for a ThreadPool.

With this patch, on one hand, we let the OS dispatch threads on all 'processor groups', but only for Windows 11 & Windows Server 2022 and after. We retain the old behavior for older OS versions. On the other hand, a workaround was added to mitigate the `GetProcessAffinityMask()` issue described above (see Threading.inc, L226).

Differential Revision: https://reviews.llvm.org/D138747
2023-01-06 17:03:43 -05:00
Fangrui Song
b1df3a2c0b [Support] llvm::Optional => std::optional
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-16 08:49:10 +00:00
Kazu Hirata
1f421b6d7e [llvm] Use std::nullopt instead of None in comments (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-06 22:45:17 -08:00
Kazu Hirata
1ea9dd3270 [llvm] Use std::nullopt instead of llvm::None (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-05 23:50:04 -08:00
Fangrui Song
7f5d0aa527 Threading: Convert Optional to std::optional
dthorn: Relanding after fix-forward.
2022-12-01 15:57:17 -08:00
Daniel Thornburgh
8f0aa9df11 Revert "Threading: Convert Optional to std::optional"
This reverts commit 5e50b8089aee249d77542ea858d956568ec6581f.

This commit breaks the build for BOLT:
  bolt/lib/Profile/DataAggregator.cpp:264:66: error: no viable
  conversion from 'Optional<StringRef>[3]' to
  'ArrayRef<std::optional<StringRef>>'
2022-12-01 15:42:25 -08:00
Fangrui Song
5e50b8089a Threading: Convert Optional to std::optional 2022-12-01 22:36:05 +00:00
Archibald Elliott
3c97f6cab9 [Support] Move getHostNumPhysicalCores to Threading.h
This change is focussed on simplifying `Support/Host.h` to only do
target detection. In this case, this function is close in usage to
existing functions in `Support/Threading.h`, so I moved it into there.
The function is also renamed to `llvm::get_physical_cores()` to match
the style of threading's functions.

The big change here is that now if you have threading disabled,
`llvm::get_physical_cores()` will return -1, as if it had not been able
to work out the right info. This is due to how Threading.cpp includes
OS-specific code/headers. This seems ok, as if threading is disabled,
LLVM should not need to know the number of physical cores.

Differential Revision: https://reviews.llvm.org/D137836
2022-11-29 13:14:13 +00:00
Alexandre Ganea
f1aa7348d3 [Support] Apply clang-format on .inc files. NFC.
Apply clang-format on llvm/lib/Support/Windows/ and llvm/lib/Support/Unix/ since .inc files in these folders aren't picked up by default. Eventually we need to add this extension in the monorepo .clang-format file.

Differential Revision: https://reviews.llvm.org/D138714
2022-11-26 09:36:43 -05:00
Fangrui Song
875adb4007 Host: Internalize computeHostNumPhysicalCores/computeHostNumHardwareThreads
Windows computeHostNumPhysicalCores is defined by Threading.cpp. Leave it
unchanged.
2022-11-23 21:09:45 -08:00
Fangrui Song
93b553e3f2 Revert "Host: Internalize computeHostNumPhysicalCores/computeHostNumHardwareThreads"
This reverts commit 9969ceb36b440eaafa17c486f29a69c7a7da3b3b.

On Windows:

lld-link: error: undefined symbol: int __cdecl computeHostNumPhysicalCores(void)
>>> referenced by LLVMSupport.lib(Support.Host.obj):(int __cdecl llvm::sys::getHostNumPhysicalCores(void))
2022-11-23 20:12:16 -08:00
Fangrui Song
9969ceb36b Host: Internalize computeHostNumPhysicalCores/computeHostNumHardwareThreads 2022-11-23 17:44:04 -08:00
Fangrui Song
67854f9ed0 Use value_or instead of getValueOr. NFC 2022-06-29 21:55:02 -07:00
stk
9902a0945d Add ThreadPriority::Low, and use QoS class Utility on Mac
On Apple Silicon Macs, using a Darwin thread priority of PRIO_DARWIN_BG seems to
map directly to the QoS class Background. With this priority, the thread is
confined to efficiency cores only, which makes background indexing take forever.

Introduce a new ThreadPriority "Low" that sits in the middle between Background
and Default, and maps to QoS class "Utility" on Mac. Make this new priority the
default for indexing. This makes the thread run on all cores, but still lowers
priority enough to keep the machine responsive, and not interfere with
user-initiated actions.

I didn't change the implementations for Windows and Linux; on these systems,
both ThreadPriority::Background and ThreadPriority::Low map to the same thread
priority. This could be changed as a followup (e.g. by using SCHED_BATCH for Low
on Linux).

See also https://github.com/clangd/clangd/issues/1119.

Reviewed By: sammccall, dgoldman

Differential Revision: https://reviews.llvm.org/D124715
2022-05-16 10:01:49 +02:00
Tim Northover
0c39f82f0b [Support] reorder Threading includes to avoid conflict with FreeBSD headers
FreeBSD's condvar.h (included by user.h in Threading.inc) uses a "struct
thread" that conflicts with llvm::thread if both are visible when it's
included.

So this moves our #include after the FreeBSD code.
2021-07-09 10:39:52 +01:00
Tim Northover
48c68a630e Recommit: Support: add llvm::thread class that supports specifying stack size.
This adds a new llvm::thread class with the same interface as std::thread
except there is an extra constructor that allows us to set the new thread's
stack size. On Darwin even the default size is boosted to 8MB to match the main
thread.

It also switches all users of the older C-style `llvm_execute_on_thread` API
family over to `llvm::thread` followed by either a `detach` or `join` call and
removes the old API.

Moved definition of DefaultStackSize into the .cpp file to hopefully
fix the build on some (GCC-6?) machines.
2021-07-08 16:22:26 +01:00
Tim Northover
2bf5e8d953 Revert "Support: add llvm::thread class that supports specifying stack size."
It's causing build failures because DefaultStackSize isn't defined everywhere
it should be and I need time to investigate.
2021-07-08 14:59:47 +01:00
Tim Northover
727e1c9be3 Support: add llvm::thread class that supports specifying stack size.
This adds a new llvm::thread class with the same interface as std::thread
except there is an extra constructor that allows us to set the new thread's
stack size. On Darwin even the default size is boosted to 8MB to match the main
thread.

It also switches all users of the older C-style `llvm_execute_on_thread` API
family over to `llvm::thread` followed by either a `detach` or `join` call and
removes the old API.
2021-07-08 14:51:53 +01:00
Alexandre Ganea
4fcb25583c Re-land [Support] On Windows, take the affinity mask into account
The number of hardware threads available to a ThreadPool can be limited if setting an affinity mask.
For example:

    > start /B /AFFINITY 0xF lld-link.exe ...

Would let LLD only use 4 hyper-threads.

Previously, there was an outstanding issue on Windows Server 2019 on dual-CPU machines, which was preventing from using both CPU sockets. In normal conditions, when no affinity mask was set, ProcessorGroup::AllThreads was different from ProcessorGroup::UsableThreads. The previous code in llvm/lib/Support/Windows/Threading.inc L201 was improperly assuming those two values to be equal, and consequently was limiting the execution to only one CPU socket.

Differential Revision: https://reviews.llvm.org/D92419
2021-01-14 17:03:22 -05:00
Alexandre Ganea
eec856848c Revert "[Support] On Windows, take the affinity mask into account"
This reverts commit 336ab2d51dfdd5ca09c2a9c506453db4fe653584.
2021-01-13 21:34:54 -05:00
Alexandre Ganea
336ab2d51d [Support] On Windows, take the affinity mask into account
The number of hardware threads available to a ThreadPool can be limited if setting an affinity mask.
For example:

> start /B /AFFINITY 0xF lld-link.exe ...

Would let LLD only use 4 hyper-threads.

Previously, there was an outstanding issue on Windows Server 2019 on dual-CPU machines, which was preventing from using both CPU sockets. In normal conditions, when no affinity mask was set, ProcessorGroup::AllThreads was different from ProcessorGroup::UsableThreads. The previous code in llvm/lib/Support/Windows/Threading.inc L201 was improperly assuming those two values to be equal, and consequently was limiting the execution to only one CPU socket.

Differential Revision: https://reviews.llvm.org/D92419
2021-01-13 21:00:09 -05:00
Alexandre Ganea
09158252f7 [ThinLTO] Allow usage of all hardware threads in the system
Before this patch, it wasn't possible to extend the ThinLTO threads to all SMT/CMT threads in the system. Only one thread per core was allowed, instructed by usage of llvm::heavyweight_hardware_concurrency() in the ThinLTO code. Any number passed to the LLD flag /opt:lldltojobs=..., or any other ThinLTO-specific flag, was previously interpreted in the context of llvm::heavyweight_hardware_concurrency(), which means SMT disabled.

One can now say in LLD:
/opt:lldltojobs=0 -- Use one std::thread / hardware core in the system (no SMT). Default value if flag not specified.
/opt:lldltojobs=N -- Limit usage to N threads, regardless of usage of heavyweight_hardware_concurrency().
/opt:lldltojobs=all -- Use all hardware threads in the system. Equivalent to /opt:lldltojobs=$(nproc) on Linux and /opt:lldltojobs=%NUMBER_OF_PROCESSORS% on Windows. When an affinity mask is set for the process, threads will be created only for the cores selected by the mask.

When N > number-of-hardware-threads-in-the-system, the threads in the thread pool will be dispatched equally on all CPU sockets (tested only on Windows).
When N <= number-of-hardware-threads-on-a-CPU-socket, the threads will remain on the CPU socket where the process started (only on Windows).

Differential Revision: https://reviews.llvm.org/D75153
2020-03-27 10:20:58 -04:00
Hans Wennborg
01f9abbb50 llvm-ar: Fix MinGW compilation
llvm-ar is using CompareStringOrdinal which is available
only starting with Windows Vista (WINVER 0x600).

Fix this by hoising WindowsSupport.h, which sets _WIN32_WINNT
to 0x0601, up to llvm/include/llvm/Support and use it in llvm-ar.

Patch by Cristian Adam!

Differential revision: https://reviews.llvm.org/D74599
2020-02-28 09:59:24 +01:00
Alexandre Ganea
8404aeb56a [Support] On Windows, ensure hardware_concurrency() extends to all CPU sockets and all NUMA groups
The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket.

== Background ==
Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, an hyper-thread if SMT is active; a core otherwise. There's a limit of 32-bit processors on older 32-bit versions of Windows, which later was raised to 64-processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is represented by the sizeof(void*). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.

By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.

This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.

== The problem ==
The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, that original intention is lost, only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std:🧵:hardware_concurrency() -- which can only return processors from the current "processor group".

== The changes in this patch ==
To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).

When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above to the maximum number of threads will have no effect, the maximum will be used instead.
The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.

When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.

Differential Revision: https://reviews.llvm.org/D71775
2020-02-14 10:24:22 -05:00
Ilya Biryukov
3e54404c71 [Support] fix mingw-w64 build
Older versions of Mingw-w64 do not define _beginthreadex_proc_type,
so we replace it with `unsigned (__stdcall *ThreadFunc)(void *)`.

Fixes https://github.com/clangd/clangd/issues/188

Patch by lh123!

Differential Revision: https://reviews.llvm.org/D69879
2019-11-06 15:18:58 +01:00
Sam McCall
a9c3c176ad Reland "[Support] Add a way to run a function on a detached thread""
This reverts commit 7bc7fe6b789d25d48d6dc71d533a411e9e981237.
The immediate callers have been fixed to pass nullopt where appropriate.
2019-10-23 15:51:44 +02:00
Sam McCall
7bc7fe6b78 Revert "[Support] Add a way to run a function on a detached thread"
This reverts commit 40668abca4d307e02b33345cfdb7271549ff48d0.
This causes clang tests to fail, as stacksize=0 is being explicitly passed and
is no longer a no-op.
2019-10-23 15:10:35 +02:00
Sam McCall
40668abca4 [Support] Add a way to run a function on a detached thread
This roughly mimics `std::thread(...).detach()` except it allows to
customize the stack size. Required for https://reviews.llvm.org/D50993.

I've decided against reusing the existing `llvm_execute_on_thread` because
it's not obvious what to do with the ownership of the passed
function/arguments:

1. If we pass possibly owning functions data to `llvm_execute_on_thread`,
   we'll lose the ability to pass small non-owning non-allocating functions
   for the joining case (as it's used now). Is it important enough?
2. If we use the non-owning interface in the new use case, we'll force
   clients to transfer ownership to the spawned thread manually, but
   similar code would still have to exist inside
   `llvm_execute_on_thread(_async)` anyway (as we can't just pass the same
   non-owning pointer to pthreads and Windows implementations, and would be
   forced to wrap it in some structure, and deal with its ownership.

Patch by Dmitry Kozhevnikov!

Differential Revision: https://reviews.llvm.org/D51103
2019-10-23 12:48:38 +02:00
Kadir Cetinkaya
8fdc5abffe [llvm][Support] Provide interface to set thread priorities
Summary:
We have a multi-platform thread priority setting function(last piece
landed with D58683), I wanted to make this available to all llvm community,
there seem to be other users of such functionality with portability fixmes:
lib/Support/CrashRecoveryContext.cpp
tools/clang/tools/libclang/CIndex.cpp

Reviewers: gribozavr, ioeric

Subscribers: krytarowski, jfb, kristina, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59130

llvm-svn: 358494
2019-04-16 14:32:43 +00:00
Chandler Carruth
2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Zachary Turner
4e83923d83 Don't write #include "Windows/WindowsSupport.h" from the Windows dir.
This generates -Wnonportable-include-dir warnings, and doesn't need
to be there.  It seems this was just checked in on accident.

llvm-svn: 350655
2019-01-08 21:05:34 +00:00
Konstantin Zhuravlyov
4203ea320e Fix C2712 build error on Windows
Move the __try/__except block outside of the set_thread_name function to avoid a conflict with object unwinding due to the use of the llvm::Storage.

Differential Revision: https://reviews.llvm.org/D30707

llvm-svn: 297192
2017-03-07 20:09:46 +00:00
Zachary Turner
1f004c43d2 Try to fix thread name truncation on non-Windows.
llvm-svn: 296976
2017-03-04 18:53:09 +00:00
Zachary Turner
777de77956 Truncate thread names if they're too long.
llvm-svn: 296972
2017-03-04 16:42:25 +00:00
Zachary Turner
757dbc9ff3 [Support] Provide access to current thread name/thread id.
Applications often need the current thread id when making
system calls, and some operating systems provide the notion
of a thread name, which can be useful in enabling better
diagnostics when debugging or logging.

This patch adds an accessor for the thread id, and "best effort"
getters and setters for the thread name.  Since this is
non critical functionality, no error is returned to indicate
that a platform doesn't support thread names.

Differential Revision: https://reviews.llvm.org/D30526

llvm-svn: 296887
2017-03-03 17:15:17 +00:00
Mehdi Amini
0fb2488702 Revert "Revert "Revert 220932.": "Removing the static initializer in ManagedStatic.cpp by using llvm_call_once to initialize the ManagedStatic mutex""
This reverts commit r269577.
Broke NetBSD, waiting for Kamil to investigate

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 269584
2016-05-14 23:44:21 +00:00
Mehdi Amini
c048b6c4cd Revert "Revert 220932.": "Removing the static initializer in ManagedStatic.cpp by using llvm_call_once to initialize the ManagedStatic mutex"
This reverts commit r221331 and reinstate r220932 as discussed in D19271.
Original commit message was:

This patch adds an llvm_call_once which is a wrapper around
std::call_once on platforms where it is available and devoid
of bugs. The patch also migrates the ManagedStatic mutex to
be allocated using llvm_call_once.

These changes are philosophically equivalent to the changes
added in r219638, which were reverted due to a hang on Win32
which was the result of a bug in the Windows implementation
of std::call_once.

Differential Revision: http://reviews.llvm.org/D5922

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 269577
2016-05-14 20:55:52 +00:00
Jiangning Liu
1fb71bc395 Revert 220932.
Commit 220932 caused crash when building clang-tblgen on aarch64 debian target,
so it's blocking all daily tests.

The std::call_once implementation in pthread has bug for aarch64 debian.

llvm-svn: 221331
2014-11-05 04:44:31 +00:00
Yaron Keren
6091fe7db9 #include <winbase.h> is not enough for Visual C++ 2013, it errors:
1>C:\Program Files (x86)\Windows Kits\8.1\Include\um\minwinbase.h(46):
error C2146: syntax error : missing ';' before identifier 'nLength'
1>C:\Program Files (x86)\Windows Kits\8.1\Include\um\minwinbase.h(46):
error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
...

including <windows.h> is actually required.

llvm-svn: 221244
2014-11-04 07:53:30 +00:00
Hans Wennborg
6cda0d7269 Speculative fix for Windows build after r220932
llvm-svn: 220936
2014-10-30 23:10:01 +00:00
Chris Bieneman
14e2bcccfb Removing the static initializer in ManagedStatic.cpp by using llvm_call_once to initialize the ManagedStatic mutex.
Summary:
This patch adds an llvm_call_once which is a wrapper around std::call_once on platforms where it is available and devoid of bugs. The patch also migrates the ManagedStatic mutex to be allocated using llvm_call_once.

These changes are philosophically equivalent to the changes added in r219638, which were reverted due to a hang on Win32 which was the result of a bug in the Windows implementation of std::call_once.

Reviewers: aaron.ballman, chapuni, chandlerc, rnk

Reviewed By: rnk

Subscribers: majnemer, llvm-commits

Differential Revision: http://reviews.llvm.org/D5922

llvm-svn: 220932
2014-10-30 22:07:09 +00:00