Mostly adding feature flags from the newest SDK.
(Note that in addition to the obvious, this also affects the compiler-rt
SME ABI routines, which rely on FEAT_SME and FEAT_SME2.)
SVE depends on a combination of host support and operating system
support. Sometimes those don't line up with the detected host CPU name; make
sure SVE is disabled when it isn't available. Implement this for both
Windows and Linux. (We don't have a codepath for other operating
systems. If someone wants to implement this, it should be possible to
adapt fmv code from compiler-rt.)
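As a rough illustration of the gating described above, here is a minimal sketch. It assumes the detected features live in a name-to-bool map and that the OS answer has already been obtained elsewhere; `disableSVEIfUnavailable`, the `std::map` stand-in, and the "every feature whose name starts with `sve`" rule are illustrative assumptions, not the code in this patch.

```cpp
#include <map>
#include <string>

// Hypothetical helper: if the OS does not report SVE support, switch off
// every feature whose name starts with "sve" (sve, sve2, ...), whatever the
// CPU-name-based detection claimed. `Features` is a stand-in for the feature
// map produced by host detection.
void disableSVEIfUnavailable(std::map<std::string, bool> &Features,
                             bool OSReportsSVE) {
  if (OSReportsSVE)
    return; // Host and OS agree that SVE is usable; nothing to do.
  for (auto &Entry : Features) {
    // rfind(..., 0) == 0 is a "starts with" check.
    if (Entry.first.rfind("sve", 0) == 0)
      Entry.second = false;
  }
}
```

The point of the design is that the OS answer overrides the CPU-name guess, never the other way around.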
While I'm here, also add support for detecting other Windows CPU
features.
For Windows, declare constants ourselves so the code builds on older
SDKs; we also do this in compiler-rt.
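The fallback-constant pattern looks roughly like the sketch below. The value 46 for `PF_ARM_SVE_INSTRUCTIONS_AVAILABLE` is what I believe recent SDK headers use, so treat it as an assumption and check it against `winnt.h`; `windowsReportsSVE` is a hypothetical wrapper name.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Older Windows SDKs may not define this constant, so provide it ourselves.
// Assumed value; verify against a recent winnt.h.
#ifndef PF_ARM_SVE_INSTRUCTIONS_AVAILABLE
#define PF_ARM_SVE_INSTRUCTIONS_AVAILABLE 46
#endif

// Hypothetical wrapper: asks Windows whether SVE is available.
bool windowsReportsSVE() {
#ifdef _WIN32
  return IsProcessorFeaturePresent(PF_ARM_SVE_INSTRUCTIONS_AVAILABLE) != 0;
#else
  return false; // Not Windows: this query does not apply here.
#endif
}
```

Including `windows.h` before the `#ifndef` means a newer SDK's own definition wins and the fallback is only used when the header lacks it.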
Per Intel Architecture Instruction Set Extensions Programming Reference
rev. 59 (https://cdrdv2.intel.com/v1/dl/getContent/671368), Revision
History entry for revision -59, AMX-TRANSPOSE was removed.
This patch replaces LLVM_ATTRIBUTE_UNUSED with [[maybe_unused]]. Note
that this patch adjusts the placement of [[maybe_unused]] to comply
with the C++17 language.
The 256-bit maximum vector register size control was removed from the
AVX10 whitepaper, ref: https://cdrdv2.intel.com/v1/dl/getContent/784343
We have warned about these options in LLVM 21 (#132542). This patch
removes the underlying implementations in LLVM 22.
The new `sys::detail::getHostCPUNameForARM` for Windows (#151596) was
implemented using a C++ bit-field, which caused the associated unit
tests to fail on big-endian machines as it assumed a little-endian
layout.
This change switches from the C++ bit-field to LLVM's `BitField` type
instead.
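For context: the allocation order of C++ bit-field members is implementation-defined, which is why a bit-field overlay of the register value is not portable across endianness. The sketch below shows the endian-independent idea using plain shifts and masks rather than LLVM's bit-field helper; `MIDRFields`, `extractBits`, and `decodeMIDR` are illustrative names, and the field positions follow the architectural MIDR_EL1 layout.

```cpp
#include <cstdint>

// Fields of the AArch64 MIDR_EL1 register.
struct MIDRFields {
  uint32_t Implementer;  // bits [31:24]
  uint32_t Variant;      // bits [23:20]
  uint32_t Architecture; // bits [19:16]
  uint32_t PartNum;      // bits [15:4]
  uint32_t Revision;     // bits [3:0]
};

// Extract `Width` bits starting at bit `Offset`. This operates on the value,
// not on its in-memory layout, so it gives the same result on any host.
constexpr uint32_t extractBits(uint64_t Value, unsigned Offset,
                               unsigned Width) {
  return static_cast<uint32_t>((Value >> Offset) &
                               ((uint64_t{1} << Width) - 1));
}

constexpr MIDRFields decodeMIDR(uint64_t MIDR) {
  return {extractBits(MIDR, 24, 8), extractBits(MIDR, 20, 4),
          extractBits(MIDR, 16, 4), extractBits(MIDR, 4, 12),
          extractBits(MIDR, 0, 4)};
}
```

For example, `decodeMIDR(0x410fd083)` yields implementer `0x41` and part number `0xd08`.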
Uses the `CP 4000` registry keys under
`HKLM\HARDWARE\DESCRIPTION\System\CentralProcessor\*` to get the
Implementer and Part, which is then provided to a modified form of
`getHostCPUNameForARM` to map to a CPU.
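The mapping step can be pictured with the sketch below. It is not the actual `getHostCPUNameForARM` code: the function name is hypothetical, only three well-known Arm part numbers are listed, and the `"generic"` fallback is an assumption made for the example. Reading the registry values themselves is left out.

```cpp
#include <cstdint>
#include <string>

// Hypothetical lookup from (Implementer, Part) to a CPU name.
std::string cpuNameFromImplementerAndPart(uint32_t Implementer,
                                          uint32_t Part) {
  if (Implementer == 0x41) { // Arm Ltd.
    switch (Part) {
    case 0xd03:
      return "cortex-a53";
    case 0xd07:
      return "cortex-a57";
    case 0xd08:
      return "cortex-a72";
    default:
      break;
    }
  }
  // Unknown implementer or part: fall back to a generic name (assumed).
  return "generic";
}
```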
On my local Surface Pro 11 `llc --version` reports:
```
> .\build\bin\llc.exe --version
LLVM (http://llvm.org/):
LLVM version 22.0.0git
Optimized build with assertions.
Default target: aarch64-pc-windows-msvc
Host CPU: oryon-1
```
These changes allow LLVM and Clang to be built with Clang targeting
Arm64EC using the MSVC linker.
Built with these options:
```
-DLLVM_ENABLE_PROJECTS="clang"
-DLLVM_HOST_TRIPLE=arm64ec-pc-windows-msvc
-DCMAKE_C_COMPILER=clang-cl.exe
-DCMAKE_C_COMPILER_TARGET=arm64ec-pc-windows-msvc
-DCMAKE_CXX_COMPILER=clang-cl.exe
-DCMAKE_CXX_COMPILER_TARGET=arm64ec-pc-windows-msvc
-DCMAKE_LINKER_TYPE=MSVC
```
This patch adds support for -mcpu=gb10 (NVIDIA GB10). This is a
big.LITTLE cluster of Cortex-X925 and Cortex-A725 cores. The appropriate
MIDR numbers are added to detect them in -mcpu=native.
We did not add an -mcpu=cortex-x925.cortex-a725 option because GB10 does
include the crypto instructions which we want on by default, and the
current convention is to not enable such extensions for Arm Cortex cores
in -mcpu where they are optional in the IP.
Relevant GCC patch:
https://gcc.gnu.org/pipermail/gcc-patches/2025-June/687005.html
We can get the `mvendorid/marchid/mimpid` via hwprobe and then we
can compare these IDs with those defined in processors to find the
CPU name.
With this change, `-mcpu/-mtune=native` can set the proper name.
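A minimal sketch of the matching step, assuming the three IDs have already been read (on Linux, via the `RISCV_HWPROBE_KEY_MVENDORID`, `RISCV_HWPROBE_KEY_MARCHID`, and `RISCV_HWPROBE_KEY_MIMPID` hwprobe keys). The struct names, the table contents, and the `"generic"` fallback are made up for illustration; real entries would come from the processor definitions.

```cpp
#include <cstdint>
#include <string>

struct RISCVCPUModel {
  uint32_t MVendorID;
  uint64_t MArchID;
  uint64_t MImpID;
};

struct RISCVCPUEntry {
  const char *Name;
  RISCVCPUModel Model;
};

// Hypothetical table; these IDs do not describe real processors.
static const RISCVCPUEntry KnownRISCVCPUs[] = {
    {"example-core-a", {0x1234, 0x1, 0x10}},
    {"example-core-b", {0x1234, 0x2, 0x20}},
};

// Return the name of the entry whose three IDs all match the probed values.
std::string lookupRISCVCPUName(const RISCVCPUModel &Probed) {
  for (const RISCVCPUEntry &E : KnownRISCVCPUs) {
    if (E.Model.MVendorID == Probed.MVendorID &&
        E.Model.MArchID == Probed.MArchID &&
        E.Model.MImpID == Probed.MImpID)
      return E.Name;
  }
  return "generic"; // Assumed fallback when nothing matches.
}
```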
This patch adds initial support for the recently announced Armv9
Cortex-A320 processor.
For more information, including the Technical Reference Manual, see:
https://developer.arm.com/Processors/Cortex-A320
---------
Co-authored-by: Oliver Stannard <oliver.stannard@arm.com>
The recently announced IBM z17 processor implements the architecture
already supported as "arch15" in LLVM. This patch adds support for "z17"
as an alternate architecture name for arch15.
This patch also adds the scheduler description for the z17 processor,
provided by Jonas Paulsson.
This patch adds support for the next-generation arch15
CPU architecture to the SystemZ backend.
This includes:
- Basic support for the new processor and its features.
- Detection of arch15 as host processor.
- Assembler/disassembler support for new instructions.
- Exploitation of new instructions for code generation.
- New vector (signed|unsigned|bool) __int128 data types.
- New LLVM intrinsics for certain new instructions.
- Support for low-level builtins mapped to new LLVM intrinsics.
- New high-level intrinsics in vecintrin.h.
- Indicate support by defining __VEC__ == 10305.
Note: No currently available Z system supports the arch15
architecture. Once new systems become available, the
official system name will be added as a supported -march name.
Add Apple M4 host detection, which fixes
https://github.com/rust-lang/rust/issues/133414.
Also add support for older ARM families (this is likely never going to
get used, since only macOS is officially supported as host OS, but nice
to have for completeness' sake). Error handling (checking
`CPUFAMILY_UNKNOWN`) is also included here.
Finally, add links to extra documentation to make it easier for others
to update this in the future.
NOTE: These values are taken from `mach/machine.h` in the Xcode 16.2 SDK,
and have been confirmed on an M4 Max in
https://github.com/rust-lang/rust/issues/133414#issuecomment-2499123337.
Two options for clang:
-mdiv32: Use div.w[u] and mod.w[u] instructions with input not
sign-extended.
-mno-div32: Do not use div.w[u] and mod.w[u] instructions with input not
sign-extended.
The default is -mno-div32.
Two options for clang:
-mld-seq-sa: Do not generate load-load barrier instructions (dbar 0x700).
-mno-ld-seq-sa: Generate load-load barrier instructions (dbar 0x700).
The default is -mno-ld-seq-sa.
Two features (i.e. `frecipe` and `lam-bh`) are added to
`sys::getHostCPUFeatures`. More features will be added in the future.
In addition, this patch adds the features returned by
`sys::getHostCPUFeatures` when `-march=native` is specified.