565 Commits

Author SHA1 Message Date
Marcos Maronas
ce94d63f0f
Make OpenCL an OSType rather than an EnvironmentType. (#170297)
OpenCL was added as an `EnvironmentType` in
https://github.com/llvm/llvm-project/pull/78655, but there is no
explanation as to why it was added as such, even after explicitly asking
in the PR
(https://github.com/llvm/llvm-project/pull/78655#issuecomment-2743162853).
This PR makes it an `OSType` instead, which feels more natural, and
updates tests accordingly.

---------

Co-authored-by: Marcos Maronas <marcos.maronas@intel.com>
2026-02-10 18:45:50 +00:00
Mirko Brkušanin
4280f0d241
[AMDGPU] Add dot4 fp8/bf8 instructions for gfx1170 (#180516) 2026-02-10 12:14:49 +01:00
Ruoyu Qiu
da0ad392ff
[llvm-objdump][AVR] Detect AVR architecture from ELF flags for disassembling (#180468)
Reland #174731, resolve cyclic dependency issue.

The use of LLVM_Object in LLVM_Util would cause a cyclic dependency.
Fix the cyclic dependency by reimplementing `getFeatureSetFromEFlag()`.

Original description:

---

This PR updates llvm-objdump to detect the specific AVR architecture
from the ELF header flags when no specific CPU is provided.

Fixes: https://github.com/llvm/llvm-project/issues/146451

Signed-off-by: Ruoyu Qiu <cabbaken@outlook.com>
2026-02-09 21:10:14 +08:00
Mirko Brkušanin
45b037cf7a
[AMDGPU] Add fp8/bf8 conversion instructions for gfx1170 (#180191) 2026-02-09 13:56:43 +01:00
Ganesh
a362593e0d
[X86] AMD Zen 6 Initial enablement (#179150)
This patch adds initial support for AMD Zen 6 architecture (znver6):

- Added znver6 CPU target recognition in Clang and LLVM
- Updated compiler-rt CPU model detection for znver6
- Added znver6 to target parser and host CPU detection
- Added znver6 to various optimizer tests

znver6 features: FP16, AVXVNNIINT8, AVXNECONVERT, AVXIFMA (without BMM).
2026-02-07 09:38:10 +05:30
Henrik G. Olsson
eff21afae0
Revert "[llvm-objdump][AVR] Detect AVR architecture from ELF flags for disassembling" (#180252)
Reverts llvm/llvm-project#174731 due to introducing a cyclic dependency
when building LLVM with modules enabled: LLVM_Utils -> LLVM_Object ->
LLVM_Utils
2026-02-06 19:00:32 +00:00
Mirko Brkušanin
20b5849e17
[AMDGPU] Define new target gfx1170 (#180185) 2026-02-06 14:38:50 +01:00
Ruoyu Qiu
d005cb2953
[llvm-objdump][AVR] Detect AVR architecture from ELF flags for disassembling (#174731)
This PR updates llvm-objdump to detect the specific AVR architecture
from the ELF header flags when no specific CPU is provided.

Fixes: #146451

---------

Signed-off-by: RuoyuQiu <cabbaken@outlook.com>
Signed-off-by: Ruoyu Qiu <cabbaken@outlook.com>
Co-authored-by: qiuruoyu <qiuruoyu@hygon.cn>
2026-02-06 08:58:12 +08:00
Min-Yih Hsu
6441f1c9d5
[RISCV] Introduce a new syntax for processor-specific tuning feature strings (#175063)
This patch proposes a new tuning feature string format that helps users
build a performance model by "configuring" an existing tune CPU,
along with its scheduling model. For example, this string
```
"sifive-x280:single-element-vec-fp64"
```
takes ``sifive-x280`` as the "base" tune CPU and configures it with
``single-element-vec-fp64``. This gives us a performance model that
looks exactly like that of ``sifive-x280``, except some of the 64-bit
vector floating point instructions now produce only a single element per
cycle due to ``single-element-vec-fp64``.

This string could eventually be used in places like ``-mtune`` at the
frontend. Right now, this patch only implements the parser part, which
is put under the TargetParser library.

The grammar for this string is:
```
    tune-cpu      ::= 'tuning CPU name in lower case'
    directive     ::= "[a-zA-Z0-9_-]+"
    tune-features ::= directive ["," directive]*
```
A *directive* can only _enable_ or _disable_ a certain tuning
feature from the tuning CPU. A **positive directive**, like the
``single-element-vec-fp64`` we just saw, enables an additional tuning
feature in the associated tuning model.

A **negative directive**, on the other hand, removes a certain tuning
feature. For example, ``sifive-x390`` already has the
``single-element-vec-fp64`` feature, and we can use
"sifive-x390:no-single-element-vec-fp64" to create a new performance
model that looks nearly the same as ``sifive-x390`` except that
``single-element-vec-fp64`` is removed. In this case,
``no-single-element-vec-fp64`` is a negative directive.

There are additional restrictions on what we can put in the list of
directives; please refer to the documentation for more details.

Right now, this string only accepts directives that are explicitly
supported by the tune CPU. For example, "sifive-x280:prefer-w-inst" is
not a valid string as ``prefer-w-inst`` is not supported by
``sifive-x280`` at this moment. Vendors of these processors are expected
to maintain the compatibility of their supported directives across
different versions.
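
As a rough illustration of the format described above, the following Python sketch splits a tuning string into its base CPU and its positive and negative directives. It is not the TargetParser implementation: the function name `parse_tune_string` is hypothetical, treating a `no-` prefix as the marker for a negative directive is inferred from the single example in this message, and accepting a bare CPU name without `:` is an assumption. It also does not check whether the tune CPU actually supports a directive.

```python
import re

# Directive pattern taken from the grammar in this commit message.
DIRECTIVE_RE = re.compile(r"[a-zA-Z0-9_-]+")


def parse_tune_string(text):
    """Split 'cpu:dir1,dir2' into (cpu, positive, negative).

    Hypothetical helper for illustration only; the real parser lives in
    LLVM's TargetParser library and may behave differently.
    """
    cpu, sep, features = text.partition(":")
    if not cpu or cpu != cpu.lower():
        raise ValueError(f"invalid tune CPU name: {cpu!r}")

    positive, negative = [], []
    if sep:  # a ':' was present, so at least one directive is expected
        for directive in features.split(","):
            if not DIRECTIVE_RE.fullmatch(directive):
                raise ValueError(f"invalid directive: {directive!r}")
            # Assumption: a leading "no-" marks a negative directive.
            if directive.startswith("no-"):
                negative.append(directive[len("no-"):])
            else:
                positive.append(directive)
    return cpu, positive, negative


print(parse_tune_string("sifive-x280:single-element-vec-fp64"))
print(parse_tune_string("sifive-x390:no-single-element-vec-fp64"))
```

Returning the positive and negative directives separately keeps the later step (checking them against what the base CPU supports) independent of the string syntax.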

---------

Co-authored-by: Sam Elliott <aelliott@qti.qualcomm.com>
2026-02-05 15:22:07 -08:00
Ian Anderson
639a8d1f1d
[Triple] Make a target triple "os" for firmware (#176272)
Make a Triple::OSType to support a generic "firmware" OS that isn't bare
metal, but isn't tied to a specific hardware platform like macOS or iOS.
Hook up support for the new OSType in the Darwin toolchain.
2026-02-04 12:15:25 -08:00
Phoebe Wang
2f3935bcee
[X86][APX] Disable PP2/PPX generation on Windows (#178122)
The PUSH2/POP2/PPX instructions for APX require updates to the Microsoft
Windows OS x64 calling convention documented at
https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64?view=msvc-170
due to lack of suitable unwinder opcodes that can support APX
PUSH2/POP2/PPX.

This PR disables this support by default for code robustness;
workloads that choose to explicitly enable this support can change the
default behavior by explicitly specifying the flag options that enable
this support e.g. for experimentation or code paths that do not need
unwinder support.
2026-02-02 18:01:44 +08:00
Mariusz Sikora
6de6f7b46b
[AMDGPU] Define gfx1310 target with ELF number 0x50 (#177355)
For now this is identical to gfx1250.

---------

Co-authored-by: Jay Foad <jay.foad@amd.com>
2026-01-22 17:08:38 +01:00
Nikita Popov
4fe4f23e2f
[TargetParser] Fix fp16 feature name for ARM64 Windows feature detection (#176925)
The feature is called fullfp16, not fp16, see:
979db00b9a/llvm/lib/Target/AArch64/AArch64Features.td (L142)
2026-01-22 09:23:10 +01:00
Jonas Paulsson
8eccda10d2
[SystemZ] Add SP alignment to the DataLayout string. (#176041)
Add '-S64' to the SystemZ datalayout string, to avoid overalignment of
stack objects.

Fixes #173402
2026-01-20 09:54:47 -06:00
Ricardo Jesus
9458d2a0f4
[AArch64][Driver] Allow runtime detection to override default features. (#176340)
Currently, most extensions controlled through -march and -mcpu options
are handled in a bitset of AArch64::ExtensionSet. However, extensions
detected at runtime for native compilation are handled in a separate
list of CPU features; once most of the parsing logic has run, the bitset
is converted to a feature list, added after the features detected at
runtime, and the resulting list is used from there on out.

This has the downside that runtime-detected features are unable to
override default CPU extensions. For example, if a CPU enables +aes in
its processor definition, but aes support is not detected at runtime,
the feature currently remains enabled---even though
unsupported---because default features are enabled after the runtime
logic attempts to disable them.

This patch inserts runtime-detected features directly into the extension
set such that these options can take precedence over extensions enabled
by default. The general parsing order for mcpu=native becomes:
1. CPU defaults;
2. Runtime detection;
3. +featureA+nofeatureB options;
4. Other parsing decisions.

This allows features that are found to be unsupported at runtime to be
removed from the list of features supported by targets that enable them
by default.

While at it, this also disables rng if not detected at runtime.
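
The precedence order listed above can be sketched in a few lines of Python. This is only a model of the ordering, not the driver code: `resolve_features` and its parameters are hypothetical names, features are plain strings rather than `AArch64::ExtensionSet` bits, and step 4 ("other parsing decisions") is left out.

```python
def resolve_features(cpu_defaults, runtime_detected, runtime_missing, modifiers):
    """Apply feature sources in the order described in this message.

    All names here are illustrative assumptions. `modifiers` is a list
    such as ["featureA", "nofeatureB"], mirroring "+featureA+nofeatureB".
    """
    enabled = set(cpu_defaults)        # 1. CPU defaults
    enabled |= set(runtime_detected)   # 2. runtime detection can add ...
    enabled -= set(runtime_missing)    #    ... and now also remove defaults
    for mod in modifiers:              # 3. explicit user modifiers win last
        if mod.startswith("no"):
            enabled.discard(mod[2:])
        else:
            enabled.add(mod)
    return enabled


# A CPU definition that enables aes by default, on a host without aes:
print(sorted(resolve_features({"aes", "fp"}, {"sve"}, {"aes"}, [])))
```

Because the runtime step runs before the explicit modifiers, a user can still force a feature back on, while a default that the host lacks no longer survives silently.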
2026-01-20 13:09:17 +00:00
Shilei Tian
39bd4562ba
[Clang][AMDGPU] Handle wavefrontsize32 and wavefrontsize64 features more robustly (#176599)
We should not allow `-wavefrontsize32` and `-wavefrontsize64` to be
specified at the same time. We should also not allow `-wavefrontsize32`
on a target that only supports `wavefrontsize32`, and vice versa.
2026-01-19 18:16:29 -05:00
hev
0a9d480fad
[clang][LoongArch] Add support for LoongArch32 (#172619)
This patch adds support for LoongArch32, as introduced in
la-toolchain-conventions v1.2.

Co-authored-by: Sun Haiyong <sunhaiyong@zdbr.net>
Link:
https://github.com/loongson/la-toolchain-conventions/releases/tag/releases%2Fv1.2
Link:
https://gcc.gnu.org/pipermail/gcc-patches/2025-December/703312.html
2026-01-17 16:27:54 +08:00
Henry Linjamäki
9587892183
[Triple] Add "chipstar" OS components (#170655)
This new component is for Clang driver for selecting HIPSPV toolchain.
2026-01-16 08:24:58 -06:00
Shoreshen
26624d51d1
[AMDGPU]Add specific instruction feature for multicast load (#175503) 2026-01-13 09:10:09 +08:00
Philipp Tomsich
43138d6272
[Aarch64] Add support for Ampere1C core (#175442)
This patch adds initial support for the ARMv9.2+ Ampere1C core.
2026-01-12 09:52:23 +01:00
Dan Gohman
597ffbe09d
Rename wasm32-wasi to wasm32-wasip1. (#165345)
This adds code to recognize "wasm32-wasip1", "wasm32-wasip2", and
"wasm32-wasip3" as explicit targets, and adds a deprecation warning when
the "wasm32-wasi" target is used, pointing users to the "wasm32-wasip1"
target.

Fixes #165344.

I'm filing this as a draft PR for now, as I've only just now proposed to
make this change in #165344.
2026-01-10 00:09:06 +00:00
Craig Topper
bafbf2d58d
[RISCV] Add rules for Zca+Zcb+Zcmp+Zcmpt implying Zce. (#175041)
The implication rules need to consider whether F is enabled, as was
done for C in #172860.
2026-01-08 20:07:02 -08:00
Francesco Petrogalli
75d025124a
[RISCV] Add basic Mach-O triple support. (#141682)
Based on a patch written by Tim Northover (https://github.com/TNorthover).
2026-01-05 23:18:48 +00:00
Jerry Zhang Jian
fc69c804db
[RISCV] Implement conditional Zca implies C extension rule (#172860)
This change implements the conditional "Zca implies C" rule to match
GCC's behavior (PR119122) and the RISC-V specification for MISA.C.

The rule is:
  - For RV32:
    - No F and no D: Zca alone implies C
    - F but no D: Zca + Zcf implies C
    - F and D: Zca + Zcf + Zcd implies C
  - For RV64:
    - No D: Zca alone implies C
    - D: Zca + Zcd implies C

This fixes multilib matching issues where LLVM-generated march strings
didn't include the C extension when GCC's multilib configurations
expected it.
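
The rule table above can be written as a small predicate. The following Python sketch is an illustration only, not the code added to LLVM: the function name `zca_implies_c` and the representation of extensions as a set of lower-case strings are assumptions. Since D implies F in the RISC-V specification, the sketch treats D as also enabling F.

```python
def zca_implies_c(xlen, exts):
    """Return True if the enabled extensions imply C under the rule above.

    `xlen` is 32 or 64; `exts` is a set of lower-case extension names.
    Hypothetical helper for illustration.
    """
    has_d = "d" in exts
    has_f = "f" in exts or has_d   # D implies F

    required = {"zca"}
    if xlen == 32 and has_f:
        required.add("zcf")        # Zcf only matters on RV32
    if has_d:
        required.add("zcd")
    return required <= set(exts)


print(zca_implies_c(32, {"zca"}))              # no F, no D
print(zca_implies_c(32, {"f", "zca"}))         # F present but Zcf missing
print(zca_implies_c(64, {"f", "d", "zca", "zcd"}))
```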

Reference:
  - GCC PR119122: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119122
  - RISC-V Zc spec:
    https://github.com/riscv/riscv-isa-manual/blob/main/src/zc.adoc

Signed-off-by: Jerry Zhang Jian <jerry.zhangjian@sifive.com>
2025-12-20 01:15:03 +08:00
Sudharsan Veeravalli
3bf0a8d6e1
[RISCV] Add Xqci feature flag (#172608)
This patch adds an experimental Xqci feature flag that covers all the
sub-extensions in the Qualcomm uC Extension.
2025-12-18 21:32:49 +05:30
Phoebe Wang
d6c2cd69cb
[X86][APX] Check APXSave before enabling APX features (#172834)
According to APX spec 3.1.4.2, APX instructions can normally execute
only when XCR0[APX_F]=1, where APX_F=19.
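
A minimal Python sketch of the check described above, assuming the XCR0 value has already been read (for example via XGETBV) and is passed in as an integer. The helper names and the idea of passing the APX feature names as a parameter are illustrative assumptions, not the host detection code itself.

```python
APX_F_BIT = 19  # bit position of APX_F in XCR0, as stated in this message


def apx_state_enabled(xcr0):
    """True if the OS has enabled APX state, i.e. XCR0[APX_F] == 1."""
    return bool((xcr0 >> APX_F_BIT) & 1)


def filter_apx(features, apx_features, xcr0):
    """Drop APX-related features unless XCR0 says APX state is enabled.

    `features` and `apx_features` are sets of feature-name strings chosen
    by the caller; both names are hypothetical.
    """
    if apx_state_enabled(xcr0):
        return set(features)
    return set(features) - set(apx_features)


print(apx_state_enabled(1 << 19))
print(apx_state_enabled(0x7))
```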
2025-12-18 22:22:20 +08:00
Zachary Yedidia
2c05ae4b8f
[LFI] Introduce AArch64 LFI Target (#167061)
This PR is the first step towards introducing LFI into LLVM as a new
sub-architecture backend of AArch64. For details, please see the
[RFC](https://discourse.llvm.org/t/rfc-lightweight-fault-isolation-lfi-efficient-native-code-sandboxing-upstream-lfi-target-and-compiler-changes/88380),
which has been approved for AArch64.

This patch creates the `aarch64_lfi` architecture, and marks the
appropriate registers as reserved when it is targeted (`x25`, `x26`,
`x27`, `x28`). It also adds a Clang driver toolchain for targeting LFI,
and updates the compiler-rt CMake to allow builds for the `aarch64_lfi`
target. The patch also includes documentation for LFI and the rewrites
that will be implemented in future patches.

I am planning to split the relevant modifications for LFI into a series
of patches, organized as described below (after this one). Please let me
know if you'd like me to split the changes in a different way, or
provide one big patch.

1. The next patch will introduce the `MCLFIExpander` mechanism for
applying the MC-level rewrites needed by LFI, along with the
`.lfi_expand` and `.lfi_no_expand` assembly directives when targeting
LFI. A preview can be seen on the `lfi-project`
[fork](https://github.com/llvm/llvm-project/compare/main...lfi-project:llvm-project:lfi-patchset/aarch64-pr-2).

2. The following patch will create an `MCLFIExpander` for the AArch64
backend that performs LFI expansions. This patch will contain the
majority of the LFI-specific logic.

3. The final patch will add an optimization to the rewriter that can
eliminate redundant guard instructions that occur within the same basic
block.

We plan to introduce x86-64 support after further discussion and once
the `MCLFIExpander` infrastructure is in place.

Please let me know your feedback, and thank you very much for your help
and guidance in the review process.
2025-12-16 12:51:02 -08:00
dcandler
23f967ada0
[AArch64] Add support for C1 CPUs (#171124)
This patch adds initial support for the Arm v9.3 C1 processors:
* C1-Nano
* C1-Pro
* C1-Premium
* C1-Ultra

For more information on each, see:
https://developer.arm.com/Processors/C1-Nano
https://developer.arm.com/Processors/C1-Pro
https://developer.arm.com/Processors/C1-Premium
https://developer.arm.com/Processors/C1-Ultra

Technical Reference Manual for C1-Nano:
https://developer.arm.com/documentation/107753/latest/

Technical Reference Manual for C1-Pro:
https://developer.arm.com/documentation/107771/latest/

Technical Reference Manual for C1-Premium:
https://developer.arm.com/documentation/109416/latest/

Technical Reference Manual for C1-Ultra:
https://developer.arm.com/documentation/108014/latest/
2025-12-16 14:54:27 +00:00
Mikołaj Piróg
b6f210b215
[X86] Correct CPUID checks for AVX10 (#172350)
This corrects a wrong condition for avx10 (AVX10Ver is always set to
0/1) and corrects how CPUID for avx10 is queried: per ISE table 1-3 we
should query with EAX = 0x24 and ECX = 0x0 -- previously we omitted the
latter.

Issue reported by user Seraphimt here
https://discourse.llvm.org/t/test-for-sys-gethostcpufeatures/89130
2025-12-16 13:59:50 +01:00
Eli Friedman
1b4a74fcdc
[AArch64] Fix typo in 09e57cfd32b0073b63d568835f07251e0d51affb (#172354) 2025-12-15 11:15:59 -08:00
Eli Friedman
09e57cfd32
[AArch64] Extend Windows CPU feature detection with more features. (#171930)
Mostly adding feature flags from the newest SDK.

(Note that in addition to the obvious, this also affects the compiler-rt
SME ABI routines, which rely on FEAT_SME and FEAT_SME2.)
2025-12-15 10:56:17 -08:00
Nikita Popov
b7c0452a9a
[PowerPC][AIX] Specify correct ABI alignment for double (#144673)
Add `f64:32:64` to the data layout for AIX, to indicate that doubles
have a 32-bit ABI alignment and 64-bit preferred alignment.

Clang was already taking this into account, but it was not reflected in
LLVM's data layout.

A notable effect of this change is that `double` loads/stores with 4
byte alignment are no longer considered "unaligned" and avoid the
corresponding unaligned access legalization. I assume that this is
correct/desired for AIX. (The codegen previously already relied on this
in some places related to the call ABI simply by dint of assuming
certain stack locations were 8 byte aligned, even though they were only
actually 4 byte aligned.)
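
To make the effect of `f64:32:64` concrete, here is a small Python sketch that reads such an alignment entry (type and size, ABI alignment, optional preferred alignment, all in bits) and checks whether an access counts as under-aligned. The helper names are hypothetical, and the `f64:64` entry in the usage lines is only a contrasting example with a 64-bit ABI alignment.

```python
def parse_alignment_spec(spec):
    """Parse an entry like 'f64:32:64' into its components (in bits).

    Illustrative helper; the preferred alignment defaults to the ABI
    alignment when it is omitted.
    """
    head, *aligns = spec.split(":")
    if not aligns:
        raise ValueError(f"missing ABI alignment in {spec!r}")
    abi_bits = int(aligns[0])
    pref_bits = int(aligns[1]) if len(aligns) > 1 else abi_bits
    return {
        "kind": head[0],
        "size_bits": int(head[1:]),
        "abi_bits": abi_bits,
        "pref_bits": pref_bits,
    }


def is_under_aligned(access_align_bytes, spec):
    """True if an access is aligned to less than the ABI alignment."""
    return access_align_bytes * 8 < parse_alignment_spec(spec)["abi_bits"]


print(parse_alignment_spec("f64:32:64"))
print(is_under_aligned(4, "f64:32:64"))  # 4-byte aligned double, new layout
print(is_under_aligned(4, "f64:64"))     # same access, 64-bit ABI alignment
```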

Fixes https://github.com/llvm/llvm-project/issues/133599.
2025-12-11 08:57:26 +01:00
Mirko Brkušanin
5759a3a779
[AMDGPU] Add s_wakeup_barrier instruction for gfx1250 (#170501) 2025-12-10 09:45:13 +01:00
Craig Topper
d18cdc99bc
[RISCVInsertVSETVLI] Don't allow getSEW/getLMUL to be called for hasSEWLMULRatioOnly(). NFC (#171554)
Refactor some logic in transferBefore to handle hasSEWLMULRatioOnly()
before calling getSEW/getLMUL.

Update adjustIncoming to use getSEWLMULRatio(). Update the interface of
RISCVVType::getSameRatioLMUL to take the ratio instead of SEW and LMUL.
Update the few other callers to call RISCVVType::getSEWLMULRatio first.
2025-12-09 22:06:15 -08:00
Alexandros Lamprineas
1b82c16fa8
[FMV][AArch64] Allow user to override version priority. (#150267)
Implements https://github.com/ARM-software/acle/pull/404

This allows the user to specify "featA+featB;priority=[1-255]" where
priority=255 means highest priority. If the explicit priority string is
omitted then the priority of "featA+featB" is implied, which is lower
than priority=1.

Internally this gets expanded using special FMV features P0 ... P7 which
can encode up to 256-1 priority levels (excluding all zeros). Those do
not have a corresponding detection bit at pos FEAT_#enum so I made this
field optional in FMVInfo. Also they don't affect the codegen or name
mangling of versioned functions.
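
The following Python sketch illustrates the string form and one plausible way eight features P0 ... P7 could encode a priority in the range 1 to 255. It is an assumption-laden illustration: the function names are hypothetical, and the plain binary encoding with P0 as the least significant bit is a guess at the scheme, not something this message confirms.

```python
def parse_version_string(text):
    """Split 'featA+featB;priority=N' into (features, priority or None)."""
    features_part, sep, prio_part = text.partition(";")
    features = [f for f in features_part.split("+") if f]
    priority = None
    if sep:
        key, eq, value = prio_part.partition("=")
        if key != "priority" or not eq or not value.isdecimal():
            raise ValueError(f"malformed priority in {text!r}")
        priority = int(value)
        if not 1 <= priority <= 255:
            raise ValueError("priority must be in the range 1-255")
    return features, priority


def priority_features(priority):
    """Map a priority to P-features, assuming P0 is the lowest bit."""
    return [f"P{i}" for i in range(8) if (priority >> i) & 1]


print(parse_version_string("featA+featB;priority=255"))
print(priority_features(5))
```

Under this assumed encoding, an all-zero pattern never occurs for an explicit priority, which matches the "excluding all zeros" remark above.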
2025-12-09 13:31:10 +00:00
Nikita Popov
9dc3255cb9
[Clang] Use DataLayout from TargetParser (#171135)
This switches clang to use the data layouts from TargetParser, instead
of maintaining its own copy of data layouts, which are required to match
the backend data layouts.

For now I've kept explicit calls to resetDataLayout(), just with the
argument implied by the triple and ABI. Ideally this would happen
automatically, but the way these classes are initialized currently
doesn't offer a great place to do this.

Previously resetDataLayout() also set the UserLabelPrefix. I've
separated this out, with a reasonable default so that most targets don't
need to worry about it.

I've kept the explicit data layouts for TCE and SPIR (without the V).
These seem to not correspond to real LLVM targets.

I've also fixed the XCore data layout in TargetParser, which was
incorrectly set to the same one as Xtensa. It was previously unused.
2025-12-09 07:42:02 +00:00
Mikołaj Piróg
e3044cd552
[X86] Sync multiversion features with libgcc and refactor internal feature tables (#168750)
Compiler-rt internal feature table is synced with the one in libgcc
(common/config/i386/i386-cpuinfo.h).

LLVM internal feature table is refactored to include a field ABI_VALUE,
so we won't be relying on ordering to keep the values correct. The table
is also synced to the one in compiler-rt.
2025-11-27 15:29:16 +01:00
Eli Friedman
590bb3e8e6
[AArch64] Improve host feature detection. (#160410)
SVE depends on a combination of host support and operating system
support. Sometimes those don't line up with detected host CPU name; make
sure SVE is disabled when it isn't available. Implement this for both
Windows and Linux. (We don't have a codepath for other operating
systems. If someone wants to implement this, it should be possible to
adapt fmv code from compiler-rt.)

While I'm here, also add support for detecting other Windows CPU
features.

For Windows, declare constants ourselves so the code builds on older
SDKs; we also do this in compiler-rt.
2025-11-24 14:08:50 -08:00
Shoreshen
52a58a4193
[AMDGPU] Adding instruction specific features (#167809) 2025-11-19 11:06:00 +08:00
Kazu Hirata
99bf41cd11
[TargetParser] Use range-based for loops (#168296)
While I am at it, this patch converts one of the loops to use
llvm::is_contained.

Identified with modernize-loop-convert.
2025-11-17 07:59:45 -08:00
Mikołaj Piróg
b6fd3c62bb
[X86] Enable APX and AVX10.2 on NVL (#168061)
Per Intel Architecture Instruction Set Extensions Programming Reference
rev. 60 (https://cdrdv2.intel.com/v1/dl/getContent/671368), table 1-2,
NVL supports APX and AVX10.2
2025-11-17 15:46:58 +01:00
Kazu Hirata
2394eb1180
[TargetParser] Avoid repeated hash lookups (NFC) (#168216) 2025-11-16 08:08:39 -08:00
Mikołaj Piróg
8f6c7aa2b1
[X86] Remove vector length (256 vs 512) distinction of AVX10 (#167736)
As in title. AVX10.x doesn't distinguish between available vector
lengths.

-mattr=avx10.x-512 and the definition of macros with _512 are kept for compatibility.

Bit positions of avx10.1/2 features in compiler-rt and X86TargetParser
are synced to match those in GCC.
2025-11-15 15:51:06 +01:00
serge-sans-paille
04b05998b1
Remove unused <array> and <list> inclusion (#167116) 2025-11-09 15:15:10 +00:00
Walter Lee
0902a6b8de
Add missing #include (fix for #166997) 2025-11-08 16:37:31 -05:00
Amit Kumar Pandey
36d477850f
[ASan] Skip explicit check of 'xnack' feature for gfx1250 && gfx1251. (#166754)
Xnack processing is essential and performed at the frontend to enable
ASan instrumentation for AMDGPU device code. Certain AMDGPU subtargets
like gfx1250 && gfx1251 don't have to enable 'xnack+' explicitly in
'--offload-arch=' for device ASan instrumentation.
2025-11-06 21:42:42 +05:30
Jakub Kuderski
4c21d0cb14
[ADT] Prepare to deprecate variadic StringSwitch::Cases. NFC. (#166020)
Update all uses of variadic `.Cases` to use the initializer list
overload instead. I plan to mark variadic `.Cases` as deprecated in a
followup PR.

For more context, see https://github.com/llvm/llvm-project/pull/163117.
2025-11-02 00:12:33 +00:00
Mikołaj Piróg
5322fb6268
[X86] Remove AMX-TRANSPOSE (#165556)
Per Intel Architecture Instruction Set Extensions Programming Reference
rev. 59 (https://cdrdv2.intel.com/v1/dl/getContent/671368), Revision
History entry for revision -59, AMX-TRANSPOSE was removed
2025-10-31 12:50:21 +01:00
Jens Reidel
331b3eb489
[PowerPC] Take ABI into account for data layout (#149725)
Prior to this change, the data layout calculation would not account for
explicitly set `-mabi=elfv2` on `powerpc64-unknown-linux-gnu`, a target
that defaults to `elfv1`.

This is loosely inspired by the equivalent ARM / RISC-V code.

`make check-llvm` passes fine for me, though AFAICT all the tests
specify the data layout manually so there isn't really a test for this
and I am not really sure what the best way to go about adding one would
be.

Signed-off-by: Jens Reidel <adrian@travitia.xyz>
2025-10-31 10:30:53 +01:00
Kazu Hirata
817aff6960
[llvm] Use nullptr instead of 0 or NULL (NFC) (#165396)
Identified with modernize-use-nullptr.
2025-10-28 16:15:01 -07:00