llvm-project

Author	SHA1	Message	Date
Alex Voicu	b08b56381c	[NFC][AMDGPU] Clean-up feature parsing for AMDGCNSPIRV. (#123519 ) When we did the initial AMDGCNSPIRV commits we left the initialisation of the feature map in a relatively disorderly state. This change corrects that oversight: - We make sure that AMDGCNSPIRV actually advertises the union of all AMDGCN features, as some were not included; - We keep feature initialisation in sorted order to make it easy to pick an insertion point when features are added in the future.	2025-01-20 02:30:29 +00:00
Alexandros Lamprineas	831527a5ef	[FMV][GlobalOpt] Statically resolve calls to versioned functions. (#87939 ) To deduce whether the optimization is legal we need to compare the target features between caller and callee versions. The criteria for bypassing the resolver are the following: * If the callee's feature set is a subset of the caller's feature set, then the callee is a candidate for direct call. * Among such candidates the one of highest priority is the best match and it shall be picked, unless there is a version of the callee with higher priority than the best match which cannot be picked from a higher priority caller (directly or through the resolver). * For every higher priority callee version than the best match, there is a higher priority caller version whose feature set availability is implied by the callee's feature set. Example: Callers and Callees are ordered in decreasing priority. The arrows indicate successful call redirections. Callers: mops+sve2, mops, sve, default. Callees: mops, sve2, sve, default. Caller mops+sve2 --> callee mops: all the callee versions are subsets of the caller but mops has the highest priority. Caller mops --> callee mops: between mops and default callees, mops wins. Caller sve, callee sve (no arrow): between sve and default callees, sve wins but sve2 does not have a high priority caller. Caller default --> callee default: sve (callee) implies sve (caller), sve2 (callee) implies sve (caller), mops (callee) implies mops (caller).	2025-01-17 10:49:43 +00:00
Mads Marquart	a082cc145f	Add Apple M4 host detection (#117530 ) Add Apple M4 host detection, which fixes https://github.com/rust-lang/rust/issues/133414. Also add support for older ARM families (this is likely never going to get used, since only macOS is officially supported as host OS, but nice to have for completeness' sake). Error handling (checking `CPUFAMILY_UNKNOWN`) is also included here. Finally, add links to extra documentation to make it easier for others to update this in the future. NOTE: These values are taken from `mach/machine.h` in the Xcode 16.2 SDK, and have been confirmed on an M4 Max in https://github.com/rust-lang/rust/issues/133414#issuecomment-2499123337.	2025-01-16 08:15:12 -08:00
Shilei Tian	ebef44067b	[LLVM][Triple] Add an argument to specify canonical form to `Triple::normalize` (#122935 ) Currently, the output of `Triple::normalize` can vary depending on how the `Triple` object is constructed, producing a 3-field, 4-field, or even 5-field string. However, there is no way to control the format of the output, as all forms are considered canonical according to the LangRef. This lack of control can be inconvenient when a specific format is required. To address this, this PR introduces an argument to specify the desired format (3, 4, or 5 identifiers), with the default set to none to maintain the current behavior. If the requested format requires more components than are available in the actual `Data`, `"unknown"` is appended as needed.	2025-01-14 20:12:29 -05:00
Martin Storsjö	a829ebadd4	[Triple] Ignore the vendor field for MinGW, wrt LTO/IR compatibility (#122801 ) For MinGW environments, the regular C/C++ toolchains usually use "w64" for the vendor field in triples, while Rust toolchains usually use "pc" in the vendor field. The differences in the vendor field have no bearing on whether the IR is compatible on this platform. (This probably goes for most other OSes as well, but limiting the scope of the change to the specific case.) Add a unit test for the isCompatibleWith, including some existing test cases found in existing tests.	2025-01-15 00:18:52 +02:00
CarolineConcatto	9256485043	[Clang][LLVM][AArch64]Add new feature SSVE-BitPerm (#121947 ) The 2024-12 ISA update release adds a new feature: FEAT_SSVE_BitPerm, which allows the sve-bitperm instructions to run in streaming mode. It also removes the requirement of FEAT_SVE2 for FEAT_SVE_BitPerm. The sve2-bitperm feature is now an alias for sve-bitperm and sve2. A new feature flag sve-bitperm is added to reflect the change that the instructions under FEAT_SVE_BitPerm are supported either in non-streaming mode with FEAT_SVE2 and FEAT_SVE_BitPerm, or in streaming mode with FEAT_SME and FEAT_SSVE_BitPerm.	2025-01-13 16:34:33 +00:00
quic_hchandel	171d3edd05	[RISCV] Add Qualcomm uC Xqciint (Interrupts) extension (#122256 ) This extension adds eleven instructions to accelerate interrupt servicing. The current spec can be found at: https://github.com/quic/riscv-unified-db/releases/latest This patch adds assembler only support. --------- Co-authored-by: Harsh Chandel <hchandel@qti.qualcomm.com>	2025-01-13 16:36:05 +05:30
Steven Perron	d723587686	[HLSL] Explicitly set the SPIR-V version with spv-target-env (#121961 ) In DXC, setting the Vulkan version automatically sets the target SPIR-V version to the maximum SPIR-V version that the Vulkan version must support. So for Vulkan 1.2, we set the SPIR-V version to SPIR-V 1.5 because every implementation of Vulkan 1.2 must support SPIR-V 1.5, but not SPIR-V 1.6.	2025-01-09 09:39:56 -05:00
Alexandros Lamprineas	8e65940161	[FMV][AArch64] Simplify version selection according to ACLE. (#121921 ) Currently, the more features a version has, the higher its priority is. We are changing ACLE https://github.com/ARM-software/acle/pull/370 as follows: "Among any two versions, the higher priority version is determined by identifying the highest priority feature that is specified in exactly one of the versions, and selecting that version."	2025-01-08 18:59:07 +00:00
quic_hchandel	737d6ca44d	[RISCV] Add Qualcomm uC Xqcicm (Conditional Move) extension (#121752 ) The Qualcomm uC Xqcicm extension adds 13 conditional move instructions. The current spec can be found at: https://github.com/quic/riscv-unified-db/releases/latest This patch adds assembler only support. --------- Co-authored-by: Harsh Chandel <hchandel@qti.qualcomm.com>	2025-01-07 08:25:00 +05:30
Craig Topper	fd38a95586	[TargetParser] Use StringRef::split that takes a char separator instead of StringRef separator. NFC	2025-01-04 12:31:46 -08:00
Sudharsan Veeravalli	532a2691bc	[RISCV] Add Qualcomm uC Xqcicli (Conditional Load Immediate) extension (#121292 ) This extension adds 12 instructions that conditionally load an immediate value. The current spec can be found at: https://github.com/quic/riscv-unified-db/releases/latest This patch adds assembler only support.	2025-01-03 06:33:27 +05:30
quic_hchandel	1557eeda73	[RISCV] Add Qualcomm uC Xqciac (Load-Store Address calculation) extension (#121037 ) This extension adds 3 instructions that perform load-store address calculation. The current spec can be found at: https://github.com/quic/riscv-unified-db/releases/latest This patch adds assembler only support. --------- Co-authored-by: Harsh Chandel <hchandel@qti.qualcomm.com> Co-authored-by: Sudharsan Veeravalli <quic_svs@quicinc.com>	2024-12-29 11:14:12 +05:30
Nick Sarnie	1c16807d0d	[LLVM] Add Intel vendor in Triple (#120250 ) We plan to make use of this in SPIR-V-based OpenMP offloading, for which there is already an initial patch in review. Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>	2024-12-17 12:30:21 -06:00
Phoebe Wang	90968794e2	[X86] Add missing feature USERMSR to DiamondRapids (#120061 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-12-16 20:29:26 +08:00
Sudharsan Veeravalli	668d9688ac	[RISCV] Add Qualcomm uC Xqcilsm (Load Store Multiple) extension (#119823 ) This extension adds 6 instructions that can do multi-word load/store. The current spec can be found at: https://github.com/quic/riscv-unified-db/releases/latest This patch adds assembler only support.	2024-12-14 00:06:58 +05:30
pcc	38eaea73ca	TargetParser: AArch64: Add part numbers for Apple CPUs. Part numbers taken from: https://github.com/AsahiLinux/m1n1/blob/main/src/chickens.c Reviewers: ahmedbougacha, jroelofs Reviewed By: jroelofs Pull Request: https://github.com/llvm/llvm-project/pull/119777	2024-12-12 16:52:58 -08:00
quic_hchandel	0614c601b4	[RISCV] Add Qualcomm uC Xqcics(Conditional Select) extension (#119504 ) The Qualcomm uC Xqcics extension adds 8 conditional select instructions. The current spec can be found at: https://github.com/quic/riscv-unified-db/releases/latest This patch adds assembler only support. --------- Co-authored-by: Harsh Chandel <hchandel@qti.qualcomm.com>	2024-12-12 11:12:09 +05:30
Alexandros Lamprineas	6f013dbced	[AArch64][FMV] Add missing feature dependencies and detect at runtime. (#119231 ) i8mm -> simd fp16fml -> simd frintts -> fp bf16 -> simd sme -> fp16 Approved in ACLE as https://github.com/ARM-software/acle/pull/368	2024-12-11 22:11:32 +00:00
Kinoshita Kotaro	a1197a2ca8	[AArch64] Add initial support for FUJITSU-MONAKA (#118432 ) This patch adds initial support for FUJITSU-MONAKA CPU (-mcpu=fujitsu-monaka). The scheduling model will be corrected in the future.	2024-12-09 09:56:02 +09:00
Phoebe Wang	a63931292b	[X86] Fix typo of gracemont (#118486 )	2024-12-03 20:56:52 +08:00
Phoebe Wang	3348b4688f	[X86][compiler-rt] Split CPU names even if they have the same subtype (#118237 ) Fixes: #118205	2024-12-02 18:51:19 +08:00
Sudharsan Veeravalli	6881c6d2a6	[RISCV] Add Qualcomm uC Xqcia (Arithmetic) extension (#118113 ) This extension adds 11 instructions that perform integer arithmetic. The current spec can be found at: https://github.com/quic/riscv-unified-db/releases/latest This patch adds assembler only support.	2024-12-01 17:06:22 +05:30
Sudharsan Veeravalli	8fcbba82d6	[RISCV] Add Qualcomm uC Xqcisls (Scaled Load Store) extension (#117987 ) This extension adds 8 load/store instructions with a scaled index addressing mode. The current spec can be found at: https://github.com/quic/riscv-unified-db/releases/latest This patch adds assembler only support.	2024-11-29 10:26:00 +05:30
Alexandros Lamprineas	88c2af80fa	[NFC][clang][FMV][TargetInfo] Refactor API for FMV feature priority. (#116257 ) Currently we have code with target hooks in CodeGenModule shared between X86 and AArch64 for sorting MultiVersionResolverOptions. Those are used when generating IFunc resolvers for FMV. The RISCV target has different criteria for sorting, therefore it repeats sorting after calling CodeGenFunction::EmitMultiVersionResolver. I am moving the FMV priority logic into TargetInfo, so that it can be implemented by the TargetParser, which then makes it possible to query it from llvm. Here is an example of why this is handy: https://github.com/llvm/llvm-project/pull/87939	2024-11-28 09:22:05 +00:00
Sudharsan Veeravalli	c4645ffeda	[RISCV] Add Qualcomm uC Xqcicsr (CSR) extension (#117169 ) The Qualcomm uC Xqcicsr extension adds 2 instructions that can read and write CSRs. The current spec can be found at: https://github.com/quic/riscv-unified-db/releases/latest This patch adds assembler only support.	2024-11-28 12:46:15 +05:30
tangaac	427be07675	[LoongArch] Support amcas[_db].{b/h/w/d} instructions. (#114189 ) Two options for clang: -mlamcas & -mno-lamcas. Enable or disable amcas[_db].{b/h} instructions. The default is -mno-lamcas. Only works on LoongArch64.	2024-11-27 17:36:13 +08:00
Matt Arsenault	5615657209	AMDGPU: Builtin & CodeGen support for v_cvt_sr_{bf16|f16}_f32 instructions (#117824 ) Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-11-26 23:37:05 -05:00
Matt Arsenault	62dc8f3069	AMDGPU: Add builtins & codegen support for bitop3_b{16|32} of gfx950. (#117823 ) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 23:33:07 -05:00
Matt Arsenault	0f4fcca546	AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_pk32_f32_[fp|bf]6 for gfx950 (#117745 ) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:26:07 -05:00
Matt Arsenault	2b9e947d43	AMDGPU: Builtins & Codegen support for v_cvt_scale_fp4<->f32 for gfx950 (#117743 ) OPSEL ASM Syntax for v_cvt_scalef32_pk_f32_fp4 : opsel:[x,y,z] where, x & y i.e. OPSEL[1 : 0] selects which src_byte to read. OPSEL ASM Syntax for v_cvt_scalef32_pk_fp4_f32 : opsel:[a,b,c,d] where, c & d i.e. OPSEL[3 : 2] selects which dst_byte to write. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:20:09 -05:00
Matt Arsenault	815069c701	AMDGPU: Builtins & Codegen support for: v_cvt_scalef32_[f16|f32]_[bf8|fp8] (#117739 ) OPSEL[1:0] collectively decide which byte to read from src input. Builtin takes additional imm argument which represents index (with valid values:[0:3]) of src byte read. Out of bounds checks will be added in the next patch. OPSEL ASM Syntax: opsel:[x,y,z] where, opsel[x] = Inst{11} = src0_modifier{2} opsel[y] = Inst{12} = src1_modifier{2} opsel[z] = Inst{14} = src0_modifier{3} Note: Inst{13} i.e. OPSEL[2] is ignored in asm syntax and opsel[z] is meaningless for v_cvt_scalef32_f32_{fp|bf}8 Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 14:54:10 -05:00
tangaac	f4379db496	[LoongArch] Support LA V1.1 feature that div.w[u] and mod.w[u] instructions with inputs not sign-extended. (#116764 ) Two options for clang: -mdiv32: Use div.w[u] and mod.w[u] instructions with inputs not sign-extended. -mno-div32: Do not use div.w[u] and mod.w[u] instructions with inputs not sign-extended. The default is -mno-div32.	2024-11-26 21:57:29 +08:00
Matt Arsenault	7fc71f7909	AMDGPU: Support buffer_atomic_pk_add_bf16 for gfx950 (#117599 ) Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:54:50 -08:00
Matt Arsenault	716364ebd6	AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (#117598 ) The encoding of the v_dot2c_f32_bf16 opcode is the same as v_mac_f32 in gfx90a, both from the gfx9 series. This required a new decoderNameSpace GFX950_DOT. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:51:01 -08:00
Matt Arsenault	aa7eb5723c	AMDGPU: Add support for v_dot2_f32_bf16 instruction for gfx950 (#117597 ) v_dot2_f32_bf16 was added in gfx11 along with v_dot2_f16_f16 and v_dot2_bf16_bf16. All three instructions were part of Dot9 instructions in the compiler. This patch will split existing dot9 (v_dot2_f16_f16, v_dot2_bf16_bf16, v_dot2_f32_bf16) into new dot9 (v_dot2_f16_f16 and v_dot2_bf16_bf16), and dot12 (v_dot2_f32_bf16). All necessary changes to gfx11 and gfx12 are updated to reflect this change. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:47:48 -08:00
Matt Arsenault	5d650a62a3	AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for gfx950 (#117596 ) This patch adds assembly and builtin support for v_ashr_pk_i8/u8_i32 instructions. Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>	2024-11-25 19:44:47 -08:00
Matt Arsenault	22503a9df1	AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (#117592 ) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-25 19:27:01 -08:00
Weining Lu	e70f9e2096	[LoongArch] Remove the added in #116762	2024-11-25 09:33:55 +08:00
Matt Arsenault	d1cca3133a	AMDGPU: Add v_permlane16_swap_b32 and v_permlane32_swap_b32 for gfx950 (#117260 ) This was a bit annoying because these introduce a new special case encoding usage. op_sel is repurposed as a subset of dpp controls, and is eligible for VOP3->VOP1 shrinking. For some reason fi also uses an enum value, so we need to convert the raw boolean to 1 instead of -1. The 2 registers are swapped, so this has 2 defs. Ideally the builtin would return a pair, but that's difficult so return a vector instead. This would make a hypothetical builtin that supports v2f16 directly uglier.	2024-11-22 20:12:50 -08:00
Pengcheng Wang	875b10f7d0	[RISCV] Support __builtin_cpu_is We have defined `__riscv_cpu_model` variable in #101449. It contains `mvendorid`, `marchid` and `mimpid` fields which are read via system call `sys_riscv_hwprobe`. We can support `__builtin_cpu_is` via comparing values in compiler's CPU definitions and `__riscv_cpu_model`. This depends on #116202. Reviewers: lenary, BeMg, kito-cheng, preames, lukel97 Reviewed By: lenary Pull Request: https://github.com/llvm/llvm-project/pull/116231	2024-11-22 22:58:54 +08:00
Pengcheng Wang	4da960b898	[RISCV] Add mvendorid/marchid/mimpid to CPU definitions (#116202 ) We can get this information via `sys_riscv_hwprobe`. This can be used to implement `__builtin_cpu_is`.	2024-11-22 22:58:54 +08:00
Mikhail Goncharov	d1dae1e861	Revert "[RISCV] Add mvendorid/marchid/mimpid to CPU definitions (#116202 )" chain This reverts commit b36fcf4f493ad9d30455e178076d91be99f3a7d8. This reverts commit c11b6b1b8af7454b35eef342162dc2cddf54b4de. This reverts commit 775148f2367600f90d28684549865ee9ea2f11be. multiple bot build breakages, e.g. https://lab.llvm.org/buildbot/#/builders/3/builds/8076	2024-11-22 14:09:13 +01:00
Wang Pengcheng	b36fcf4f49	[RISCV] Rename variable CPUModel to Model The variable name can't be the same as the struct name or we will have "error: declaration of ‘llvm::RISCV::CPUModel llvm::RISCV::CPUInfo::CPUModel’ changes meaning of ‘CPUModel’ [-fpermissive]".	2024-11-22 20:12:28 +08:00
Pengcheng Wang	c11b6b1b8a	[RISCV] Support __builtin_cpu_is We have defined `__riscv_cpu_model` variable in #101449. It contains `mvendorid`, `marchid` and `mimpid` fields which are read via system call `sys_riscv_hwprobe`. We can support `__builtin_cpu_is` via comparing values in compiler's CPU definitions and `__riscv_cpu_model`. This depends on #116202. Reviewers: lenary, BeMg, kito-cheng, preames, lukel97 Reviewed By: lenary Pull Request: https://github.com/llvm/llvm-project/pull/116231	2024-11-22 20:04:57 +08:00
Pengcheng Wang	775148f236	[RISCV] Add mvendorid/marchid/mimpid to CPU definitions (#116202 ) We can get this information via `sys_riscv_hwprobe`. This can be used to implement `__builtin_cpu_is`.	2024-11-22 19:54:45 +08:00
tangaac	1d4602070f	[LoongArch] Support LA V1.1 feature ld-seq-sa that doesn't generate dbar 0x700. (#116762 ) Two options for clang: -mld-seq-sa: Do not generate load-load barrier instructions (dbar 0x700). -mno-ld-seq-sa: Generate load-load barrier instructions (dbar 0x700). The default is -mno-ld-seq-sa.	2024-11-22 17:34:15 +08:00
Joseph Huber	7672216ed7	[LLVM] Add environment triple for 'llvm' (#117218 ) Summary: The LLVM C library is an in-development environment for running executables on various systems. Similarly to how we have `-gnu` to indicate that we are using a GNU toolchain, we should support `-llvm` to indicate the LLVM C library. This patch only adds the basic support for the triple and does not do any necessary clang changes to handle compiling with it. Fixes https://github.com/llvm/llvm-project/issues/117251	2024-11-21 17:30:18 -06:00
Kazu Hirata	4d6d56315d	[TargetParser] Remove unused includes (NFC) (#116929 ) Identified with misc-include-cleaner.	2024-11-20 06:52:45 -08:00
Matt Arsenault	ca1b35a6c8	AMDGPU: Add v_prng_b32 instruction for gfx950 (#116310 ) Rand num instruction for stochastic rounding.	2024-11-18 10:54:54 -08:00
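
The two FMV commits above (8e65940161 and 831527a5ef) describe selection rules that are easier to follow with a small model. The sketch below is not LLVM code: the function names, the `PRIORITY` table and its numeric values are hypothetical, and only the relative order mops > sve2 > sve > default is taken from the example in 831527a5ef. It models the quoted ACLE priority rule and only the first criterion (the subset test) for bypassing the resolver; implied features are not expanded automatically.

```python
# Illustrative priorities only: larger number = higher priority.
# The relative order follows the example in 831527a5ef; the numeric
# values themselves are made up for this sketch.
PRIORITY = {"default": 0, "sve": 1, "sve2": 2, "mops": 3}


def higher_priority(a, b, priority=PRIORITY):
    """Pick the higher-priority version of two feature sets.

    Models the rule quoted in 8e65940161: find the highest-priority
    feature that is specified in exactly one of the two versions and
    select that version. Returns None if both name the same features.
    """
    only_in_one = set(a) ^ set(b)  # features present in exactly one version
    if not only_in_one:
        return None
    top = max(only_in_one, key=lambda feature: priority[feature])
    return a if top in a else b


def direct_call_candidates(caller, callees):
    """First criterion from 831527a5ef: keep callees whose feature set
    is a subset of the caller's feature set (order is preserved)."""
    return [callee for callee in callees if set(callee) <= set(caller)]
```

Under this model `higher_priority({"sve", "sve2"}, {"mops"})` selects the `mops` version even though the other version names more features, which is the behaviour change 8e65940161 describes. For the subset test, the caller's implied features have to be written out by hand, for example `{"mops", "sve2", "sve"}` for the `mops+sve2` caller, with the default callee modelled as an empty set.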

479 Commits