llvm-project

Author	SHA1	Message	Date
Alexandros Lamprineas	8e65940161	[FMV][AArch64] Simplify version selection according to ACLE. (#121921 ) Currently, the more features a version has, the higher its priority is. We are changing ACLE https://github.com/ARM-software/acle/pull/370 as follows: "Among any two versions, the higher priority version is determined by identifying the highest priority feature that is specified in exactly one of the versions, and selecting that version."	2025-01-08 18:59:07 +00:00
Chandler Carruth	a774adb017	Bulk port 64-bit x86 builtins to TableGen (#121043 ) This PR follows https://github.com/llvm/llvm-project/pull/120831 for x86-64. Similar to that PR, this does a very mechanical port of X86 builtins to TableGen. There is a lot of improvement available here to use TableGen more effectively and collapse repeated structures. But those can now be follow-up PRs that restructure within the `.td` file. The current structure produces a file that exactly matches the original X-macros except for the differences outlined in https://github.com/llvm/llvm-project/pull/120831: - Horizontal whitespace - `long long` types now use `long long` outside of OpenCL, but switch to `long` in OpenCL where relevant. Otherwise, only the order of builtins changes, and no tests regress.	2025-01-04 17:52:19 -08:00
Chandler Carruth	2529a8df53	Mechanically port bulk of x86 builtins to TableGen (#120831 ) The goal is to make incremental (if small) progress towards fully TableGen'ed builtins, and to unblock #120534 by gaining access to more powerful TableGen-based representations. The bulk `.td` file addition was generated with the help of a very rough Python script. That script made no attempt to be robust or reusable; it specifically handled only the cases in the X86 `.def` file. Four entries from the `.def` file were not handled automatically as they used `BUILTIN` rather than `TARGET_BUILTIN`. These were ported by hand to an empty-feature `TargetBuiltin` entry, which seems like a better match. For all the automatically ported entries, the results were compared by sorting and diffing the `.def` file and the generated `.inc` file. The only differences were: - Different horizontal whitespace - Additional entries that had already been ported to the `.td` file. - More systematically using `Oi` instead of `LLi` for the type `long long int` in the fully general `__builtin_ia32_...` builtins for OpenCL support. It seems the `.def` file was only partially moved to this, and the systematic migration has updated a few missed builtins.	2025-01-04 02:23:54 -08:00
Chandler Carruth	ca79ff07d8	Revert "Switch builtin strings to use string tables" (#119638 ) Reverts llvm/llvm-project#118734 There are currently some specific versions of MSVC that are miscompiling this code (we think). We don't know why as all the other build bots and at least some folks' local Windows builds work fine. This is a candidate revert to help the relevant folks catch their builders up and have time to debug the issue. However, the expectation is to roll forward at some point with a workaround if at all possible.	2024-12-13 23:58:48 -08:00
Chandler Carruth	be2df95e92	Switch builtin strings to use string tables (#118734 ) The Clang binary (and any binary linking Clang as a library), when built using PIE, ends up with a pretty shocking number of dynamic relocations to apply to the executable image: roughly 400k. Each of these takes up binary space in the executable, and perhaps most interestingly takes start-up time to apply the relocations. The largest pattern I identified was the strings used to describe target builtins. The addresses of these string literals were stored into huge arrays, each one requiring a dynamic relocation. The way to avoid this is to design the target builtins to use a single large table of strings and offsets within the table for the individual strings. This switches the builtin management to such a scheme. This saves over 100k dynamic relocations by my measurement, an over 25% reduction. Just looking at byte size improvements, using the `bloaty` tool to compare a newly built `clang` binary to an old one:
```
    FILE SIZE        VM SIZE
 --------------  --------------
  +1.4%  +653Ki  +1.4%  +653Ki    .rodata
  +0.0%    +960  +0.0%    +960    .text
  +0.0%    +197  +0.0%    +197    .dynstr
  +0.0%    +184  +0.0%    +184    .eh_frame
  +0.0%     +96  +0.0%     +96    .dynsym
  +0.0%     +40  +0.0%     +40    .eh_frame_hdr
  +114%     +32  [ = ]       0    [Unmapped]
  +0.0%     +20  +0.0%     +20    .gnu.hash
  +0.0%      +8  +0.0%      +8    .gnu.version
  +0.9%      +7  +0.9%      +7    [LOAD #2 [R]]
  [ = ]       0 -75.4% -3.00Ki    .relro_padding
 -16.1%  -802Ki -16.1%  -802Ki    .data.rel.ro
 -27.3% -2.52Mi -27.3% -2.52Mi    .rela.dyn
  -1.6% -2.66Mi  -1.6% -2.66Mi    TOTAL
```
We get a 16% reduction in the `.data.rel.ro` section, and nearly 30% reduction in `.rela.dyn` where those relocations are stored.
This is also visible in my benchmarking of binary start-up overhead at least:
```
Benchmark 1: ./old_clang --version
  Time (mean ± σ):      17.6 ms ±   1.5 ms    [User: 4.1 ms, System: 13.3 ms]
  Range (min … max):    14.2 ms …  22.8 ms    162 runs

Benchmark 2: ./new_clang --version
  Time (mean ± σ):      15.5 ms ±   1.4 ms    [User: 3.6 ms, System: 11.8 ms]
  Range (min … max):    12.4 ms …  20.3 ms    216 runs

Summary
  './new_clang --version' ran
    1.13 ± 0.14 times faster than './old_clang --version'
```
We get about 2ms faster `--version` runs. While there is a lot of noise in binary execution time, this delta is pretty consistent, and represents over 10% improvement. This is particularly interesting to me because for very short source files, repeatedly starting the `clang` binary is actually the dominant cost. For example, `configure` scripts running against the `clang` compiler are slow in large part because of binary start up time, not the time to process the actual inputs to the compiler.
----
This PR implements the string tables using `constexpr` code and the existing macro system. I understand that the builtins are moving towards a TableGen model, and if complete that would provide more options for modeling this. Unfortunately, that migration isn't complete, and even the parts that are migrated still rely on the ability to break out of the TableGen model and directly expand an X-macro style `BUILTIN(...)` textually. I looked at trying to complete the move to TableGen, but it would both require the difficult migration of the remaining targets, and solving some tricky problems with how to move away from any macro-based expansion. I was also able to find a reasonably clean and effective way of doing this with the existing macros and some `constexpr` code that I think is clean enough to be a pretty good intermediate state, and maybe give a good target for the eventual TableGen solution.
I was also able to factor the macros into a set of consistent patterns that avoids a significant regression in overall boilerplate.	2024-12-08 19:00:14 -08:00
Matthias Braun	ea6cdb9a07	allow prefer 256 bit attribute target (#117092 ) This allows `__attribute__((target("prefer-256-bit")))` / `__attribute__((target("no-prefer-256-bit")))` to create variants of a function with 256/512 bit vector sizes within the same application.	2024-12-03 15:01:28 -08:00
Jie Fu	b869f1bd4f	[clang] Remove unused lambda capture (NFC) /llvm-project/clang/lib/Basic/Targets/X86.cpp:1368:23: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture] auto getPriority = [this](StringRef Feature) -> unsigned { ^~~~ 1 error generated.	2024-11-28 18:25:30 +08:00
Alexandros Lamprineas	88c2af80fa	[NFC][clang][FMV][TargetInfo] Refactor API for FMV feature priority. (#116257 ) Currently we have code with target hooks in CodeGenModule shared between X86 and AArch64 for sorting MultiVersionResolverOptions. Those are used when generating IFunc resolvers for FMV. The RISCV target has different criteria for sorting, therefore it repeats sorting after calling CodeGenFunction::EmitMultiVersionResolver. I am moving the FMV priority logic into TargetInfo, so that it can be implemented by the TargetParser, which then makes it possible to query it from LLVM. Here is an example of why this is handy: https://github.com/llvm/llvm-project/pull/87939	2024-11-28 09:22:05 +00:00
Freddy Ye	97836bed63	Reland "[X86] Support -march=diamondrapids (#113881 )" (#116564 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-11-18 10:40:32 +08:00
Freddy Ye	90e92239bd	Revert "[X86] Support -march=diamondrapids (#113881 )" (#116563 ) This reverts commit 826b845c9e97448395431be3e4e5da585bd98c5e.	2024-11-18 08:45:28 +08:00
Freddy Ye	826b845c9e	[X86] Support -march=diamondrapids (#113881 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-11-18 08:31:17 +08:00
Malay Sanghi	f77101ea79	[X86][AMX] Support AMX-MOVRS (#115151 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-11-12 15:05:43 +08:00
Feng Zou	eddb79d56d	[X86][AMX] Support AMX-TF32 (#115625 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-11-11 15:24:18 +08:00
Phoebe Wang	8f4401374c	Reland "[X86][AMX] Support AMX-AVX512" (#115581 ) Resolve compile fail without SSE2.	2024-11-09 13:26:10 +08:00
Alan Zhao	ff22515430	Revert "[X86][AMX] Support AMX-AVX512" (#115570 ) Reverts llvm/llvm-project#114070 Reason: Causes `immintrin.h` to fail to compile if `-msse` and `-mno-sse2` are passed to clang: https://github.com/llvm/llvm-project/pull/114070#issuecomment-2465926700	2024-11-08 16:15:02 -08:00
Phoebe Wang	58a17e1bbc	[X86][AMX] Support AMX-AVX512 (#114070 )	2024-11-08 16:25:16 +08:00
Phoebe Wang	c72a751dab	[X86][AMX] Support AMX-TRANSPOSE (#113532 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-11-01 16:45:03 +08:00
Feng Zou	8127162427	[X86][AMX] Support AMX-FP8 (#113850 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-10-31 10:14:25 +08:00
Nikolas Klauser	508263824f	[Clang] Start moving X86Builtins.def to X86Builtins.td (#106005 ) This starts moving `X86Builtins.def` to be a tablegen file. It's quite large, so I think it'd be good to move things in multiple steps to avoid a bunch of merge conflicts due to the amount of time this takes to complete.	2024-10-30 14:23:35 +01:00
Craig Topper	7bd8a165f9	[X86] Don't allow '+f' as an inline asm constraint. (#113871 ) f cannot be used as an output constraint. We already errored for '=f' but not '+f'. Fixes #113692.	2024-10-28 13:20:46 -07:00
Freddy Ye	c4248fa3ed	[X86] Support MOVRS and AVX10.2 instructions. (#113274 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-10-25 09:00:19 +08:00
Ganesh	02e4186d0b	[X86] AMD Zen 5 Initial enablement (#107964 ) This patch adds the basic skeleton enablement for AMD's next-gen Zen 5 CPUs.	2024-09-13 17:45:33 +01:00
Freddy Ye	83ad644afa	[X86][AVX10.2] Support AVX10.2-BF16 new instructions. (#101603 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965	2024-09-04 08:13:24 +08:00
Phoebe Wang	3f25f23a2b	[X86][AVX10] Fix unexpected error and warning when using intrinsic (#104781 ) E.g.: https://godbolt.org/z/G8zK5svjK Based on Evgenii's work.	2024-08-20 19:56:19 +08:00
Phoebe Wang	259ca9ee9c	Reland "[X86][AVX10.2] Support AVX10.2 option and VMPSADBW/VADDP[D,H,S] new instructions (#101452 )" (#101616 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965	2024-08-03 09:26:07 +08:00
Phoebe Wang	2e0588d5e1	Revert "[X86][AVX10.2] Support AVX10.2 option and VMPSADBW/VADDP[D,H,S] new instructions" (#101612 ) Reverts llvm/llvm-project#101452 Several buildbots failed. Revert first.	2024-08-02 13:04:10 +08:00
Phoebe Wang	10bad2c8d7	[X86][AVX10.2] Support AVX10.2 option and VMPSADBW/VADDP[D,H,S] new instructions (#101452 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965	2024-08-02 12:10:50 +08:00
Shengchen Kan	88e9bd822f	[X86][Driver] Enable feature zu for -mapxf This is a follow-up to #78901 after validation. Drop the comments for stability since zu is the last feature for cpuid APX_F.	2024-07-19 12:34:41 +08:00
James Y Knight	f0eb5587ce	Remove support for 3DNow!, both intrinsics and builtins. (#96246 ) This set of instructions was only supported by AMD chips starting in the K6-2 (introduced 1998), and before the "Bulldozer" family (2011). They were never much used, as they were effectively superseded by the more-widely-implemented SSE (first implemented on the AMD side in Athlon XP in 2001). This is being done as a predecessor towards general removal of MMX register usage. Since there is almost no usage of the 3DNow! intrinsics, and no modern hardware even implements them, simple removal seems like the best option. (Clang half originally uploaded in https://reviews.llvm.org/D94213) Works towards issue #41665 and issue #98272.	2024-07-16 12:08:48 -04:00
Feng Zou	e603451f3c	[X86] Support branch hint (#97721 ) For more details about this feature, please refer to latest Intel 64 and IA-32 Architectures Optimization Reference Manual Volume 1: https://www.intel.com/content/www/us/en/content-details/821612/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html	2024-07-08 13:12:50 +08:00
Shengchen Kan	8ad32ce738	[X86] Add sub-feature zu (zero upper) for APX This is a follow-up patch for #74199	2024-06-25 09:25:32 +08:00
Shengchen Kan	45a7af7c99	[X86][Driver] Enable feature cf for -mapxf This is a follow-up to #78901 after validation.	2024-06-24 15:11:07 +08:00
Freddy Ye	6f2794afeb	Fix build warning for '[X86] Support EGPR for inline assembly. (#92338 )' (#93777 )	2024-05-30 15:25:08 +08:00
Freddy Ye	73f4c2547d	[X86] Support EGPR for inline assembly. (#92338 ) "jR": explicitly enables EGPR; "r", "l", "q": enables/disables EGPR with/without -mapx-inline-asm-use-gpr32; "jr": explicitly enables GPR with -mapx-inline-asm-use-gpr32. -mapx-inline-asm-use-gpr32 will also define a new macro: `__APX_INLINE_ASM_USE_GPR32__` GCC patches: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631183.html https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631186.html [[PATCH v2] x86: Define _APX_INLINE_ASM_USE_GPR32_ (gnu.org)](https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649003.html) Reference: https://gcc.godbolt.org/z/nPPvbY6r4	2024-05-30 14:47:47 +08:00
Shengchen Kan	0f7b4b04a5	[X86][Driver] Enable feature ccmp,nf for -mapxf This is a follow-up to #78901 after validation.	2024-05-29 17:34:26 +08:00
Freddy Ye	4def1ce101	Reland "[X86] Remove knl/knm specific ISAs supports (#92883 )" (#93136 ) This reverts commit aa4069ea96e5eb62bc8c7895b9d920f129611b3a.	2024-05-24 13:46:34 +08:00
Freddy Ye	aa4069ea96	Revert "[X86] Remove knl/knm specific ISAs supports (#92883 )" (#93123 ) This reverts commit 282d2ab58f56c89510f810a43d4569824a90c538.	2024-05-23 10:25:23 +08:00
Freddy Ye	282d2ab58f	[X86] Remove knl/knm specific ISAs supports (#92883 ) Cont. patch after https://github.com/llvm/llvm-project/pull/75580	2024-05-23 09:46:44 +08:00
Shengchen Kan	575177f610	[X86] Add sub-feature nf (no flags update) for APX This is a follow-up patch for #74199	2024-05-11 15:55:59 +08:00
Freddy Ye	e44600f3ab	[X86][CFE] Support EGPR in GCCRegNames. (#91323 )	2024-05-08 15:07:18 +08:00
Freddy Ye	db2fb3d96b	[X86] Define __APX_F__ when APX is enabled. (#88343 ) Related GCC patch: https://gcc.gnu.org/pipermail/gcc-patches/2024-April/648789.html	2024-04-11 16:57:32 +08:00
Kazu Hirata	3c93c037c9	[Basic] Use StringRef::ends_with (NFC)	2024-02-03 21:43:05 -08:00
Kazu Hirata	b67ce7e349	[clang] Use StringRef::starts_with (NFC)	2024-01-31 23:54:09 -08:00
Fangrui Song	d4cb5d9f2b	[X86] Add "Ws" constraint and "p" modifier for symbolic address/label reference (#77886 ) Printing the raw symbol is useful in inline asm (e.g. getting the C++ mangled name, referencing a symbol in a custom way while ensuring it is not optimized out even if internal). Similar constraints are available in other targets (e.g. "S" for aarch64/riscv, "Cs" for m68k). ``` namespace ns { extern int var, a[4]; } void foo() { asm(".pushsection .xxx,\"aw\"; .dc.a %p0; .popsection" :: "Ws"(&ns::var)); asm(".reloc ., BFD_RELOC_NONE, %p0" :: "Ws"(&ns::a[3])); } ``` Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105576	2024-01-16 23:57:42 -08:00
Freddy Ye	19870ed9c3	[X86] Emit Warnings for frontend options to enable knl/knm specific ISAs. (#75580 ) Since the Knights Landing and Knights Mill microarchitectures are EOL, we would like to remove intrinsic support for their specific ISAs in LLVM 19. In LLVM 18, we will first emit a warning for the usage.	2024-01-09 19:43:14 +08:00
Kazu Hirata	a70dcc2cda	[clang] Use StringRef::ltrim (NFC)	2023-12-27 09:10:39 -08:00
Shengchen Kan	6d6baef5c9	[X86] Support CFE flags for APX features (#74199 ) Positive options: -mapx-features=<comma-separated-features>. Negative options: -mno-apx-features=<comma-separated-features>. -m[no-]apx-features is designed to control individual APX features. We also support the flag -m[no-]apxf, which can be used like an alias of -m[no-]apx-features=<all APX features covered by CPUID APX_F>. Behaviour when positive and negative options are used together: for boolean flags, the last one wins: `-mapxf -mno-apxf` -> -mno-apxf; `-mno-apxf -mapxf` -> -mapxf. For flags that take a set as arguments, the mask is set in the order of the flags: `-mapx-features=egpr,ndd -mno-apx-features=egpr` -> -egpr,+ndd; `-mapx-features=egpr -mno-apx-features=egpr,ndd` -> -egpr,-ndd; `-mno-apx-features=egpr -mapx-features=egpr,ndd` -> +egpr,+ndd; `-mno-apx-features=egpr,ndd -mapx-features=egpr` -> -ndd,+egpr. The design is aligned with gcc https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628905.html	2023-12-04 19:22:56 +08:00
Phoebe Wang	e96eddec5e	Reland "[X86][AVX10] Fix a bug when using -march with no-evex512 attribute (#72126 )" Fixes #72106	2023-11-14 15:39:30 +08:00
Phoebe Wang	17dd0c70c8	Revert "[X86][AVX10] Fix a bug when using -march with no-evex512 attribute (#72126 )" This reverts commit 451c594bcbe528a44312cb698d78145c3ef18fa1. Reverted due to buildbot failures.	2023-11-14 15:34:38 +08:00
Phoebe Wang	451c594bcb	[X86][AVX10] Fix a bug when using -march with no-evex512 attribute (#72126 ) #71318 failed to clear EVEX512 feature for intended intrinsics. Fixes #72106	2023-11-14 15:15:34 +08:00
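
The ordering rule described in commit 6d6baef5c9 (set-valued `-mapx-features=` / `-mno-apx-features=` flags are applied in command-line order, so the last mention of a feature wins) can be sketched roughly as follows. This is an illustration only, not Clang's driver code: `applyApxFlags` and its map-based result are hypothetical names invented for this sketch.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of Clang): walks the arguments left to right
// and records, per feature, whether its most recent mention enabled it.
std::map<std::string, bool>
applyApxFlags(const std::vector<std::string> &Args) {
  const std::string Pos = "-mapx-features=";
  const std::string Neg = "-mno-apx-features=";
  std::map<std::string, bool> State;

  for (const std::string &Arg : Args) {
    bool Enable = false;
    std::string List;
    if (Arg.compare(0, Pos.size(), Pos) == 0) {
      Enable = true;
      List = Arg.substr(Pos.size());
    } else if (Arg.compare(0, Neg.size(), Neg) == 0) {
      Enable = false;
      List = Arg.substr(Neg.size());
    } else {
      continue; // Not an APX feature flag; ignored in this sketch.
    }

    // Split the comma-separated feature list; later flags overwrite earlier
    // state for the same feature.
    std::istringstream Stream(List);
    std::string Feature;
    while (std::getline(Stream, Feature, ','))
      if (!Feature.empty())
        State[Feature] = Enable;
  }
  return State;
}
```

Under these assumptions, passing `-mapx-features=egpr,ndd` followed by `-mno-apx-features=egpr` leaves `egpr` disabled and `ndd` enabled, matching the first example in the commit message.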


264 Commits
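
The string-table idea from commit be2df95e92 (one large character table plus integer offsets, instead of an array of string pointers that each need a dynamic relocation) can be illustrated with a small C++17 sketch. It is not the code from that PR: the builtin names, `BuiltinNames`, `BuiltinOffsets`, and `builtinName` are all made up for this example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical builtin names packed into one literal, separated by NULs.
// A single char array holds no pointers, so a PIE build needs no
// per-string dynamic relocation for it.
constexpr char BuiltinNames[] =
    "__builtin_alpha\0"
    "__builtin_beta\0"
    "__builtin_gamma";

constexpr std::size_t NumBuiltins = 3;

// Compute each name's starting offset at compile time by scanning for the
// NUL separators. Offsets are plain integers, not addresses.
constexpr std::array<std::uint32_t, NumBuiltins> computeOffsets() {
  std::array<std::uint32_t, NumBuiltins> Offsets{};
  std::size_t Next = 0;
  Offsets[Next++] = 0;
  for (std::size_t I = 0;
       I + 1 < sizeof(BuiltinNames) && Next < NumBuiltins; ++I)
    if (BuiltinNames[I] == '\0')
      Offsets[Next++] = static_cast<std::uint32_t>(I + 1);
  return Offsets;
}

constexpr auto BuiltinOffsets = computeOffsets();

// Look up a name by ID: table base plus offset.
constexpr std::string_view builtinName(std::size_t ID) {
  return std::string_view(&BuiltinNames[BuiltinOffsets[ID]]);
}

// Compile-time check of the lookup.
static_assert(builtinName(1) == "__builtin_beta");
```

The design point is that `BuiltinOffsets` stores 32-bit integers rather than `const char *` values, so the loader has nothing to patch at start-up; the cost is one extra addition per lookup.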