llvm-project

Author	SHA1	Message	Date
Chandler Carruth	ca79ff07d8	Revert "Switch builtin strings to use string tables" (#119638) Reverts llvm/llvm-project#118734. There are currently some specific versions of MSVC that are miscompiling this code (we think). We don't know why as all the other build bots and at least some folks' local Windows builds work fine. This is a candidate revert to help the relevant folks catch their builders up and have time to debug the issue. However, the expectation is to roll forward at some point with a workaround if at all possible.	2024-12-13 23:58:48 -08:00
Chandler Carruth	be2df95e92	Switch builtin strings to use string tables (#118734) The Clang binary (and any binary linking Clang as a library), when built using PIE, ends up with a pretty shocking number of dynamic relocations to apply to the executable image: roughly 400k. Each of these takes up binary space in the executable, and perhaps most interestingly takes start-up time to apply the relocations. The largest pattern I identified was the strings used to describe target builtins. The addresses of these string literals were stored into huge arrays, each one requiring a dynamic relocation. The way to avoid this is to design the target builtins to use a single large table of strings and offsets within the table for the individual strings. This switches the builtin management to such a scheme. This saves over 100k dynamic relocations by my measurement, an over 25% reduction. Just looking at byte size improvements, using the `bloaty` tool to compare a newly built `clang` binary to an old one:
```
    FILE SIZE        VM SIZE
 --------------  --------------
  +1.4%  +653Ki  +1.4%  +653Ki    .rodata
  +0.0%    +960  +0.0%    +960    .text
  +0.0%    +197  +0.0%    +197    .dynstr
  +0.0%    +184  +0.0%    +184    .eh_frame
  +0.0%     +96  +0.0%     +96    .dynsym
  +0.0%     +40  +0.0%     +40    .eh_frame_hdr
  +114%     +32  [ = ]       0    [Unmapped]
  +0.0%     +20  +0.0%     +20    .gnu.hash
  +0.0%      +8  +0.0%      +8    .gnu.version
  +0.9%      +7  +0.9%      +7    [LOAD #2 [R]]
  [ = ]       0 -75.4% -3.00Ki    .relro_padding
 -16.1%  -802Ki -16.1%  -802Ki    .data.rel.ro
 -27.3% -2.52Mi -27.3% -2.52Mi    .rela.dyn
  -1.6% -2.66Mi  -1.6% -2.66Mi    TOTAL
```
We get a 16% reduction in the `.data.rel.ro` section, and nearly 30% reduction in `.rela.dyn` where those relocations are stored.
This is also visible in my benchmarking of binary start-up overhead at least:
```
Benchmark 1: ./old_clang --version
  Time (mean ± σ):      17.6 ms ±   1.5 ms    [User: 4.1 ms, System: 13.3 ms]
  Range (min … max):    14.2 ms …  22.8 ms    162 runs

Benchmark 2: ./new_clang --version
  Time (mean ± σ):      15.5 ms ±   1.4 ms    [User: 3.6 ms, System: 11.8 ms]
  Range (min … max):    12.4 ms …  20.3 ms    216 runs

Summary
  './new_clang --version' ran
    1.13 ± 0.14 times faster than './old_clang --version'
```
We get about 2ms faster `--version` runs. While there is a lot of noise in binary execution time, this delta is pretty consistent, and represents over 10% improvement. This is particularly interesting to me because for very short source files, repeatedly starting the `clang` binary is actually the dominant cost. For example, `configure` scripts running against the `clang` compiler are slow in large part because of binary start up time, not the time to process the actual inputs to the compiler.
----
This PR implements the string tables using `constexpr` code and the existing macro system. I understand that the builtins are moving towards a TableGen model, and if complete that would provide more options for modeling this. Unfortunately, that migration isn't complete, and even the parts that are migrated still rely on the ability to break out of the TableGen model and directly expand an X-macro style `BUILTIN(...)` textually. I looked at trying to complete the move to TableGen, but it would both require the difficult migration of the remaining targets, and solving some tricky problems with how to move away from any macro-based expansion. I was also able to find a reasonably clean and effective way of doing this with the existing macros and some `constexpr` code that I think is clean enough to be a pretty good intermediate state, and maybe give a good target for the eventual TableGen solution.
I was also able to factor the macros into a set of consistent patterns that avoids a significant regression in overall boilerplate.	2024-12-08 19:00:14 -08:00
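The string-table idea described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the names `BuiltinNames`, `BuiltinOffsets`, and `builtinName`, and the three placeholder builtin names, are all assumptions made for the example. The point it shows is that one `char` array plus integer offsets holds no pointers, so it needs no dynamic relocations, whereas an array of `const char *` needs one per entry in a PIE build.

```cpp
#include <cstddef>
#include <string_view>

// One contiguous blob of NUL-separated names (placeholder names, chosen for
// illustration only). Because this is a char array rather than an array of
// `const char *`, it contains no addresses to relocate.
constexpr char BuiltinNames[] =
    "__builtin_foo\0"
    "__builtin_bar\0"
    "__builtin_baz";

// Offset of each name within the blob. Each placeholder name is 13
// characters plus a NUL, so consecutive names start 14 bytes apart.
constexpr std::size_t BuiltinOffsets[] = {0, 14, 28};

// Recover the I-th name by scanning from its offset to the terminating NUL.
constexpr std::string_view builtinName(std::size_t I) {
  const char *Begin = BuiltinNames + BuiltinOffsets[I];
  std::size_t Len = 0;
  while (Begin[Len] != '\0')
    ++Len;
  return std::string_view(Begin, Len);
}
```

In this sketch the offsets are written by hand; the commit message says the real change derives its tables with `constexpr` code and the existing `BUILTIN(...)` macros instead.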
tangaac	427be07675	[LoongArch] Support amcas[_db].{b/h/w/d} instructions. (#114189) Two options for clang: -mlamcas & -mno-lamcas. Enable or disable amcas[_db].{b/h} instructions. The default is -mno-lamcas. Only works on LoongArch64.	2024-11-27 17:36:13 +08:00
tangaac	f4379db496	[LoongArch] Support LA V1.1 feature that div.w[u] and mod.w[u] instructions with inputs not sign-extended. (#116764) Two options for clang: -mdiv32: Use div.w[u] and mod.w[u] instructions with input not sign-extended. -mno-div32: Do not use div.w[u] and mod.w[u] instructions with input not sign-extended. The default is -mno-div32.	2024-11-26 21:57:29 +08:00
tangaac	1d4602070f	[LoongArch] Support LA V1.1 feature ld-seq-sa that doesn't generate dbar 0x700. (#116762) Two options for clang: -mld-seq-sa: Do not generate load-load barrier instructions (dbar 0x700). -mno-ld-seq-sa: Generate load-load barrier instructions (dbar 0x700). The default is -mno-ld-seq-sa.	2024-11-22 17:34:15 +08:00
tangaac	5b9c76b6e7	[LoongArch] Support LoongArch-specific amswap[_db].{b/h} and amadd[_db].{b/h} instructions (#113255) Two options for clang: -mlam-bh & -mno-lam-bh. Enable or disable amswap[_db].{b/h} and amadd[_db].{b/h} instructions. The default is -mno-lam-bh. Only works on LoongArch64.	2024-10-23 16:03:15 +08:00
Ami-zhang	5a1b9896ad	[LoongArch] Support -march=la64v1.0 and -march=la64v1.1 (#100057) The newly added strings `la64v1.0` and `la64v1.1` in `-march` are as described in the LoongArch toolchain conventions (see [1]). The target-cpu/feature attributes are forwarded to the compiler when a particular `-march` value is specified. The default cpu `loongarch64` is returned when archname is `la64v1.0` or `la64v1.1`. In addition, this commit adds `la64v1.0`/`la64v1.1` to "__loongarch_arch" and adds a definition for the macro "__loongarch_frecipe". [1]: https://github.com/loongson/la-toolchain-conventions	2024-07-23 14:03:28 +08:00
Zhaoxin Yang	626c7ce33f	[LoongArch][clang] Add support for option `-msimd=` and macro `__loongarch_simd_width`. (#97984)	2024-07-09 14:13:19 +08:00
Nathan Sidwell	7df79ababe	[clang] TargetInfo hook for unaligned bitfields (#65742) Promote ARM & AArch64's HasUnaligned to TargetInfo and set for all targets.	2024-03-29 09:35:31 -04:00
licongtian	8d4e35600f	[Clang][LoongArch] Support compiler options -mlsx/-mlasx for clang This patch adds compiler options -mlsx/-mlasx which enables the instruction sets of LSX and LASX, and sets related predefined macros according to the options.	2023-10-31 15:52:05 +08:00
Weining Lu	f62c9252fc	[LoongArch] Support -march=native and -mtune= As described in [1][2], `-mtune=` is used to select the type of target microarchitecture, defaults to the value of `-march`. The set of possible values should be a superset of `-march` values. Currently possible values of `-march=` and `-mtune=` are `native`, `loongarch64` and `la464`. D136146 has supported `-march={loongarch64,la464}` and this patch adds support for `-march=native` and `-mtune=`. A new ProcessorModel called `loongarch64` is defined in LoongArch.td to support `-mtune=loongarch64`. `llvm::sys::getHostCPUName()` returns `generic` on unknown or future LoongArch CPUs, e.g. the not yet added `la664`, leading to `llvm::LoongArch::isValidArchName()` failing to parse the arch name. In this case, use `loongarch64` as the default arch name for 64-bit CPUs. Two preprocessor macros are defined based on user-provided `-march=` and `-mtune=` options and the defaults. - __loongarch_arch - __loongarch_tune Note that, to work with `-fno-integrated-cc1` we leverage cc1 options `-target-cpu` and `-tune-cpu` to pass driver options `-march=` and `-mtune=` respectively because cc1 needs this information to define macros in `LoongArchTargetInfo::getTargetDefines`. [1]: https://github.com/loongson/LoongArch-Documentation/blob/2023.04.20/docs/LoongArch-toolchain-conventions-EN.adoc [2]: https://github.com/loongson/la-softdev-convention/blob/v0.1/la-softdev-convention.adoc Reviewed By: xen0n, wangleiat, steven_wu, MaskRay Differential Revision: https://reviews.llvm.org/D155824	2023-08-09 10:29:50 +08:00
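The selection rules stated in the commit above (`native` resolves to the host CPU, an unrecognised host name such as `generic` falls back to `loongarch64`, and `-mtune=` defaults to the `-march` value) might be sketched like this. The functions `isKnownArch`, `resolveArch`, and `resolveTune` are hypothetical names invented for this illustration; they are not the LLVM or clang driver APIs, and the set of valid names is just the one the message lists.

```cpp
#include <string_view>

// Hypothetical stand-in for an arch-name validity check; accepts only the
// two concrete names the commit message mentions.
constexpr bool isKnownArch(std::string_view Arch) {
  return Arch == "loongarch64" || Arch == "la464";
}

// Resolve the -march= value. `HostCPU` plays the role of whatever host CPU
// detection reports (an assumption of this sketch).
constexpr std::string_view resolveArch(std::string_view Requested,
                                       std::string_view HostCPU) {
  if (Requested != "native")
    return Requested;
  // Unknown or future CPUs are reported as e.g. "generic"; fall back to the
  // default 64-bit arch name in that case.
  return isKnownArch(HostCPU) ? HostCPU : std::string_view("loongarch64");
}

// -mtune= defaults to the resolved -march value when it is not given.
constexpr std::string_view resolveTune(std::string_view Tune,
                                       std::string_view Arch) {
  return Tune.empty() ? Arch : Tune;
}
```

The resolved pair would then feed the two macros named in the message, `__loongarch_arch` and `__loongarch_tune`.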
Steven Wu	42c9354a92	Revert "Reland "[LoongArch] Support -march=native and -mtune="" This reverts commit c56514f21b2cf08eaa7ac3a57ba4ce403a9c8956. This commit adds global state that is shared between clang driver and clang cc1, which is not correct when clang is used with `-fno-integrated-cc1` option (no integrated cc1). The -march and -mtune options need to be properly passed through the cc1 command line and stored in TargetInfo.	2023-07-31 16:57:06 -07:00
Weining Lu	c56514f21b	Reland "[LoongArch] Support -march=native and -mtune=" As described in [1][2], `-mtune=` is used to select the type of target microarchitecture, defaults to the value of `-march`. The set of possible values should be a superset of `-march` values. Currently possible values of `-march=` and `-mtune=` are `native`, `loongarch64` and `la464`. D136146 has supported `-march={loongarch64,la464}` and this patch adds support for `-march=native` and `-mtune=`. A new ProcessorModel called `loongarch64` is defined in LoongArch.td to support `-mtune=loongarch64`. `llvm::sys::getHostCPUName()` returns `generic` on unknown or future LoongArch CPUs, e.g. the not yet added `la664`, leading to `llvm::LoongArch::isValidArchName()` failing to parse the arch name. In this case, use `loongarch64` as the default arch name for 64-bit CPUs. And these two preprocessor macros are defined: - __loongarch_arch - __loongarch_tune [1]: https://github.com/loongson/LoongArch-Documentation/blob/2023.04.20/docs/LoongArch-toolchain-conventions-EN.adoc [2]: https://github.com/loongson/la-softdev-convention/blob/v0.1/la-softdev-convention.adoc Reviewed By: xen0n, wangleiat Differential Revision: https://reviews.llvm.org/D155824	2023-07-26 10:26:38 +08:00
Weining Lu	212d6aa0da	Revert "[LoongArch] Support -march=native and -mtune=" This reverts commit 92c06114b2ea9900a3364fb395988dfb065758f7.	2023-07-25 23:32:15 +08:00
Weining Lu	92c06114b2	[LoongArch] Support -march=native and -mtune= As described in [1][2], `-mtune=` is used to select the type of target microarchitecture, defaults to the value of `-march`. The set of possible values should be a superset of `-march` values. Currently possible values of `-march=` and `-mtune=` are `native`, `loongarch64` and `la464`. D136146 has supported `-march={loongarch64,la464}` and this patch adds support for `-march=native` and `-mtune=`. A new ProcessorModel called `loongarch64` is defined in LoongArch.td to support `-mtune=loongarch64`. `llvm::sys::getHostCPUName()` returns `generic` on unknown or future LoongArch CPUs, e.g. the not yet added `la664`, leading to `llvm::LoongArch::isValidArchName()` failing to parse the arch name. In this case, use `loongarch64` as the default arch name for 64-bit CPUs. And these two preprocessor macros are defined: - __loongarch_arch - __loongarch_tune [1]: https://github.com/loongson/LoongArch-Documentation/blob/2023.04.20/docs/LoongArch-toolchain-conventions-EN.adoc [2]: https://github.com/loongson/la-softdev-convention/blob/v0.1/la-softdev-convention.adoc Differential Revision: https://reviews.llvm.org/D155824	2023-07-25 21:01:51 +08:00
chenli	d25c79dc70	[LoongArch] Support InlineAsm for LSX and LASX The author of the following files is licongtian <licongtian@loongson.cn>: - clang/lib/Basic/Targets/LoongArch.cpp - llvm/lib/Target/LoongArch/LoongArchAsmPrinter.cpp - llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp The files mentioned above implement InlineAsm for LSX and LASX as follows: - Enable clang to parse LSX/LASX register names, such as $vr0. - Support the case where the operand type is 128-bit or 256-bit when the constraint is 'f'. - Support specifying an LSX/LASX register by using a constraint, such as "={$xr0}". - Support the operand modifiers 'u' and 'w'. - Support and legalize the data types and register classes involved in LSX/LASX in the lowering process. Reviewed By: xen0n, SixWeining Differential Revision: https://reviews.llvm.org/D154931	2023-07-25 09:02:29 +08:00
Weining Lu	0bbf3ddf5f	[Clang][LoongArch] Add GPR alias handling without `$` prefix Currently there is a mismatch between LoongArch gcc and clang in how register names are handled in inline asm, i.e. gcc allows both `$`-prefixed and non-prefixed names for GPRs while clang only allows the `$`-prefixed ones. This patch fixes this mismatch by adding non-prefixed GPR names in clang. Take `$r4` for example. With this patch, clang accepts `$r4`, `r4`, `$a0` and `a0` like gcc does. Reviewed By: xen0n Differential Revision: https://reviews.llvm.org/D136436	2023-05-13 12:08:59 +08:00
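To illustrate the behaviour described in the commit above, here is a small sketch of name normalisation that accepts both `$`-prefixed and bare GPR names. It is not how clang implements this (clang uses register name and alias tables in its target info); `parseIndex` and `gprNumber` are hypothetical helpers, and only the numeric `rN` names and the `a0`..`a7` argument-register aliases (which map to `r4`..`r11`) are handled.

```cpp
#include <string_view>

// Parse a one- or two-digit decimal index below Limit, or return -1.
// Simplification: leading zeros such as "04" are accepted.
constexpr int parseIndex(std::string_view Digits, int Limit) {
  if (Digits.empty() || Digits.size() > 2)
    return -1;
  int Value = 0;
  for (char C : Digits) {
    if (C < '0' || C > '9')
      return -1;
    Value = Value * 10 + (C - '0');
  }
  return Value < Limit ? Value : -1;
}

// Map a GPR spelling to its register number, or -1 if it is not recognised
// by this sketch. The optional `$` prefix is simply dropped first.
constexpr int gprNumber(std::string_view Name) {
  if (!Name.empty() && Name.front() == '$')
    Name.remove_prefix(1);
  if (Name.size() < 2)
    return -1;
  const char Kind = Name.front();
  const int Index = parseIndex(Name.substr(1), Kind == 'r' ? 32 : 8);
  if (Index < 0)
    return -1;
  if (Kind == 'r')
    return Index;     // r0..r31
  if (Kind == 'a')
    return 4 + Index; // a0..a7 alias r4..r11
  return -1;
}
```

With this, `$r4`, `r4`, `$a0` and `a0` all resolve to register 4, matching the example in the message.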
Weining Lu	161716a713	[LoongArch] Support fcc* (condition flag) registers in inlineasm clobbers Differential Revision: https://reviews.llvm.org/D150089	2023-05-09 14:55:50 +08:00
Archibald Elliott	8e3d7cf5de	[NFC][TargetParser] Remove llvm/Support/TargetParser.h	2023-02-07 11:08:21 +00:00
serge-sans-paille	5a7f47cc02	[clang] Optimize clang::Builtin::Info density Reorganize clang::Builtin::Info to have them naturally align on 4-byte boundaries. Instead of storing builtin headers as a straight char pointer, enumerate them and store the enum. This allows using a small enum instead of a pointer to reference them. On a 64-bit machine, this brings sizeof(clang::Builtin::Info) from 56 down to 48 bytes. On a release build on my 64-bit Linux machine, it shrinks the size of libclang-cpp.so by 193kB. The impact on performance is negligible in terms of instruction count, but the wall time seems better, see https://llvm-compile-time-tracker.com/compare.php?from=b3d8639f3536a4876b511aca9fb7948ff9266cee&to=a89b56423f98b550260a58c41e64aff9e56b76be&stat=task-clock Differential Revision: https://reviews.llvm.org/D142024	2023-01-23 14:27:44 +01:00
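The pointer-to-enum trade described in the commit above can be sketched as below. The structs are deliberately tiny stand-ins, not the real `clang::Builtin::Info` layout, and `HeaderID`, `InfoWithPointer`, `InfoWithEnum`, and `headerName` are hypothetical names. The sketch only shows the general effect: a one-byte enum can share padding with neighbouring fields where a pointer cannot, and the header string is recovered through a lookup when it is actually needed.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical enumeration of the headers a builtin may be declared in.
enum class HeaderID : std::uint8_t { NoHeader, StdioH, MathH };

// Before: the header is referenced through a full pointer.
struct InfoWithPointer {
  const char *Name;
  const char *Header;
  std::uint32_t Langs;
};

// After: the header is a small enum stored next to the 32-bit field.
struct InfoWithEnum {
  const char *Name;
  HeaderID Header;
  std::uint32_t Langs;
};

// Map the enum back to the header spelling on demand.
constexpr std::string_view headerName(HeaderID ID) {
  switch (ID) {
  case HeaderID::NoHeader:
    return "";
  case HeaderID::StdioH:
    return "stdio.h";
  case HeaderID::MathH:
    return "math.h";
  }
  return "";
}
```

On a typical 64-bit target the second struct is smaller; on targets where pointers are 4 bytes the two may end up the same size.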
serge-sans-paille	a3c248db87	Move from llvm::makeArrayRef to ArrayRef deduction guides - clang/ part This is a follow-up to https://reviews.llvm.org/D140896, split into several parts as it touches a lot of files. Differential Revision: https://reviews.llvm.org/D141139	2023-01-09 12:15:24 +01:00
Xiaodong Liu	9e06d18c80	[LoongArch] Add intrinsics for CACOP instruction The CACOP instruction is mainly used for cache initialization and cache-consistency maintenance. Depends on D140872 Reviewed By: SixWeining Differential Revision: https://reviews.llvm.org/D140527	2023-01-06 11:41:35 +08:00
Brad Smith	f70d17fc2c	[LoongArch] Define __GCC_HAVE_SYNC_COMPARE_AND_SWAP macros Define __GCC_HAVE_SYNC_COMPARE_AND_SWAP macros Reviewed By: SixWeining, MaskRay Differential Revision: https://reviews.llvm.org/D141070	2023-01-05 20:21:22 -05:00
serge-sans-paille	d9ab3e82f3	[clang] Use a StringRef instead of a raw char pointer to store builtin and call information This avoids recomputing string length that is already known at compile time. It has a slight impact on preprocessing / compile time, see https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u This is a recommit of e953ae5bbc313fd0cc980ce021d487e5b5199ea4 and the subsequent fixes caa713559bd38f337d7d35de35686775e8fb5175 and 06b90e2e9c991e211fecc97948e533320a825470. The above patchset caused some versions of GCC to take eons to compile clang/lib/Basic/Targets/AArch64.cpp, as spotted in aa171833ab0017d9732e82b8682c9848ab25ff9e. The fix is to make BuiltinInfo tables a compilation unit static variable, instead of a private static variable. Differential Revision: https://reviews.llvm.org/D139881	2022-12-27 09:55:19 +01:00
gonglingqin	da34aff90d	[Clang][LoongArch] Implement __builtin_loongarch_crc_w_d_w builtin and add diagnostics This patch adds support to prevent __builtin_loongarch_crc_w_d_w from compiling on loongarch32 in the front end and adds diagnostics accordingly. Reference: https://github.com/gcc-mirror/gcc/blob/master/gcc/config/loongarch/larchintrin.h#L175-L184 Depends on D136906 Differential Revision: https://reviews.llvm.org/D137316	2022-11-11 09:16:57 +08:00
gonglingqin	85f08c4197	[Clang][LoongArch] Implement __builtin_loongarch_dbar builtin Differential Revision: https://reviews.llvm.org/D136906	2022-11-10 17:27:44 +08:00
Weining Lu	60e5cfe2a4	[Clang][LoongArch] Define more LoongArch specific built-in macros Define the macros below according to the LoongArch toolchain conventions [1]. * `__loongarch_grlen` * `__loongarch_frlen` * `__loongarch_lp64` * `__loongarch_hard_float` * `__loongarch_soft_float` * `__loongarch_single_float` * `__loongarch_double_float` Note: 1. `__loongarch__` has been defined in an earlier patch. 2. `__loongarch_arch` is not defined because I don't know how `TargetInfo` can get the arch name specified by `-march`. 3. `__loongarch_tune` will be defined in the future. [1]: https://loongson.github.io/LoongArch-Documentation/LoongArch-toolchain-conventions-EN.html Depends on D136146 Differential Revision: https://reviews.llvm.org/D136413	2022-11-10 17:27:29 +08:00
Weining Lu	cd0174aacb	[Clang][LoongArch] Support inline asm constraint 'J' 'J' is defined in GCC [1] but not documented [2] while Linux [3] has already used it in LoongArch port. [1]: https://github.com/gcc-mirror/gcc/blob/master/gcc/config/loongarch/constraints.md#L61 [2]: https://gcc.gnu.org/onlinedocs/gccint/Machine-Constraints.html [3]: https://github.com/torvalds/linux/blob/master/arch/loongarch/include/asm/cmpxchg.h#L19 Differential Revision: https://reviews.llvm.org/D136835	2022-10-31 09:13:52 +08:00
Weining Lu	42b70793a1	Reland "[Clang][LoongArch] Add inline asm support for constraints k/m/ZB/ZC" Reference: https://gcc.gnu.org/onlinedocs/gccint/Machine-Constraints.html k: A memory operand whose address is formed by a base register and (optionally scaled) index register. m: A memory operand whose address is formed by a base register and offset that is suitable for use in instructions with the same addressing mode as st.w and ld.w. ZB: An address that is held in a general-purpose register. The offset is zero. ZC: A memory operand whose address is formed by a base register and offset that is suitable for use in instructions with the same addressing mode as ll.w and sc.w. Note: The INLINEASM SDNode flags in the tests below are updated because the newly introduced enum `Constraint_k` is added before `Constraint_m`. llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-inline-asm.ll llvm/test/CodeGen/X86/callbr-asm-kill.mir This patch passes `ninja check-all` on an X86 machine with all official targets and the LoongArch target enabled. Differential Revision: https://reviews.llvm.org/D134638	2022-10-11 19:51:48 +08:00
Fangrui Song	04a65d62a0	Revert D134638 "[Clang][LoongArch] Add inline asm support for constraints k/m/ZB/ZC" This reverts commit b7baddc7557e5c35a0f6a604a134d849265a99d4. Broke CodeGen/X86/callbr-asm-kill.mir We shall pay attention when adding new constraints.	2022-09-29 00:54:56 -07:00
Weining Lu	b7baddc755	[Clang][LoongArch] Add inline asm support for constraints k/m/ZB/ZC k: A memory operand whose address is formed by a base register and (optionally scaled) index register. m: A memory operand whose address is formed by a base register and offset that is suitable for use in instructions with the same addressing mode as st.w and ld.w. ZB: An address that is held in a general-purpose register. The offset is zero. ZC: A memory operand whose address is formed by a base register and offset that is suitable for use in instructions with the same addressing mode as ll.w and sc.w. Differential Revision: https://reviews.llvm.org/D134638	2022-09-29 15:02:08 +08:00
Weining Lu	394f30919a	[Clang][LoongArch] Add inline asm support for constraints f/l/I/K This patch adds support for constraints `f`, `l`, `I`, `K` according to [1]. The remaining constraints (`k`, `m`, `ZB`, `ZC`) will be added later as they are a little more complex than the others. f: A floating-point register (if available). l: A signed 16-bit constant. I: A signed 12-bit constant (for arithmetic instructions). K: An unsigned 12-bit constant (for logic instructions). For now, there is no need to support register aliases (e.g. `$a0`) in llvm as clang will correctly decode the usage of register name aliases into their official names. And AFAIK, the not yet upstreamed `rustc` for LoongArch will always use official register names (e.g. `$r4`). [1] https://gcc.gnu.org/onlinedocs/gccint/Machine-Constraints.html Differential Revision: https://reviews.llvm.org/D134157	2022-09-26 08:49:58 +08:00
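The immediate ranges implied by the `l`, `I` and `K` descriptions in the commit above can be written out as a small range check. This is only a sketch of the arithmetic: `fitsSigned`, `fitsUnsigned`, and `fitsImmConstraint` are hypothetical helper names, not functions from clang or LLVM, and the `f` constraint is left out because it names a register class rather than a constant range.

```cpp
#include <cstdint>

// True if V is representable as a signed Bits-bit integer.
constexpr bool fitsSigned(unsigned Bits, std::int64_t V) {
  const std::int64_t Bound = std::int64_t{1} << (Bits - 1);
  return V >= -Bound && V < Bound;
}

// True if V is representable as an unsigned Bits-bit integer.
constexpr bool fitsUnsigned(unsigned Bits, std::int64_t V) {
  return V >= 0 && V < (std::int64_t{1} << Bits);
}

// Check a constant against one of the immediate constraints listed above.
constexpr bool fitsImmConstraint(char Constraint, std::int64_t V) {
  switch (Constraint) {
  case 'l':
    return fitsSigned(16, V);   // signed 16-bit: -32768..32767
  case 'I':
    return fitsSigned(12, V);   // signed 12-bit: -2048..2047
  case 'K':
    return fitsUnsigned(12, V); // unsigned 12-bit: 0..4095
  default:
    return false;               // not an immediate constraint in this sketch
  }
}
```

For example, 2048 is rejected for `I` but accepted for `K`, while -1 is accepted for `I` and rejected for `K`.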
Kazu Hirata	981cbfb592	[clang] Don't include StringSwitch.h (NFC) These files don't seem to use StringSwitch.	2022-09-18 22:21:32 -07:00
Weining Lu	15b65bcd65	[Clang][LoongArch] Add initial LoongArch target and driver support With the initial support added, clang can compile a `helloworld` C program to an executable file for loongarch64. For example:
```
$ cat hello.c
#include <stdio.h>
int main() {
  printf("Hello, world!\n");
  return 0;
}
$ clang --target=loongarch64-unknown-linux-gnu --gcc-toolchain=xxx --sysroot=xxx hello.c
```
The output a.out can run within qemu or on a native machine. For example:
```
$ file ./a.out
./a.out: ELF 64-bit LSB pie executable, LoongArch, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-loongarch-lp64d.so.1, for GNU/Linux 5.19.0, with debug_info, not stripped
$ ./a.out
Hello, world!
```
Currently the gcc toolchain and sysroot can be found here: https://github.com/loongson/build-tools/releases/download/2022.08.11/loongarch64-clfs-5.1-cross-tools-gcc-glibc.tar.xz Reference: https://github.com/loongson/LoongArch-Documentation The last commit hash (main branch) is: 99016636af64d02dee05e39974d4c1e55875c45b Note that loongarch32 is not fully tested because there is no reference gcc toolchain yet. Differential Revision: https://reviews.llvm.org/D130255	2022-08-23 13:47:22 +08:00

34 Commits