133 Commits

Author SHA1 Message Date
Nikita Popov
246a64a12e
[Clang] Rename HasLegalHalfType -> HasFastHalfType (NFC) (#153163)
This option is confusingly named. What it actually controls is whether,
under the default of `-ffloat16-excess-precision=standard`, it is
beneficial for performance to perform calculations on float (without
intermediate rounding) or not. For `-ffloat16-excess-precision=none` the
LLVM `half` type will always be used, and all backends are expected to
legalize it correctly.
2025-08-18 09:23:48 +02:00
Eli Friedman
558277ae4d
[clang][ARM] Fix setting of MaxAtomicInlineWidth. (#151404)
2f497ec3a0056f15727ee6008211aeb2c4a8f455 updated the backend's rules for
when lock-free atomics are available, but we never made a corresponding
change to the frontend. Fix it to be consistent. This only affects
targets older than v7.
2025-08-01 11:03:21 -07:00
Simon Tatham
d5985905ae
[Clang][ARM][Sema] Reject bad sizes of __builtin_arm_ldrex (#150419)
Depending on the particular version of the AArch32 architecture,
load/store exclusive operations might be available for various subset of
8, 16, 32, and 64-bit quantities. Sema knew nothing about this and was
accepting all four sizes, leading to a compiler crash at isel time if
you used a size not available on the target architecture.

Now the Sema checking stage emits a more sensible diagnostic, pointing
at the location in the code.

In order to allow Sema to query the set of supported sizes, I've moved
the enum of LDREX_x sizes out of its Arm-specific header into
`TargetInfo.h`.

Also, in order to allow the diagnostic to specify the correct list of
supported sizes, I've filled it with `%select{}`. (The alternative was
to make separate error messages for each different list of sizes.)
2025-07-29 09:01:37 +01:00
Simon Tatham
34f59d7920
[Clang][ARM] Fix __ARM_FEATURE_LDREX on Armv8-M (#149538)
The Armv8-M architecture doesn't have the LDREXD and STREXD
instructions, for exclusive load/store of a 64-bit quantity split across
two registers. But the `__ARM_FEATURE_LDREX` macro was set to a value
that claims it does, because the case for Armv8 was missing a check for
M profile.

The Armv7 case got it right, so I've just made the two cases the same.
2025-07-22 08:53:45 +01:00
Brad Smith
0d2e11f3e8
Remove Native Client support (#133661)
Remove the Native Client support now that it has finally reached end of life.
2025-07-15 13:22:33 -04:00
Nick Sarnie
3b9ebe9201
[clang] Simplify device kernel attributes (#137882)
We have multiple different attributes in clang representing device
kernels for specific targets/languages. Refactor them into one attribute
with different spellings to make it more easily scalable for new
languages/targets.

---------

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2025-06-05 14:15:38 +00:00
Kazu Hirata
cd9fe8a34c
[Basic] Remove unused includes (NFC) (#142295)
These are identified by misc-include-cleaner.  I've filtered out those
that break builds.  Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
2025-05-31 19:00:31 -07:00
Daniil Kovalev
84b0c128a7
[PAC] Do not support some values of branch-protection with ptrauth-returns (#125280)
This patch does two things.

1. Previously, when checking driver arguments, we emitted an error for
unsupported values of `-mbranch-protection` when using pauthtest ABI.
The reason for that was ptrauth-returns being enabled as part of
pauthtest. This patch changes the check against pauthtest to a check
against ptrauth-returns.

2. Similarly, check against values of the following function attribute
which are unsupported with ptrauth-returns:
`__attribute__((target("branch-protection=XXX`. Note that existing
`validateBranchProtection` function is used, and current behavior is to
ignore the unsupported attribute value, so no error is emitted.
2025-02-05 11:39:27 +03:00
Chandler Carruth
64ea3f5a47 [StrTable] Switch AArch64 and ARM to use directly TableGen-ed builtin tables
This leverages the sharded structure of the builtins to make it easy to
directly tablegen most of the AArch64 and ARM builtins while still using
X-macros for a few edge cases. It also extracts common prefixes as part
of that.

This makes the string tables for these targets dramatically smaller.
This is especially important as the SVE builtins represent (by far) the
largest string table and largest builtin table across all the targets in
Clang.
2025-02-04 18:04:58 +00:00
Chandler Carruth
cd269fee05 [StrTable] Switch Clang builtins to use string tables
This both reapplies #118734, the initial attempt at this, and updates it
significantly.

First, it uses the newly added `StringTable` abstraction for string
tables, and simplifies the construction to build the string table and
info arrays separately. This should reduce any `constexpr` compile time
memory or CPU cost of the original PR while significantly improving the
APIs throughout.

It also restructures the builtins to support sharding across several
independent tables. This accomplishes two improvements from the
original PR:

1) It improves the APIs used significantly.

2) When builtins are defined from different sources (like SVE vs MVE in
   AArch64), this allows each of them to build their own string table
   independently rather than having to merge the string tables and info
   structures.

3) It allows each shard to factor out a common prefix, often cutting the
   size of the strings needed for the builtins by a factor two.

The second point is important both to allow different mechanisms of
construction (for example a `.def` file and a tablegen'ed `.inc` file,
or different tablegen'ed `.inc files), it also simply reduces the sizes
of these tables which is valuable given how large they are in some
cases. The third builds on that size reduction.

Initially, we use this new sharding rather than merging tables in
AArch64, LoongArch, RISCV, and X86. Mostly this helps ensure the system
works, as without further changes these still push scaling limits.
Subsequent commits will more deeply leverage the new structure,
including using the prefix capabilities which cannot be easily factored
out here and requires deep changes to the targets.
2025-02-04 18:04:57 +00:00
Oliver Stannard
5e43418e0e
[ARM] Forbid use of TLS with execute-only (#124806)
Thread-local code generation requires constant pools because most of the
relocations needed for it operate on data, so it cannot be used with
-mexecute-only (or -mpure-code, which is aliased in the driver).

Without this we hit an assertion in the backend when trying to generate
a constant pool.
2025-01-29 10:41:58 +00:00
Un1q32
e00d1dd6ea
[ARM] Fix armv6kz LDREX definition (#122965)
Fixes #37901

This behavior is consistent with GCC
2025-01-15 13:31:54 +00:00
Ian Anderson
8a1174f06c
[Darwin][Driver][clang] arm64-apple-none-macho is missing the Apple macros from arm-apple-none-macho (#122427)
arm-apple-none-macho uses DarwinTargetInfo which provides several Apple
specific macros. arm64-apple-none-macho however just uses the generic
AArch64leTargetInfo and doesn't get any of those macros. It's not clear
if everything from DarwinTargetInfo is desirable for
arm64-apple-none-macho, so make an AppleMachOTargetInfo to hold the
generic Apple macros and a few other basic things.
2025-01-10 15:50:54 -08:00
Chandler Carruth
ca79ff07d8
Revert "Switch builtin strings to use string tables" (#119638)
Reverts llvm/llvm-project#118734

There are currently some specific versions of MSVC that are miscompiling
this code (we think). We don't know why as all the other build bots and
at least some folks' local Windows builds work fine.

This is a candidate revert to help the relevant folks catch their
builders up and have time to debug the issue. However, the expectation
is to roll forward at some point with a workaround if at all possible.
2024-12-13 23:58:48 -08:00
Chandler Carruth
be2df95e92
Switch builtin strings to use string tables (#118734)
The Clang binary (and any binary linking Clang as a library), when built
using PIE, ends up with a pretty shocking number of dynamic relocations
to apply to the executable image: roughly 400k.

Each of these takes up binary space in the executable, and perhaps most
interestingly takes start-up time to apply the relocations.

The largest pattern I identified were the strings used to describe
target builtins. The addresses of these string literals were stored into
huge arrays, each one requiring a dynamic relocation. The way to avoid
this is to design the target builtins to use a single large table of
strings and offsets within the table for the individual strings. This
switches the builtin management to such a scheme.

This saves over 100k dynamic relocations by my measurement, an over 25%
reduction. Just looking at byte size improvements, using the `bloaty`
tool to compare a newly built `clang` binary to an old one:

```
    FILE SIZE        VM SIZE
 --------------  --------------
  +1.4%  +653Ki  +1.4%  +653Ki    .rodata
  +0.0%    +960  +0.0%    +960    .text
  +0.0%    +197  +0.0%    +197    .dynstr
  +0.0%    +184  +0.0%    +184    .eh_frame
  +0.0%     +96  +0.0%     +96    .dynsym
  +0.0%     +40  +0.0%     +40    .eh_frame_hdr
  +114%     +32  [ = ]       0    [Unmapped]
  +0.0%     +20  +0.0%     +20    .gnu.hash
  +0.0%      +8  +0.0%      +8    .gnu.version
  +0.9%      +7  +0.9%      +7    [LOAD #2 [R]]
  [ = ]       0 -75.4% -3.00Ki    .relro_padding
 -16.1%  -802Ki -16.1%  -802Ki    .data.rel.ro
 -27.3% -2.52Mi -27.3% -2.52Mi    .rela.dyn
  -1.6% -2.66Mi  -1.6% -2.66Mi    TOTAL
```

We get a 16% reduction in the `.data.rel.ro` section, and nearly 30%
reduction in `.rela.dyn` where those reloctaions are stored.

This is also visible in my benchmarking of binary start-up overhead at
least:

```
Benchmark 1: ./old_clang --version
  Time (mean ± σ):      17.6 ms ±   1.5 ms    [User: 4.1 ms, System: 13.3 ms]
  Range (min … max):    14.2 ms …  22.8 ms    162 runs

Benchmark 2: ./new_clang --version
  Time (mean ± σ):      15.5 ms ±   1.4 ms    [User: 3.6 ms, System: 11.8 ms]
  Range (min … max):    12.4 ms …  20.3 ms    216 runs

Summary
  './new_clang --version' ran
    1.13 ± 0.14 times faster than './old_clang --version'
```

We get about 2ms faster `--version` runs. While there is a lot of noise
in binary execution time, this delta is pretty consistent, and
represents over 10% improvement. This is particularly interesting to me
because for very short source files, repeatedly starting the `clang`
binary is actually the dominant cost. For example, `configure` scripts
running against the `clang` compiler are slow in large part because of
binary start up time, not the time to process the actual inputs to the
compiler.

----

This PR implements the string tables using `constexpr` code and the
existing macro system. I understand that the builtins are moving towards
a TableGen model, and if complete that would provide more options for
modeling this. Unfortunately, that migration isn't complete, and even
the parts that are migrated still rely on the ability to break out of
the TableGen model and directly expand an X-macro style `BUILTIN(...)`
textually. I looked at trying to complete the move to TableGen, but it
would both require the difficult migration of the remaining targets, and
solving some tricky problems with how to move away from any macro-based
expansion.

I was also able to find a reasonably clean and effective way of doing
this with the existing macros and some `constexpr` code that I think is
clean enough to be a pretty good intermediate state, and maybe give a
good target for the eventual TableGen solution. I was also able to
factor the macros into set of consistent patterns that avoids a
significant regression in overall boilerplate.
2024-12-08 19:00:14 -08:00
Aaron Ballman
af7c58b7ea
Remove support for RenderScript (#112916)
See
https://discourse.llvm.org/t/rfc-deprecate-and-eventually-remove-renderscript-support/81284
for the RFC
2024-10-28 12:48:42 -04:00
Michał Górny
387b37af1a
[LLVM] [Clang] Support for Gentoo *t64 triples (64-bit time_t ABIs) (#111302)
Gentoo is planning to introduce a `*t64` suffix for triples that will be
used by 32-bit platforms that use 64-bit `time_t`. Add support for
parsing and accepting these triples, and while at it make clang
automatically enable the necessary glibc feature macros when this suffix
is used.

An open question is whether we can backport this to LLVM 19.x. After
all, adding new triplets to Triple sounds like an ABI change — though I
suppose we can minimize the risk of breaking something if we move new
enum values to the very end.
2024-10-14 11:18:04 +00:00
Jonathan Thackray
d0756caedc
[ARM][AArch64] Introduce the Armv9.6-A architecture version (#110825)
This introduces the Armv9.6-A architecture version, including the
relevant command-line option for -march.

More details about the Armv9.6-A architecture version can be found at:
  * https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-developments-2024
  * https://developer.arm.com/documentation/ddi0602/2024-09/
2024-10-04 10:12:41 +01:00
Kazu Hirata
deffae5da1
[clang] Use StringRef::operator== instead of StringRef::equals (NFC) (#91844)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of
  24 under clang/ in terms of their usage.

- The elimination of StringRef::equals brings StringRef closer to
  std::string_view, which has operator== but not equals.

- S == "foo" is more readable than S.equals("foo"), especially for
  !Long.Expression.equals("str") vs Long.Expression != "str".
2024-05-11 11:38:52 -07:00
Nathan Sidwell
7df79ababe [clang] TargetInfo hook for unaligned bitfields (#65742)
Promote ARM & AArch64's HasUnaligned to TargetInfo and set for all
targets.
2024-03-29 09:35:31 -04:00
Tomas Matheson
d022f32c73 Revert "[ARM] __ARM_ARCH macro definition fix (#81493)"
This reverts commit 89c1bf1230e011f2f0e43554c278205fa1819de5.

This has been unimplemenented for a while, and GCC does not implement
it, therefore we need to consider whether we should just deprecate it
in the ACLE instead.
2024-02-19 12:19:16 +00:00
James Westwood
89c1bf1230
[ARM] __ARM_ARCH macro definition fix (#81493)
This patch changes how the macro __ARM_ARCH is defined to match its
defintion in the ACLE. In ACLE 5.4.1, __ARM_ARCH is defined as equal to
the major architecture version for ISAs up to and including v8. From
v8.1 onwards, its definition is changed to include minor versions, such
that for an architecture vX.Y, __ARM_ARCH = X*100 + Y. Before this
patch, LLVM defined __ARM_ARCH using only the major architecture version
for all architecture versions. This patch adds functionality to define
__ARM_ARCH correctly for architectures greater than or equal to v8.1.
2024-02-13 15:12:35 +00:00
Lucas Duarte Prates
6bbaad1ed4
[ARM] Introduce the v9.5-A architecture version to Arm targets (#78994)
This introduces the Armv9.5-A architecture version to the Arm backend,
following on from the existing implementation for AArch64 targets.

Mode details about the Armv9.5-A architecture version can be found at:
* https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-developments-2023
* https://developer.arm.com/documentation/ddi0602/2023-09/
2024-01-23 14:39:15 +00:00
hstk30-hw
4f68ee36fc
[ARM] arm_acle.h add Coprocessor Instrinsics (#75440)
https://github.com/llvm/llvm-project/issues/75424 

Add Coprocessor Instrinsics
2024-01-09 19:04:29 +08:00
Kazu Hirata
a70dcc2cda [clang] Use StringRef::ltrim (NFC) 2023-12-27 09:10:39 -08:00
Tomas Matheson
7bd17212ef Re-land "[AArch64] Codegen support for FEAT_PAuthLR" (#75947)
This reverts commit 9f0f5587426a4ff24b240018cf8bf3acc3c566ae.

Fix expensive checks failure by properly marking register def for ADR.
2023-12-21 18:32:55 +00:00
Tomas Matheson
9f0f558742 Revert "[AArch64] Codegen support for FEAT_PAuthLR"
This reverts commit 5992ce90b8c0fac06436c3c86621fbf6d5398ee5.

Builtbot failures with expensive checks enabled.
2023-12-21 16:25:55 +00:00
Tomas Matheson
5992ce90b8 [AArch64] Codegen support for FEAT_PAuthLR
- Adds a new +pc option to -mbranch-protection that will enable
  the use of PC as a diversifier in PAC branch protection code.

- When +pauth-lr is enabled (-march=armv9.5a+pauth-lr) in combination
  with -mbranch-protection=pac-ret+pc, the new 9.5-a instructions
  (pacibsppc, retaasppc, etc) are used.

Documentation for the relevant instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/

Co-authored-by: Lucas Prates <lucas.prates@arm.com>
2023-12-21 14:18:33 +00:00
Brad Smith
a63dc79d11
[Clang][OHOS] Keep ARM ABI selection logic in sync between Clang and LLVM (#68656) 2023-10-22 08:48:41 +03:00
Brad Smith
7cfe32d4d8
[Driver] Hook up Haiku ARM support (#67222) 2023-10-09 00:49:53 -04:00
M. Zeeshan Siddiqui
e621757365 [Clang][BFloat16] Upgrade __bf16 to arithmetic type, change mangling, and extend excess precision support
Pursuant to discussions at
https://discourse.llvm.org/t/rfc-c-23-p1467r9-extended-floating-point-types-and-standard-names/70033/22,
this commit enhances the handling of the __bf16 type in Clang.
- Firstly, it upgrades __bf16 from a storage-only type to an arithmetic
  type.
- Secondly, it changes the mangling of __bf16 to DF16b on all
  architectures except ARM. This change has been made in
  accordance with the finalization of the mangling for the
  std::bfloat16_t type, as discussed at
  https://github.com/itanium-cxx-abi/cxx-abi/pull/147.
- Finally, this commit extends the existing excess precision support to
  the __bf16 type. This applies to hardware architectures that do not
  natively support bfloat16 arithmetic.
Appropriate tests have been added to verify the effects of these
changes and ensure no regressions in other areas of the compiler.

Reviewed By: rjmccall, pengfei, zahiraam

Differential Revision: https://reviews.llvm.org/D150913
2023-05-27 13:33:50 +08:00
John Brawn
78bf8a0a22 [clang] Don't define predefined macros multiple times
Fix several instances of macros being defined multiple times
in several targets. Most of these are just simple duplication in a
TargetInfo or OSTargetInfo of things already defined in
InitializePredefinedMacros or InitializeStandardPredefinedMacros,
but there are a few that aren't:
 * AArch64 defines a couple of feature macros for armv8.1a that are
   handled generically by getTargetDefines.
 * CSKY needs to take care when CPUName and ArchName are the same.
 * Many os/target combinations result in __ELF__ being defined twice.
   Instead define __ELF__ just once in InitPreprocessor based on
   the Triple, which already knows what the object format is based
   on os and target.

These changes shouldn't change the final result of which macros are
defined, with the exception of the changes to __ELF__ where if you
explicitly specify the object type in the triple then this affects
if __ELF__ is defined, e.g. --target=i686-windows-elf results in it
being defined where it wasn't before, but this is more accurate as an
ELF file is in fact generated.

Differential Revision: https://reviews.llvm.org/D150966
2023-05-24 17:28:41 +01:00
Stoorx
42d758bfa6 [clang] Return std::string_view from TargetInfo::getClobbers()
Change the return type of `getClobbers` function from `const char*`
to `std::string_view`. Update the function usages in CodeGen module.

The reasoning of these changes is to remove unsafe `const char*`
strings and prevent unnecessary allocations for constructing the
`std::string` in usages of `getClobbers()` function.

Differential Revision: https://reviews.llvm.org/D148799
2023-04-24 12:16:54 +03:00
Tim Northover
d3aed4f401 MachO use generic code to detect atomic support.
The default code can detect what width of atomic instructions are supported
based on the targeted architecture profile, version etc so there's no need to
hard-code 64 on Darwin targets (especially as it's wrong in most M-class
cases).
2023-04-04 13:44:45 +01:00
Pavel Kosov
28997feb0c [LLVM][OHOS] Clang toolchain and targets
Add a clang part of OpenHarmony target

Related LLVM part: D138202

~~~

Huawei RRI, OS Lab

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D145227
2023-03-20 12:53:24 +03:00
Michael Platings
60bbf271b5 [ARM][NFC] Use FPUKind enum instead of unsigned
Also rename some FPUID variables to FPUKind now it's clear that's what
they are.

Differential Revision: https://reviews.llvm.org/D146141
2023-03-16 13:38:10 +00:00
Daniel Thornburgh
d505d20a62 Revert "[LLVM][OHOS] Clang toolchain and targets"
This change had tests that break whenever LLVM_ENABLE_LINKER_BUILD_ID is
set, as is the case in the Fuchsia target.

This reverts commits:
f81317a54586dbcef0c14cf512a0770e8ecaab3d
72474afa27570a0a1307f3260f0187b703aa6d84
2023-03-14 13:46:21 -07:00
Pavel Kosov
72474afa27 [LLVM][OHOS] Clang toolchain and targets
Add a clang part of OpenHarmony target

Related LLVM part: D138202

~~~

Huawei RRI, OS Lab

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D145227
2023-03-14 12:24:44 +03:00
Brad Smith
13a10e7ec9 [Driver][FreeBSD] Simplify ARM handling
Since FreeBSD 8 / 9 support was dropped from the Driver there is room to simplify
things with the ARM handling.

The exception model handling function can be removed.

EABI is now the default.

Reviewed By: dim

Differential Revision: https://reviews.llvm.org/D144823
2023-03-10 16:10:44 -05:00
serge-sans-paille
5a7f47cc02
[clang] Optimize clang::Builtin::Info density
Reorganize clang::Builtin::Info to have them naturally align on 4 bytes
boundaries.

Instead of storing builtin headers as a straight char pointer, enumerate
them and store the enum. It allows to use a small enum instead of a
pointer to reference them.

On a 64 bit machine, this brings sizeof(clang::Builtin::Info) from 56
down to 48 bytes.

On a release build on my Linux 64 bit machine, it shrinks the size of
libclang-cpp.so by 193kB.

The impact on performance is negligible in terms of instruction count,
but the wall time seems better, see
https://llvm-compile-time-tracker.com/compare.php?from=b3d8639f3536a4876b511aca9fb7948ff9266cee&to=a89b56423f98b550260a58c41e64aff9e56b76be&stat=task-clock

Differential Revision: https://reviews.llvm.org/D142024
2023-01-23 14:27:44 +01:00
serge-sans-paille
a3c248db87
Move from llvm::makeArrayRef to ArrayRef deduction guides - clang/ part
This is a follow-up to https://reviews.llvm.org/D140896, split into
several parts as it touches a lot of files.

Differential Revision: https://reviews.llvm.org/D141139
2023-01-09 12:15:24 +01:00
serge-sans-paille
d9ab3e82f3
[clang] Use a StringRef instead of a raw char pointer to store builtin and call information
This avoids recomputing string length that is already known at compile time.

It has a slight impact on preprocessing / compile time, see

https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u

This a recommit of e953ae5bbc313fd0cc980ce021d487e5b5199ea4 and the subsequent fixes caa713559bd38f337d7d35de35686775e8fb5175 and 06b90e2e9c991e211fecc97948e533320a825470.

The above patchset caused some version of GCC to take eons to compile clang/lib/Basic/Targets/AArch64.cpp, as spotted in aa171833ab0017d9732e82b8682c9848ab25ff9e.
The fix is to make BuiltinInfo tables a compilation unit static variable, instead of a private static variable.

Differential Revision: https://reviews.llvm.org/D139881
2022-12-27 09:55:19 +01:00
Ties Stuij
983f63f7f0 [AArch64][ARM] add Armv8.9-a/Armv9.4-a identifier support
For both ARM and AArch64 add support for specifying -march=armv8.9a/armv9.4a to
clang. Add backend plumbing like target parser and predicate support.

For a summary of Amv8.9/Armv9.4 features, see:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-2022

For detailed information, consult the Arm Architecture Reference Manual for
A-profile architecture:
https://developer.arm.com/documentation/ddi0487/latest/

People who contributed to this patch:
- Keith Walker
- Ties Stuij

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D138010
2022-11-16 10:20:14 +00:00
Manoj Gupta
2497d5aa77 Define _GNU_SOURCE for arm baremetal in C++ mode.
This matches other C++ drivers e.g. Linux that define
_GNU_SOURCE. This lets clang compiler more code by default
without explicitly passing _GNU_SOURCE on command line.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D136712
2022-11-03 13:58:47 -07:00
David Green
9c48b7f0e7 [AArch64][ARM] Alter v8.1a neon intrinsics to be target-based, not preprocessor based
As a continuation of D132034, this switches the QRDMX v8.1a neon
intrinsics over from preprocessor defines to be target-gated. As there
is no "rdma" or "qrdmx" target feature, they use the "v8.1a"
architecture feature directly.

This works well for AArch64, but something needs to be done for Arm at
the same time, as they both use the same header and tablegen emitter.
This patch opts for adding "v8.1a" and all dependant target features to
the Arm TargetParser, similar to what was recently done for AArch64 but
through initFeatureMap when the Architecture is parsed. I attempted to
make the code similar to the AArch64 backend.

Otherwise this is similar to the changes made in D132034.

Differential Revision: https://reviews.llvm.org/D135615
2022-10-25 09:02:52 +01:00
Ties Stuij
95bbe9a193 [clang][ARM] follow GCC behavior for defining __SOFTFP__
GCC behavior regarding defining __SOFTFP__ when (implicitly) specifying
-mfloat-abi=softfp:
- compile without (implicit) FP: define __SOFTFP__
- compile with (implicit) FP: don't define __SOFTFP__

Currently Clang doesn't define __SOFTFP__ when softfp is specified, either with
or without FP. This patch brings Clang in line with GCC behavior.

This was raised by itaig1 over on Github:
https://github.com/llvm/llvm-project/issues/55755

Reviewed By: pratlucas

Differential Revision: https://reviews.llvm.org/D135680
2022-10-18 14:38:03 +01:00
David Green
b879f99f0e [AArch64][ARM] Alter most of arm_neon.h to be target-based, not preprocessor based.
Similar to D131064, this alters most of the intrinsics in arm_neon.h to
be target based, not preprocessor based. The intrinsics that are changed
are the ones with obvious target features (fp16, fp16fml, cryptos, i8mm
and bf16). The ones that are not yet altered are the ones without target
features like rdma (8.1) and complex (8.3). Those will be switched in a
followup patch that allows targeting architecture versions.

The existing ArchGuard in arm_neon.td is split into ArchGuard that still
adds ifdef defines (for example for intrinsics that require __aarch64__),
and TargetGuards for intrinsics dependant on target features. From there
the TargetGuards are used in two ways:
 - For intrinsics emitted as functions, __attribute__((target(TargetGuard)))
   is added to the definition of the function. Along with the existing
   always_inline intrinsic, this will give a compile time error if the
   function is used in a context where the target feature is not available.
 - For intrinsics emitted as macros, the __builtins are emitted into
   arm_neon.inc using TARGET_BUILTIN as opposed to BUILTIN, which includes
   the target feature and gives an error if the builtin is found in a
   function without the required features, similar to arm_sve.h.

The second method requires that the intrinsics be separable from the
existing _v intrinsics used in other types. For example
__builtin_neon_splat_lane_bf16 is used as opposed to
__builtin_neon_splat_lane_v. There are some adjustments to the CGBuiltin
to account for intrinsics that can be treated similarly, except for
their target features.

Differential Revision: https://reviews.llvm.org/D132034
2022-10-11 09:09:16 +01:00
David Green
123064dc39 [Clang][Arm] Convert -fallow-half-arguments-and-returns to a target option. NFC
This cc1 option -fallow-half-arguments-and-returns allows __fp16 to be
passed by argument and returned, without giving an error. It is
currently always enabled for Arm and AArch64, by forcing the option in
the driver. This means any cc1 tests (especially those needing
arm_neon.h) need to specify the option too, to prevent the error from
being emitted.

This changes it to a target option instead, set to true for Arm and
AArch64. This allows the option to be removed. Previously it was implied
by -fnative_half_arguments_and_returns, which is set for certain
languages like open_cl, renderscript and hlsl, so that option now too
controls the errors. There were are few other non-arm uses of
-fallow-half-arguments-and-returns but I believe they were unnecessary.
The strictfp_builtins.c tests were converted from __fp16 to _Float16 to
avoid the issues.

Differential Revision: https://reviews.llvm.org/D133885
2022-09-29 11:00:32 +01:00
Fangrui Song
069ecd0c6e [ARM] Check target feature support for __builtin_arm_crc*
`__builtin_arm_crc*` requires the target feature crc which is available on armv8
and above. Calling the fuctions for armv7 leads to a SelectionDAG crash.

```
% clang -c --target=armv7-unknown-linux-gnueabi -c a.c
fatal error: error in backend: Cannot select: intrinsic %llvm.arm.crc32b
PLEASE submit a bug report to ...
```

Add `TARGET_BUILTIN` and define required features for these builtins to
report an error in `CodeGenFunction::checkTargetFeatures`. The problem is quite widespread.
I will add `TARGET_BUILTIN` for more builtins later.

Fix https://github.com/llvm/llvm-project/issues/57802

Differential Revision: https://reviews.llvm.org/D134127
2022-09-21 11:50:15 -07:00
tyb0807
650aec687e [ARM][AArch64] Add missing v8.x checks
Summary:
This patch adds checks that were missing in clang for Armv8.5/6/7-A. These include:
* ACLE macro defines for AArch32.
* Handling of crypto and SM4, SHA and AES feature flags on clang's driver.

Reviewers: dmgreen, SjoerdMeijer, tmatheson

Differential Revision: https://reviews.llvm.org/D116153
2022-02-22 09:07:59 +00:00