llvm-project

Author	SHA1	Message	Date
Mikołaj Piróg	af522c5dd3	[AVX10.2] Fix wrong mask casting in some convert intrinsics (#126627 ) Found during work on #120927. This caused the compiler to silently ignore half of the mask in the specific intrinsics.	2025-02-11 13:13:36 +08:00
Mikołaj Piróg	161cfc6f39	[AVX10.2] Fix wrong intrinsic names after rename (#126390 ) In my previous PR (#123656) to update the names of AVX10.2 intrinsics and mnemonics, I erroneously deleted `_ph` from a few intrinsics. This PR corrects this.	2025-02-10 12:48:02 +08:00
Wael Yehia	8e61aae4a8	[profile] Add a clang option -fprofile-continuous that enables continuous instrumentation profiling mode (#124353 ) In Continuous instrumentation profiling mode, profile or coverage data collected via compiler instrumentation is continuously synced to the profile file. This feature has existed for a while, and is documented here: https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program This PR creates a user facing option to enable the feature. --------- Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>	2025-02-08 17:25:07 -05:00
Michael Buch	e00fc80c19	[clang][DebugInfo] Set EnumKind based on enum_extensibility attribute (#126045 ) This is the 2nd part to https://github.com/llvm/llvm-project/pull/124752. Here we make sure to set the `DICompositeType` `EnumKind` if the enum was declared with `__attribute__((enum_extensibility(...)))`. In DWARF this will be rendered as `DW_AT_APPLE_enum_kind` and will be used by LLDB when creating `clang::EnumDecl`s from debug-info. Depends on https://github.com/llvm/llvm-project/pull/126044	2025-02-07 09:28:10 +00:00
Lukacma	e833e5276c	[AArch64][Clang] Update untyped sme intrinsics with fp8 variants (#124543 ) This patch adds fp8 variants to the following untyped SME intrinsics based on [ACLE](https://github.com/ARM-software/acle/blob/main/main/acle.md): ``` SVREVD SVSEL_X2 SVSEL_X4 SVZIP_X2 SVZIPQ_X2 SVZIP_X4 SVZIPQ_X4 SVUZP_X2 SVUZPQ_X2 SVUZP_X4 SVUZPQ_X4 SVREAD_ZA8_H SVREAD_ZA8_V SVREAD_ZA128 SVWRITE_ZA8_H SVWRITE_ZA8_V SVWRITE_ZA128 SVREAD_ZA8_VG2_H SVREAD_ZA8_VG2_V SVREAD_ZA8_VG4_H SVREAD_ZA8_VG4_V SVREAD_ZA8_VG1x2 SVREAD_ZA8_VG1x4 SVWRITE_ZA8_VG2_H SVWRITE_ZA8_VG2_V SVWRITE_ZA8_VG4_H SVWRITE_ZA8_VG4_V SVWRITE_ZA8_VG1x2 SVWRITE_ZA8_VG1x4 SVLUTI2_LANE_ZT_X4 SVLUTI2_LANE_ZT SVLUTI4_LANE_ZT SVLUTI2_LANE_ZT_X2 SVLUTI4_LANE_ZT_X2 SVREADZ_ZA8_X2_H SVREADZ_ZA8_X2_V SVREADZ_ZA8_X4_H SVREADZ_ZA8_X4_V SVREADZ_ZA8_H SVREADZ_ZA8_V SVREADZ_VG2_B SVREADZ_VG4_B ```	2025-02-06 14:06:45 +00:00
Scott Constable	e223485c9b	[X86] Extend kCFI with a 3-bit arity indicator (#121070 ) Kernel Control Flow Integrity (kCFI) is a feature that hardens indirect calls by comparing a 32-bit hash of the function pointer's type against a hash of the target function's type. If the hashes do not match, the kernel may panic (or log the hash check failure, depending on the kernel's configuration). These hashes are computed at compile time by applying the xxHash64 algorithm to each mangled canonical function (or function pointer) type, then truncating the result to 32 bits. This hash is written into each indirect-callable function header by encoding it as the 32-bit immediate operand to a `MOVri` instruction, e.g.: ``` __cfi_foo: nop nop nop nop nop nop nop nop nop nop nop movl $199571451, %eax # hash of foo's type = 0xBE537FB foo: ... ``` This PR extends x86-based kCFI with a 3-bit arity indicator encoded in the `MOVri` instruction's register (reg) field as follows: \| Arity Indicator \| Description \| Encoding in reg field \| \| --------------- \| --------------- \| --------------- \| \| 0 \| 0 parameters \| EAX \| \| 1 \| 1 parameter in RDI \| ECX \| \| 2 \| 2 parameters in RDI and RSI \| EDX \| \| 3 \| 3 parameters in RDI, RSI, and RDX \| EBX \| \| 4 \| 4 parameters in RDI, RSI, RDX, and RCX \| ESP \| \| 5 \| 5 parameters in RDI, RSI, RDX, RCX, and R8 \| EBP \| \| 6 \| 6 parameters in RDI, RSI, RDX, RCX, R8, and R9 \| ESI \| \| 7 \| At least one parameter may be passed on the stack \| EDI \| For example, if `foo` takes 3 register arguments and no stack arguments then the `MOVri` instruction in its kCFI header would instead be written as: ``` movl $199571451, %ebx # hash of foo's type = 0xBE537FB ``` This PR will benefit other CFI approaches that build on kCFI, such as FineIBT. For example, this proposed enhancement to FineIBT must be able to infer (at kernel init time) which registers are live at an indirect call target: https://lkml.org/lkml/2024/9/27/982. 
If the arity bits are available in the kCFI function header, then this information is trivial to infer. Note that there is another existing PR proposal that includes the 3-bit arity within the existing 32-bit immediate field, which introduces different security properties: https://github.com/llvm/llvm-project/pull/117121.	2025-02-06 10:54:22 +08:00
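The encoding described above can be sketched in C. This is only an illustration, not the LLVM implementation: the function names are hypothetical, and it assumes every parameter is passed in an integer register. It relies on the standard x86 encoding of `movl $imm32, %reg` (opcode `0xB8 + reg`, followed by a little-endian 32-bit immediate), in which the register numbers of EAX through EDI are 0 through 7 and so coincide with the arity indicators in the table.

```c
#include <assert.h>
#include <stdint.h>

/* Arity indicator 7: at least one parameter may be passed on the stack. */
enum { KCFI_ARITY_STACK = 7 };

/* Hypothetical helper: map a parameter count to the 3-bit arity indicator.
 * Simplification: assumes every parameter occupies one integer register. */
unsigned kcfi_arity_indicator(unsigned num_params) {
    return num_params <= 6 ? num_params : KCFI_ARITY_STACK;
}

/* Encode `movl $type_hash, %reg`: opcode 0xB8 + reg, then imm32 (little-endian).
 * The arity indicator is used directly as the register number. */
void kcfi_encode_mov(uint32_t type_hash, unsigned arity, uint8_t out[5]) {
    assert(arity <= 7);
    out[0] = (uint8_t)(0xB8u + arity);
    for (int i = 0; i < 4; i++)
        out[i + 1] = (uint8_t)(type_hash >> (8 * i));
}

/* Recover the arity indicator from the low three bits of the opcode byte. */
unsigned kcfi_decode_arity(const uint8_t insn[5]) {
    return insn[0] & 7u;
}

/* Recover the 32-bit type hash from the immediate operand. */
uint32_t kcfi_decode_hash(const uint8_t insn[5]) {
    uint32_t hash = 0;
    for (int i = 0; i < 4; i++)
        hash |= (uint32_t)insn[i + 1] << (8 * i);
    return hash;
}
```

For the `foo` example in the commit message (three register arguments, hash 199571451), this yields an opcode byte of `0xBB`, i.e. a move into `%ebx`.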
Saleem Abdulrasool	1901f4ac8e	CodeGen: support static linking for libclosure (#125384 ) When building on Windows, dealing with the BlocksRuntime is slightly more complicated. As we are not guaranteed a forward declaration for the blocks runtime ABI symbols, we may generate the declarations for them. In order to properly link against the well-known types, we always annotated them as `__declspec(dllimport)`. This would require the dynamic linking of the blocks runtime under all conditions. However, this is not the only possible way to use the library. We may be building a fully sealed (static) executable. In such a case, the well-known symbols should not be marked as `dllimport` as they are assumed to be statically available with the static linking to the BlocksRuntime. Introduce a new driver/cc1 option `-static-libclosure` which mirrors the myriad of similar options (`-static-libgcc`, `-static-libstdc++`, `-static-libsan`, etc).	2025-02-05 15:15:36 -08:00
Paul Kirth	8b448842c4	[clang][NFC] Precommit test file refactoring (#125944 ) An upcoming change will need to add additional tests to this file, so this patch updates the RUN line to use a test prefix.	2025-02-05 14:09:39 -08:00
Pranav Kant	4055be55b8	Fix broken clang codegen test (avx-cxx-record.cpp) (#125787 ) Fixes e8a486ea97895a18e1bba75431d37d9758886084	2025-02-04 16:12:12 -08:00
Bill Wendling	2eb44aa0a9	[Clang][counted-by] Bail out of visitor for LValueToRValue cast (#125571 ) An LValueToRValue cast shouldn't be ignored, so bail out of the visitor if we encounter one.	2025-02-04 11:00:44 -08:00
Pranav Kant	e8a486ea97	[clang] Return larger CXX records in memory (#120670 ) We incorrectly return CXX records in AVX registers when they should be returned in memory. This is a violation of the x86-64 psABI. Detailed discussion is here: https://groups.google.com/g/x86-64-abi/c/BjOOyihHuqg/m/KurXdUcWAgAJ	2025-02-04 09:42:12 -08:00
Durgadoss R	91cb8f5d32	[NVPTX] Add tcgen05 alloc/dealloc intrinsics (#124961 ) This patch adds intrinsics for the tcgen05 alloc/dealloc family of PTX instructions. This patch also adds an addrspace 6 for tensor memory which is used by these intrinsics. lit tests are added and verified with a ptxas-12.8 executable. Documentation for these additions is also added in NVPTXUsage.rst. Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2025-02-04 14:31:40 +05:30
Benjamin Maxwell	692c9b2107	[clang] Support member function pointers in Decl::getFunctionType() (#125077 ) This seems consistent with the documentation, which claims it: ``` /// Looks through the Decl's underlying type to extract a FunctionType /// when possible. Will return null if the type underlying the Decl does not /// have a FunctionType. const FunctionType *getFunctionType(bool BlocksToo = true) const; ``` Note: This patch rewords this doc comment to clarify it includes various function pointer types. Without this, attaching attributes (which use `HasFunctionProto`) to member function pointers errors with: ``` error: '<attr>' only applies to non-K&R-style functions ``` ...which does not really make sense, since member functions are not K&R functions. With this change the Arm SME TypeAttrs work correctly on member function pointers. Note, however, that not all attributes work correctly when applied to function pointers or member function pointers. For example, `alloc_align` crashes when applied to a function pointer (on trunk): https://godbolt.org/z/YvMhnhKfx (as it only expects a `FunctionDecl` not a `ParmVarDecl`). The same crash applies to member function pointers (for the same reason).	2025-02-03 09:37:16 +00:00
Saleem Abdulrasool	b798679c07	test: correct a typo in the check identifier (NFCI) This corrects a swapped order of the spelling of blocks in the check. This enables the correct forward declarations which were previously disabled.	2025-02-01 14:49:29 -08:00
Florian Hahn	77d3f8a925	[TBAA] Don't emit pointer-tbaa for void pointers. (#122116 ) While there are no special rules in the standards regarding void pointers and strict aliasing, emitting distinct tags for void pointers breaks some common idioms and there is no good alternative to re-write the code without strict-aliasing violations. An example is to count the entries in an array of pointers: int count_elements(void * values) { void **seq = values; int count; for (count = 0; seq && seq[count]; count++); return count; } https://clang.godbolt.org/z/8dTv51v8W An example in the wild is from https://github.com/llvm/llvm-project/issues/119099 This patch avoids emitting distinct tags for void pointers, to avoid those idioms causing mis-compiles for now. Fixes https://github.com/llvm/llvm-project/issues/119099. Fixes https://github.com/llvm/llvm-project/issues/122537. PR: https://github.com/llvm/llvm-project/pull/122116	2025-01-31 11:38:14 +00:00
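The idiom quoted in the commit message is easier to read laid out as a block. The `count_elements` function is taken from the message; the `count_elements_demo` caller is a hypothetical addition that shows the typical use, where an array of typed pointers (here `int *`) is walked through `void **`.

```c
#include <assert.h>
#include <stddef.h>

/* From the commit message: count the entries of a NULL-terminated
 * array of pointers, accessed through void **. */
int count_elements(void *values) {
    void **seq = values;
    int count;
    for (count = 0; seq && seq[count]; count++)
        ;
    return count;
}

/* Hypothetical caller: the elements are really int *, but are read as void *.
 * With distinct TBAA tags for void pointers, these loads could be treated
 * as not aliasing the int * stores that initialised the array. */
void count_elements_demo(void) {
    int a = 1, b = 2;
    int *ptrs[] = { &a, &b, NULL };
    assert(count_elements(ptrs) == 2);
}
```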
Oliver Stannard	97b066f4e9	[ARM] Empty structs are 1-byte for C++ ABI (#124762 ) For C++ (but not C), empty structs should be passed to functions as if they are a 1 byte object with 1 byte alignment. This is defined in Arm's CPPABI32: https://github.com/ARM-software/abi-aa/blob/main/cppabi32/cppabi32.rst For the purposes of parameter passing in AAPCS32, a parameter whose type is an empty class shall be treated as if its type were an aggregate with a single member of type unsigned byte. The AArch64 equivalent of this has an exception for structs containing an array of size zero, I've kept that logic for ARM. I've not found a reason for this exception, but I've checked that GCC does have the same behaviour for ARM as it does for AArch64. The AArch64 version has an Apple ABI with different rules, which ignores empty structs in both C and C++. This is documented at https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms. The ARM equivalent of that appears to be AAPCS16_VFP, used for WatchOS, but I can't find any documentation for that ABI, so I'm not sure what rules it should follow. For now I've left it following the AArch64 Apple rules.	2025-01-31 09:03:01 +00:00
David Green	9f1c825fb6	[AArch64] Enable vscale_range with +sme (#124466 ) If we have +sme but not +sve, we would not set vscale_range on functions. It should be valid to apply it with the same range with just +sme, which can help mitigate some performance regressions in cases such as scalable vector bitcasts (https://godbolt.org/z/exhe4jd8d).	2025-01-31 07:57:43 +00:00
Bill Wendling	cff0a460ae	[Clang][counted_by] Refactor __builtin_dynamic_object_size on FAMs (#122198 ) Refactoring of how __builtin_dynamic_object_size() is calculated for flexible array members (in preparation for adding support for the 'counted_by' attribute on pointers in structs). The only functionality change is that we use the already emitted Expr code to build our calculations off of rather than re-emitting the Expr. That allows the 'StructFieldAccess' visitor to sift through all casts and ArraySubscriptExprs to find the first MemberExpr. We build our GEPs and calculate offsets based off of relative distances from that MemberExpr. The testcase passes execution tests. Calculate the flexible array member's object size using these formulae (note: if the calculation is negative, we return 0.): struct p; struct s { /* ... */ int count; struct p array[] __attribute__((counted_by(count))); }; 1) 'ptr->array': count = ptr->count; flexible_array_member_base_size = sizeof (*ptr->array); flexible_array_member_size = count * flexible_array_member_base_size; if (flexible_array_member_size < 0) return 0; return flexible_array_member_size; 2) '&ptr->array[idx]': count = ptr->count; index = idx; flexible_array_member_base_size = sizeof (*ptr->array); flexible_array_member_size = count * flexible_array_member_base_size; index_size = index * flexible_array_member_base_size; if (flexible_array_member_size < 0 || index < 0) return 0; return flexible_array_member_size - index_size; 3) '&ptr->field': count = ptr->count; sizeof_struct = sizeof (struct s); flexible_array_member_base_size = sizeof (*ptr->array); flexible_array_member_size = count * flexible_array_member_base_size; field_offset = offsetof (struct s, field); offset_diff = sizeof_struct - field_offset; if (flexible_array_member_size < 0) return 0; return offset_diff + flexible_array_member_size; 4) '&ptr->field_array[idx]': count = ptr->count; index = idx; sizeof_struct = sizeof (struct s); flexible_array_member_base_size = sizeof (*ptr->array); flexible_array_member_size = count * flexible_array_member_base_size; field_base_size = sizeof (*ptr->field_array); field_offset = offsetof (struct s, field); field_offset += index * field_base_size; offset_diff = sizeof_struct - field_offset; if (flexible_array_member_size < 0 || index < 0) return 0; return offset_diff + flexible_array_member_size; --------- Signed-off-by: Bill Wendling <morbo@google.com>	2025-01-30 15:36:13 -08:00
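The four formulae can be modelled as plain arithmetic in C. This is a sketch for checking the numbers by hand, not the code Clang emits: the function and parameter names are hypothetical, all sizes are passed in explicitly rather than derived from a real struct, and the final clamp to zero is one reading of the note that a negative calculation returns 0.

```c
#include <assert.h>

/* The note in the commit message: a negative result is reported as 0. */
static long clamp_to_zero(long v) { return v < 0 ? 0 : v; }

/* 1) 'ptr->array' */
long fam_size(long count, long elem_size) {
    assert(elem_size >= 0);
    return clamp_to_zero(count * elem_size);
}

/* 2) '&ptr->array[idx]' */
long fam_size_at_index(long count, long idx, long elem_size) {
    long fam = count * elem_size;
    if (fam < 0 || idx < 0)
        return 0;
    return clamp_to_zero(fam - idx * elem_size);
}

/* 3) '&ptr->field' */
long size_from_field(long count, long elem_size,
                     long sizeof_struct, long field_offset) {
    long fam = count * elem_size;
    if (fam < 0)
        return 0;
    return clamp_to_zero(sizeof_struct - field_offset + fam);
}

/* 4) '&ptr->field_array[idx]' */
long size_from_field_elem(long count, long elem_size, long sizeof_struct,
                          long field_offset, long idx, long field_elem_size) {
    long fam = count * elem_size;
    if (fam < 0 || idx < 0)
        return 0;
    field_offset += idx * field_elem_size;
    return clamp_to_zero(sizeof_struct - field_offset + fam);
}
```

For example, with `count = 4` and 8-byte elements, case 1 gives 32 bytes and case 2 with `idx = 1` gives 24 bytes.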
Daniel Paoliello	845cc968e9	[clang][llvm][aarch64][win] Add a clang flag and module attribute for import call optimization, and remove LLVM flag (#122831 ) Switches import call optimization from being enabled by an LLVM flag to instead using a module attribute, and creates a new Clang flag that will set that attribute. This addresses the concern raised in the original PR: <https://github.com/llvm/llvm-project/pull/121516#discussion_r1911763991> This change also only creates the Called Global info if the module attribute is present, addressing this concern: <https://github.com/llvm/llvm-project/pull/122762#pullrequestreview-2547595934>	2025-01-30 09:51:43 -08:00
Thurston Dang	9c0606a08b	Reapply "[ubsan] Connect -fsanitize-skip-hot-cutoff to LowerAllowCheckPass<cutoffs>" (#125032 ) (#125037 ) This reverts commit 928cad49beec0120686478f502899222e836b545 i.e., relands dccd27112722109d2e2f03e8da9ce8690f06e11b, with a fix to avoid use-after-scope by changing the lambda to capture by value.	2025-01-30 09:37:16 -08:00
Thurston Dang	928cad49be	Revert "[ubsan] Connect -fsanitize-skip-hot-cutoff to LowerAllowCheckPass<cutoffs>" (#125032 ) Reverts llvm/llvm-project#124857 due to buildbot breakage (https://lab.llvm.org/buildbot/#/builders/46/builds/11310)	2025-01-29 22:03:05 -08:00
Thurston Dang	dccd271127	[ubsan] Connect -fsanitize-skip-hot-cutoff to LowerAllowCheckPass<cutoffs> (#124857 ) This adds the plumbing between -fsanitize-skip-hot-cutoff (introduced in https://github.com/llvm/llvm-project/pull/121619) and LowerAllowCheckPass<cutoffs> (introduced in https://github.com/llvm/llvm-project/pull/124211). The net effect is that -fsanitize-skip-hot-cutoff now combines the functionality of -ubsan-guard-checks and -lower-allow-check-percentile-cutoff (though this patch does not remove those yet), and generalizes the latter to allow per-sanitizer cutoffs. Note: this patch replaces Intrinsic::allow_ubsan_check's SanitizerHandler parameter with SanitizerOrdinal; this is necessary because the hot cutoffs are specified in terms of SanitizerOrdinal (e.g., null, alignment), not SanitizerHandler (e.g., TypeMismatch). Likewise, CodeGenFunction::EmitCheck is changed to emit allow_ubsan_check() for each individual check. --------- Co-authored-by: Vitaly Buka <vitalybuka@gmail.com> Co-authored-by: Vitaly Buka <vitalybuka@google.com>	2025-01-29 21:03:26 -08:00
Nikita Popov	29441e4f5f	[IR] Convert from nocapture to captures(none) (#123181 ) This PR removes the old `nocapture` attribute, replacing it with the new `captures` attribute introduced in #116990. This change is intended to be essentially NFC, replacing existing uses of `nocapture` with `captures(none)` without adding any new analysis capabilities. Making use of non-`none` values is left for a followup. Some notes: * `nocapture` will be upgraded to `captures(none)` by the bitcode reader. * `nocapture` will also be upgraded by the textual IR reader. This is to make it easier to use old IR files and somewhat reduce the test churn in this PR. * Helper APIs like `doesNotCapture()` will check for `captures(none)`. * MLIR import will convert `captures(none)` into an `llvm.nocapture` attribute. The representation in the LLVM IR dialect should be updated separately.	2025-01-29 16:56:47 +01:00
Oliver Stannard	e9c2e0acd7	[AArch64] Match GCC behaviour for zero-size structs (#124760 ) We had a test claiming that this empty struct type consumes a register slot when passing it to a function with GCC, but that does not appear to be the case, at least with GCC versions going back to 4.8. This also caused a miscompilation when passing one of these structs to a variadic function, but it turned out that our implementation of `va_arg` matches GCC's ABI, so the one change fixes both bugs.	2025-01-29 15:02:37 +00:00
Fraser Cormack	1ac3665e66	[clang] Restrict the use of scalar types in vector builtins (#119423 ) This commit restricts the use of scalar types in vector math builtins, particularly the `__builtin_elementwise_` builtins. Previously, small scalar integer types would be promoted to `int`, as per the usual conversions. This would silently do the wrong thing for certain operations, such as `add_sat`, `popcount`, `bitreverse`, and others. Similarly, since unsigned integer types were promoted to `int`, something like `add_sat(unsigned char, unsigned char)` would perform a signed operation. With this patch, promotable scalar integer types are not promoted to int, and are kept intact. If any of the types differ in the binary and ternary builtins, an error is issued. Similarly an error is issued if builtins are supplied integer types of different signs. Mixing enums of different types in binary/ternary builtins now consistently raises an error in all language modes. This brings the behaviour surrounding scalar types more in line with that of vector types. No change is made to vector types, which are both not promoted and whose element types must match. Fixes #84047. RFC: https://discourse.llvm.org/t/rfc-change-behaviour-of-elementwise-builtins-on-scalar-integer-types/83725	2025-01-29 09:40:04 +00:00
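Why promotion gives the wrong answer for `add_sat` can be shown with a small portable model. The sketch below does not call the Clang builtins; `add_sat_int` and `add_sat_u8` are hypothetical helpers that mimic a saturating add at `int` width (the old, promoted behaviour) and at `unsigned char` width (the behaviour the caller presumably wanted).

```c
#include <assert.h>
#include <limits.h>

/* Saturating add at int width: models the result when small operands
 * are first promoted to int. */
int add_sat_int(int a, int b) {
    if (b > 0 && a > INT_MAX - b)
        return INT_MAX;
    if (b < 0 && a < INT_MIN - b)
        return INT_MIN;
    return a + b;
}

/* Saturating add at unsigned char width: the sum is capped at UCHAR_MAX. */
unsigned char add_sat_u8(unsigned char a, unsigned char b) {
    unsigned sum = (unsigned)a + (unsigned)b;
    return (unsigned char)(sum > UCHAR_MAX ? UCHAR_MAX : sum);
}

/* Usage: the promoted version never saturates for unsigned char inputs. */
void add_sat_demo(void) {
    assert(add_sat_int(UCHAR_MAX, 1) == UCHAR_MAX + 1); /* no saturation */
    assert(add_sat_u8(UCHAR_MAX, 1) == UCHAR_MAX);      /* saturates */
}
```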
Stephen Tozer	548ecde428	Add extra explicit triple to fix errors from #110102 Attempted fix for errors observed on: https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-windows-x64/b8724466517233166081/overview Following the previous fixes, one test that needed an explicit triple was missed; this commit adds that triple.	2025-01-28 22:51:30 +00:00
Artem Belevich	310f55875f	[CUDA] Make target intrinsics work with ptx 8.7 (#124818 ) Fixes build break with CUDA-12.8 introduced in #123398	2025-01-28 12:14:46 -08:00
Stephen Tozer	de9b0ddedc	Add explicit triple to fix errors from #110102 Attempted fix for errors observed on: https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-windows-x64/b8724497761651154417/overview Several tests that required an explicit triple had none present; this patch adds those triples.	2025-01-28 19:34:41 +00:00
Stephen Tozer	822f74a911	[Clang] Cleanup docs and comments relating to -fextend-variable-liveness (#124767 ) This patch contains a number of changes relating to the above flag; primarily it updates comment references to the old flag names, "-fextend-lifetimes" and "-fextend-this-ptr" to refer to the new names, "-fextend-variable-liveness[={all,this}]". These changes are all NFC. This patch also removes the explicit -fextend-this-ptr-liveness flag alias, and shortens the help-text for the main flag; these are both changes that were meant to be applied in the initial PR (#110000), but due to some user-error on my part they were not included in the merged commit.	2025-01-28 18:25:32 +00:00
Thurston Dang	ef92e6b99f	[BoundsChecking] Update ubsantrap to use GuardKind (#124613 ) This change makes it consistent with other uses of ubsantrap. This also updates the tests. Notably, BoundsChecking/runtimes.ll had guard=3 which passed only because the method of calculating the parameter (`IRB.GetInsertBlock()->getParent()->size()`) happened to give the same answer.	2025-01-28 08:52:31 -08:00
Stephen Tozer	8ad9e1ecb7	[Clang] Fix use of deprecated method and missing triple Fixes two buildbot errors caused by 4424c44c (#110102): The first error, seen on some sanitizer bots: https://lab.llvm.org/buildbot/#/builders/51/builds/9901 The initial commit used the deprecated getDeclaration intrinsic instead of the non-deprecated getOrInsert- equivalent. This patch trivially updates the code in question to use the new intrinsic. The second error, seen on the clang-armv8-quick bot: https://lab.llvm.org/buildbot/#/builders/154/builds/10983 One of the tests depends on a particular triple to get the exact output expected by the test, but did not specify this triple; this patch adds the triple in question.	2025-01-28 13:21:41 +00:00
Wolfgang Pieb	4424c44c8c	[Clang] Add fake use emission to Clang with -fextend-lifetimes (#110102 ) Following the previous patch which adds the "extend lifetimes" flag without (almost) any functionality, this patch adds the real feature by allowing Clang to emit fake uses. These are emitted as a new form of cleanup, set for variable addresses, which just emits a fake use intrinsic when the variable falls out of scope. The code for achieving this is simple, with most of the logic centered on determining whether to emit a fake use for a given address, and on ensuring that fake uses are ignored in a few cases. Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>	2025-01-28 12:30:31 +00:00
Momchil Velikov	db6fa74dfe	[AArch64] Implement FP8 Neon reinterpret intrinsics (#120476 )	2025-01-28 11:06:24 +00:00
Nikita Popov	1295aa2e81	[Clang] Add -fwrapv-pointer flag (#122486 ) GCC supports three flags related to overflow behavior: * `-fwrapv`: Makes signed integer overflow well-defined. * `-fwrapv-pointer`: Makes pointer overflow well-defined. * `-fno-strict-overflow`: Implies `-fwrapv -fwrapv-pointer`, making both signed integer overflow and pointer overflow well-defined. Clang currently only supports `-fno-strict-overflow` and `-fwrapv`, but not `-fwrapv-pointer`. This PR proposes to introduce `-fwrapv-pointer` and adjust the semantics of `-fwrapv` to match GCC. This allows signed integer overflow and pointer overflow to be controlled independently, while `-fno-strict-overflow` still exists to control both at the same time (and that option is consistent across GCC and Clang).	2025-01-28 09:57:00 +01:00
Chandler Carruth	b968fd9502	[StrTable] Mechanically convert NVPTX builtins to use TableGen (#122873 ) This switches them to use the common TableGen layer, extending it to support the missing features needed by the NVPTX backend. The biggest thing was to build a TableGen system that computes the cumulative SM and PTX feature sets the same way the macros did. That's done with some string concatenation tricks in TableGen, but they worked out pretty neatly and are very comparable in complexity to the macro version. Then the actual defines were mapped over using a very hacky Python script. It was never productionized or intended to work in the future, but for posterity: https://gist.github.com/chandlerc/10bdf8fb1312e252b4a501bace184b66 Last but not least, there was a very odd "bug" in one of the converted builtins' prototype in the TableGen model: it didn't handle uses of `Z` and `U` both as qualifiers of a single type, treating `Z` as its own `int32_t` type. So my hacky Python script converted `ZUi` into two types, an `int32_t` and an `unsigned int`. This produced a very wrong prototype. But the tests caught this nicely and I fixed it manually rather than trying to improve the Python script as it occurred in exactly one place I could find. This should provide direct benefits of allowing future refactorings to more directly leverage TableGen to express builtins more structurally rather than textually. It will also make my efforts to move builtins to string tables significantly more effective for the NVPTX backend where the X-macro approach resulted in significantly less efficient string tables than other targets due to the long repeated feature strings.	2025-01-27 22:45:37 -08:00
Momchil Velikov	f75860f895	[AArch64] Implement NEON FP8 intrinsics for fused multiply-add (#123615 ) This patch adds the following intrinsics: * Fused multiply-add non-indexed float16x8_t vmlalbq_f16_mf8_fpm(float16x8_t, mfloat8x16_t, mfloat8x16_t, fpm_t) float16x8_t vmlaltq_f16_mf8_fpm(float16x8_t, mfloat8x16_t, mfloat8x16_t, fpm_t) float32x4_t vmlallbbq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t) float32x4_t vmlallbtq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t) float32x4_t vmlalltbq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t) float32x4_t vmlallttq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t) * Floating-point multiply-add long to half-precision (vector, by element) float16x8_t vmlalbq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x8_t vmlalbq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x8_t vmlaltq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x8_t vmlaltq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) * Floating-point multiply-add long-long to single-precision (vector, by element) float32x4_t vmlallbbq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlallbbq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlallbtq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlallbtq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlalltbq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t 
vmlalltbq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlallttq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlallttq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)	2025-01-28 00:38:44 +00:00
Momchil Velikov	804b81d39f	[AArch64] Add FP8 Neon intrinsics for dot-product (#123613 ) This patch adds the following intrinsics: float16x4_t vdot_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x8_t vm, fpm_t fpm) float16x8_t vdotq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, fpm_t fpm) float16x4_t vdot_lane_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x4_t vdot_laneq_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x8_t vdotq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x8_t vdotq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x2_t vdot_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x8_t vm, fpm_t fpm) float32x4_t vdotq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, fpm_t fpm) float32x2_t vdot_lane_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x2_t vdot_laneq_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vdotq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vdotq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)	2025-01-27 21:14:16 +00:00
Momchil Velikov	99bd2e3f12	[AArch64] Add Neon FP8 conversion intrinsics (#123612 ) The patch adds the following intrinsics: bfloat16x8_t vcvt1_bf16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm) bfloat16x8_t vcvt1_low_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) bfloat16x8_t vcvt2_bf16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm) bfloat16x8_t vcvt2_low_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) bfloat16x8_t vcvt1_high_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) bfloat16x8_t vcvt2_high_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) float16x8_t vcvt1_f16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm) float16x8_t vcvt1_low_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) float16x8_t vcvt2_f16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm) float16x8_t vcvt2_low_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) float16x8_t vcvt1_high_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) float16x8_t vcvt2_high_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) mfloat8x8_t vcvt_mf8_f32_fpm(float32x4_t vn, float32x4_t vm, fpm_t fpm) mfloat8x16_t vcvt_high_mf8_f32_fpm(mfloat8x8_t vd, float32x4_t vn, float32x4_t vm, fpm_t fpm) mfloat8x8_t vcvt_mf8_f16_fpm(float16x4_t vn, float16x4_t vm, fpm_t fpm) mfloat8x16_t vcvtq_mf8_f16_fpm(float16x8_t vn, float16x8_t vm, fpm_t fpm) Co-Authored-By: Caroline Concatto <caroline.concatto@arm.com>	2025-01-27 17:32:47 +00:00
Momchil Velikov	87103a016f	[AArch64] Implement NEON FP8 vectors as VectorType (#123603 ) Reimplement Neon FP8 vector types using attribute `neon_vector_type` instead of having them as builtin types. This allows implementing FP8 Neon intrinsics without the need to add special cases for these types when using `__builtin_shufflevector` or bitcast (using C-style cast operator) between vectors, both extensively used in the generated code in `arm_neon.h`.	2025-01-27 10:41:53 +00:00
Alexandros Lamprineas	474f5d2aef	[FMV][AArch64] Remove features predres and ls64. (#124266 ) These cannot be detected by reading the ID_AA64ISAR1_EL1 register since their corresponding bitfields are hidden. Additionally the instructions that these features enable are unusable from EL0. ACLE: https://github.com/ARM-software/acle/pull/382	2025-01-24 17:22:27 +00:00
Phoebe Wang	ee2722fc88	[X86][AVX10.2-BF16] Remove [NE]P from intrinsic and instruction name (#123335 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965	2025-01-24 15:49:28 +08:00
Phoebe Wang	24f177df61	[X86][AVX10.2-BF16] Update VCOMISBF16 intrinsics and instructions (#123307 ) - Add `I` to intrinsics and instructions - Add `_` before sbf16 in intrinsics Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965	2025-01-24 08:37:29 +08:00
Sam Elliott	33c4407471	[RISCV] Support cR Inline Asm Constraint (#124174 ) This denotes RVC-compatible GPR Pairs, which are used by the Zclsd extension. C API PR: riscv-non-isa/riscv-c-api-doc#102	2025-01-23 16:19:19 -08:00
Alexandros Lamprineas	5a7d92f7a0	[NFC] Remove invalid features from test and autogenerate checks. (#124130 )	2025-01-23 17:55:03 +00:00
Mikołaj Piróg	25653e558c	[AVX10.2] Update convert chapter intrinsic and mnemonics names (#123656 ) Intel spec for avx10.2 (https://cdrdv2.intel.com/v1/dl/getContent/828965) has been updated. This PR changes relevant names from the "AVX10 CONVERT INSTRUCTIONS" chapter .	2025-01-23 22:23:56 +08:00
Phoebe Wang	4f40b07533	[X86][AVX10.2-SATCVT][NFC] Remove NE from intrinsic and instruction name (#123275 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965	2025-01-22 22:53:47 +08:00
Oliver Stannard	c4ef805b0b	[Clang] Re-write codegen for atomic_test_and_set and atomic_clear (#121943 ) Re-write the sema and codegen for the atomic_test_and_set and atomic_clear builtin functions to go via AtomicExpr, like the other atomic builtins do. This simplifies the code, because AtomicExpr already handles things like generating code to dynamically select the memory ordering, which was duplicated for these builtins. This also fixes a few crash bugs, one when passing an integer to the pointer argument, and one when using an array. This also adds diagnostics for the memory orderings which are not valid for atomic_clear according to https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html, which were missing before. Fixes https://github.com/llvm/llvm-project/issues/111293. This is a re-land of #120449, modified to allow any non-const pointer type for the first argument.	2025-01-22 10:48:04 +00:00
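For context, a typical use of the two builtins is a simple test-and-set flag. The sketch below assumes a GCC-compatible compiler that provides `__atomic_test_and_set` and `__atomic_clear`; the wrapper names `flag_try_acquire` and `flag_release` are hypothetical. The release side uses `__ATOMIC_RELEASE`, one of the orderings the GCC documentation lists as valid for `__atomic_clear` (alongside relaxed and sequentially consistent).

```c
#include <assert.h>
#include <stdbool.h>

/* Try to take the flag. __atomic_test_and_set returns the previous value,
 * so the flag was acquired only if that previous value was false. */
bool flag_try_acquire(bool *flag) {
    return !__atomic_test_and_set(flag, __ATOMIC_ACQUIRE);
}

/* Release the flag with a release ordering. */
void flag_release(bool *flag) {
    __atomic_clear(flag, __ATOMIC_RELEASE);
}

/* Usage (single-threaded, just to show the state transitions). */
void flag_demo(void) {
    bool flag = false;
    assert(flag_try_acquire(&flag));   /* first attempt succeeds */
    assert(!flag_try_acquire(&flag));  /* already held */
    flag_release(&flag);
    assert(flag_try_acquire(&flag));   /* free again after release */
}
```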
Jonathan Thackray	d028eaaeb8	[AArch64] Update SVE untyped intrinsics to have FP8 variants (#123585 ) Update the following intrinsics to have FP8 variants: ``` c svuint8_t svdup_laneq[_u8](svuint8_t zn, uint64_t imm_idx); svuint8_t svextq[_u8](svuint8_t zdn, svuint8_t zm, uint64_t imm); svint8_t svtblq[_s8](svint8_t zn, svuint8_t zm); svint8_t svtbxq[_s8](svint8_t fallback, svint8_t zn, svuint8_t zm); svuint8_t svuzpq1[_u8](svuint8_t zn, svuint8_t zm); svuint8_t svuzpq2[_u8](svuint8_t zn, svuint8_t zm); svuint8_t svzipq1[_u8](svuint8_t zn, svuint8_t zm); svuint8_t svzipq2[_u8](svuint8_t zn, svuint8_t zm); ```	2025-01-21 13:34:57 +00:00
Phoebe Wang	13c6abfac8	[X86][AVX10.2-MINMAX][NFC] Remove NE[P] from intrinsic and instruction (#123272 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965	2025-01-21 19:55:09 +08:00
Oliver Stannard	84fa1755a5	[AArch64] FEAT_SPEv1p2 is optional in v8.7-A and v9.2-A (#123336 ) The FEAT_SPEv1p2 feature (known to LLVM as FeatureSPE_EEF and +spe-eef) was marked as a required feature of Armv8.7-A (and later), which is incorrect because it is optional, and some CPUs do not implement it. This moves it to the default features list, so that it is still enabled by -march=armv8.7-a, but can be configured individually for each processor. For Cortex-A520 and Cortex-A520AE, I've checked that these do not have any of the FEAT_SPE* features, so updated the tests accordingly. All other Arm-designed v8.7A+ and v9.2A+ CPUs should continue to have it enabled. For Ampere1B and Fujitsu Monaka, these CPUs do not have the feature, so I've removed it from their tests. For Apple M4, I haven't found any reference for whether that CPU should have this feature, so I've added it to the CPU definition to avoid this being a functional change.	2025-01-21 10:12:36 +00:00

9662 Commits