1759 Commits

Author SHA1 Message Date
Orlando Cazalet-Hyams
9459c8309c
[KeyInstr][Clang] Add ApplyAtomGroup (#134632)
This is a scoped helper similar to ApplyDebugLocation that creates a new source
location atom group which instructions can be added to.

A source atom is a source construct that is "interesting" for debug stepping
purposes. We use an atom group number to track the instruction(s) that implement
the functionality for the atom, plus backup instructions/source locations.

This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.

RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668

The feature is only functional in LLVM if LLVM is built with the CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS. Eventually that flag will be removed.
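
For illustration, a minimal self-contained sketch of the scoped-helper shape
described here (names and members are illustrative assumptions, not the
upstream definition):

```
#include <cstdint>

struct DebugInfoState {
  uint64_t CurAtomGroup = 0;  // group that newly emitted instructions join
  uint64_t NextAtomGroup = 1; // next unallocated group number
};

class ApplyAtomGroup {
  DebugInfoState &DI;
  uint64_t Saved; // group to restore when the scope ends
public:
  explicit ApplyAtomGroup(DebugInfoState &DI)
      : DI(DI), Saved(DI.CurAtomGroup) {
    DI.CurAtomGroup = DI.NextAtomGroup++; // open a fresh source atom group
  }
  ~ApplyAtomGroup() { DI.CurAtomGroup = Saved; } // restore on scope exit
};
```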
2025-05-21 17:40:45 +01:00
Thurston Dang
b24c33a9d7
[cfi] Enable -fsanitize-annotate-debug-info functionality for CFI checks (#139809)
This connects the -fsanitize-annotate-debug-info plumbing (https://github.com/llvm/llvm-project/pull/138577) to CFI check codegen, using SanitizerAnnotateDebugInfo() (https://github.com/llvm/llvm-project/pull/139965) and SanitizerInfoFromCFIKind (https://github.com/llvm/llvm-project/pull/140117).

Note: SanitizerAnnotateDebugInfo() is made public because it is needed in ItaniumCXXABI.

Updates the tests from https://github.com/llvm/llvm-project/pull/139149.
2025-05-19 09:39:26 -07:00
Thurston Dang
5defe490c9
[sanitizer][NFCI] Add 'SanitizerAnnotateDebugInfo' (#139965)
This generalizes the debug info annotation code from https://github.com/llvm/llvm-project/pull/139149 and moves it into a helper function, SanitizerAnnotateDebugInfo().

Future work can use 'ApplyDebugLocation ApplyTrapDI(*this, SanitizerAnnotateDebugInfo(Ordinal));' to add annotations to additional checks.
2025-05-15 09:24:04 -07:00
Bill Wendling
9ae3bce175
[Clang][counted_by] Add support for 'counted_by' on struct pointers (#137250)
The 'counted_by' attribute is now available for pointers in structs. It
generates code for sanity checks as well as __builtin_dynamic_object_size()
calculations. For example:

  struct annotated_ptr {
    int count;
    char *buf __attribute__((counted_by(count)));
  };

If the pointer's type is 'void *', use the 'sized_by' attribute, which
works similarly to 'counted_by', but can handle the 'void' base type:

  struct annotated_ptr {
    int count;
    void *buf __attribute__((sized_by(count)));
  };

If the 'count' field member occurs after the pointer, use the
'-fexperimental-late-parse-attributes' flag during compilation.

Note that 'counted_by' cannot be applied to a pointer to an incomplete
type, because the size isn't known.

  struct foo;
  struct annotated_ptr {
    int count;
    struct foo *buf __attribute__((counted_by(count))); /* invalid */
  };
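
For a sense of the __builtin_dynamic_object_size() side, a hedged sketch (the
helper function below is illustrative, not from the patch):

```
#include <stddef.h>

struct annotated_char_ptr {
  int count;
  char *buf __attribute__((counted_by(count)));
};

size_t buf_size(struct annotated_char_ptr *p) {
  /* with counted_by, this can evaluate to p->count * sizeof(*p->buf)
     instead of returning (size_t)-1 (unknown) */
  return __builtin_dynamic_object_size(p->buf, 0);
}
```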

Signed-off-by: Bill Wendling <morbo@google.com>
2025-05-13 16:01:36 -07:00
Matt Arsenault
a11d86461e
clang: Fix broken implicit cast to generic address space (#138863)
This fixes emitting undefined behavior where a 64-bit generic
pointer is written to a 32-bit slot allocated for a private pointer.
This can be seen in test/CodeGenOpenCL/amdgcn-automatic-variable.cl's
wrong_pointer_alloca.
2025-05-08 07:51:57 +02:00
Shafik Yaghmour
47681736cb
[Clang][NFC] Explicitly delete copy ctor and assignment for CGAtomicOptionsRAII (#137275)
Static analysis flagged CGAtomicOptionsRAII as having an explicit destructor
but no explicit copy ctor and assignment; the rule of three says we should
have them. We are just using this as an RAII object; there is no need for
either, so they are specified as deleted.
2025-05-01 07:51:58 -07:00
Vitaly Buka
c3000333cd
Revert "[Reland][Clang][CodeGen][UBSan] Add more precise attributes to recoverable ubsan handlers" (#136402)
Reverts llvm/llvm-project#135135

Breaks several bots, details in #135135.
2025-04-18 22:14:03 -07:00
Yingwei Zheng
909a9feda9
[Reland][Clang][CodeGen][UBSan] Add more precise attributes to recoverable ubsan handlers (#135135)
This patch relands https://github.com/llvm/llvm-project/pull/130990.
If the check value is passed by reference, add `memory(read)`.

Original PR description:

This patch adds `memory(argmem: read, inaccessiblemem: readwrite)` to
**recoverable** ubsan handlers in order to unblock some
memory/loop optimizations. It provides an average of 3% performance
improvement on llvm-test-suite (except for 49 test failures due to ubsan
diagnostics).
2025-04-17 23:23:30 +08:00
Akira Hatanaka
a3283a92ae
[PAC] Add support for __ptrauth type qualifier (#100830)
The qualifier allows the programmer to directly control how pointers are
signed when they are stored in a particular variable.

The qualifier takes three arguments: the signing key, a flag specifying
whether address discrimination should be used, and a non-negative
integer that is used for additional discrimination.

```
typedef void (*my_callback)(const void*);
my_callback __ptrauth(ptrauth_key_process_dependent_code, 1, 0xe27a) callback;
```
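
A hedged usage sketch building on the example above (the store and call sites
are illustrative):

```
#include <ptrauth.h>

typedef void (*my_callback)(const void *);
my_callback __ptrauth(ptrauth_key_process_dependent_code, 1, 0xe27a) callback;

void set_callback(my_callback cb) {
  callback = cb; /* re-signed using the qualifier's key and discriminator */
}

void fire(const void *arg) {
  callback(arg); /* authenticated against the same schema before the call */
}
```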

Co-Authored-By: John McCall rjmccall@apple.com
2025-04-15 12:54:25 -07:00
Jan Górski
ff687af04f
[clang][CodeGen] Add range metadata for atomic load of boolean type. #131476 (#133546)
Fixes #131476.

For `x86_64` it folds
```
movzbl	t1(%rip), %eax
andb	$1, %al
```
into
```
movzbl	t1(%rip), %eax
```
when compiled with `clang -S atomic-ops-load.c -o atomic-ops-load.s -O1
--target=x86_64`.

But for riscv it replaces:
```
lb	a0, %lo(t1)(a0)
andi	a0, a0, 1
```
with
```
lb	a0, %lo(t1)(a0)
zext.b	a0, a0
``` 
when compiled with `clang -S atomic-ops-load.c -o atomic-ops-load.s -O1
--target=riscv64`.
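
A hedged sketch of the kind of source involved (assuming atomic-ops-load.c is
along these lines):

```
#include <stdatomic.h>

_Atomic _Bool t1;

_Bool load_flag(void) {
  /* the emitted atomic load now carries range metadata restricting the
     loaded byte to {0, 1}, so the masking after the load can be folded */
  return atomic_load(&t1);
}
```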
2025-04-14 14:26:10 -07:00
Yingwei Zheng
04c38981a9
[Clang][CodeGen] Do not set inbounds flag in EmitMemberDataPointerAddress when the base pointer is null (#130952)
See also https://github.com/llvm/llvm-project/pull/130734 for the
original motivation.

This pattern (`container_of`) is also widely used by real-world programs.
Examples:

1d89d7d5d7/llvm/include/llvm/IR/SymbolTableListTraits.h (L77-L87)

a2a53cb728/src/util-inl.h (L134-L137)
https://github.com/search?q=*%29nullptr-%3E*&type=code
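
The pattern in question is the classic container_of, roughly:

```
#include <stddef.h>

/* recover a pointer to the enclosing struct from a pointer to one of its
   members; the pointer arithmetic must not be poisoned when the member
   pointer is derived from a null base */
#define container_of(ptr, type, member) \
  ((type *)((char *)(ptr) - offsetof(type, member)))
```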
2025-04-11 10:51:08 +08:00
Yingwei Zheng
1711996805
[Clang][CodeGen] Do not set inbounds flag for struct GEP with null base pointers (#130734)
In the LLVM middle-end we want to fold `gep inbounds null, idx -> null`:
https://alive2.llvm.org/ce/z/5ZkPx-
This pattern is common in real-world programs
(https://github.com/dtcxzyw/llvm-opt-benchmark/pull/55#issuecomment-1870963906).
Generally, it exists in some (actually) unreachable blocks, which is
introduced by JumpThreading.

However, some old-style offsetof macros are still widely used in real-world
C/C++ code (e.g., hwloc/slurm/luajit). To avoid breaking existing code and
inconveniencing downstream users, this patch removes the inbounds flag from
the struct gep if the base pointer is null.
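
An old-style offsetof macro of the kind this keeps working looks roughly like:

```
/* computes a member offset by forming an address on a null base pointer;
   the struct GEP emitted for this must not be marked inbounds */
#define old_offsetof(type, member) ((unsigned long)&(((type *)0)->member))
```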
2025-04-11 09:04:23 +08:00
Aaron Ballman
5c8ba28c75
[C11] Implement WG14 N1285 (temporary lifetimes) (#133472)
This feature largely models the same behavior as in C++11. It is
technically a breaking change between C99 and C11, so the paper is not
being backported to older language modes.

One difference between C++ and C is that things which are rvalues in C
are often lvalues in C++ (such as the result of a ternary operator or a
comma operator).

Fixes #96486
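
A small illustration of the rule (a sketch, not taken from the patch's tests):

```
struct S { int a[4]; };
struct S make(void);

int first(void) {
  /* under WG14 N1285 the object returned by make() has temporary lifetime
     extending to the end of the full expression, so this access is valid */
  return make().a[0];
}
```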
2025-04-10 08:12:14 -04:00
Orlando Cazalet-Hyams
308654608c [Clang][NFC] Move some static functions into CodeGenFunction (#134634)
Patches in the Key Instructions (KeyInstr) stack need to access CGF in these
functions. Two CGF fields are passed to these functions already; at this point
it felt natural to promote them to CGF methods.
2025-04-08 08:44:10 +01:00
Farzon Lotfi
16c84c4475
[DirectX] Add target builtins (#134439)
- Fixes #132303
- Moves dot2add from a language builtin to a target builtin.
- Sets up the scaffolding for Sema checks for DX builtins.
- Sets up the DirectX backend to be able to have target builtins.
- Adds a DX TargetBuiltins emitter in
  `clang/lib/CodeGen/TargetBuiltins/DirectX.cpp`.
2025-04-07 12:06:57 -04:00
Nikita Popov
b384d6d6cc
[CodeGen] Don't include CGDebugInfo.h in CodeGenFunction.h (NFC) (#134100)
This is an expensive header, only include it where needed. Move some
functions out of line to achieve that.

This reduces time to build clang by ~0.5% in terms of instructions
retired.
2025-04-03 08:04:19 +02:00
Lukacma
6c3adaafe3
[AARCH64][Neon] switch to using bitcasts in arm_neon.h where appropriate (#127043)
Currently arm_neon.h emits C-style casts to do vector type casts. This relies
on implicit conversion between vector types, which is currently deprecated
behaviour and will soon disappear. To ensure NEON code keeps working
afterwards, this patch changes all these vector type casts into bitcasts.


Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>
2025-04-01 09:45:16 +01:00
Younan Zhang
f4218753ad
[Clang] Implement P0963R3 "Structured binding declaration as a condition" (#130228)
This implements the R2 semantics of P0963.

The R1 semantics, as outlined in the paper, were introduced in Clang 6.
In addition to that, the paper proposes swapping the evaluation order of
condition expressions and the initialization of binding declarations
(i.e. std::tuple-like decompositions).
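
A hedged illustration of the behavior (types and names are made up; requires a
-std=c++2c-era mode):

```
struct Divmod {
  int quot;
  int rem;
  explicit operator bool() const { return rem == 0; } // the condition value
};

Divmod divmod(int a, int b) { return {a / b, a % b}; }

int f(int a, int b) {
  // P0963: the structured binding declaration itself is the condition; the
  // whole Divmod object is converted to bool after the bindings are
  // initialized (R2 pins down the evaluation order for tuple-like cases).
  if (auto [q, r] = divmod(a, b))
    return q;
  return 0;
}
```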
2025-03-11 15:41:56 +08:00
erichkeane
d5cec386c1 [OpenACC] Implement 'cache' construct AST/Sema
This statement level construct takes no clauses and has no associated
statement, and simply labels a number of array elements as valid for
caching. The implementation here is pretty simple, but it is a touch of
a special case for parsing, so the parsing code reflects that.
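
A hedged example of the construct's shape (per OpenACC, 'cache' appears at the
top of a loop body; the subarray syntax is start:length):

```
void smooth(float *a, int n) {
  #pragma acc parallel loop
  for (int i = 1; i < n - 1; ++i) {
    #pragma acc cache(a[i - 1 : 3]) /* mark these elements as cacheable */
    a[i] = 0.5f * (a[i - 1] + a[i + 1]);
  }
}
```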
2025-03-03 13:57:23 -08:00
Yaxun (Sam) Liu
240f2269ff
Add clang atomic control options and attribute (#114841)
Add an option and statement attribute for controlling the emission of
target-specific metadata to atomicrmw instructions in IR.

The RFC for this attribute and option is
https://discourse.llvm.org/t/rfc-add-clang-atomic-control-options-and-pragmas/80641.
Originally a pragma was proposed, then it was changed to a clang attribute.

The attribute allows users to specify one, two, or all three options and must
be applied to a compound statement. The attribute can also be nested, with
inner attributes overriding the options specified by outer attributes or the
target's default options. These options then determine the target-specific
metadata added to atomic instructions in the IR.

In addition to the attribute, three new compiler options are introduced:
`-f[no-]atomic-remote-memory`, `-f[no-]atomic-fine-grained-memory`, and
`-f[no-]atomic-ignore-denormal-mode`. These compiler options allow users to
override the default options through the Clang driver and front end.
`-m[no-]unsafe-fp-atomics` is aliased to `-f[no-]atomic-ignore-denormal-mode`.

In terms of implementation, the atomic attribute is represented in the AST by
the existing AttributedStmt, with minimal changes to AST and Sema.

During code generation in Clang, the CodeGenModule maintains the current
atomic options, which are used to emit the relevant metadata for atomic
instructions. RAII is used to manage the saving and restoring of atomic
options when entering and exiting nested AttributedStmt.
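
A hedged usage sketch; the option spellings follow the RFC and should be
treated as assumptions:

```
/* assumes the statement attribute is spelled [[clang::atomic(...)]] with
   no_remote_memory / no_fine_grained_memory / ignore_denormal_mode style
   options, per the RFC; requires C23 attribute syntax */
void add_relaxed(_Atomic float *p, float v) {
  [[clang::atomic(no_remote_memory, no_fine_grained_memory)]] {
    *p += v; /* the atomic RMW emitted here carries the implied metadata */
  }
}
```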
2025-02-27 10:41:04 -05:00
Chris B
761d422441
[HLSL] Implement HLSL initialization list support (#123141)
This PR implements HLSL's initialization list behavior as specified in
the draft language specification under

[*Decl.Init.Agg*](https://microsoft.github.io/hlsl-specs/specs/hlsl.html#Decl.Init.Agg).

This behavior is a bit unusual for C/C++ because intermediate braces in
initializer lists are ignored and a whole array of additional conversions
occur that are unintuitive compared to how initialization works in C.

The implementation in this PR generates a valid C/C++ initialization list
AST for the HLSL initializer so that no changes are required to Clang's
CodeGen to support this. This design will also allow us to use Clang's
rewriter to convert HLSL initializers to equivalent, valid C/C++
initializers. It does have the downside that it will often generate
redundant accesses during codegen. The IR optimizer is extremely good at
eliminating those, so this should have no impact on final executable
performance.

There is some opportunity for optimizing the initializer list generation
that we could consider in subsequent commits. One notable opportunity would
be to identify aggregate objects that occur in the same place in both
initializers and do not require conversion; those aggregates could be
initialized as aggregates rather than fully scalarized.

Closes #56067

---------

Co-authored-by: Finn Plummer <50529406+inbelic@users.noreply.github.com>
Co-authored-by: Helena Kotas <hekotas@microsoft.com>
Co-authored-by: Justin Bogner <mail@justinbogner.com>
2025-02-15 13:21:36 -06:00
Zahira Ammarguellat
cf69b4c668
[Clang] [OpenMP] Add support for '#pragma omp stripe'. (#126927)
This patch was reviewed and approved here:
https://github.com/llvm/llvm-project/pull/119891
However it has been reverted here:
083df25dc2
due to a build issue here:
https://lab.llvm.org/buildbot/#/builders/51/builds/10694

This patch is reintroducing the support.
2025-02-13 07:14:36 -05:00
Peter Rong
53c618c071
[clang] run clang-format on some CGObjC files (#126644)
These files are relatively old and don't conform to our formatting rules.
It's hard to change them without massive clang-format changes.

---------

Signed-off-by: Peter Rong <PeterRong@meta.com>
2025-02-12 11:52:49 -08:00
Kazu Hirata
67e1e98811 Revert "[Clang] [OpenMP] Add support for '#pragma omp stripe'. (#119891)"
This reverts commit 070f84ebc89b11df616a83a56df9ac56efbab783.

Buildbot failure:
https://lab.llvm.org/buildbot/#/builders/51/builds/10694
2025-02-11 12:39:01 -08:00
Zahira Ammarguellat
070f84ebc8
[Clang] [OpenMP] Add support for '#pragma omp stripe'. (#119891)
Implement basic parsing and semantic support for the `#pragma omp stripe`
construct introduced in the OpenMP 6.0 specification
(https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-6-0.pdf),
section 11.7.
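
A hedged example of the construct (assuming the tile-like `sizes` clause from
the 6.0 spec):

```
void scale(int n, float a, float *x) {
  #pragma omp stripe sizes(64) /* split the loop into strips of 64 iterations */
  for (int i = 0; i < n; ++i)
    x[i] *= a;
}
```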
2025-02-11 13:58:21 -05:00
Sarah Spall
3f8e280206
[HLSL] Implement HLSL Elementwise casting (excluding splat cases); Re-land #118842 (#126258)
Implement HLSLElementwiseCast, excluding support for splat cases.
Do not support casting types that contain bitfields.
Partly closes https://github.com/llvm/llvm-project/issues/100609 and
partly closes https://github.com/llvm/llvm-project/issues/100619.
Re-lands #118842 after fixing a warning-as-error found by a buildbot.
2025-02-07 09:12:55 -08:00
Sarah Spall
14716f2e4b
Revert "[HLSL] Implement HLSL Flat casting (excluding splat cases)" (#126149)
Reverts llvm/llvm-project#118842
2025-02-06 15:25:20 -08:00
Sarah Spall
01072e546f
[HLSL] Implement HLSL Flat casting (excluding splat cases) (#118842)
Implement HLSLElementwiseCast, excluding support for splat cases.
Do not support casting types that contain bitfields.
Partly closes #100609 and partly closes #100619.
2025-02-06 14:38:01 -08:00
erichkeane
99a9133a68 [OpenACC] Implement Sema/AST for 'atomic' construct
The atomic construct is a particularly complicated one. The directive
itself is pretty simple: it has 5 options for the 'atomic-clause'.
However, the associated statement is fairly complicated.

'read' accepts:
  v = x;
'write' accepts:
  x = expr;
'update' (or no clause) accepts:
  x++;
  x--;
  ++x;
  --x;
  x binop= expr;
  x = x binop expr;
  x = expr binop x;

'capture' accepts either a compound statement, or:
  v = x++;
  v = x--;
  v = ++x;
  v = --x;
  v = x binop= expr;
  v = x = x binop expr;
  v = x = expr binop x;

If 'capture' has a compound statement, it accepts:
  {v = x; x binop= expr; }
  {x binop= expr; v = x; }
  {v = x; x = x binop expr; }
  {v = x; x = expr binop x; }
  {x = x binop expr ;v = x; }
  {x = expr binop x; v = x; }
  {v = x; x = expr; }
  {v = x; x++; }
  {v = x; ++x; }
  {x++; v = x; }
  {++x; v = x; }
  {v = x; x--; }
  {v = x; --x; }
  {x--; v = x; }
  {--x; v = x; }

While these are all quite complicated, there is a significant amount
of similarity between the 'capture' and 'update' lists, so this patch
reuses a lot of the same functions.

This patch implements the entirety of 'atomic', creating a new Sema file
for it, as the code is fairly sizable.
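
For example, one of the accepted 'capture' compound forms:

```
int x, v;

void capture_add(int expr) {
  #pragma acc atomic capture
  {
    v = x;     /* capture the old value */
    x += expr; /* then update atomically */
  }
}
```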
2025-02-03 07:22:22 -08:00
Bill Wendling
cff0a460ae
[Clang][counted_by] Refactor __builtin_dynamic_object_size on FAMs (#122198)
Refactoring of how __builtin_dynamic_object_size() is calculated for
flexible array members (in preparation for adding support for the
'counted_by' attribute on pointers in structs).

The only functionality change is that we use the already emitted Expr
code to build our calculations off of rather than re-emitting the Expr.
That allows the 'StructFieldAccess' visitor to sift through all casts
and ArraySubscriptExprs to find the first MemberExpr. We build our GEPs
and calculate offsets based off of relative distances from that
MemberExpr.

The testcase passes execution tests.

Calculate the flexible array member's object size using these formulae
(note: if the calculation is negative, we return 0):

     struct p;
     struct s {
         /* ... */
         int count;
         struct p *array[] __attribute__((counted_by(count)));
     };   

1) 'ptr->array':

   count = ptr->count;

   flexible_array_member_base_size = sizeof (*ptr->array);
   flexible_array_member_size =
           count * flexible_array_member_base_size;

   if (flexible_array_member_size < 0) 
       return 0;
   return flexible_array_member_size;

2) '&ptr->array[idx]':

   count = ptr->count;
   index = idx; 

   flexible_array_member_base_size = sizeof (*ptr->array);
   flexible_array_member_size =
           count * flexible_array_member_base_size;

   index_size = index * flexible_array_member_base_size;

   if (flexible_array_member_size < 0 || index < 0) 
       return 0;
   return flexible_array_member_size - index_size;

3) '&ptr->field':

   count = ptr->count;
   sizeof_struct = sizeof (struct s);

   flexible_array_member_base_size = sizeof (*ptr->array);
   flexible_array_member_size =
           count * flexible_array_member_base_size;

   field_offset = offsetof (struct s, field);
   offset_diff = sizeof_struct - field_offset;

   if (flexible_array_member_size < 0) 
       return 0;
   return offset_diff + flexible_array_member_size;

4) '&ptr->field_array[idx]':

   count = ptr->count;
   index = idx; 
   sizeof_struct = sizeof (struct s);

   flexible_array_member_base_size = sizeof (*ptr->array);
   flexible_array_member_size =
           count * flexible_array_member_base_size;

   field_base_size = sizeof (*ptr->field_array);
   field_offset = offsetof (struct s, field);
   field_offset += index * field_base_size;

   offset_diff = sizeof_struct - field_offset;

   if (flexible_array_member_size < 0 || index < 0) 
       return 0;
   return offset_diff + flexible_array_member_size;

---------

Signed-off-by: Bill Wendling <morbo@google.com>
2025-01-30 15:36:13 -08:00
Stephen Tozer
822f74a911
[Clang] Cleanup docs and comments relating to -fextend-variable-liveness (#124767)
This patch contains a number of changes relating to the above flag;
primarily it updates comment references to the old flag names,
"-fextend-lifetimes" and "-fextend-this-ptr" to refer to the new names,
"-fextend-variable-liveness[={all,this}]". These changes are all NFC.

This patch also removes the explicit -fextend-this-ptr-liveness flag
alias, and shortens the help-text for the main flag; these are both
changes that were meant to be applied in the initial PR (#110000), but
due to some user-error on my part they were not included in the merged
commit.
2025-01-28 18:25:32 +00:00
Wolfgang Pieb
4424c44c8c [Clang] Add fake use emission to Clang with -fextend-lifetimes (#110102)
Following the previous patch, which added the "extend lifetimes" flag with
(almost) no functionality, this patch adds the real feature by
allowing Clang to emit fake uses. These are emitted as a new form of cleanup,
set for variable addresses, which just emits a fake use intrinsic when the
variable falls out of scope. The code for achieving this is simple, with most
of the logic centered on determining whether to emit a fake use for a given
address, and on ensuring that fake uses are ignored in a few cases.
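
A hedged sketch of the effect at the IR level (the exact emission lives in the
cleanup machinery):

```
int compute(void);
int use(int);

int f(void) {
  int x = compute();
  int r = use(x);
  return r;
  /* with -fextend-variable-liveness, when x falls out of scope a fake use
     is emitted, roughly: call void (...) @llvm.fake.use(i32 %x)
     keeping x's value available to a debugger until this point */
}
```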

Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
2025-01-28 12:30:31 +00:00
Momchil Velikov
f75860f895
[AArch64] Implement NEON FP8 intrinsics for fused multiply-add (#123615)
This patch adds the following intrinsics:

* Fused multiply-add non-indexed

float16x8_t vmlalbq_f16_mf8_fpm(float16x8_t, mfloat8x16_t, mfloat8x16_t, fpm_t)
float16x8_t vmlaltq_f16_mf8_fpm(float16x8_t, mfloat8x16_t, mfloat8x16_t, fpm_t)

float32x4_t vmlallbbq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t)
float32x4_t vmlallbtq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t)
float32x4_t vmlalltbq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t)
float32x4_t vmlallttq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t)

* Floating-point multiply-add long to half-precision (vector, by
element)

float16x8_t vmlalbq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vmlalbq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vmlaltq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vmlaltq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
    
* Floating-point multiply-add long-long to single-precision (vector, by
element)

float32x4_t vmlallbbq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallbbq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallbtq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallbtq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlalltbq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlalltbq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallttq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallttq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
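
A hedged usage sketch for one of the non-indexed forms (assumes a target with
the relevant FP8 multiply-add features enabled):

```
#include <arm_neon.h>

float16x8_t mla_bottom(float16x8_t acc, mfloat8x16_t vn, mfloat8x16_t vm,
                       fpm_t fpm) {
  /* fused multiply-add long from the bottom FP8 halves, per the list above */
  return vmlalbq_f16_mf8_fpm(acc, vn, vm, fpm);
}
```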
2025-01-28 00:38:44 +00:00
Momchil Velikov
804b81d39f
[AArch64] Add FP8 Neon intrinsics for dot-product (#123613)
This patch adds the following intrinsics:

float16x4_t vdot_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x8_t vm, fpm_t fpm)
float16x8_t vdotq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, fpm_t fpm)

float16x4_t vdot_lane_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x4_t vdot_laneq_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vdotq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vdotq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)

float32x2_t vdot_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x8_t vm, fpm_t fpm)
float32x4_t vdotq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, fpm_t fpm)

float32x2_t vdot_lane_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x2_t vdot_laneq_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vdotq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vdotq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
2025-01-27 21:14:16 +00:00
Momchil Velikov
99bd2e3f12
[AArch64] Add Neon FP8 conversion intrinsics (#123612)
The patch adds the following intrinsics:

    bfloat16x8_t vcvt1_bf16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
    bfloat16x8_t vcvt1_low_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    bfloat16x8_t vcvt2_bf16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
    bfloat16x8_t vcvt2_low_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    
    bfloat16x8_t vcvt1_high_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    bfloat16x8_t vcvt2_high_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    
    float16x8_t vcvt1_f16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
    float16x8_t vcvt1_low_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    float16x8_t vcvt2_f16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
    float16x8_t vcvt2_low_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    
    float16x8_t vcvt1_high_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    float16x8_t vcvt2_high_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    
    mfloat8x8_t vcvt_mf8_f32_fpm(float32x4_t vn, float32x4_t vm, fpm_t fpm)
    mfloat8x16_t vcvt_high_mf8_f32_fpm(mfloat8x8_t vd, float32x4_t vn, float32x4_t vm, fpm_t fpm)

    mfloat8x8_t vcvt_mf8_f16_fpm(float16x4_t vn, float16x4_t vm, fpm_t fpm)
    mfloat8x16_t vcvtq_mf8_f16_fpm(float16x8_t vn, float16x8_t vm, fpm_t fpm)

Co-Authored-By: Caroline Concatto <caroline.concatto@arm.com>
2025-01-27 17:32:47 +00:00
Jeremy Morse
e14962a39c
[NFC][DebugInfo] Use iterators for instruction insertion in more places (#124291)
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators.
This patch changes some more complex call-sites, those crossing file
boundaries and where I've had to perform some minor rewrites.
2025-01-27 15:25:17 +00:00
Tom Honermann
8fb42300a0
[SYCL] AST support for SYCL kernel entry point functions. (#122379)
A SYCL kernel entry point function is a non-member function or a static
member function declared with the `sycl_kernel_entry_point` attribute.
Such functions define a pattern for an offload kernel entry point
function to be generated to enable execution of a SYCL kernel on a
device. A SYCL library implementation orchestrates the invocation of
these functions with corresponding SYCL kernel arguments in response to
calls to SYCL kernel invocation functions specified by the SYCL 2020
specification.

The offload kernel entry point function (sometimes referred to as the
SYCL kernel caller function) is generated from the SYCL kernel entry
point function by a transformation of the function parameters followed
by a transformation of the function body to replace references to the
original parameters with references to the transformed ones. Exactly how
parameters are transformed will be explained in a future change that
implements non-trivial transformations. For now, it suffices to state
that a given parameter of the SYCL kernel entry point function may be
transformed to multiple parameters of the offload kernel entry point as
needed to satisfy offload kernel argument passing requirements.
Parameters that are decomposed in this way are reconstituted as local
variables in the body of the generated offload kernel entry point
function.

For example, given the following SYCL kernel entry point function
definition:
```
template<typename KernelNameType, typename KernelType>
[[clang::sycl_kernel_entry_point(KernelNameType)]]
void sycl_kernel_entry_point(KernelType kernel) {
  kernel();
}
```

and the following call:
```
struct Kernel {
  int dm1;
  int dm2;
  void operator()() const;
};
Kernel k;
sycl_kernel_entry_point<class kernel_name>(k);
```

the corresponding offload kernel entry point function that is generated
might look as follows (assuming `Kernel` is a type that requires
decomposition):
```
void offload_kernel_entry_point_for_kernel_name(int dm1, int dm2) {
  Kernel kernel{dm1, dm2};
  kernel();
}
```

Other details of the generated offload kernel entry point function, such
as its name and calling convention, are implementation details that need
not be reflected in the AST and may differ across target devices. For
that reason, only the transformation described above is represented in
the AST; other details will be filled in during code generation.

These transformations are represented using new AST nodes introduced
with this change. `OutlinedFunctionDecl` holds a sequence of
`ImplicitParamDecl` nodes and a sequence of statement nodes that
correspond to the transformed parameters and function body.
`SYCLKernelCallStmt` wraps the original function body and associates it
with an `OutlinedFunctionDecl` instance. For the example above, the AST
generated for the `sycl_kernel_entry_point<kernel_name>` specialization
would look as follows:
```
FunctionDecl 'sycl_kernel_entry_point<kernel_name>(Kernel)'
  TemplateArgument type 'kernel_name'
  TemplateArgument type 'Kernel'
  ParmVarDecl kernel 'Kernel'
  SYCLKernelCallStmt
    CompoundStmt
      <original statements>
    OutlinedFunctionDecl
      ImplicitParamDecl 'dm1' 'int'
      ImplicitParamDecl 'dm2' 'int'
      CompoundStmt
        VarDecl 'kernel' 'Kernel'
          <initialization of 'kernel' with 'dm1' and 'dm2'>
        <transformed statements with redirected references of 'kernel'>
```

Any ODR-use of the SYCL kernel entry point function will (with future
changes) suffice for the offload kernel entry point to be emitted. An
actual call to the SYCL kernel entry point function will result in a
call to the function. However, evaluation of a `SYCLKernelCallStmt`
statement is a no-op, so such calls will have no effect other than to
trigger emission of the offload kernel entry point.

Additionally, as a related change inspired by code review feedback,
these changes disallow use of the `sycl_kernel_entry_point` attribute
with functions defined with a _function-try-block_. The SYCL 2020
specification prohibits the use of C++ exceptions in device functions.
Even if exceptions were not prohibited, it is unclear what the semantics
would be for an exception that escapes the SYCL kernel entry point
function; the boundary between host and device code could be an implicit
noexcept boundary that results in program termination if violated, or
the exception could perhaps be propagated to host code via the SYCL
library. Pending support for C++ exceptions in device code and clear
semantics for handling them at the host-device boundary, this change
makes use of the `sycl_kernel_entry_point` attribute with a function
defined with a _function-try-block_ an error.
2025-01-22 16:39:08 -05:00
CHANDRA GHALE
30f9a4f754
[OpenMP] codegen support for masked combined construct parallel masked taskloop simd. (#121746)
Added codegen support for the combined masked construct `parallel masked
taskloop simd`.
Added implementation for `EmitOMPParallelMaskedTaskLoopSimdDirective`.
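
For example (a hedged sketch):

```
void scale(int n, double *a) {
  #pragma omp parallel masked taskloop simd
  for (int i = 0; i < n; ++i)
    a[i] *= 2.0;
}
```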

Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
2025-01-14 18:26:46 +05:30
joaosaffran
380bb51b70
[HLSL] Adding Flatten and Branch if attributes with test fixes (#122157)
- Adding the changes from PRs: 
  - #116331 
  - #121852 
- Fixes test `tools/dxil-dis/debug-info.ll`
- Address some missed comments in the previous PR

---------

Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
2025-01-13 10:31:25 -08:00
CHANDRA GHALE
6f558e0e12
[OpenMP] codegen support for masked combined construct masked taskloop (#121914)
Added codegen support for the combined masked construct `masked taskloop`.
Added implementation for `EmitOMPMaskedTaskLoopDirective`.

---------

Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
2025-01-13 11:42:13 +05:30
CHANDRA GHALE
1d2eea962a
[OpenMP] codegen support for masked combined construct masked taskloop simd (#121916)
Added codegen support for the combined masked construct `masked taskloop
simd`.
Added implementation for `EmitOMPMaskedTaskLoopSimdDirective`.

Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
2025-01-12 23:38:00 +05:30
Thurston Dang
55b587506e
[ubsan][NFCI] Use SanitizerOrdinal instead of SanitizerMask for EmitCheck (exactly one sanitizer is required) (#122511)
The `Checked` parameter of `CodeGenFunction::EmitCheck` is of type
`ArrayRef<std::pair<llvm::Value *, SanitizerMask>>`, which is overly
generalized: SanitizerMask can denote that zero or more sanitizers are
enabled, but `EmitCheck` requires that exactly one sanitizer is
specified in the SanitizerMask (e.g.,
`SanitizeTrap.has(Checked[i].second)` enforces that).

This patch replaces SanitizerMask with SanitizerOrdinal in the `Checked`
parameter of `EmitCheck` and code that transitively relies on it. This
should not affect the behavior of UBSan, but it has the advantages that:
- the code is clearer: it avoids ambiguity in EmitCheck about what to do
if multiple bits are set
- specifying the wrong number of sanitizers in `Checked[i].second` will
be detected as a compile-time error, rather than a runtime assertion
failure

Suggested by Vitaly in https://github.com/llvm/llvm-project/pull/122392
as an alternative to adding an explicit runtime assertion that the
SanitizerMask contains exactly one sanitizer.
2025-01-10 12:40:57 -08:00
CHANDRA GHALE
aedb30fdc7
[OpenMP] codegen support for masked combined construct parallel masked taskloop (#121741)
Added codegen support for the combined masked construct parallel masked
taskloop.
Added implementation for EmitOMPParallelMaskedTaskLoopDirective.

---------

Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
2025-01-09 16:38:36 +05:30
NAKAMURA Takumi
397ac44f62
[Coverage] Introduce the type CounterPair for RegionCounterMap. NFC. (#112724)
`CounterPair` can hold `<uint32_t, uint32_t>` instead of the current
`unsigned`, so it can also hold the counter number of SkipPath. For now,
this change provides the skeleton and only `CounterPair::Executed` is used.

Each counter number can be `None` to suppress emitting the counter
increment. The 2nd element, `Skipped`, is initialized as `None` by default,
since most `Stmt*` don't have a pair of counters.

This change also provides stubs for the verifier. I'll provide the
implementation of the verifier for `+Asserts` later.

`markStmtAsUsed(bool, Stmt*)` may be used to inform that the other side's
counter may not be emitted.

`markStmtMaybeUsed(S)` may be used for a `Stmt` whose inner statements will
be excluded from emission when skipped by constant folding. I put it into
the places where I found such cases.

`verifyCounterMap()` will check the coverage map against the counter map,
and can be used to report inconsistencies.

These verifier methods shall be eliminated in `-Asserts` builds.


https://discourse.llvm.org/t/rfc-integrating-singlebytecoverage-with-branch-coverage/82492
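
A minimal sketch of the shape described above (not the exact LLVM definition):

```
#include <cstdint>
#include <optional>

struct CounterPair {
  std::optional<uint32_t> Executed; // counter for the executed path; empty
                                    // suppresses the counter increment
  std::optional<uint32_t> Skipped;  // counter for the skip path; empty by
                                    // default since most Stmts have no pair
};
```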
2025-01-09 17:11:07 +09:00
Chris B
b66f6b25cb
Revert #116331 & #121852 (#122105) 2025-01-08 08:55:02 -06:00
erichkeane
db81e8c42e [OpenACC] Initial sema implementation of 'update' construct
This executable construct has a larger list of clauses than some of the
others, plus some additional restrictions. This patch implements the AST
node, plus the 'cannot be the body of an if, while, do, switch, or label
statement' restriction. Future patches will handle the rest of the
restrictions, which are based on clauses.
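
A hedged example of the construct's shape:

```
void sync(int *a, int n, int cond) {
  #pragma acc update self(a[0:n]) if(cond) /* copy device data back to host */
}
```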
2025-01-07 08:20:20 -08:00
erichkeane
21c785d7bd [OpenACC] Implement 'set' construct sema
The 'set' construct is another fairly simple one: it doesn't have an
associated statement and only has a handful of allowed clauses. This patch
implements it and all the rules for it, allowing 3 of its 4 clauses. The
only exception is default_async, which will be implemented in a future
patch, because it isn't just being enabled; it needs a complete new
implementation.
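
A hedged example (default_async deliberately omitted, per the note above):

```
void pick_device(int dev) {
  #pragma acc set device_num(dev) /* 'set' has no associated statement */
}
```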
2025-01-06 11:03:18 -08:00
joaosaffran
0d5c07285f
[HLSL] Adding Flatten and Branch if attributes (#116331)
- adding Flatten and Branch attributes to if stmt.
- adding dxil control flow hint metadata generation
- modifying spirv OpSelectMerge to account for the specific attributes.

Closes #70112

---------

Co-authored-by: Joao Saffran <jderezende@microsoft.com>
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
2025-01-06 10:27:02 -08:00
Farzon Lotfi
21edac25f0
[SPIRV] Add Target Builtins using Distance ext as an example (#121598)
- Update the PR labeler so new SPIRV files get properly labeled.
- Add a distance target builtin to BuiltinsSPIRV.td.
- Update TargetBuiltins.h to account for spirv builtins.
- Update clang basic CMakeLists.txt to build the spirv builtin tablegen.
- Hook up sema for SPIRV in Sema.h|cpp, SemaSPIRV.h|cpp, and
SemaChecking.cpp.
- Hook up spirv target builtins to the SPIR.h|SPIR.cpp target.
- Update CGBuiltin.cpp to emit spirv intrinsics when we get the expected
spirv target builtin.

Consensus was reached in this RFC to add both target builtins and pattern
matching:
https://discourse.llvm.org/t/rfc-add-targetbuiltins-for-spirv-to-support-hlsl/83329.

Pattern matching will come in a separate PR; this one just sets up the
groundwork to do target builtins for spirv.

Partially resolves
[#99107](https://github.com/llvm/llvm-project/issues/99107).
2025-01-06 11:37:20 -05:00
Sameer Sahasrabuddhe
df67e37e37
[clang][NFC] clean up the handling of convergence control tokens (#121738) 2025-01-06 21:34:11 +05:30