llvm-project

Author	SHA1	Message	Date
Joseph Huber	07896d44a3	[OpenMP] Emit aggregate kernel prototypes and remove libffi dependency (#186261 ) Summary: This PR changes the handling of the emitted kernels when targeting a CPU to be a pointer struct. The old handling emitted a standard function prototype, this necessitated a target specific ABI to call it because the signature differed with the number of arguments. Instead, this PR emits a void pointer to a naturally aligned struct, this is what APIs like `pthreads` assert. This allows us to remove all the complexity around launching host kernels and just pass the argument list.	2026-03-20 13:08:23 -05:00
Shivam Gupta	358f477720	[Clang] Fix clang crash for fopenmp statement(for) inside lambda function (#146772 ) C++ range-for statements introduce implicit variables such as `__range`, `__begin`, and `__end`. When such a loop appears inside an OpenMP loop-based directive (e.g. `#pragma omp for`) within a lambda, these implicit variables were not emitted before OpenMP privatization logic ran. OMPLoopScope assumes that loop-related variables are already present in LocalDeclMap and temporarily overrides their addresses. Since the range-for implicit variables had not yet been emitted, they were treated as newly introduced entries and later erased during restore(), leading to missing mappings and a crash during codegen. Fix this by emitting the range-for implicit variables before OpenMP privatization (setVarAddr/apply), ensuring that existing mappings are correctly overridden and restored. This fixes #146335	2026-03-03 20:55:17 +05:30
Wei Wang	9dde0a803b	[SampleProf][OMP] Handle OMP helper function name canonicalization (#178339 ) Fix an issue where `FunctionSamples::getCanonicalFnName` incorrectly canonicalizes omp helper functions to collide with the original function itself. This causes the sample loader to annotate the wrong functions. Canonicalization strips everything comes after the first dot (.), unless the function attribute "sample-profile-suffix-elision-policy" is set to "selected", in which case it only strips after the known suffixes. The helper function names have the suffixes like `.omp_outlined`. After canonicalization, the name becomes the same as the original function. Add the attribute to helper functions so that the suffixes are not stripped. This is the same fix applied previously to coroutine await suspend wrapper functions (#174881).	2026-01-30 11:43:10 -08:00
Jason-VanBeusekom	7a174c91f3	[OpenMP][clang] Register Vtables on device for indirect calls (#159856 )	2026-01-05 13:25:03 -05:00
Dave Bartolomeo	80887c7de9	[clang][NFC][diagnostics] Remove most usage of `getCustomDiagID()` from CodeGen (#172557 )	2025-12-21 08:17:19 -05:00
Roger Ferrer Ibáñez	6a5231e200	[clang][OpenMP][CodeGen] Use an else if instead of checking twice (#168776 ) These two classes are mutually exclusive so avoid doing the two checks when the first succeeded.	2025-11-21 14:59:07 +01:00
Walter J.T.V	cd4c5280c7	[Clang][OpenMP][LoopTransformations] Implement "#pragma omp fuse" loop transformation directive and "looprange" clause (#139293 ) This change implements the fuse directive, `#pragma omp fuse`, as specified in the OpenMP 6.0, along with the `looprange` clause in clang. This change also adds minimal stubs so flang keeps compiling (a full implementation in flang of this directive is still pending). --------- Co-authored-by: Roger Ferrer Ibanez <roger.ferrer@bsc.es>	2025-09-29 07:48:18 +02:00
Robert Imschweiler	814a3a6b61	[OpenMP][clang] Set num_threads 'strict' to unsupported on GPUs (#160659 ) Setting the prescriptiveness of the num_threads clause to 'strict' and having a corresponding check (with message and severity clauses) does not align well with how OpenMP should be handled for GPUs. The num_threads expression may be an arbitrary integer expression which is evaluated on the target, in correspondance to the OpenMP spec. This prevents the check from being done before launching the kernel, especially considering that the num_threads clause is associated with the parallel directive and that there may be multiple parallel directives with different num_threads clauses in a single target region. Acting on the result of the 'strict' check on the GPU would require doing I/O on the GPU, which can introduce performance regressions. Delaying any actions resulting from the 'strict' check and doing them on the host after executing the target region involves additional data copies and is not really semantically correct. For now, the 'strict' modifier for the num_threads clause and its associated message and severity clause are set to be unsupported on GPUs. Targets other than GPUs still support the aforementioned features in the context of an OpenMP target region.	2025-09-26 13:50:18 -05:00
Nick Sarnie	b6be44ad0d	[clang][OpenMP][SPIR-V] Fix addrspace of pointer kernel arguments (#157172 ) In SPIR-V, kernel arguments are not allowed to be in the Generic AS, in both Intel's internal SPIR-V offloading implementation as well as HIPSPV, `CrossWorkgroup` AS1 is used. Do the same for OMPSPV. Currently with Generic AS the `llvm-spirv` translator blows up if we are using it, and if not, the GPU runtime blows up. To get the existing logic to set the correct AS to kick in, we need to know if the function is a kernel or not at the time we first create the function that may end up as the kernel. I use the existing `arrangeSYCLKernelCallerDeclaration` function to do the right kernel ABI computation, but since the function is not specific to SYCL anymore because I merged all the device kernel clang attributes into one. Rename the function to be accurate to the current behavior, `arrangeDeviceKernelCallerDeclaration`. --------- Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>	2025-09-09 18:42:26 +00:00
Roger Ferrer Ibáñez	0833493603	[Clang][NFC] Rename OMPLoopTransformationDirective to OMPCanonicalLoopNestTransformationDirective (#155848 ) This is preparatory work for the implementation of `#pragma omp fuse` in https://github.com/llvm/llvm-project/pull/139293 Not all OpenMP loop transformations makes sense to make them inherit from `OMPLoopBasedDirective`, in particular in OpenMP 6.0 'fuse' (to be implemented later) is a transformation of a canonical loop sequence. This change renames class `OMPLoopTransformationDirective` to `OMPCanonicalLoopNestTransformationDirective` so we can reclaim that name in a later change.	2025-09-08 10:47:01 +02:00
Sirraide	e4a1b5f36e	[Clang] [C2y] Implement N3355 ‘Named Loops’ (#152870 ) This implements support for [named loops](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3355.htm) for C2y. When parsing a `LabelStmt`, we create the `LabeDecl` early before we parse the substatement; this label is then passed down to `ParseWhileStatement()` and friends, which then store it in the loop’s (or switch statement’s) `Scope`; when we encounter a `break/continue` statement, we perform a lookup for the label (and error if it doesn’t exist), and then walk the scope stack and check if there is a scope whose preceding label is the target label, which identifies the jump target. The feature is only supported in C2y mode, though a cc1-only option exists for testing (`-fnamed-loops`), which is mostly intended to try and make sure that we don’t have to refactor this entire implementation when/if we start supporting it in C++. --------- Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>	2025-09-02 16:37:19 +00:00
Robert Imschweiler	c94b5f0c0c	Reland: [OpenMP][clang] 6.0: num_threads strict (part 3: codegen) (#155839 ) OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the num_threads clause on parallel directives, along with the message and severity clauses. This commit implements necessary codegen changes.	2025-08-28 21:00:15 +02:00
Robert Imschweiler	9d7e436d86	Revert "[OpenMP][clang] 6.0: num_threads strict (part 3: codegen)" (#155809 ) Reverts llvm/llvm-project#146405	2025-08-28 12:12:53 +02:00
Robert Imschweiler	baf9d2c35d	[OpenMP][clang] 6.0: num_threads strict (part 3: codegen) (#146405 ) OpenMP 6.0 12.1.2 specifies the behavior of the strict modifier for the num_threads clause on parallel directives, along with the message and severity clauses. This commit implements necessary codegen changes.	2025-08-28 08:52:27 +00:00
Matheus Izvekov	91cdd35008	[clang] Improve nested name specifier AST representation (#147835 ) This is a major change on how we represent nested name qualifications in the AST. * The nested name specifier itself and how it's stored is changed. The prefixes for types are handled within the type hierarchy, which makes canonicalization for them super cheap, no memory allocation required. Also translating a type into nested name specifier form becomes a no-op. An identifier is stored as a DependentNameType. The nested name specifier gains a lightweight handle class, to be used instead of passing around pointers, which is similar to what is implemented for TemplateName. There is still one free bit available, and this handle can be used within a PointerUnion and PointerIntPair, which should keep bit-packing aficionados happy. * The ElaboratedType node is removed, all type nodes in which it could previously apply to can now store the elaborated keyword and name qualifier, tail allocating when present. * TagTypes can now point to the exact declaration found when producing these, as opposed to the previous situation of there only existing one TagType per entity. This increases the amount of type sugar retained, and can have several applications, for example in tracking module ownership, and other tools which care about source file origins, such as IWYU. These TagTypes are lazily allocated, in order to limit the increase in AST size. This patch offers a great performance benefit. It greatly improves compilation time for [stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for `test_on2.cpp` in that project, which is the slowest compiling test, this patch improves `-c` compilation time by about 7.2%, with the `-fsyntax-only` improvement being at ~12%. This has great results on compile-time-tracker as well: ![image](https://github.com/user-attachments/assets/700dce98-2cab-4aa8-97d1-b038c0bee831) This patch also further enables other optimziations in the future, and will reduce the performance impact of template specialization resugaring when that lands. It has some other miscelaneous drive-by fixes. About the review: Yes the patch is huge, sorry about that. Part of the reason is that I started by the nested name specifier part, before the ElaboratedType part, but that had a huge performance downside, as ElaboratedType is a big performance hog. I didn't have the steam to go back and change the patch after the fact. There is also a lot of internal API changes, and it made sense to remove ElaboratedType in one go, versus removing it from one type at a time, as that would present much more churn to the users. Also, the nested name specifier having a different API avoids missing changes related to how prefixes work now, which could make existing code compile but not work. How to review: The important changes are all in `clang/include/clang/AST` and `clang/lib/AST`, with also important changes in `clang/lib/Sema/TreeTransform.h`. The rest and bulk of the changes are mostly consequences of the changes in API. PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just for easier to rebasing. I plan to rename it back after this lands. Fixes #136624 Fixes https://github.com/llvm/llvm-project/issues/43179 Fixes https://github.com/llvm/llvm-project/issues/68670 Fixes https://github.com/llvm/llvm-project/issues/92757	2025-08-09 05:06:53 -03:00
Kazu Hirata	bb080107e4	[CodeGen] Remove unnecessary casts (NFC) (#146463 ) Both of these functions return void.	2025-07-01 07:32:21 -07:00
CHANDRA GHALE	afbcf9529a	[OpenMP 6.0 ]Codegen for Reduction over private variables with reduction clause (#134709 ) Codegen support for reduction over private variable with reduction clause. Section 7.6.10 in in OpenMP 6.0 spec. - An internal shared copy is initialized with an initializer value. - The shared copy is updated by combining its value with the values from the private copies created by the clause. - Once an encountering thread verifies that all updates are complete, its original list item is updated by merging its value with that of the shared copy and then broadcast to all threads. Sample Test Case from OpenMP 6.0 Example ``` #include <assert.h> #include <omp.h> #define N 10 void do_red(int n, int *v, int &sum_v) { sum_v = 0; // sum_v is private #pragma omp for reduction(original(private),+: sum_v) for (int i = 0; i < n; i++) { sum_v += v[i]; } } int main(void) { int v[N]; for (int i = 0; i < N; i++) v[i] = i; #pragma omp parallel num_threads(4) { int s_v; // s_v is private do_red(N, v, s_v); assert(s_v == 45); } return 0; } ``` Expected Codegen: ``` // A shared global/static variable is introduced for the reduction result. // This variable is initialized (e.g., using memset or a UDR initializer) // e.g., .omp.reduction.internal_private_var // Barrier before any thread performs combination call void @__kmpc_barrier(...) // Initialization block (executed by thread 0) // e.g., call void @llvm.memset.p0.i64(...) or call @udr_initializer(...) call void @__kmpc_critical(...) // Inside critical section: // Load the current value from the shared variable // Load the thread-local private variable's value // Perform the reduction operation // Store the result back to the shared variable call void @__kmpc_end_critical(...) // Barrier after all threads complete their combinations call void @__kmpc_barrier(...) // Broadcast phase: // Load the final result from the shared variable) // Store the final result to the original private variable in each thread // Final barrier after broadcast call void @__kmpc_barrier(...) ``` --------- Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>	2025-06-11 14:01:31 +05:30
Nikita Popov	e2b536431d	[CodeGen] Move CodeGenPGO behind unique_ptr (NFC) (#142155 ) The InstrProf headers are very expensive. Avoid including them in all of CodeGen/ by moving the CodeGenPGO member behind a unqiue_ptr. This reduces clang build time by 0.8%.	2025-06-02 09:51:54 +02:00
Devon Loehr	63de20c0de	Reland "Add macro to suppress -Wunnecessary-virtual-specifier" (#141091 ) This fixes #139614 on non-clang compilers by moving `__has_warning` completely inside the `#if defined(__clang__)` block. This prevents a parse failure from compilers which don't recognize `__has_warning`. Original description: Followup to #138741. This adds the requested macro to silence `-Wunnecessary-virtual-specifier` when declaring virtual anchor functions in `final` classes, per [LLVM policy](https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers). It also cleans up any remaining instances of the warning, allowing us to stop disabling it when we build LLVM.	2025-05-28 12:15:22 +02:00
Philip Reames	e4e7a7e64e	Revert "Add macro to suppress -Wunnecessary-virtual-specifier (#139614 )" This reverts commit 0954c9d487e7cb30673df9f0ac125f71320d2936. It breaks the build when built with gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04).	2025-05-21 11:31:26 -07:00
Devon Loehr	0954c9d487	Add macro to suppress -Wunnecessary-virtual-specifier (#139614 ) Followup to #138741. This adds the requested macro to silence `-Wunnecessary-virtual-specifier` when declaring virtual anchor functions in `final` classes, per [LLVM policy](https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers). It also cleans up any remaining instances of the warning, allowing us to stop disabling it when we build LLVM.	2025-05-21 10:54:36 -07:00
Kazu Hirata	f002f300c5	[clang] Remove unused local variables (NFC) (#138453 )	2025-05-04 10:51:40 -07:00
Tom Eccles	7b70fc74d0	[mlir][OpenMP] Convert omp.cancel sections to LLVMIR (#137193 ) This is quite ugly but it is the best I could think of. The old FiniCBWrapper was way too brittle depending upon the exact block structure inside of the section, and could be confused by any control flow in the section (e.g. an if clause on cancel). The wording in the comment and variable names didn't seem to match where it was actually branching too as well. Clang's (non-OpenMPIRBuilder) lowering for cancel inside of sections branches to a block containing __kmpc_for_static_fini. This was hard to achieve here because sometimes the FiniCBWrapper has to run before the worksharing loop finalization has been crated. To get around this ordering issue I created a dummy branch to a dummy block, which is then fixed later once all of the information is available.	2025-04-29 17:19:40 +01:00
Nikita Popov	b384d6d6cc	[CodeGen] Don't include CGDebugInfo.h in CodeGenFunction.h (NFC) (#134100 ) This is an expensive header, only include it where needed. Move some functions out of line to achieve that. This reduces time to build clang by ~0.5% in terms of instructions retired.	2025-04-03 08:04:19 +02:00
Zahira Ammarguellat	cf69b4c668	[Clang] [OpenMP] Add support for '#pragma omp stripe'. (#126927 ) This patch was reviewed and approved here: https://github.com/llvm/llvm-project/pull/119891 However it has been reverted here: `083df25dc2` due to a build issue here: https://lab.llvm.org/buildbot/#/builders/51/builds/10694 This patch is reintroducing the support.	2025-02-13 07:14:36 -05:00
Matt	a1826b4d26	[OpenMP][SIMD][FIX] Use conservative "omp simd ordered" lowering (#126172 ) A proposed fix for the issue #95611, [OpenMP][SIMD] ordered has no effect in a loop SIMD region as of LLVM 18.1.0 Changes: - Implement new lowering behavior: Conservatively serialize "omp simd" loops that have `omp simd ordered` directive to prevent incorrect vectorization (which results in incorrect execution behavior of the miscompiled program). Implementation outline: - We start with the optimistic default initial value of `LoopStack.setParallel(/Enable=/true);` in `CodeGenFunction::EmitOMPSimdInit(const OMPLoopDirective &D)`. - We only disable the loop parallel memory access assumption with `if (HasOrderedDirective) LoopStack.setParallel(/Enable=/false);` using the `HasOrderedDirective` (which tests for the presence of an `OMPOrderedDirective`). - This results in no longer incorrectly vectorizing the loop when the `omp simd ordered` directive is present. Motivation: We'd like to prevent incorrect vectorization of the loops marked with the `#pragma omp ordered simd` directive which has previously resulted in miscompiled code. At the same time, we'd like the usage outside of the `#pragma omp ordered simd` context to remain unaffected: Note that in the test "clang/test/OpenMP/ordered_codegen.cpp" we only "lose" the `!llvm.access.group` metadata in `foo_simd` alone. This is conservative, in that it's possible some of the loops would be possible to vectorize, but we prefer to avoid miscompilation of the loops that are currently illegal to vectorize. A concrete example follows: ```cpp // "test.c" #include <float.h> #include <math.h> #include <omp.h> #include <stdio.h> #include <stdlib.h> #include <time.h> int compare_float(float x1, float x2, float scalar) { const float diff = fabsf(x1 - x2); x1 = fabsf(x1); x2 = fabsf(x2); const float l = (x2 > x1) ? x2 : x1; if (diff <= l * scalar * FLT_EPSILON) return 1; else return 0; } #define ARRAY_SIZE 256 __attribute__((noinline)) void initialization_loop( float X[ARRAY_SIZE][ARRAY_SIZE], float Y[ARRAY_SIZE][ARRAY_SIZE]) { const float max = 1000.0; srand(time(NULL)); for (int r = 0; r < ARRAY_SIZE; r++) { for (int c = 0; c < ARRAY_SIZE; c++) { X[r][c] = ((float)rand() / (float)(RAND_MAX)) * max; Y[r][c] = X[r][c]; } } } __attribute__((noinline)) void omp_simd_loop(float X[ARRAY_SIZE][ARRAY_SIZE]) { for (int r = 1; r < ARRAY_SIZE; ++r) { for (int c = 1; c < ARRAY_SIZE; ++c) { #pragma omp simd for (int k = 2; k < ARRAY_SIZE; ++k) { #pragma omp ordered simd X[r][k] = X[r][k - 2] + sinf((float)(r / c)); } } } } __attribute__((noinline)) int comparison_loop(float X[ARRAY_SIZE][ARRAY_SIZE], float Y[ARRAY_SIZE][ARRAY_SIZE]) { int totalErrors_simd = 0; const float scalar = 1.0; for (int r = 1; r < ARRAY_SIZE; ++r) { for (int c = 1; c < ARRAY_SIZE; ++c) { for (int k = 2; k < ARRAY_SIZE; ++k) { Y[r][k] = Y[r][k - 2] + sinf((float)(r / c)); } } // check row for simd update for (int k = 0; k < ARRAY_SIZE; ++k) { if (!compare_float(X[r][k], Y[r][k], scalar)) { ++totalErrors_simd; } } } return totalErrors_simd; } int main(void) { float X[ARRAY_SIZE][ARRAY_SIZE]; float Y[ARRAY_SIZE][ARRAY_SIZE]; initialization_loop(X, Y); omp_simd_loop(X); const int totalErrors_simd = comparison_loop(X, Y); if (totalErrors_simd) { fprintf(stdout, "totalErrors_simd: %d \n", totalErrors_simd); fprintf(stdout, "%s : %d - FAIL: error in ordered simd computation.\n", __FILE__, __LINE__); } else { fprintf(stdout, "Success!\n"); } return totalErrors_simd; } ``` Before: ``` $ clang -fopenmp-simd -O3 -ffast-math -lm test.c -o test && ./test totalErrors_simd: 15408 test.c : 76 - FAIL: error in ordered simd computation. ``` clang 19.1.0: https://godbolt.org/z/6EvhxqEhe After: ``` $ clang -fopenmp-simd -O3 -ffast-math test.c -o test && ./test Success! ``` Co-authored-by: Matt P. Dziubinski <matt-p.dziubinski@hpe.com>	2025-02-12 08:53:47 -05:00
Kazu Hirata	67e1e98811	Revert "[Clang] [OpenMP] Add support for '#pragma omp stripe'. (#119891 )" This reverts commit 070f84ebc89b11df616a83a56df9ac56efbab783. Buildbot failure: https://lab.llvm.org/buildbot/#/builders/51/builds/10694	2025-02-11 12:39:01 -08:00
Zahira Ammarguellat	070f84ebc8	[Clang] [OpenMP] Add support for '#pragma omp stripe'. (#119891 ) Implement basic parsing and semantic support for `#pragma omp stripe` constuct introduced in https://www.openmp.org/wp-content/uploads/[OpenMP-API-Specification-6-0.pdf](https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-6-0.pdf), section 11.7.	2025-02-11 13:58:21 -05:00
Alexey Bataev	3041dd5c20	Revert "[OpenMP][SIMD][FIX] Use conservative "omp simd ordered" lowering" (#126079 ) Reverts llvm/llvm-project#123867 to fix the test failures https://lab.llvm.org/buildbot/#/builders/144/builds/17521	2025-02-06 10:04:11 -05:00
Matt	60d8e6f528	[OpenMP][SIMD][FIX] Use conservative "omp simd ordered" lowering (#123867 ) A proposed fix for #95611 [OpenMP][SIMD] ordered has no effect in a loop SIMD region as of LLVM 18.1.0 Changes: - Implement new lowering behavior: Conservatively serialize "omp simd" loops that have `omp simd ordered` directive to prevent incorrect vectorization (which results in incorrect execution behavior of the miscompiled program). Implementation outline: - We start with the optimistic default initial value of `LoopStack.setParallel(/Enable=/true);` in `CodeGenFunction::EmitOMPSimdInit(const OMPLoopDirective &D)`. - We only disable the loop parallel memory access assumption with `if (HasOrderedDirective) LoopStack.setParallel(/Enable=/false);` using the `HasOrderedDirective` (which tests for the presence of an `OMPOrderedDirective`). - This results in no longer incorrectly vectorizing the loop when the `omp simd ordered` directive is present. Motivation: We'd like to prevent incorrect vectorization of the loops marked with the `#pragma omp ordered simd` directive which has previously resulted in miscompiled code. At the same time, we'd like the usage outside of the `#pragma omp ordered simd` context to remain unaffected: Note that in the test "clang/test/OpenMP/ordered_codegen.cpp" we only "lose" the `!llvm.access.group` metadata in `foo_simd` alone. This is conservative, in that it's possible some of the loops would be possible to vectorize, but we prefer to avoid miscompilation of the loops that are currently illegal to vectorize. A concrete example follows: ```cpp // "test.c" #include <float.h> #include <math.h> #include <omp.h> #include <stdio.h> #include <stdlib.h> #include <time.h> int compare_float(float x1, float x2, float scalar) { const float diff = fabsf(x1 - x2); x1 = fabsf(x1); x2 = fabsf(x2); const float l = (x2 > x1) ? x2 : x1; if (diff <= l * scalar * FLT_EPSILON) return 1; else return 0; } #define ARRAY_SIZE 256 __attribute__((noinline)) void initialization_loop( float X[ARRAY_SIZE][ARRAY_SIZE], float Y[ARRAY_SIZE][ARRAY_SIZE]) { const float max = 1000.0; srand(time(NULL)); for (int r = 0; r < ARRAY_SIZE; r++) { for (int c = 0; c < ARRAY_SIZE; c++) { X[r][c] = ((float)rand() / (float)(RAND_MAX)) * max; Y[r][c] = X[r][c]; } } } __attribute__((noinline)) void omp_simd_loop(float X[ARRAY_SIZE][ARRAY_SIZE]) { for (int r = 1; r < ARRAY_SIZE; ++r) { for (int c = 1; c < ARRAY_SIZE; ++c) { #pragma omp simd for (int k = 2; k < ARRAY_SIZE; ++k) { #pragma omp ordered simd X[r][k] = X[r][k - 2] + sinf((float)(r / c)); } } } } __attribute__((noinline)) int comparison_loop(float X[ARRAY_SIZE][ARRAY_SIZE], float Y[ARRAY_SIZE][ARRAY_SIZE]) { int totalErrors_simd = 0; const float scalar = 1.0; for (int r = 1; r < ARRAY_SIZE; ++r) { for (int c = 1; c < ARRAY_SIZE; ++c) { for (int k = 2; k < ARRAY_SIZE; ++k) { Y[r][k] = Y[r][k - 2] + sinf((float)(r / c)); } } // check row for simd update for (int k = 0; k < ARRAY_SIZE; ++k) { if (!compare_float(X[r][k], Y[r][k], scalar)) { ++totalErrors_simd; } } } return totalErrors_simd; } int main(void) { float X[ARRAY_SIZE][ARRAY_SIZE]; float Y[ARRAY_SIZE][ARRAY_SIZE]; initialization_loop(X, Y); omp_simd_loop(X); const int totalErrors_simd = comparison_loop(X, Y); if (totalErrors_simd) { fprintf(stdout, "totalErrors_simd: %d \n", totalErrors_simd); fprintf(stdout, "%s : %d - FAIL: error in ordered simd computation.\n", __FILE__, __LINE__); } else { fprintf(stdout, "Success!\n"); } return totalErrors_simd; } ``` Before: ``` $ clang -fopenmp-simd -O3 -ffast-math -lm test.c -o test && ./test totalErrors_simd: 15408 test.c : 76 - FAIL: error in ordered simd computation. ``` clang 19.1.0: https://godbolt.org/z/6EvhxqEhe After: ``` $ clang -fopenmp-simd -O3 -ffast-math test.c -o test && ./test Success! ``` Co-authored-by: Matt P. Dziubinski <matt-p.dziubinski@hpe.com>	2025-02-06 09:44:11 -05:00
CHANDRA GHALE	30f9a4f754	[OpenMP] codegen support for masked combined construct parallel masked taskloop simd. (#121746 ) Added codegen support for combined masked constructs `Parallel masked taskloop simd`. Added implementation for `EmitOMPParallelMaskedTaskLoopSimdDirective`. Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>	2025-01-14 18:26:46 +05:30
CHANDRA GHALE	6f558e0e12	[OpenMP] codegen support for masked combined construct masked taskloop (#121914 ) Added codegen support for combined masked constructs `masked taskloop.` Added implementation for `EmitOMPMaskedTaskLoopDirective`. --------- Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>	2025-01-13 11:42:13 +05:30
CHANDRA GHALE	1d2eea962a	[OpenMP] codegen support for masked combined construct masked taskloop simd (#121916 ) Added codegen support for combined masked constructs `masked taskloop simd`. Added implementation for `EmitOMPMaskedTaskLoopSimdDirective`. Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>	2025-01-12 23:38:00 +05:30
CHANDRA GHALE	aedb30fdc7	[OpenMP] codegen support for masked combined construct parallel masked taskloop (#121741 ) Added codegen support for combined masked constructs Parallel masked taskloop. Added implementation for EmitOMPParallelMaskedTaskLoopDirective. --------- Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>	2025-01-09 16:38:36 +05:30
Sergio Afonso	b79ed8729b	[OpenMP][OMPIRBuilder] Handle non-failing calls properly (#115863 ) The preprocessor definition used to enable asserts and the one that `llvm::Error` and `llvm::Expected` use to ensure all created instances are checked are not the same. By making these checks inside of an `assert` in cases where errors are not expected, certain build configurations would trigger runtime failures (e.g. `-DLLVM_ENABLE_ASSERTIONS=OFF -DLLVM_UNREACHABLE_OPTIMIZE=ON`). The `llvm::cantFail()` function, which was intended for this use case, is used by this patch in place of `assert` to prevent these runtime failures. In tests, new preprocessor definitions based on `ASSERT_THAT_EXPECTED` and `EXPECT_THAT_EXPECTED` are used instead, to avoid silent failures in release builds.	2025-01-09 10:28:16 +00:00
CHANDRA GHALE	76e6c8d3fc	Codegen changes for strict modifier with grainsize/num_tasks of taskloop construct (#117196 ) Initial parsing/sema for 'strict' modifier with 'num_tasks' and ‘grainsize’ clause is present in these commits [grainsize_parsing](`ab9eac762c`) and [num_tasks_parsing](`56c1660170 (diff-4184486638e85284c3a2c961a81e7752231022daf97e411007c13a6732b50db9R6545)`) . However, this implementation appears incomplete as it lacks code generation support. A runtime patch was introduced in this runtime commit [runtime_patch](`540007b427 (diff-5e95f9319910d6965d09c301359dbe6b23f3eef5ce4d262ef2c2d2137875b5c4R374)`) , which adds a new API, _kmpc_taskloop_5, to accommodate the strict modifier. In this patch I have added codegen support. When the strict modifier is present alongside the grainsize or num_tasks clauses of taskloop construct, the code now emits a call to _kmpc_taskloop_5, which includes an additional parameter of type i32 with the value 1 to indicate the strict modifier. If the strict modifier is not present, it falls back to the existing _kmpc_taskloop API call. --------- Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>	2024-11-28 14:18:59 +05:30
CHANDRA GHALE	44a6b3a4b6	Fix for codegen Crash in Clang when using locator omp_all_memory with depobj construct (#114221 ) A codegen crash is occurring when a depend object was initialized with omp_all_memory in the depobj directive. https://github.com/llvm/llvm-project/issues/114214(url) The root cause of issue looks to be the improper handling of the dependency list when omp_all_memory was specified. The change introduces the use of OMPTaskDataTy to manage dependencies. The buildDependences function is called to construct the dependency list, and the list is iterated over to emit and store the dependencies. Reduced Test Case : ``` #include <omp.h> int main() { omp_depend_t obj; #pragma omp depobj(obj) depend(inout: omp_all_memory) } ``` ``` #1 0x0000000003de6623 SignalHandler(int) Signals.cpp:0:0 #2 0x00007f8e4a6b990f (/lib64/libpthread.so.0+0x1690f) #3 0x00007f8e4a117d2a raise (/lib64/libc.so.6+0x4ad2a) #4 0x00007f8e4a1193e4 abort (/lib64/libc.so.6+0x4c3e4) #5 0x00007f8e4a10fc69 __assert_fail_base (/lib64/libc.so.6+0x42c69) #6 0x00007f8e4a10fcf1 __assert_fail (/lib64/libc.so.6+0x42cf1) #7 0x0000000004114367 clang::CodeGen::CodeGenFunction::EmitOMPDepobjDirective(clang::OMPDepobjDirective const&) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x4114367) #8 0x00000000040f8fac clang::CodeGen::CodeGenFunction::EmitStmt(clang::Stmt const, llvm::ArrayRef<clang::Attr const>) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x40f8fac) #9 0x00000000040ff4fb clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope(clang::CompoundStmt const&, bool, clang::CodeGen::AggValueSlot) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x40ff4fb) #10 0x00000000041847b2 clang::CodeGen::CodeGenFunction::EmitFunctionBody(clang::Stmt const) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x41847b2) #11 0x0000000004199e4a clang::CodeGen::CodeGenFunction::GenerateCode(clang::GlobalDecl, llvm::Function, clang::CodeGen::CGFunctionInfo const&) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x4199e4a) #12 0x00000000041f7b9d clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition(clang::GlobalDecl, llvm::GlobalValue) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x41f7b9d) #13 0x00000000041f16a3 clang::CodeGen::CodeGenModule::EmitGlobalDefinition(clang::GlobalDecl, llvm::GlobalValue) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x41f16a3) #14 0x00000000041fd954 clang::CodeGen::CodeGenModule::EmitDeferred() (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x41fd954) #15 0x0000000004200277 clang::CodeGen::CodeGenModule::Release() (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x4200277) #16 0x00000000046b6a49 (anonymous namespace)::CodeGeneratorImpl::HandleTranslationUnit(clang::ASTContext&) ModuleBuilder.cpp:0:0 #17 0x00000000046b4cb6 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x46b4cb6) #18 0x0000000006204d5c clang::ParseAST(clang::Sema&, bool, bool) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x6204d5c) #19 0x000000000496b278 clang::FrontendAction::Execute() (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x496b278) #20 0x00000000048dd074 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x48dd074) #21 0x0000000004a38092 clang::ExecuteCompilerInvocation(clang::CompilerInstance) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x4a38092) #22 0x0000000000fd4e9c cc1_main(llvm::ArrayRef<char const>, char const, void) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0xfd4e9c) #23 0x0000000000fcca73 ExecuteCC1Tool(llvm::SmallVectorImpl<char const>&, llvm::ToolContext const&) driver.cpp:0:0 #24 0x0000000000fd140c clang_main(int, char*, llvm::ToolContext const&) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0xfd140c) #25 0x0000000000ee2ef3 main (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0xee2ef3) #26 0x00007f8e4a10224c __libc_start_main (/lib64/libc.so.6+0x3524c) #27 0x0000000000fcaae9 _start /home/abuild/rpmbuild/BUILD/glibc-2.31/csu/../sysdeps/x86_64/start.S:120:0 clang: error: unable to execute command: Aborted ``` --------- Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>	2024-11-11 14:34:16 +05:30
Sergio Afonso	d87964de78	[OpenMP][OMPIRBuilder] Error propagation across callbacks (#112533 ) This patch implements an approach to communicate errors between the OMPIRBuilder and its users. It introduces `llvm::Error` and `llvm::Expected` objects to replace the values returned by callbacks passed to `OMPIRBuilder` codegen functions. These functions then check the result for errors when callbacks are called and forward them back to the caller, which has the flexibility to recover, exit cleanly or dump a stack trace. This prevents a failed callback to leave the IR in an invalid state and still continue the codegen process, triggering unrelated assertions or segmentation faults. In the case of MLIR to LLVM IR translation of the 'omp' dialect, this change results in the compiler emitting errors and exiting early instead of triggering a crash for not-yet-implemented errors. The behavior in Clang and openmp-opt stays unchanged, since callbacks will continue always returning 'success'.	2024-10-25 11:30:16 +01:00
Jay Foad	4dd55c567a	[clang] Use {} instead of std::nullopt to initialize empty ArrayRef (#109399 ) Follow up to #109133.	2024-10-24 10:23:40 +01:00
Congcong Cai	eca5949031	[codegen][NFC] add static mark for internal usage variable and function (#109431 ) Detect by clang-tidy misc-use-internal-linkage	2024-09-24 07:25:07 +08:00
David Pagan	d7c69c20a7	[clang][OpenMP] Add codegen for scope directive (#109197 ) Added codegen for scope directive, enabled allocate and firstprivate clauses, and added scope directive LIT test. Testing - LIT tests (including new scope test). - OpenMP scope example test from 5.2 OpenMP API examples document. - Three executable scope tests from OpenMP_VV/sollve_vv suite.	2024-09-19 13:17:24 -07:00
Shilei Tian	1c269929d0	[Clang][Sema][OpenMP] Allow `thread_limit` to accept multiple expressions (#102715 )	2024-08-10 09:54:58 -04:00
Shilei Tian	cee594cf36	[Clang][Sema][OpenMP] Allow `num_teams` to accept multiple expressions (#99732 ) By the OpenMP standard, `num_teams` clause can only accept one expression (for now). In this patch, we extend it to allow to accept multiple expressions when it is used with `target teams ompx_bare` construct. This will allow to launch a multi-dim grid, same as CUDA/HIP.	2024-08-06 10:55:15 -04:00
Julian Brown	a42e515e3a	[OpenMP] OpenMP 5.1 "assume" directive parsing support (#92731 ) This is a minimal patch to support parsing for "omp assume" directives. These are meant to be hints to a compiler's optimisers: as such, it is legitimate (if not very useful) to ignore them. The patch builds on top of the existing support for "omp assumes" directives (note spelling!). Unlike the "omp [begin/end] assumes" directives, "omp assume" is associated with a compound statement, i.e. it can appear within a function. The "holds" assumption could (theoretically) be mapped onto the existing builtin "__builtin_assume", though the latter applies to a single point in the program, and the former to a range (i.e. the whole of the associated compound statement). This patch fixes sollve's OpenMP 5.1 "omp assume"-based tests.	2024-08-05 07:37:07 -04:00
Krzysztof Parzyszek	243b27f7e1	[clang][OpenMP] Rename `varlists` to `varlist`, NFC (#101058 ) It returns a range of variables (via Expr*), not a range of lists.	2024-07-30 08:11:09 -05:00
Matt Arsenault	e108853ac8	clang: Allow targets to set custom metadata on atomics (#96906 ) Use this to replace the emission of the amdgpu-unsafe-fp-atomics attribute in favor of per-instruction metadata. In the future new fine grained controls should be introduced that also cover the integer cases. Add a wrapper around CreateAtomicRMW that appends the metadata, and update a few use contexts to use it.	2024-07-26 09:57:28 +04:00
Johannes Doerfert	3c8efd7928	[OpenMP] Ensure the actual kernel is annotated with launch bounds (#99927 ) In debug mode there is a wrapper (the kernel) around the function in which we generate the kernel code. We worked around this before to get the correct kernel name, but now we really distinguish both to attach the launch bounds to the kernel, not the inner function.	2024-07-23 09:02:47 -07:00
Krzysztof Parzyszek	c74730070a	[clang][OpenMP] Move "loop" directive mapping from sema to codegen (#99905 ) Given "loop" construct, clang will try to treat it as "for", "distribute" or "simd", depending on either the implied binding, or the bind clause if present. This patch moves the code that performs this construct remapping from sema to codegen. For a "loop" construct without a bind clause, this patch will create an implicit bind clause based on implied binding to simplify further analysis. During codegen the function `EmitOMPGenericLoopDirective` (i.e. "loop") will invoke the "emit" functions for "for", "distribute" or "simd", depending on the bind clause. --------- Co-authored-by: Alexey Bataev <a.bataev@gmx.com>	2024-07-23 07:31:42 -05:00
Michael Kruse	5c93a94f5a	[Clang][OpenMP] Add interchange directive (#93022 ) Add the interchange directive which will be introduced in the upcoming OpenMP 6.0 specification. A preview has been published in [Technical Report 12](https://www.openmp.org/wp-content/uploads/openmp-TR12.pdf).	2024-07-19 09:24:40 +02:00
Michael Kruse	80865c01e1	[Clang][OpenMP] Add reverse directive (#92916 ) Add the reverse directive which will be introduced in the upcoming OpenMP 6.0 specification. A preview has been published in [Technical Report 12](https://www.openmp.org/wp-content/uploads/openmp-TR12.pdf). --------- Co-authored-by: Alexey Bataev <a.bataev@outlook.com>	2024-07-18 10:35:32 +02:00

1 2 3 4 5 ...

804 Commits