This adds support for using the `ATTACH` map-type for proper pointer attachment when mapping list-items that have base-pointers.

For example, for the following:

```c
int *p;
#pragma omp target enter data map(p[1:10])
```

The following maps are now emitted by clang:

```
(A)
&p[0], &p[1], 10 * sizeof(p[1]), TO | FROM
&p, &p[1], sizeof(p), ATTACH
```

Previously, the two possible maps emitted by clang were:

```
(B)
&p[0], &p[1], 10 * sizeof(p[1]), TO | FROM

(C)
&p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ
```

(B) does not perform any pointer attachment, while (C) also maps the pointer `p`; both are incorrect.

-----

With this change, we are using ATTACH-style maps, like `(A)`, for cases where the expression has a base-pointer. For example:

```cpp
int *p, **pp;
S *ps, **pps;
... map(p[0])
... map(p[10:20])
... map(*p)
... map(([20])p)
... map(ps->a)
... map(pps->p->a)
... map(pp[0][0])
... map(*(pp + 10)[0])
```

#### Grouping of maps based on attach base-pointers

We also group the mapping of clauses with the same base decl in order of increasing complexity of their base-pointers, e.g. for something like:

```
S **spp;
map(spp[0][0], spp[0][0].a), // attach-ptr: spp[0]
map(spp[0]),                 // attach-ptr: spp
map(spp),                    // attach-ptr: N/A
```

We first map `spp`, then `spp[0]`, then `spp[0][0]` and `spp[0][0].a`.

This also lets us group "struct" allocations based on their attach pointers, which resolves the issue of always mapping everything from the beginning of the symbol `spp`. Each group is mapped independently. Items at the same level, like `spp[0][0]` and its member `spp[0][0].a`, are still mapped together as part of the same contiguous struct `spp[0][0]`. This resolves issue #141042.

#### use_device_ptr/addr fixes

The handling of `use_device_ptr/addr` was updated to use the attach-ptr information, and now works for many cases that were failing before. It has to be done as part of this series because otherwise the switch from `PTR_AND_OBJ` to attach-style mapping would have caused regressions in existing `use_device_ptr/addr` tests.

#### Handling of attach-pointers that are members of implicitly mapped structs:

* When a struct member-pointer, like `p` below, is a base-pointer in a `map` clause on a target construct (like `map(p[0:1])`), and the base of that struct is either the `this` pointer (implicitly or explicitly), or a struct that is implicitly mapped on that construct, we add an implicit `map(p)` so that we don't implicitly map the full struct.

```cpp
  struct S { int *p;
  void f1() {
    #pragma omp target map(p[0:1]) // Implicitly map this->p, to ensure
                                   // that the implicit map of `this[:]` does
                                   // not map the full struct
      printf("%p %p\n", &p, p);
  }
  };
```

#### Scope for improvement:

* We may be able to compute the attach-ptr expr while collecting component-lists in Sema.
  * But we already cache the computation results, and `findAttachPtrExpr` is fairly simple and fast.
* There may be a better way to implement semantic expr comparison.

#### Needs future work:

* Attach-style maps are not yet emitted for declare mappers.
* Mapping of class member references: we are still using `PTR_AND_OBJ` maps for them. We will likely need to change that to handle `ref_ptr/ref_ptee`, and the `attach` map-type-modifier on them.
* Implicit capturing of "this" needs to map the full `this[0:1]` unless there is an explicit map on one of the members, or a map with a member as its base-pointer.
* The implicit map added for capturing a class member pointer also needs to add a zero-length-array-section map.
* `use_device_addr` on array-sections-on-pointers needs further improvements (documented using FIXMEs).

#### Why a large PR

While it's unfortunate that this PR has gotten large and difficult to review, all the functional changes have to be made together to prevent regressions from partially implemented changes.

For example, the changes to capturing were previously done separately (#145454), but they would still cause stability issues in the absence of full attach-mapping. And attach-mapping needs those changes to be able to launch kernels.

We extracted the utilities and functions, like those for finding attach-ptrs or comparing exprs, into a separate NFC PR that only adds those functions without calling them (#155625).

Maybe the change that adds a new error message for `use_device_addr` on array-sections with non-var base-pointers could have been extracted too (but it would have had to be a follow-up change in that case, and we would get comp-fails with this PR when the erroneous case was not caught/diagnosed).

---------

Co-authored-by: Alex Duran <alejandro.duran@intel.com>
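To make the `(A)` maps above concrete, here is a minimal host-side sketch that builds the two entries for `map(p[lo:count])` and decodes map-type values such as the `800`, `35` and `16384` constants that appear in the test expectations for this change. The names `MapEntry`, `attachStyleMaps` and `describeMapType` are hypothetical and are not part of clang. The flag values are an assumption based on the offload runtime's usual encoding, although `ATTACH = 0x4000` is consistent with the `16384` entries in the test. This only models the entries and does not call into any runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Map-type bits (assumed values, see the note above).
enum MapFlag : uint64_t {
  TO = 0x1,
  FROM = 0x2,
  PTR_AND_OBJ = 0x10,
  TARGET_PARAM = 0x20,
  LITERAL = 0x100,
  IMPLICIT = 0x200,
  ATTACH = 0x4000,
};

// One (base, begin, size, type) tuple, in the order used by the maps above.
struct MapEntry {
  const void *Base;
  const void *Begin;
  uint64_t Size;
  uint64_t Type;
};

// Hypothetical helper: models map (A) for `int *p; map(p[lo:count])`.
// The first entry maps the pointee data, the second attaches `p` to it.
std::vector<MapEntry> attachStyleMaps(int *&p, std::size_t lo,
                                      std::size_t count) {
  assert(count > 0 && "expected a non-empty array section");
  return {
      {&p[0], &p[lo], count * sizeof(p[lo]), TO | FROM},
      {&p, &p[lo], sizeof(p), ATTACH},
  };
}

// Hypothetical helper: renders the known bits of a map-type value as text.
std::string describeMapType(uint64_t Type) {
  static const struct {
    uint64_t Bit;
    const char *Name;
  } Names[] = {
      {TO, "TO"},
      {FROM, "FROM"},
      {PTR_AND_OBJ, "PTR_AND_OBJ"},
      {TARGET_PARAM, "TARGET_PARAM"},
      {LITERAL, "LITERAL"},
      {IMPLICIT, "IMPLICIT"},
      {ATTACH, "ATTACH"},
  };
  std::string Out;
  for (const auto &N : Names) {
    if (Type & N.Bit) {
      if (!Out.empty())
        Out += " | ";
      Out += N.Name;
    }
  }
  return Out.empty() ? "NONE" : Out;
}
```

Under these assumed flag values, `describeMapType(35)` yields `TO | FROM | TARGET_PARAM` and `describeMapType(16384)` yields `ATTACH`, which matches the shape of `(A)`: one data entry followed by one pointer-sized attach entry.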
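The grouping order described above (first `spp`, then `spp[0]`, then `spp[0][0]` together with `spp[0][0].a`) can be illustrated with a toy sort. This is only a sketch under stated assumptions: `MapItem`, `attachPtrComplexity` and `orderByAttachPtr` are hypothetical names, and the string-based complexity measure is an assumption. It stands in for whatever semantic expression comparison clang performs and is not taken from it.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// A mapped list-item together with its attach base-pointer expression.
struct MapItem {
  std::string Expr;      // e.g. "spp[0][0].a"
  std::string AttachPtr; // e.g. "spp[0]"; empty when there is no attach-ptr
};

// Toy measure (an assumption): 0 when there is no attach-ptr, otherwise
// 1 plus the number of subscripts, dereferences and arrows in it.
int attachPtrComplexity(const MapItem &Item) {
  const std::string &P = Item.AttachPtr;
  if (P.empty())
    return 0;
  int Levels = 1;
  for (std::size_t I = 0; I < P.size(); ++I) {
    if (P[I] == '[' || P[I] == '*')
      ++Levels;
    else if (P[I] == '-' && I + 1 < P.size() && P[I + 1] == '>')
      ++Levels;
  }
  return Levels;
}

// Orders items by increasing attach-ptr complexity. std::stable_sort keeps
// items with equal complexity in their original relative order, so members
// of the same struct stay next to each other.
std::vector<MapItem> orderByAttachPtr(std::vector<MapItem> Items) {
  std::stable_sort(Items.begin(), Items.end(),
                   [](const MapItem &A, const MapItem &B) {
                     return attachPtrComplexity(A) < attachPtrComplexity(B);
                   });
  return Items;
}
```

Feeding in the four items from the `spp` example in clause order returns them as `spp`, `spp[0]`, `spp[0][0]`, `spp[0][0].a`. A stable sort is used so that items sharing an attach-ptr keep their clause order within the group.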
602 lines
41 KiB
C++
// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-globals --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+" "reduction_size[.].+[.]" "pl_cond[.].+[.|,]" --prefix-filecheck-ir-name _ --global-value-regex "\.offload_.*" --version 2
// Test host codegen.
// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=powerpc64le-ibm-linux-gnu -emit-llvm %s -o - | FileCheck %s
// expected-no-diagnostics

int kernel_within_loop(int *a, int *b, int N, int num_iters) {
  int i;
  for (i = 0; i < num_iters; ++i) {
#pragma omp target parallel for map(a[0:N]) map(b[0:N])
    for (int j = 0; j< N; j++)
      a[j] = b[j];

#pragma omp target teams distribute parallel for map(a[0:N]) map(b[0:N])
    for (int j = 0; j< N; j+=3)
      a[j] = b[j] * 2;
  }
  return a[N-1];
}
//.
// CHECK: @.offload_sizes = private unnamed_addr constant [5 x i64] [i64 4, i64 0, i64 8, i64 0, i64 8]
// CHECK: @.offload_maptypes = private unnamed_addr constant [5 x i64] [i64 800, i64 35, i64 16384, i64 35, i64 16384]
// CHECK: @.offload_sizes.1 = private unnamed_addr constant [5 x i64] [i64 4, i64 0, i64 8, i64 0, i64 8]
// CHECK: @.offload_maptypes.2 = private unnamed_addr constant [5 x i64] [i64 800, i64 35, i64 16384, i64 35, i64 16384]
//.
// CHECK-LABEL: define dso_local noundef signext i32 @_Z18kernel_within_loopPiS_ii
// CHECK-SAME: (ptr noundef [[A:%.*]], ptr noundef [[B:%.*]], i32 noundef signext [[N:%.*]], i32 noundef signext [[NUM_ITERS:%.*]]) #[[ATTR0:[0-9]+]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[A_ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: [[B_ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: [[N_ADDR:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[NUM_ITERS_ADDR:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[I:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[N_CASTED:%.*]] = alloca i64, align 8
// CHECK-NEXT: [[DOTOFFLOAD_BASEPTRS:%.*]] = alloca [5 x ptr], align 8
// CHECK-NEXT: [[DOTOFFLOAD_PTRS:%.*]] = alloca [5 x ptr], align 8
// CHECK-NEXT: [[DOTOFFLOAD_MAPPERS:%.*]] = alloca [5 x ptr], align 8
// CHECK-NEXT: [[DOTOFFLOAD_SIZES:%.*]] = alloca [5 x i64], align 8
// CHECK-NEXT: [[KERNEL_ARGS:%.*]] = alloca [[STRUCT___TGT_KERNEL_ARGUMENTS:%.*]], align 8
// CHECK-NEXT: [[N_CASTED3:%.*]] = alloca i64, align 8
// CHECK-NEXT: [[DOTOFFLOAD_BASEPTRS8:%.*]] = alloca [5 x ptr], align 8
// CHECK-NEXT: [[DOTOFFLOAD_PTRS9:%.*]] = alloca [5 x ptr], align 8
// CHECK-NEXT: [[DOTOFFLOAD_MAPPERS10:%.*]] = alloca [5 x ptr], align 8
// CHECK-NEXT: [[DOTOFFLOAD_SIZES11:%.*]] = alloca [5 x i64], align 8
// CHECK-NEXT: [[TMP:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[DOTCAPTURE_EXPR_12:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[KERNEL_ARGS14:%.*]] = alloca [[STRUCT___TGT_KERNEL_ARGUMENTS]], align 8
// CHECK-NEXT: store ptr [[A]], ptr [[A_ADDR]], align 8
// CHECK-NEXT: store ptr [[B]], ptr [[B_ADDR]], align 8
// CHECK-NEXT: store i32 [[N]], ptr [[N_ADDR]], align 4
// CHECK-NEXT: store i32 [[NUM_ITERS]], ptr [[NUM_ITERS_ADDR]], align 4
// CHECK-NEXT: store i32 0, ptr [[I]], align 4
// CHECK-NEXT: br label [[FOR_COND:%.*]]
// CHECK: for.cond:
// CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[I]], align 4
// CHECK-NEXT: [[TMP1:%.*]] = load i32, ptr [[NUM_ITERS_ADDR]], align 4
// CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[TMP0]], [[TMP1]]
// CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY:%.*]], label [[FOR_END:%.*]]
// CHECK: for.body:
// CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr [[N_ADDR]], align 4
// CHECK-NEXT: store i32 [[TMP2]], ptr [[N_CASTED]], align 4
// CHECK-NEXT: [[TMP3:%.*]] = load i64, ptr [[N_CASTED]], align 8
// CHECK-NEXT: [[TMP4:%.*]] = load ptr, ptr [[A_ADDR]], align 8
// CHECK-NEXT: [[TMP5:%.*]] = load ptr, ptr [[B_ADDR]], align 8
// CHECK-NEXT: [[TMP6:%.*]] = load ptr, ptr [[A_ADDR]], align 8
// CHECK-NEXT: [[TMP7:%.*]] = load ptr, ptr [[A_ADDR]], align 8
// CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[TMP7]], i64 0
// CHECK-NEXT: [[TMP8:%.*]] = load i32, ptr [[N_ADDR]], align 4
// CHECK-NEXT: [[CONV:%.*]] = sext i32 [[TMP8]] to i64
// CHECK-NEXT: [[TMP9:%.*]] = mul nuw i64 [[CONV]], 4
// CHECK-NEXT: [[TMP10:%.*]] = load ptr, ptr [[B_ADDR]], align 8
// CHECK-NEXT: [[TMP11:%.*]] = load ptr, ptr [[B_ADDR]], align 8
// CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds nuw i32, ptr [[TMP11]], i64 0
// CHECK-NEXT: [[TMP12:%.*]] = load i32, ptr [[N_ADDR]], align 4
// CHECK-NEXT: [[CONV2:%.*]] = sext i32 [[TMP12]] to i64
// CHECK-NEXT: [[TMP13:%.*]] = mul nuw i64 [[CONV2]], 4
// CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr align 8 [[DOTOFFLOAD_SIZES]], ptr align 8 @.offload_sizes, i64 40, i1 false)
// CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_BASEPTRS]], i32 0, i32 0
// CHECK-NEXT: store i64 [[TMP3]], ptr [[TMP14]], align 8
// CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_PTRS]], i32 0, i32 0
// CHECK-NEXT: store i64 [[TMP3]], ptr [[TMP15]], align 8
// CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_MAPPERS]], i64 0, i64 0
// CHECK-NEXT: store ptr null, ptr [[TMP16]], align 8
// CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_BASEPTRS]], i32 0, i32 1
// CHECK-NEXT: store ptr [[TMP6]], ptr [[TMP17]], align 8
// CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_PTRS]], i32 0, i32 1
// CHECK-NEXT: store ptr [[ARRAYIDX]], ptr [[TMP18]], align 8
// CHECK-NEXT: [[TMP19:%.*]] = getelementptr inbounds [5 x i64], ptr [[DOTOFFLOAD_SIZES]], i32 0, i32 1
// CHECK-NEXT: store i64 [[TMP9]], ptr [[TMP19]], align 8
// CHECK-NEXT: [[TMP20:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_MAPPERS]], i64 0, i64 1
// CHECK-NEXT: store ptr null, ptr [[TMP20]], align 8
// CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_BASEPTRS]], i32 0, i32 2
// CHECK-NEXT: store ptr [[A_ADDR]], ptr [[TMP21]], align 8
// CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_PTRS]], i32 0, i32 2
// CHECK-NEXT: store ptr [[ARRAYIDX]], ptr [[TMP22]], align 8
// CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_MAPPERS]], i64 0, i64 2
// CHECK-NEXT: store ptr null, ptr [[TMP23]], align 8
// CHECK-NEXT: [[TMP24:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_BASEPTRS]], i32 0, i32 3
// CHECK-NEXT: store ptr [[TMP10]], ptr [[TMP24]], align 8
// CHECK-NEXT: [[TMP25:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_PTRS]], i32 0, i32 3
// CHECK-NEXT: store ptr [[ARRAYIDX1]], ptr [[TMP25]], align 8
// CHECK-NEXT: [[TMP26:%.*]] = getelementptr inbounds [5 x i64], ptr [[DOTOFFLOAD_SIZES]], i32 0, i32 3
// CHECK-NEXT: store i64 [[TMP13]], ptr [[TMP26]], align 8
// CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_MAPPERS]], i64 0, i64 3
// CHECK-NEXT: store ptr null, ptr [[TMP27]], align 8
// CHECK-NEXT: [[TMP28:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_BASEPTRS]], i32 0, i32 4
// CHECK-NEXT: store ptr [[B_ADDR]], ptr [[TMP28]], align 8
// CHECK-NEXT: [[TMP29:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_PTRS]], i32 0, i32 4
// CHECK-NEXT: store ptr [[ARRAYIDX1]], ptr [[TMP29]], align 8
// CHECK-NEXT: [[TMP30:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_MAPPERS]], i64 0, i64 4
// CHECK-NEXT: store ptr null, ptr [[TMP30]], align 8
// CHECK-NEXT: [[TMP31:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_BASEPTRS]], i32 0, i32 0
// CHECK-NEXT: [[TMP32:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_PTRS]], i32 0, i32 0
// CHECK-NEXT: [[TMP33:%.*]] = getelementptr inbounds [5 x i64], ptr [[DOTOFFLOAD_SIZES]], i32 0, i32 0
// CHECK-NEXT: [[TMP34:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 0
// CHECK-NEXT: store i32 3, ptr [[TMP34]], align 4
// CHECK-NEXT: [[TMP35:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 1
// CHECK-NEXT: store i32 5, ptr [[TMP35]], align 4
// CHECK-NEXT: [[TMP36:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 2
// CHECK-NEXT: store ptr [[TMP31]], ptr [[TMP36]], align 8
// CHECK-NEXT: [[TMP37:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 3
// CHECK-NEXT: store ptr [[TMP32]], ptr [[TMP37]], align 8
// CHECK-NEXT: [[TMP38:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 4
// CHECK-NEXT: store ptr [[TMP33]], ptr [[TMP38]], align 8
// CHECK-NEXT: [[TMP39:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 5
// CHECK-NEXT: store ptr @.offload_maptypes, ptr [[TMP39]], align 8
// CHECK-NEXT: [[TMP40:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 6
// CHECK-NEXT: store ptr null, ptr [[TMP40]], align 8
// CHECK-NEXT: [[TMP41:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 7
// CHECK-NEXT: store ptr null, ptr [[TMP41]], align 8
// CHECK-NEXT: [[TMP42:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 8
// CHECK-NEXT: store i64 0, ptr [[TMP42]], align 8
// CHECK-NEXT: [[TMP43:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 9
// CHECK-NEXT: store i64 0, ptr [[TMP43]], align 8
// CHECK-NEXT: [[TMP44:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 10
// CHECK-NEXT: store [3 x i32] [i32 1, i32 0, i32 0], ptr [[TMP44]], align 4
// CHECK-NEXT: [[TMP45:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 11
// CHECK-NEXT: store [3 x i32] zeroinitializer, ptr [[TMP45]], align 4
// CHECK-NEXT: [[TMP46:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 12
// CHECK-NEXT: store i32 0, ptr [[TMP46]], align 4
// CHECK-NEXT: [[TMP47:%.*]] = call i32 @__tgt_target_kernel(ptr @[[GLOB2:[0-9]+]], i64 -1, i32 1, i32 0, ptr @.{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z18kernel_within_loopPiS_ii_l9.region_id, ptr [[KERNEL_ARGS]])
// CHECK-NEXT: [[TMP48:%.*]] = icmp ne i32 [[TMP47]], 0
// CHECK-NEXT: br i1 [[TMP48]], label [[OMP_OFFLOAD_FAILED:%.*]], label [[OMP_OFFLOAD_CONT:%.*]]
// CHECK: omp_offload.failed:
// CHECK-NEXT: call void @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z18kernel_within_loopPiS_ii_l9(i64 [[TMP3]], ptr [[TMP4]], ptr [[TMP5]]) #[[ATTR2:[0-9]+]]
// CHECK-NEXT: br label [[OMP_OFFLOAD_CONT]]
// CHECK: omp_offload.cont:
// CHECK-NEXT: [[TMP49:%.*]] = load i32, ptr [[N_ADDR]], align 4
// CHECK-NEXT: store i32 [[TMP49]], ptr [[N_CASTED3]], align 4
// CHECK-NEXT: [[TMP50:%.*]] = load i64, ptr [[N_CASTED3]], align 8
// CHECK-NEXT: [[TMP51:%.*]] = load ptr, ptr [[A_ADDR]], align 8
// CHECK-NEXT: [[TMP52:%.*]] = load ptr, ptr [[B_ADDR]], align 8
// CHECK-NEXT: [[TMP53:%.*]] = load ptr, ptr [[A_ADDR]], align 8
// CHECK-NEXT: [[TMP54:%.*]] = load ptr, ptr [[A_ADDR]], align 8
// CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds nuw i32, ptr [[TMP54]], i64 0
// CHECK-NEXT: [[TMP55:%.*]] = load i32, ptr [[N_ADDR]], align 4
// CHECK-NEXT: [[CONV5:%.*]] = sext i32 [[TMP55]] to i64
// CHECK-NEXT: [[TMP56:%.*]] = mul nuw i64 [[CONV5]], 4
// CHECK-NEXT: [[TMP57:%.*]] = load ptr, ptr [[B_ADDR]], align 8
// CHECK-NEXT: [[TMP58:%.*]] = load ptr, ptr [[B_ADDR]], align 8
// CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds nuw i32, ptr [[TMP58]], i64 0
// CHECK-NEXT: [[TMP59:%.*]] = load i32, ptr [[N_ADDR]], align 4
// CHECK-NEXT: [[CONV7:%.*]] = sext i32 [[TMP59]] to i64
// CHECK-NEXT: [[TMP60:%.*]] = mul nuw i64 [[CONV7]], 4
// CHECK-NEXT: call void @llvm.memcpy.p0.p0.i64(ptr align 8 [[DOTOFFLOAD_SIZES11]], ptr align 8 @.offload_sizes.1, i64 40, i1 false)
// CHECK-NEXT: [[TMP61:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_BASEPTRS8]], i32 0, i32 0
// CHECK-NEXT: store i64 [[TMP50]], ptr [[TMP61]], align 8
// CHECK-NEXT: [[TMP62:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_PTRS9]], i32 0, i32 0
// CHECK-NEXT: store i64 [[TMP50]], ptr [[TMP62]], align 8
// CHECK-NEXT: [[TMP63:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_MAPPERS10]], i64 0, i64 0
// CHECK-NEXT: store ptr null, ptr [[TMP63]], align 8
// CHECK-NEXT: [[TMP64:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_BASEPTRS8]], i32 0, i32 1
// CHECK-NEXT: store ptr [[TMP53]], ptr [[TMP64]], align 8
// CHECK-NEXT: [[TMP65:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_PTRS9]], i32 0, i32 1
// CHECK-NEXT: store ptr [[ARRAYIDX4]], ptr [[TMP65]], align 8
// CHECK-NEXT: [[TMP66:%.*]] = getelementptr inbounds [5 x i64], ptr [[DOTOFFLOAD_SIZES11]], i32 0, i32 1
// CHECK-NEXT: store i64 [[TMP56]], ptr [[TMP66]], align 8
// CHECK-NEXT: [[TMP67:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_MAPPERS10]], i64 0, i64 1
// CHECK-NEXT: store ptr null, ptr [[TMP67]], align 8
// CHECK-NEXT: [[TMP68:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_BASEPTRS8]], i32 0, i32 2
// CHECK-NEXT: store ptr [[A_ADDR]], ptr [[TMP68]], align 8
// CHECK-NEXT: [[TMP69:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_PTRS9]], i32 0, i32 2
// CHECK-NEXT: store ptr [[ARRAYIDX4]], ptr [[TMP69]], align 8
// CHECK-NEXT: [[TMP70:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_MAPPERS10]], i64 0, i64 2
// CHECK-NEXT: store ptr null, ptr [[TMP70]], align 8
// CHECK-NEXT: [[TMP71:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_BASEPTRS8]], i32 0, i32 3
// CHECK-NEXT: store ptr [[TMP57]], ptr [[TMP71]], align 8
// CHECK-NEXT: [[TMP72:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_PTRS9]], i32 0, i32 3
// CHECK-NEXT: store ptr [[ARRAYIDX6]], ptr [[TMP72]], align 8
// CHECK-NEXT: [[TMP73:%.*]] = getelementptr inbounds [5 x i64], ptr [[DOTOFFLOAD_SIZES11]], i32 0, i32 3
// CHECK-NEXT: store i64 [[TMP60]], ptr [[TMP73]], align 8
// CHECK-NEXT: [[TMP74:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_MAPPERS10]], i64 0, i64 3
// CHECK-NEXT: store ptr null, ptr [[TMP74]], align 8
// CHECK-NEXT: [[TMP75:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_BASEPTRS8]], i32 0, i32 4
// CHECK-NEXT: store ptr [[B_ADDR]], ptr [[TMP75]], align 8
// CHECK-NEXT: [[TMP76:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_PTRS9]], i32 0, i32 4
// CHECK-NEXT: store ptr [[ARRAYIDX6]], ptr [[TMP76]], align 8
// CHECK-NEXT: [[TMP77:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_MAPPERS10]], i64 0, i64 4
// CHECK-NEXT: store ptr null, ptr [[TMP77]], align 8
// CHECK-NEXT: [[TMP78:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_BASEPTRS8]], i32 0, i32 0
// CHECK-NEXT: [[TMP79:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_PTRS9]], i32 0, i32 0
// CHECK-NEXT: [[TMP80:%.*]] = getelementptr inbounds [5 x i64], ptr [[DOTOFFLOAD_SIZES11]], i32 0, i32 0
// CHECK-NEXT: [[TMP81:%.*]] = load i32, ptr [[N_ADDR]], align 4
// CHECK-NEXT: store i32 [[TMP81]], ptr [[DOTCAPTURE_EXPR_]], align 4
// CHECK-NEXT: [[TMP82:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_]], align 4
// CHECK-NEXT: [[SUB:%.*]] = sub i32 [[TMP82]], -2
// CHECK-NEXT: [[DIV:%.*]] = udiv i32 [[SUB]], 3
// CHECK-NEXT: [[SUB13:%.*]] = sub i32 [[DIV]], 1
// CHECK-NEXT: store i32 [[SUB13]], ptr [[DOTCAPTURE_EXPR_12]], align 4
// CHECK-NEXT: [[TMP83:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_12]], align 4
// CHECK-NEXT: [[ADD:%.*]] = add i32 [[TMP83]], 1
// CHECK-NEXT: [[TMP84:%.*]] = zext i32 [[ADD]] to i64
// CHECK-NEXT: [[TMP85:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS14]], i32 0, i32 0
// CHECK-NEXT: store i32 3, ptr [[TMP85]], align 4
// CHECK-NEXT: [[TMP86:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS14]], i32 0, i32 1
// CHECK-NEXT: store i32 5, ptr [[TMP86]], align 4
// CHECK-NEXT: [[TMP87:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS14]], i32 0, i32 2
// CHECK-NEXT: store ptr [[TMP78]], ptr [[TMP87]], align 8
// CHECK-NEXT: [[TMP88:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS14]], i32 0, i32 3
// CHECK-NEXT: store ptr [[TMP79]], ptr [[TMP88]], align 8
// CHECK-NEXT: [[TMP89:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS14]], i32 0, i32 4
// CHECK-NEXT: store ptr [[TMP80]], ptr [[TMP89]], align 8
// CHECK-NEXT: [[TMP90:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS14]], i32 0, i32 5
// CHECK-NEXT: store ptr @.offload_maptypes.2, ptr [[TMP90]], align 8
// CHECK-NEXT: [[TMP91:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS14]], i32 0, i32 6
// CHECK-NEXT: store ptr null, ptr [[TMP91]], align 8
// CHECK-NEXT: [[TMP92:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS14]], i32 0, i32 7
// CHECK-NEXT: store ptr null, ptr [[TMP92]], align 8
// CHECK-NEXT: [[TMP93:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS14]], i32 0, i32 8
// CHECK-NEXT: store i64 [[TMP84]], ptr [[TMP93]], align 8
// CHECK-NEXT: [[TMP94:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS14]], i32 0, i32 9
// CHECK-NEXT: store i64 0, ptr [[TMP94]], align 8
// CHECK-NEXT: [[TMP95:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS14]], i32 0, i32 10
// CHECK-NEXT: store [3 x i32] zeroinitializer, ptr [[TMP95]], align 4
// CHECK-NEXT: [[TMP96:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS14]], i32 0, i32 11
// CHECK-NEXT: store [3 x i32] zeroinitializer, ptr [[TMP96]], align 4
// CHECK-NEXT: [[TMP97:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS14]], i32 0, i32 12
// CHECK-NEXT: store i32 0, ptr [[TMP97]], align 4
// CHECK-NEXT: [[TMP98:%.*]] = call i32 @__tgt_target_kernel(ptr @[[GLOB2]], i64 -1, i32 0, i32 0, ptr @.{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z18kernel_within_loopPiS_ii_l13.region_id, ptr [[KERNEL_ARGS14]])
// CHECK-NEXT: [[TMP99:%.*]] = icmp ne i32 [[TMP98]], 0
// CHECK-NEXT: br i1 [[TMP99]], label [[OMP_OFFLOAD_FAILED15:%.*]], label [[OMP_OFFLOAD_CONT16:%.*]]
// CHECK: omp_offload.failed15:
// CHECK-NEXT: call void @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z18kernel_within_loopPiS_ii_l13(i64 [[TMP50]], ptr [[TMP51]], ptr [[TMP52]]) #[[ATTR2]]
// CHECK-NEXT: br label [[OMP_OFFLOAD_CONT16]]
// CHECK: omp_offload.cont16:
// CHECK-NEXT: br label [[FOR_INC:%.*]]
// CHECK: for.inc:
// CHECK-NEXT: [[TMP100:%.*]] = load i32, ptr [[I]], align 4
// CHECK-NEXT: [[INC:%.*]] = add nsw i32 [[TMP100]], 1
// CHECK-NEXT: store i32 [[INC]], ptr [[I]], align 4
// CHECK-NEXT: br label [[FOR_COND]], !llvm.loop [[LOOP7:![0-9]+]]
// CHECK: for.end:
// CHECK-NEXT: [[TMP101:%.*]] = load ptr, ptr [[A_ADDR]], align 8
// CHECK-NEXT: [[TMP102:%.*]] = load i32, ptr [[N_ADDR]], align 4
// CHECK-NEXT: [[SUB17:%.*]] = sub nsw i32 [[TMP102]], 1
// CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[SUB17]] to i64
// CHECK-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds i32, ptr [[TMP101]], i64 [[IDXPROM]]
// CHECK-NEXT: [[TMP103:%.*]] = load i32, ptr [[ARRAYIDX18]], align 4
// CHECK-NEXT: ret i32 [[TMP103]]
//
//
// CHECK-LABEL: define internal void @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z18kernel_within_loopPiS_ii_l9
// CHECK-SAME: (i64 noundef [[N:%.*]], ptr noundef [[A:%.*]], ptr noundef [[B:%.*]]) #[[ATTR1:[0-9]+]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[N_ADDR:%.*]] = alloca i64, align 8
// CHECK-NEXT: [[A_ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: [[B_ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: [[N_CASTED:%.*]] = alloca i64, align 8
// CHECK-NEXT: store i64 [[N]], ptr [[N_ADDR]], align 8
// CHECK-NEXT: store ptr [[A]], ptr [[A_ADDR]], align 8
// CHECK-NEXT: store ptr [[B]], ptr [[B_ADDR]], align 8
// CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[N_ADDR]], align 4
// CHECK-NEXT: store i32 [[TMP0]], ptr [[N_CASTED]], align 4
// CHECK-NEXT: [[TMP1:%.*]] = load i64, ptr [[N_CASTED]], align 8
// CHECK-NEXT: [[TMP2:%.*]] = load ptr, ptr [[A_ADDR]], align 8
// CHECK-NEXT: [[TMP3:%.*]] = load ptr, ptr [[B_ADDR]], align 8
// CHECK-NEXT: call void (ptr, i32, ptr, ...) @__kmpc_fork_call(ptr @[[GLOB2]], i32 3, ptr @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z18kernel_within_loopPiS_ii_l9.omp_outlined, i64 [[TMP1]], ptr [[TMP2]], ptr [[TMP3]])
// CHECK-NEXT: ret void
//
//
// CHECK-LABEL: define internal void @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z18kernel_within_loopPiS_ii_l9.omp_outlined
// CHECK-SAME: (ptr noalias noundef [[DOTGLOBAL_TID_:%.*]], ptr noalias noundef [[DOTBOUND_TID_:%.*]], i64 noundef [[N:%.*]], ptr noundef [[A:%.*]], ptr noundef [[B:%.*]]) #[[ATTR1]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: [[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: [[N_ADDR:%.*]] = alloca i64, align 8
// CHECK-NEXT: [[A_ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: [[B_ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: [[DOTOMP_IV:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[TMP:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[DOTCAPTURE_EXPR_1:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[J:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[DOTOMP_LB:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[DOTOMP_UB:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[DOTOMP_STRIDE:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[DOTOMP_IS_LAST:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[J3:%.*]] = alloca i32, align 4
// CHECK-NEXT: store ptr [[DOTGLOBAL_TID_]], ptr [[DOTGLOBAL_TID__ADDR]], align 8
// CHECK-NEXT: store ptr [[DOTBOUND_TID_]], ptr [[DOTBOUND_TID__ADDR]], align 8
// CHECK-NEXT: store i64 [[N]], ptr [[N_ADDR]], align 8
// CHECK-NEXT: store ptr [[A]], ptr [[A_ADDR]], align 8
// CHECK-NEXT: store ptr [[B]], ptr [[B_ADDR]], align 8
// CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[N_ADDR]], align 4
// CHECK-NEXT: store i32 [[TMP0]], ptr [[DOTCAPTURE_EXPR_]], align 4
// CHECK-NEXT: [[TMP1:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_]], align 4
// CHECK-NEXT: [[SUB:%.*]] = sub nsw i32 [[TMP1]], 0
// CHECK-NEXT: [[DIV:%.*]] = sdiv i32 [[SUB]], 1
// CHECK-NEXT: [[SUB2:%.*]] = sub nsw i32 [[DIV]], 1
// CHECK-NEXT: store i32 [[SUB2]], ptr [[DOTCAPTURE_EXPR_1]], align 4
// CHECK-NEXT: store i32 0, ptr [[J]], align 4
// CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_]], align 4
// CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 0, [[TMP2]]
// CHECK-NEXT: br i1 [[CMP]], label [[OMP_PRECOND_THEN:%.*]], label [[OMP_PRECOND_END:%.*]]
// CHECK: omp.precond.then:
// CHECK-NEXT: store i32 0, ptr [[DOTOMP_LB]], align 4
// CHECK-NEXT: [[TMP3:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_1]], align 4
// CHECK-NEXT: store i32 [[TMP3]], ptr [[DOTOMP_UB]], align 4
// CHECK-NEXT: store i32 1, ptr [[DOTOMP_STRIDE]], align 4
// CHECK-NEXT: store i32 0, ptr [[DOTOMP_IS_LAST]], align 4
// CHECK-NEXT: [[TMP4:%.*]] = load ptr, ptr [[DOTGLOBAL_TID__ADDR]], align 8
// CHECK-NEXT: [[TMP5:%.*]] = load i32, ptr [[TMP4]], align 4
// CHECK-NEXT: call void @__kmpc_for_static_init_4(ptr @[[GLOB1:[0-9]+]], i32 [[TMP5]], i32 34, ptr [[DOTOMP_IS_LAST]], ptr [[DOTOMP_LB]], ptr [[DOTOMP_UB]], ptr [[DOTOMP_STRIDE]], i32 1, i32 1)
// CHECK-NEXT: [[TMP6:%.*]] = load i32, ptr [[DOTOMP_UB]], align 4
// CHECK-NEXT: [[TMP7:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_1]], align 4
// CHECK-NEXT: [[CMP4:%.*]] = icmp sgt i32 [[TMP6]], [[TMP7]]
// CHECK-NEXT: br i1 [[CMP4]], label [[COND_TRUE:%.*]], label [[COND_FALSE:%.*]]
// CHECK: cond.true:
// CHECK-NEXT: [[TMP8:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_1]], align 4
// CHECK-NEXT: br label [[COND_END:%.*]]
// CHECK: cond.false:
// CHECK-NEXT: [[TMP9:%.*]] = load i32, ptr [[DOTOMP_UB]], align 4
// CHECK-NEXT: br label [[COND_END]]
// CHECK: cond.end:
// CHECK-NEXT: [[COND:%.*]] = phi i32 [ [[TMP8]], [[COND_TRUE]] ], [ [[TMP9]], [[COND_FALSE]] ]
// CHECK-NEXT: store i32 [[COND]], ptr [[DOTOMP_UB]], align 4
// CHECK-NEXT: [[TMP10:%.*]] = load i32, ptr [[DOTOMP_LB]], align 4
// CHECK-NEXT: store i32 [[TMP10]], ptr [[DOTOMP_IV]], align 4
// CHECK-NEXT: br label [[OMP_INNER_FOR_COND:%.*]]
// CHECK: omp.inner.for.cond:
// CHECK-NEXT: [[TMP11:%.*]] = load i32, ptr [[DOTOMP_IV]], align 4
// CHECK-NEXT: [[TMP12:%.*]] = load i32, ptr [[DOTOMP_UB]], align 4
// CHECK-NEXT: [[CMP5:%.*]] = icmp sle i32 [[TMP11]], [[TMP12]]
// CHECK-NEXT: br i1 [[CMP5]], label [[OMP_INNER_FOR_BODY:%.*]], label [[OMP_INNER_FOR_END:%.*]]
// CHECK: omp.inner.for.body:
// CHECK-NEXT: [[TMP13:%.*]] = load i32, ptr [[DOTOMP_IV]], align 4
// CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP13]], 1
// CHECK-NEXT: [[ADD:%.*]] = add nsw i32 0, [[MUL]]
// CHECK-NEXT: store i32 [[ADD]], ptr [[J3]], align 4
// CHECK-NEXT: [[TMP14:%.*]] = load ptr, ptr [[B_ADDR]], align 8
// CHECK-NEXT: [[TMP15:%.*]] = load i32, ptr [[J3]], align 4
// CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[TMP15]] to i64
// CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[TMP14]], i64 [[IDXPROM]]
// CHECK-NEXT: [[TMP16:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
// CHECK-NEXT: [[TMP17:%.*]] = load ptr, ptr [[A_ADDR]], align 8
// CHECK-NEXT: [[TMP18:%.*]] = load i32, ptr [[J3]], align 4
// CHECK-NEXT: [[IDXPROM6:%.*]] = sext i32 [[TMP18]] to i64
// CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, ptr [[TMP17]], i64 [[IDXPROM6]]
// CHECK-NEXT: store i32 [[TMP16]], ptr [[ARRAYIDX7]], align 4
// CHECK-NEXT: br label [[OMP_BODY_CONTINUE:%.*]]
// CHECK: omp.body.continue:
// CHECK-NEXT: br label [[OMP_INNER_FOR_INC:%.*]]
// CHECK: omp.inner.for.inc:
// CHECK-NEXT: [[TMP19:%.*]] = load i32, ptr [[DOTOMP_IV]], align 4
// CHECK-NEXT: [[ADD8:%.*]] = add nsw i32 [[TMP19]], 1
// CHECK-NEXT: store i32 [[ADD8]], ptr [[DOTOMP_IV]], align 4
// CHECK-NEXT: br label [[OMP_INNER_FOR_COND]]
// CHECK: omp.inner.for.end:
// CHECK-NEXT: br label [[OMP_LOOP_EXIT:%.*]]
// CHECK: omp.loop.exit:
// CHECK-NEXT: [[TMP20:%.*]] = load ptr, ptr [[DOTGLOBAL_TID__ADDR]], align 8
// CHECK-NEXT: [[TMP21:%.*]] = load i32, ptr [[TMP20]], align 4
// CHECK-NEXT: call void @__kmpc_for_static_fini(ptr @[[GLOB1]], i32 [[TMP21]])
// CHECK-NEXT: br label [[OMP_PRECOND_END]]
// CHECK: omp.precond.end:
// CHECK-NEXT: ret void
//
//
// CHECK-LABEL: define internal void @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z18kernel_within_loopPiS_ii_l13
// CHECK-SAME: (i64 noundef [[N:%.*]], ptr noundef [[A:%.*]], ptr noundef [[B:%.*]]) #[[ATTR1]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[N_ADDR:%.*]] = alloca i64, align 8
// CHECK-NEXT: [[A_ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: [[B_ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: [[N_CASTED:%.*]] = alloca i64, align 8
// CHECK-NEXT: store i64 [[N]], ptr [[N_ADDR]], align 8
// CHECK-NEXT: store ptr [[A]], ptr [[A_ADDR]], align 8
// CHECK-NEXT: store ptr [[B]], ptr [[B_ADDR]], align 8
// CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[N_ADDR]], align 4
// CHECK-NEXT: store i32 [[TMP0]], ptr [[N_CASTED]], align 4
// CHECK-NEXT: [[TMP1:%.*]] = load i64, ptr [[N_CASTED]], align 8
// CHECK-NEXT: [[TMP2:%.*]] = load ptr, ptr [[A_ADDR]], align 8
// CHECK-NEXT: [[TMP3:%.*]] = load ptr, ptr [[B_ADDR]], align 8
// CHECK-NEXT: call void (ptr, i32, ptr, ...) @__kmpc_fork_teams(ptr @[[GLOB2]], i32 3, ptr @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z18kernel_within_loopPiS_ii_l13.omp_outlined, i64 [[TMP1]], ptr [[TMP2]], ptr [[TMP3]])
// CHECK-NEXT: ret void
//
//
// CHECK-LABEL: define internal void @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z18kernel_within_loopPiS_ii_l13.omp_outlined
// CHECK-SAME: (ptr noalias noundef [[DOTGLOBAL_TID_:%.*]], ptr noalias noundef [[DOTBOUND_TID_:%.*]], i64 noundef [[N:%.*]], ptr noundef [[A:%.*]], ptr noundef [[B:%.*]]) #[[ATTR1]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: [[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: [[N_ADDR:%.*]] = alloca i64, align 8
// CHECK-NEXT: [[A_ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: [[B_ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: [[DOTOMP_IV:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[TMP:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[DOTCAPTURE_EXPR_1:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[J:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[DOTOMP_COMB_LB:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[DOTOMP_COMB_UB:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[DOTOMP_STRIDE:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[DOTOMP_IS_LAST:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[J3:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[N_CASTED:%.*]] = alloca i64, align 8
// CHECK-NEXT: store ptr [[DOTGLOBAL_TID_]], ptr [[DOTGLOBAL_TID__ADDR]], align 8
// CHECK-NEXT: store ptr [[DOTBOUND_TID_]], ptr [[DOTBOUND_TID__ADDR]], align 8
// CHECK-NEXT: store i64 [[N]], ptr [[N_ADDR]], align 8
// CHECK-NEXT: store ptr [[A]], ptr [[A_ADDR]], align 8
// CHECK-NEXT: store ptr [[B]], ptr [[B_ADDR]], align 8
// CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[N_ADDR]], align 4
// CHECK-NEXT: store i32 [[TMP0]], ptr [[DOTCAPTURE_EXPR_]], align 4
// CHECK-NEXT: [[TMP1:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_]], align 4
// CHECK-NEXT: [[SUB:%.*]] = sub i32 [[TMP1]], -2
// CHECK-NEXT: [[DIV:%.*]] = udiv i32 [[SUB]], 3
// CHECK-NEXT: [[SUB2:%.*]] = sub i32 [[DIV]], 1
// CHECK-NEXT: store i32 [[SUB2]], ptr [[DOTCAPTURE_EXPR_1]], align 4
// CHECK-NEXT: store i32 0, ptr [[J]], align 4
// CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_]], align 4
// CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 0, [[TMP2]]
// CHECK-NEXT: br i1 [[CMP]], label [[OMP_PRECOND_THEN:%.*]], label [[OMP_PRECOND_END:%.*]]
// CHECK: omp.precond.then:
// CHECK-NEXT: store i32 0, ptr [[DOTOMP_COMB_LB]], align 4
// CHECK-NEXT: [[TMP3:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_1]], align 4
// CHECK-NEXT: store i32 [[TMP3]], ptr [[DOTOMP_COMB_UB]], align 4
// CHECK-NEXT: store i32 1, ptr [[DOTOMP_STRIDE]], align 4
// CHECK-NEXT: store i32 0, ptr [[DOTOMP_IS_LAST]], align 4
// CHECK-NEXT: [[TMP4:%.*]] = load ptr, ptr [[DOTGLOBAL_TID__ADDR]], align 8
// CHECK-NEXT: [[TMP5:%.*]] = load i32, ptr [[TMP4]], align 4
// CHECK-NEXT: call void @__kmpc_for_static_init_4u(ptr @[[GLOB3:[0-9]+]], i32 [[TMP5]], i32 92, ptr [[DOTOMP_IS_LAST]], ptr [[DOTOMP_COMB_LB]], ptr [[DOTOMP_COMB_UB]], ptr [[DOTOMP_STRIDE]], i32 1, i32 1)
// CHECK-NEXT: [[TMP6:%.*]] = load i32, ptr [[DOTOMP_COMB_UB]], align 4
// CHECK-NEXT: [[TMP7:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_1]], align 4
// CHECK-NEXT: [[CMP4:%.*]] = icmp ugt i32 [[TMP6]], [[TMP7]]
// CHECK-NEXT: br i1 [[CMP4]], label [[COND_TRUE:%.*]], label [[COND_FALSE:%.*]]
// CHECK: cond.true:
// CHECK-NEXT: [[TMP8:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_1]], align 4
// CHECK-NEXT: br label [[COND_END:%.*]]
// CHECK: cond.false:
// CHECK-NEXT: [[TMP9:%.*]] = load i32, ptr [[DOTOMP_COMB_UB]], align 4
// CHECK-NEXT: br label [[COND_END]]
// CHECK: cond.end:
// CHECK-NEXT: [[COND:%.*]] = phi i32 [ [[TMP8]], [[COND_TRUE]] ], [ [[TMP9]], [[COND_FALSE]] ]
// CHECK-NEXT: store i32 [[COND]], ptr [[DOTOMP_COMB_UB]], align 4
// CHECK-NEXT: [[TMP10:%.*]] = load i32, ptr [[DOTOMP_COMB_LB]], align 4
// CHECK-NEXT: store i32 [[TMP10]], ptr [[DOTOMP_IV]], align 4
// CHECK-NEXT: br label [[OMP_INNER_FOR_COND:%.*]]
// CHECK: omp.inner.for.cond:
// CHECK-NEXT: [[TMP11:%.*]] = load i32, ptr [[DOTOMP_IV]], align 4
// CHECK-NEXT: [[TMP12:%.*]] = load i32, ptr [[DOTOMP_COMB_UB]], align 4
// CHECK-NEXT: [[ADD:%.*]] = add i32 [[TMP12]], 1
// CHECK-NEXT: [[CMP5:%.*]] = icmp ult i32 [[TMP11]], [[ADD]]
// CHECK-NEXT: br i1 [[CMP5]], label [[OMP_INNER_FOR_BODY:%.*]], label [[OMP_INNER_FOR_END:%.*]]
// CHECK: omp.inner.for.body:
// CHECK-NEXT: [[TMP13:%.*]] = load i32, ptr [[DOTOMP_COMB_LB]], align 4
// CHECK-NEXT: [[TMP14:%.*]] = zext i32 [[TMP13]] to i64
// CHECK-NEXT: [[TMP15:%.*]] = load i32, ptr [[DOTOMP_COMB_UB]], align 4
// CHECK-NEXT: [[TMP16:%.*]] = zext i32 [[TMP15]] to i64
// CHECK-NEXT: [[TMP17:%.*]] = load i32, ptr [[N_ADDR]], align 4
// CHECK-NEXT: store i32 [[TMP17]], ptr [[N_CASTED]], align 4
// CHECK-NEXT: [[TMP18:%.*]] = load i64, ptr [[N_CASTED]], align 8
// CHECK-NEXT: [[TMP19:%.*]] = load ptr, ptr [[A_ADDR]], align 8
// CHECK-NEXT: [[TMP20:%.*]] = load ptr, ptr [[B_ADDR]], align 8
// CHECK-NEXT: call void (ptr, i32, ptr, ...) @__kmpc_fork_call(ptr @[[GLOB2]], i32 5, ptr @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z18kernel_within_loopPiS_ii_l13.omp_outlined.omp_outlined, i64 [[TMP14]], i64 [[TMP16]], i64 [[TMP18]], ptr [[TMP19]], ptr [[TMP20]])
// CHECK-NEXT: br label [[OMP_INNER_FOR_INC:%.*]]
// CHECK: omp.inner.for.inc:
// CHECK-NEXT: [[TMP21:%.*]] = load i32, ptr [[DOTOMP_IV]], align 4
// CHECK-NEXT: [[TMP22:%.*]] = load i32, ptr [[DOTOMP_STRIDE]], align 4
// CHECK-NEXT: [[ADD6:%.*]] = add i32 [[TMP21]], [[TMP22]]
// CHECK-NEXT: store i32 [[ADD6]], ptr [[DOTOMP_IV]], align 4
// CHECK-NEXT: br label [[OMP_INNER_FOR_COND]]
// CHECK: omp.inner.for.end:
// CHECK-NEXT: br label [[OMP_LOOP_EXIT:%.*]]
// CHECK: omp.loop.exit:
// CHECK-NEXT: [[TMP23:%.*]] = load ptr, ptr [[DOTGLOBAL_TID__ADDR]], align 8
// CHECK-NEXT: [[TMP24:%.*]] = load i32, ptr [[TMP23]], align 4
// CHECK-NEXT: call void @__kmpc_for_static_fini(ptr @[[GLOB3]], i32 [[TMP24]])
// CHECK-NEXT: br label [[OMP_PRECOND_END]]
// CHECK: omp.precond.end:
// CHECK-NEXT: ret void
//
//
// CHECK-LABEL: define internal void @{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z18kernel_within_loopPiS_ii_l13.omp_outlined.omp_outlined
// CHECK-SAME: (ptr noalias noundef [[DOTGLOBAL_TID_:%.*]], ptr noalias noundef [[DOTBOUND_TID_:%.*]], i64 noundef [[DOTPREVIOUS_LB_:%.*]], i64 noundef [[DOTPREVIOUS_UB_:%.*]], i64 noundef [[N:%.*]], ptr noundef [[A:%.*]], ptr noundef [[B:%.*]]) #[[ATTR1]] {
// CHECK-NEXT: entry:
// CHECK-NEXT: [[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: [[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: [[DOTPREVIOUS_LB__ADDR:%.*]] = alloca i64, align 8
// CHECK-NEXT: [[DOTPREVIOUS_UB__ADDR:%.*]] = alloca i64, align 8
// CHECK-NEXT: [[N_ADDR:%.*]] = alloca i64, align 8
// CHECK-NEXT: [[A_ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: [[B_ADDR:%.*]] = alloca ptr, align 8
// CHECK-NEXT: [[DOTOMP_IV:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[TMP:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[DOTCAPTURE_EXPR_1:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[J:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[DOTOMP_LB:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[DOTOMP_UB:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[DOTOMP_STRIDE:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[DOTOMP_IS_LAST:%.*]] = alloca i32, align 4
// CHECK-NEXT: [[J4:%.*]] = alloca i32, align 4
// CHECK-NEXT: store ptr [[DOTGLOBAL_TID_]], ptr [[DOTGLOBAL_TID__ADDR]], align 8
// CHECK-NEXT: store ptr [[DOTBOUND_TID_]], ptr [[DOTBOUND_TID__ADDR]], align 8
// CHECK-NEXT: store i64 [[DOTPREVIOUS_LB_]], ptr [[DOTPREVIOUS_LB__ADDR]], align 8
// CHECK-NEXT: store i64 [[DOTPREVIOUS_UB_]], ptr [[DOTPREVIOUS_UB__ADDR]], align 8
// CHECK-NEXT: store i64 [[N]], ptr [[N_ADDR]], align 8
// CHECK-NEXT: store ptr [[A]], ptr [[A_ADDR]], align 8
// CHECK-NEXT: store ptr [[B]], ptr [[B_ADDR]], align 8
// CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[N_ADDR]], align 4
// CHECK-NEXT: store i32 [[TMP0]], ptr [[DOTCAPTURE_EXPR_]], align 4
// CHECK-NEXT: [[TMP1:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_]], align 4
// CHECK-NEXT: [[SUB:%.*]] = sub i32 [[TMP1]], -2
// CHECK-NEXT: [[DIV:%.*]] = udiv i32 [[SUB]], 3
// CHECK-NEXT: [[SUB2:%.*]] = sub i32 [[DIV]], 1
// CHECK-NEXT: store i32 [[SUB2]], ptr [[DOTCAPTURE_EXPR_1]], align 4
// CHECK-NEXT: store i32 0, ptr [[J]], align 4
// CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_]], align 4
// CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 0, [[TMP2]]
// CHECK-NEXT: br i1 [[CMP]], label [[OMP_PRECOND_THEN:%.*]], label [[OMP_PRECOND_END:%.*]]
// CHECK: omp.precond.then:
// CHECK-NEXT: store i32 0, ptr [[DOTOMP_LB]], align 4
// CHECK-NEXT: [[TMP3:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_1]], align 4
// CHECK-NEXT: store i32 [[TMP3]], ptr [[DOTOMP_UB]], align 4
// CHECK-NEXT: [[TMP4:%.*]] = load i64, ptr [[DOTPREVIOUS_LB__ADDR]], align 8
// CHECK-NEXT: [[CONV:%.*]] = trunc i64 [[TMP4]] to i32
// CHECK-NEXT: [[TMP5:%.*]] = load i64, ptr [[DOTPREVIOUS_UB__ADDR]], align 8
// CHECK-NEXT: [[CONV3:%.*]] = trunc i64 [[TMP5]] to i32
// CHECK-NEXT: store i32 [[CONV]], ptr [[DOTOMP_LB]], align 4
// CHECK-NEXT: store i32 [[CONV3]], ptr [[DOTOMP_UB]], align 4
// CHECK-NEXT: store i32 1, ptr [[DOTOMP_STRIDE]], align 4
// CHECK-NEXT: store i32 0, ptr [[DOTOMP_IS_LAST]], align 4
// CHECK-NEXT: [[TMP6:%.*]] = load ptr, ptr [[DOTGLOBAL_TID__ADDR]], align 8
// CHECK-NEXT: [[TMP7:%.*]] = load i32, ptr [[TMP6]], align 4
// CHECK-NEXT: call void @__kmpc_for_static_init_4u(ptr @[[GLOB1]], i32 [[TMP7]], i32 34, ptr [[DOTOMP_IS_LAST]], ptr [[DOTOMP_LB]], ptr [[DOTOMP_UB]], ptr [[DOTOMP_STRIDE]], i32 1, i32 1)
// CHECK-NEXT: [[TMP8:%.*]] = load i32, ptr [[DOTOMP_UB]], align 4
// CHECK-NEXT: [[TMP9:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_1]], align 4
// CHECK-NEXT: [[CMP5:%.*]] = icmp ugt i32 [[TMP8]], [[TMP9]]
// CHECK-NEXT: br i1 [[CMP5]], label [[COND_TRUE:%.*]], label [[COND_FALSE:%.*]]
// CHECK: cond.true:
// CHECK-NEXT: [[TMP10:%.*]] = load i32, ptr [[DOTCAPTURE_EXPR_1]], align 4
// CHECK-NEXT: br label [[COND_END:%.*]]
// CHECK: cond.false:
// CHECK-NEXT: [[TMP11:%.*]] = load i32, ptr [[DOTOMP_UB]], align 4
// CHECK-NEXT: br label [[COND_END]]
// CHECK: cond.end:
// CHECK-NEXT: [[COND:%.*]] = phi i32 [ [[TMP10]], [[COND_TRUE]] ], [ [[TMP11]], [[COND_FALSE]] ]
// CHECK-NEXT: store i32 [[COND]], ptr [[DOTOMP_UB]], align 4
// CHECK-NEXT: [[TMP12:%.*]] = load i32, ptr [[DOTOMP_LB]], align 4
// CHECK-NEXT: store i32 [[TMP12]], ptr [[DOTOMP_IV]], align 4
// CHECK-NEXT: br label [[OMP_INNER_FOR_COND:%.*]]
// CHECK: omp.inner.for.cond:
// CHECK-NEXT: [[TMP13:%.*]] = load i32, ptr [[DOTOMP_IV]], align 4
// CHECK-NEXT: [[TMP14:%.*]] = load i32, ptr [[DOTOMP_UB]], align 4
// CHECK-NEXT: [[ADD:%.*]] = add i32 [[TMP14]], 1
// CHECK-NEXT: [[CMP6:%.*]] = icmp ult i32 [[TMP13]], [[ADD]]
// CHECK-NEXT: br i1 [[CMP6]], label [[OMP_INNER_FOR_BODY:%.*]], label [[OMP_INNER_FOR_END:%.*]]
// CHECK: omp.inner.for.body:
// CHECK-NEXT: [[TMP15:%.*]] = load i32, ptr [[DOTOMP_IV]], align 4
// CHECK-NEXT: [[MUL:%.*]] = mul i32 [[TMP15]], 3
// CHECK-NEXT: [[ADD7:%.*]] = add i32 0, [[MUL]]
// CHECK-NEXT: store i32 [[ADD7]], ptr [[J4]], align 4
// CHECK-NEXT: [[TMP16:%.*]] = load ptr, ptr [[B_ADDR]], align 8
// CHECK-NEXT: [[TMP17:%.*]] = load i32, ptr [[J4]], align 4
// CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[TMP17]] to i64
// CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[TMP16]], i64 [[IDXPROM]]
// CHECK-NEXT: [[TMP18:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
// CHECK-NEXT: [[MUL8:%.*]] = mul nsw i32 [[TMP18]], 2
// CHECK-NEXT: [[TMP19:%.*]] = load ptr, ptr [[A_ADDR]], align 8
// CHECK-NEXT: [[TMP20:%.*]] = load i32, ptr [[J4]], align 4
// CHECK-NEXT: [[IDXPROM9:%.*]] = sext i32 [[TMP20]] to i64
// CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i32, ptr [[TMP19]], i64 [[IDXPROM9]]
// CHECK-NEXT: store i32 [[MUL8]], ptr [[ARRAYIDX10]], align 4
// CHECK-NEXT: br label [[OMP_BODY_CONTINUE:%.*]]
// CHECK: omp.body.continue:
// CHECK-NEXT: br label [[OMP_INNER_FOR_INC:%.*]]
// CHECK: omp.inner.for.inc:
// CHECK-NEXT: [[TMP21:%.*]] = load i32, ptr [[DOTOMP_IV]], align 4
// CHECK-NEXT: [[ADD11:%.*]] = add i32 [[TMP21]], 1
// CHECK-NEXT: store i32 [[ADD11]], ptr [[DOTOMP_IV]], align 4
// CHECK-NEXT: br label [[OMP_INNER_FOR_COND]]
// CHECK: omp.inner.for.end:
// CHECK-NEXT: br label [[OMP_LOOP_EXIT:%.*]]
// CHECK: omp.loop.exit:
// CHECK-NEXT: [[TMP22:%.*]] = load ptr, ptr [[DOTGLOBAL_TID__ADDR]], align 8
// CHECK-NEXT: [[TMP23:%.*]] = load i32, ptr [[TMP22]], align 4
// CHECK-NEXT: call void @__kmpc_for_static_fini(ptr @[[GLOB1]], i32 [[TMP23]])
// CHECK-NEXT: br label [[OMP_PRECOND_END]]
// CHECK: omp.precond.end:
// CHECK-NEXT: ret void
//