1188 Commits

Author SHA1 Message Date
Kareem Ergawy
42da12063f
[flang][OpenMP] Extend delayed privatization for omp.simd (#122156)
Adds support for delayed privatization for `simd` directives. This PR
includes PFT down to LLVM IR lowering.
2025-01-12 07:46:58 +01:00
William Moses
38fcf62483
[MLIR] Import LLVM add flag to disable loadalldialects (#122574)
Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>
2025-01-11 09:11:22 -05:00
Kareem Ergawy
6f9e688203
[flang][OpenMP] Fix reduction init region block management (#122079)
Replaces https://github.com/llvm/llvm-project/pull/121886
Fixes https://github.com/llvm/llvm-project/issues/120254 (hopefully 🤞)

## Problem

Consider the following example:
```fortran
program test
  real :: x(1)
  integer :: i
  !$omp parallel do reduction(+:x)
    do i = 1,1
      x = 1
    end do
  !$omp end parallel do
end program
```

The HLFIR+OMP IR for this example looks like this:
```mlir
  func.func @_QQmain() {
    ...
    omp.parallel {
      %5 = fir.embox %4#0(%3) : (!fir.ref<!fir.array<1xf32>>, !fir.shape<1>) -> !fir.box<!fir.array<1xf32>>
      %6 = fir.alloca !fir.box<!fir.array<1xf32>>
      ...
      omp.wsloop private(@_QFEi_private_ref_i32 %1#0 -> %arg0 : !fir.ref<i32>) reduction(byref @add_reduction_byref_box_1xf32 %6 -> %arg1 : !fir.ref<!fir.box<!fir.array<1xf32>>>) {
        omp.loop_nest (%arg2) : i32 = (%c1_i32) to (%c1_i32_0) inclusive step (%c1_i32_1) {
          ...
          omp.yield
        }
      }
      omp.terminator
    }
    return
  }
```

The problem addressed by this PR is related to: the `alloca` in the
`omp.parallel` region + the related `reduction` clause on the
`omp.wsloop` op. When we try to translate the reduction from MLIR to LLVM,
we have to choose an `alloca` insertion point. This happens in
`convertOmpWsloop` where at entry to that function, this is what the
LLVM module looks like:

```llvm
define void @_QQmain() {
  %tid.addr = alloca i32, align 4
  ...

entry:
  %omp_global_thread_num = call i32 @__kmpc_global_thread_num(ptr @1)
  br label %omp.par.entry

omp.par.entry:
  %tid.addr.local = alloca i32, align 4
  ...
  br label %omp.par.region

omp.par.region:
  br label %omp.par.region1

omp.par.region1:
  ...
  %5 = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
```

Now, when we choose an `alloca` insertion point for the reduction, the
chosen block is `omp.par.entry` (without the changes in this PR).
The problem is that the allocation needed for the reduction needs to
reference the `%5` SSA value. This results in inserting allocations in
`omp.par.entry` that reference allocations in a later block
`omp.par.region1` which causes the `Instruction does not dominate all
uses!` error.
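
The dominance violation can be illustrated with a small sketch. It assumes the blocks form a straight-line chain, as in the snippet above, so that a block dominates exactly itself and the blocks after it. The `CHAIN` list and the `dominates` helper are illustrative names, not part of LLVM.

```python
# Blocks in the order they are chained by unconditional branches in the
# LLVM IR snippet above. In such a straight-line CFG, a block dominates
# exactly itself and the blocks that come after it.
CHAIN = ["entry", "omp.par.entry", "omp.par.region", "omp.par.region1"]


def dominates(def_block: str, use_block: str) -> bool:
    """Return True if a value defined in def_block may be used in use_block.

    Hypothetical helper: only valid for the linear chain above, and it
    ignores instruction ordering inside a single block.
    """
    return CHAIN.index(def_block) <= CHAIN.index(use_block)


# %5 is defined in omp.par.region1; the reduction alloca that references it
# was being inserted in omp.par.entry, which comes earlier in the chain.
print(dominates("omp.par.region1", "omp.par.entry"))    # False
print(dominates("omp.par.region1", "omp.par.region1"))  # True
```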

## Possible solution - take 2:

This PR contains a more localized solution than
https://github.com/llvm/llvm-project/pull/121886. It makes sure that on
entry to `initReductionVars`, the IR builder is at a point where we can
start inserting the initialization region; to make things cleaner, we
still split the builder insertion point to a dedicated
`omp.reduction.init` block. This way we avoid splitting after the latest
allocation block, which is what causes the issue.
2025-01-09 16:11:18 +01:00
William Moses
1c067a513c
[MLIR] Enable import of non self referential alias scopes (#121987)
Fixes #121965.

---------

Co-authored-by: Christian Ulmann <christianulmann@gmail.com>
Co-authored-by: Alex Zinenko <git@ozinenko.com>
2025-01-08 13:40:05 +01:00
Alex MacLean
4583f6d344
[NVPTX] Switch front-ends and tests to ptx_kernel cc (#120806)
The `ptx_kernel` calling convention is a more idiomatic and standard way
of specifying an NVPTX kernel than using the metadata which is not
supposed to change the meaning of the program. Further, checking the
calling convention is significantly faster than traversing the metadata,
improving compile time.

This change updates the clang and mlir frontends as well as the
NVPTXCtorDtorLowering pass to emit kernels using the calling convention.
In addition, this updates all NVPTX unit tests to use the calling
convention as well.
2025-01-07 18:24:50 -08:00
William Moses
b5f21671ef
MLIR: Enable importing inlineasm calls (#121624) 2025-01-05 11:02:49 -05:00
agozillon
fa56e8bb64
[OpenMP][MLIR] Fix threadprivate lowering when compiling for target when target operations are in use (#119310)
Currently the compiler will ICE in programs like the following on the
device lowering pass:

```fortran
program main
    implicit none

    type i1_t
       integer :: val(1000)
    end type i1_t
    integer :: i
    type(i1_t), pointer :: newi1
    type(i1_t), pointer :: tab=>null()

    integer, dimension(:), pointer :: tabval

!$omp THREADPRIVATE(tab)

allocate(newi1)

tab=>newi1
tab%val(:)=1
tabval=>tab%val

!$omp target teams distribute parallel do
  do i = 1, 1000
   tabval(i) = i
 end do
!$omp end target teams distribute parallel do

end program main
```

This is due to the fact that THREADPRIVATE returns a result operation,
and this operation can actually be used by other LLVM dialect (or other
dialect) operations. However, we currently skip the lowering of
threadprivate, so we effectively never generate and bind an LLVM-IR
result to the threadprivate operation result. So when we later go on to
lower dependent LLVM dialect operations, we are missing the required
LLVM-IR result, try to access and use it and then ICE. The fix in this
particular PR is to allow compilation of threadprivate for device as
well as host, and simply treat the device compilation as a no-op,
binding the LLVM-IR result of threadprivate with no alterations, which
will allow the rest of the compilation to proceed, where we'll
eventually discard the host segment in any case.

The other possible solution to this I can think of, is doing something
similar to Flang's passes that occur prior to CodeGen to the LLVM
dialect, where they erase/no-op certain unrequired operations or
transform them to lower level series of operations. And we would
erase/no-op threadprivate on device as we'd never have these in target
regions.

The main issues I can see with this are that we currently do not
specialise this stage based on whether we're compiling for device or
host, so it's setting a precedent and adding another point of having to
understand the separation between target and host compilation. I am also
not sure we'd necessarily want to enforce this at a dialect level in case
someone else wishes to add a different lowering flow or translation
flow. Another possible issue is that a target operation we have/utilise
would depend on the result of threadprivate, meaning we'd not be allowed
to entirely erase/no-op it, I am not sure of any situations where this
may be an issue currently though.
2025-01-03 18:01:01 +01:00
Kaviya Rajendiran
d3eb65f15d
[MLIR][OpenMP] Lowering aligned clause to LLVM IR for SIMD directive (#119536)
This patch:
- Adds translation support for the aligned clause in the SIMD directive by passing the alignment details to the "llvm.assume" intrinsic.
- Updates the insertion point for the llvm.assume intrinsic call in "OMPIRBuilder.cpp".
- Adds a check in the aligned clause MLIR lowering to ensure that the alignment value is a power of 2.
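
The power-of-2 check on the alignment value can be sketched as follows. This is a minimal Python model of the condition only; the function name `is_power_of_two` is an assumption and does not reflect the helper actually used in the patch.

```python
def is_power_of_two(alignment: int) -> bool:
    """Return True if alignment is a positive power of two.

    A power of two has exactly one bit set, so clearing the lowest set
    bit (alignment & (alignment - 1)) must leave zero.
    """
    return alignment > 0 and (alignment & (alignment - 1)) == 0


print(is_power_of_two(64))  # True
print(is_power_of_two(48))  # False
```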
2025-01-03 16:22:38 +05:30
Thirumalai Shaktivel
cbe583b0bd
[Flang] Add translation support for MutexInOutSet and InOutSet (#120715)
Implementation details:
Mutexinoutset and Inoutset are recognized as flag=0x4 and flag=0x8
respectively; the flag is set in `kmp_depend_info` and passed as an
argument to the `__kmpc_omp_task_with_deps` runtime call.
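
As a rough illustration of the flag encoding, the sketch below uses only the two values quoted above. The `DependKind` enum, the `make_depend_info` helper and the dictionary layout are hypothetical and only loosely mirror a `kmp_depend_info` entry.

```python
from enum import IntEnum


class DependKind(IntEnum):
    """Dependence-kind flag values quoted in the commit message above."""
    MUTEXINOUTSET = 0x4
    INOUTSET = 0x8


def make_depend_info(base_addr: int, length: int, kind: DependKind) -> dict:
    """Build a dict that loosely mirrors one `kmp_depend_info` entry.

    Hypothetical layout: the field names are illustrative only.
    """
    return {"base_addr": base_addr, "len": length, "flags": int(kind)}


info = make_depend_info(0x1000, 4, DependKind.MUTEXINOUTSET)
print(hex(info["flags"]))  # 0x4
```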
2024-12-26 15:02:09 +05:30
Muhammad Omair Javaid
927a70daf3 Revert "[Flang OpenMP] Add LLVM translation support for UNTIED in Task (#115283)"
This reverts commit 919aead1db64b2f1444842bc75a3af7836238671.
It breaks following LLVM bots:
https://lab.llvm.org/buildbot/#/builders/199
https://lab.llvm.org/buildbot/#/builders/143
https://lab.llvm.org/buildbot/#/builders/17
2024-12-24 01:47:24 +05:00
Thirumalai Shaktivel
919aead1db
[Flang OpenMP] Add LLVM translation support for UNTIED in Task (#115283)
Implementation details:
The UNTIED clause is recognized by setting the flag=0 for the default
case or performing logical OR to flag if other clauses are specified,
and this flag is passed as an argument to the `__kmpc_omp_task_alloc`
runtime call.
2024-12-20 16:36:51 +05:30
Mehdi Amini
6a7d6c5f69
[MLIR] Add a MLIR_NVVM_EMBED_LIBDEVICE CMake option that embeds libdevice in the binary (#120238)
This removes a runtime dependency on the CUDA Toolkit path: instead of
looking up the filesystem, we use a version of libdevice embedded in the
binary at build time.
2024-12-17 16:53:38 +01:00
Mehdi Amini
72e8b9aeaa
[MLIR] Add a BlobAttr interface for attribute to wrap arbitrary content and use it as linkLibs for ModuleToObject (#120116)
This change makes it possible to expose, through an interface, attributes
that wrap content as external resources; the usage inside ModuleToObject
shows how we will be able to provide runtime libraries without relying on
the filesystem.
2024-12-17 01:30:56 +01:00
Renaud Kauffmann
9919295cfd
[mlir][gpu] Adding ELF section option to the gpu-module-to-binary pass (#119440)
This is a follow-up of #117246.

I thought then it would be easy to edit a DictionaryAttr but it turns
out that these attributes are immutable and need to be passed during the
construction of the gpu.binary Op.

The first commit was using the NVVMTargetAttr to pass the information.
After feedback from @fabianmcg, this PR now passes the information
through a new option of the gpu-module-to-binary pass.

Please add reviewers, as you see fit.
2024-12-16 09:09:41 -08:00
Ivan R. Ivanov
7c9404c279
[flang][OpenMP] Add frontend support for ompx_bare clause (#111106) 2024-12-13 21:44:43 +09:00
Jie Fu
46ec271e03 [mlir] Fix -Wunused-variable in OpenMPToLLVMIRTranslation.cpp (NFC)
/llvm-project/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp:3921:12:
 error: unused variable 'varType' [-Werror,-Wunused-variable]
      Type varType = mapInfoOp.getVarType();
           ^
1 error generated.
2024-12-12 22:11:41 +08:00
Kareem Ergawy
f9734b9df1
[mlir][OpenMP] - MLIR to LLVMIR translation support for delayed privatization of allocatables in omp.target ops (#116576)
This PR adds support to translate the `private` clause from MLIR to
LLVMIR when used on allocatables in the context of an `omp.target` op.

This replaces https://github.com/llvm/llvm-project/pull/113208.

Parent PR: https://github.com/llvm/llvm-project/pull/116770. Only the
latest commit is relevant to the PR.
2024-12-12 14:39:58 +01:00
Zichen Lu
4971e53612
[mlir][Target] Support Fatbin target for static nvptxcompiler (#118044)
### Background

In `lib/Target/LLVM/NVVM/Target.cpp`, `NVPTXSerializer` compiles PTX to
binary with two different flows controlled by
`MLIR_ENABLE_NVPTXCOMPILER`.

If MLIR is built with `-DMLIR_ENABLE_NVPTXCOMPILER=ON`, the flow does
not check whether the target is `gpu::CompilationTarget::Fatbin` and
compiles PTX to cubin directly, which is inconsistent with the other flow.

### Implementation

Use the static [nvfatbin](https://docs.nvidia.com/cuda/nvfatbin/index.html)
library.

I have tested it locally; the two flows return the same Fatbin
result when given the same `GpuModule`.
2024-12-10 11:45:24 +01:00
Kareem Ergawy
0e70e0edd5
[reapply (#118463)][OpenMP][OMPIRBuilder] Add delayed privatization support for wsloop (#119170)
This reapplies PR #118463 after introducing a fix for a bug uncovered by
the test suite. The problem is that when the alloca block is terminated
with a conditional branch, this violates a pre-condition of
`allocatePrivateVars` (which assumes the alloca block has a single
successor). This new PR includes a test that reproduces the issue.

Extend MLIR to LLVM lowering by adding support for `omp.wsloop` for
delayed privatization. This also refactors a few bits of code to isolate
the logic needed for `firstprivate` initialization in a shared util that
can be used across constructs that need it. The same is done for
`dealloc` regions.
2024-12-09 14:32:04 +01:00
NimishMishra
9eb4056144
[mlir][llvm] Translation support for task detach (#116601)
This PR adds translation support for task detach. Essentially, if the
`detach` clause is present on a task, emit a
`__kmpc_task_allow_completion_event` on it, and store its return (of
type `kmp_event_t*`) into the `event_handle`.
2024-12-08 06:09:52 -08:00
Kareem Ergawy
c54616ea48
Revert "[OpenMP][OMPIRBuilder] Add delayed privatization support for wsloop (#118463)" (#118848) 2024-12-05 20:49:13 +01:00
Kareem Ergawy
0993335134
[OpenMP][OMPIRBuilder] Add delayed privatization support for wsloop (#118463)
Extend MLIR to LLVM lowering by adding support for `omp.wsloop` for
delayed privatization. This also refactors a few bits of code to isolate
the logic needed for `firstprivate` initialization in a shared util that
can be used across constructs that need it. The same is done for
`dealloc`
regions.

Parent PR: https://github.com/llvm/llvm-project/pull/118447. Only latest
commit is relevant for this PR.
2024-12-05 05:59:52 +01:00
Kareem Ergawy
7f72d71de7
[OpenMP][OMPIRBuilder] Refactor reduction initialization logic into one util (#118447)
This refactors the logic needed to emit init logic for reductions by
moving some duplicated code into a shared util. The logic for doing so is
quite involved and is needed for any construct that has reductions.
Moreover, when a construct has both private and reduction clauses, both
sets of clauses need to cooperate with each other when emitting the
logic needed for allocation and initialization. Therefore, this PR
clearly sets the boundaries for the logic needed to initialize
reductions.
2024-12-05 05:23:49 +01:00
Krzysztof Drewniak
92a15dd748
[mlir][LLVM] Plumb range attributes on parameters and results through (#117801)
We've had the ability to define LLVM's `range` attribute through
 #llvm.constant_range for some time, and have used this for some GPU
intrinsics. This commit allows using `llvm.range` as a parameter or
result attribute on function declarations and definitions.
2024-11-27 12:31:51 -06:00
NimishMishra
b9e3a769b9
[flang][mlir][llvm][OpenMP] Add lowering and translation support for mergeable clause on task (#114662)
Add FIR generation and LLVMIR translation support for mergeable clause
on task construct. If mergeable clause is present on a task, the
relevant flag in `ompt_task_flag_t` is set and passed to
`__kmpc_omp_task_alloc`.
2024-11-26 02:40:26 -08:00
Renaud Kauffmann
7fcc0f9065
Populate the llvm::GlobalVariable ELF section, with the attribute from the ObjectAttrs (#117246) 2024-11-22 07:58:45 -08:00
arthurqiu
81055ff070
[mlir][nvvm] Add attributes for cluster dimension PTX directives (#116973)
The PTX programming model provides cluster dimension directives, which are
leveraged by the downstream `ptxas` compiler. See
https://docs.nvidia.com/cuda/nvvm-ir-spec/#supported-properties and
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-dimension-directives

This PR introduces the cluster dimension directives to MLIR's NVVM
dialect as listed below:
```
cluster_dim_{x,y,z}    ->    exact number of CTAs per cluster
cluster_max_blocks     ->    max number of CTAs per cluster
```
2024-11-20 18:31:01 +01:00
Zichen Lu
08e7609692
[mlir][fix] Add callback functions for ModuleToObject (#116916)
Here is the [merged
MR](https://github.com/llvm/llvm-project/pull/116007) which caused a
failure and [was
reverted](https://github.com/llvm/llvm-project/pull/116811).

Thanks to @joker-eph for the help, I fixed it (the construction of
`ModuleObject` with callback functions was missing in
`mlir/lib/Target/LLVM/NVVM/Target.cpp`) and split the unit tests that
don't need `ptxas` out of the original test so that they can run more widely.
2024-11-20 13:22:08 +01:00
Mehdi Amini
af41c55673
Revert "[MLIR] Add callback functions for ModuleToObject" (#116811)
Reverts llvm/llvm-project#116007

Bot is broken.
2024-11-19 15:28:17 +01:00
Zichen Lu
2153672ba3
[MLIR] Add callback functions for ModuleToObject (#116007)
In ModuleToObject flow, users may want to add some callback functions
invoked with LLVM IR/ISA for debugging or other purposes.
2024-11-19 13:51:08 +01:00
Tom Eccles
a6385a3fc8
[mlir][OpenMP][NFC] use llvm::zip_equal for firstprivate copy region translation (#116416)
I think this is a bit easier to read.
2024-11-18 10:25:19 +00:00
Victor Perez
4f78f85190
[MLIR][SPIRV] Add definition and (de)serialization for cache controls (#115461)
[SPV_INTEL_cache_controls](https://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/INTEL/SPV_INTEL_cache_controls.html)
defines decorations for load and store cache control. Add support for
this extension in the SPIR-V dialect.

As several `CacheControlLoadINTEL` and `CacheControlStoreINTEL` may be
applied to the same value, these are represented as array attributes.
(De)Serialization takes care of this representation.

---------

Signed-off-by: Victor Perez <victor.perez@codeplay.com>
2024-11-18 09:42:31 +01:00
agozillon
b5db75bfce
[OpenMP][MLIR] Descriptor explicit member map lowering changes (#113556)
This is one of 3 PRs in a PR stack that aims to add support for explicit
mapping of allocatable members in derived types.

The primary changes in this PR are the OpenMPToLLVMIRTranslation.cpp
changes, which are small and seek to alter the current member mapping to
add an additional map insertion for pointers. Effectively, if the member
is a pointer (currently indicated by having a varPtrPtr field) we add an
additional map for the pointer and then alter the subsequent mapping of
the member (the data) to utilise the member rather than the parent's base
pointer. This appears to be necessary in certain cases when mapping
pointer data within record types to avoid segfaulting on device (due to
incorrect data mapping). In general this record type mapping may be
simplifiable in the future.

There are also additions of tests which should help to showcase the
effect of the changes above.
2024-11-16 12:26:29 +01:00
lfrenot
40afff7bd9
[mlir][LLVM] Add disjoint flag (#115855)
The implementation is mostly based on the one existing for the exact
flag.

disjoint means that for each bit, that bit is zero in at least one of
the inputs. This allows the Or to be treated as an Add since no carry
can occur from any bit. If the disjoint keyword is present, the result
value of the or is a [poison
value](https://llvm.org/docs/LangRef.html#poisonvalues) if both inputs
have a one in the same bit position. For vectors, only the element
containing the bit is poison.
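
The semantics described above can be modelled in a few lines. This is a sketch on unsigned fixed-width integers in Python, with `None` standing in for poison; the function name `or_disjoint` and the default 8-bit width are assumptions made for illustration.

```python
from typing import Optional


def or_disjoint(a: int, b: int, bits: int = 8) -> Optional[int]:
    """Model `or disjoint` on unsigned `bits`-wide integers.

    Returns None (standing in for poison) if both inputs have a one in
    the same bit position; otherwise returns the bitwise OR.
    """
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    if a & b:
        return None
    return a | b


# With no shared bits, no carry can occur, so OR and ADD agree.
print(or_disjoint(0b1010, 0b0101) == 0b1010 + 0b0101)  # True
# Bit 0 is set in both inputs, so the result is poison.
print(or_disjoint(0b0011, 0b0001))  # None
```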
2024-11-15 13:48:01 +01:00
agozillon
d84d0caf28
[Flang][OpenMP] Update MapInfoFinalization to use BlockArgs Interface and modify use_device_ptr/addr to be order independent (#113919)
This patch primarily updates the MapInfoFinalization pass to utilise the
BlockArgument interface. It also moves the arguments newly added by the
MapInfoFinalization pass to the end of the BlockArg/Relevant MapInfo
lists, instead of one prior to the owning descriptor type.

During this it was noted that the use_device_ptr/addr handling of target
data was a little bit too order dependent so I've attempted to make it
less so, as we cannot depend on argument ordering to be the same as
Fortran for any future frontends.
2024-11-14 15:47:37 +01:00
lfrenot
89aaf2cf68
[mlir][LLVM] Add nneg flag (#115498)
This implementation is based on the existing one for the exact flag.

If the nneg flag is set and the argument is negative, the result is a
poison value.
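
As an illustration, the sketch below models `zext` with the nneg flag, for which LLVM's LangRef gives this poison rule. It is a Python model with `None` standing in for poison; the function name `zext_nneg` is an assumption.

```python
from typing import Optional


def zext_nneg(value: int, from_bits: int) -> Optional[int]:
    """Model `zext nneg` from a `from_bits`-wide integer.

    The argument is negative when its sign bit (the top bit of the
    source width) is set; in that case the result is poison (None).
    """
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        return None
    return value


print(zext_nneg(0x7F, 8))  # 127
print(zext_nneg(0x80, 8))  # None
```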
2024-11-11 14:01:50 +01:00
lfrenot
afa178d360
[mlir][LLVM] Add exact flag (#115327)
The implementation is mostly based on the one existing for the nsw and
nuw flags.

If the exact flag is present, the corresponding operation returns a
poison value when the result is not exact. (For a division, if rounding
happens; for a right shift, if a non-zero bit is shifted out.)
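
The two cases in parentheses can be sketched as follows. This is a Python model on non-negative integers with `None` standing in for poison; the function names are assumptions, and division by zero is left as a Python exception since it is outside what the flag governs.

```python
from typing import Optional


def udiv_exact(a: int, b: int) -> Optional[int]:
    """Model `udiv exact`: poison (None) if the division would round."""
    quotient, remainder = divmod(a, b)
    return None if remainder else quotient


def lshr_exact(a: int, shift: int) -> Optional[int]:
    """Model `lshr exact`: poison (None) if a non-zero bit is shifted out."""
    if a & ((1 << shift) - 1):
        return None
    return a >> shift


print(udiv_exact(12, 4))          # 3
print(udiv_exact(13, 4))          # None
print(lshr_exact(0b1000, 3))      # 1
print(lshr_exact(0b1001, 3))      # None
```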
2024-11-08 13:56:44 +01:00
Matthias Springer
b613a54075
[mlir][IR][NFC] Cleanup insertion point API usage (#115415)
Use `setInsertionPointToStart` / `setInsertionPointToEnd` when possible.
2024-11-08 14:31:27 +09:00
Tom Eccles
8269c400b4
[mlir][OpenMP][NFC] delayed privatisation cleanup (#115298)
Upstreaming some code cleanups ahead of supporting delayed task
execution.
- Make allocatePrivateVars not need to be a template (it will need to
operate separately on firstprivate and private variables for delayed
task execution so it can't index into lists of all variables in the
operation).
 - Use llvm::SmallVectorImpl for function arguments
- collectPrivatizationDecls already reserves size for privateDecls so we
don't need to do that in callers
 - Use llvm::zip_equal instead of C-style array indexing
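
The last point can be illustrated with a Python analogue. This is a sketch of the idea only (lock-step iteration that rejects length mismatches), not the C++ `llvm::zip_equal` utility itself; the list contents are made-up placeholder names.

```python
def zip_equal(*seqs):
    """Iterate several sequences in lock-step, requiring equal lengths.

    Unlike manual indexing, a length mismatch is reported up front
    instead of silently reading past the end or skipping elements.
    """
    lengths = {len(s) for s in seqs}
    if len(lengths) > 1:
        raise ValueError(f"length mismatch: {sorted(lengths)}")
    return zip(*seqs)


# Placeholder data standing in for private variables and their declarations.
private_vars = ["%a", "%b"]
private_decls = ["@a_private", "@b_private"]

for var, decl in zip_equal(private_vars, private_decls):
    print(var, "->", decl)
```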
2024-11-07 12:27:31 +00:00
Ilya Enkovich
2f743ac52e
[MLIR] [AMX] Utilize x86_amx type for AMX dialect in MLIR. (#111197)
This patch is intended to resolve #109481 and improve the usability of
the AMX dialect.

In LLVM IR, AMX intrinsics use `x86_amx` which is one of the primitive
types. This type is supposed to be used for AMX intrinsic calls and no
other operations. AMX dialect of MLIR uses regular 2D vector types,
which are then lowered to arrays of vectors in the LLVMIR dialect. This
creates an inconsistency in the types used in the LLVMIR dialect and
LLVMIR. Translation of AMX intrinsic calls to LLVM IR doesn't require
result types to match and that is where tile loads and mul operation
results get `x86_amx` type. This works in very simple cases when mul and
tile store operations directly consume the result of another AMX
intrinsic call, but it doesn't work when an argument is a block argument
(phi node).

In addition to translation problems, this inconsistency between types
used in MLIR and LLVM IR makes MLIR verification and transformation
quite problematic. Both `amx.tileload` and `vector::transfer_read` can
load values of the same type, but only one of them can be used in AMX
operations. In general, by looking at the type of a value, we cannot
determine whether it can only be used in AMX operations or, on the
contrary, can be used in any operations except AMX ones.

To remove this inconsistency and make AMX operations more explicit in
their limitations, I propose to add `LLVMX86AMXType` type to the LLVMIR
dialect to match `x86_amx` type in LLVM IR, and introduce
`amx::TileType` to be used by AMX operations in MLIR. This resolves
translation problems for AMX usage with phi nodes and provides proper
type verification in MLIR for AMX operations.

P.S. This patch also adds missing FP16 support. It's trivial but
unrelated to type system changes, so let me know if I should submit it
separately.

---------

Signed-off-by: Ilya Enkovich <ilya.enkovich@intel.com>
2024-11-06 14:30:53 +00:00
Tom Eccles
28452acac0
[mlir][OpenMP] delayed privatisation for TASK (#114785)
This uses essentially an identical implementation to that used for
ParallelOp. The private variable allocation and deallocation use shared
functions to avoid code duplication. FIRSTPRIVATE variable copying uses
duplicated code for now because I anticipate the implementation
diverging in the near future once I store data for firstprivate
variables in the task description structure.

After enabling delayed privatisation for TASK in flang, one more test in
the fujitsu test suite passes (I haven't looked into why).
2024-11-06 13:19:12 +00:00
Zichen Lu
f87484d591
Fix libnvptxcompiler_static.a absolute path (#115015)
Now when building llvm-solid with `-DMLIR_ENABLE_NVPTXCOMPILER=ON`,
there will be an absolute path (`/path/to/libnvptxcompiler_static.a`) in
MLIRNVVMTarget dependencies (in
`/build/path/install/lib/cmake/mlir/MLIRTargets.cmake`). For example,

```cmake
set_target_properties(MLIRNVVMTarget PROPERTIES
  INTERFACE_LINK_LIBRARIES "MLIRIR;MLIRExecutionEngineUtils;MLIRSupport;MLIRGPUDialect;MLIRTargetLLVM;MLIRNVVMToLLVMIRTranslation;LLVMSupport;/path/to/libnvptxcompiler_static.a"
)
```

If a downstream project uses a pre-built LLVM and depends on MLIRNVVMTarget,
it may fail to build because the absolute path to
`libnvptxcompiler_static.a` does not exist.

After this commit, there will be no absolute path in
`/build/path/install/lib/cmake/mlir/MLIRTargets.cmake`

```cmake
set_target_properties(MLIRNVVMTarget PROPERTIES
  INTERFACE_LINK_LIBRARIES "MLIRIR;MLIRExecutionEngineUtils;MLIRSupport;MLIRGPUDialect;MLIRTargetLLVM;MLIRNVVMToLLVMIRTranslation;LLVMSupport;\$<LINK_ONLY:MLIR_NVPTXCOMPILER_LIB>"
)
```

A downstream project can then set the `libnvptxcompiler_static.a` path and
use CMake to build. For example,

```cmake
# find_library(...)

add_library(MLIR_NVPTXCOMPILER_LIB STATIC IMPORTED GLOBAL)
set_property(TARGET MLIR_NVPTXCOMPILER_LIB PROPERTY IMPORTED_LOCATION ${...})  
```
2024-11-06 11:51:18 +01:00
Sergio Afonso
d3e796c2d0
[MLIR][OpenMP] Update not-yet-implemented errors, NFC (#114966)
This patch improves not-yet-implemented error diagnostics to more
closely follow the format used by Flang lowering for the same kind of
errors. This helps keep some level of uniformity from a user
perspective.
2024-11-05 12:48:54 +00:00
Sergio Afonso
6c28530ed0
[Flang][OpenMP] Properly bind arguments of composite operations (#113682)
When composite constructs are lowered, clauses for each leaf construct
are lowered before creating the set of loop wrapper operations, using
these outside values to populate their operand lists. Then, when the
loop nest associated to that composite construct is lowered, the binding
of Fortran symbols to the entry block arguments defined by these loop
wrappers is performed, resulting in the creation of `hlfir.declare`
operations in the entry block of the `omp.loop_nest`.

This approach prevents `hlfir.declare` operations related to the binding
and other operations resulting from the evaluation of the clauses from
being inserted between loop wrapper operations, which would be an
illegal MLIR representation. However, this introduces the problem of
entry block arguments defined by a wrapper that then should be used by
one of its nested wrappers, because the corresponding Fortran symbol
would still be mapped to an outside value at the time of gathering the
list of operands for the nested wrapper.

This patch adds operand re-mapping logic to update wrappers without
changing when clauses are evaluated or where the `hlfir.declare`
creation is performed.
2024-10-31 16:39:53 +00:00
Sergio Afonso
bd6c21460f
[MLIR][OpenMP] Emit descriptive errors for all unsupported clauses (#114037)
This patch improves error reporting in the MLIR to LLVM IR translation
pass for the 'omp' dialect by emitting descriptive errors when
encountering clauses not yet supported by that pass.

Additionally, not-yet-implemented errors previously missing for some
clauses are added, to avoid silently ignoring them.

Error messages related to inlining of `omp.private` and
`omp.declare_reduction` regions have been updated to use the same
format.
2024-10-31 11:59:51 +00:00
Sergio Afonso
21a6032eca
[MLIR][OpenMP] Simplify translation to LLVM IR error handling (#114036)
This patch unifies the handling of errors passed through the
OpenMPIRBuilder and removes some redundant error messages through the
introduction of a custom `ErrorInfo` subclass.

Additionally, the current list of operations and clauses unsupported by
the MLIR to LLVM IR translation pass is added to a new Lit test to check
they are being reported to the user.
2024-10-31 11:34:24 +00:00
Abid Qadeer
89f2d50cda
[mlir][debug] Support DIGenericSubrange. (#113441)
`DIGenericSubrange` is used when the dimensions of the arrays are
unknown at build time (e.g. assumed-rank arrays in Fortran). It has the same
`lowerBound`, `upperBound`, `count` and `stride` fields as in
`DISubrange` and its translation looks quite similar as a result.

---------

Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
2024-10-31 10:09:26 +00:00
Sergio Afonso
a1f2fb6078
[MLIR][OpenMP] Prevent composite omp.simd related crashes (#113680)
This patch updates the translation of `omp.wsloop` with a nested
`omp.simd` to prevent uses of block arguments defined by the latter from
triggering null pointer dereferences.

This happens because the inner `omp.simd` operation representing
composite `do simd` constructs is currently skipped and not translated,
but this results in block arguments defined by it not being mapped to an
LLVM value. The proposed solution is to map these block arguments to the
LLVM value associated to the corresponding operand, which is defined
above.
2024-10-29 17:05:12 +00:00
Sergio Afonso
d87964de78
[OpenMP][OMPIRBuilder] Error propagation across callbacks (#112533)
This patch implements an approach to communicate errors between the
OMPIRBuilder and its users. It introduces `llvm::Error` and
`llvm::Expected` objects to replace the values returned by callbacks
passed to `OMPIRBuilder` codegen functions. These functions then check
the result for errors when callbacks are called and forward them back to
the caller, which has the flexibility to recover, exit cleanly or dump a
stack trace.

This prevents a failed callback from leaving the IR in an invalid state and
still continuing the codegen process, triggering unrelated assertions or
segmentation faults. In the case of MLIR to LLVM IR translation of the
'omp' dialect, this change results in the compiler emitting errors and
exiting early instead of triggering a crash for not-yet-implemented
errors. The behavior in Clang and openmp-opt stays unchanged, since
callbacks will continue to always return 'success'.
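
The forwarding pattern can be sketched in Python. This is a loose model of the idea, not the `OMPIRBuilder` API: the `(ok, payload)` tuple stands in for `llvm::Expected`, and `create_parallel` and the callbacks are hypothetical names.

```python
from typing import Callable, Tuple

# (ok, payload): payload is the generated "IR" on success,
# or the error message on failure.
Result = Tuple[bool, str]


def create_parallel(body_gen: Callable[[], Result]) -> Result:
    """Hypothetical stand-in for a codegen function that takes a callback."""
    ok, payload = body_gen()
    if not ok:
        # Forward the callback's error instead of continuing with invalid IR.
        return False, payload
    return True, f"omp.parallel {{ {payload} }}"


print(create_parallel(lambda: (True, "body")))
# (True, 'omp.parallel { body }')
print(create_parallel(lambda: (False, "not yet implemented")))
# (False, 'not yet implemented')
```

The caller then decides whether to recover, exit cleanly or report the error, which is the flexibility the commit message describes.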
2024-10-25 11:30:16 +01:00
Kareem Ergawy
ad70f3e095
[flang][OpenMP] Support target enter|update|exit .. nowait (#113305)
Extends `nowait` support for other device directives. This PR refactors
the task generation utils used for the `target` directive so that they
are general enough to be reused for other device directives as well.
2024-10-23 10:48:54 +02:00