342 Commits

Author SHA1 Message Date
Zhen Wang
c4e6cf0abf
[flang][cuda] Support non-allocatable module-level managed variables (#188526)
Add support for non-allocatable module-level CUDA managed variables
using pointer indirection through a companion global in
__nv_managed_data__. The CUDA runtime populates this pointer with the
unified memory address via __cudaRegisterManagedVar and
__cudaInitModule.

1. Create a .managed.ptr companion global in the __nv_managed_data__
section and register it with _FortranACUFRegisterManagedVariable
(CUFAddConstructor.cpp)
2. Call __cudaInitModule after registration to populate the managed
pointer (registration.cpp)
3. Annotate managed globals in gpu.module with nvvm.managed for PTX
.attribute(.managed) generation (cuda-code-gen.mlir)
4. Suppress cuf.data_transfer for assignments to/from non-allocatable
module managed variables, since cudaMemcpy would target the shadow
address rather than the actual unified memory (tools.h)
5. Preserve cuf.data_transfer for device_var = managed_var assignments
where explicit transfer is still required
2026-03-31 16:27:08 +00:00
jeanPerier
252eb2a743
[flang][FIR] add a new fir.bitcast operation (#187793)
This patch introduces a new bitcast operation for integer, float,
character, and logical.

The main rational for it is that it is currently not possible to express
such bitcast in FIR without going trough memory and there is a need to
have some bitcast support when interfacing with the memref dialect where
one cannot use fir.char<> and fir.logical and must use the underlying
storage type. Using fir.convert is not a good idea because it is a
semantic cast and it will for instance normalize integers when
converting from/to logical.

This could also be used to simplify the implementation of TRANSFER for
the cases of simple scalars of those types.

Assisted by: Claude
2026-03-23 09:55:25 +01:00
Matthias Springer
13c00cbc2a
[mlir][IR] Rename DenseIntOrFPElementsAttr to DenseTypedElementsAttr (#185687)
`DenseIntOrFPElementsAttr` was recently generalized to accept any type
that implement the `DenseElementType` interface. The name
`DenseIntOrFPElementsAttr` does not make sense anymore. This commit
renames the attribute to `DenseTypedElementsAttr`. An alias is kept for
migration purposes. The alias will be removed after some time.
2026-03-13 17:27:23 +01:00
Jack Styles
7e0ef4a203
[Flang] Apply nusw nuw flags on array_coor gep's (#184573)
When generating the LLVM IR, since #110060, `nsw` is applied to
operations when lowering the subscripts. This was, up until now, only
applied to arithmetic, and not the related getelementptr's.

The original Discouse thread noted that NSW helped with vectorisation
later on in the process. Changes to the BasicAA pipeline has led to
vectorisation no longer being applied where wrapping cannot be
guaranteed for array_coor instructions. By applying the `nusw nuw` flags
to the GEP's, this enables vectorisation in the middle end. Supporting
arithmatic instructions will also be marked `nuw` to ensure instcombine
does not remove these flags when transforming instructions.

There does need to be some consideration to the `sub` operations
generated in this process. There are cases, such as when an array is
shifted, where unsigned wrapping may occur due to using negative values.
To protect against this, if an array is shifted, `nuw` won't be applied
to the `sub` operations.

This patch has been verified using the following with no regressions:
- llvm-test-suite
- Fujitsu test suite
- Various Opensource HPC Applications

Original Discourse thread:
https://discourse.llvm.org/t/rfc-add-nsw-flags-to-arithmetic-integer-operations-using-the-option-fno-wrapv/77584

Assisted-by: Codex
2026-03-13 11:03:53 +00:00
Jason Van Beusekom
3878131497
[FLANG][MLIR][OpenMP] add MathToNVVM conversion pass to NVPTX MLIR (#180060)
This Commit adds the MLIR MathToNVVM conversion pass to flang's
NVPTX codegen lowering Math and Arith operations to libdevice library calls.
This allows support for calls to Fortran math intrinsics for OpenMP offload
for NVIDIA Targets. To support this support for -nogpulib was added for
NVIDIA targets

Fix #147023  
Fix #179347
2026-03-10 15:56:01 -05:00
Delaram Talaashrafi
81bcf41861
[flang] Add setMemcpyAlignmentArgAttrs and use it for box-load memcpy (#185126)
Introduce `setMemcpyAlignmentArgAttrs` to set the LLVM alignment operand
attributes on a memcpy. The function is used when lowering `fir.load` of a box to
LLVM memcpy so the generated memcpy has correct align attributes on dst and src,
improving codegen and downstream optimizations.
2026-03-09 10:14:59 -04:00
tedj
8735e69f1e
[flang] OPTIONAL char dummy has no defining op; add null check (#182582)
size.getDefiningOp() returns nullptr for block arguments when a OPTIONAL
character length generated the conditional "fir.if". Check for a nullptr
before calling mlir::isa<> to avoid the crash.

Addresses: https://github.com/llvm/llvm-project/issues/182436
Passes check-flang, check-flang-rt, and llvm-test-suite (x86_64)

---------

Co-authored-by: Valentin Clement (バレンタイン クレメン) <clementval@gmail.com>
2026-02-20 20:22:18 +00:00
jeanPerier
dd1cc049b1
[flang][FIR] allow mem2reg over fir.declare (#181848)
This patch adds the possibility for MLIR mem2reg to work over
fir.declare.
Note that mem2reg is not part of FIR pipeline, and this is just part of
work to be able to leverage it.

The patch:
- Adds a fir.declare_value operation
- Implements the PromotableOpInterface for fir.declare simple scalars
and replace it by fir.declare_value.
- Generates llvm.dbg.debug_value from it (when a FusedLoc with a
DILocalVariableAttr is created for it in AddDebugInfo, like for
fir.declare).
2026-02-19 11:41:37 +01:00
jeanPerier
3f0f8349ac
[flang] fix codegen of fir.select with only default case (#181373)
The case where fir.select only has a "unit" block target (i.e., it is a
switch with only the default case) was not handled correctly in codegen.
2026-02-16 10:30:28 +01:00
Susan Tan (ス-ザン タン)
2698d15664
[flang] Lowering FIR memory ops to MemRef dialect (#173507)
This patch introduces FIRToMemRef, a lowering pass that converts FIR
memory operations to the MemRef dialect, including support for slices,
shifts, and descriptor-style access patterns. To support partial
lowering, where FIR and MemRef types can coexist, we extend the handling
of fir.convert to correctly marshal between FIR reference-like types and
MemRef descriptors. The patch also factors the type conversion logic
into a reusable FIRToMemRefTypeConverter, which centralizes the rules
for converting FIR types (e.g. !fir.ref, !fir.box, sequences, logicals)
to their corresponding memref types, and is used throughout the new
pass.

---------

Co-authored-by: Scott Manley <rscottmanley@gmail.com>
Co-authored-by: jeanPerier <jean.perier.polytechnique@gmail.com>
2026-01-14 10:46:50 -05:00
khaki3
9057744221
[flang] Fix SelectCaseOpConversion to convert block signatures (#175298)
When `fir.select_case` branches to blocks with arguments that have FIR
types (e.g., `!fir.ref`), the block signature must be converted to LLVM
types before creating the branch. Otherwise, the branch passes LLVM
types (`!llvm.ptr`) but the block expects FIR types, causing a type
mismatch error.

This adds block signature conversion similar to what
`SelectOpConversionBase` already does for `fir.select` and
`fir.select_rank`.
2026-01-12 09:45:15 -08:00
Thirumalai Shaktivel
212527c00b
[Flang] Add FIR and LLVM lowering support for prefetch directive (#167272)
Implementation details:
* Add PrefetchOp in FirOps
* Handle PrefetchOp in FIR Lowering and also pass required default
values
* Handle PrefetchOp in CodeGen.cpp
* Add required tests
2026-01-05 13:24:10 +05:30
Abid Qadeer
fc9e6e13fd
[flang] Represent use statement in fir. (#168106)
We have a longstanding issue in debug info that use statement is not
fully respected. The problem has been described in
https://github.com/llvm/llvm-project/issues/160923. This is first part
of the effort to address this issue. This PR adds infrastructure to emit
`use` statement information in FIR, which will be used by subsequent
patches to generate DWARF debug information.

The information about use statement is collected during semantic
analysis and stored in `PreservedUseStmt` objects. During lowering,
`fir.use_stmt` operations are emitted for each `PreservedUseStmt`
object. The `fir.use_stmt` operation captures the module name, `only`
list symbols, and any renames specified in the use statement. The
`fir.use_stmt` is removed during `CodeGen`.
2026-01-02 12:10:18 +00:00
Susan Tan (ス-ザン タン)
01c3e25586
[flang] restrict fir.convert lowering (#172117)
Restrict lowering of fir.convert and exclude core memref types from it.
This is in preparation for a lowering that accommodates MemRef dialect.
2025-12-15 11:52:18 -05:00
Jean-Didier PAILLEUX
3b83e7fa4e
[flang] Implement !DIR$ IVDEP directive (#133728)
This directive tells the compiler to ignore vector dependencies in the
following loop and it must be placed before a `do loop`.

Sometimes the compiler may not have sufficient information to decide
whether a particular loop is vectorizable due to potential dependencies
between iterations and the directive is here to tell to the compiler
that vectorization is safe with `parallelAccesses` metadata.

This directive is also equivalent to `#pragma clang loop assume(safety)`
in C++
2025-11-14 14:06:46 +01:00
Jean-Didier PAILLEUX
c1779f33bd
[flang] Implement !DIR$ [NO]INLINE and FORCEINLINE directives (#134350)
This patch adds the support of these two directives : `!dir$ inline` and
`!dir$ noinline`.
- `!dir$ noinline` tells to the compiler to not perform inlining on
specific function calls by adding the `noinline` metadata on the call.
- `!dir$ inline` tells to the compiler to attempt inlining on specific
function calls by adding the `inlinehint` metadata on the call.
- `!dir$ forceinline` tells to the compiler to always perfom inlining on
specific function calls by adding the `alwaysinline` metadata on the
call.

Currently, these directives can be placed before a `DO LOOP`, call
functions or assignments. Maybe other statements can be added in the
future if needed.

For the `inline` directive the correct name might be `forceinline` but
I'm not sure ?
2025-10-28 08:02:15 +01:00
Jakub Kuderski
23ead47655
[flang][mlir] Migrate to free create functions. NFC. (#164657)
See
https://discourse.llvm.org/t/psa-opty-create-now-with-100-more-tab-complete/87339.

I plan to mark these as deprecated in
https://github.com/llvm/llvm-project/pull/164649.
2025-10-22 12:47:48 -04:00
jeanPerier
c9fb37c75f
[flang][FIR] add fir.assumed_size_extent to abstract assumed-size extent encoding (#164452)
The purpose of this patch is to allow converting FIR array representation to
memref when possible without hitting memref verifier issue.

The issue was that FIR arrays may be assumed size, in which case the
last dimension will not be known at runtime. Flang uses -1 to encode
this to fulfill Fortran 2023 standard requirements in 18.5.3 point 5
about CFI_desc_t.

When arrays are converted to memeref, if this `-1` reaches memeref
operations, it triggers verifier errors (even if the conversion happened
in code that guards the code to be entered at runtime if the array is
assumed-size because folders/verifiers do not take into account
reachability).

This follows-up on discussions in #163505 merge requests
2025-10-22 11:46:18 +02:00
Valentin Clement (バレンタイン クレメン)
7b10e977f8
[flang][cuda] Do not fail if global is not found (#163445) 2025-10-14 20:51:42 +00:00
Valentin Clement (バレンタイン クレメン)
6a7754f2ac
[flang][cuda] Set address space for constant variables (#163430)
Set the correct address space for constant variables. Address of
operation will introduce an address cast.
2025-10-14 19:16:26 +00:00
Valentin Clement (バレンタイン クレメン)
1c8cd1ed97
[flang][cuda] Add a TODO for code generation of CONSTANT variable (#163268) 2025-10-13 21:52:31 +00:00
Alexey Bataev
0ca23a3054
[Flang]Fix the build with the EXPENSIVE_CHECKS enabled (#162541) 2025-10-08 16:35:23 -04:00
Valentin Clement (バレンタイン クレメン)
e47a42e5b5
[flang][cuda] Do not use managed memory inside gpu module (#160730)
Do not issue call to _FortranACUFAllocDescriptor inside gpu module.
2025-09-25 16:51:49 +00:00
Valentin Clement (バレンタイン クレメン)
37de695cb1
[flang][cuda] Make sure global device descriptor is allocated in managed memory (#160596)
When the descriptor of a global device variable is re-materialized to be
passed to a kernel, make sure it is allocated in managed memory
otherwise the kernel launch will fail.
2025-09-24 20:32:59 +00:00
agozillon
046d6a3998
[Flang][OpenMP] Additional global address space modifications for device (#119585)
A prior PR added a portion of the global address space modifications
required for declare target to, this PR seeks to add
 a small amount more leftover from that PR.
   
The intent is to allow for more correct IR that the backends (in
particular AMDGPU) can treat more aptly for optimisations and code
correctness
    
1/3 required PRs to enable declare target to mapping, should look at PR
3/3 to check for full green passes (this one will fail a number due to
some dependencies).
    
Co-authored-by: Raghu Maddhipatla raghu.maddhipatla@amd.com
2025-09-17 03:27:03 +02:00
Fabian Mora
48babe1931
[mlir][LLVM] Add LLVMAddrSpaceAttrInterface and NVVMMemorySpaceAttr (#157339)
This patch introduces the `LLVMAddrSpaceAttrInterface` for defining
compatible LLVM address space attributes

To test this interface, this patch also adds:
- Adds NVVMMemorySpaceAttr implementing both LLVMAddrSpaceAttrInterface
and MemorySpaceAttrInterface
- Converts NVVM memory space constants from enum to MLIR enums
- Updates all NVVM memory space references to use new attribute system
- Adds support for NVVM memory spaces in ptr dialect translation

Example:
```mlir
llvm.func @nvvm_ptr_address_space(
    !ptr.ptr<#nvvm.memory_space<global>>,
    !ptr.ptr<#nvvm.memory_space<shared>>,
    !ptr.ptr<#nvvm.memory_space<constant>>,
    !ptr.ptr<#nvvm.memory_space<local>>,
    !ptr.ptr<#nvvm.memory_space<tensor>>,
    !ptr.ptr<#nvvm.memory_space<shared_cluster>>
  ) -> !ptr.ptr<#nvvm.memory_space<generic>>
```
Translating the above code to LLVM produces:
```llvm
declare ptr @nvvm_ptr_address_space(ptr addrspace(1), ptr addrspace(3), ptr addrspace(4), ptr addrspace(5), ptr addrspace(6), ptr addrspace(7))
```


To convert the memory space enum to the new enum class use:
```bash
grep -r . -e "NVVMMemorySpace::kGenericMemorySpace" -l | xargs sed -i -e "s/NVVMMemorySpace::kGenericMemorySpace/NVVMMemorySpace::Generic/g"
grep -r . -e "NVVMMemorySpace::kGlobalMemorySpace" -l | xargs sed -i -e "s/NVVMMemorySpace::kGlobalMemorySpace/NVVMMemorySpace::Global/g"
grep -r . -e "NVVMMemorySpace::kSharedMemorySpace" -l | xargs sed -i -e "s/NVVMMemorySpace::kSharedMemorySpace/NVVMMemorySpace::Shared/g"
grep -r . -e "NVVMMemorySpace::kConstantMemorySpace" -l | xargs sed -i -e "s/NVVMMemorySpace::kConstantMemorySpace/NVVMMemorySpace::Constant/g"
grep -r . -e "NVVMMemorySpace::kLocalMemorySpace" -l | xargs sed -i -e "s/NVVMMemorySpace::kLocalMemorySpace/NVVMMemorySpace::Local/g"
grep -r . -e "NVVMMemorySpace::kTensorMemorySpace" -l | xargs sed -i -e "s/NVVMMemorySpace::kTensorMemorySpace/NVVMMemorySpace::Tensor/g"
grep -r . -e "NVVMMemorySpace::kSharedClusterMemorySpace" -l | xargs sed -i -e "s/NVVMMemorySpace::kSharedClusterMemorySpace/NVVMMemorySpace::SharedCluster/g"
```

NOTE: A future patch will add support for ROCDL, it wasn't added here to
keep the patch small.
2025-09-14 09:05:28 -04:00
jeanPerier
355dbbc37c
[flang][FIR] enable fir.box_addr codegen inside fir.global (#157120)
FIR lowering of the fir.box type inside fir.global is special (it is an
actual descriptor struct value instead of being a descriptor in memory)
and causes builtin.unrealized_conversion_cast to be inserted under the
hood by MLIR dialect conversion framework after each operation producing
a fir.box is translated. These builtin.unrealized_conversion_cast must
be removed before the code generation of operation of using the fir.box
in order to get the right "by value" code generation required in global
initial value definitions.
2025-09-08 10:15:22 +02:00
Chaitanya
4a3bf27c69
[OpenMP] Introduce omp.target_allocmem and omp.target_freemem omp dialect ops. (#145464)
This PR introduces two new ops in omp dialect, omp.target_allocmem and
omp.target_freemem.
omp.target_allocmem: Allocates heap memory on device. Will be lowered to
omp_target_alloc call in llvm.
omp.target_freemem: Deallocates heap memory on device. Will be lowered
to omp+target_free call in llvm.


Example:
  %1 = omp.target_allocmem %device : i32, i64
  omp.target_freemem %device, %1 : i32, i64

The work in this PR is C-P/inspired from @ivanradanov commit from
coexecute implementation:
[Add fir omp target alloc and free
ops](be860ac8ba)
[Lower omp_target_{alloc,free} to
llvm](6e2d584dc9)
2025-08-18 18:15:11 +05:30
Slava Zakharin
b8e4232bd2
[flang] Cast fir.select[_rank] selector to i64. (#153239)
Properly cast the selector to `i64` regardless of its integer type.
We used to generate llvm.trunc always.

We have to use `i64` as long as the case values may exceed INT_MAX.

Fixes #153050.
2025-08-12 16:43:44 -07:00
Valentin Clement (バレンタイン クレメン)
3847620ba9
[flang][NFC] Move the rest of ops creation to new APIs (#152079) 2025-08-05 07:27:43 -07:00
Maksim Levental
dcfc853c51
[mlir][NFC] update flang/lib create APIs (12/n) (#149914)
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-24 19:05:40 -04:00
Diego Caballero
c99c213e72
[mlir][Flang][NFC] Replace use of vector.insertelement/extractelement (#143272)
This PR is part of the last step to remove `vector.extractelement` and
`vector.insertelement` ops (RFC:
https://discourse.llvm.org/t/rfc-psa-remove-vector-extractelement-and-vector-insertelement-ops-in-favor-of-vector-extract-and-vector-insert-ops).
It replaces `vector.insertelement` and `vector.extractelement` with
`vector.insert` and `vector.extract` in Flang. It looks like no lit
tests are impacted?
2025-07-18 14:43:03 -07:00
Kelvin Li
df56b1a2cf
[flang] handle allocation of zero-sized objects (#149165)
This PR handles the allocation of zero-sized objects for different
implementations. One byte is allocated for the zero-sized objects.
2025-07-17 23:52:48 -04:00
Akash Banerjee
fc114e4d93
[MLIR] Add ComplexTOROCDLLibraryCalls pass (#144926) 2025-07-16 13:59:41 +01:00
Tom Eccles
9a805ba169
[flang][NFC] Fix deprecation warning (#147932)
I started getting deprecation warnings from operations constructors
which seem to be doing implicit construction of mlir::ValueRange from a
std::nullopt by relying on implicit conversion from std::nullopt into
llvm::ArrayRef. ArrayRef{std::nullopt} is what has been deprecated.
2025-07-11 10:37:34 +01:00
Kareem Ergawy
eba35cc1c0
[flang][do concurrent] Re-model reduce to match reductions are modelled in OpenMP and OpenACC (#145837)
This PR proposes re-modelling `reduce` specifiers to match OpenMP and
OpenACC. In particular, this PR includes the following:

* A new `fir` op: `fir.delcare_reduction` which is identical to OpenMP's
`omp.declare_reduction` op.
* Updating the `reduce` clause on `fir.do_concurrent.loop` to use the
new op.
* Re-uses the `ReductionProcessor` component to emit reductions for `do
concurrent` just like we do for OpenMP. To do this, the
`ReductionProcessor` had to be refactored to be more generalized.
* Upates mapping `do concurrent` to `fir.loop ... unordered` nests using
the new reduction model.

Unfortunately, this is a big PR that would be difficult to divide up in
smaller parts because the bottom of the changes are the `fir` table-gen
changes to `do concurrent`. However, doing these MLIR changes cascades
to the other parts that have to be modified to not break things.

This PR goes in the same direction we went for `private/local`
speicifiers. Now the `do concurrent` and OpenMP (and OpenACC) dialects
are modelled in essentially the same way which makes mapping between
them more trivial, hopefully.

PR stack:
- https://github.com/llvm/llvm-project/pull/145837 (this one)
- https://github.com/llvm/llvm-project/pull/146025
- https://github.com/llvm/llvm-project/pull/146028
- https://github.com/llvm/llvm-project/pull/146033
2025-07-11 06:39:30 +02:00
Daniel Chen
13ead00049
[Flang] Fix PowerPC build failure due to the deprecation of ArrayRef(std::nullopt_t) {}. (#147816)
Our local Flang build on PowerPC was broken as
```
llvm/flang/../mlir/include/mlir/IR/ValueRange.h:401:20: error: 'ArrayRef' is deprecated: Use {} or ArrayRef<T>() instead [-Werror,-Wdeprecated-declarations]
  401 |       : ValueRange(ArrayRef<Value>(std::forward<Arg>(arg))) {}
      |                    ^
llvm/flang/lib/Optimizer/CodeGen/CodeGen.cpp:2243:53: note: in instantiation of function template specialization 'mlir::ValueRange::ValueRange<const std::nullopt_t &, void>' requested here
 2243 |                              /*cstInteriorIndices=*/std::nullopt, fieldIndices,
      |                                                     ^
 llvm/include/llvm/ADT/ArrayRef.h:70:18: note: 'ArrayRef' has been explicitly marked deprecated here
   70 |     /*implicit*/ LLVM_DEPRECATED("Use {} or ArrayRef<T>() instead", "{}")
      |                  ^
llvm/include/llvm/Support/Compiler.h:244:50: note: expanded from macro 'LLVM_DEPRECATED'
  244 | #define LLVM_DEPRECATED(MSG, FIX) __attribute__((deprecated(MSG, FIX)))
      |                                                  ^
1 error generated.
```

This patch is to fix it.
2025-07-10 09:53:03 -04:00
Shunsuke Watanabe
c9900015a9
[flang] Add -fcomplex-arithmetic= option and select complex division algorithm (#146641)
This patch adds an option to select the method for computing complex
number division. It uses `LoweringOptions` to determine whether to lower
complex division to a runtime function call or to MLIR's `complex.div`,
and `CodeGenOptions` to select the computation algorithm for
`complex.div`. The available option values and their corresponding
algorithms are as follows:
- `full`: Lower to a runtime function call. (Default behavior)
- `improved`: Lower to `complex.div` and expand to Smith's algorithm.
- `basic`: Lower to `complex.div` and expand to the algebraic algorithm.

See also the discussion in the following discourse post:
https://discourse.llvm.org/t/optimization-of-complex-number-division/83468

---------

Co-authored-by: Tarun Prabhu <tarunprabhu@gmail.com>
2025-07-09 13:43:54 +09:00
Kareem Ergawy
b1774222c7
[flang] Emit fir.global in the global address space (#146653)
Instead of emitting globals in the program/default address space, emit
them in the global address space. This also requires changes how address
of code-gen is handled, we need to cast to the default address space to
prevent code-gen issues.
2025-07-02 17:15:22 +02:00
jeanPerier
faefe7cf7d
[flang] add option to generate runtime type info as external (#146071)
Reland #145901 with a fix for shared library builds.

So far flang generates runtime derived type info global definitions (as
opposed to declarations) for all the types used in the current
compilation unit even when the derived types are defined in other
compilation units. It is using linkonce_odr to achieve derived type
descriptor address "uniqueness" aspect needed to match two derived type
inside the runtime.

This comes at a big compile time cost because of all the extra globals
and their definitions in apps with many and complex derived types.

This patch adds and experimental option to only generate the rtti
definition for the types defined in the current compilation unit and to
only generate external declaration for the derived type descriptor
object of types defined elsewhere.

Note that objects compiled with this option are not compatible with
object files compiled without because files compiled without it may drop
the rtti for type they defined if it is not used in the compilation unit
because of the linkonce_odr aspect.

I am adding the option so that we can better measure the extra cost of
the current approach on apps and allow speeding up some compilation
where devirtualization does not matter (and the build config links to
all module file object anyway).
2025-06-30 09:58:00 +02:00
jeanPerier
37e2d10499
Revert "[flang] add option to generate runtime type info as external" (#146064)
Reverts llvm/llvm-project#145901

Broke shared library builds because of the usage of
`skipExternalRttiDefinition` in Lowering.
2025-06-27 14:05:59 +02:00
jeanPerier
e816817bbb
[flang] add option to generate runtime type info as external (#145901)
So far flang generates runtime derived type info global definitions (as
opposed to declarations) for all the types used in the current
compilation unit even when the derived types are defined in other
compilation units. It is using linkonce_odr to achieve derived type
descriptor address "uniqueness" aspect needed to match two derived type
inside the runtime.

This comes at a big compile time cost because of all the extra globals
and their definitions in apps with many and complex derived types.

This patch adds and experimental option to only generate the rtti
definition for the types defined in the current compilation unit and to
only generate external declaration for the derived type descriptor
object of types defined elsewhere.

Note that objects compiled with this option are not compatible with
object files compiled without because files compiled without it may drop
the rtti for type they defined if it is not used in the compilation unit
because of the linkonce_odr aspect.

I am adding the option so that we can better measure the extra cost of
the current approach on apps and allow speeding up some compilation
where devirtualization does not matter (and the build config links to
all module file object anyway).
2025-06-27 13:00:29 +02:00
Kareem Ergawy
282e471018
[flang] Erase fir.local ops before lowering fir to llvm (#143687)
`fir.local` ops are not supposed to have any uses at this point (i.e.
during lowering to LLVM). In case of serialization, the
`fir.do_concurrent` users are expected to have been lowered to
`fir.do_loop` nests. In case of parallelization, the `fir.do_concurrent`
users are expected to have been lowered to the target parallel model
(e.g. OpenMP).

This hopefully resolved a build issue introduced by
https://github.com/llvm/llvm-project/pull/142567 (see for example:
https://lab.llvm.org/buildbot/#/builders/199/builds/4009).
2025-06-12 05:58:55 +02:00
Pranav Bhandarkar
8395912895
[Flang] - Handle BoxCharType in fir.box_offset op (#141713)
To map `fir.boxchar` types reliably onto an offload target, such as a
GPU, the `omp.map.info` operation is used to map the underlying data
pointer (`fir.ref<fir.char<k, ?>>`) wrapped by the `fir.boxchar` MLIR
value. The `omp.map.info` operation needs a pointer to the underlying
data pointer.
Given a reference to a descriptor (`fir.box`), the `fir.box_offset` is
used to obtain the address of the underlying data pointer. This PR
extends `fir.box_offset` to provide the same functionality for
`fir.boxchar` as well.
2025-06-06 10:48:07 -05:00
Valentin Clement (バレンタイン クレメン)
6811a3bedf
[flang][cuda] Allocate extra descriptor in managed memory when it is coming from device (#140818) 2025-05-20 18:55:13 -07:00
jeanPerier
ed07412888
[flang] translate derived type array init to attribute if possible (#140268)
This patch relies on #140235 and #139724 to speed-up compilations of
files with derived type array global with initial value.
Currently, such derived type global init was lowered to an
llvm.mlir.insertvalue chain in the LLVM IR dialect because there was no
way to represent such value via attributes.

This chain was later folded in LLVM dialect to LLVM IR using LLVM IR
(not dialect) folding. This insert chain generation and folding is very
expensive for big arrays. For instance, this patch brings down the
compilation of FM_lib fmsave.f95 from 50s
to 0.5s.
2025-05-20 16:11:27 +02:00
jeanPerier
416b7dfaa0
[flang] use DataLayout instead of GEP to compute element size (#140235)
Now that the datalayout is part of codegen, use that to generate type
size constants in codegen instead of generating GEP.
2025-05-19 13:59:09 +02:00
Asher Mancinelli
8836bce842
[flang] Add lowering of volatile references (#132486)
[RFC on
discourse](https://discourse.llvm.org/t/rfc-volatile-representation-in-flang/85404/1)

Flang currently lacks support for volatile variables. For some cases,
the compiler produces TODO error messages and others are ignored. Some
of our tests are like the example from _C.4 Clause 8 notes: The VOLATILE
attribute (8.5.20)_ and require volatile variables.

Prior commits:
```
c9ec1bc753b0 [flang] Handle volatility in lowering and codegen (#135311)
e42f8609858f [flang][nfc] Support volatility in Fir ops (#134858)
b2711e1526f9 [flang][nfc] Support volatile on ref, box, and class types (#134386)
```
2025-04-30 08:46:33 -07:00
Kaviya Rajendiran
857ac4c229
[MLIR][OpenMP] Lowering nontemporal clause to LLVM IR for SIMD directive (#118751)
This patch,
- Added a new attribute `nontemporal` to fir.load and fir.store operation in the FIR dialect.
- Added a pass `lower-nontemporal` which is called before FIRToLLVM conversion pass and adds the nontemporal attribute to loads and stores on the list items specified in the nontemporal clause of the SIMD directive.
- Set the `UnitAttr:$nontemporal` to llvm.load and llvm.store operations during FIR to LLVM dialect conversion, if the corresponding fir.load or fir.store operations have the nontemporal attribute.
- Attached the `nontemporal metadata` to load and store instructions that have the nontemporal attribute, during LLVM dialect to LLVM IR translation.
2025-04-30 11:13:20 +05:30
Asher Mancinelli
c9ec1bc753
[flang] Handle volatility in lowering and codegen (#135311)
* Enable lowering and conversion patterns to pass volatility information
from higher level operations to lower level ones.
* Enable codegen to pass volatility to LLVM dialect ops by setting an
attribute on loads, stores, and memory intrinsics.
* Add utilities for passing along the volatility from an input type to
an output type.

To introduce volatile types into the IR, entities with the volatile
attribute will be given a volatile type in the bridge; this is not
enabled in this patch. User code should not result in IR with volatile
types yet, so this patch contains no tests with Fortran source, only IR
that already contains volatile types.

Part 3 of #132486.
2025-04-14 11:02:23 -07:00