342 Commits

Author SHA1 Message Date
khaki3
c6d770fece
[flang] Fix FIRToMemRef index computation for array_coor with slice and shape_shift (#189496)
Use shift instead of sliceLb only when the array_coor has an explicit
slice (indicesAreFortran case). When the slice comes from an embox,
the indices are 1-based section indices and must subtract 1.
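The two index conventions can be sketched in Python as plain Fortran addressing arithmetic. The sketch assumes a one-dimensional array declared `A(lb:...)` and a section `A(lo::step)`. The function names are hypothetical, and this is not the pass's implementation. A Fortran index becomes a 0-based offset by subtracting the shift `lb`. A 1-based section index first has 1 subtracted, and is then scaled by the stride and offset by the section start.

```python
def offset_from_fortran_index(i, lb):
    """0-based element offset of A(i) for an array declared A(lb:...)."""
    return i - lb


def offset_from_section_index(k, lb, lo, step):
    """0-based element offset, relative to A(lb), of element k (1-based)
    of the section A(lo::step)."""
    fortran_index = lo + (k - 1) * step
    return offset_from_fortran_index(fortran_index, lb)
```

For `A(0:N)`, `offset_from_fortran_index(0, lb=0)` is 0. Subtracting 1 from the Fortran index instead would give -1, which is the kind of negative index described in #186523.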
2026-03-31 12:09:43 -07:00
Mehdi Amini
509f181f40
[MLIR][TableGen] Fix ArrayRefParameter in struct format roundtrip (#189065)
When an ArrayRefParameter (or OptionalArrayRefParameter) appears in a
non-last position within a struct() assembly format directive, the
printed output is ambiguous: the comma-separated array elements are
indistinguishable from the struct-level commas separating key-value
pairs.

Fix this by wrapping such parameters in square brackets in both the
generated printer and parser. The printer emits '[' before and ']' after
the array value; the parser calls parseLSquare()/parseRSquare() around
the FieldParser call. Parameters with a custom printer or parser are
unaffected (the user controls the format in that case).
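The ambiguity, and why brackets resolve it, can be shown with a toy printer and a bracket-aware splitter. This is a hypothetical `key = value, ...` format for illustration only, not the generated MLIR printer/parser. Unlike the real fix, which brackets only non-last ArrayRefParameters without custom printers, the toy brackets every list.

```python
def print_struct(fields, bracket_arrays):
    """Print (key, value) pairs as 'key = value, ...'; lists print comma-separated."""
    parts = []
    for key, value in fields:
        if isinstance(value, list):
            body = ", ".join(str(v) for v in value)
            text = f"[{body}]" if bracket_arrays else body
        else:
            text = str(value)
        parts.append(f"{key} = {text}")
    return ", ".join(parts)


def split_top_level(text):
    """Split on commas that are not nested inside square brackets."""
    pieces, depth, current = [], 0, ""
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            pieces.append(current.strip())
            current = ""
        else:
            current += ch
    pieces.append(current.strip())
    return pieces
```

Without brackets, `a = 1, 2, b = 3` splits into `a = 1`, `2`, and `b = 3`, and the stray `2` cannot be attributed to any key. With brackets, the split yields exactly two key-value pairs.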

Fixes #156623

Assisted-by: Claude Code
2026-03-27 18:41:46 +00:00
Carlos Seo
db5cd626b9
[flang][OpenMP] Restrict isSafeToParallelize to write-only thread-local effects (#188595)
This is a follow-up fix for commit 0f5e9bee.

Only write effects to thread-local memory should be considered safe to
parallelize in workshare lowering, not reads. When both reads and writes
were safe, the cascading effect in moveToSingle could cause entire
SingleRegions to become fully parallelized, eliminating the omp.single
and its implicit barrier. This removed synchronization points needed to
keep threads coordinated inside sequential loops containing workshared
operations, causing race conditions in forall-workshare patterns.
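The tightened rule can be sketched as a predicate over an op's memory effects. The `Effect` type is hypothetical. The real `isSafeToParallelize` works on MLIR memory-effect interfaces and may apply further checks. Here, an op with no effects is treated as safe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Effect:
    kind: str           # "read" or "write"
    thread_local: bool  # True if the effect targets thread-local memory


def is_safe_to_parallelize(effects):
    # Only writes to thread-local memory are accepted; any read (even of
    # thread-local memory) or any write to shared memory makes the op unsafe.
    # An op with no memory effects at all is accepted (all() of nothing is True).
    return all(e.kind == "write" and e.thread_local for e in effects)
```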

This was exposed by the Fujitsu Test Suite and made the following tests
regress:

FAIL: test-suite :: Fujitsu/Fortran/0398/Fujitsu-Fortran-0398_0031.test
FAIL: test-suite :: Fujitsu/Fortran/0398/Fujitsu-Fortran-0398_0013.test
FAIL: test-suite :: Fujitsu/Fortran/0398/Fujitsu-Fortran-0398_0030.test
FAIL: test-suite :: Fujitsu/Fortran/0398/Fujitsu-Fortran-0398_0014.test

Updates #143330
2026-03-27 12:11:27 -03:00
Hocky Yudhiono
ed37bdcc3e
[mlir][func] Fix crashes in FuncToLLVM discardable attributes propagation logic (#188232)
Refactor how `func.func` discardable attributes are handled in the
Func-to-LLVM conversion. Instead of ad hoc checks for linkage and
readnone followed by a simple filter, the pass now generically processes
inherent attributes from LLVMFuncOp.

Attributes that correspond to inherent `llvm.func` ODS names can be
attached as `llvm.<name>` on `func.func` and are stripped to `<name>`
when building `LLVM::LLVMFuncOp`, so LLVM-specific knobs stay namespaced
on the source op but land on the right inherent slots on `llvm.func`.

Other discardable attributes continue to be propagated as-is.
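The renaming rule can be sketched in Python as follows. The set of inherent names is an assumption for illustration (`linkage`, `no_inline`). The authoritative list comes from the `LLVMFuncOp` ODS definition, and the real pass operates on MLIR attributes rather than a dict.

```python
# Assumed subset of inherent llvm.func attribute names (illustrative only).
INHERENT_LLVM_FUNC_ATTRS = {"linkage", "no_inline"}


def convert_func_attrs(discardable):
    """Map func.func discardable attributes to llvm.func attributes."""
    result = {}
    prefix = "llvm."
    for name, value in discardable.items():
        stripped = name[len(prefix):] if name.startswith(prefix) else None
        if stripped in INHERENT_LLVM_FUNC_ATTRS:
            result[stripped] = value  # lands on the inherent slot
        else:
            result[name] = value      # propagated as-is
    return result
```

An `llvm.`-prefixed name that does not match an inherent attribute keeps its prefix and is propagated unchanged.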

Fixes #175959
Fixes #181464

Assisted-by: CLion code completion, GPT 5.3-Codex

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
2026-03-26 11:12:10 +00:00
Susan Tan (ス-ザン タン)
55111e8d17
[flang] use fir.bitcast for FIRToMemRef scalar reinterpretation (#188328)
Use fir.bitcast in FIR-to-MemRef casts so bit patterns are preserved
(e.g. TRANSFER), while keeping fir.convert for memref/reference
marshaling and non-bitcast-compatible cases.
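The difference between preserving bits (what TRANSFER needs) and converting values can be seen with Python's `struct` module. This sketch only illustrates the two semantics and does not call any FIR API. A numeric conversion of 1.0 yields 1, while reinterpreting the same 32 bits yields the IEEE pattern `0x3F800000`.

```python
import struct


def bitcast_f32_to_i32(x):
    """Reinterpret the bits of a 32-bit float as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<f", x))[0]


def convert_f32_to_i32(x):
    """Numeric conversion: truncate the value toward zero."""
    return int(x)
```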
2026-03-25 15:27:43 -04:00
Abid Qadeer
95d54423d9
[flang][debug] Always include (kind=X) suffix in debug type names (#186255)
Previously, 32-bit types (integer, real, logical, complex) were printed
without the (kind=4) suffix in DWARF debug type names, while other sizes
always included the kind suffix. This inconsistency is now removed by
always appending (kind=X) to all basic type names, making the format
uniform across all type sizes.
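The old and new naming rules can be sketched as below. The sketch assumes that the name is the Fortran type keyword followed directly by `(kind=X)`. The exact spelling flang emits may differ, and the function is hypothetical.

```python
def debug_type_name(base, kind, always_suffix=True):
    """Return a basic-type name such as 'integer(kind=4)'.

    always_suffix=False mimics the old behavior, in which the 32-bit
    (kind=4) variants were emitted without a suffix.
    """
    if not always_suffix and kind == 4:
        return base
    return f"{base}(kind={kind})"
```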

Fixes https://github.com/llvm/llvm-project/issues/119478.
2026-03-24 13:40:35 +00:00
khaki3
4219fb8a21
[flang] Fix FIRToMemRef index computation for array_coor with shape_shift and slice (#186523)
When fir.array_coor carries an explicit shape_shift (non-default lower
bounds) and an explicit slice, the indices are Fortran indices rather
than 1-based section indices. The FIRToMemRef pass was unconditionally
subtracting 1 from sliced indices, which is only correct for 1-based
section indices (the embox-with-embedded-slice case).

For shape_shift + explicit slice, the correct adjustment is to subtract
the slice lower bound instead of 1. This produces proper 0-based memref
indices.

This pattern arises after the FIR inliner canonicalizes
fir.embox(shape_shift, slice) + fir.array_coor(box) into a single
fir.array_coor with explicit shape_shift and slice operands, where the
indices become Fortran indices.

Without this fix, arrays with non-default lower bounds (e.g., A(0:N) or
A(-1:N)) produce negative memref indices, writing before the array
allocation and causing a segfault.
2026-03-23 10:35:27 -07:00
laoshd
6e5e1c97e0
[flang][flang-rt] Implement F202X leading-zero control edit descriptors LZ, LZS, and LZP for formatted output (F, E, D, and G editing) (#183500)
LZ: processor-dependent (default, flang prints leading zero); LZS:
suppress the optional leading zero before the decimal point; LZP: print
the optional leading zero before the decimal point. Changes span the
source parser, compile-time format validator, runtime format processing,
and runtime output formatting. Includes semantic test (io18.f90) and
documentation updates.
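The effect of the three modes on F editing can be sketched with a simplified Python model. This is not flang-rt's code. It covers only Fw.d output of finite values, treats LZ like LZP (matching the note that flang prints the zero by default), and ignores the case where the optional zero is dropped so that a value fits a narrow field.

```python
def f_edit(value, w, d, mode="LZ"):
    """Simplified Fw.d output with leading-zero control.

    mode: "LZ"  - processor-dependent; this sketch prints the zero
          "LZS" - suppress the optional leading zero
          "LZP" - print the optional leading zero
    """
    text = f"{abs(value):.{d}f}"           # e.g. "0.50"
    if mode == "LZS" and text.startswith("0."):
        text = text[1:]                    # ".50"
    if value < 0:
        text = "-" + text
    if len(text) > w:
        return "*" * w                     # field too narrow
    return text.rjust(w)
```

With `F5.2` and the value 0.5, LZP yields `" 0.50"` and LZS yields `"  .50"`. Values with a nonzero integer part are unaffected.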
2026-03-23 11:50:48 -04:00
Scott Manley
965ee6c91f
[FIRToMemRef] copy ACC Variable Name attribute (#187724)
When converting from fir.alloca to memref.alloca, also copy the acc
variable name attribute, if it exists.
2026-03-20 12:29:41 -05:00
Kareem Ergawy
acd52a2419
[flang][OpenMP][DoConcurrent] Emit declare mapper for records (#179936)
Extends `do concurrent` device support by emitting compiler-generated
declare mapper ops for live-ins whose types are record types and have
allocatable members.
2026-03-11 13:43:55 +01:00
Valentin Clement (バレンタイン クレメン)
f35042a639
[flang][openacc] Attach IndirectGlobalAccessModel to fir.use_stmt (#185767)
In some cases, a `fir.use_stmt` operation can end up in an offload
region, for example in an acc routine. Make sure we can validate the
symbols associated with the `fir.use_stmt` operation.
2026-03-10 22:31:15 +00:00
Kareem Ergawy
0bf9bb5c42
[Flang][OpenMP] Fix close map flag propagation for derived types in USM (#185330)
This fixes a bug in USM mode where the `close` map type modifier was
attached to some `map.info.op` operations corresponding to user-defined
type members while the parent type instance itself was not marked as `close`.

This fix ensures that if a parent record type map does not have the
'close' flag, it is cleared from its members as well, maintaining
consistency.
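The normalization rule can be sketched with integer bit flags. The bit values and function name are hypothetical and do not match the runtime's actual map-type encoding. The sketch only shows that members lose `CLOSE` whenever the parent lacks it.

```python
MAP_TO = 0x1
MAP_FROM = 0x2
MAP_CLOSE = 0x4  # hypothetical bit values for illustration


def normalize_close(parent_flags, member_flags):
    """Clear CLOSE from member maps when the parent map does not carry it."""
    if parent_flags & MAP_CLOSE:
        return list(member_flags)
    return [flags & ~MAP_CLOSE for flags in member_flags]
```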

Gemini was used to create the tests, which were derived from a
reproducer I was working with to debug the issue. I reviewed the
AI-generated test code line-by-line.

Assisted-by: Gemini <gemini@google.com>
2026-03-09 15:55:53 +01:00
Susan Tan (ス-ザン タン)
97cf8bf220
[flang] materialize fir.box when it is from a block argument (#184898)
We have to materialize `fir.box` before adding a `fir.convert` to a
memref type. Otherwise we get:
`'fir.convert' op invalid type conversion'!fir.box<!fir.array<?xi32>>' /
'memref<?xi32, strided<[?], offset: ?>>'`
2026-03-06 16:02:09 -05:00
Carlos Seo
0f5e9bee83
[flang][OpenMP] Fix crash when a sliced array is specified in a forall within a workshare construct (#170913)
This is a fix for two problems that caused a crash:

1. Thread-local variables sometimes are required to be parallelized.
Added a special case to handle this in
`LowerWorkshare.cpp:isSafeToParallelize`.
2. Race condition caused by a `nowait` added to the `omp.workshare` if
it is the last operation in a block. This allowed multiple threads to
execute the `omp.workshare` region concurrently. Since
_FortranAPushValue modifies a shared stack, this concurrent access
causes a crash. Disable the addition of `nowait` and rely on the
implicit barrier at the end of the `omp.workshare` region.

Fixes #143330
2026-03-05 09:59:20 -03:00
Razvan Lupusoru
e63e55cae8
[mlir][acc] Add ACCRecipeMaterialization pass and reduction ops (#184252)
Pass
----
Add the `acc-recipe-materialization` pass, which materializes OpenACC
privatization, firstprivate and reduction recipes by inlining their
init, copy, combiner, and destroy regions into the operation for the
construct. The pass runs on acc.parallel, acc.serial, acc.kernels, and
acc.loop.

- Firstprivate: Inserts acc.firstprivate_map so the initial value is
available on the device, then clones the recipe init and copy regions
into the construct and replaces uses with the materialized alloca.
Optional destroy region is cloned before the region terminator.

- Private: Clones the recipe init region into the construct (at region
entry or at the loop op for acc.loop private). Replaces uses of the
recipe result with the materialized alloca. Optional destroy region is
cloned before the region terminator.

- Reduction: Creates acc.reduction_init (init region inlined) and
acc.reduction_combine_region (combiner region inlined). All uses of the
reduction in the region are updated to the reduction init result.

New operations
--------------
- acc.reduction_init: Allocates and initializes a private reduction
variable from a recipe. Takes the original reduction variable and
reduction_operator; has a single region that must yield one value (the
private storage) via acc.yield. Used by the pass to materialize
acc.reduction_recipe init regions inside the compute construct.

- acc.reduction_combine_region: Combines the private reduction value
with the shared reduction variable. Takes the shared and private
memrefs; has a single region (the recipe combiner) terminated by
acc.yield with no operands. Used by the pass to materialize the
reduction recipe combiner.

Both ops implement RegionBranchOpInterface. acc.yield is updated to
allow terminating ReductionInitOp and ReductionCombineRegionOp regions.
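The reduction flow can be modeled in Python as a sketch. Each worker gets a private copy initialized by the recipe's init region (the role of `acc.reduction_init`) and accumulates into it. The combiner then merges that copy into the shared variable (the role of `acc.reduction_combine_region`). This is a value-based model with hypothetical names. The real ops work on memrefs and IR regions, and this sketch does not specify the combine order across workers.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReductionRecipe:
    init: Callable[[], float]                 # stands in for the init region
    combine: Callable[[float, float], float]  # stands in for the combiner region


def run_reduction(shared, chunks, recipe):
    for chunk in chunks:                      # one chunk per parallel worker
        private = recipe.init()               # ~ acc.reduction_init
        for x in chunk:
            # Uses of the reduction variable now operate on the private copy.
            private = recipe.combine(private, x)
        shared = recipe.combine(shared, private)  # ~ acc.reduction_combine_region
    return shared
```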

Supporting changes
------------------
- OpenACCUtilsLoop: Factor cloneACCRegionInto out of the existing
loop-conversion helper so the pass can clone recipe regions with
optional result replacement; loop conversion now calls the shared
helper.
- Flang: Add ReductionInitOpFortranObjectViewModel
(FortranObjectViewOpInterface) for acc.reduction_init and register it in
OpenACC extensions.

Tests
-----
- MLIR: acc-recipe-materialization-{firstprivate,private,reduction,
kernel-private,parallel}.mlir (memref dialect).
- Flang: acc-recipe-materialization-{firstprivate,firstprivate-derived,
private,reduction,kernel-private,parallel}.fir; firstprivate test has a
second RUN with -acc-optimize-firstprivate-map.

---------

Co-authored-by: Scott Manley <rscottmanley@gmail.com>
2026-03-02 17:35:22 -08:00
Yangyu Chen
7f0a343a8e
[flang] Implement -grecord-command-line for Flang (#181686)
Enable Flang to match Clang behavior for command-line recording in DWARF
producer strings when using -grecord-command-line.

Signed-off-by: Yangyu Chen <cyy@cyyself.name>
2026-02-28 01:45:52 +08:00
Tim
603e5c832a
[flang][debug] Supply missing subprogram attributes (#181425)
Add DW_AT_elemental, DW_AT_pure, and DW_AT_recursive attributes to
subprograms and functions when they are specified in the source.
2026-02-20 21:23:01 +00:00
Susan Tan (ス-ザン タン)
2b074823e4
Reapply "[flang] Lowering a ArrayCoorOp to arithmetic computations" (#182585)
Reapplying the changes; they were reverted by mistake yesterday.

This reverts commit 3c6523dcb8ebc0396f69c578285599b66e16dce7.
2026-02-20 15:33:26 -05:00
Susan Tan (ス-ザン タン)
3c6523dcb8
Revert "[flang] Lowering a ArrayCoorOp to arithmetic computations whe… (#182365)
This reverts commit 2bd23d3fa688d0e25c8492ceeaa251af4759d559.
2026-02-19 20:43:05 +00:00
Susan Tan (ス-ザン タン)
2bd23d3fa6
[flang] Lowering a ArrayCoorOp to arithmetic computations when a fir memref is a block argument (#182139)
Remove the special-case that handled `fir.array_coor` with a
block-argument base by converting the element ref result (!fir.ref<i32>
-> memref<i32>) and leaving fir.array_coor alive.

Instead, we now always convert the base (!fir.ref<!fir.array<...>> ->
memref<...>) and compute the memref indices from the fir.array_coor
operands, so loads/stores become memref.load/store base[indices] and
fir.array_coor can be erased when it’s only used by memory ops.
2026-02-19 11:46:17 -05:00
Abid Qadeer
deedc7bfe3
[Flang][OpenMP] Don't generate code for unreachable target regions. (#178937)
When a target region is placed inside a constant false condition (e.g.,
`if (.false.)`), the dead code gets eliminated on the host side,
removing the `omp.target` operation entirely. However, the device-side
compilation pipeline is unaware of this elimination and attempts to
generate kernel code. Since the host never created offload metadata for
the eliminated target, the device-side kernel function lacks the
"kernel" attribute, causing `OpenMPOpt` to fail with an assertion when
it expects all outlined kernels to have this attribute. The problem can
be seen with the following code:

```fortran
program cele
  implicit none
  real :: V
  integer :: i
  if (.false.) then
    !$omp target teams distribute parallel do
    do i = 1, 5
      V = V * 2
    end do
    !$omp end target teams distribute parallel do
  end if
end program
```

It currently fails with the following assertion:

```
Assertion `omp::isOpenMPKernel(*Kernel) && "Expected kernel function!"' failed.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp:4291
```

This PR adds `DeleteUnreachableTargetsPass` that identifies `omp.target`
operations in unreachable code blocks and removes them.
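The core idea, dropping target ops from blocks that cannot be reached from the entry block, can be sketched as a breadth-first reachability walk over a control-flow graph. The graph encoding and function names are hypothetical. The actual `DeleteUnreachableTargetsPass` operates on MLIR regions, and its exact reachability test may differ.

```python
from collections import deque


def reachable_blocks(successors, entry):
    """Breadth-first search over a CFG given as {block: [successor, ...]}."""
    seen = {entry}
    work = deque([entry])
    while work:
        block = work.popleft()
        for succ in successors.get(block, []):
            if succ not in seen:
                seen.add(succ)
                work.append(succ)
    return seen


def delete_unreachable_targets(successors, entry, ops_by_block):
    """Remove 'omp.target' ops from blocks the entry block cannot reach."""
    live = reachable_blocks(successors, entry)
    return {
        block: [op for op in ops if block in live or op != "omp.target"]
        for block, ops in ops_by_block.items()
    }
```

After `if (.false.)` is folded, the block holding the target region has no predecessor on any path from the entry, so its `omp.target` is removed.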
2026-02-16 09:31:42 +00:00
Philipp Rados
9914ee6ef4
[flang] Fix -debug crash from VScaleAttrPass (#180234)
This patch splits the `vscaleRange` pass option of
`VScaleAttrPass` into `vscaleMin` and `vscaleMax`, since a
`std::pair<>` cannot be used as a CLI option and causes a crash when
running `flang -march=rv64gcv -O3 file.f90 -mmlir -debug`.

Since the options can now be set individually, I added some error
checking following the semantics described in the langref
https://llvm.org/docs/LangRef.html#function-attributes.

I also added tests since there were none for only this pass before.
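The LangRef constraints on `vscale_range(<min>[, <max>])` can be sketched as a validator. The rules are: `min` must be a power of two greater than 0, and `max` must be a power of two greater than or equal to `min`, or 0 for an unbounded maximum. The function names and error strings are hypothetical, and the checks in the pass may be structured differently.

```python
def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def check_vscale_range(vscale_min, vscale_max):
    """Return an error message, or None if the pair is valid."""
    if not is_power_of_two(vscale_min):
        return "vscaleMin must be a power of two greater than 0"
    if vscale_max != 0 and not (
        is_power_of_two(vscale_max) and vscale_max >= vscale_min
    ):
        return "vscaleMax must be 0 or a power of two >= vscaleMin"
    return None
```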
2026-02-10 11:46:06 +01:00
Slava Zakharin
1f26c39cfc
[flang] Allow fir.field_index and fir.coordinate_of speculation. (#179785)
This change makes `fir.field_index` a Pure operation and
adds support for the `ConditionallySpeculatable` interface to
`fir.coordinate_of`. The test demonstrates how this affects
Flang LICM.
2026-02-05 16:22:30 -08:00
Slava Zakharin
2f97c47cc2
[flang,openacc] Fixed canMoveOutOf() for acc.loop. (#178971)
We should check all data operands instead of exiting after the first one.
2026-01-30 16:00:37 -08:00
Razvan Lupusoru
f951f6305e
[flang][acc] Add ACCOptimizeFirstprivateMap pass (#178546)
This pass optimizes acc.firstprivate_map operations generated during
OpenACC recipe materialization when acc.firstprivate is materialized
into the mapping and a private allocation inside the region. The
optimization applies to scalar variables of trivial types (integers,
reals, logicals) as long as they are not optional.

The pass hoists loads from the firstprivate variable to before the
compute region, converting the firstprivate copy to a pass-by-value
pattern. This eliminates the need to copy the firstprivate variable at
runtime, since only its value is needed to initialize the private copies.
2026-01-29 19:02:22 +00:00
Sergio Afonso
1e4b4fa1b2
[Flang][OpenMP] Minimize host ops remaining in device compilation (#137200)
This patch updates the function filtering OpenMP pass intended to remove
host functions from the MLIR module created by Flang lowering when
targeting an OpenMP target device.

Host functions holding target regions must be kept, so that the target
regions within them can be translated for the device. The issue is that
non-target operations inside these functions cannot be discarded because
some of them hold information that is also relevant during target device
codegen. Specifically, mapping information resides outside of
`omp.target` regions.

Previously, all host operations were preserved; this patch instead
discards those that are not actually needed by target device codegen.
In practice, this means only keeping target
regions and mapping information needed by the device. Arguments for some
of these remaining operations are replaced by placeholder allocations
and `fir.undefined`, since they are only actually defined inside of the
target regions themselves.

As a result, this set of changes makes it possible to later simplify
target device codegen, as it is no longer necessary to handle host
operations differently to avoid issues.
2026-01-29 12:44:00 +00:00
jeanPerier
9a39c2ff75
Revert "[flang] Use outermost fir.dummy_scope for TBAA of local allocations. (#146006) (#177617)
This reverts commit 90da61634a4accc9869b4e1cb1ac3736158c33e6.

See https://github.com/llvm/llvm-project/pull/177615 for more context
about why this patch is and can now be reverted.
2026-01-27 15:25:21 +01:00
jeanPerier
45102be5e5
[flang] emit declare for function result before call (#177615)
This change moves the declare of result storage alloca before the call
so that alias analysis can revert to linking fir.declare to the first
dominating dummy_scope instead of the outermost one.

This is only relevant when MLIR inlining is enabled. It is the first
step toward fixing an issue with the result storage of a TARGET result,
which was exposed by recent TBAA changes that placed target data in its
own tree. After inlining, the usages of the result storage inside the
callee and after the call ended up being placed in different nodes
(target and non-target) of the same TBAA tree (for the dominating function).

The fact that both nodes are placed in the same tree stems from
https://github.com/llvm/llvm-project/pull/146006 that fixed another TBAA
issue related to MLIR inlining and function result where the function
result was placed into the wrong TBAA tree, which with nested inlining
could end up being the tree of a callee where the result storage was a
dummy, causing the TBAA to wrongfully tell that any access to the result
storage inside the nested callee did not alias with any access after the
call.

By moving the declare before the call that will be inlined, this patch
will allow reverting #146006 and fixing both issues: the TBAA emitted for
usages of the result storage after the call will always be placed in a
different TBAA tree than any usages of the result storage inside the
callee.
2026-01-27 15:25:07 +01:00
Kareem Ergawy
e74e970036
[flang][OpenMP][DoConcurrent] Add collapse clause to generated omp.loop_nest op (#178138)
Adds the collapse clause to the generated loop nest on both host and
device.
2026-01-27 11:58:57 +01:00
Slava Zakharin
7e66d1511d
[flang][CUF] Limit LICM for cuf.kernel. (#178073)
This patch prevents hoisting of operations with reference operands.
Such hoisting may break the assumptions that later CUF passes
rely on.
2026-01-26 15:24:39 -08:00
Slava Zakharin
dc5f905a87
[flang,openacc] Limit operations hoisting from acc.loop. (#177727)
This patch implements `OperationMoveOpInterface::canMoveOutOf()`
method for `acc.loop`, such that even Pure operations are not hoisted
by LICM if any of their operands are referenced in the data operands
of `acc.loop`. Related to #175108.
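The check can be sketched in Python as follows. It is simplified: values are plain strings, and a conflict means an operand of the candidate op is one of the loop's data-operand variables. The function name and encoding are hypothetical. The real `canMoveOutOf` works on MLIR values and may trace them further.

```python
def can_move_out_of(loop_data_vars, cand_operands):
    """Return False if any operand of the candidate op is referenced by
    the loop's data operands; otherwise hoisting is allowed."""
    data_vars = set(loop_data_vars)
    # Every operand must be checked; stopping after the first one could
    # miss a conflict on a later operand.
    for operand in cand_operands:
        if operand in data_vars:
            return False
    return True
```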
2026-01-26 11:48:37 -08:00
Slava Zakharin
f5e2f29cf3
[flang] Added ConditionallySpeculatable and Pure for some FIR ops. (#174013)
This patch implements `ConditionallySpeculatable` interface for some
FIR operations (`embox`, `rebox`, `box_addr`, `box_dims` and `convert`).
It also adds `Pure` trait for `fir.shape`, `fir.shapeshift`,
`fir.shift` and `fir.slice`.

I could have split this into multiple patches, but the changes
are better tested together on real apps, and the amount of affected
code is small.

There are more `NoMemoryEffect` operations for which I am planning
to do the same in future PRs.
2026-01-23 17:42:52 -08:00
Slava Zakharin
5d91c11df5
[flang] Support cuf.device_address in FIR AliasAnalysis. (#177518)
Support `cuf.device_address` the same way as `fir.address_of`.
This implementation implies that the host address and the device
address `MustAlias` (as shown in the new test). This should be
conservatively correct as long as `MustAlias` does not allow
to assume that the actual addresses are the same (that is what
LLVM documentation implies, I believe).

It is probably worth adding an operation interface to handle
`fir::AddrOfOp` and `cuf::DeviceAddressOp` in FIR AliasAnalysis,
but for the initial implementation I hardcoded the checks.

I also removed the call to `fir::valueHasFirAttribute` that performs
on-demand SymbolTable lookups, which may be costly, and added
SymbolTable caching in FIR AliasAnalysis object. Anyway,
`fir::valueHasFirAttribute` does not work for `cuf::DeviceAddressOp`.
2026-01-23 17:42:35 -08:00
agozillon
a16668a8d7
[Flang][OpenMP][MLIR] Align declare mapper pass handling with other map and global operations (#176852)
This PR makes a couple of minor tweaks to the lowering for
declare_mapper operations:

1) Add declare_mapper operations to the list of global operations to
   have optimisation passes executed on them. Primarily just to make
   sure we keep it in line with other global operations that contain
   regions. Prevents oddities where we embed FIR/HLFIR into the mapper
   that needs to be lowered before being converted to LLVM-IR. One
   example that springs to mind is if we ever decide to remove the
   single block condition on the operation to allow conditional checks
   for mapped data.
2) Add a CodeGenOpenMP.cpp conversion for DeclareMapperOp to make sure
   we convert the return type correctly from a BoxType to a struct type
   rather than an opaque pointer when lowering. Currently, I've left out
   the block argument types from being converted, as they're wrapped in
   a fir.ref and would be opaque pointers in either case.

So these are some minor additions to keep declare_mapper a little more
in line with the rest of the OpenMP operations.
2026-01-23 22:36:39 +01:00
Kareem Ergawy
ab4f66d6f3
[OpenMP][flang] Move todo for checking reduction support status on the GPU (#175172)
Moves a `todo` to check for the current level of support for by-ref
reductions to the `FunctionFiltering` pass. This guarantees that the
check does not trigger when the same module is compiled twice: on the
CPU and on the GPU.
2026-01-21 13:22:45 +01:00
Razvan Lupusoru
8dfec25974
[mlir][acc] Add OffloadTargetVerifier pass (#176467)
Add a verification pass that checks live-in values and symbol references
within offload regions are legal for the target execution model.

When code is offloaded to a device (e.g., GPU), not all values and
symbols from the host context are directly accessible. Data must be
explicitly mapped via OpenACC data clauses (copyin, create, present
etc.), declared with device attributes, or be trivial scalars that can
be passed by value. Similarly, symbol references to globals must have
proper `declare` attributes or device-resident data attributes.

This pass walks operations implementing `OffloadRegionOpInterface`,
which includes OpenACC compute constructs (`acc.parallel`,
`acc.kernels`, `acc.serial`) as well as GPU operations like
`gpu.launch`. For each region, it uses liveness analysis to identify
values flowing into the region and checks their validity using the
`OpenACCSupport` analysis.

Key features:
- Validates live-in values against OpenACC data mapping requirements
- Validates symbol references for device accessibility
- Supports soft-check mode for diagnostic-only verification
- Configurable device_type for target-specific behavior
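The live-in check can be sketched for a single straight-line region. The sketch collects values used in the region but defined outside it, then reports those that a validity predicate rejects. This set-based walk is a hypothetical simplification. The pass itself uses liveness analysis and `OpenACCSupport` to decide validity.

```python
def live_ins(region_ops):
    """Values used inside the region but defined outside it.

    region_ops: list of (results, operands) pairs, one per op, in order.
    """
    defined = set()
    used = []
    for results, operands in region_ops:
        for value in operands:
            if value not in defined and value not in used:
                used.append(value)
        defined.update(results)
    return [v for v in used if v not in defined]


def verify_offload_region(region_ops, is_valid):
    """Return the live-in values that the predicate rejects."""
    return [v for v in live_ins(region_ops) if not is_valid(v)]
```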
2026-01-20 17:17:08 +00:00
Abid Qadeer
dc9c08e6e0
[flang][debug] Generate DWARF debug info using fir.use_stmt. (#168541)
This patch uses the fir.use_stmt operations to generate correct debug
metadata for use statements when `only` and `=>` are used. The debug flow
is changed a bit: we now process the module globals first so that we
have the global variables when we start to process `fir.use_stmt`.
    
Fixes #160923.
2026-01-19 17:16:11 +00:00
Slava Zakharin
09ae1bf8b7
[flang] Added OperationMoveOpInterface for controlling LICM. (#175108)
In #173438 I added a FIR specific loop invariant code motion pass.

During the review, Tom pointed out certain limitations about OpenMP
dialect operations that should be taken into consideration during
transformations such as LICM:
https://github.com/llvm/llvm-project/pull/173438#discussion_r2657612148

I also found issues with hoisting operations out of `acc.loop`
operations in certain conditions (see the added test in `licm.fir`).

I am proposing a new operation interface for controlling the movement
of operations during MLIR transformations. In particular, I propose two
methods (there might be more):
* op.canMoveOutOf(cand) - returns true if it is allowed to move 'cand'
operation out of 'op'.
* op.canMoveFromDescendant(descendant, cand) - returns true if it is
allowed to move 'cand' out of 'descendant' and into 'op'.

I used the new interface to get rid of explicit OpenMP interface checks
in Flang's LICM, and I also used it for the `acc.loop` operation (though I
provided a conservative initial implementation).

The new interface is part of the FIR dialect, but I think it would better
fit into the core MLIR set of interfaces so that the checks that I make
in Flang's LICM are actually done in
`mlir::moveLoopInvariantCode`. Moreover, other code movement
transformations that may appear in MLIR may also need to use such an
interface.

I would like to get some feedback on whether it is reasonable to move
the interface to core MLIR.
2026-01-16 08:32:38 -08:00
Razvan Lupusoru
ab7217a089
[acc][flang] Add isDeviceData APIs for device data detection (#176219)
Add comprehensive APIs to detect device-resident data across OpenACC
type and operation interfaces. This enables passes to identify data that
is already on the device (e.g., CUF device/managed/constant memory, GPU
address spaces) and handle it appropriately.

New interface methods:
- PointerLikeType::isDeviceData(Value): Returns true if the pointer
points to device data.
- MappableType::isDeviceData(Value): Returns true if the variable
represents device data.
- GlobalVariableOpInterface::isDeviceData(): Returns true if the global
variable is device data.

New utilities in OpenACCUtils:
- acc::isDeviceValue(Value): Checks if a value represents device data by
querying type interfaces, PartialEntityAccessOpInterface for base
entities, and AddressOfGlobalOpInterface for global symbols.
- acc::isValidValueUse(Value, Region): Checks if a value is legal in an
OpenACC region by verifying it comes from a data operation, is only used
by private clauses, or is device data.

Updated isValidSymbolUse to check GlobalVariableOpInterface::isDeviceData()
for symbols referencing device-resident globals.

FIR implementations check for CUF data attributes (device, managed,
constant, shared, unified) on operations, block arguments, and globals.
The implementation traces through fir.rebox, fir.embox, fir.declare,
hlfir.declare, and fir.address_of to find the underlying data source.

Memref implementations check for gpu::AddressSpaceAttr on the memref
type.

Updated ACCImplicitData to use acc::isDeviceValue for generating
acc.deviceptr clauses for device-resident data instead of
copyin/copyout.

Updated OpenACCSupport::isValidValueUse to fallback to the new
acc::isValidValueUse utility.
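The tracing step of the FIR implementation can be sketched as a walk through pass-through ops until an op carrying a CUF data attribute, or a `fir.address_of` of an attributed global, is found. The `Op` record is hypothetical. The op and attribute names come from the description above. Block arguments and the real attribute classes are not modeled.

```python
from dataclasses import dataclass
from typing import Optional

PASS_THROUGH = {"fir.rebox", "fir.embox", "fir.declare", "hlfir.declare"}
DEVICE_ATTRS = {"device", "managed", "constant", "shared", "unified"}


@dataclass
class Op:
    name: str
    source: Optional["Op"] = None      # forwarded operand of a pass-through op
    cuf_attr: Optional[str] = None     # CUF data attribute on this op, if any
    global_attr: Optional[str] = None  # for fir.address_of: the global's attribute


def is_device_data(op):
    while op is not None:
        if op.cuf_attr in DEVICE_ATTRS:
            return True
        if op.name == "fir.address_of":
            return op.global_attr in DEVICE_ATTRS
        if op.name not in PASS_THROUGH:
            return False
        op = op.source
    return False
```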
2026-01-15 20:56:26 +00:00
jeanPerier
b0b1ab8a40
[flang][openacc] support array section privatization in lowering (#175184)
Add support for array sections in private, firstprivate, and reduction clauses.

Key changes:
- Change the related data operation result type to return the same type
as the array base (same type as the acc variable input in the
operation), while it was the type of the section before. This allows
remapping the base to the result value (to use the data operation result
as the base when generating addressing inside the compute region).
- The generatePrivateInit implementation of FIROpenACCTypeInterfaces is
modified to allocate storage only for the section, and to return the
mock base address (that is the address of the allocation minus the
offset/lower bound of the privatized section).
- The code generating the copy and combiner region is moved from
OpenACC.cpp to FIROpenACCTypeInterfaces.cpp via the addition of new
generateCopy and generateCombiner interfaces in the
MappableTypeInterface. This allows sharing all the addressing helpers
with generatePrivateInit, and will allow late generation of all recipes
with Fortran.
- Update generatePrivateDestroy to deallocate the beginning of the
section if any.
 
In the process, the generatePrivateInit implementation is
modified so that it is more uniform to make it easier to deal with the
section. This also allowed removing runtime calls when initializing the
private for array reduction.
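The mock base address arithmetic can be sketched in element offsets rather than byte addresses. Only the section is allocated. The mock base is the allocation start minus the section's offset from the base lower bound, so the original addressing formula `base + (i - lb)` still lands inside the allocation for any `i` in the section. The names are hypothetical. Note that the mock base itself may point before the allocation and is never dereferenced directly.

```python
def make_private_section(lb, lo, hi):
    """Private storage for A(lo:hi) of an array declared A(lb:...)."""
    storage = [0] * (hi - lo + 1)  # only the section is allocated
    mock_base = 0 - (lo - lb)      # allocation start (0) minus section offset

    def address(i):
        # Same formula as for the original base: base + (i - lb).
        return mock_base + (i - lb)

    return storage, address
```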
2026-01-15 09:32:13 +01:00
Susan Tan (ス-ザン タン)
2698d15664
[flang] Lowering FIR memory ops to MemRef dialect (#173507)
This patch introduces FIRToMemRef, a lowering pass that converts FIR
memory operations to the MemRef dialect, including support for slices,
shifts, and descriptor-style access patterns. To support partial
lowering, where FIR and MemRef types can coexist, we extend the handling
of fir.convert to correctly marshal between FIR reference-like types and
MemRef descriptors. The patch also factors the type conversion logic
into a reusable FIRToMemRefTypeConverter, which centralizes the rules
for converting FIR types (e.g. !fir.ref, !fir.box, sequences, logicals)
to their corresponding memref types, and is used throughout the new
pass.

---------

Co-authored-by: Scott Manley <rscottmanley@gmail.com>
Co-authored-by: jeanPerier <jean.perier.polytechnique@gmail.com>
2026-01-14 10:46:50 -05:00
Eugene Epshteyn
d593bcdc54
[flang] Changes to "unsafe Cray pointers" option (#175223)
Reserve "-funsafe-cray-pointers" (with "f") for the driver. In the
fir-alias-analysis use "-unsafe-cray-pointers" (without "f").

This prevents conflicts with how certain kinds of tools use the "unsafe
Cray pointers" options.
2026-01-10 19:33:30 -05:00
Kareem Ergawy
e82399dac2
[flang][OpenMP] Prevent omp.map.info ops with user-defined mappers from being marked as partial maps (#175133)
The following test was triggering a runtime crash **on the host before
launching the kernel**:
```fortran
program test_omp_target_map_bug_v5
  implicit none
  type nested_type
    real, allocatable :: alloc_field(:)
  end type nested_type

  type nesting_type
    integer :: int_field
    type(nested_type) :: derived_field
  end type nesting_type

  type(nesting_type) :: config

  allocate(config%derived_field%alloc_field(1))

  !$OMP TARGET ENTER DATA MAP(TO:config, config%derived_field%alloc_field)

  !$OMP TARGET
  config%derived_field%alloc_field(1) = 1.0
  !$OMP END TARGET

  deallocate(config%derived_field%alloc_field)
end program test_omp_target_map_bug_v5
```

In particular, the runtime was producing a segmentation fault when the
test was compiled with any optimization level > 0; when compiled with
-O0, the sample ran fine.

After debugging the runtime, it turned out the crash was happening at
the point where the runtime calls the default mapper emitted by the
compiler for `nesting_type`, in particular at this point in the runtime:
c62cd2877c/offload/libomptarget/omptarget.cpp (L307).

Bisecting the optimization pipeline using `-mllvm -opt-bisect-limit=N`,
the first pass that triggered the issue on `O1` was the `instcombine`
pass. Debugging this further, the issue narrows down to canonicalizing
`getelementptr` instructions from using struct types (in this case the
`nesting_type` in the sample above) to using addressing bytes (`i8`). In
particular, in `O0`, you would see something like this:
```llvm
define internal void @.omp_mapper._QQFnesting_type_omp_default_mapper(ptr noundef %0, ptr noundef %1, ptr noundef %2, i64 noundef %3, i64 noundef %4, ptr noundef %5) #6 {
entry:
  %6 = udiv exact i64 %3, 56
  %7 = getelementptr %_QFTnesting_type, ptr %2, i64 %6
  ....
}
```

After `instcombine` at `O1`, the same code becomes:

```llvm
define internal void @.omp_mapper._QQFnesting_type_omp_default_mapper(ptr noundef %0, ptr noundef %1, ptr noundef %2, i64 noundef %3, i64 noundef %4, ptr noundef %5) #6 {
entry:
  %6 = getelementptr i8, ptr %2, i64 %3
  ....
}
```

The `udiv exact` instruction emitted by the OMP IR Builder (see:
c62cd2877c/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (L9154))
allows `instcombine` to assume that `%3` is divisible by the struct size
(here `56`) and, therefore, to fold the division and the struct-typed GEP
into a single GEP on `i8` that offsets the pointer by `%3` bytes directly.

However, the runtime was calling
`@.omp_mapper._QQFnesting_type_omp_default_mapper` not with `56` (the
proper struct size) but with `48`!
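To make the mismatch concrete, here is a minimal Python sketch of the byte offsets the two IR forms above compute. It assumes the 56-byte struct size from the `udiv exact` and models `udiv` as plain truncating division; the function names are hypothetical. For sizes that are multiples of 56 the forms agree; for `48` they diverge. In LLVM, a `udiv exact` whose dividend is not an exact multiple yields poison, so the `instcombine` rewrite is legal even though it changes the computed offset.

```python
# Toy model of the two GEP forms shown above; STRUCT_SIZE is taken from
# the `udiv exact i64 %3, 56` in the O0 IR.
STRUCT_SIZE = 56


def struct_gep_offset(size_bytes: int) -> int:
    """Byte offset of `getelementptr %_QFTnesting_type, ptr %2, i64 (%3 / 56)`.

    Models `udiv` as truncating division, i.e. ignores the `exact` flag.
    """
    num_elements = size_bytes // STRUCT_SIZE
    return num_elements * STRUCT_SIZE


def i8_gep_offset(size_bytes: int) -> int:
    """Byte offset of the canonicalized `getelementptr i8, ptr %2, i64 %3`."""
    return size_bytes


for size in (56, 112, 48):
    print(size, struct_gep_offset(size), i8_gep_offset(size))
# 56 56 56
# 112 112 112
# 48 0 48
```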

Debugging this further, I found that the size computed for the
`omp.map.info` operation to which the default mapper is attached is `48`,
because we mark the map as partial (see:
c62cd2877c/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp (L1146)
and
c62cd2877c/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (L4501-L4512)).

However, I think this is incorrect since the emitted mapper (and
user-defined mappers in general) are defined on the whole struct type
and should never be marked as partial. Hence, the fix in this PR.
2026-01-09 15:15:10 +01:00
Slava Zakharin
84cc15344f
[flang] Make fir.result a Pure operation. (#173508)
This allows speculating recursively speculatable operations
containing `fir.result`. Note that making it Pure does not allow
speculating `fir.result` itself from its containing operation,
since it is a terminator.
2026-01-07 17:04:56 -08:00
Slava Zakharin
0bf4df8b1e
[flang] Added LoopInvariantCodeMotion pass for [HL]FIR. (#173438)
The new pass allows hoisting some `fir.load` operations early
in MLIR. For example, many descriptor loads might be hoisted
out of the loops, though it does not make much difference
in performance, because LLVM is able to optimize such loads
(which are lowered as `llvm.memcpy` into temporary descriptors),
given that proper TBAA information is generated by Flang.

Further hoisting improvements are possible in [HL]FIR LICM,
e.g. getting proper mod-ref results for Fortran runtime calls
may allow hoisting loads from global variables, which LLVM
cannot do due to lack of alias information.
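The mod-ref point above can be illustrated with a toy loop-invariant code motion sketch in Python. This is not the Flang pass: ops are plain records whose names serve only as labels, memory locations are abstract strings assumed not to alias unless equal, and the `"*"` marker is a hypothetical stand-in for a call (such as a Fortran runtime call) with no mod-ref information. It also ignores whether executing a hoisted load is safe when the loop may run zero times.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

UNKNOWN = "*"  # hypothetical marker: op may write any memory (no mod-ref info)


@dataclass
class Op:
    name: str                        # op name, used only as a label
    result: Optional[str]            # SSA value defined by the op, if any
    operands: List[str] = field(default_factory=list)
    reads: Optional[str] = None      # abstract memory location read
    writes: Optional[str] = None     # abstract memory location written


def hoist_invariant_ops(body: List[Op], outside: Set[str]) -> Tuple[List[Op], List[Op]]:
    """Split a straight-line loop body into (hoisted, remaining) ops."""
    written = {op.writes for op in body if op.writes is not None}
    available = set(outside)  # values defined outside the loop or already hoisted
    hoisted: List[Op] = []
    remaining: List[Op] = []
    for op in body:
        invariant_operands = all(v in available for v in op.operands)
        # Only ops without writes qualify; a load additionally requires that
        # no op in the loop may write its location.
        memory_ok = op.writes is None and (
            op.reads is None
            or (UNKNOWN not in written and op.reads not in written)
        )
        if invariant_operands and memory_ok:
            hoisted.append(op)
            if op.result is not None:
                available.add(op.result)
        else:
            remaining.append(op)
    return hoisted, remaining


body = [
    Op("fir.load", "%box", ["%box_ref"], reads="box_ref"),  # descriptor load
    Op("fir.box_addr", "%addr", ["%box"]),
    Op("fir.array_coor", "%elt", ["%addr", "%i"]),          # uses induction var %i
    Op("fir.store", None, ["%val", "%elt"], writes="array"),
]
hoisted, _ = hoist_invariant_ops(body, {"%box_ref", "%val"})
print([op.name for op in hoisted])  # ['fir.load', 'fir.box_addr']

# A call with unknown effects blocks hoisting the descriptor load.
call = Op("fir.call", None, [], writes=UNKNOWN)
hoisted, _ = hoist_invariant_ops(body + [call], {"%box_ref", "%val"})
print([op.name for op in hoisted])  # []
```

In this model, a single call with unknown effects makes every load in the loop non-invariant, which is why more precise mod-ref results for runtime calls can allow more loads to be hoisted.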

This patch also contains improvements for FIR mod-ref analysis:
We may recurse into `HasRecursiveMemoryEffects` operations and
use `getModRef` recursively to get more precise results for
regions with `fir.call` operations.

This patch also modifies `AliasAnalysis` to set the instantiation
point for cases where the tracked data is accessed through a load
from `!fir.ref<!fir.box<>>`: without this change the mod-ref
analysis was not able to recognize user pointer/allocatable variables.
2026-01-07 16:16:52 -08:00
Valentin Clement (バレンタイン クレメン)
a0dfe45036
Reland "[flang][cuda] Add support for derived-type initialization on device #172568" (#174107)
The build bot failures have been addressed in #174048.
2025-12-31 11:26:17 -08:00
Slava Zakharin
fe0f366f6e
[flang] Fixed hoisting order in fir.do_concurrent simplification. (#174044)
The order has to be fixed after #173502. This results in
reversing the order of `fir.alloca`, but that should be
insignificant.
2025-12-31 10:08:32 -08:00
Valentin Clement (バレンタイン クレメン)
f43d683409
Revert "Reland "[flang][cuda] Add support for derived-type initialization on device #172568" (#174033)
This fails https://lab.llvm.org/staging/#/builders/65
This reverts commit 1ac1a547ee3b74b4d02bc94faf02ca0381196d11.
2025-12-30 15:15:44 -08:00
Slava Zakharin
91981a5736
[flang] Fixed operations hoisting out of fir.do_concurrent. (#173502)
LICM (#173438) may insert new operations at the beginning of
`fir.do_concurrent`'s block, and they cannot always be hoisted
to the alloca-block of the parent operation. This patch
only moves `fir.alloca`s into the alloca-block, and moves
all other operations right before `fir.do_concurrent`.
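The placement rule described above amounts to a stable partition of the hoisted ops. A minimal Python sketch, assuming ops are encoded as hypothetical `(name, result)` pairs rather than real MLIR operations; it does not model the ordering change from #174044.

```python
from typing import List, Tuple

Op = Tuple[str, str]  # (op name, result value), a hypothetical toy encoding


def split_hoisted_ops(ops: List[Op]) -> Tuple[List[Op], List[Op]]:
    """Split ops found at the start of a fir.do_concurrent block.

    Returns (ops for the parent's alloca-block, ops to place right before
    fir.do_concurrent); relative order within each group is preserved.
    """
    to_alloca_block = [op for op in ops if op[0] == "fir.alloca"]
    before_loop = [op for op in ops if op[0] != "fir.alloca"]
    return to_alloca_block, before_loop


block_head = [
    ("fir.alloca", "%tmp0"),
    ("fir.load", "%box"),  # e.g. a load inserted by LICM
    ("fir.alloca", "%tmp1"),
    ("fir.box_addr", "%addr"),
]
allocas, others = split_hoisted_ops(block_head)
print(allocas)  # [('fir.alloca', '%tmp0'), ('fir.alloca', '%tmp1')]
print(others)   # [('fir.load', '%box'), ('fir.box_addr', '%addr')]
```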
2025-12-30 10:27:31 -08:00
Valentin Clement (バレンタイン クレメン)
1ac1a547ee
Reland "[flang][cuda] Add support for derived-type initialization on device #172568" (#172913)
#172568
2025-12-30 08:49:04 -08:00