llvm-project

Author	SHA1	Message	Date
khaki3	c6d770fece	[flang] Fix FIRToMemRef index computation for array_coor with slice and shape_shift (#189496 ) Use shift instead of sliceLb only when the array_coor has an explicit slice (indicesAreFortran case). When the slice comes from an embox, the indices are 1-based section indices and must subtract 1.	2026-03-31 12:09:43 -07:00
Mehdi Amini	509f181f40	[MLIR][TableGen] Fix ArrayRefParameter in struct format roundtrip (#189065 ) When an ArrayRefParameter (or OptionalArrayRefParameter) appears in a non-last position within a struct() assembly format directive, the printed output is ambiguous: the comma-separated array elements are indistinguishable from the struct-level commas separating key-value pairs. Fix this by wrapping such parameters in square brackets in both the generated printer and parser. The printer emits '[' before and ']' after the array value; the parser calls parseLSquare()/parseRSquare() around the FieldParser call. Parameters with a custom printer or parser are unaffected (the user controls the format in that case). Fixes #156623 Assisted-by: Claude Code	2026-03-27 18:41:46 +00:00
Carlos Seo	db5cd626b9	[flang][OpenMP] Restrict isSafeToParallelize to write-only thread-local effects (#188595 ) This is a follow-up fix for commit 0f5e9bee. Only write effects to thread-local memory should be considered safe to parallelize in workshare lowering, not reads. When both reads and writes were safe, the cascading effect in moveToSingle could cause entire SingleRegions to become fully parallelized, eliminating the omp.single and its implicit barrier. This removed synchronization points needed to keep threads coordinated inside sequential loops containing workshared operations, causing race conditions in forall-workshare patterns. This was exposed by the Fujitsu Test Suite and made the following tests regress: FAIL: test-suite :: Fujitsu/Fortran/0398/Fujitsu-Fortran-0398_0031.test FAIL: test-suite :: Fujitsu/Fortran/0398/Fujitsu-Fortran-0398_0013.test FAIL: test-suite :: Fujitsu/Fortran/0398/Fujitsu-Fortran-0398_0030.test FAIL: test-suite :: Fujitsu/Fortran/0398/Fujitsu-Fortran-0398_0014.test Updates #143330	2026-03-27 12:11:27 -03:00
Hocky Yudhiono	ed37bdcc3e	[mlir][func] Fix crashes in FuncToLLVM discardable attributes propagation logic (#188232 ) Refactor how `func.func` discardable attributes are handled in the Func-to-LLVM conversion. Instead of ad hoc checks for linkage and readnone followed by a simple filter, the pass now generically processes inherent attributes from LLVMFuncOp. Attributes that correspond to inherent `llvm.func` ODS names can be attached as `llvm.<name>` on `func.func` and are stripped to `<name>` when building `LLVM::LLVMFuncOp`, so LLVM-specific knobs stay namespaced on the source op but land on the right inherent slots on `llvm.func`. Other discardable attributes continue to be propagated as-is. Fixes #175959 Fixes #181464 Assisted-by: CLion code completion, GPT 5.3-Codex --------- Co-authored-by: Mehdi Amini <joker.eph@gmail.com>	2026-03-26 11:12:10 +00:00
Susan Tan (ス-ザン　タン)	55111e8d17	[flang] use fir.bitcast for FIRToMemRef scalar reinterpretation (#188328 ) Use fir.bitcast in FIR-to-MemRef casts so bit patterns are preserved (e.g. TRANSFER), while keeping fir.convert for memref/reference marshaling and non-bitcast-compatible cases.	2026-03-25 15:27:43 -04:00
Abid Qadeer	95d54423d9	[flang][debug] Always include (kind=X) suffix in debug type names (#186255 ) Previously, 32-bit types (integer, real, logical, complex) were printed without the (kind=4) suffix in DWARF debug type names, while other sizes always included the kind suffix. This inconsistency is now removed by always appending (kind=X) to all basic type names, making the format uniform across all type sizes. Fixes https://github.com/llvm/llvm-project/issues/119478.	2026-03-24 13:40:35 +00:00
khaki3	4219fb8a21	[flang] Fix FIRToMemRef index computation for array_coor with shape_shift and slice (#186523 ) When fir.array_coor carries an explicit shape_shift (non-default lower bounds) and an explicit slice, the indices are Fortran indices rather than 1-based section indices. The FIRToMemRef pass was unconditionally subtracting 1 from sliced indices, which is only correct for 1-based section indices (the embox-with-embedded-slice case). For shape_shift + explicit slice, the correct adjustment is to subtract the slice lower bound instead of 1. This produces proper 0-based memref indices. This pattern arises after the FIR inliner canonicalizes fir.embox(shape_shift, slice) + fir.array_coor(box) into a single fir.array_coor with explicit shape_shift and slice operands, where the indices become Fortran indices. Without this fix, arrays with non-default lower bounds (e.g., A(0:N) or A(-1:N)) produce negative memref indices, writing before the array allocation and causing a segfault.	2026-03-23 10:35:27 -07:00
laoshd	6e5e1c97e0	[flang][flang-rt] Implement F202X leading-zero control edit descriptors LZ, LZS, and LZP for formatted output (F, E, D, and G editing) (#183500 ) LZ: processor-dependent (default, flang prints leading zero); LZS: suppress the optional leading zero before the decimal point; LZP: print the optional leading zero before the decimal point. Changes span the source parser, compile-time format validator, runtime format processing, and runtime output formatting. Includes semantic test (io18.f90) and documentation updates.	2026-03-23 11:50:48 -04:00
Scott Manley	965ee6c91f	[FIRToMemRef] copy ACC Variable Name attribute (#187724 ) When converting from fir.alloca to memref.alloca, also copy the acc variable name attribute if it exists	2026-03-20 12:29:41 -05:00
Kareem Ergawy	acd52a2419	[flang][OpenMP][DoConcurrent] Emit declare mapper for records (#179936 ) Extends `do concurrent` device support by emitting compiler-generated declare mapper ops for live-ins whose types are record types and have allocatable members.	2026-03-11 13:43:55 +01:00
Valentin Clement (バレンタインクレメン)	f35042a639	[flang][openacc] Attach IndirectGlobalAccessModel to fir.use_stmt (#185767 ) In some cases, a `fir.use_stmt` operation can end up in an offload region, for example in an acc routine. Make sure we can validate the symbols associated with the `fir.use_stmt` operation.	2026-03-10 22:31:15 +00:00
Kareem Ergawy	0bf9bb5c42	[Flang][OpenMP] Fix close map flag propagation for derived types in USM (#185330 ) This fixes a bug in USM mode where the `close` map type modifier was attached to some `map.info.op`'s corresponding to user-defined type members while the parent type instance itself is not marked as `close`. This fix ensures that if a parent record type map does not have the 'close' flag, it is cleared from its members as well, maintaining consistency. Gemini was used to create tests. The AI-generated test code, which was derived from a reproducer I was working with to debug the issue, was reviewed line-by-line by me. Assisted-by: Gemini <gemini@google.com>	2026-03-09 15:55:53 +01:00
Susan Tan (ス-ザン　タン)	97cf8bf220	[flang] materialize fir.box when it is from a block argument (#184898 ) We have to materialize `fir.box` before adding a `fir.convert` to a memref type. Otherwise we get: `'fir.convert' op invalid type conversion'!fir.box<!fir.array<?xi32>>' / 'memref<?xi32, strided<[?], offset: ?>>'`	2026-03-06 16:02:09 -05:00
Carlos Seo	0f5e9bee83	[flang][OpenMP] Fix crash when a sliced array is specified in a forall within a workshare construct (#170913 ) This is a fix for two problems that caused a crash: 1. Thread-local variables sometimes are required to be parallelized. Added a special case to handle this in `LowerWorkshare.cpp:isSafeToParallelize`. 2. Race condition caused by a `nowait` added to the `omp.workshare` if it is the last operation in a block. This allowed multiple threads to execute the `omp.workshare` region concurrently. Since _FortranAPushValue modifies a shared stack, this concurrent access causes a crash. Disable the addition of `nowait` and rely on the implicit barrier at the end of the `omp.workshare` region. Fixes #143330	2026-03-05 09:59:20 -03:00
Razvan Lupusoru	e63e55cae8	[mlir][acc] Add ACCRecipeMaterialization pass and reduction ops (#184252 ) Pass ---- Add the `acc-recipe-materialization` pass, which materializes OpenACC privatization, firstprivate and reduction recipes by inlining their init, copy, combiner, and destroy regions into the operation for the construct. The pass runs on acc.parallel, acc.serial, acc.kernels, and acc.loop. - Firstprivate: Inserts acc.firstprivate_map so the initial value is available on the device, then clones the recipe init and copy regions into the construct and replaces uses with the materialized alloca. Optional destroy region is cloned before the region terminator. - Private: Clones the recipe init region into the construct (at region entry or at the loop op for acc.loop private). Replaces uses of the recipe result with the materialized alloca. Optional destroy region is cloned before the region terminator. - Reduction: Creates acc.reduction_init (init region inlined) and acc.reduction_combine_region (combiner region inlined). All uses of the reduction in the region are updated to the reduction init result. New operations -------------- - acc.reduction_init: Allocates and initializes a private reduction variable from a recipe. Takes the original reduction variable and reduction_operator; has a single region that must yield one value (the private storage) via acc.yield. Used by the pass to materialize acc.reduction_recipe init regions inside the compute construct. - acc.reduction_combine_region: Combines the private reduction value with the shared reduction variable. Takes the shared and private memrefs; has a single region (the recipe combiner) terminated by acc.yield with no operands. Used by the pass to materialize the reduction recipe combiner. Both ops implement RegionBranchOpInterface. acc.yield is updated to allow terminating ReductionInitOp and ReductionCombineRegionOp regions. 
Supporting changes ------------------ - OpenACCUtilsLoop: Factor cloneACCRegionInto out of the existing loop-conversion helper so the pass can clone recipe regions with optional result replacement; loop conversion now calls the shared helper. - Flang: Add ReductionInitOpFortranObjectViewModel (FortranObjectViewOpInterface) for acc.reduction_init and register it in OpenACC extensions. Tests ----- - MLIR: acc-recipe-materialization-{firstprivate,private,reduction, kernel-private,parallel}.mlir (memref dialect). - Flang: acc-recipe-materialization-{firstprivate,firstprivate-derived, private,reduction,kernel-private,parallel}.fir; firstprivate test has a second RUN with -acc-optimize-firstprivate-map. --------- Co-authored-by: Scott Manley <rscottmanley@gmail.com>	2026-03-02 17:35:22 -08:00
Yangyu Chen	7f0a343a8e	[flang] Implement -grecord-command-line for Flang (#181686 ) Enable Flang to match Clang behavior for command-line recording in DWARF producer strings when using -grecord-command-line. Signed-off-by: Yangyu Chen <cyy@cyyself.name>	2026-02-28 01:45:52 +08:00
Tim	603e5c832a	[flang][debug] Supply missing subprogram attributes (#181425 ) Add DW_AT_elemental, DW_AT_pure, and DW_AT_recursive attributes to subprograms and functions when they are specified in the source.	2026-02-20 21:23:01 +00:00
Susan Tan (ス-ザン　タン)	2b074823e4	Reapply "[flang] Lowering a ArrayCoorOp to arithmetic computations" (#182585 ) Reapplying the changes, which were wrongly reverted yesterday. This reverts commit 3c6523dcb8ebc0396f69c578285599b66e16dce7.	2026-02-20 15:33:26 -05:00
Susan Tan (ス-ザン　タン)	3c6523dcb8	Revert "[flang] Lowering a ArrayCoorOp to arithmetic computations whe… (#182365 ) This reverts commit 2bd23d3fa688d0e25c8492ceeaa251af4759d559.	2026-02-19 20:43:05 +00:00
Susan Tan (ス-ザン　タン)	2bd23d3fa6	[flang] Lowering a ArrayCoorOp to arithmetic computations when a fir memref is a block argument (#182139 ) Remove the special-case that handled `fir.array_coor` with a block-argument base by converting the element ref result (!fir.ref<i32> -> memref<i32>) and leaving fir.array_coor alive. Instead, we now always convert the base (!fir.ref<!fir.array<...>> -> memref<...>) and compute the memref indices from the fir.array_coor operands, so loads/stores become memref.load/store base[indices] and fir.array_coor can be erased when it’s only used by memory ops.	2026-02-19 11:46:17 -05:00
Abid Qadeer	deedc7bfe3	[Flang][OpenMP] Don't generate code for unreachable target regions. (#178937 ) When a target region is placed inside a constant false condition (e.g., `if (.false.)`), the dead code gets eliminated on the host side, removing the `omp.target` operation entirely. However, the device-side compilation pipeline is unaware of this elimination and attempts to generate kernel code. Since the host never created offload metadata for the eliminated target, the device-side kernel function lacks the "kernel" attribute, causing `OpenMPOpt` to fail with an assertion when it expects all outlined kernels to have this attribute. The problem can be seen with the following code: ```fortran program cele implicit none real :: V integer :: i if (.false.) then !$omp target teams distribute parallel do do i = 1, 5 V = V * 2 end do !$omp end target teams distribute parallel do end if end program ``` It currently fails with the following assertion: ``` Assertion `omp::isOpenMPKernel(*Kernel) && "Expected kernel function!"' failed. llvm/lib/Transforms/IPO/OpenMPOpt.cpp:4291 ``` This PR adds `DeleteUnreachableTargetsPass` that identifies `omp.target` operations in unreachable code blocks and removes them.	2026-02-16 09:31:42 +00:00
Philipp Rados	9914ee6ef4	[flang] Fix -debug crash from VScaleAttrPass (#180234 ) This patch splits the `vscaleRange` pass-option of the `VScaleAttrPass` into `vscaleMin` and `vscaleMax`, since a `std::pair<>` cannot be used as a cli-option and crashes when running `flang -march=rv64gcv -O3 file.f90 -mmlir -debug`. Since the options can now be set individually, I added some error checking following the semantics described in the langref https://llvm.org/docs/LangRef.html#function-attributes. I also added tests since there were none for only this pass before.	2026-02-10 11:46:06 +01:00
Slava Zakharin	1f26c39cfc	[flang] Allow fir.field_index and fir.coordinate_of speculation. (#179785 ) This change makes `fir.field_index` a Pure operation, and adds support for the `ConditionallySpeculatable` interface for `fir.coordinate_of`. The test demonstrates how this affects Flang LICM.	2026-02-05 16:22:30 -08:00
Slava Zakharin	2f97c47cc2	[flang,openacc] Fixed canMoveOutOf() for acc.loop. (#178971 ) We should check all data operands, and not exit after the first one.	2026-01-30 16:00:37 -08:00
Razvan Lupusoru	f951f6305e	[flang][acc] Add ACCOptimizeFirstprivateMap pass (#178546 ) This pass optimizes acc.firstprivate_map operations generated during OpenACC recipe materialization when acc.firstprivate is materialized into the mapping and a private allocation inside the region. The optimization applies to scalar variables of trivial types (integers, reals, logicals) as long as they are not optional. The pass hoists loads from the firstprivate variable to before the compute region, converting the firstprivate copy to a pass-by-value pattern. This eliminates the need for a runtime copy of the firstprivate variable since only its value is needed for initializing private copies.	2026-01-29 19:02:22 +00:00
Sergio Afonso	1e4b4fa1b2	[Flang][OpenMP] Minimize host ops remaining in device compilation (#137200 ) This patch updates the function filtering OpenMP pass intended to remove host functions from the MLIR module created by Flang lowering when targeting an OpenMP target device. Host functions holding target regions must be kept, so that the target regions within them can be translated for the device. The issue is that non-target operations inside these functions cannot be discarded because some of them hold information that is also relevant during target device codegen. Specifically, mapping information resides outside of `omp.target` regions. This patch updates the previous behavior where all host operations were preserved to then ignore all of those that are not actually needed by target device codegen. This, in practice, means only keeping target regions and mapping information needed by the device. Arguments for some of these remaining operations are replaced by placeholder allocations and `fir.undefined`, since they are only actually defined inside of the target regions themselves. As a result, this set of changes makes it possible to later simplify target device codegen, as it is no longer necessary to handle host operations differently to avoid issues.	2026-01-29 12:44:00 +00:00
jeanPerier	9a39c2ff75	Revert "[flang] Use outermost fir.dummy_scope for TBAA of local allocations. (#146006 ) (#177617 ) This reverts commit 90da61634a4accc9869b4e1cb1ac3736158c33e6. See https://github.com/llvm/llvm-project/pull/177615 for more context about why this patch is and can now be reverted.	2026-01-27 15:25:21 +01:00
jeanPerier	45102be5e5	[flang] emit declare for function result before call (#177615 ) This change moves the declare of result storage alloca before the call so that alias analysis can revert to linking fir.declare to the first dominating dummy_scope instead of the dominating one. This is only relevant when MLIR inlining is enabled and is the first step to fix issues: recent TBAA changes that placed target data in its own tree exposed an issue with the result storage of a TARGET result. After inlining, the usages of the result storage inside the callee and after the call ended up being placed in different nodes (target and non target) of the same TBAA tree (for the dominating function). The fact that both nodes are placed in the same tree stems from https://github.com/llvm/llvm-project/pull/146006 that fixed another TBAA issue related to MLIR inlining and function result where the function result was placed into the wrong TBAA tree, which with nested inlining could end up being the tree of a callee where the result storage was a dummy, causing the TBAA to wrongfully tell that any access to the result storage inside the nested callee did not alias with any access after the call. By moving the declare before the call that will be inlined, this patch will allow reverting #146006 and fixing both issues: the TBAA emitted for usages of the result storage after the call will always be placed in a different TBAA tree than any usages of the result storage inside the callee.	2026-01-27 15:25:07 +01:00
Kareem Ergawy	e74e970036	[flang][OpenMP][DoConcurrent] Add `collapse` clause to generated `omp.loop_nest` op (#178138 ) Adds the collapse clause to the generated loop nest both on host and device.	2026-01-27 11:58:57 +01:00
Slava Zakharin	7e66d1511d	[flang][CUF] Limit LICM for cuf.kernel. (#178073 ) This patch prevents hoisting of operations with reference operands. Such a hoisting may break the assumptions that later CUF passes rely on.	2026-01-26 15:24:39 -08:00
Slava Zakharin	dc5f905a87	[flang,openacc] Limit operations hoisting from acc.loop. (#177727 ) This patch implements `OperationMoveOpInterface::canMoveOutOf()` method for `acc.loop`, such that even Pure operations are not hoisted by LICM if any of their operands are referenced in the data operands of `acc.loop`. Related to #175108.	2026-01-26 11:48:37 -08:00
Slava Zakharin	f5e2f29cf3	[flang] Added ConditionallySpeculatable and Pure for some FIR ops. (#174013 ) This patch implements `ConditionallySpeculatable` interface for some FIR operations (`embox`, `rebox`, `box_addr`, `box_dims` and `convert`). It also adds `Pure` trait for `fir.shape`, `fir.shapeshift`, `fir.shift` and `fir.slice`. I could have split this into multiple patches, but the changes are better tested together on real apps, and the amount of affected code is small. There are more `NoMemoryEffect` operations for which I am planning to do the same in future PRs.	2026-01-23 17:42:52 -08:00
Slava Zakharin	5d91c11df5	[flang] Support cuf.device_address in FIR AliasAnalysis. (#177518 ) Support `cuf.device_address` same way as `fir.address_of`. This implementation implies that the host address and the device address `MustAlias` (as shown in the new test). This should be conservatively correct as long as `MustAlias` does not allow to assume that the actual addresses are the same (that is what LLVM documentation implies, I believe). It is probably worth adding an operation interface to handle `fir::AddrOfOp` and `cuf::DeviceAddressOp` in FIR AliasAnalysis, but for the initial implementation I hardcoded the checks. I also removed the call to `fir::valueHasFirAttribute` that performs on demand SymbolTable lookups, which may be costly, and added SymbolTable caching in FIR AliasAnalysis object. Anyway, `fir::valueHasFirAttribute` does not work for `cuf::DeviceAddressOp`.	2026-01-23 17:42:35 -08:00
agozillon	a16668a8d7	[Flang][OpenMP][MLIR] Align declare mapper pass handling with other map and global operations (#176852 ) This PR makes a couple of minor tweaks to the lowering for declare_mapper operations: 1) Add declare_mapper operations to the list of global operations to have optimisation passes executed on them. Primarily just to make sure we keep it in line with other global operations that contain regions. Prevents oddities where we embed FIR/HLFIR into the mapper that needs to be lowered before being converted to LLVM-IR. One example that springs to mind is if we ever decide to remove the single block condition on the operation to allow conditional checks for mapped data. 2) Add a CodeGenOpenMP.cpp conversion for DeclareMapperOp to make sure we convert the return type correctly from a BoxType to a struct type rather than an opaque pointer when lowering. Currently, I've left out the block argument types from being converted as they're wrapped in a fir.ref and would be opaque pointers in either case. So some minor additions to keep declare_mapper a little more in line with the rest of the OpenMP operations.	2026-01-23 22:36:39 +01:00
Kareem Ergawy	ab4f66d6f3	[OpenMP][flang] Move `todo` for checking reduction support status on the GPU (#175172 ) Moves a `todo` to check for the current level of support for by-ref reductions to the `FunctionFiltering` pass. This guarantees that the check does not trigger when the same module is compiled twice: on the CPU and on the GPU.	2026-01-21 13:22:45 +01:00
Razvan Lupusoru	8dfec25974	[mlir][acc] Add OffloadTargetVerifier pass (#176467 ) Add a verification pass that checks live-in values and symbol references within offload regions are legal for the target execution model. When code is offloaded to a device (e.g., GPU), not all values and symbols from the host context are directly accessible. Data must be explicitly mapped via OpenACC data clauses (copyin, create, present etc.), declared with device attributes, or be trivial scalars that can be passed by value. Similarly, symbol references to globals must have proper `declare` attributes or device-resident data attributes. This pass walks operations implementing `OffloadRegionOpInterface`, which includes OpenACC compute constructs (`acc.parallel`, `acc.kernels`, `acc.serial`) as well as GPU operations like `gpu.launch`. For each region, it uses liveness analysis to identify values flowing into the region and checks their validity using the `OpenACCSupport` analysis. Key features: - Validates live-in values against OpenACC data mapping requirements - Validates symbol references for device accessibility - Supports soft-check mode for diagnostic-only verification - Configurable device_type for target-specific behavior	2026-01-20 17:17:08 +00:00
Abid Qadeer	dc9c08e6e0	[flang][debug] Generate DWARF debug info using fir.use_stmt. (#168541 ) This patch uses the fir.use_stmt operations to generate correct debug metadata for use statement when `only` and `=>` are used. The debug flow is changed a bit where we process the module globals first so that we have the global variables when we start to process `fir.use_stmt`. Fixes #160923.	2026-01-19 17:16:11 +00:00
Slava Zakharin	09ae1bf8b7	[flang] Added OperationMoveOpInterface for controlling LICM. (#175108 ) In #173438 I added a FIR specific loop invariant code motion pass. During the review, Tom pointed out certain limitations about OpenMP dialect operations that should be taken into consideration during transformations such as LICM: https://github.com/llvm/llvm-project/pull/173438#discussion_r2657612148 I also found issues with hoisting operations out of `acc.loop` operations in certain conditions (see the added test in `licm.fir`). I am proposing a new operation interface that will allow controlling the movement of operations during MLIR transformations. In particular, I propose two methods (there might be more): * op.canMoveOutOf(cand) - returns true, if it is allowed to move 'cand' operation out of 'op'. * op.canMoveFromDescendant(descendant, cand) - returns true, if it is allowed to move 'cand' out of 'descendant' and into 'op'. I used the new interface to get rid of explicit OpenMP interfaces checks in Flang's LICM, and I also used it for `acc.loop` operation (though, I provided conservative initial implementation). The new interface is part of FIR dialect, but I think it would better fit into the core MLIR set of interfaces so that the checks that I make in Flang's LICM are actually done in `mlir::moveLoopInvariantCode`. Moreover, other code movement transformations that may appear in MLIR may also need to use such an interface. I would like to get some feedback on whether it is reasonable to move the interface to core MLIR.	2026-01-16 08:32:38 -08:00
Razvan Lupusoru	ab7217a089	[acc][flang] Add isDeviceData APIs for device data detection (#176219 ) Add comprehensive APIs to detect device-resident data across OpenACC type and operation interfaces. This enables passes to identify data that is already on the device (e.g., CUF device/managed/constant memory, GPU address spaces) and handle it appropriately. New interface methods: - PointerLikeType::isDeviceData(Value): Returns true if the pointer points to device data. - MappableType::isDeviceData(Value): Returns true if the variable represents device data. - GlobalVariableOpInterface::isDeviceData(): Returns true if the global variable is device data. New utilities in OpenACCUtils: - acc::isDeviceValue(Value): Checks if a value represents device data by querying type interfaces, PartialEntityAccessOpInterface for base entities, and AddressOfGlobalOpInterface for global symbols. - acc::isValidValueUse(Value, Region): Checks if a value is legal in an OpenACC region by verifying it comes from a data operation, is only used by private clauses, or is device data. Updated isValidSymbolUse to check GlobalVariableOpInterface::isDeviceData() for symbols referencing device-resident globals. FIR implementations check for CUF data attributes (device, managed, constant, shared, unified) on operations, block arguments, and globals. The implementation traces through fir.rebox, fir.embox, fir.declare, hlfir.declare, and fir.address_of to find the underlying data source. Memref implementations check for gpu::AddressSpaceAttr on the memref type. Updated ACCImplicitData to use acc::isDeviceValue for generating acc.deviceptr clauses for device-resident data instead of copyin/copyout. Updated OpenACCSupport::isValidValueUse to fallback to the new acc::isValidValueUse utility.	2026-01-15 20:56:26 +00:00
jeanPerier	b0b1ab8a40	[flang][openacc] support array section privatization in lowering (#175184 ) Add support for array sections in private, firstprivate, and reduction. Key changes: - Change the related data operation result type to return the same type as the array base (same type as the acc variable input in the operation), while it was the type of the section before. This allows remapping the base to the result value (to use the data operation result as the base when generating addressing inside the compute region). - The generatePrivateInit implementation of FIROpenACCTypeInterfaces is modified to allocate storage only for the section, and to return the mock base address (that is the address of the allocation minus the offset/lower bound of the privatized section). - The code generating the copy and combiner region is moved from OpenACC.cpp to FIROpenACCTypeInterfaces.cpp via the addition of new generateCopy and generateCombiner interfaces in the MappableTypeInterface. This allows sharing all the addressing helpers with generatePrivateInit, and will allow late generation of all recipes with Fortran. - Update generatePrivateDestroy to deallocate the beginning of the section if any. In the process, the generatePrivateInit implementation is modified so that it is more uniform to make it easier to deal with the section. This also allowed removing runtime calls when initializing the private for array reduction.	2026-01-15 09:32:13 +01:00
Susan Tan (ス-ザン　タン)	2698d15664	[flang] Lowering FIR memory ops to MemRef dialect (#173507 ) This patch introduces FIRToMemRef, a lowering pass that converts FIR memory operations to the MemRef dialect, including support for slices, shifts, and descriptor-style access patterns. To support partial lowering, where FIR and MemRef types can coexist, we extend the handling of fir.convert to correctly marshal between FIR reference-like types and MemRef descriptors. The patch also factors the type conversion logic into a reusable FIRToMemRefTypeConverter, which centralizes the rules for converting FIR types (e.g. !fir.ref, !fir.box, sequences, logicals) to their corresponding memref types, and is used throughout the new pass. --------- Co-authored-by: Scott Manley <rscottmanley@gmail.com> Co-authored-by: jeanPerier <jean.perier.polytechnique@gmail.com>	2026-01-14 10:46:50 -05:00
Eugene Epshteyn	d593bcdc54	[flang] Changes to "unsafe Cray pointers" option (#175223 ) Reserve "-funsafe-cray-pointers" (with "f") for the driver. In the fir-alias-analysis use "-unsafe-cray-pointers" (without "f"). This prevents conflicts with how certain kinds of tools use the "unsafe Cray pointers" options.	2026-01-10 19:33:30 -05:00
Kareem Ergawy	e82399dac2	[flang][OpenMP] Prevent `omp.map.info` ops with user-defined mappers from being marked as partial maps (#175133 ) The following test was triggering a runtime crash on the host before launching the kernel: ```fortran program test_omp_target_map_bug_v5 implicit none type nested_type real, allocatable :: alloc_field(:) end type nested_type type nesting_type integer :: int_field type(nested_type) :: derived_field end type nesting_type type(nesting_type) :: config allocate(config%derived_field%alloc_field(1)) !$OMP TARGET ENTER DATA MAP(TO:config, config%derived_field%alloc_field) !$OMP TARGET config%derived_field%alloc_field(1) = 1.0 !$OMP END TARGET deallocate(config%derived_field%alloc_field) end program test_omp_target_map_bug_v5 ``` In particular, the runtime was producing a segmentation fault when the test was compiled with any optimization level > 0; compiled with -O0, the sample ran fine. After debugging the runtime, it turned out the crash was happening at the point where the runtime calls the default mapper emitted by the compiler for `nesting_type`; in particular at this point in the runtime: `c62cd2877c/offload/libomptarget/omptarget.cpp (L307)`. Bisecting the optimization pipeline using `-mllvm -opt-bisect-limit=N`, the first pass that triggered the issue on `O1` was the `instcombine` pass. Debugging this further, the issue narrows down to canonicalizing `getelementptr` instructions from using struct types (in this case the `nesting_type` in the sample above) to using addressing bytes (`i8`). In particular, in `O0`, you would see something like this: ```llvm define internal void @.omp_mapper._QQFnesting_type_omp_default_mapper(ptr noundef %0, ptr noundef %1, ptr noundef %2, i64 noundef %3, i64 noundef %4, ptr noundef %5) #6 { entry: %6 = udiv exact i64 %3, 56 %7 = getelementptr %_QFTnesting_type, ptr %2, i64 %6 .... 
} ``` ```llvm define internal void @.omp_mapper._QQFnesting_type_omp_default_mapper(ptr noundef %0, ptr noundef %1, ptr noundef %2, i64 noundef %3, i64 noundef %4, ptr noundef %5) #6 { entry: %6 = getelementptr i8, ptr %2, i64 %3 .... } ``` The `udiv exact` instruction emitted by the OMP IR Builder (see: `c62cd2877c/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (L9154)`) allows `instcombine` to assume that `%3` is divisible by the struct size (here `56`) and, therefore, replaces the result of the division with direct GEP on `i8` rather than the struct type. However, the runtime was calling `@.omp_mapper._QQFnesting_type_omp_default_mapper` not with `56` (the proper struct size) but with `48`! Debugging this further, I found that the size of `omp.map.info` operation to which the default mapper is attached computes the value of `48` because we set the map to partial (see: `c62cd2877c/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp (L1146)` and `c62cd2877c/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (L4501-L4512)`). However, I think this is incorrect since the emitted mapper (and user-defined mappers in general) are defined on the whole struct type and should never be marked as partial. Hence, the fix in this PR.	2026-01-09 15:15:10 +01:00
Slava Zakharin	84cc15344f	[flang] Make fir.result Pure operation. (#173508 ) This allows speculating recursively speculatable operations containing `fir.result`. Note that making it Pure does not allow speculating `fir.result` itself from its containing operation, since it is a terminator.	2026-01-07 17:04:56 -08:00
Slava Zakharin	0bf4df8b1e	[flang] Added LoopInvariantCodeMotion pass for [HL]FIR. (#173438 ) The new pass allows hoisting some `fir.load` operations early in MLIR. For example, many descriptor loads might be hoisted out of loops, though it does not make much difference in performance, because LLVM is able to optimize such loads (which are lowered as `llvm.memcpy` into temporary descriptors), given that proper TBAA information is generated by Flang. Further hoisting improvements are possible in [HL]FIR LICM, e.g. getting proper mod-ref results for Fortran runtime calls may allow hoisting loads from global variables, which LLVM cannot do due to lack of alias information. This patch also contains improvements for FIR mod-ref analysis: We may recurse into `HasRecursiveMemoryEffects` operations and use `getModRef` recursively to get more precise results for regions with `fir.call` operations. This patch also modifies `AliasAnalysis` to set the instantiation point for cases where the tracked data is accessed through a load from `!fir.ref<!fir.box<>>`: without this change the mod-ref analysis was not able to recognize user pointer/allocatable variables.	2026-01-07 16:16:52 -08:00
Valentin Clement (バレンタインクレメン)	a0dfe45036	Reland "[flang][cuda] Add support for derived-type initialization on device #172568 " (#174107 ) The buildbot failures have been addressed in #174048	2025-12-31 11:26:17 -08:00
Slava Zakharin	fe0f366f6e	[flang] Fixed hoisting order in fir.do_concurrent simplification. (#174044 ) The order has to be fixed after #173502. This results in reversing the order of `fir.alloca`, but that should be insignificant.	2025-12-31 10:08:32 -08:00
Valentin Clement (バレンタインクレメン)	f43d683409	Revert "Reland "[flang][cuda] Add support for derived-type initialization on device #172568 " (#174033 ) This fails https://lab.llvm.org/staging/#/builders/65 This reverts commit 1ac1a547ee3b74b4d02bc94faf02ca0381196d11.	2025-12-30 15:15:44 -08:00
Slava Zakharin	91981a5736	[flang] Fixed operations hoisting out of fir.do_concurrent. (#173502 ) LICM (#173438) may insert new operations at the beginning of `fir.do_concurrent`'s block, and they cannot always be hoisted to the alloca-block of the parent operation. This patch only moves `fir.alloca`s into the alloca-block, and moves all other operations right before `fir.do_concurrent`.	2025-12-30 10:27:31 -08:00
Valentin Clement (バレンタインクレメン)	1ac1a547ee	Reland "[flang][cuda] Add support for derived-type initialization on device #172568 " (#172913 ) #172568	2025-12-30 08:49:04 -08:00

342 Commits