llvm-project

Author	SHA1	Message	Date
Kareem Ergawy	e33cd9690f	[flang][fir] Basic PFT to MLIR lowering for do concurrent locality specifiers (#138534 ) Extends support for `fir.do_concurrent` locality specifiers to the PFT to MLIR level. This adds code-gen for generating the newly added `fir.local` ops and referencing these ops from `fir.do_concurrent.loop` ops that have locality specifiers attached to them. This reuses the `DataSharingProcessor` component and generalizes it a bit more to allow for handling `omp.private` ops and `fir.local` ops as well. PR stack: - https://github.com/llvm/llvm-project/pull/137928 - https://github.com/llvm/llvm-project/pull/138505 - https://github.com/llvm/llvm-project/pull/138506 - https://github.com/llvm/llvm-project/pull/138512 - https://github.com/llvm/llvm-project/pull/138534 (this PR) - https://github.com/llvm/llvm-project/pull/138816	2025-05-29 11:04:27 +02:00
Asher Mancinelli	bbb7f01481	[flang] Fix volatile attribute propagation on allocatables (#139183 ) Ensure volatility is reflected not just on the reference to an allocatable, but on the box, too. When we declare a volatile allocatable, we now get a volatile reference to a volatile box. Some related cleanups: * SELECT TYPE constructs check the selector's type for volatility when creating and designating the type used in the selecting block. * Refine the verifier for fir.convert. In general, I think it is ok to implicitly drop volatility in any ptr-to-int conversion because it means we are in codegen (and representing volatility on the LLVM ops and intrinsics) or we are calling an external function (are there any cases I'm not thinking of?) * An allocatable test that was XFAILed is now passing. Making allocatables' boxes volatile resulted in accesses of those boxes being volatile, which resolved some errors coming from the strict verifier. * I noticed a runtime function was missing the fir.runtime attribute.	2025-05-13 08:13:47 -07:00
Zhen Wang	eef4b5a0cd	[flang] [cuda] Fix CUDA implicit data transfer entity creation (#139414 ) Fixed an issue in `genCUDAImplicitDataTransfer` where creating an `hlfir::Entity` from a symbol address could fail when the address comes from a `hlfir.declare` operation. Fix is to check if the address comes from a `hlfir.declare` operation. If so, use the base value from the declare op when available. Falling back to the original address otherwise.	2025-05-12 10:06:39 -07:00
Andre Kuhlenschmidt	4d9479fa8f	[flang][openacc] Allow open acc routines from other modules. (#136012 ) OpenACC routine annotations in separate compilation units currently get ignored, which leads to compilation errors. There are two reasons OpenACC routine information is currently ignored, and this PR addresses both. - The module file reader doesn't read back in openacc directives from module files. - Simple fix in `flang/lib/Semantics/mod-file.cpp` - The lowering to HLFIR doesn't generate routine directives for symbols imported from other modules that are openacc routines. - This is the majority of this diff, and is addressed by the changes that start in `flang/lib/Lower/CallInterface.cpp`.	2025-05-09 11:12:24 -07:00
Kareem Ergawy	227e1ff73b	[flang][fir] Add locality specifiers modeling to `fir.do_concurrent.loop` (#138506 )	2025-05-08 21:42:52 +02:00
Kareem Ergawy	2fb288d4b8	[flang][fir] Lower `do concurrent` loop nests to `fir.do_concurrent` (#137928 ) Adds support for lowering `do concurrent` nests from PFT to the new `fir.do_concurrent` MLIR op as well as its special terminator `fir.do_concurrent.loop` which models the actual loop nest. To that end, this PR emits the allocations for the iteration variables within the block of the `fir.do_concurrent` op and creates a region for the `fir.do_concurrent.loop` op that accepts arguments equal in number to the number of the input `do concurrent` iteration ranges. For example, given the following input: ```fortran do concurrent(i=1:10, j=11:20) end do ``` the changes in this PR emit the following MLIR: ```mlir fir.do_concurrent { %22 = fir.alloca i32 {bindc_name = "i"} %23:2 = hlfir.declare %22 {uniq_name = "_QFsub1Ei"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>) %24 = fir.alloca i32 {bindc_name = "j"} %25:2 = hlfir.declare %24 {uniq_name = "_QFsub1Ej"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>) fir.do_concurrent.loop (%arg1, %arg2) = (%18, %20) to (%19, %21) step (%c1, %c1_0) { %26 = fir.convert %arg1 : (index) -> i32 fir.store %26 to %23#0 : !fir.ref<i32> %27 = fir.convert %arg2 : (index) -> i32 fir.store %27 to %25#0 : !fir.ref<i32> } } ```	2025-05-07 12:52:25 +02:00
Asher Mancinelli	8836bce842	[flang] Add lowering of volatile references (#132486 ) [RFC on discourse](https://discourse.llvm.org/t/rfc-volatile-representation-in-flang/85404/1) Flang currently lacks support for volatile variables. For some cases, the compiler produces TODO error messages and others are ignored. Some of our tests are like the example from _C.4 Clause 8 notes: The VOLATILE attribute (8.5.20)_ and require volatile variables. Prior commits: ``` c9ec1bc753b0 [flang] Handle volatility in lowering and codegen (#135311) e42f8609858f [flang][nfc] Support volatility in Fir ops (#134858) b2711e1526f9 [flang][nfc] Support volatile on ref, box, and class types (#134386) ```	2025-04-30 08:46:33 -07:00
Valentin Clement (バレンタインクレメン)	46e734746d	[flang][cuda] Update stream type for cuf kernel op (#136627 ) Update the type of the stream operand to be similar to KernelLaunchOp.	2025-04-21 19:22:07 -07:00
Slava Zakharin	50db7a7d26	[flang] Fixed fir.dummy_scope generation to work for TBAA. (#136382 ) The nesting of fir.dummy_scope operations defines the roots of the TBAA forest. If we do not generate fir.dummy_scope in functions that do not have any dummy arguments, then the globals accessed in the function and the dummy arguments accessed by the callee may end up in different sub-trees of the same root. The added tbaa-with-dummy-scope2.fir demonstrates the issue.	2025-04-18 17:19:12 -07:00
Kareem Ergawy	30990c09c9	Revert "[flang][fir] Lower `do concurrent` loop nests to `fir.do_concurrent` (#132904 )" (#135904 ) This reverts commit 04b87e15e40f8857e29ade8321b8b67691545a50. The reasons for reverting are the following: 1. I still need to upstream some part of the do concurrent to OpenMP pass from our downstream implementation, and taking this in downstream will make things more difficult. 2. I still need to work on a solution for modeling locality specifiers on `hlfir.do_concurrent` ops. I would prefer to do that and merge the entire stack together instead of having a partial solution. After merging the revert I will reopen the original PR and keep it updated against main until I finish the above.	2025-04-16 07:20:27 -05:00
Kareem Ergawy	04b87e15e4	[flang][fir] Lower `do concurrent` loop nests to `fir.do_concurrent` (#132904 ) Adds support for lowering `do concurrent` nests from PFT to the new `fir.do_concurrent` MLIR op as well as its special terminator `fir.do_concurrent.loop` which models the actual loop nest. To that end, this PR emits the allocations for the iteration variables within the block of the `fir.do_concurrent` op and creates a region for the `fir.do_concurrent.loop` op that accepts arguments equal in number to the number of the input `do concurrent` iteration ranges. For example, given the following input: ```fortran do concurrent(i=1:10, j=11:20) end do ``` the changes in this PR emit the following MLIR: ```mlir fir.do_concurrent { %22 = fir.alloca i32 {bindc_name = "i"} %23:2 = hlfir.declare %22 {uniq_name = "_QFsub1Ei"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>) %24 = fir.alloca i32 {bindc_name = "j"} %25:2 = hlfir.declare %24 {uniq_name = "_QFsub1Ej"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>) fir.do_concurrent.loop (%arg1, %arg2) = (%18, %20) to (%19, %21) step (%c1, %c1_0) { %26 = fir.convert %arg1 : (index) -> i32 fir.store %26 to %23#0 : !fir.ref<i32> %27 = fir.convert %arg2 : (index) -> i32 fir.store %27 to %25#0 : !fir.ref<i32> } } ```	2025-04-16 06:14:38 +02:00
Zhen Wang	8f0d8d28cc	Delete duplicated hlfir.declare op of induction variables of do concurrent when inside cuf kernel directive. (#134467 ) Delete duplicated creation of hlfir.declare op of do concurrent induction variables when inside cuf kernel directive. Obtain the correct hlfir.declare op generated from bindSymbol, and add it to ivValues.	2025-04-06 19:31:09 -07:00
Jean-Didier PAILLEUX	c309abd925	[flang] Implement !DIR$ NOVECTOR and !DIR$ NOUNROLL[_AND_JAM] (#133885 ) Hi, This patch implements support for the following directives : - `!DIR$ NOUNROLL_AND_JAM` to disable unrolling and jamming on a DO LOOP. - `!DIR$ NOUNROLL` to disable unrolling on a DO LOOP. - `!DIR$ NOVECTOR` to disable vectorization on a DO LOOP.	2025-04-02 14:30:01 +02:00
Thirumalai Shaktivel	091dcb8fc2	[Flang] Make a private copy for the common block variables in copyin clause (#111359 ) Fixes: https://github.com/llvm/llvm-project/issues/82949	2025-04-01 11:35:44 +05:30
Michael Kruse	123eb75cd4	[Flang] Do not emit numeric_storage_size into object file (#131463 ) The value of numeric_storage_size depends on compilation options and therefore its value is not yet known when building the builtins runtime. Instead, the parameter is folding a __numeric_storage_size() expression which is loaded into the user program. For the iso_fortran_env object file, omit the symbol as it is never used. Similar tests that ensure that __numeric_storage_size() is not folded until compiling the actual user program exist in FortranEvaluate: `1e6ba3cd2f/flang/lib/Evaluate/check-expression.cpp (L487-L492)` `1e6ba3cd2f/flang/lib/Evaluate/fold-integer.cpp (L1457-L1460)` Required for using CMake to compile the builtin module files. See RFC at https://discourse.llvm.org/t/rfc-building-flangs-builtin-mod-files/84626	2025-03-21 12:32:54 +01:00
jeanPerier	3ff3b29dd6	[flang] lower remaining cases of pointer assignments inside forall (#130772 ) Implement handling of `NULL()` RHS, polymorphic pointers, as well as lower bounds or bounds remapping in pointer assignment inside FORALL. These cases eventually do not require updating hlfir.region_assign, lowering can simply prepare the new descriptor for the LHS inside the RHS region. Looking more closely at the polymorphic cases, there is no need to call the runtime, fir.rebox and fir.embox do handle the dynamic type setting correctly. After this patch, the last remaining TODO is the allocatable assignment inside FORALL, which like some cases here, is more likely an accidental feature given FORALL was deprecated in F2003 at the same time as allocatable components were added.	2025-03-14 10:51:46 +01:00
Leandro Lupori	29f5d5bea9	[flang][OpenMP] Fix privatization of procedure pointers (#130336 ) Fixes #121720	2025-03-11 09:38:40 -03:00
jeanPerier	40e245a9aa	[flang] add support for procedure pointer assignment inside FORALL (#130114 ) Very similar to object pointer assignment, the difference is the SSA types of the LHS (!fir.ref<!fir.boxproc<()->()>> and RHS (!fir.boxproc<()->()). The RHS must be saved as simple address, not descriptors (it is not possible to make CFI descriptor out of procedure entity).	2025-03-07 10:28:02 +01:00
Valentin Clement (バレンタインクレメン)	478e516140	[flang][cuda] Sync double descriptor after c_f_pointer call (#130194 ) After a global device pointer is set through `c_f_pointer`, we need to sync the double descriptor so the version on the device is also up to date.	2025-03-06 19:19:51 -08:00
Zhen Wang	d1abbb4dc5	[flang][cuda] Change induction variable from i32 to index for doconcurrent inside cuf kernel directive (#129924 ) Use `index` instead of `i32` for induction variables for doconcurrent inside cuf kernel directive. Regular do loop inside cuf kernel directive also uses `index`: ``` cuf.kernel<<<, >>> (%arg0 : index) = ... ```	2025-03-05 14:50:42 -08:00
jeanPerier	7302e1b94e	[flang] implement simple pointer assignments inside FORALL (#129522 ) The semantics of pointer assignments inside FORALL require evaluating the targets (RHS) and pointer variables (LHS) of all iterations before evaluating the assignments. In practice, if the compiler can prove that the RHS and LHS evaluations are not impacted by the assignments, the evaluation of the FORALL assignment statement can be done in a single loop. However, if the compiler cannot prove this, it needs to "save" the addresses of the targets and/or the pointer descriptors of each iteration before doing the assignments. This patch implements the most common cases where there is no lower bound spec, no bounds remapping, the LHS is not polymorphic, and the RHS is not NULL. The HLFIR operation used to represent assignments inside FORALL can be used for pointer assignments too (the only difference being that the LHS is a descriptor address). The analysis for intrinsic assignment can be reused, with the distinction that the RHS data is not read during the assignment. The logic that is used to save LHS in intrinsic assignments inside FORALL is extracted to be used for the RHS of pointer assignments when needed (saving a descriptor value). Pointer assignment LHS are just descriptor addresses and are saved as int_ptr values.	2025-03-05 11:24:04 +01:00
Valentin Clement (バレンタインクレメン)	d1fd3698a9	[flang][cuda] Allow unsupported data transfer to be done on the host (#129160 ) Some data transfer marked as unsupported can actually be deferred to an assignment on the host when the variables involved are unified or managed.	2025-03-02 16:12:01 -08:00
Zhen Wang	a67566b185	Allow do concurrent inside cuf kernel directive (#127693 ) Allow do concurrent inside cuf kernel directive to avoid the following Lowering error: ``` void {anonymous}::FirConverter::genFIR(const Fortran::parser::CUFKernelDoConstruct&): Assertion `bounds && "Expected bounds on the loop construct"' failed. ``` --------- Co-authored-by: Valentin Clement (バレンタインクレメン) <clementval@gmail.com>	2025-02-20 14:05:44 -08:00
Jean-Didier PAILLEUX	d6c6bde9db	[flang] Implement !DIR$ UNROLL_AND_JAM [N] (#125046 ) This patch implements support for the UNROLL_AND_JAM directive to enable or disable unrolling and jamming on a `DO LOOP`. It must be placed immediately before a `DO LOOP` and applies only to the loop that follows. N is an integer specifying the unrolling factor. This is done by adding an attribute to the branch into the loop in LLVM to indicate that the loop should be unrolled and jammed.	2025-02-19 15:00:09 +00:00
Akash Banerjee	9905728e2f	[MLIR][OpenMP] Add Lowering support for OpenMP Declare Mapper directive (#117046 ) This patch adds HLFIR/FIR lowering support for OpenMP Declare Mapper directive. Depends on #117045.	2025-02-18 16:36:01 +00:00
Asher Mancinelli	6b52fb25b9	[flang] Correctly handle `!dir$ unroll` with unrolling factors of 0 and 1 (#126170 ) https://github.com/llvm/llvm-project/pull/123331 added support for the unrolling directive. In the presence of an explicit unrolling factor, that unrolling factor would be unconditionally passed into the metadata even when it was 1 or 0. These special cases should instead disable unrolling. Adding an explicit unrolling factor of 0 triggered this assertion which is fixed by this patch: ``` unsigned int unrollCountPragmaValue(const llvm::Loop*): Assertion `Count >= 1 && "Unroll count must be positive."' failed. ``` Updated tests and documentation.	2025-02-10 08:21:22 -08:00
Michael Kruse	b815a3942a	[Flang] Move non-common headers to FortranSupport (#124416 ) Move non-common files from FortranCommon to FortranSupport (analogous to LLVMSupport) such that * declarations and definitions that are only used by the Flang compiler, but not by the runtime, are moved to FortranSupport * declarations and definitions that are used by both ("common"), the compiler and the runtime, remain in FortranCommon * generic STL-like/ADT/utility classes and algorithms remain in FortranCommon This allows for a cleaner separation between compiler and runtime components, which are compiled differently. For instance, runtime sources must not use STL's `<optional>` which causes problems with CUDA support. Instead, the surrogate header `flang/Common/optional.h` must be used. This PR fixes this for `fast-int-sel.h`. Declarations in include/Runtime are also used by both, but are header-only. `ISO_Fortran_binding_wrapper.h`, a header used by compiler and runtime, is also moved into FortranCommon.	2025-02-06 15:29:10 +01:00
Jean-Didier PAILLEUX	e811cb00e5	[flang] Implement !DIR$ UNROLL [N] (#123331 ) This patch implements support for the UNROLL directive to control how many times a loop should be unrolled. It must be placed immediately before a `DO LOOP` and applies only to the loop that follows. N is an integer specifying the unrolling factor. This is done by adding an attribute to the branch into the loop in LLVM to indicate that the loop should be unrolled. The code pushed to support the directive `VECTOR ALWAYS` has been modified to take account of the fact that several directives can be used before a `DO LOOP`.	2025-01-29 09:44:09 +01:00
Valentin Clement (バレンタインクレメン)	654b76321a	[flang][cuda] Allow to set the stack limit size (#124859 ) This patch adds a call to the CUFInit function just after `ProgramStart` when CUDA Fortran is enabled to initialize the CUDA context. This allows us to set up some context information like the stack limit that can be defined by an environment variable `ACC_OFFLOAD_STACKSIZE=<value>`.	2025-01-28 20:57:33 -08:00
Kaviya Rajendiran	daa18205c6	[Flang][OpenMP] Fix copyin allocatable lowering to MLIR (#122097 ) Fixes https://github.com/llvm/llvm-project/issues/113191 Issue: [flang][OpenMP] Runtime segfault when an allocatable variable is used with copyin Root cause: The value of the threadprivate variable is not being copied from the primary thread to the other threads within a parallel region. As a result, it tries to access a null pointer inside a parallel region, which causes a segfault. Fix: When allocatables are used with the copyin clause, ensure that, on entry to any parallel region, each thread’s copy of a variable acquires the allocation status of the primary thread before the value of the primary thread's threadprivate variable is copied to the threadprivate variable of each other member of the team.	2025-01-23 11:14:00 +05:30
Kareem Ergawy	a0406ce823	[flang][OpenMP] Add `hostIsSource` paramemter to `copyHostAssociateVar` (#123162 ) This fixes a bug when the same variable is used in `firstprivate` and `lastprivate` clauses on the same construct. The issue boils down to the fact that `copyHostAssociateVar` was deciding the direction of the copy assignment (i.e. the `lhs` and `rhs`) based on whether the `copyAssignIP` parameter is set. This is not the best way to do it since it is not related to whether we are doing a copy from host to localized copy or the other way around. When we set the insertion for `firstprivate` in delayed privatization, this resulted in switching the direction of the copy assignment. Instead, this PR adds a new parameter to explicitly tell the function the direction of the assignment. This is a follow up PR for https://github.com/llvm/llvm-project/pull/122471, only the latest commit is relevant.	2025-01-16 19:10:12 +01:00
jeanPerier	d82d53b2e3	[flang][openmp] initialize allocatable components of firstprivate copies (#121808 ) Descriptors of allocatable components of firstprivate derived type copies need to be set up. Otherwise the program later dies when manipulating them inside an OpenMP region.	2025-01-07 10:04:27 +01:00
Valentin Clement (バレンタインクレメン)	9165848c82	[flang][cuda] Sync global descriptor when nullifying pointer (#121595 )	2025-01-03 14:37:14 -08:00
Matthias Springer	c870632ef6	[flang] Fix some memory leaks (#121050 ) This commit fixes some but not all memory leaks in Flang. There are still 91 tests that fail with ASAN. - Use `mlir::OwningOpRef` instead of `std::unique_ptr`. The latter does not free allocations of nested blocks. - Pass `ModuleOp` as value instead of reference. - Add a few missing deallocations in test cases and other places.	2024-12-25 09:42:03 +01:00
Leandro Lupori	1fcb6a9754	[flang][OpenMP] Initialize allocatable members of derived types (#120295 ) Allocatable members of privatized derived types must be allocated, with the same bounds as the original object, whenever that member is also allocated in it, but Flang was not performing such initialization. The `Initialize` runtime function can't perform this task unless its signature is changed to receive an additional parameter, the original object, that is needed to find out which allocatable members, with their bounds, must also be allocated in the clone. As `Initialize` is used not only for privatization, sometimes this other object won't even exist, so this new parameter would need to be optional. Because of this, it seemed better to add a new runtime function: `InitializeClone`. To avoid unnecessary calls, lowering inserts a call to it only for privatized items that are derived types with allocatable members. Fixes https://github.com/llvm/llvm-project/issues/114888 Fixes https://github.com/llvm/llvm-project/issues/114889	2024-12-19 17:26:50 -03:00
Peter Klausler	fc97d2e68b	[flang] Add UNSIGNED (#113504 ) Implement the UNSIGNED extension type and operations under control of a language feature flag (-funsigned). This is nearly identical to the UNSIGNED feature that has been available in Sun Fortran for years, and now implemented in GNU Fortran for gfortran 15, and proposed for ISO standardization in J3/24-116.txt. See the new documentation for details; but in short, this is C's unsigned type, with guaranteed modular arithmetic for +, -, and *, and the related transformational intrinsic functions SUM & al.	2024-12-18 07:02:37 -08:00
Kareem Ergawy	e532241b02	Re-apply (#117867 ): [flang][OpenMP] Implicitly map allocatable record fields (#120374 ) This re-applies #117867 with a small fix that hopefully prevents build bot failures. The fix is avoiding `dyn_cast` for the result of `getOperation()`. Instead we can assign the result to `mlir::ModuleOp` directly since the type of the operation is known statically (`OpT` in `OperationPass`).	2024-12-18 09:19:45 +01:00
Kareem Ergawy	dc936f3c19	Revert "[flang][OpenMP] Implicitly map allocatable record fields (#117867 )" (#120360 )	2024-12-18 06:52:24 +01:00
Kareem Ergawy	db09014a07	[flang][OpenMP] Implicitly map allocatable record fields (#117867 ) This is a starting PR to implicitly map allocatable record fields. This PR contains the following changes: 1. Re-purposes some of the utils used in `Lower/OpenMP.cpp` so that these utils work on the `mlir::Value` level rather than the `semantics::Symbol` level. This takes one step towards enabling MLIR passes to more easily do some lowering themselves (e.g. creating `omp.map.bounds` ops for implicitly captured data like this PR does). 2. Adds support for implicitly capturing and mapping allocatable fields in record types. There is quite some distance still to cover to have full support for this. I added a number of todos to guide further development. Co-authored-by: Andrew Gozillon <andrew.gozillon@amd.com>	2024-12-18 05:37:58 +01:00
Slava Zakharin	9d33874936	[flang] Support -f[no-]realloc-lhs. (#120165 ) -frealloc-lhs is the default. If -fno-realloc-lhs is specified, then an allocatable on the left side of an intrinsic assignment is not implicitly (re)allocated to conform with the right hand side. Fortran runtime will issue an error if there is a mismatch in shape/type/allocation-status.	2024-12-17 09:06:05 -08:00
Valentin Clement (バレンタインクレメン)	0469bb91aa	[flang][cuda] Fix lowering when step is a variable (#119421 ) Add missing conversion.	2024-12-10 09:48:15 -08:00
Yusuke MINATO	a88677edc0	Reland "[flang] Integrate the option -flang-experimental-integer-overflow into -fno-wrapv" (#118933 ) This relands #110063. The performance issue on 503.bwaves_r is found not to be related to the patch, and is resolved by fbd89bcc when LTO is enabled.	2024-12-10 16:26:53 +09:00
Michael Kruse	c91ba04328	[Flang][NFC] Split runtime headers in preparation for cross-compilation. (#112188 ) Split some headers into headers for public and private declarations in preparation for #110217. Moving the runtime-private headers in runtime-private include directory will occur in #110298. * Do not use `sizeof(Descriptor)` in the compiler. The size of the descriptor is target-dependent while `sizeof(Descriptor)` is the size of the Descriptor for the host platform which might be too small when cross-compiling to a different platform. Another problem is that the emitted assembly ((cross-)compiling to the same target) is not identical between Flang's running on different systems. Moving the declaration of `class Descriptor` out of the included header will also reduce the amount of #included sources. * Do not use `sizeof(ArrayConstructorVector)` and `alignof(ArrayConstructorVector)` in the compiler. Same reason as with `Descriptor`. * Compute the descriptor's extra flags without instantiating a Descriptor. `Fortran::runtime::Descriptor` is defined in the runtime source, but not the compiler source. * Move `InquiryKeywordHashDecode` into runtime-private header. The function is defined in the runtime sources and trying to call it in the compiler would lead to a link-error. * Move allocator-kind magic numbers into common header. They are the only declarations out of `allocator-registry.h` in the compiler as well. This does not make Flang cross-compile ready yet, the main goal is to avoid transitive header dependencies from Flang to clang-rt. There are more assumptions that host platform is the same as the target platform.	2024-12-06 15:29:00 +01:00
jeanPerier	ff78cd5f3d	[flang] fix private pointers and default initialized variables (#118494 ) Both OpenMP privatization and DO CONCURRENT LOCAL lowering were incorrect for pointers and derived types with default initialization. For pointers, the descriptor was not established with the rank/type code/element size, leading to undefined behavior if any inquiry was made to it prior to a pointer assignment (and if/when using the runtime for pointer assignments, the descriptor must have been established). For derived types with default initialization, the copies were not default initialized.	2024-12-05 14:09:48 +01:00
vdonaldson	6003be7ef1	[flang] IEEE_GET_UNDERFLOW_MODE, IEEE_SET_UNDERFLOW_MODE (#118551 ) Implement IEEE_GET_UNDERFLOW_MODE and IEEE_SET_UNDERFLOW_MODE. Update IEEE_SUPPORT_UNDERFLOW_CONTROL to enable support for individual REAL kinds.	2024-12-04 16:21:11 -05:00
Yusuke MINATO	e573c6b67e	[flang] Add nsw to DO loop parameters (#113854 ) nsw is added to DO loop parameters (initial parameters, terminal parameters, and incrementation parameters). This can help vectorization in some cases like #110609. See also the discussion in https://discourse.llvm.org/t/rfc-add-nsw-flags-to-arithmetic-integer-operations-using-the-option-fno-wrapv/77584/20.	2024-11-28 08:58:09 +09:00
Valentin Clement (バレンタインクレメン)	3433e4140d	[flang][cuda] Detect constant on the rhs of data transfer (#117806 ) When the rhs expression has some constants and a device symbol, an implicit data transfer needs to be generated for the device symbol and the computation with the constant is done on the host.	2024-11-26 17:04:00 -08:00
jeanPerier	bb8bf858e8	[flang] add internal_assoc flag to mark variable captured in internal procedure (#117161 ) This patch adds a flag to mark hlfir.declare of host variables that are captured in some internal procedure. It enables implementing a simple fir.call handling in fir::AliasAnalysis::getModRef leveraging Fortran language specifications and without a data flow analysis. This will allow implementing an optimization for "array = array_function()" where array storage is passed directly into the hidden result argument to "array_function" when it can be proven that array_function does not reference "array". Captured host variables are very tricky because they may be accessed indirectly in any calls if the internal procedure address was captured via some global procedure pointer. Without flagging them, there is no way around doing a complex inter procedural data flow analysis: - checking that the call is not made to an internal procedure is not enough because of the possibility of indirect calls made to internal procedures inside the callee. - checking that the current func.func has no internal procedure is not enough because this would be invalid with inlining when a procedure with internal procedures is inlined inside a procedure without internal procedures.	2024-11-26 09:21:13 +01:00
khaki3	ff7fca7fa8	[flang][cuda] Support memory cleanup at a return statement (#116304 ) We generate `cuf.free` and `func.return` twice if a return statement exists at the end of program. ```f90 program test integer, device :: a(10) return end ``` ``` % flang -x cuda test.cuf -mmlir --mlir-print-ir-after-all error: loc("/path/to/test.cuf":3:3): 'func.return' op must be the last operation in the parent block // -----// IR Dump After Fortran::lower::VerifierPass Failed () //----- // ``` Dumped IR: ```mlir "func.func"() <{function_type = () -> (), sym_name = "_QQmain"}> ({ ... "cuf.free"(%5#1) <{data_attr = #cuf.cuda<device>}> : (!fir.ref<!fir.array<10xi32>>) -> () "func.return"() : () -> () "cuf.free"(%5#1) <{data_attr = #cuf.cuda<device>}> : (!fir.ref<!fir.array<10xi32>>) -> () "func.return"() : () -> () } ... ``` The routine `genExitRoutine` in `Bridge.cpp` is guarded by `blockIsUnterminated()` to make sure that `func.return` is generated only at the end of a block. However, we redundantly run `bridge.fctCtx().finalizeAndKeep()` before `genExitRoutine` in this case, resulting in two pairs of `cuf.free` and `func.return`. This PR fixes `Bridge.cpp` by using `blockIsUnterminated()` to guard `finalizeAndKeep` as well.	2024-11-15 08:44:42 -08:00
Valentin Clement (バレンタインクレメン)	37143fe27e	[flang][cuda] Make launch configuration optional for cuf kernel (#115947 )	2024-11-12 16:49:44 -08:00

441 Commits