llvm-project

Author	SHA1	Message	Date
Jean-Didier PAILLEUX	c1779f33bd	[flang] Implement !DIR$ [NO]INLINE and FORCEINLINE directives (#134350 ) This patch adds the support of these two directives : `!dir$ inline` and `!dir$ noinline`. - `!dir$ noinline` tells to the compiler to not perform inlining on specific function calls by adding the `noinline` metadata on the call. - `!dir$ inline` tells to the compiler to attempt inlining on specific function calls by adding the `inlinehint` metadata on the call. - `!dir$ forceinline` tells to the compiler to always perfom inlining on specific function calls by adding the `alwaysinline` metadata on the call. Currently, these directives can be placed before a `DO LOOP`, call functions or assignments. Maybe other statements can be added in the future if needed. For the `inline` directive the correct name might be `forceinline` but I'm not sure ?	2025-10-28 08:02:15 +01:00
Jakub Kuderski	23ead47655	[flang][mlir] Migrate to free create functions. NFC. (#164657 ) See https://discourse.llvm.org/t/psa-opty-create-now-with-100-more-tab-complete/87339. I plan to mark these as deprecated in https://github.com/llvm/llvm-project/pull/164649.	2025-10-22 12:47:48 -04:00
jeanPerier	c9fb37c75f	[flang][FIR] add fir.assumed_size_extent to abstract assumed-size extent encoding (#164452 ) The purpose of this patch is to allow converting FIR array representation to memref when possible without hitting memref verifier issue. The issue was that FIR arrays may be assumed size, in which case the last dimension will not be known at runtime. Flang uses -1 to encode this to fulfill Fortran 2023 standard requirements in 18.5.3 point 5 about CFI_desc_t. When arrays are converted to memeref, if this `-1` reaches memeref operations, it triggers verifier errors (even if the conversion happened in code that guards the code to be entered at runtime if the array is assumed-size because folders/verifiers do not take into account reachability). This follows-up on discussions in #163505 merge requests	2025-10-22 11:46:18 +02:00
Valentin Clement (バレンタインクレメン)	7b10e977f8	[flang][cuda] Do not fail if global is not found (#163445 )	2025-10-14 20:51:42 +00:00
Valentin Clement (バレンタインクレメン)	6a7754f2ac	[flang][cuda] Set address space for constant variables (#163430 ) Set the correct address space for constant variables. Address of operation will introduce an address cast.	2025-10-14 19:16:26 +00:00
Valentin Clement (バレンタインクレメン)	1c8cd1ed97	[flang][cuda] Add a TODO for code generation of CONSTANT variable (#163268 )	2025-10-13 21:52:31 +00:00
Alexey Bataev	0ca23a3054	[Flang]Fix the build with the EXPENSIVE_CHECKS enabled (#162541 )	2025-10-08 16:35:23 -04:00
Valentin Clement (バレンタインクレメン)	e47a42e5b5	[flang][cuda] Do not use managed memory inside gpu module (#160730 ) Do not issue call to _FortranACUFAllocDescriptor inside gpu module.	2025-09-25 16:51:49 +00:00
Valentin Clement (バレンタインクレメン)	37de695cb1	[flang][cuda] Make sure global device descriptor is allocated in managed memory (#160596 ) When the descriptor of a global device variable is re-materialized to be passed to a kernel, make sure it is allocated in managed memory otherwise the kernel launch will fail.	2025-09-24 20:32:59 +00:00
agozillon	046d6a3998	[Flang][OpenMP] Additional global address space modifications for device (#119585 ) A prior PR added a portion of the global address space modifications required for declare target to, this PR seeks to add a small amount more leftover from that PR. The intent is to allow for more correct IR that the backends (in particular AMDGPU) can treat more aptly for optimisations and code correctness 1/3 required PRs to enable declare target to mapping, should look at PR 3/3 to check for full green passes (this one will fail a number due to some dependencies). Co-authored-by: Raghu Maddhipatla raghu.maddhipatla@amd.com	2025-09-17 03:27:03 +02:00
Fabian Mora	48babe1931	[mlir][LLVM] Add LLVMAddrSpaceAttrInterface and NVVMMemorySpaceAttr (#157339 ) This patch introduces the `LLVMAddrSpaceAttrInterface` for defining compatible LLVM address space attributes To test this interface, this patch also adds: - Adds NVVMMemorySpaceAttr implementing both LLVMAddrSpaceAttrInterface and MemorySpaceAttrInterface - Converts NVVM memory space constants from enum to MLIR enums - Updates all NVVM memory space references to use new attribute system - Adds support for NVVM memory spaces in ptr dialect translation Example: ```mlir llvm.func @nvvm_ptr_address_space( !ptr.ptr<#nvvm.memory_space<global>>, !ptr.ptr<#nvvm.memory_space<shared>>, !ptr.ptr<#nvvm.memory_space<constant>>, !ptr.ptr<#nvvm.memory_space<local>>, !ptr.ptr<#nvvm.memory_space<tensor>>, !ptr.ptr<#nvvm.memory_space<shared_cluster>> ) -> !ptr.ptr<#nvvm.memory_space<generic>> ``` Translating the above code to LLVM produces: ```llvm declare ptr @nvvm_ptr_address_space(ptr addrspace(1), ptr addrspace(3), ptr addrspace(4), ptr addrspace(5), ptr addrspace(6), ptr addrspace(7)) ``` To convert the memory space enum to the new enum class use: ```bash grep -r . -e "NVVMMemorySpace::kGenericMemorySpace" -l \| xargs sed -i -e "s/NVVMMemorySpace::kGenericMemorySpace/NVVMMemorySpace::Generic/g" grep -r . -e "NVVMMemorySpace::kGlobalMemorySpace" -l \| xargs sed -i -e "s/NVVMMemorySpace::kGlobalMemorySpace/NVVMMemorySpace::Global/g" grep -r . -e "NVVMMemorySpace::kSharedMemorySpace" -l \| xargs sed -i -e "s/NVVMMemorySpace::kSharedMemorySpace/NVVMMemorySpace::Shared/g" grep -r . -e "NVVMMemorySpace::kConstantMemorySpace" -l \| xargs sed -i -e "s/NVVMMemorySpace::kConstantMemorySpace/NVVMMemorySpace::Constant/g" grep -r . -e "NVVMMemorySpace::kLocalMemorySpace" -l \| xargs sed -i -e "s/NVVMMemorySpace::kLocalMemorySpace/NVVMMemorySpace::Local/g" grep -r . -e "NVVMMemorySpace::kTensorMemorySpace" -l \| xargs sed -i -e "s/NVVMMemorySpace::kTensorMemorySpace/NVVMMemorySpace::Tensor/g" grep -r . -e "NVVMMemorySpace::kSharedClusterMemorySpace" -l \| xargs sed -i -e "s/NVVMMemorySpace::kSharedClusterMemorySpace/NVVMMemorySpace::SharedCluster/g" ``` NOTE: A future patch will add support for ROCDL, it wasn't added here to keep the patch small.	2025-09-14 09:05:28 -04:00
jeanPerier	355dbbc37c	[flang][FIR] enable fir.box_addr codegen inside fir.global (#157120 ) FIR lowering of the fir.box type inside fir.global is special (it is an actual descriptor struct value instead of being a descriptor in memory) and causes builtin.unrealized_conversion_cast to be inserted under the hood by MLIR dialect conversion framework after each operation producing a fir.box is translated. These builtin.unrealized_conversion_cast must be removed before the code generation of operation of using the fir.box in order to get the right "by value" code generation required in global initial value definitions.	2025-09-08 10:15:22 +02:00
Chaitanya	4a3bf27c69	[OpenMP] Introduce omp.target_allocmem and omp.target_freemem omp dialect ops. (#145464 ) This PR introduces two new ops in omp dialect, omp.target_allocmem and omp.target_freemem. omp.target_allocmem: Allocates heap memory on device. Will be lowered to omp_target_alloc call in llvm. omp.target_freemem: Deallocates heap memory on device. Will be lowered to omp+target_free call in llvm. Example: %1 = omp.target_allocmem %device : i32, i64 omp.target_freemem %device, %1 : i32, i64 The work in this PR is C-P/inspired from @ivanradanov commit from coexecute implementation: [Add fir omp target alloc and free ops](`be860ac8ba`) [Lower omp_target_{alloc,free} to llvm](`6e2d584dc9`)	2025-08-18 18:15:11 +05:30
Slava Zakharin	b8e4232bd2	[flang] Cast fir.select[_rank] selector to i64. (#153239 ) Properly cast the selector to `i64` regardless of its integer type. We used to generate llvm.trunc always. We have to use `i64` as long as the case values may exceed INT_MAX. Fixes #153050.	2025-08-12 16:43:44 -07:00
Valentin Clement (バレンタインクレメン)	3847620ba9	[flang][NFC] Move the rest of ops creation to new APIs (#152079 )	2025-08-05 07:27:43 -07:00
Maksim Levental	dcfc853c51	[mlir][NFC] update `flang/lib` create APIs (12/n) (#149914 ) See https://github.com/llvm/llvm-project/pull/147168 for more info.	2025-07-24 19:05:40 -04:00
Diego Caballero	c99c213e72	[mlir][Flang][NFC] Replace use of `vector.insertelement/extractelement` (#143272 ) This PR is part of the last step to remove `vector.extractelement` and `vector.insertelement` ops (RFC: https://discourse.llvm.org/t/rfc-psa-remove-vector-extractelement-and-vector-insertelement-ops-in-favor-of-vector-extract-and-vector-insert-ops). It replaces `vector.insertelement` and `vector.extractelement` with `vector.insert` and `vector.extract` in Flang. It looks like no lit tests are impacted?	2025-07-18 14:43:03 -07:00
Kelvin Li	df56b1a2cf	[flang] handle allocation of zero-sized objects (#149165 ) This PR handles the allocation of zero-sized objects for different implementations. One byte is allocated for the zero-sized objects.	2025-07-17 23:52:48 -04:00
Akash Banerjee	fc114e4d93	[MLIR] Add ComplexTOROCDLLibraryCalls pass (#144926 )	2025-07-16 13:59:41 +01:00
Tom Eccles	9a805ba169	[flang][NFC] Fix deprecation warning (#147932 ) I started getting deprecation warnings from operations constructors which seem to be doing implicit construction of mlir::ValueRange from a std::nullopt by relying on implicit conversion from std::nullopt into llvm::ArrayRef. ArrayRef{std::nullopt} is what has been deprecated.	2025-07-11 10:37:34 +01:00
Kareem Ergawy	eba35cc1c0	[flang][do concurrent] Re-model `reduce` to match reductions are modelled in OpenMP and OpenACC (#145837 ) This PR proposes re-modelling `reduce` specifiers to match OpenMP and OpenACC. In particular, this PR includes the following: * A new `fir` op: `fir.delcare_reduction` which is identical to OpenMP's `omp.declare_reduction` op. * Updating the `reduce` clause on `fir.do_concurrent.loop` to use the new op. * Re-uses the `ReductionProcessor` component to emit reductions for `do concurrent` just like we do for OpenMP. To do this, the `ReductionProcessor` had to be refactored to be more generalized. * Upates mapping `do concurrent` to `fir.loop ... unordered` nests using the new reduction model. Unfortunately, this is a big PR that would be difficult to divide up in smaller parts because the bottom of the changes are the `fir` table-gen changes to `do concurrent`. However, doing these MLIR changes cascades to the other parts that have to be modified to not break things. This PR goes in the same direction we went for `private/local` speicifiers. Now the `do concurrent` and OpenMP (and OpenACC) dialects are modelled in essentially the same way which makes mapping between them more trivial, hopefully. PR stack: - https://github.com/llvm/llvm-project/pull/145837 (this one) - https://github.com/llvm/llvm-project/pull/146025 - https://github.com/llvm/llvm-project/pull/146028 - https://github.com/llvm/llvm-project/pull/146033	2025-07-11 06:39:30 +02:00
Daniel Chen	13ead00049	[Flang] Fix PowerPC build failure due to the deprecation of ArrayRef(std::nullopt_t) {}. (#147816 ) Our local Flang build on PowerPC was broken as ``` llvm/flang/../mlir/include/mlir/IR/ValueRange.h:401:20: error: 'ArrayRef' is deprecated: Use {} or ArrayRef<T>() instead [-Werror,-Wdeprecated-declarations] 401 \| : ValueRange(ArrayRef<Value>(std::forward<Arg>(arg))) {} \| ^ llvm/flang/lib/Optimizer/CodeGen/CodeGen.cpp:2243:53: note: in instantiation of function template specialization 'mlir::ValueRange::ValueRange<const std::nullopt_t &, void>' requested here 2243 \| /cstInteriorIndices=/std::nullopt, fieldIndices, \| ^ llvm/include/llvm/ADT/ArrayRef.h:70:18: note: 'ArrayRef' has been explicitly marked deprecated here 70 \| /implicit/ LLVM_DEPRECATED("Use {} or ArrayRef<T>() instead", "{}") \| ^ llvm/include/llvm/Support/Compiler.h:244:50: note: expanded from macro 'LLVM_DEPRECATED' 244 \| #define LLVM_DEPRECATED(MSG, FIX) __attribute__((deprecated(MSG, FIX))) \| ^ 1 error generated. ``` This patch is to fix it.	2025-07-10 09:53:03 -04:00
Shunsuke Watanabe	c9900015a9	[flang] Add -fcomplex-arithmetic= option and select complex division algorithm (#146641 ) This patch adds an option to select the method for computing complex number division. It uses `LoweringOptions` to determine whether to lower complex division to a runtime function call or to MLIR's `complex.div`, and `CodeGenOptions` to select the computation algorithm for `complex.div`. The available option values and their corresponding algorithms are as follows: - `full`: Lower to a runtime function call. (Default behavior) - `improved`: Lower to `complex.div` and expand to Smith's algorithm. - `basic`: Lower to `complex.div` and expand to the algebraic algorithm. See also the discussion in the following discourse post: https://discourse.llvm.org/t/optimization-of-complex-number-division/83468 --------- Co-authored-by: Tarun Prabhu <tarunprabhu@gmail.com>	2025-07-09 13:43:54 +09:00
Kareem Ergawy	b1774222c7	[flang] Emit `fir.global` in the global address space (#146653 ) Instead of emitting globals in the program/default address space, emit them in the global address space. This also requires changes how address of code-gen is handled, we need to cast to the default address space to prevent code-gen issues.	2025-07-02 17:15:22 +02:00
jeanPerier	faefe7cf7d	[flang] add option to generate runtime type info as external (#146071 ) Reland #145901 with a fix for shared library builds. So far flang generates runtime derived type info global definitions (as opposed to declarations) for all the types used in the current compilation unit even when the derived types are defined in other compilation units. It is using linkonce_odr to achieve derived type descriptor address "uniqueness" aspect needed to match two derived type inside the runtime. This comes at a big compile time cost because of all the extra globals and their definitions in apps with many and complex derived types. This patch adds and experimental option to only generate the rtti definition for the types defined in the current compilation unit and to only generate external declaration for the derived type descriptor object of types defined elsewhere. Note that objects compiled with this option are not compatible with object files compiled without because files compiled without it may drop the rtti for type they defined if it is not used in the compilation unit because of the linkonce_odr aspect. I am adding the option so that we can better measure the extra cost of the current approach on apps and allow speeding up some compilation where devirtualization does not matter (and the build config links to all module file object anyway).	2025-06-30 09:58:00 +02:00
jeanPerier	37e2d10499	Revert "[flang] add option to generate runtime type info as external" (#146064 ) Reverts llvm/llvm-project#145901 Broke shared library builds because of the usage of `skipExternalRttiDefinition` in Lowering.	2025-06-27 14:05:59 +02:00
jeanPerier	e816817bbb	[flang] add option to generate runtime type info as external (#145901 ) So far flang generates runtime derived type info global definitions (as opposed to declarations) for all the types used in the current compilation unit even when the derived types are defined in other compilation units. It is using linkonce_odr to achieve derived type descriptor address "uniqueness" aspect needed to match two derived type inside the runtime. This comes at a big compile time cost because of all the extra globals and their definitions in apps with many and complex derived types. This patch adds and experimental option to only generate the rtti definition for the types defined in the current compilation unit and to only generate external declaration for the derived type descriptor object of types defined elsewhere. Note that objects compiled with this option are not compatible with object files compiled without because files compiled without it may drop the rtti for type they defined if it is not used in the compilation unit because of the linkonce_odr aspect. I am adding the option so that we can better measure the extra cost of the current approach on apps and allow speeding up some compilation where devirtualization does not matter (and the build config links to all module file object anyway).	2025-06-27 13:00:29 +02:00
Kareem Ergawy	282e471018	[flang] Erase `fir.local` ops before lowering `fir` to `llvm` (#143687 ) `fir.local` ops are not supposed to have any uses at this point (i.e. during lowering to LLVM). In case of serialization, the `fir.do_concurrent` users are expected to have been lowered to `fir.do_loop` nests. In case of parallelization, the `fir.do_concurrent` users are expected to have been lowered to the target parallel model (e.g. OpenMP). This hopefully resolved a build issue introduced by https://github.com/llvm/llvm-project/pull/142567 (see for example: https://lab.llvm.org/buildbot/#/builders/199/builds/4009).	2025-06-12 05:58:55 +02:00
Pranav Bhandarkar	8395912895	[Flang] - Handle `BoxCharType` in `fir.box_offset` op (#141713 ) To map `fir.boxchar` types reliably onto an offload target, such as a GPU, the `omp.map.info` operation is used to map the underlying data pointer (`fir.ref<fir.char<k, ?>>`) wrapped by the `fir.boxchar` MLIR value. The `omp.map.info` operation needs a pointer to the underlying data pointer. Given a reference to a descriptor (`fir.box`), the `fir.box_offset` is used to obtain the address of the underlying data pointer. This PR extends `fir.box_offset` to provide the same functionality for `fir.boxchar` as well.	2025-06-06 10:48:07 -05:00
Valentin Clement (バレンタインクレメン)	6811a3bedf	[flang][cuda] Allocate extra descriptor in managed memory when it is coming from device (#140818 )	2025-05-20 18:55:13 -07:00
jeanPerier	ed07412888	[flang] translate derived type array init to attribute if possible (#140268 ) This patch relies on #140235 and #139724 to speed-up compilations of files with derived type array global with initial value. Currently, such derived type global init was lowered to an llvm.mlir.insertvalue chain in the LLVM IR dialect because there was no way to represent such value via attributes. This chain was later folded in LLVM dialect to LLVM IR using LLVM IR (not dialect) folding. This insert chain generation and folding is very expensive for big arrays. For instance, this patch brings down the compilation of FM_lib fmsave.f95 from 50s to 0.5s.	2025-05-20 16:11:27 +02:00
jeanPerier	416b7dfaa0	[flang] use DataLayout instead of GEP to compute element size (#140235 ) Now that the datalayout is part of codegen, use that to generate type size constants in codegen instead of generating GEP.	2025-05-19 13:59:09 +02:00
Asher Mancinelli	8836bce842	[flang] Add lowering of volatile references (#132486 ) [RFC on discourse](https://discourse.llvm.org/t/rfc-volatile-representation-in-flang/85404/1) Flang currently lacks support for volatile variables. For some cases, the compiler produces TODO error messages and others are ignored. Some of our tests are like the example from _C.4 Clause 8 notes: The VOLATILE attribute (8.5.20)_ and require volatile variables. Prior commits: ``` c9ec1bc753b0 [flang] Handle volatility in lowering and codegen (#135311) e42f8609858f [flang][nfc] Support volatility in Fir ops (#134858) b2711e1526f9 [flang][nfc] Support volatile on ref, box, and class types (#134386) ```	2025-04-30 08:46:33 -07:00
Kaviya Rajendiran	857ac4c229	[MLIR][OpenMP] Lowering nontemporal clause to LLVM IR for SIMD directive (#118751 ) This patch, - Added a new attribute `nontemporal` to fir.load and fir.store operation in the FIR dialect. - Added a pass `lower-nontemporal` which is called before FIRToLLVM conversion pass and adds the nontemporal attribute to loads and stores on the list items specified in the nontemporal clause of the SIMD directive. - Set the `UnitAttr:$nontemporal` to llvm.load and llvm.store operations during FIR to LLVM dialect conversion, if the corresponding fir.load or fir.store operations have the nontemporal attribute. - Attached the `nontemporal metadata` to load and store instructions that have the nontemporal attribute, during LLVM dialect to LLVM IR translation.	2025-04-30 11:13:20 +05:30
Asher Mancinelli	c9ec1bc753	[flang] Handle volatility in lowering and codegen (#135311 ) * Enable lowering and conversion patterns to pass volatility information from higher level operations to lower level ones. * Enable codegen to pass volatility to LLVM dialect ops by setting an attribute on loads, stores, and memory intrinsics. * Add utilities for passing along the volatility from an input type to an output type. To introduce volatile types into the IR, entities with the volatile attribute will be given a volatile type in the bridge; this is not enabled in this patch. User code should not result in IR with volatile types yet, so this patch contains no tests with Fortran source, only IR that already contains volatile types. Part 3 of #132486.	2025-04-14 11:02:23 -07:00
Slava Zakharin	27bc8a1811	[flang][NFC] Split CG dialect and the passes. (#135240 ) I am making a CG pass to depend on `FIROpenACCSupport` in #134346. This introduces a cyclic dependency between `FIROpenACCSupport` and `FIRCodeGen`. This patch splits `FIRCodeGen` into `FIRCodeGenDialect` (for FIR CG dialect definition) and `FIRCodeGen` (for the CG passes). Now, `FIROpenACCSupport` depends on `FIRCodeGenDialect`, and `FIRCodeGen` depends on `FIROpenACCSupport`.	2025-04-10 16:13:04 -07:00
Valentin Clement (バレンタインクレメン)	a862b6deae	[flang][cuda] Lower shared global to the correct NVVM address space (#131368 ) Global with the CUDA shared data attribute needs to be lowered to llvm globals with the correct address space (3). Address space is set from the `mlir::NVVM::NVVMMemorySpace::kSharedMemorySpace` enum from `mlir/Dialect/LLVMIR/NVVMDialect.h`	2025-03-14 15:28:32 -07:00
Asher Mancinelli	982527eef0	[flang] Use saturated intrinsics for floating point to integer conversions (#130686 ) The saturated floating point conversion intrinsics match the semantics in the standard more closely than the fptosi/fptoui instructions. Case 2 of 16.9.100 is > INT (A [, KIND]) > If A is of type real, there are two cases: if \|A\| < 1, INT (A) has the value 0; if \|A\| ≥ 1, INT (A) is the integer whose magnitude is the largest integer that does not exceed the magnitude of A and whose sign is the same as the sign of A. Currently, converting a floating point value into an integer type too small to hold the constant will be converted to poison in opt, leaving us with garbage: ``` > cat t.f90 program main real(kind=16) :: f integer(kind=4) :: i f=huge(f) i=f print , i end program main # current upstream > for i in `seq 10`; do; ./a.out; done -862156992 -1497393344 -739096768 -1649494208 1761228608 -1959270592 -746244288 -1629194432 -231217344 382322496 ``` With the saturated fptoui/fptosi intrinsics, we get the appropriate values ``` # mine > flang -O2 ./t.f90 && ./a.out 2147483647 > perl -e 'printf "%d\n", (2 * 31) - 1' 2147483647 ``` One notable difference: NaNs being converted to ints will become zero, unlike current flang (and some other compilers). Newer versions of GCC have this behavior.	2025-03-12 08:14:46 -07:00
jeanPerier	1ddf18057a	[flang] introduce fir.copy to avoid load store of aggregates (#130289 ) Introduce a FIR operation to do memcopy/memmove of compile time constant size types. This is to avoid requiring derived type copies to done with load/store which is badly supported in LLVM when the aggregate type is "big" (no threshold can easily be defined here, better to always avoid them for fir.type). This was the root cause of the regressions caused by #114002 which introduced a load/store of fir.type<> which caused hand/asserts to fire in LLVM on several benchmarks. See https://llvm.org/docs/Frontend/PerformanceTips.html#avoid-creating-values-of-aggregate-type	2025-03-11 09:31:03 +01:00
R	1dffe8f364	Reland [flang] In AllocMemOp lowering, convert types for calling malloc on 32-bit (#130386 ) Previous PR: https://github.com/llvm/llvm-project/pull/129308 Changes: * The alloc-32.fir test is now marked as requiring the X86 target. * Drive-by fixes uncovered when fixing tests involving malloc	2025-03-11 02:01:57 +00:00
R	3121da52aa	Revert "[flang] In AllocMemOp lowering, convert types for calling malloc on 32-bit (#129308 )" This reverts commit cf1964af5a461196904b663ede04c26555fcff69. This causes breakage on all the non-x86 buildbots as they don't have the i686 target enabled. This was missed in pre-commit CI.	2025-03-08 02:42:24 +00:00
R	cf1964af5a	[flang] In AllocMemOp lowering, convert types for calling malloc on 32-bit (#129308 ) Although 32-bit targets are currently not officially supported, add a type conversion in the AllocMemOp lowering when calling the `malloc` function on 32-bit targets. This fixes a type mismatch, and this fix makes it easier to potentially support such targets in the future. This involves making sure the `LLVMTypeConverter` has the necessary information to know the target bit width. Co-authored-by: Valentin Clement (バレンタインクレメン) <clementval@gmail.com>	2025-03-08 02:25:17 +00:00
jeanPerier	a8db1fb9b5	[flang] update fir.coordinate_of to carry the fields (#127231 ) This patch updates fir.coordinate_op to carry the field index as attributes instead of relying on getting it from the fir.field_index operations defining its operands. The rational is that FIR currently has a few operations that require DAGs to be preserved in order to be able to do code generation. This is the case of fir.coordinate_op, which requires its fir.field operand producer to be visible. This makes IR transformation harder/brittle, so I want to update FIR to get rid if this. Codegen/printer/parser of fir.coordinate_of and many tests need to be updated after this change.	2025-02-28 09:50:05 +01:00
Slava Zakharin	0caa8f42be	Reland "[flang] Set LLVM specific attributes to fir.call's of Fortran runtime. (#128093 )" This change is inspired by a case in facerec benchmark, where performance of scalar code may improve by about 6%@aarch64 due to getting rid of redundant loads from Fortran descriptors. These descriptors are corresponding to subroutine local ALLOCATABLE, SAVE variables. The scalar loop nest in LocalMove subroutine contains call to Fortran runtime IO functions, and LLVM globals-aa analysis cannot prove that these calls do not modify the globalized descriptors with internal linkage. This patch sets and propagates llvm.memory_effects attribute for fir.call operations calling Fortran runtime functions. In particular, it tries to set the Other memory effect to NoModRef. The Other memory effect includes accesses to globals and captured pointers, so we cannot set it for functions taking Fortran descriptors with one exception for calls where the Fortran descriptor arguments are all null. As long as different calls to the same Fortran runtime function may have different attributes, I decided to attach the attributes to the calls rather than functions. Moreover, attaching the attributes to func.func will require propagating these attributes to llvm.func, which is not happening right now. In addition to llvm.memory_effects, the new pass sets llvm.nosync and llvm.nocallback attributes that may also help LLVM alias analysis (e.g. see #127707). These attributes are ignored currently. I will support them in LLVM IR dialect in a separate patch. I also added another pass for developers to be able to print declarations/calls of all Fortran runtime functions that are recognized by the attributes setting pass. It should help with maintenance of the LIT tests.	2025-02-24 14:18:17 -08:00
Slava Zakharin	69cc16fb55	Revert "[flang] Set LLVM specific attributes to fir.call's of Fortran runtime. (#128093 )" This reverts commit 36fdeb2aded08a776fcffefa73cb7667e7fc6c2d.	2025-02-24 10:52:53 -08:00
Valentin Clement (バレンタインクレメン)	8dbc393e44	[flang][cuda][NFC] Remove shared alloc addr space (#128535 )	2025-02-24 10:05:32 -08:00
Slava Zakharin	36fdeb2ade	[flang] Set LLVM specific attributes to fir.call's of Fortran runtime. (#128093 ) This change is inspired by a case in facerec benchmark, where performance of scalar code may improve by about 6%@aarch64 due to getting rid of redundant loads from Fortran descriptors. These descriptors are corresponding to subroutine local ALLOCATABLE, SAVE variables. The scalar loop nest in LocalMove subroutine contains call to Fortran runtime IO functions, and LLVM globals-aa analysis cannot prove that these calls do not modify the globalized descriptors with internal linkage. This patch sets and propagates llvm.memory_effects attribute for fir.call operations calling Fortran runtime functions. In particular, it tries to set the Other memory effect to NoModRef. The Other memory effect includes accesses to globals and captured pointers, so we cannot set it for functions taking Fortran descriptors with one exception for calls where the Fortran descriptor arguments are all null. As long as different calls to the same Fortran runtime function may have different attributes, I decided to attach the attributes to the calls rather than functions. Moreover, attaching the attributes to func.func will require propagating these attributes to llvm.func, which is not happening right now. In addition to llvm.memory_effects, the new pass sets llvm.nosync and llvm.nocallback attributes that may also help LLVM alias analysis (e.g. see #127707). These attributes are ignored currently. I will support them in LLVM IR dialect in a separate patch. I also added another pass for developers to be able to print declarations/calls of all Fortran runtime functions that are recognized by the attributes setting pass. It should help with maintenance of the LIT tests.	2025-02-24 09:27:48 -08:00
Razvan Lupusoru	f27081ba6a	[FIR] Avoid generating llvm.undef for dummy scoping info (#128098 ) Dummy scoping operations are generated to keep track of scopes for purpose of Fortran level analyses like Alias Analysis. For codegen, the scoping info is converted to a fir.undef during pre-codegen rewrite. Then during declare lowering, this info is no longer used - but it is still translated to llvm.undef. I cleaned up so it is simply erased. The generated LLVM should now no longer have a stray undef which looks off when trying to make sense of the IR. Co-authored-by: Razvan Lupusoru <rlupusoru@nvidia.com>	2025-02-20 18:49:23 -08:00
Valentin Clement (バレンタインクレメン)	726c4b9f77	[flang][cuda] Lower match_all_sync functions to nvvm intrinsics (#127940 )	2025-02-20 09:10:25 -08:00
jeanPerier	5836d91845	[flang] add ABI argument attributes in indirect calls (#126896 ) Last piece that implements the TODO for sret and byval setting on indirect calls. This includes a fix to the codegen last patch. I thought types in in type attributes were automatically converted in dialect conversion passes, but that is not the case. The sret and byval type needs to be converted to llvm types in codegen (mlir FuncOp conversion is doing a similar conversion).	2025-02-12 17:31:34 +01:00

1 2 3 4 5 ...

327 Commits