llvm-project

Author	SHA1	Message	Date
Slava Zakharin	5f268d04f9	[flang] Code generation for fir.pack/unpack_array. (#132080 ) The code generation relies on `ShallowCopyDirect` runtime to copy data between the original and the temporary arrays (both directions). The allocations are done by the compiler generated code. The heap allocations could have been passed to `ShallowCopy` runtime, but I decided to expose the allocations so that the temporary descriptor passed to `ShallowCopyDirect` has `nocapture` - maybe this will be better for LLVM optimizations.	2025-03-31 11:42:17 -07:00
Valentin Clement (バレンタインクレメン)	a862b6deae	[flang][cuda] Lower shared global to the correct NVVM address space (#131368 ) Global with the CUDA shared data attribute needs to be lowered to llvm globals with the correct address space (3). Address space is set from the `mlir::NVVM::NVVMMemorySpace::kSharedMemorySpace` enum from `mlir/Dialect/LLVMIR/NVVMDialect.h`	2025-03-14 15:28:32 -07:00
Asher Mancinelli	982527eef0	[flang] Use saturated intrinsics for floating point to integer conversions (#130686 ) The saturated floating point conversion intrinsics match the semantics in the standard more closely than the fptosi/fptoui instructions. Case 2 of 16.9.100 is > INT (A [, KIND]) > If A is of type real, there are two cases: if |A| < 1, INT (A) has the value 0; if |A| ≥ 1, INT (A) is the integer whose magnitude is the largest integer that does not exceed the magnitude of A and whose sign is the same as the sign of A. Currently, converting a floating point value into an integer type too small to hold the constant will be converted to poison in opt, leaving us with garbage: ``` > cat t.f90 program main real(kind=16) :: f integer(kind=4) :: i f=huge(f) i=f print *, i end program main # current upstream > for i in `seq 10`; do; ./a.out; done -862156992 -1497393344 -739096768 -1649494208 1761228608 -1959270592 -746244288 -1629194432 -231217344 382322496 ``` With the saturated fptoui/fptosi intrinsics, we get the appropriate values ``` # mine > flang -O2 ./t.f90 && ./a.out 2147483647 > perl -e 'printf "%d\n", (2 ** 31) - 1' 2147483647 ``` One notable difference: NaNs being converted to ints will become zero, unlike current flang (and some other compilers). Newer versions of GCC have this behavior.	2025-03-12 08:14:46 -07:00
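The saturating behavior described in the commit above can be modeled in a few lines. This is a minimal sketch of the semantics only (NaN becomes zero, out-of-range values clamp, in-range values truncate toward zero); the function name `fptosi_sat` is hypothetical and is not flang or LLVM code.

```python
import math


def fptosi_sat(x: float, bits: int = 32) -> int:
    """Model a saturating float-to-signed-integer conversion (illustrative only)."""
    lo = -(1 << (bits - 1))        # smallest representable signed value
    hi = (1 << (bits - 1)) - 1     # largest representable signed value
    if math.isnan(x):
        return 0                   # NaN converts to zero
    if x <= lo:
        return lo                  # clamp negative overflow (including -inf)
    if x >= hi:
        return hi                  # clamp positive overflow (including +inf)
    return math.trunc(x)           # in range: truncate toward zero


if __name__ == "__main__":
    print(fptosi_sat(1e300))         # a value far too large for 32 bits
    print(fptosi_sat(float("nan")))  # NaN input
    print(fptosi_sat(-1.9))          # truncation toward zero
```

A value far outside the 32-bit range clamps to 2147483647, matching the output shown in the commit message, rather than producing an arbitrary result.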
jeanPerier	15e335f04f	[flang] also set llvm ABI argument attributes on direct calls (#130736 ) So far, flang was not setting argument attributes on direct calls assuming that putting them on the function operation was enough. It was clarified in `38565da525` that they must be set on both call and functions, even for direct calls. Crashes have been observed because of the lack of the attribute when compiling `abs(x)` at `O2` and above on X86-64 for complex(16).	2025-03-12 09:55:05 +01:00
jeanPerier	1ddf18057a	[flang] introduce fir.copy to avoid load store of aggregates (#130289 ) Introduce a FIR operation to do memcopy/memmove of compile time constant size types. This is to avoid requiring derived type copies to be done with load/store, which is badly supported in LLVM when the aggregate type is "big" (no threshold can easily be defined here, better to always avoid them for fir.type). This was the root cause of the regressions caused by #114002, which introduced a load/store of fir.type<> that caused hangs/asserts to fire in LLVM on several benchmarks. See https://llvm.org/docs/Frontend/PerformanceTips.html#avoid-creating-values-of-aggregate-type	2025-03-11 09:31:03 +01:00
R	1dffe8f364	Reland [flang] In AllocMemOp lowering, convert types for calling malloc on 32-bit (#130386 ) Previous PR: https://github.com/llvm/llvm-project/pull/129308 Changes: * The alloc-32.fir test is now marked as requiring the X86 target. * Drive-by fixes uncovered when fixing tests involving malloc	2025-03-11 02:01:57 +00:00
R	3121da52aa	Revert "[flang] In AllocMemOp lowering, convert types for calling malloc on 32-bit (#129308 )" This reverts commit cf1964af5a461196904b663ede04c26555fcff69. This causes breakage on all the non-x86 buildbots as they don't have the i686 target enabled. This was missed in pre-commit CI.	2025-03-08 02:42:24 +00:00
R	cf1964af5a	[flang] In AllocMemOp lowering, convert types for calling malloc on 32-bit (#129308 ) Although 32-bit targets are currently not officially supported, add a type conversion in the AllocMemOp lowering when calling the `malloc` function on 32-bit targets. This fixes a type mismatch, and this fix makes it easier to potentially support such targets in the future. This involves making sure the `LLVMTypeConverter` has the necessary information to know the target bit width. Co-authored-by: Valentin Clement (バレンタインクレメン) <clementval@gmail.com>	2025-03-08 02:25:17 +00:00
Kelvin Li	83f8721201	[flang] handle passing bind(c) derived type by value for ppc64le and powerpc64-aix (#128780 )	2025-03-03 14:43:43 -05:00
jeanPerier	a8db1fb9b5	[flang] update fir.coordinate_of to carry the fields (#127231 ) This patch updates fir.coordinate_of to carry the field index as attributes instead of relying on getting it from the fir.field_index operations defining its operands. The rationale is that FIR currently has a few operations that require DAGs to be preserved in order to be able to do code generation. This is the case of fir.coordinate_of, which requires its fir.field operand producer to be visible. This makes IR transformation harder/brittle, so I want to update FIR to get rid of this. Codegen/printer/parser of fir.coordinate_of and many tests need to be updated after this change.	2025-02-28 09:50:05 +01:00
Slava Zakharin	0caa8f42be	Reland "[flang] Set LLVM specific attributes to fir.call's of Fortran runtime. (#128093 )" This change is inspired by a case in facerec benchmark, where performance of scalar code may improve by about 6%@aarch64 due to getting rid of redundant loads from Fortran descriptors. These descriptors are corresponding to subroutine local ALLOCATABLE, SAVE variables. The scalar loop nest in LocalMove subroutine contains call to Fortran runtime IO functions, and LLVM globals-aa analysis cannot prove that these calls do not modify the globalized descriptors with internal linkage. This patch sets and propagates llvm.memory_effects attribute for fir.call operations calling Fortran runtime functions. In particular, it tries to set the Other memory effect to NoModRef. The Other memory effect includes accesses to globals and captured pointers, so we cannot set it for functions taking Fortran descriptors with one exception for calls where the Fortran descriptor arguments are all null. As long as different calls to the same Fortran runtime function may have different attributes, I decided to attach the attributes to the calls rather than functions. Moreover, attaching the attributes to func.func will require propagating these attributes to llvm.func, which is not happening right now. In addition to llvm.memory_effects, the new pass sets llvm.nosync and llvm.nocallback attributes that may also help LLVM alias analysis (e.g. see #127707). These attributes are ignored currently. I will support them in LLVM IR dialect in a separate patch. I also added another pass for developers to be able to print declarations/calls of all Fortran runtime functions that are recognized by the attributes setting pass. It should help with maintenance of the LIT tests.	2025-02-24 14:18:17 -08:00
Slava Zakharin	69cc16fb55	Revert "[flang] Set LLVM specific attributes to fir.call's of Fortran runtime. (#128093 )" This reverts commit 36fdeb2aded08a776fcffefa73cb7667e7fc6c2d.	2025-02-24 10:52:53 -08:00
Valentin Clement (バレンタインクレメン)	8dbc393e44	[flang][cuda][NFC] Remove shared alloc addr space (#128535 )	2025-02-24 10:05:32 -08:00
Slava Zakharin	36fdeb2ade	[flang] Set LLVM specific attributes to fir.call's of Fortran runtime. (#128093 ) This change is inspired by a case in facerec benchmark, where performance of scalar code may improve by about 6%@aarch64 due to getting rid of redundant loads from Fortran descriptors. These descriptors are corresponding to subroutine local ALLOCATABLE, SAVE variables. The scalar loop nest in LocalMove subroutine contains call to Fortran runtime IO functions, and LLVM globals-aa analysis cannot prove that these calls do not modify the globalized descriptors with internal linkage. This patch sets and propagates llvm.memory_effects attribute for fir.call operations calling Fortran runtime functions. In particular, it tries to set the Other memory effect to NoModRef. The Other memory effect includes accesses to globals and captured pointers, so we cannot set it for functions taking Fortran descriptors with one exception for calls where the Fortran descriptor arguments are all null. As long as different calls to the same Fortran runtime function may have different attributes, I decided to attach the attributes to the calls rather than functions. Moreover, attaching the attributes to func.func will require propagating these attributes to llvm.func, which is not happening right now. In addition to llvm.memory_effects, the new pass sets llvm.nosync and llvm.nocallback attributes that may also help LLVM alias analysis (e.g. see #127707). These attributes are ignored currently. I will support them in LLVM IR dialect in a separate patch. I also added another pass for developers to be able to print declarations/calls of all Fortran runtime functions that are recognized by the attributes setting pass. It should help with maintenance of the LIT tests.	2025-02-24 09:27:48 -08:00
David Truby	449f84fea6	[flang] fix AArch64 PCS for struct following pointer (#127802 ) Pointers are already handled as taking up a register in the ABI handling, but the handling for structs was not taking this into account. This patch changes the struct handling to acknowledge that pointer arguments take up an integer register. Fixes #123075	2025-02-21 10:50:52 -08:00
Razvan Lupusoru	f27081ba6a	[FIR] Avoid generating llvm.undef for dummy scoping info (#128098 ) Dummy scoping operations are generated to keep track of scopes for purpose of Fortran level analyses like Alias Analysis. For codegen, the scoping info is converted to a fir.undef during pre-codegen rewrite. Then during declare lowering, this info is no longer used - but it is still translated to llvm.undef. I cleaned up so it is simply erased. The generated LLVM should now no longer have a stray undef which looks off when trying to make sense of the IR. Co-authored-by: Razvan Lupusoru <rlupusoru@nvidia.com>	2025-02-20 18:49:23 -08:00
Valentin Clement (バレンタインクレメン)	726c4b9f77	[flang][cuda] Lower match_all_sync functions to nvvm intrinsics (#127940 )	2025-02-20 09:10:25 -08:00
jeanPerier	5836d91845	[flang] add ABI argument attributes in indirect calls (#126896 ) Last piece that implements the TODO for sret and byval setting on indirect calls. This includes a fix to the codegen from the last patch. I thought types in type attributes were automatically converted in dialect conversion passes, but that is not the case. The sret and byval types need to be converted to llvm types in codegen (mlir FuncOp conversion is doing a similar conversion).	2025-02-12 17:31:34 +01:00
jeanPerier	65075a863b	[flang][FIR] handle argument attributes in fir.call (#126711 ) Add pretty printer/parser for fir.call argument/result attributes and propagate them to llvm.call. This will allow implementing the TODO about ABI relevant argument attribute in indirect calls.	2025-02-12 09:49:52 +01:00
Michael Kruse	b815a3942a	[Flang] Move non-common headers to FortranSupport (#124416 ) Move non-common files from FortranCommon to FortranSupport (analogous to LLVMSupport) such that * declarations and definitions that are only used by the Flang compiler, but not by the runtime, are moved to FortranSupport * declarations and definitions that are used by both ("common"), the compiler and the runtime, remain in FortranCommon * generic STL-like/ADT/utility classes and algorithms remain in FortranCommon This allows for a cleaner separation between compiler and runtime components, which are compiled differently. For instance, runtime sources must not use STL's `<optional>` which causes problems with CUDA support. Instead, the surrogate header `flang/Common/optional.h` must be used. This PR fixes this for `fast-int-sel.h`. Declarations in include/Runtime are also used by both, but are header-only. `ISO_Fortran_binding_wrapper.h`, a header used by compiler and runtime, is also moved into FortranCommon.	2025-02-06 15:29:10 +01:00
jeanPerier	327d627066	[mlir] share argument attributes interface between calls and callables (#123176 ) This patch shares core interface methods dealing with argument and result attributes from CallableOpInterface with the CallOpInterface and makes them mandatory to gives more consistent guarantees about concrete operations using these interfaces. This allows adding argument attributes on call like operations, which is sometimes required to get proper ABI, like with llvm.call (and llvm.invoke). The patch adds optional `arg_attrs` and `res_attrs` attributes to operations using these interfaces that did not have that already. They can then re-use the common "rich function signature" printing/parsing helpers if they want (for the LLVM dialect, this is done in the next patch). Part of RFC: https://discourse.llvm.org/t/mlir-rfc-adding-argument-and-result-attributes-to-llvm-call/84107	2025-02-03 11:27:14 +01:00
Tom Eccles	aeaafce464	[mlir][OpenMP][flang] make private variable allocation implicit in omp.private (#124019 ) The intention of this work is to give MLIR->LLVMIR conversion freedom to control how the private variable is allocated so that it can be allocated on the stack in ordinary cases or as part of a structure used to give closure context for tasks which might outlive the current stack frame. See RFC: https://discourse.llvm.org/t/rfc-openmp-supporting-delayed-task-execution-with-firstprivate-variables/83084 For example, a privatizer for an integer used to look like ```mlir omp.private {type = private} @x.privatizer : !fir.ref<i32> alloc { ^bb0(%arg0: !fir.ref<i32>): %0 = ... allocate proper memory for the private clone ... omp.yield(%0 : !fir.ref<i32>) } ``` After this change, allocation becomes implicit in the operation: ```mlir omp.private {type = private} @x.privatizer : i32 ``` For more complex types that require initialization after allocation, an init region can be used: ```mlir omp.private {type = private} @x.privatizer : !some.type init { ^bb0(%arg0: !some.pointer<!some.type>, %arg1: !some.pointer<!some.type>): // initialize %arg1, using %arg0 as a mold for allocations omp.yield(%arg1 : !some.pointer<!some.type>) } dealloc { ^bb0(%arg0: !some.pointer<!some.type>): ... deallocate memory allocated by the init region ... omp.yield } ``` This patch lays the groundwork for delayed task execution but is not enough on its own. After this patch all gfortran tests which previously passed still pass. There are the following changes to the Fujitsu test suite: - 0380_0009 and 0435_0009 are fixed - 0688_0041 now fails at runtime. This patch is testing firstprivate variables with tasks. Previously we got lucky with the undefined behavior and won the race. After these changes we no longer get lucky. This patch lays the groundwork for a proper fix for this issue. In flang the lowering re-uses the existing lowering used for reduction init and dealloc regions. In flang, before this patch we hit a TODO with the same wording when generating the copy region for firstprivate polymorphic variables. After this patch the box-like fir.class is passed by reference into the copy region, leading to a different path that didn't hit that old TODO but the generated code still didn't work so I added a new TODO in DataSharingProcessor.	2025-01-31 09:35:26 +00:00
agozillon	4186805060	[Flang][MLIR] Extend DataLayout utilities to have basic GPU Module support (#123149 ) As there are now certain areas where we may have either a ModuleOp or a GPUModuleOp, both of which can have DataLayouts, and we may need the DataLayout utilities in these areas, I've taken the liberty of extending them for use with both. Those with more knowledge of how they wish the GPUModuleOps to interact with their parent ModuleOp's DataLayout may have further alterations they wish to make in the future, but for the moment, it'll simply utilise the basic data layout construction which I believe combines parent and child datalayouts from the ModuleOp and GPUModuleOp. If there is no GPUModuleOp DataLayout it should default to the parent ModuleOp. It's worth noting there is some weirdness if you have two module operations defining builtin dialect DataLayout Entries: it appears the combinatorial functionality for DataLayouts doesn't support the merging of these. This behaviour is useful for areas like: https://github.com/llvm/llvm-project/pull/119585/files#diff-19fc4bcb38829d085e25d601d344bbd85bf7ef749ca359e348f4a7c750eae89dR1412 where we have a crossroads between the two different module operations.	2025-01-30 17:31:50 +01:00
Slava Zakharin	0b80491cd5	[flang] Support non-index shape/shift/slice for CG box operations. (#124625 ) That is another problem uncovered during hlfir.reshape inlining, where the shape bits could be any integer type. This patch adds explicit conversions to `index` type where needed.	2025-01-28 09:38:33 -08:00
Abid Qadeer	afa4681ce4	[flang][debug] Add support for common blocks. (#112398 ) This PR adds debug support for common blocks in flang. As variables that are part of a common block don't have a special marker to recognize them, we use the following check to find them. %0 = fir.address_of(@a) %1 = fir.convert %0 %2 = fir.coordinate_of %1, %c0 %3 = fir.convert %2 %4 = fircg.ext_declare %3 If the memref of a fircg.ext_declare points to a fir.coordinate_of and that in turn points to a fir.address_of (ignoring immediate fir.convert) then we assume that it is a common block variable. The fir.address_of gives us the global symbol which is the storage for the common block and fir.coordinate_of provides the offset in this storage. The debug hierarchy looks like this: subroutine f3 integer :: x, y common /a/ x, y end subroutine @a_ = global { ... } { ... }, !dbg !26, !dbg !28 !23 = !DISubprogram(name: "f3"...) !24 = !DICommonBlock(scope: !23, name: "a", ...) !25 = !DIGlobalVariable(name: "x", scope: !24 ...) !26 = !DIGlobalVariableExpression(var: !25, expr: !DIExpression()) !27 = !DIGlobalVariable(name: "y", scope: !24 ...) !28 = !DIGlobalVariableExpression(var: !27, expr: !DIExpression(DW_OP_plus_uconst, 4)) This required the following changes: 1. Instead of using DIGlobalVariableAttr in the FusedLoc of GlobalOp, we use DIGlobalVariableExpressionAttr. This allows us to generate the DIExpression where we have the information. 2. Previously, only one DIGlobalVariableExpressionAttr could be linked to one global op. I recently removed this restriction in mlir. To make use of it, we add an ArrayAttr to the FusedLoc of a GlobalOp. This allows us to pass multiple DIGlobalVariableExpressionAttr. 3. I was depending on the name of the global for the name of the common block. The name gets a '_' appended. I could not find a utility function in flang to remove it so I have to brute force it.	2025-01-28 12:54:15 +00:00
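To illustrate how the `DW_OP_plus_uconst, 4` expression for `y` in the commit above relates to the common block layout, here is a minimal sketch that assigns byte offsets to members stored back to back. The helper names `common_block_offsets` and `di_expression` are hypothetical, the member sizes are assumed (4 bytes for a default `integer`), and this is not the code flang uses.

```python
from typing import Dict, List, Tuple


def common_block_offsets(members: List[Tuple[str, int]]) -> Dict[str, int]:
    """Assign each (name, size_in_bytes) member its byte offset in the block."""
    offsets: Dict[str, int] = {}
    offset = 0
    for name, size in members:
        offsets[name] = offset   # member starts where the previous one ended
        offset += size
    return offsets


def di_expression(offset: int) -> str:
    """Render the debug expression text for a member at the given offset."""
    if offset == 0:
        return "!DIExpression()"
    return f"!DIExpression(DW_OP_plus_uconst, {offset})"


if __name__ == "__main__":
    # common /a/ x, y with two 4-byte integers (assumed sizes)
    for name, off in common_block_offsets([("x", 4), ("y", 4)]).items():
        print(name, di_expression(off))
```

With the two assumed 4-byte members, `x` gets an empty expression and `y` gets an offset of 4, which matches the metadata quoted in the commit message.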
ssijaric-nv	16e9601e19	[Flang] Adjust the trampoline size for AArch64 and PPC (#118678 ) Set the trampoline size to match that in compiler-rt/lib/builtins/trampoline_setup.c and AArch64 and PPC lowering.	2025-01-27 08:02:18 -08:00
Kazu Hirata	df3bc54eff	[flang] Avoid repeated hash lookups (NFC) (#124230 )	2025-01-24 00:50:07 -08:00
Valentin Clement (バレンタインクレメン)	9f83c4ed1c	[flang][cuda] Allocate descriptor in managed memory on rebox block argument (#123971 ) Another case where the descriptor must be allocated with the CUF runtime and not a simple alloca instruction.	2025-01-22 10:04:39 -08:00
Valentin Clement	4280316e3d	[flang][cuda] Fix link issue after c26e1a2	2025-01-21 17:35:57 -08:00
Valentin Clement (バレンタインクレメン)	c26e1a22df	[flang][cuda] Allocate descriptor in managed memory when memref is a block argument (#123829 )	2025-01-21 17:20:46 -08:00
Michał Górny	6a2cc12229	[flang] Support linking to MLIR dylib (#120966 ) Introduce a new `MLIR_LIBS` argument to `add_flang_library`, that uses `mlir_target_link_libraries` to link the MLIR dylib alternatively to the component libraries. Use it, along with a few inline `mlir_target_link_libraries` in tools, to support linking Flang to MLIR dylib rather than the static libraries. With these changes, the vast majority of Flang can be linked dynamically. The only parts still using static libraries are those requiring MLIR test libraries, which are not included in the dylib.	2025-01-16 13:35:26 +00:00
Matthias Springer	f023da12d1	[mlir][IR] Remove factory methods from `FloatType` (#123026 ) This commit removes convenience methods from `FloatType` to make it independent of concrete interface implementations. See discussion here: https://discourse.llvm.org/t/rethink-on-approach-to-low-precision-fp-types/82361 Note for LLVM integration: Replace `FloatType::getF32(` with `Float32Type::get(` etc.	2025-01-16 08:56:09 +01:00
Kelvin Li	79e788d02e	[flang][AIX] BIND(C) derived type alignment for AIX (#121505 ) This patch is to handle the alignment requirement for the `bind(c)` derived type component that is real type and larger than 4 bytes. The alignment of such component is 4-byte.	2025-01-13 10:52:09 -05:00
Matthias Springer	599c739905	[mlir][GPU] Add NVVM-specific `cf.assert` lowering (#120431 ) This commit adds an NVIDIA-specific lowering of `cf.assert` to `__assertfail`. Note: `getUniqueFormatGlobalName`, `getOrCreateFormatStringConstant` and `getOrDefineFunction` are moved to `GPUOpsLowering.h`, so that they can be reused.	2025-01-06 12:00:11 +01:00
Matthias Springer	3ace685105	[mlir][Transforms] Support 1:N mappings in `ConversionValueMapping` (#116524 ) This commit updates the internal `ConversionValueMapping` data structure in the dialect conversion driver to support 1:N replacements. This is the last major commit for adding 1:N support to the dialect conversion driver. Since #116470, the infrastructure already supports 1:N replacements. But the `ConversionValueMapping` still stored 1:1 value mappings. To that end, the driver inserted temporary argument materializations (converting N SSA values into 1 value). This is no longer the case. Argument materializations are now entirely gone. (They will be deleted from the type converter after some time, when we delete the old 1:N dialect conversion driver.) Note for LLVM integration: Replace all occurrences of `addArgumentMaterialization` (except for 1:N dialect conversion passes) with `addSourceMaterialization`. --------- Co-authored-by: Markus Böck <markus.boeck02@gmail.com>	2025-01-03 16:11:56 +01:00
Matthias Springer	c870632ef6	[flang] Fix some memory leaks (#121050 ) This commit fixes some but not all memory leaks in Flang. There are still 91 tests that fail with ASAN. - Use `mlir::OwningOpRef` instead of `std::unique_ptr`. The latter does not free allocations of nested blocks. - Pass `ModuleOp` as value instead of reference. - Add a few missing deallocations in test cases and other places.	2024-12-25 09:42:03 +01:00
Valentin Clement (バレンタインクレメン)	d36836de01	[flang][cuda] Create descriptor in managed memory when emboxing fir.box_addr value (#120980 )	2024-12-23 09:52:59 -08:00
Kazu Hirata	392651a7ec	[flang] Migrate away from PointerUnion::{is,get} (NFC) (#120880 ) Note that PointerUnion::{is,get} have been soft deprecated in PointerUnion.h: // FIXME: Replace the uses of is(), get() and dyn_cast() with // isa<T>, cast<T> and the llvm::dyn_cast<T> I'm not touching PointerUnion::dyn_cast for now because it's a bit complicated; we could blindly migrate it to dyn_cast_if_present, but we should probably use dyn_cast when the operand is known to be non-null.	2024-12-22 13:30:16 -08:00
Valentin Clement (バレンタインクレメン)	e650ac1654	[flang][cuda][NFC] Fix typo in CUFAllocDescriptor (#120797 ) Missing `r` in the function name.	2024-12-20 13:57:47 -08:00
Valentin Clement (バレンタインクレメン)	81831ef3e7	[flang][cuda] Correctly allocate descriptor in managed memory when reboxing (#120795 ) Reboxing might create a new in-memory descriptor. If the original one was allocated in managed memory, allocate the new one in managed memory as well.	2024-12-20 13:32:31 -08:00
Valentin Clement (バレンタインクレメン)	3e13acfbf4	[flang][cuda] Make default.nonTbpDefinedIoTable compiler generated (#120686 ) `default.nonTbpDefinedIoTable` is a special global defined for IO that doesn't follow the mangling scheme and is therefore not handled correctly in the `CompilerGeneratedNames` pass. Update how it is generated with doGenerated so it can be handled without special handling. Also do not generate comdat in gpu module as the current code is not handling nested modules correctly.	2024-12-20 10:37:48 -08:00
Matthias Springer	eb6c4197d5	[mlir][CF] Split `cf-to-llvm` from `func-to-llvm` (#120580 ) Do not run `cf-to-llvm` as part of `func-to-llvm`. This commit fixes https://github.com/llvm/llvm-project/issues/70982. This commit changes how `func.func` ops are lowered to LLVM. Previously, the signature of the entire region (i.e., entry block and all other blocks in the `func.func` op) was converted as part of the `func.func` lowering pattern. Now, only the entry block is converted. The remaining block signatures are converted together with `cf.br` and `cf.cond_br` as part of `cf-to-llvm`. All unstructured control flow is now converted as part of a single pass (`cf-to-llvm`). `func-to-llvm` no longer deals with unstructured control flow. Also add more test cases for control flow dialect ops. Note: This PR is in preparation of #120431, which adds an additional GPU-specific lowering for `cf.assert`. This was a problem because `cf.assert` used to be converted as part of `func-to-llvm`. Note for LLVM integration: If you see failures, add `-convert-cf-to-llvm` to your pass pipeline.	2024-12-20 13:46:45 +01:00
Valentin Clement (バレンタインクレメン)	e93d226664	[flang][cuda] Update CompilerGeneratedNames pass to work on gpu module (#120660 ) - Update `CompilerGeneratedNames` so it can perform renaming in gpu.module - Update Codegen so it looks in the correct module for the type descriptor.	2024-12-19 19:07:00 -08:00
Valentin Clement (バレンタインクレメン)	4530273d7c	[flang][cuda] Allocate descriptor in managed memory when emboxing device memory (#120485 ) When emboxing memory that comes from CUFMemAlloc, we need to allocate the descriptor in managed memory as it might be passed to a kernel.	2024-12-18 18:20:45 -08:00
Peter Klausler	fc97d2e68b	[flang] Add UNSIGNED (#113504 ) Implement the UNSIGNED extension type and operations under control of a language feature flag (-funsigned). This is nearly identical to the UNSIGNED feature that has been available in Sun Fortran for years, and now implemented in GNU Fortran for gfortran 15, and proposed for ISO standardization in J3/24-116.txt. See the new documentation for details; but in short, this is C's unsigned type, with guaranteed modular arithmetic for +, -, and *, and the related transformational intrinsic functions SUM & al.	2024-12-18 07:02:37 -08:00
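The "guaranteed modular arithmetic for +, -, and *" mentioned in the commit above can be sketched as follows. This models wraparound for a fixed bit width only; the function names are hypothetical, the 32-bit default is an assumption, and the code is not taken from flang.

```python
def wrap(value: int, bits: int = 32) -> int:
    """Reduce an integer modulo 2**bits, as unsigned arithmetic does."""
    return value & ((1 << bits) - 1)


def uadd(a: int, b: int, bits: int = 32) -> int:
    return wrap(a + b, bits)    # addition wraps past the maximum back to zero


def usub(a: int, b: int, bits: int = 32) -> int:
    return wrap(a - b, bits)    # subtraction below zero wraps to large values


def umul(a: int, b: int, bits: int = 32) -> int:
    return wrap(a * b, bits)    # only the low `bits` bits of the product are kept


if __name__ == "__main__":
    max32 = (1 << 32) - 1
    print(uadd(max32, 1))       # wraps to 0
    print(usub(0, 1))           # wraps to 4294967295
    print(umul(1 << 16, 1 << 16))  # product 2**32 wraps to 0
```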
David Truby	44aa476aa1	[flang] AArch64 ABI for BIND(C) VALUE parameters (#118305 ) This patch adds handling for derived type VALUE parameters in BIND(C) functions for AArch64.	2024-12-18 07:43:22 +00:00
Valentin Clement (バレンタインクレメン)	5e1f87e849	[flang][cuda] Correctly allocate memory for descriptor load (#120164 ) CodeGen will allocate memory for a new descriptor on descriptor loads. CUDA Fortran local descriptors are allocated in managed memory by the runtime. The newly allocated storage for a CUDA descriptor must also be allocated through the runtime.	2024-12-16 19:12:05 -08:00
Valentin Clement (バレンタインクレメン)	65e00315c9	[flang][cuda] Adapt TargetRewrite to support gpu.launch_func (#119933 )	2024-12-16 06:53:46 -08:00
Valentin Clement (バレンタインクレメン)	dc5236e6b1	[flang][cuda] Update target rewrite to work on gpu.func (#119283 ) Update the pass so it can perform the signature rewrite on gpu.func.	2024-12-10 12:36:49 -08:00
Kiran Chandramohan	4e59721cc6	[Flang][OpenMP] Make boxed procedure pass aware of OpenMP private ops (#118261 ) Fixes #109727	2024-12-09 17:27:18 +00:00

468 Commits