When `fir.select_case` branches to blocks with arguments that have FIR
types (e.g., `!fir.ref`), the block signature must be converted to LLVM
types before creating the branch. Otherwise, the branch passes LLVM
types (`!llvm.ptr`) but the block expects FIR types, causing a type
mismatch error.
This adds block signature conversion similar to what
`SelectOpConversionBase` already does for `fir.select` and
`fir.select_rank`.
Implementation details:
* Add PrefetchOp in FirOps
* Handle PrefetchOp in FIR Lowering and also pass required default
values
* Handle PrefetchOp in CodeGen.cpp
* Add required tests
We have a longstanding issue in debug info that use statement is not
fully respected. The problem has been described in
https://github.com/llvm/llvm-project/issues/160923. This is first part
of the effort to address this issue. This PR adds infrastructure to emit
`use` statement information in FIR, which will be used by subsequent
patches to generate DWARF debug information.
The information about use statement is collected during semantic
analysis and stored in `PreservedUseStmt` objects. During lowering,
`fir.use_stmt` operations are emitted for each `PreservedUseStmt`
object. The `fir.use_stmt` operation captures the module name, `only`
list symbols, and any renames specified in the use statement. The
`fir.use_stmt` is removed during `CodeGen`.
This directive tells the compiler to ignore vector dependencies in the
following loop and it must be placed before a `do loop`.
Sometimes the compiler may not have sufficient information to decide
whether a particular loop is vectorizable due to potential dependencies
between iterations and the directive is here to tell to the compiler
that vectorization is safe with `parallelAccesses` metadata.
This directive is also equivalent to `#pragma clang loop assume(safety)`
in C++
This patch adds the support of these two directives : `!dir$ inline` and
`!dir$ noinline`.
- `!dir$ noinline` tells to the compiler to not perform inlining on
specific function calls by adding the `noinline` metadata on the call.
- `!dir$ inline` tells to the compiler to attempt inlining on specific
function calls by adding the `inlinehint` metadata on the call.
- `!dir$ forceinline` tells to the compiler to always perfom inlining on
specific function calls by adding the `alwaysinline` metadata on the
call.
Currently, these directives can be placed before a `DO LOOP`, call
functions or assignments. Maybe other statements can be added in the
future if needed.
For the `inline` directive the correct name might be `forceinline` but
I'm not sure ?
The purpose of this patch is to allow converting FIR array representation to
memref when possible without hitting memref verifier issue.
The issue was that FIR arrays may be assumed size, in which case the
last dimension will not be known at runtime. Flang uses -1 to encode
this to fulfill Fortran 2023 standard requirements in 18.5.3 point 5
about CFI_desc_t.
When arrays are converted to memeref, if this `-1` reaches memeref
operations, it triggers verifier errors (even if the conversion happened
in code that guards the code to be entered at runtime if the array is
assumed-size because folders/verifiers do not take into account
reachability).
This follows-up on discussions in #163505 merge requests
When the descriptor of a global device variable is re-materialized to be
passed to a kernel, make sure it is allocated in managed memory
otherwise the kernel launch will fail.
A prior PR added a portion of the global address space modifications
required for declare target to, this PR seeks to add
a small amount more leftover from that PR.
The intent is to allow for more correct IR that the backends (in
particular AMDGPU) can treat more aptly for optimisations and code
correctness
1/3 required PRs to enable declare target to mapping, should look at PR
3/3 to check for full green passes (this one will fail a number due to
some dependencies).
Co-authored-by: Raghu Maddhipatla raghu.maddhipatla@amd.com
This patch introduces the `LLVMAddrSpaceAttrInterface` for defining
compatible LLVM address space attributes
To test this interface, this patch also adds:
- Adds NVVMMemorySpaceAttr implementing both LLVMAddrSpaceAttrInterface
and MemorySpaceAttrInterface
- Converts NVVM memory space constants from enum to MLIR enums
- Updates all NVVM memory space references to use new attribute system
- Adds support for NVVM memory spaces in ptr dialect translation
Example:
```mlir
llvm.func @nvvm_ptr_address_space(
!ptr.ptr<#nvvm.memory_space<global>>,
!ptr.ptr<#nvvm.memory_space<shared>>,
!ptr.ptr<#nvvm.memory_space<constant>>,
!ptr.ptr<#nvvm.memory_space<local>>,
!ptr.ptr<#nvvm.memory_space<tensor>>,
!ptr.ptr<#nvvm.memory_space<shared_cluster>>
) -> !ptr.ptr<#nvvm.memory_space<generic>>
```
Translating the above code to LLVM produces:
```llvm
declare ptr @nvvm_ptr_address_space(ptr addrspace(1), ptr addrspace(3), ptr addrspace(4), ptr addrspace(5), ptr addrspace(6), ptr addrspace(7))
```
To convert the memory space enum to the new enum class use:
```bash
grep -r . -e "NVVMMemorySpace::kGenericMemorySpace" -l | xargs sed -i -e "s/NVVMMemorySpace::kGenericMemorySpace/NVVMMemorySpace::Generic/g"
grep -r . -e "NVVMMemorySpace::kGlobalMemorySpace" -l | xargs sed -i -e "s/NVVMMemorySpace::kGlobalMemorySpace/NVVMMemorySpace::Global/g"
grep -r . -e "NVVMMemorySpace::kSharedMemorySpace" -l | xargs sed -i -e "s/NVVMMemorySpace::kSharedMemorySpace/NVVMMemorySpace::Shared/g"
grep -r . -e "NVVMMemorySpace::kConstantMemorySpace" -l | xargs sed -i -e "s/NVVMMemorySpace::kConstantMemorySpace/NVVMMemorySpace::Constant/g"
grep -r . -e "NVVMMemorySpace::kLocalMemorySpace" -l | xargs sed -i -e "s/NVVMMemorySpace::kLocalMemorySpace/NVVMMemorySpace::Local/g"
grep -r . -e "NVVMMemorySpace::kTensorMemorySpace" -l | xargs sed -i -e "s/NVVMMemorySpace::kTensorMemorySpace/NVVMMemorySpace::Tensor/g"
grep -r . -e "NVVMMemorySpace::kSharedClusterMemorySpace" -l | xargs sed -i -e "s/NVVMMemorySpace::kSharedClusterMemorySpace/NVVMMemorySpace::SharedCluster/g"
```
NOTE: A future patch will add support for ROCDL, it wasn't added here to
keep the patch small.
FIR lowering of the fir.box type inside fir.global is special (it is an
actual descriptor struct value instead of being a descriptor in memory)
and causes builtin.unrealized_conversion_cast to be inserted under the
hood by MLIR dialect conversion framework after each operation producing
a fir.box is translated. These builtin.unrealized_conversion_cast must
be removed before the code generation of operation of using the fir.box
in order to get the right "by value" code generation required in global
initial value definitions.
This PR introduces two new ops in omp dialect, omp.target_allocmem and
omp.target_freemem.
omp.target_allocmem: Allocates heap memory on device. Will be lowered to
omp_target_alloc call in llvm.
omp.target_freemem: Deallocates heap memory on device. Will be lowered
to omp+target_free call in llvm.
Example:
%1 = omp.target_allocmem %device : i32, i64
omp.target_freemem %device, %1 : i32, i64
The work in this PR is C-P/inspired from @ivanradanov commit from
coexecute implementation:
[Add fir omp target alloc and free
ops](be860ac8ba)
[Lower omp_target_{alloc,free} to
llvm](6e2d584dc9)
Properly cast the selector to `i64` regardless of its integer type.
We used to generate llvm.trunc always.
We have to use `i64` as long as the case values may exceed INT_MAX.
Fixes#153050.
I started getting deprecation warnings from operations constructors
which seem to be doing implicit construction of mlir::ValueRange from a
std::nullopt by relying on implicit conversion from std::nullopt into
llvm::ArrayRef. ArrayRef{std::nullopt} is what has been deprecated.
This PR proposes re-modelling `reduce` specifiers to match OpenMP and
OpenACC. In particular, this PR includes the following:
* A new `fir` op: `fir.delcare_reduction` which is identical to OpenMP's
`omp.declare_reduction` op.
* Updating the `reduce` clause on `fir.do_concurrent.loop` to use the
new op.
* Re-uses the `ReductionProcessor` component to emit reductions for `do
concurrent` just like we do for OpenMP. To do this, the
`ReductionProcessor` had to be refactored to be more generalized.
* Upates mapping `do concurrent` to `fir.loop ... unordered` nests using
the new reduction model.
Unfortunately, this is a big PR that would be difficult to divide up in
smaller parts because the bottom of the changes are the `fir` table-gen
changes to `do concurrent`. However, doing these MLIR changes cascades
to the other parts that have to be modified to not break things.
This PR goes in the same direction we went for `private/local`
speicifiers. Now the `do concurrent` and OpenMP (and OpenACC) dialects
are modelled in essentially the same way which makes mapping between
them more trivial, hopefully.
PR stack:
- https://github.com/llvm/llvm-project/pull/145837 (this one)
- https://github.com/llvm/llvm-project/pull/146025
- https://github.com/llvm/llvm-project/pull/146028
- https://github.com/llvm/llvm-project/pull/146033
Our local Flang build on PowerPC was broken as
```
llvm/flang/../mlir/include/mlir/IR/ValueRange.h:401:20: error: 'ArrayRef' is deprecated: Use {} or ArrayRef<T>() instead [-Werror,-Wdeprecated-declarations]
401 | : ValueRange(ArrayRef<Value>(std::forward<Arg>(arg))) {}
| ^
llvm/flang/lib/Optimizer/CodeGen/CodeGen.cpp:2243:53: note: in instantiation of function template specialization 'mlir::ValueRange::ValueRange<const std::nullopt_t &, void>' requested here
2243 | /*cstInteriorIndices=*/std::nullopt, fieldIndices,
| ^
llvm/include/llvm/ADT/ArrayRef.h:70:18: note: 'ArrayRef' has been explicitly marked deprecated here
70 | /*implicit*/ LLVM_DEPRECATED("Use {} or ArrayRef<T>() instead", "{}")
| ^
llvm/include/llvm/Support/Compiler.h:244:50: note: expanded from macro 'LLVM_DEPRECATED'
244 | #define LLVM_DEPRECATED(MSG, FIX) __attribute__((deprecated(MSG, FIX)))
| ^
1 error generated.
```
This patch is to fix it.
This patch adds an option to select the method for computing complex
number division. It uses `LoweringOptions` to determine whether to lower
complex division to a runtime function call or to MLIR's `complex.div`,
and `CodeGenOptions` to select the computation algorithm for
`complex.div`. The available option values and their corresponding
algorithms are as follows:
- `full`: Lower to a runtime function call. (Default behavior)
- `improved`: Lower to `complex.div` and expand to Smith's algorithm.
- `basic`: Lower to `complex.div` and expand to the algebraic algorithm.
See also the discussion in the following discourse post:
https://discourse.llvm.org/t/optimization-of-complex-number-division/83468
---------
Co-authored-by: Tarun Prabhu <tarunprabhu@gmail.com>
Instead of emitting globals in the program/default address space, emit
them in the global address space. This also requires changes how address
of code-gen is handled, we need to cast to the default address space to
prevent code-gen issues.
Reland #145901 with a fix for shared library builds.
So far flang generates runtime derived type info global definitions (as
opposed to declarations) for all the types used in the current
compilation unit even when the derived types are defined in other
compilation units. It is using linkonce_odr to achieve derived type
descriptor address "uniqueness" aspect needed to match two derived type
inside the runtime.
This comes at a big compile time cost because of all the extra globals
and their definitions in apps with many and complex derived types.
This patch adds and experimental option to only generate the rtti
definition for the types defined in the current compilation unit and to
only generate external declaration for the derived type descriptor
object of types defined elsewhere.
Note that objects compiled with this option are not compatible with
object files compiled without because files compiled without it may drop
the rtti for type they defined if it is not used in the compilation unit
because of the linkonce_odr aspect.
I am adding the option so that we can better measure the extra cost of
the current approach on apps and allow speeding up some compilation
where devirtualization does not matter (and the build config links to
all module file object anyway).
So far flang generates runtime derived type info global definitions (as
opposed to declarations) for all the types used in the current
compilation unit even when the derived types are defined in other
compilation units. It is using linkonce_odr to achieve derived type
descriptor address "uniqueness" aspect needed to match two derived type
inside the runtime.
This comes at a big compile time cost because of all the extra globals
and their definitions in apps with many and complex derived types.
This patch adds and experimental option to only generate the rtti
definition for the types defined in the current compilation unit and to
only generate external declaration for the derived type descriptor
object of types defined elsewhere.
Note that objects compiled with this option are not compatible with
object files compiled without because files compiled without it may drop
the rtti for type they defined if it is not used in the compilation unit
because of the linkonce_odr aspect.
I am adding the option so that we can better measure the extra cost of
the current approach on apps and allow speeding up some compilation
where devirtualization does not matter (and the build config links to
all module file object anyway).
`fir.local` ops are not supposed to have any uses at this point (i.e.
during lowering to LLVM). In case of serialization, the
`fir.do_concurrent` users are expected to have been lowered to
`fir.do_loop` nests. In case of parallelization, the `fir.do_concurrent`
users are expected to have been lowered to the target parallel model
(e.g. OpenMP).
This hopefully resolved a build issue introduced by
https://github.com/llvm/llvm-project/pull/142567 (see for example:
https://lab.llvm.org/buildbot/#/builders/199/builds/4009).
To map `fir.boxchar` types reliably onto an offload target, such as a
GPU, the `omp.map.info` operation is used to map the underlying data
pointer (`fir.ref<fir.char<k, ?>>`) wrapped by the `fir.boxchar` MLIR
value. The `omp.map.info` operation needs a pointer to the underlying
data pointer.
Given a reference to a descriptor (`fir.box`), the `fir.box_offset` is
used to obtain the address of the underlying data pointer. This PR
extends `fir.box_offset` to provide the same functionality for
`fir.boxchar` as well.
This patch relies on #140235 and #139724 to speed-up compilations of
files with derived type array global with initial value.
Currently, such derived type global init was lowered to an
llvm.mlir.insertvalue chain in the LLVM IR dialect because there was no
way to represent such value via attributes.
This chain was later folded in LLVM dialect to LLVM IR using LLVM IR
(not dialect) folding. This insert chain generation and folding is very
expensive for big arrays. For instance, this patch brings down the
compilation of FM_lib fmsave.f95 from 50s
to 0.5s.
[RFC on
discourse](https://discourse.llvm.org/t/rfc-volatile-representation-in-flang/85404/1)
Flang currently lacks support for volatile variables. For some cases,
the compiler produces TODO error messages and others are ignored. Some
of our tests are like the example from _C.4 Clause 8 notes: The VOLATILE
attribute (8.5.20)_ and require volatile variables.
Prior commits:
```
c9ec1bc753b0 [flang] Handle volatility in lowering and codegen (#135311)
e42f8609858f [flang][nfc] Support volatility in Fir ops (#134858)
b2711e1526f9 [flang][nfc] Support volatile on ref, box, and class types (#134386)
```
This patch,
- Added a new attribute `nontemporal` to fir.load and fir.store operation in the FIR dialect.
- Added a pass `lower-nontemporal` which is called before FIRToLLVM conversion pass and adds the nontemporal attribute to loads and stores on the list items specified in the nontemporal clause of the SIMD directive.
- Set the `UnitAttr:$nontemporal` to llvm.load and llvm.store operations during FIR to LLVM dialect conversion, if the corresponding fir.load or fir.store operations have the nontemporal attribute.
- Attached the `nontemporal metadata` to load and store instructions that have the nontemporal attribute, during LLVM dialect to LLVM IR translation.
* Enable lowering and conversion patterns to pass volatility information
from higher level operations to lower level ones.
* Enable codegen to pass volatility to LLVM dialect ops by setting an
attribute on loads, stores, and memory intrinsics.
* Add utilities for passing along the volatility from an input type to
an output type.
To introduce volatile types into the IR, entities with the volatile
attribute will be given a volatile type in the bridge; this is not
enabled in this patch. User code should not result in IR with volatile
types yet, so this patch contains no tests with Fortran source, only IR
that already contains volatile types.
Part 3 of #132486.
I am making a CG pass to depend on `FIROpenACCSupport` in #134346.
This introduces a cyclic dependency between `FIROpenACCSupport`
and `FIRCodeGen`. This patch splits `FIRCodeGen` into
`FIRCodeGenDialect` (for FIR CG dialect definition) and `FIRCodeGen`
(for the CG passes).
Now, `FIROpenACCSupport` depends on `FIRCodeGenDialect`,
and `FIRCodeGen` depends on `FIROpenACCSupport`.
Global with the CUDA shared data attribute needs to be lowered to llvm
globals with the correct address space (3). Address space is set from
the `mlir::NVVM::NVVMMemorySpace::kSharedMemorySpace` enum from
`mlir/Dialect/LLVMIR/NVVMDialect.h`
The saturated floating point conversion intrinsics match the semantics in the standard more closely than the fptosi/fptoui instructions.
Case 2 of 16.9.100 is
> INT (A [, KIND])
> If A is of type real, there are two cases: if |A| < 1, INT (A) has the
value 0; if |A| ≥ 1, INT (A) is the integer whose magnitude is the
largest integer that does not exceed the magnitude of A and whose sign
is the same as the sign of A.
Currently, converting a floating point value into an integer type too
small to hold the constant will be converted to poison in opt, leaving
us with garbage:
```
> cat t.f90
program main
real(kind=16) :: f
integer(kind=4) :: i
f=huge(f)
i=f
print *, i
end program main
# current upstream
> for i in `seq 10`; do; ./a.out; done
-862156992
-1497393344
-739096768
-1649494208
1761228608
-1959270592
-746244288
-1629194432
-231217344
382322496
```
With the saturated fptoui/fptosi intrinsics, we get the appropriate
values
```
# mine
> flang -O2 ./t.f90 && ./a.out
2147483647
> perl -e 'printf "%d\n", (2 ** 31) - 1'
2147483647
```
One notable difference: NaNs being converted to ints will become zero, unlike current flang (and some other compilers). Newer versions of GCC have this behavior.
Introduce a FIR operation to do memcopy/memmove of compile time constant size types.
This is to avoid requiring derived type copies to done with load/store
which is badly supported in LLVM when the aggregate type is "big" (no
threshold can easily be defined here, better to always avoid them for
fir.type).
This was the root cause of the regressions caused by #114002 which introduced a
load/store of fir.type<> which caused hand/asserts to fire in LLVM on
several benchmarks.
See https://llvm.org/docs/Frontend/PerformanceTips.html#avoid-creating-values-of-aggregate-type
Previous PR: https://github.com/llvm/llvm-project/pull/129308
Changes:
* The alloc-32.fir test is now marked as requiring the X86 target.
* Drive-by fixes uncovered when fixing tests involving malloc
This reverts commit cf1964af5a461196904b663ede04c26555fcff69.
This causes breakage on all the non-x86 buildbots as they don't have the i686
target enabled. This was missed in pre-commit CI.
Although 32-bit targets are currently not officially supported, add a type conversion in the AllocMemOp lowering when calling the `malloc` function on 32-bit targets. This fixes a type mismatch, and this fix makes it easier to potentially support such targets in the future.
This involves making sure the `LLVMTypeConverter` has the necessary information to know the target bit width.
Co-authored-by: Valentin Clement (バレンタイン クレメン) <clementval@gmail.com>
This patch updates fir.coordinate_op to carry the field index as
attributes instead of relying on getting it from the fir.field_index
operations defining its operands.
The rational is that FIR currently has a few operations that require
DAGs to be preserved in order to be able to do code generation. This is
the case of fir.coordinate_op, which requires its fir.field operand
producer to be visible.
This makes IR transformation harder/brittle, so I want to update FIR to
get rid if this.
Codegen/printer/parser of fir.coordinate_of and many tests need to be
updated after this change.
This change is inspired by a case in facerec benchmark, where
performance
of scalar code may improve by about 6%@aarch64 due to getting rid of
redundant
loads from Fortran descriptors. These descriptors are corresponding
to subroutine local ALLOCATABLE, SAVE variables. The scalar loop nest
in LocalMove subroutine contains call to Fortran runtime IO functions,
and LLVM globals-aa analysis cannot prove that these calls do not modify
the globalized descriptors with internal linkage.
This patch sets and propagates llvm.memory_effects attribute for
fir.call
operations calling Fortran runtime functions. In particular, it tries
to set the Other memory effect to NoModRef. The Other memory effect
includes accesses to globals and captured pointers, so we cannot set
it for functions taking Fortran descriptors with one exception
for calls where the Fortran descriptor arguments are all null.
As long as different calls to the same Fortran runtime function may have
different attributes, I decided to attach the attributes to the calls
rather than functions. Moreover, attaching the attributes to func.func
will require propagating these attributes to llvm.func, which is not
happening right now.
In addition to llvm.memory_effects, the new pass sets llvm.nosync
and llvm.nocallback attributes that may also help LLVM alias analysis
(e.g. see #127707). These attributes are ignored currently.
I will support them in LLVM IR dialect in a separate patch.
I also added another pass for developers to be able to print
declarations/calls of all Fortran runtime functions that are recognized
by the attributes setting pass. It should help with maintenance
of the LIT tests.