The `noinline`, `alwaysinline`, and `optnone` function attributes are
already being used in MLIR code for the LLVM inlining interface and in
some SPIR-V lowering, despite residing in the passthrough dictionary,
which is intended as exactly that -- a pass through MLIR -- and not to
model any actual semantics being handled in MLIR itself.
Promote the `noinline`, `alwaysinline`, and `optnone` attributes out of
the passthrough dictionary on `llvm.func` into first class unit
attributes, updating the import and export accordingly.
Add a verifier to `llvm.func` that checks that these attributes are not
set in an incompatible way according to the LLVM specification.
Update the LLVM dialect inlining interface to use the first class
attributes to check whether inlining is possible.
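As a rough illustration of the verifier and inlining check described above, the following standalone sketch encodes the incompatibilities listed in the LLVM language reference. `InlineAttrs`, `verifyInlineAttrs`, and `mayInline` are hypothetical names, not the actual `llvm.func` implementation.

```c++
#include <optional>
#include <string>

// Hypothetical stand-in for the three unit attributes on a function.
struct InlineAttrs {
  bool noInline = false;
  bool alwaysInline = false;
  bool optimizeNone = false;
};

// Returns an error message for an invalid combination, std::nullopt otherwise.
std::optional<std::string> verifyInlineAttrs(const InlineAttrs &attrs) {
  if (attrs.noInline && attrs.alwaysInline)
    return std::string("noinline and alwaysinline are incompatible");
  if (attrs.optimizeNone && attrs.alwaysInline)
    return std::string("optnone and alwaysinline are incompatible");
  if (attrs.optimizeNone && !attrs.noInline)
    return std::string("optnone requires noinline");
  return std::nullopt;
}

// A conservative legality check: a noinline callee is never inlined.
bool mayInline(const InlineAttrs &callee) { return !callee.noInline; }
```

Because `optnone` requires `noinline`, checking `noInline` alone is enough for the legality query in this sketch.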
Also reverts "[MLIR][Flang][DebugInfo] Convert debug format in MLIR translators"
The patch above introduces behaviour controlled by an LLVM flag into the
Flang driver, which is incorrect behaviour.
This reverts commits:
3cc2710e0dd53bb82742904fa13014018a1137ed.
460408f78b30720950040e336f7b566aa7203269.
Reapplies the original patch with some additional conversion layers added
to the MLIR translator, to ensure that we don't write the new debug info
format unless WriteNewDbgInfoFormat is set.
This reverts commit 8c5d9c79b96ed8297b381e00d3a706a432cd6c9d.
Reverted due to failure on buildbot due to missing use of the
WriteNewDbgInfoFormat flag in MLIR.
This reverts commit ca920bb6285e9995f5a202d040af79363e98ab28.
MLIR's LLVM dialect does not internally support debug records, only
converting to/from debug intrinsics. To smooth the transition from
intrinsics to records, there is a step prior to IR->MLIR translation
that switches the IR module to intrinsic-form; this patch adds the
equivalent conversion to record-form at MLIR->IR translation, and also
modifies the flang front end to use the WriteNewDbgInfoFormat flag when
it is emitting LLVM IR.
This commit adds a boolean parameter that allows downstream users to
disable the verification when translating an MLIR module to LLVM IR.
This is helpful for debugging broken LLVM IR modules post translation.
MLIR's LLVMArrayType uses `unsigned` for the number of elements, while
LLVM's ArrayType uses `uint64_t`:
4ae896fe97/llvm/include/llvm/IR/DerivedTypes.h (L377)
This leads to silent truncation when we use it for globals in flang.
```
program test
integer(8), parameter :: large = 2**30
real, dimension(large) :: bigarray
common /c/ bigarray
bigarray(999) = 666
end
```
The above program would result in a segfault since the global would be
of size 0 because of the silent truncation.
```
fir.global common @c_(dense<0> : vector<4294967296xi8>) : !fir.array<4294967296xi8>
```
became
```
llvm.mlir.global common @c_(dense<0> : vector<4294967296xi8>) {addr_space = 0 : i32} : !llvm.array<0 x i8>
```
This patch updates the definition of MLIR ArrayType to take `uint64_t`
as argument of the number of elements to be compatible with LLVM.
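The truncation can be reproduced outside of MLIR. This is a minimal sketch that assumes `unsigned` is 32 bits wide (modeled here with `uint32_t`); the helper names are hypothetical.

```c++
#include <cstdint>

// Element count stored in a 32-bit field, standing in for the old `unsigned`.
uint32_t storeAsUnsigned32(uint64_t numElements) {
  return static_cast<uint32_t>(numElements); // keeps only the low 32 bits
}

// Element count stored at full width, matching LLVM's uint64_t.
uint64_t storeAsUint64(uint64_t numElements) { return numElements; }
```

With the 4294967296 (2^32) elements from the example, `storeAsUnsigned32` yields 0, which matches the `!llvm.array<0 x i8>` seen above, while `storeAsUint64` preserves the count.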
This PR attempts to consolidate the different topological sort utilities
into one place. It adds them to the analysis folder because the
`SliceAnalysis` uses some of these.
There are now two different sorting strategies:
1. Sort only according to SSA use-def chains
2. Sort while taking regions into account. This requires a much more
elaborate traversal and cannot be applied on graph regions that easily.
This additionally reimplements the region aware topological sorting
because the previous implementation had an exponential space complexity.
I'm open to suggestions on how to combine this further or how to fuse
the test passes.
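For reference, the first strategy (sorting purely along use-def chains) can be sketched as a worklist-based topological sort. This is not the MLIR utility itself: `Op` and `sortByUseDef` are hypothetical, and operations are identified by their index in the input vector.

```c++
#include <cstddef>
#include <vector>

// Hypothetical operation: `operands` holds the indices of the operations
// that define its operands. All indices are assumed to be in range.
struct Op {
  std::vector<std::size_t> operands;
};

// Returns op indices ordered so that each definition precedes its users.
// Operations that are part of a use-def cycle are omitted from the result.
std::vector<std::size_t> sortByUseDef(const std::vector<Op> &ops) {
  std::vector<std::size_t> pending(ops.size(), 0);
  std::vector<std::vector<std::size_t>> users(ops.size());
  for (std::size_t i = 0; i < ops.size(); ++i) {
    for (std::size_t def : ops[i].operands) {
      users[def].push_back(i);
      ++pending[i];
    }
  }
  std::vector<std::size_t> order;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (pending[i] == 0)
      order.push_back(i);
  // `order` doubles as the worklist: visit each scheduled op once and
  // release the users whose operands are now all scheduled.
  for (std::size_t next = 0; next < order.size(); ++next)
    for (std::size_t user : users[order[next]])
      if (--pending[user] == 0)
        order.push_back(user);
  return order;
}
```

The region-aware strategy needs more bookkeeping than this, since an operation nested in a region may use values defined outside of it.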
This commit renames the block sorting utility function to
`getBlocksSortedByDominance`. A topological order is not defined on a
general directed graph, so the previous name did not make sense.
Currently, only those global variables which are at compile unit scope
are added to the 'globals' list of the DICompileUnit. This does not work
for languages which support modules (e.g. Fortran), where the hierarchy
can be variable -> module -> compile unit.
To fix this, if a variable's scope points to a module, we walk one level
up and check whether the module is at compile unit scope.
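The scope check can be sketched as follows. `Scope`, `ScopeKind`, and `isCompileUnitGlobal` are hypothetical names used only for illustration; the real code works on LLVM dialect debug info attributes.

```c++
enum class ScopeKind { CompileUnit, Module, Other };

// Hypothetical debug info scope node with a link to its parent scope.
struct Scope {
  ScopeKind kind;
  const Scope *parent; // nullptr for a root scope
};

// True if a variable with scope `scope` belongs in the compile unit's
// globals list: either directly at compile unit scope, or inside a module
// that is itself at compile unit scope.
bool isCompileUnitGlobal(const Scope *scope) {
  if (!scope)
    return false;
  if (scope->kind == ScopeKind::CompileUnit)
    return true;
  // Walk exactly one level up when the scope is a module.
  return scope->kind == ScopeKind::Module && scope->parent &&
         scope->parent->kind == ScopeKind::CompileUnit;
}
```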
This was initially part of #91582 which adds debug information for
Fortran module variables. @kiranchandramohan pointed out that MLIR
changes should go in separate PRs.
Add operation mapping to the LLVM
`llvm.experimental.constrained.fptrunc.*` intrinsic.
The new operation implements the new
`LLVM::FPExceptionBehaviorOpInterface` and
`LLVM::RoundingModeOpInterface` interfaces.
---------
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
When importing from LLVM IR the data layout of all pointer types
contains an index bitwidth that should be used for index computations.
This revision adds a getter to the DataLayout that provides access to
the already stored bitwidth. The function returns an optional since only
pointer-like types have an index bitwidth. Querying the bitwidth of a
non-pointer type returns std::nullopt.
The new function works for the built-in Index type and, using a type
interface, for the LLVMPointerType.
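The shape of such a getter can be sketched without MLIR. `TypeInfo` and `getIndexBitwidth` below are hypothetical stand-ins, not the actual DataLayout API.

```c++
#include <cstdint>
#include <optional>

// Hypothetical type description; only pointer-like types carry an index width.
struct TypeInfo {
  bool isPointerLike;
  uint64_t indexBitwidth; // meaningful only when isPointerLike is true
};

// Returns the stored index bitwidth, or std::nullopt for non-pointer types.
std::optional<uint64_t> getIndexBitwidth(const TypeInfo &type) {
  if (!type.isPointerLike)
    return std::nullopt;
  return type.indexBitwidth;
}
```

Returning an optional forces callers to handle the non-pointer case explicitly instead of receiving a sentinel such as 0.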
This commit fixes the translation of access group metadata to LLVM IR.
Previously, it did not use a temporary metadata node to model the
placeholder of the self-referencing access group nodes. This is
dangerous since the translation may produce a metadata list with a null
entry that is later replaced with a self-reference. In the meantime,
another translation step, for example the debug info translation, may
create the same uniqued node, which then suddenly references the access
group metadata once the self-reference is set. The commit avoids such
breakages.
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:1050:11:
error: variable 'numConstantsHit' set but not used [-Werror,-Wunused-but-set-variable]
int numConstantsHit = 0;
^
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:1051:11:
error: variable 'numConstantsErased' set but not used [-Werror,-Wunused-but-set-variable]
int numConstantsErased = 0;
^
2 errors generated.
There is a memory explosion when converting the body or initializer
region of a large global variable, e.g. a constant array.
For example, when translating a constant array of 100000 strings:
llvm.mlir.global internal constant @cats_strings() {addr_space = 0 : i32, alignment = 16 : i64} : !llvm.array<100000 x ptr<i8>> {
  %0 = llvm.mlir.undef : !llvm.array<100000 x ptr<i8>>
  %1 = llvm.mlir.addressof @om_1 : !llvm.ptr<array<1 x i8>>
  %2 = llvm.getelementptr %1[0, 0] : (!llvm.ptr<array<1 x i8>>) -> !llvm.ptr<i8>
  %3 = llvm.insertvalue %2, %0[0] : !llvm.array<100000 x ptr<i8>>
  %4 = llvm.mlir.addressof @om_2 : !llvm.ptr<array<1 x i8>>
  %5 = llvm.getelementptr %4[0, 0] : (!llvm.ptr<array<1 x i8>>) -> !llvm.ptr<i8>
  %6 = llvm.insertvalue %5, %3[1] : !llvm.array<100000 x ptr<i8>>
  %7 = llvm.mlir.addressof @om_3 : !llvm.ptr<array<1 x i8>>
  %8 = llvm.getelementptr %7[0, 0] : (!llvm.ptr<array<1 x i8>>) -> !llvm.ptr<i8>
  %9 = llvm.insertvalue %8, %6[2] : !llvm.array<100000 x ptr<i8>>
  %10 = llvm.mlir.addressof @om_4 : !llvm.ptr<array<1 x i8>>
  %11 = llvm.getelementptr %10[0, 0] : (!llvm.ptr<array<1 x i8>>) -> !llvm.ptr<i8>
  %12 = llvm.insertvalue %11, %9[3] : !llvm.array<100000 x ptr<i8>>
  ... (ignore the remaining part)
}
where @om_1, @om_2, ... are string global constants.
Each time an operation is converted to LLVM, a new constant is created.
When it comes to llvm.insertvalue, a new constant array of 100000
elements is created and the old constant array (input) is not destroyed.
This causes a memory explosion. We observed that, on a system with 128 GB
of memory, the translation of 100000 elements got killed after using up
all the memory. On a system with 64 GB, 65536 elements were enough to
get the translation killed.
There is a previous patch (https://reviews.llvm.org/D148487) which fixed
this issue but was reverted because of
https://github.com/llvm/llvm-project/issues/62802
The old patch checked the generated constants and destroyed them if they
had no uses. But the use check happened too early, which caused a
constant to be removed before it was used.
This new patch adds a map that saves the expected use count for each
constant. The count is decremented when each use is reached, and the
constant is erased only when its use count reaches zero.
With the new patch, the repro in
https://github.com/llvm/llvm-project/issues/62802 finishes correctly.
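The use-counting scheme can be illustrated with a small standalone sketch. `ConstantUseTracker` is a hypothetical class that tracks constants by name; the actual patch operates on LLVM constants inside the translation.

```c++
#include <cstddef>
#include <string>
#include <unordered_map>

// Tracks how many uses of each intermediate constant are still outstanding.
class ConstantUseTracker {
public:
  // Records a freshly created constant together with its expected use count.
  void add(const std::string &name, std::size_t expectedUses) {
    remaining[name] = expectedUses;
  }

  // Call once per translated use. Returns true when the last expected use
  // was just consumed and the constant has been dropped from the tracker.
  bool consumeUse(const std::string &name) {
    auto it = remaining.find(name);
    if (it == remaining.end() || it->second == 0)
      return false;
    if (--it->second > 0)
      return false;
    remaining.erase(it);
    return true;
  }

  std::size_t liveCount() const { return remaining.size(); }

private:
  std::unordered_map<std::string, std::size_t> remaining;
};
```

Erasing only when the counter reaches zero is what avoids the premature removal that caused the earlier revert.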
Add support for attribute nvvm.grid_constant on LLVM function arguments.
The attribute can be attached only to arguments of type llvm.ptr that
have llvm.byval attribute.
Generate LLVM metadata for functions with nvvm.grid_constant arguments.
The metadata node is a list of integers, where each integer n denotes
that the nth parameter has the
grid_constant annotation (numbering from 1). The generated metadata node
will be handled by NVVM compiler. See
https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html#supported-properties
for documentation on grid_constant property.
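The 1-based numbering can be sketched as below. `gridConstantIndices` is a hypothetical helper that takes one flag per parameter; it only illustrates how the integer list is derived, not how the metadata node is built.

```c++
#include <cstddef>
#include <vector>

// Returns the 1-based positions of the parameters flagged as grid_constant.
std::vector<int> gridConstantIndices(const std::vector<bool> &isGridConstant) {
  std::vector<int> indices;
  for (std::size_t i = 0; i < isGridConstant.size(); ++i)
    if (isGridConstant[i])
      indices.push_back(static_cast<int>(i) + 1);
  return indices;
}
```

For a function whose second and third parameters carry the attribute, the resulting list is 2, 3.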
This patch also adds convertParameterAttr to
LLVMTranslationDialectInterface to support the translation of
derived dialect attributes on function parameters.
Since https://github.com/ARM-software/acle/pull/276 the ACLE
defines attributes to better describe the use of a given SME state.
Previously the attributes merely described the possibility of it being
'shared' or 'preserved', whereas the new attributes have more semantics
and also describe how the data flows through the program.
For ZT0 we already had to add new LLVM IR attributes:
* aarch64_new_zt0
* aarch64_in_zt0
* aarch64_out_zt0
* aarch64_inout_zt0
* aarch64_preserves_zt0
We have now done the same for ZA, such that we add:
* aarch64_new_za (previously `aarch64_pstate_za_new`)
* aarch64_in_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_out_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_inout_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_preserves_za (previously `aarch64_pstate_za_shared, aarch64_pstate_za_preserved`)
This explicitly removes 'pstate' from the name, because with SME2 and
the new ACLE attributes there is a difference between "sharing ZA"
(sharing the ZA matrix register with the caller) and "sharing PSTATE.ZA"
(sharing either the ZA or ZT0 register, both part of PSTATE.ZA, with the
caller).
Adds unsafe-fp-math, no-infs-fp-math, no-nans-fp-math,
approx-func-fp-math, and no-signed-zeros-fp-math function attributes.
This allows code generators using the LLVMIR dialect to match the
codegen of Clang.
This patch adds support for translating dense_resource attributes to
LLVMIR Target.
The support added is similar to how DenseElementsAttr is handled, except
we don't need to handle splats.
Another possible way of doing this is adding iteration on
dense_resource, but that is non-trivial as DenseResourceAttr is not
meant to be something you should directly access. It has subclasses
which you are supposed to use to iterate on it.
This patch adds the target_cpu attribute to llvm.func MLIR operations
and updates the translation to/from LLVM IR to match "target-cpu"
function attributes.
This commit changes the MLIR to LLVMIR export to also attach subprogram
debug attachments to function declarations.
This commit additionally fixes the two passes that produce subprograms
so that they do not attach the "Definition" flag to function
declarations, which would otherwise result in invalid LLVM IR.
Add a rewriter for DIExpressions & use it to run legalization patterns
before exporting to llvm (because LLVM dialect allows DI Expressions
that may not be valid in LLVM IR).
The rewriter driver works similarly to the existing mlir rewriter
drivers, except it operates on lists of DIExpressionElemAttr (i.e.
DIExpressionAttr). Each rewrite pattern transforms a range of
DIExpressionElemAttr into a new list of DIExpressionElemAttr.
In addition, this PR sets up a place to add legalization patterns that
are broadly applicable internally to the LLVM dialect, and they will
always be applied prior to export. This PR adds one pattern for merging
fragment operators.
---------
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
This patch is based on a previous PR https://reviews.llvm.org/D144657
that added alloca address space handling to MLIR's DataLayout and DLTI
interface. This patch aims to add identical features to import and
access the global and program memory space through MLIR's
DataLayout/DLTI system.
Extend the `amendOperation` mechanism for translating dialect attributes
attached to operations from another dialect when translating MLIR to
LLVM IR. Previously, this mechanism would have no knowledge of the LLVM
IR instructions created for the given operation, making it impossible
for it to perform local modifications such as attaching operation-level
metadata. Collect instructions inserted by the LLVM IR builder and pass
them to `amendOperation`.
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
This commit ensures that we model DI information for global constants
correctly. These constructs can lack scopes, names, and linkage names,
so these parameters were made optional for the DIGlobalVariable
attribute.
This patch adds a target_features (TargetFeaturesAttr) to the LLVM
dialect to allow setting and querying the features in use on a function.
The motivation for this comes from the Arm SME dialect where we would
like a convenient way to check what variants of an operation are
available based on the CPU features.
Intended usage:
The target_features attribute is populated manually or by a pass:
```mlir
func.func @example() attributes {
target_features = #llvm.target_features<["+sme", "+sve", "+sme-f64f64"]>
} {
// ...
}
```
Then within a later rewrite the attribute can be checked, and used to
make lowering decisions.
```c++
// Finds the "target_features" attribute on the parent
// FunctionOpInterface.
auto targetFeatures = LLVM::TargetFeaturesAttr::featuresAt(op);
// Check a feature.
// Returns false if targetFeatures is null or the feature is not in
// the list.
if (!targetFeatures.contains("+sme-f64f64"))
return failure();
```
For now, this is rather simple and just checks if the exact feature is
in the list, though it could be extended with implied features using
information from LLVM.
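The exact-match behaviour can be illustrated with a standalone sketch. `FeatureList` is a hypothetical stand-in for the attribute and is not the `TargetFeaturesAttr` implementation.

```c++
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a feature list such as ["+sme", "+sve"].
struct FeatureList {
  std::vector<std::string> features;

  // Exact string match only: implied features are not considered.
  bool contains(const std::string &feature) const {
    return std::find(features.begin(), features.end(), feature) !=
           features.end();
  }
};
```

With the list from the example above, a query for "+sme-f64f64" succeeds, while a query for a feature that is merely implied by another one would fail.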
Add support for frame pointers in MLIR.
---------
Co-authored-by: Markus Böck <markus.boeck02@gmail.com>
Co-authored-by: Christian Ulmann <christianulmann@gmail.com>
This PR introduces DIGlobalVariableAttr and
DIGlobalVariableExpressionAttr so that ModuleTranslation can emit the
metadata needed for debug information about global variables.
The translator implementation for debug metadata needed to be refactored
in order to allow translation of nodes based on MDNode
(DIGlobalVariableExpressionAttr and DIExpression) in addition to
DINode-based nodes.
A DIGlobalVariableExpressionAttr can now be passed to the GlobalOp
operation directly and ModuleTranslation will create the respective
DIGlobalVariable and DIGlobalVariableExpression nodes. The compile unit
that DIGlobalVariable is expected to be configured with will be updated
with the created DIGlobalVariableExpression.
In the early days of MLIR-to-LLVM IR translation, it had to forcefully
inject declarations of `malloc` and `free` functions as then-standard
(now `memref`) dialect ops were unconditionally lowering to libc calls.
This is no longer the case. Even when they do lower to libc calls, the
signatures of those methods are injected at lowering since calls must
target declared functions in valid IR. Don't inject those declarations
anymore.
This extends `LLVM_IntrOpBase` so that it can be passed a list of
`immArgPositions` and a list (of the same length) of `immArgAttrNames`.
`immArgPositions` contains the positions of `immargs` on the LLVM IR
intrinsic, and `immArgAttrNames` maps those to a corresponding MLIR
attribute.
This allows modeling LLVM `immargs` as MLIR attributes, which is the
closest match semantically (and had already been done manually for the
LLVM dialect intrinsics).
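The relationship between the two lists can be sketched in plain C++. `mapImmArgs` is a hypothetical helper, and the attribute names used with it are placeholders; the real mechanism lives in TableGen.

```c++
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Pairs each immarg position with its attribute name. The two lists must
// have the same length, mirroring the constraint described above.
std::map<unsigned, std::string>
mapImmArgs(const std::vector<unsigned> &immArgPositions,
           const std::vector<std::string> &immArgAttrNames) {
  assert(immArgPositions.size() == immArgAttrNames.size() &&
         "position and name lists must have the same length");
  std::map<unsigned, std::string> result;
  for (std::size_t i = 0; i < immArgPositions.size(); ++i)
    result[immArgPositions[i]] = immArgAttrNames[i];
  return result;
}
```

Intrinsic operands whose position is absent from the map remain ordinary SSA operands.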
This has two upsides:
* It's slightly easier to implement intrinsics with immargs now
(especially if they make use of other features, such as overloads)
* It clearly defines that `immargs` should map to attributes. Before,
there was no mention of `immargs` in LLVMOpBase.td, so it was unclear
how to implement them
This works with other features of the `LLVM_IntrOpBase`, so `immargs`
can be marked as overloaded too (which is used in some intrinsics).
As part of this patch (and to test correctness) existing intrinsics have
been updated to use these new parameters.
This also uncovered a few issues with the
`llvm.intr.vector.insert/extract` intrinsics. First, the argument order
for insert did not match the LLVM intrinsic, and secondly, both were
missing a mlirBuilder (so failed to import from LLVM IR). This is
corrected with this patch (and a test case added).
Data layout queries may be issued for types whose size exceeds the range
of 32-bit integer as well as for types that don't have a size known at
compile time, such as scalable vectors. Use best practices from LLVM IR
and adopt `llvm::TypeSize` for size-related queries and `uint64_t` for
alignment-related queries.
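To show why a 32-bit integer is not enough for either case, here is a simplified model of a size that may be scalable. `SizeInBits` and `resolve` are hypothetical and only loosely modeled on the idea behind `llvm::TypeSize`; they do not reproduce its API.

```c++
#include <cstdint>

// Hypothetical miniature of a size that may be a multiple of a runtime vscale.
struct SizeInBits {
  uint64_t knownMin; // exact size if fixed, minimum size if scalable
  bool scalable;
};

// A fixed size that does not fit in 32 bits: 2^32 bytes expressed in bits.
constexpr SizeInBits hugeArray{(uint64_t{1} << 32) * 8, false};

// A scalable vector of four i32 lanes: at least 128 bits, multiplied by a
// factor that is unknown at compile time.
constexpr SizeInBits scalableVector{4 * 32, true};

// Resolves the size once vscale is known (vscale is ignored for fixed sizes).
constexpr uint64_t resolve(SizeInBits size, uint64_t vscale) {
  return size.scalable ? size.knownMin * vscale : size.knownMin;
}
```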
See #72678.
Previously, we were inserting za.enable/disable intrinsics for functions
with the "arm_za" attribute (at the MLIR level), rather than using the
backend attributes. This was done to avoid a dependency on the SME ABI
functions from compiler-rt (which have only recently been implemented).
Doing things this way did have correctness issues, for example, calling
a streaming-mode function from another streaming-mode function (both
with ZA enabled) would lead to ZA being disabled after returning to the
caller (where it should still be enabled). Fixing issues like this would
require re-doing the ABI work already done in the backend within MLIR.
Instead, this patch switches to use the "arm_new_za" (backend) attribute
for enabling ZA for an MLIR function. For the integration tests, this
requires some way of linking the SME ABI functions. This is done via the
`%arm_sme_abi_shlib` lit substitution. By default, this expands to a
stub implementation of the SME ABI functions, but this can be overridden
by providing the `ARM_SME_ABI_ROUTINES_SHLIB` CMake cache variable
(pointing it at an alternative implementation). For now, the ArmSME
integration tests pass with just stubs, as we don't make use of nested
ZA-enabled calls.
A future patch may add an option to compiler-rt to build the SME
builtins into a standalone shared library to allow easily
building/testing with the actual implementation.
This commit fixes a bug in the Mem2Reg operation erasure order.
Replacing the use-def based topological order with a dominance-based
weak order ensures that no operation is removed before all its uses have
been replaced. The order relation uses the topological order of blocks
and block internal ordering to determine a deterministic operation
order.
Additionally, the reliance on the `DenseMap` key order was eliminated by
switching to a `MapVector`, which gives a deterministic iteration order.
Example:
```
%ptr = alloca ...
...
%val0 = load %ptr ... // LOAD0
store %val0 %ptr ...
%val1 = load %ptr ... // LOAD1
```
When promoting the slot backing %ptr, it can happen that LOAD0 is
cleaned before LOAD1. This results in all uses of LOAD0 being replaced
by its reaching definition before LOAD1's result is replaced by LOAD0's
result. The subsequent erasure of LOAD0 can thus not succeed, as it has
remaining uses.
The vscale_range attribute is used for scalable vector functionality in
the Arm Scalable Vector Extension to select the size of vector
operations (and I think RISC-V has something similar).
This patch adds the base support for the vscale_range attribute to the
LLVM::FuncOp, and the marshalling for translation to LLVM-IR and import
from LLVM-IR to LLVM dialect.
This attribute is intended to be used at higher level MLIR, specified
either by command-line options to the compiler or using compiler
directives (e.g. pragmas or function attributes in the source code) to
indicate the desired range.
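As a small illustration of what the range expresses on SVE, where a scalable vector register holds 128 * vscale bits, consider the sketch below. `VScaleRange` and the helper functions are hypothetical and not part of the LLVM dialect.

```c++
#include <cstdint>

// Hypothetical mirror of a vscale_range(min, max) attribute.
struct VScaleRange {
  uint32_t min;
  uint32_t max;
};

// On SVE a scalable vector register holds 128 * vscale bits.
constexpr uint32_t kSveGranuleBits = 128;

constexpr uint32_t minVectorBits(VScaleRange r) {
  return r.min * kSveGranuleBits;
}
constexpr uint32_t maxVectorBits(VScaleRange r) {
  return r.max * kSveGranuleBits;
}

// With min == max the vector length is known at compile time.
constexpr bool isFixedLength(VScaleRange r) { return r.min == r.max; }
```

A range of (1, 16) therefore covers vector lengths from 128 to 2048 bits, while (2, 2) pins the length to 256 bits.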
Use the recently introduced llvm.mlir.zero operation for values with
LLVM target extension type. Replaces the previous workaround that uses a
single zero-valued integer attribute constant operation.
Signed-off-by: Lukas Sommer <lukas.sommer@codeplay.com>
Default atomic ordering information is processed in the OpenMP dialect
to LLVM IR lowering stage at every spot where an operation can be
affected by it. The rest of the clauses are stored globally in the
OpenMPIRBuilderConfig object before starting that lowering stage, so
that the OMPIRBuilder can conditionally modify code generation
depending on these. At the end of the process, the omp.requires
attribute is itself lowered into a global constructor that passes these
clauses as flags to the OpenMP runtime.
Depends on D147217, D147218 and D158278.
Differential Revision: https://reviews.llvm.org/D147219
This patch updates the `OpenMPIRBuilderConfig` structure to hold all
available 'requires' clauses, and it replicates part of the code
generation for the 'requires' registration function from clang in the
`OMPIRBuilder`, to be used with flang.
Porting the remaining features of the clang implementation to the IRBuilder
and sharing it between clang and flang remains for a future patch, due to the
complexity of the logic selecting the attributes of the generated
registration function.
Differential Revision: https://reviews.llvm.org/D147217
This patch moves the call for translating an MLIR module to LLVM IR to the
beginning of the translation process. This enables the use of dialect
attributes attached to `builtin.module` operations and the `amendOperation()`
flow to initialize dialect-specific global configuration before translating
the contents of the module.
Currently, this patch does not impact the generated IR on its own. Testing
infrastructure to allow translating the Test dialect to LLVM IR is added, so
that it can be checked that the current behavior is not broken in the future.
Differential Revision: https://reviews.llvm.org/D158278
Change the LLVM dialect to LLVM IR translation to convert the alias
scope attributes lazily to LLVM IR metadata. Previously, the alias
scopes were translated upfront by walking the alias scopes of
operations that implement the AliasAnalysisOpInterface. As a result,
the translation of a module that contains only a noalias scope
intrinsic failed, since its alias scope attribute had not been
translated because the intrinsic does not implement
AliasAnalysisOpInterface.
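The lazy scheme amounts to translating on first request and caching the result. A minimal sketch, assuming scopes can be keyed by name; `LazyScopeTranslator` is hypothetical and returns integer ids in place of LLVM metadata nodes.

```c++
#include <string>
#include <unordered_map>

// Translates alias scopes on first request and memoizes the result.
class LazyScopeTranslator {
public:
  // Returns the id of the translated node, creating it if needed.
  int getOrCreate(const std::string &scopeName) {
    auto it = translated.find(scopeName);
    if (it != translated.end())
      return it->second;
    int id = nextId++;
    translated.emplace(scopeName, id);
    return id;
  }

  int numTranslated() const { return static_cast<int>(translated.size()); }

private:
  std::unordered_map<std::string, int> translated;
  int nextId = 0;
};
```

Because any user can request a scope, it no longer matters whether the operation carrying the attribute implements a particular interface.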
Reviewed By: zero9178
Differential Revision: https://reviews.llvm.org/D159187
Many previous sets of AMDGPU dialect code have been incorrect in the
presence of the bf16 type (when lowered to LLVM's bfloat) as they were
developed in a setting that ran a custom bf16-to-i16 pass before LLVM
lowering.
An overall effect of this patch is that you should run
--arith-emulate-unsupported-floats="source-types=bf16 target-type=f32"
on your GPU module before calling --convert-gpu-to-rocdl if your code
performs bf16 arithmetic.
While LLVM now supports software bfloat, initial experiments showed
that using this support on AMDGPU inserted a large number of
conversions around loads and stores which had substantial performance
impacts. Furthermore, all of the native AMDGPU operations on bf16
types (like the WMMA operations) operate on 16-bit integers instead of
the bfloat type.
First, we make the following changes to preserve compatibility once
the LLVM bfloat type is reenabled.
1. The matrix multiplication operations (MFMA and WMMA) will bitcast
bfloat vectors to i16 vectors.
2. Buffer loads and stores will operate on the relevant integer
datatype and then cast to bfloat if needed.
Second, we add type conversions to convert bf16 and vectors of it to
equivalent i16 types.
Third, we add the bfloat <-> f32 expansion patterns to the set of
operations run before the main LLVM conversion so that MLIR's
implementation of these conversion routines is used.
Finally, we extend the "floats treated as integers" support in the
LLVM exporter to handle types other than fp8.
We also fix a bug in the unsupported floats emulation where it tried
to operate on `arith.bitcast` due to an oversight.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D156361
This patch modifies the MLIR-to-LLVMIR translation pass to allow dialect
attributes attached to external functions to be processed by the corresponding
dialect's translation interface via `amendOperation()`.
Differential Revision: https://reviews.llvm.org/D156988