Since https://github.com/ARM-software/acle/pull/276 the ACLE
defines attributes to better describe the use of a given SME state.
Previously the attributes merely described the possibility of it being
'shared' or 'preserved', whereas the new attributes have more semantics
and also describe how the data flows through the program.
For ZT0 we already had to add new LLVM IR attributes:
* aarch64_new_zt0
* aarch64_in_zt0
* aarch64_out_zt0
* aarch64_inout_zt0
* aarch64_preserves_zt0
We have now done the same for ZA, such that we add:
* aarch64_new_za (previously `aarch64_pstate_za_new`)
* aarch64_in_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_out_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_inout_za (more specific variation of
`aarch64_pstate_za_shared`)
* aarch64_preserves_za (previously `aarch64_pstate_za_shared,
aarch64_pstate_za_preserved`)
This explicitly removes 'pstate' from the name, because with SME2 and
the new ACLE attributes there is a difference between "sharing ZA"
(sharing
the ZA matrix register with the caller) and "sharing PSTATE.ZA" (sharing
either the ZA or ZT0 register, both part of PSTATE.ZA with the caller).
Adds unsafe-fp-math, no-infs-fp-math, no-nans-fp-math,
approx-func-fp-math, and no-signed-zeros-fp-math function attributes.
This allows code generators using the LLVMIR dialect to match the
codegen of Clang.
This patch adds support for translating dense_resource attributes to
LLVMIR Target.
The support added is similar to how DenseElementsAttr is handled, except
we
don't need to handle splats.
Another possible way of doing this is adding iteration on
dense_resource, but that is
non-trivial as DenseResourceAttr is not meant to be something you should
directly
access. It has subclasses which you are supposed to use to iterate on
it.
This patch adds the target_cpu attribute to llvm.func MLIR operations
and updates the translation to/from LLVM IR to match "target-cpu"
function attributes.
This commit changes the MLIR to LLVMIR export to also attach subprogram
debug attachements to function declarations.
This commit additonally fixes the two passes that produce subprograms to
not attach the "Definition" flag to function declarations. This
otherwise results in invalid LLVM IR.
Add a rewriter for DIExpressions & use it to run legalization patterns
before exporting to llvm (because LLVM dialect allows DI Expressions
that may not be valid in LLVM IR).
The rewriter driver works similarly to the existing mlir rewriter
drivers, except it operates on lists of DIExpressionElemAttr (i.e.
DIExpressionAttr). Each rewrite pattern transforms a range of
DIExpressionElemAttr into a new list of DIExpressionElemAttr.
In addition, this PR sets up a place to add legalization patterns that
are broadly applicable internally to the LLVM dialect, and they will
always be applied prior to export. This PR adds one pattern for merging
fragment operators.
---------
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
This patch is based on a previous PR https://reviews.llvm.org/D144657
that added alloca address space handling to MLIR's DataLayout and DLTI
interface. This patch aims to add identical features to import and
access the global and program memory space through MLIR's
DataLayout/DLTI system.
Extend the `amendOperation` mechanism for translating dialect attributes
attached to operations from another dialect when translating MLIR to
LLVM IR. Previously, this mechanism would have no knowledge of the LLVM
IR instructions created for the given operation, making it impossible
for it to perform local modifications such as attaching operation-level
metadata. Collect instructions inserted by the LLVM IR builder and pass
them to `amendOperation`.
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
This commit ensures that we model DI information for global constants
correctly. These constructs can lack scopes, names, and linkage names,
so these parameters were made optional for the DIGlobalVariable
attribute.
This patch adds a target_features (TargetFeaturesAttr) to the LLVM
dialect to allow setting and querying the features in use on a function.
The motivation for this comes from the Arm SME dialect where we would
like a convenient way to check what variants of an operation are
available based on the CPU features.
Intended usage:
The target_features attribute is populated manually or by a pass:
```mlir
func.func @example() attributes {
target_features = #llvm.target_features<["+sme", "+sve", "+sme-f64f64"]>
} {
// ...
}
```
Then within a later rewrite the attribute can be checked, and used to
make lowering decisions.
```c++
// Finds the "target_features" attribute on the parent
// FunctionOpInterface.
auto targetFeatures = LLVM::TargetFeaturesAttr::featuresAt(op);
// Check a feature.
// Returns false if targetFeatures is null or the feature is not in
// the list.
if (!targetFeatures.contains("+sme-f64f64"))
return failure();
```
For now, this is rather simple just checks if the exact feature is in
the list, though it could be possible to extend with implied features
using information from LLVM.
Add support for frame pointers in MLIR.
---------
Co-authored-by: Markus Böck <markus.boeck02@gmail.com>
Co-authored-by: Christian Ulmann <christianulmann@gmail.com>
This PR introduces DIGlobalVariableAttr and
DIGlobalVariableExpressionAttr so that ModuleTranslation can emit the
required metadata needed for debug information about global variable.
The translator implementation for debug metadata needed to be refactored
in order to allow translation of nodes based on MDNode
(DIGlobalVariableExpressionAttr and DIExpression) in addition to
DINode-based nodes.
A DIGlobalVariableExpressionAttr can now be passed to the GlobalOp
operation directly and ModuleTranslation will create the respective
DIGlobalVariable and DIGlobalVariableExpression nodes. The compile unit
that DIGlobalVariable is expected to be configured with will be updated
with the created DIGlobalVariableExpression.
In the early days of MLIR-to-LLVM IR translation, it had to forcefully
inject declarations of `malloc` and `free` functions as then-standard
(now `memref`) dialect ops were unconditionally lowering to libc calls.
This is no longer the case. Even when they do lower to libc calls, the
signatures of those methods are injected at lowering since calls must
target declared functions in valid IR. Don't inject those declarations
anymore.
This extends `LLVM_IntrOpBase` so that it can be passed a list of
`immArgPositions` and a list (of the same length) of `immArgAttrNames`.
`immArgPositions` contains the positions of `immargs` on the LLVM IR
intrinsic, and `immArgAttrNames` maps those to a corresponding MLIR
attribute.
This allows modeling LLVM `immargs` as MLIR attributes, which is the
closest match semantically (and had already been done manually for the
LLVM dialect intrinsics).
This has two upsides:
* It's slightly easier to implement intrinsics with immargs now
(especially if they make use of other features, such as overloads)
* It clearly defines that `immargs` should map to attributes, before
there was no mention of `immargs` in LLVMOpBase.td, so implementing them
was unclear
This works with other features of the `LLVM_IntrOpBase`, so `immargs`
can be marked as overloaded too (which is used in some intrinsics).
As part of this patch (and to test correctness) existing intrinsics have
been updated to use these new parameters.
This also uncovered a few issues with the
`llvm.intr.vector.insert/extract` intrinsics. First, the argument order
for insert did not match the LLVM intrinsic, and secondly, both were
missing a mlirBuilder (so failed to import from LLVM IR). This is
corrected with this patch (and a test case added).
Data layout queries may be issued for types whose size exceeds the range
of 32-bit integer as well as for types that don't have a size known at
compile time, such as scalable vectors. Use best practices from LLVM IR
and adopt `llvm::TypeSize` for size-related queries and `uint64_t` for
alignment-related queries.
See #72678.
Previously, we were inserting za.enable/disable intrinsics for functions
with the "arm_za" attribute (at the MLIR level), rather than using the
backend attributes. This was done to avoid a dependency on the SME ABI
functions from compiler-rt (which have only recently been implemented).
Doing things this way did have correctness issues, for example, calling
a streaming-mode function from another streaming-mode function (both
with ZA enabled) would lead to ZA being disabled after returning to the
caller (where it should still be enabled). Fixing issues like this would
require re-doing the ABI work already done in the backend within MLIR.
Instead, this patch switches to use the "arm_new_za" (backend) attribute
for enabling ZA for an MLIR function. For the integration tests, this
requires some way of linking the SME ABI functions. This is done via the
`%arm_sme_abi_shlib` lit substitution. By default, this expands to a
stub implementation of the SME ABI functions, but this can be overridden
by providing the `ARM_SME_ABI_ROUTINES_SHLIB` CMake cache variable
(pointing it at an alternative implementation). For now, the ArmSME
integration tests pass with just stubs, as we don't make use of nested
ZA-enabled calls.
A future patch may add an option to compiler-rt to build the SME
builtins into a standalone shared library to allow easily
building/testing with the actual implementation.
This commit fixes a bug in the Mem2Reg operation erasure order.
Replacing the use-def based topological order with a dominance-based
weak order ensures that no operation is removed before all its uses have
been replaced. The order relation uses the topological order of blocks
and block internal ordering to determine a deterministic operation
order.
Additionally, the reliance on the `DenseMap` key order was eliminated by
switching to a `MapVector`, that gives a deterministic iteration order.
Example:
```
%ptr = alloca ...
...
%val0 = %load %ptr ... // LOAD0
store %val0 %ptr ...
%val1 = load %ptr ... // LOAD1
````
When promoting the slot backing %ptr, it can happen that the LOAD0 was
cleaned before LOAD1. This results in all uses of LOAD0 being replaced
by its reaching definition, before LOAD1's result is replaced by LOAD0's
result. The subsequent erasure of LOAD0 can thus not succeed, as it has
remaining usages.
The vscale_range is used for scalabale vector functionality in Arm
Scalable Vector Extension to select the size of vector operation (and I
thnk RISCV has something similar).
This patch adds the base support for the vscale_range attribute to the
LLVM::FuncOp, and the marshalling for translation to LLVM-IR and import
from LLVM-IR to LLVM dialect.
This attribute is intended to be used at higher level MLIR, specified
either by command-line options to the compiler or using compiler
directives (e.g. pragmas or function attributes in the source code) to
indicate the desired range.
Use the recently introduced llvm.mlir.zero operation for values with
LLVM target extension type. Replaces the previous workaround that uses a
single zero-valued integer attribute constant operation.
Signed-off-by: Lukas Sommer <lukas.sommer@codeplay.com>
Default atomic ordering information is processed in the OpenMP dialect
to LLVM IR lowering stage at every spot where an operation can be
affected by it. The rest of clauses are stored globally in the
OpenMPIRBuilderConfig object before starting that lowering stage, so
that the OMPIRBuilder can conditionally modify code generation
depending on these. At the end of the process, the omp.requires
attribute is itself lowered into a global constructor that passes these
clauses as flags to the OpenMP runtime.
Depends on D147217, D147218 and D158278.
Differential Revision: https://reviews.llvm.org/D147219
This patch updates the `OpenMPIRBuilderConfig` structure to hold all
available 'requires' clauses, and it replicates part of the code
generation for the 'requires' registration function from clang in the
`OMPIRBuilder`, to be used with flang.
Porting the rest of features of the clang implementation to the IRBuilder
and sharing it between clang and flang remains for a future patch, due to the
complexity of the logic selecting the attributes of the generated
registration function.
Differential Revision: https://reviews.llvm.org/D147217
This patch moves the call for translating an MLIR module to LLVM IR to the
beginning of the translation process. This enables the use of dialect
attributes attached to `builtin.module` operations and the `amendOperation()`
flow to initialize dialect-specific global configuration before translating
the contents of the module.
Currently, this patch does not impact the generated IR on its own. Testing
infrastructure to allow translating the Test dialect to LLVM IR is added, so
that it can be checked that the current behavior is not broken in the future.
Differential Revision: https://reviews.llvm.org/D158278
Change the LLVM dialect to LLVM IR translation to convert the alias
scope attributes lazily to LLVM IR metadata. Previously, the alias
scopes have been translated upfront walking the alias scopes of
operations that implement the AliasAnalysisOpInterface. As a result,
the translation of a module that contains only a noalias scope
intrinsic failed, since its alias scope attribute has not been
translated due to the intrinsic not implementing
AliasAnalysisOpInterface.
Reviewed By: zero9178
Differential Revision: https://reviews.llvm.org/D159187
Many previous sets of AMDGPU dialect code have been incorrect in the
presence of the bf16 type (when lowered to LLVM's bfloat) as they were
developed in a setting that run a custom bf16-to-i16 pass before LLVM
lowering.
An overall effect of this patch is that you should run
--arith-emulate-unsupported-floats="source-types=bf16 target-type=f32"
on your GPU module before calling --convert-gpu-to-rocdl if your code
performs bf16 arithmetic.
While LLVM now supports software bfloat, initial experiments showed
that using this support on AMDGPU inserted a large number of
conversions around loads and stores which had substantial performance
imparts. Furthermore, all of the native AMDGPU operations on bf16
types (like the WMMA operations) operate on 16-bit integers instead of
the bfloat type.
First, we make the following changes to preserve compatibility once
the LLVM bfloat type is reenabled.
1. The matrix multiplication operations (MFMA and WMMA) will bitcast
bfloat vectors to i16 vectors.
2. Buffer loads and stores will operate on the relevant integer
datatype and then cast to bfloat if needed.
Second, we add type conversions to convert bf16 and vectors of it to
equivalent i16 types.
Third, we add the bfloat <-> f32 expansion patterns to the set of
operations run before the main LLVM conversion so that MLIR's
implementation of these conversion routines is used.
Finally, we extend the "floats treated as integers" support in the
LLVM exporter to handle types other than fp8.
We also fix a bug in the unsupported floats emulation where it tried
to operate on `arith.bitcast` due to an oversight.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D156361
This patch modifies the MLIR-to-LLVMIR translation pass to enable dialect
attributes attached to external functions being processed by the corresponding
dialect's translation interface via `amendOperation()`.
Differential Revision: https://reviews.llvm.org/D156988
The `allocsize` attribute is weird because it packs two 32-bit values
into a 64-bit value. It also turns out that the passthrough attribute
exporter was using `int`, which is incorrectly handling 64-bit integers.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D156574
Convert function bodies after all other operations, breaking possible declaration-reference issues between top non-LLVM Ops and non-LLVM ops inside function bodies.
Example:
```
mydialect.global @myglobal : i32
llvm.func @bar(...) {
...
%address = mydialect.global_address @myglobal : llvm.ptr
...
}
```
With the previous scheme `mydialect.global_address` always got translated before `mydialect.global`, this change ensures `mydialect.global` gets translated first.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D156284
This revision removes the metadata op, that to the best of our
knowledge, has no more uses after switching to a purely attribute based
metadata representation:
https://reviews.llvm.org/D155444https://reviews.llvm.org/D155285https://reviews.llvm.org/D155159
These changes got unlocked after landing distinct attribute support:
https://reviews.llvm.org/D153360,
which enables modeling distinct metadata using attributes. As a result,
all metadata kinds are now represented using attributes. Previously,
there has been a mix of attribute and op based representations.
Having attribute only metadata makes it possible to update the metadata
in-parallel, while updating the global metadata operation has been
a sequential process. The LLVM Dialect inliner already benefits from
this change and now creates new alias scopes and domains during
inlining rather than dropping the no alias information:
https://reviews.llvm.org/D155712
Reviewed By: Dinistro
Differential Revision: https://reviews.llvm.org/D156217
This patch modifies the construction of the `OpenMPIRBuilder` in MLIR to
initialize the `IsGPU` flag using target triple information passed down from
the Flang frontend. If not present, it will default to `false`.
This replicates the behavior currently implemented in Clang, where the
`CodeGenModule::createOpenMPRuntime()` method creates a different
`CGOpenMPRuntime` instance depending on the target triple, which in turn has an
effect on the `IsGPU` flag of the `OpenMPIRBuilderConfig` object.
Differential Revision: https://reviews.llvm.org/D151903
This revision adds a branch weight op interface for the call / branch
operations that support branch weights. It can be used in the LLVM IR
import and export to simplify the branch weight conversion. An
additional mapping between call operations and instructions ensures
the actual conversion can be done in the module translation itself,
rather than in the dialect translation interface. It also has the
benefit that downstream users can amend custom metadata to the call
operation during the export to LLVM IR.
Reviewed By: zero9178, definelicht
Differential Revision: https://reviews.llvm.org/D155702
The current representation of TBAA is the very last in-tree user of the `llvm.metadata` operation.
Using ops to model metadata has a few disadvantages:
* Building a graph has to be done through some weakly typed indirection mechanism such as `SymbolRefAttr`
* Creating the metadata has to be done through a builder within a metadata op.
* It is not multithreading safe as operation insertion into the same block is not thread-safe
This patch therefore converts TBAA metadata into an attribute representation, in a similar manner as it has been done for alias groups and access groups in previous patches.
This additionally has the large benefit of giving us more "correctness by construction" as it makes things like cycles in a TBAA graph, or references to an incorrectly typed metadata node impossible.
Differential Revision: https://reviews.llvm.org/D155444
Using MLIR attributes instead of metadata has many advantages:
* No indirection: Attributes can simply refer to each other seemlessly without having to use the indirection of `SymbolRefAttr`. This also gives us correctness by construction in a lot of places as well
* Multithreading safe: The Attribute infrastructure gives us thread-safety for free. Creating operations and inserting them into a block is not thread-safe. This is a major use case for e.g. the inliner in MLIR which runs in parallel
* Easier to create: There is no need for a builder or a metadata region
This patch therefore does exactly that. It leverages the new distinct attributes to create distinct access groups in a deterministic and threadsafe manner.
Differential Revision: https://reviews.llvm.org/D155285
Using MLIR attributes instead of metadata has many advantages:
* No indirection: Attributes can simply refer to each other seemlessly without having to use the indirection of `SymbolRefAttr`. This also gives us correctness by construction in a lot of places as well
* Multithreading save: The Attribute infrastructure gives us thread-safety for free. Creating operations and inserting them into a block is not thread-safe. This is a major use case for e.g. the inliner in MLIR which runs in parallel
* Easier to create: There is no need for a builder or a metadata region
This patch therefore does exactly that. It leverages the new distinct attributes to create distinct alias domains and scopes in a deterministic and threadsafe manner.
Differential Revision: https://reviews.llvm.org/D155159
This patch renames the `OpenMPIRBuilderConfig` flags to reduce confusion over
their meaning. `IsTargetCodegen` becomes `IsGPU`, whereas `IsEmbedded` becomes
`IsTargetDevice`. The `-fopenmp-is-device` compiler option is also renamed to
`-fopenmp-is-target-device` and the `omp.is_device` MLIR attribute is renamed
to `omp.is_target_device`. Getters and setters of all these renamed properties
are also updated accordingly. Many unit tests have been updated to use the new
names, but an alias for the `-fopenmp-is-device` option is created so that
external programs do not stop working after the name change.
`IsGPU` is set when the target triple is AMDGCN or NVIDIA PTX, and it is only
valid if `IsTargetDevice` is specified as well. `IsTargetDevice` is set by the
`-fopenmp-is-target-device` compiler frontend option, which is only added to
the OpenMP device invocation for offloading-enabled programs.
Differential Revision: https://reviews.llvm.org/D154591
This commit adds an optional section attribute to the `LLVMFuncOp` and
adds import and export functionality for it.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D154219
This revision adds comdat support to functions. Additionally,
it ensures only comdats that have uses are imported/exported and
only non-empty global comdat operations are created.
Reviewed By: Dinistro
Differential Revision: https://reviews.llvm.org/D153739
Whereas LLVM currently doesn't have any types for 8-bit floats, and
whereas existing 8-bit float APIs (for instance, the AMDGCN
intrinsics) take such floats as (packed) bytes, translate the MLIR
8-bit float types to i8 during LLVM lowering.
In order to not special-case arith.constant for bitcasting constants
to their integer form, amend the MLIR to LLVM translator to turn 8-bit
float constants into i8 constants with the same value (by use of
APFloat's bitcast method).
This change can be reverted once LLVM has 8-bit float types.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D153160
The LLVM comdat operation specifies how to deduplicate globals with the
same key in two different object files. This is necessary on Windows
where e.g. two object files with linkonce globals will not link unless
a comdat for those globals is specified. It is also supported in the ELF
format.
Differential Revision: https://reviews.llvm.org/D150796
This commit changes intrinsics that have immarg parameter attributes to
model these parameters as attributes, instead of operands. Using
operands only works if the operation is an `llvm.mlir.constant`,
otherwise the exported LLVMIR is invalid.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D151692