There are some spurious libraries which can be removed.
I'm trying to bundle MLIR/LLVM library dependencies for our own
libraries. We're utilizing cmake function to recursively collect
MLIR/LLVM related dependencies. However, we identified certain library
dependencies as redundant and safe for removal.
This PR adds ability to pass non-default value to
`.amdhsa_code_object_version` metadata when serializing ROCDL GPU
modules.
It also fixes typos in two places.
---------
Co-authored-by: Fabian Mora <fmora.dev@gmail.com>
* Strip calls to raw_string_ostream::flush(), which is essentially a no-op
* Strip unneeded calls to raw_string_ostream::str(), to avoid excess indirection.
This patch adds the `#gpu.kernel_metadata` and `#gpu.kernel_table`
attributes. The `#gpu.kernel_metadata` attribute allows storing metadata
related to a compiled kernel, for example, the number of scalar
registers used by the kernel. The attribute only has 2 required
parameters, the name and function type. It also has 2 optional
parameters, the arguments attributes and generic dictionary for storing
all other metadata.
The `#gpu.kernel_table` stores a table of `#gpu.kernel_metadata`,
mapping the name of the kernel to the metadata.
Finally, the function `ROCDL::getAMDHSAKernelsELFMetadata` was added to
collect ELF metadata from a binary, and to test the class methods in
both attributes.
Example:
```mlir
gpu.binary @binary [#gpu.object<#rocdl.target<chip = "gfx900">, kernels = #gpu.kernel_table<[
#gpu.kernel_metadata<"kernel0", (i32) -> (), metadata = {sgpr_count = 255}>,
#gpu.kernel_metadata<"kernel1", (i32, f32) -> (), arg_attrs = [{llvm.read_only}, {}]>
]> , bin = "BLOB">]
```
The motivation behind these attributes is to provide useful information
for things like tunning.
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
This patch adds an argument to `gpu::TargetAttrInterface::createObject`
to pass the GPU module. This is useful as `gpu::ObjectAttr` contains a
property dict for metadata, hence the module can be used for extracting
things like the symbol table and adding it to the property dict.
---------
Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>
This patch constructs the AMDGCN ISA control variable explicitly instead
of linking against the library shipped with ROCm. This change prevents
issues arising from the order in which the AMDGCN libraries are linked.
Most of the code here does not depend on the NVPTX target. In particular
the simple offload can just emit LLVM IR and we can use this without the
NVVM backend being built, which can be useful for a frontend that just
need to serialize the IR and leave it up to the runtime to JIT further.
Reland: https://github.com/llvm/llvm-project/pull/95456
This patch improves the ROCDL gpu serialization API by:
- Introducing the enum `AMDGCNLibraries` for specifying the AMD GCN
device code libraries to use during linking.
- Removing `getCommonBitcodeLibs` in favor of `AMDGCNLibraries`.
Previously `getCommonBitcodeLibs` would try to load all AMD GCN bitcode
librariesm now it will only load the requested libraries.
- Exposing the `compileToBinary` method and making it virtual, allowing
downstream users to re-use this method.
- Exposing `moduleToObjectImpl`, this method provides a prototype flow
for compiling to binary, allowing downstream users to re-use this
method.
- It also avoids constructing the control variables if no device
libraries are being used.
- Changes the style of the error messages to be composable, ie no full
stops.
- Adds an error message for when the ROCm toolkit can't be found but it
was required.
As discussed in #89743, when using the Visual Studio solution
generators, object library projects are displayed as a collection of
non-editable *.obj files. To look for the corresponding source files,
one has to browse (or search) to the library's obj.libname project. This
patch tries to avoid this as much as possible.
For Clang, there is already an exception for XCode. We handle MSVC_IDE
the same way.
For MLIR, this is more complicated. There are explicit references to the
obj.libname target that only work when there is an object library. This
patch cleans up the reasons for why an object library is needed:
1. The obj.libname is modified in the calling CMakeLists.txt. Note that
with use-only references, `add_library(<name> ALIAS <target>)` could
have been used.
2. An `libMLIR.so` (mlir-shlib) is also created. This works by adding
linking the object libraries' object files into the libMLIR.so (in
addition to the library's own .so/.a). XCode is handled using the
`-force_load` linker option instead. Windows is not supported. This
mechanism is different from LLVM's llvm-shlib that is created by linking
static libraries with `-Wl,--whole-archive` (and `-Wl,-all_load` on
MacOS).
3. The library might be added to an aggregate library. In-tree, the
seems to be only `libMLIR-C.so` and the standalone example. In XCode, it
uses the object library and `-force_load` mechanism as above. Again,
this is different from `libLLVM-C.so`.
4. Build an object library whenever it was before this patch, except
when generating a Visual Studio solution. This condition could be
removed, but I am trying to avoid build breakages of whatever
configurations others use.
This seems to never have worked with XCode because of the explicit
references to obj.libname (reason 1.). I don't have access to XCode, but
I tried to preserve the current working. IMHO there should be a common
mechanism to build aggregate libraries for all LLVM projects instead of
the 4 that we have now.
As far as I can see, this means for LLVM there are the following changes
on whether object libraries are created:
1. An object library is created even in XCode if FORCE_OBJECT_LIBRARY is
set. I do not know how XCode handles it, but I also know CMake will
abort otherwise.
2. An object library is created even for explicitly SHARED libraries for
building `libMLIR.so`. Again, mlir-shlib does not work otherwise.
`libMLIR.so` itself is created using SHARED so this patch is marking it
as EXCLUDE_FROM_LIBMLIR.
3. For the second condition, it is now sensitive to whether the
mlir-shlib is built at all (LLVM_BUILD_LLVM_DYLIB). However, an object
library is still built using the fourth condition unless using the MSVC
solution generator. That is, except with MSVC_IDE, when an object
library was built before, it will also be an object library now.
Reland: https://github.com/llvm/llvm-project/pull/95456
This patch improves the ROCDL gpu serialization API by:
- Introducing the enum `AMDGCNLibraries` for specifying the AMD GCN
device code libraries to use during linking.
- Removing `getCommonBitcodeLibs` in favor of `AMDGCNLibraries`.
Previously `getCommonBitcodeLibs` would try to load all AMD GCN bitcode
librariesm now it will only load the requested libraries.
- Exposing the `compileToBinary` method and making it virtual, allowing
downstream users to re-use this method.
- Exposing `moduleToObjectImpl`, this method provides a prototype flow
for compiling to binary, allowing downstream users to re-use this
method.
- It also avoids constructing the control variables if no device
libraries are being used.
- Changes the style of the error messages to be composable, ie no full
stops.
- Adds an error message for when the ROCm toolkit can't be found but it
was required.
This patch improves the ROCDL gpu serialization API by:
- Introducing the enum `AMDGCNLibraries` for specifying the AMD GCN
device code libraries to use during linking.
- Removing `getCommonBitcodeLibs` in favor of `AMDGCNLibraries`.
Previously `getCommonBitcodeLibs` would try to load all AMD GCN bitcode
librariesm now it will only load the requested libraries.
- Exposing the `compileToBinary` method and making it virtual, allowing
downstream users to re-use this method.
- Exposing `moduleToObjectImpl`, this method provides a prototype flow
for compiling to binary, allowing downstream users to re-use this
method.
- It also avoids constructing the control variables if no device
libraries are being used.
This patch also changes the behavior of the CMake flag
`DEFAULT_ROCM_PATH`. Before it would fall back to a default value of
`/opt/rocm` if not specified. However, that default value causes fragile
builds in environments with ROCm. Now, the flag falls back to the empty
string, making it clear that **the user must provide a value at LLVM
build time**.
LLVM_HAS_NVPTX_TARGET is automatically set depending on whether NVPTX
was enabled when building LLVM. Use this instead of manually defining
MLIR_ENABLE_CUDA_CONVERSIONS (whose name is a bit misleading btw).
This is another follow-up of #83004, which made the same change for
`MLIR_CUDA_CONVERSIONS_ENABLED`. As the previous PR, this PR commit
exposes mentioned CMake variable through `mlir-config.h` and uses the
macro that is introduced with the same name. This replaces the macro
`MLIR_NVPTXCOMPILER_ENABLED`, which the CMake files previously defined
manually.
This is another follow-up of #83004. `NVVM/Target.cpp` uses the macro
`MLIR_NVPTXCOMPILER_ENABLED`, which is defined in `llvm-config.h` but
did not include that file, yielding a warning when compiled with
`-Wundef`. This PR adds the include.
~~This is another follow-up of #83004, which made the same change for
`MLIR_CUDA_CONVERSIONS_ENABLED`. As the previous PR, this PR commit
exposes mentioned CMake variable through `mlir-config.h` and uses the
macro that is introduced with the same name. This replaces the macro
`MLIR_NVPTXCOMPILER_ENABLED`, which the CMake files previously defined
manually.~~
This is a follow up of #83004, which made the same change for
`MLIR_CUDA_CONVERSIONS_ENABLED`. As the previous PR, this PR commit
exposes mentioned CMake variable through `mlir-config.h` and uses the
macro that is introduced with the same name. This replaces the macro
`MLIR_ROCM_CONVERSIONS_ENABLED`, which the CMake files previously
defined manually.
That macro was not defined in some cases and thus yielded warnings if
compiled with `-Wundef`. In particular, they were not defined in the
BUILD files, so the GPU targets were broken when built with Bazel. This
commit exposes mentioned CMake variable through mlir-config.h and uses
the macro that is introduced with the same name. This replaces the macro
MLIR_CUDA_CONVERSIONS_ENABLED, which the CMake files previously defined
manually.
Some specific implementation of the offload may want more customization, and
even avoid using LLVM in-tree to dispatch the ISA translation to a custom
solution. This refactoring makes it possible for such implementation to work
without even configuring the target backend in LLVM.
Reviewers: fabianmcg
Reviewed By: fabianmcg
Pull Request: https://github.com/llvm/llvm-project/pull/71165
SerializetToHsaco, as currently implemented, leaks the file descriptor
of the .hsaco temporary file, which causes issues in long-running
parallel compilation setups.
See also https://github.com/ROCmSoftwarePlatform/rocMLIR/pull/1257
This patch adds an NVPTX compilation path that enables JIT compilation
on NVIDIA targets. The following modifications were performed:
1. Adding a format field to the GPU object attribute, allowing the
translation attribute to use the correct runtime function to load the
module. Likewise, a dictionary attribute was added to add any possible
extra options.
2. Adding the `createObject` method to `GPUTargetAttrInterface`; this
method returns a GPU object from a binary string.
3. Adding the function `mgpuModuleLoadJIT`, which is only available for
NVIDIA GPUs, as there is no equivalent for AMD.
4. Adding the CMake flag `MLIR_GPU_COMPILATION_TEST_FORMAT` to specify
the format to use during testing.
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
Currently, the NVPTX tool compilation path only calls `ptxas`; thus, the
GPU running the binary must be an exact match of the arch of the target,
or else the runtime throws an error due to the arch mismatch.
This patch adds a call to `fatbinary`, creating a fat binary with the
cubin object and the PTX code, allowing the driver to JIT the PTX at
runtime if there's an arch mismatch.
This revision avoids the registration of dialect extensions in Pass::getDependentDialects.
Such registration of extensions can be dangerous because `DialectRegistry::isSubsetOf` is
always guaranteed to return false for extensions (i.e. there is no mechanism to track
whether a lambda is already in the list of already registered extensions).
When the context is already in a multi-threaded mode, this is guaranteed to assert.
Arguably a more structured registration mechanism for extensions with a unique ExtensionID
could be envisioned in the future.
In the process of cleaning this up, multiple usage inconsistencies surfaced around the
registration of translation extensions that this revision also cleans up.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D157703
Fix a build failure in GCC 7 caused by the function:
`std::optional<SmallVector<std::unique_ptr<llvm::Module>>> loadBitcodeFiles` ,
in `NVVMTarget` & `ROCDLTarget`.
The failure is caused because GCC fails to use the move constructor in
`std::optional` for constructing the return value, which prompts a call to the
deleted copy constructor in `std::unique_ptr`, resulting in a failure.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D157804
**For an explanation of these patches see D154153.**
Commit message:
This patch adds the ROCDL target attribute for serializing GPU modules into
strings containing HSAco.
Depends on D154117
Differential Revision: https://reviews.llvm.org/D154129
**For an explanation of these patches see D154153.**
Commit message:
This patch adds the ROCDL target attribute for serializing GPU modules into
strings containing HSAco.
Depends on D154117
Reviewed By: mehdi_amini, krzysz00
Differential Revision: https://reviews.llvm.org/D154129
**For an explanation of these patches see D154153.**
Commit message:
This patch adds the NVVM target attribute for serializing GPU modules into
strings containing cubin.
Depends on D154113 and D154100 and D154097
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D154117
**For an explanation of these patches see D154153.**
Commit message:
This patch adds the utility base class `ModuleToObject`. This class provides an
interface for compiling module operations into binary strings, by default this
class serialize modules to LLVM bitcode.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D154100
**For an explanation of these patches see D154153.**
Commit message:
This patch adds the utility base class `ModuleToObject`. This class provides an
interface for compiling module operations into binary strings, by default this
class serialize modules to LLVM bitcode.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D154100