Adding device information print for AMD devices on the
`llvm-omp-device-info` command line tool. The output is inspired by
the rocminfo command line tool.
This commit adds missing HSA functions, enums and structs
needed to query additional information from the HSA agents.
A generic message for the `generic-elf-64bit` plugin is also added
Example of an output:
```
llvm-omp-device-info
Device (0):
This is a generic-elf-64bit device
Device (1):
This is a generic-elf-64bit device
Device (2):
This is a generic-elf-64bit device
Device (3):
This is a generic-elf-64bit device
Device (4):
HSA Runtime Version: 1.1
HSA OpenMP Device Number: 0
Device Name: gfx906
Vendor Name: AMD
Device Type: GPU
Max Queues: 128
Queue Min Size: 64
Queue Max Size: 131072
Cache:
L0: 16384 bytes
L1: 8388608 bytes
Cacheline Size: 64
Max Clock Freq(MHz): 1725
Compute Units: 60
SIMD per CU: 4
Fast F16 Operation: TRUE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size per Dimension:
x: 1024
y: 1024
z: 1024
Max Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size: 4294967295
Grid Max Size per Dimension:
x: 4294967295
y: 4294967295
z: 4294967295
Max fbarriers/Workgrp: 32
Memory Pools:
Pool GLOBAL; FLAGS: COARSE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GLOBAL; FLAGS: FINE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GROUP:
Size: 65536 bytes
Allocatable: FALSE
Runtime Alloc Granule: 0 bytes
Runtime Alloc alignment: 0 bytes
Accessable by all: FALSE
Device (5):
HSA Runtime Version: 1.1
HSA OpenMP Device Number: 1
Device Name: gfx906
Vendor Name: AMD
Device Type: GPU
Max Queues: 128
Queue Min Size: 64
Queue Max Size: 131072
Cache:
L0: 16384 bytes
L1: 8388608 bytes
Cacheline Size: 64
Max Clock Freq(MHz): 1725
Compute Units: 60
SIMD per CU: 4
Fast F16 Operation: TRUE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size per Dimension:
x: 1024
y: 1024
z: 1024
Max Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size: 4294967295
Grid Max Size per Dimension:
x: 4294967295
y: 4294967295
z: 4294967295
Max fbarriers/Workgrp: 32
Memory Pools:
Pool GLOBAL; FLAGS: COARSE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GLOBAL; FLAGS: FINE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GROUP:
Size: 65536 bytes
Allocatable: FALSE
Runtime Alloc Granule: 0 bytes
Runtime Alloc alignment: 0 bytes
Accessable by all: FALSE
Device (6):
HSA Runtime Version: 1.1
HSA OpenMP Device Number: 2
Device Name: gfx906
Vendor Name: AMD
Device Type: GPU
Max Queues: 128
Queue Min Size: 64
Queue Max Size: 131072
Cache:
L0: 16384 bytes
L1: 8388608 bytes
Cacheline Size: 64
Max Clock Freq(MHz): 1725
Compute Units: 60
SIMD per CU: 4
Fast F16 Operation: TRUE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size per Dimension:
x: 1024
y: 1024
z: 1024
Max Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size: 4294967295
Grid Max Size per Dimension:
x: 4294967295
y: 4294967295
z: 4294967295
Max fbarriers/Workgrp: 32
Memory Pools:
Pool GLOBAL; FLAGS: COARSE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GLOBAL; FLAGS: FINE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GROUP:
Size: 65536 bytes
Allocatable: FALSE
Runtime Alloc Granule: 0 bytes
Runtime Alloc alignment: 0 bytes
Accessable by all: FALSE
Device (7):
HSA Runtime Version: 1.1
HSA OpenMP Device Number: 3
Device Name: gfx906
Vendor Name: AMD
Device Type: GPU
Max Queues: 128
Queue Min Size: 64
Queue Max Size: 131072
Cache:
L0: 16384 bytes
L1: 8388608 bytes
Cacheline Size: 64
Max Clock Freq(MHz): 1725
Compute Units: 60
SIMD per CU: 4
Fast F16 Operation: TRUE
Wavefront Size: 64
Workgroup Max Size: 1024
Workgroup Max Size per Dimension:
x: 1024
y: 1024
z: 1024
Max Waves Per CU: 40
Max Work-item Per CU: 2560
Grid Max Size: 4294967295
Grid Max Size per Dimension:
x: 4294967295
y: 4294967295
z: 4294967295
Max fbarriers/Workgrp: 32
Memory Pools:
Pool GLOBAL; FLAGS: COARSE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GLOBAL; FLAGS: FINE GRAINED, :
Size: 34342961152 bytes
Allocatable: TRUE
Runtime Alloc Granule: 4096 bytes
Runtime Alloc alignment: 4096 bytes
Accessable by all: FALSE
Pool GROUP:
Size: 65536 bytes
Allocatable: FALSE
Runtime Alloc Granule: 0 bytes
Runtime Alloc alignment: 0 bytes
Accessable by all: FALSE
```
Differential Revision: https://reviews.llvm.org/D126836
README for the LLVM* OpenMP* Offloading Runtime Library (libomptarget)
======================================================================
How to Build the LLVM* OpenMP* Offloading Runtime Library (libomptarget)
========================================================================
In-tree build:
$ cd where-you-want-to-live
Check out openmp (libomptarget lives under ./libomptarget) into llvm/projects
$ cd where-you-want-to-build
$ mkdir build && cd build
$ cmake path/to/llvm -DCMAKE_C_COMPILER=<C compiler> -DCMAKE_CXX_COMPILER=<C++ compiler>
$ make omptarget
Out-of-tree build:
$ cd where-you-want-to-live
Check out openmp (libomptarget lives under ./libomptarget)
$ cd where-you-want-to-live/openmp/libomptarget
$ mkdir build && cd build
$ cmake path/to/openmp -DCMAKE_C_COMPILER=<C compiler> -DCMAKE_CXX_COMPILER=<C++ compiler>
$ make
For details about building, please look at README.rst in the parent directory.
Architectures Supported
=======================
The current library has been only tested in Linux operating system and the
following host architectures:
* Intel(R) 64 architecture
* IBM(R) Power architecture (big endian)
* IBM(R) Power architecture (little endian)
* ARM(R) AArch64 architecture (little endian)
The currently supported offloading device architectures are:
* Intel(R) 64 architecture (generic 64-bit plugin - mostly for testing purposes)
* IBM(R) Power architecture (big endian) (generic 64-bit plugin - mostly for testing purposes)
* IBM(R) Power architecture (little endian) (generic 64-bit plugin - mostly for testing purposes)
* ARM(R) AArch64 architecture (little endian) (generic 64-bit plugin - mostly for testing purposes)
* CUDA(R) enabled 64-bit NVIDIA(R) GPU architectures
Supported RTL Build Configurations
==================================
Supported Architectures: Intel(R) 64, IBM(R) Power 7 and Power 8
---------------------------
| gcc | clang |
--------------|------------|------------|
| Linux* OS | Yes(1) | Yes(2) |
-----------------------------------------
(1) gcc version 4.8.2 or later is supported.
(2) clang version 3.7 or later is supported.
Front-end Compilers that work with this RTL
===========================================
The following compilers are known to do compatible code generation for
this RTL:
- clang (from https://github.com/clang-ykt )
- clang (development branch at http://clang.llvm.org - several features still
under development)
-----------------------------------------------------------------------
Notices
=======
This library and related compiler support is still under development, so the
employed interface is likely to change in the future.
*Other names and brands may be claimed as the property of others.