Jonathan Peyton 3fdf3294ab Fix OMPT support for task frames, parallel regions, and parallel regions + loops
This patch makes it possible for a performance tool that uses call stack
unwinding to map implementation-level call stacks from master and worker
threads into a unified global view. There are several components to this patch.

include/*/ompt.h.var
  Add a new enumeration type that indicates whether the code for a master task
    for a parallel region is invoked by the user program or the runtime system
  Change the signature for OMPT parallel begin/end callbacks to indicate whether
    the master task will be invoked by the program or the runtime system. This
    enables a performance tool using call stack unwinding to handle these two
    cases differently. For this case, a profiler that uses call stack unwinding
    needs to know that the call path prefix for the master task may differ from
    those available within the begin/end callbacks if the program invokes the
    master.

kmp.h
  Change the signature for __kmp_join_call to take an additional parameter
  indicating the fork_context type. This is needed to supply the OMPT parallel
  end callback with information about whether the compiler or the runtime
  invoked the master task for a parallel region.

kmp_csupport.c
  Ensure that the OMPT task frame field reenter_runtime_frame is properly set
    and cleared before and after calls to fork and join threads for a parallel
    region.
  Adjust the code for the new signature for __kmp_join_call.
  Adjust the OMPT parallel begin callback invocations to carry the extra
    parameter indicating whether the program or the runtime invokes the master
    task for a parallel region.

kmp_gsupport.c
  Apply all of the analogous changes described for kmp_csupport.c for the GOMP
    interface
  Add OMPT support for the GOMP combined parallel region + loop API to
    maintain the OMPT task frame field reenter_runtime_frame.

kmp_runtime.c:
  Use the new information passed by __kmp_join_call to adjust the OMPT
    parallel end callback invocations to carry the extra parameter indicating
    whether the program or the runtime invokes the master task for a parallel
    region.

ompt_internal.h:
  Use the flavor of the parallel region API (GNU or Intel) to determine who
    invokes the master task.

Differential Revision: http://reviews.llvm.org/D11259

llvm-svn: 242817
2015-07-21 18:03:30 +00:00

               README for the LLVM* OpenMP* Runtime Library
               ============================================

How to Build Documentation
==========================

The main documentation is in Doxygen* format, and this distribution
should come with pre-built PDF documentation in doc/Reference.pdf.  
However, an HTML version can be built by executing:

% doxygen doc/doxygen/config 

in the runtime directory.

That will produce HTML documentation in the doc/doxygen/generated
directory, which can be accessed by pointing a web browser at the
index.html file there.

If you don't have Doxygen installed, you can download it from
www.doxygen.org.
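
The check for Doxygen can be scripted before attempting the build. The
following is a minimal sketch, not part of the distribution: the helper
names have_tool and build_docs are hypothetical, and it assumes it is
called from the runtime directory.

```shell
# have_tool NAME: succeed if NAME is found on the PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# build_docs: run the documented doxygen command, or report that
# Doxygen is missing. Hypothetical helper; call it from the runtime
# directory so that doc/doxygen/config resolves.
build_docs() {
  if have_tool doxygen; then
    doxygen doc/doxygen/config
  else
    echo "doxygen not found; download it from www.doxygen.org" >&2
    return 1
  fi
}
```

Calling build_docs then either produces the HTML output described above
or fails with a message pointing at the download site.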


How to Build the LLVM* OpenMP* Runtime Library
==============================================

The library can be built either using CMake, or using a makefile that
in turn invokes various Perl scripts. For porting, for non-x86
architectures, and for those already familiar with CMake, CMake may be
an easier route to take than the Makefile route described here.

Building with CMake
===================
The runtime/Build_With_CMake.txt file has a description of how to
build with CMake.

Building with the Makefile
==========================
The Makefile at the top level attempts to detect what it needs to
build the LLVM* OpenMP* Runtime Library.  To see the default settings,
type:

make info

You can change the Makefile's behavior with the following options:

omp_root:    The path to the top-level directory containing the top-level
	     Makefile.  By default, this will take on the value of the 
	     current working directory.

omp_os:      Operating system.  By default, the build will attempt to 
	     detect this. Currently supports "linux", "freebsd", "macos", and
	     "windows".

arch:        Architecture. By default, the build will attempt to 
	     detect this if not specified by the user. Currently 
	     supported values are
                 "32" for IA-32 architecture 
                 "32e" for Intel(R) 64 architecture
                 "mic" for Intel(R) Many Integrated Core Architecture
                 "arm" for ARM* architecture
                 "aarch64" for Aarch64 (64-bit ARM) architecture
                 "ppc64" for IBM(R) Power architecture (big endian)
                 "ppc64le" for IBM(R) Power architecture (little endian)

             If "mic" is specified then "icc" will be used as the
	     compiler, and appropriate k1om binutils will be used. The
	     necessary packages must be installed on the build machine
	     for this to be possible (but an Intel(R) Xeon Phi(TM)
	     coprocessor card is not required to build the library).

compiler:    Which compiler to use for the build.  Defaults to "icc" 
	     or "icl" depending on the value of omp_os. Also supports 
	     some versions of "gcc"* when omp_os is "linux". The selected 
	     compiler should be installed and in the user's path. The 
	     corresponding Fortran compiler should also be in the path. 
	     See "Supported RTL Build Configurations" below for more 
	     information on compiler versions.

mode:        Library mode: default is "release".  Also supports "debug".

jobs:        The number of parallel jobs for the underlying call to make.
             This value is sent as the parameter to the -j flag for make.
             This value defaults to "1", but can be set to any positive
             integer.

To use any of the options above, simply add <option_name>=<value> to the
make command line.  For example, if you want to build with gcc instead of
icc, type:

make compiler=gcc
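
Several options can be given on the same command line. The sketch below
assembles such a command from shell variables and prints it instead of
running it, so it can be checked first. The particular values (gcc, debug,
4 jobs) are only illustrative assumptions.

```shell
# Illustrative values; adjust to your build machine.
compiler=gcc
mode=debug
jobs=4

# Assemble the make command line from the options documented above.
cmd="make compiler=${compiler} mode=${mode} jobs=${jobs}"

# Print the command. To run it, execute the printed line from the
# top-level directory.
echo "$cmd"
```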

On OS X* machines, it is possible to build universal (or fat) libraries which
include both IA-32 architecture and Intel(R) 64 architecture objects in a
single archive; just build the 32 and 32e libraries separately, then invoke 
make again with a special argument as follows:

make compiler=clang build_args=fat
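
The full sequence for a universal library might look like the sketch
below. It assumes the two separate builds are selected with the arch
option described earlier. The helper name fat_build_steps is hypothetical,
and it only prints the commands rather than running them.

```shell
# fat_build_steps [COMPILER]: print the three make invocations for a
# universal library build. COMPILER defaults to clang.
fat_build_steps() {
  compiler="${1:-clang}"
  echo "make compiler=${compiler} arch=32"       # IA-32 objects
  echo "make compiler=${compiler} arch=32e"      # Intel(R) 64 objects
  echo "make compiler=${compiler} build_args=fat" # combine into one archive
}

fat_build_steps clang
```

Once the printed commands look right, run them in order from the top-level
directory.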

Supported RTL Build Configurations
==================================

Supported Architectures: IA-32 architecture, Intel(R) 64, and 
Intel(R) Many Integrated Core Architecture

              ----------------------------------------------
              |   icc/icl     |    gcc      |   clang      |
--------------|---------------|-------------|--------------|
| Linux* OS   |   Yes(1,5)    |  Yes(2,4)   | Yes(4,6,7)   |
| FreeBSD*    |   No          |  No         | Yes(4,6,7,8) |
| OS X*       |   Yes(1,3,4)  |  No         | Yes(4,6,7)   |
| Windows* OS |   Yes(1,4)    |  No         | No           |
------------------------------------------------------------

(1) On IA-32 architecture and Intel(R) 64, icc/icl versions 12.x are 
    supported (12.1 is recommended).
(2) GCC* version 4.6.2 is supported.
(3) For icc on OS X*, OS X* version 10.5.8 is supported.
(4) Intel(R) Many Integrated Core Architecture not supported.
(5) On Intel(R) Many Integrated Core Architecture, icc/icl versions 13.0 
    or later are required.
(6) Clang* version 3.3 is supported.
(7) Clang* currently does not offer a software-implemented 128 bit extended 
    precision type.  Thus, all entry points reliant on this type are removed
    from the library and cannot be called in the user program.  The following
    functions are not available:
    __kmpc_atomic_cmplx16_*
    __kmpc_atomic_float16_*
    __kmpc_atomic_*_fp
(8) Community contribution provided AS IS, not tested by Intel.
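
The table can also be encoded as a small lookup, for example to fail
early in a build script. This is a sketch only: the function name
is_supported is hypothetical, the OS names are the omp_os values listed
earlier, and it assumes the icc/icl column means icl on Windows* OS and
icc elsewhere. It does not check compiler versions or the numbered notes.

```shell
# is_supported OS COMPILER: succeed if the table above lists "Yes"
# for the given operating system and compiler pair.
is_supported() {
  case "$1:$2" in
    linux:icc|linux:gcc|linux:clang) return 0 ;;
    freebsd:clang)                   return 0 ;;
    macos:icc|macos:clang)           return 0 ;;
    windows:icl)                     return 0 ;;
    *)                               return 1 ;;
  esac
}

if is_supported linux gcc; then
  echo "linux + gcc: listed as supported"
fi
```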

Front-end Compilers that work with this RTL
===========================================

The following compilers are known to generate code compatible with
this RTL: clang (from the OpenMP development branch at
http://clang-omp.github.io/ ), the Intel compilers, and GCC.  See the
documentation for more details.

-----------------------------------------------------------------------

Notices
=======

*Other names and brands may be claimed as the property of others.