[Flang] Reword the overview document

I brought the overview document up to date and added information for
most compilation phases to dump out the reeults of the phase.

Differential Revision: https://reviews.llvm.org/D140241
This commit is contained in:
Peter Steinfeld 2022-12-16 12:05:16 -08:00
parent 6c3a2902f1
commit 62a410f651

View File

@ -6,23 +6,46 @@
--> -->
# Intro
This document goes briefly over compiler phases in Flang. It focuses on the
internal implementation and as such, it is intended for Flang developers rather
than end-users.
# Overview of Compiler Phases # Overview of Compiler Phases
```eval_rst ```eval_rst
.. contents:: .. contents::
:local: :local:
``` ```
The Flang compiler transforms Fortran source code into an executable file.
This transformation proceeds in three high level phases -- analysis, lowering,
and code generation/linking.
Each phase produces either correct output or fatal errors. The first high level phase (analysis) transforms Fortran source code into a
decorated parse tree and a symbol table. During this phase, all user
related errors are detected and reported.
The second high level phase (lowering), changes the decorated parse tree and
symbol table into the Fortran Intermediate Representation (FIR), which is a
dialect of LLVM's Multi-Level Intermediate Representation or MLIR. It then
runs a series of passes on the FIR code which verify its validity, perform a
series of optimizations, and finally transform it into LLVM's Intermediate
Representation, or LLVM IR
The third high level phase generates machine code and invokes a linker to
produce an executable file.
This document describes the first two high level phases. Each of these is
described in more detailed phases.
Each detailed phase is described -- its inputs and outputs along with how to
produce a readable version of the outputs.
Each detailed phase produces either correct output or fatal errors.
# Analysis
This high level phase validates that the program is correct and creates all of
the information needed for lowering.
## Prescan and Preprocess ## Prescan and Preprocess
See: [Preprocessing.md](Preprocessing.md). See [Preprocessing.md](Preprocessing.md).
**Input:** Fortran source and header files, command line macro definitions, **Input:** Fortran source and header files, command line macro definitions,
set of enabled compiler directives (to be treated as directives rather than set of enabled compiler directives (to be treated as directives rather than
@ -32,82 +55,135 @@ See: [Preprocessing.md](Preprocessing.md).
- A "cooked" character stream: the entire program as a contiguous stream of - A "cooked" character stream: the entire program as a contiguous stream of
normalized Fortran source. normalized Fortran source.
Extraneous whitespace and comments are removed (except comments that are Extraneous whitespace and comments are removed (except comments that are
compiler directives that are not disabled) and case is normalized. compiler directives that are not disabled) and case is normalized. Also,
directives are processed and macros expanded.
- Provenance information mapping each character back to the source it came from. - Provenance information mapping each character back to the source it came from.
This is used in subsequent phases to issue errors messages that refer to source locations. This is used in subsequent phases that need source locations. This includes
error messages, optimization reports, and debugging information.
**Entry point:** `parser::Parsing::Prescan` **Entry point:** `parser::Parsing::Prescan`
**Command:** `flang-new -fc1 -E src.f90` dumps the cooked character stream **Commands:**
- `flang-new -fc1 -E src.f90` dumps the cooked character stream
- `flang-new -fc1 -fdebug-dump-provenance src.f90` dumps provenance
information
## Parse ## Parsing
**Input:** Cooked character stream. **Input:** Cooked character stream
**Output:** A parse tree representing a syntactically correct program, **Output:** A parse tree for each Fortran program unit in the source code
rooted at a `parser::Program`. representing a syntactically correct program, rooted at the program unit. See:
See: [Parsing.md](Parsing.md) and [ParserCombinators.md](ParserCombinators.md). [Parsing.md](Parsing.md) and [ParserCombinators.md](ParserCombinators.md).
**Entry point:** `parser::Parsing::Parse` **Entry point:** `parser::Parsing::Parse`
**Command:** **Commands:**
- `flang-new -fc1 -fdebug-dump-parse-tree src.f90` dumps the parse tree - `flang-new -fc1 -fdebug-dump-parse-tree-no-sema src.f90` dumps the parse tree
- `flang-new -fc1 -fdebug-unparse src.f90` converts the parse tree to normalized Fortran - `flang-new -fc1 -fdebug-unparse src.f90` converts the parse tree to normalized Fortran
- `flang-new -fc1 -fdebug-dump-parsing-log src.f90` runs an instrumented parse and dumps the log
- `flang-new -fc1 -fdebug-measure-parse-tree src.f90` measures the parse tree
## Validate Labels and Canonicalize Do Statements ## Semantic processing
**Input:** Parse tree. **Input:** the parse tree, the cooked character stream, and provenance
information
**Output:** The parse tree with label constraints and construct names checked,
and each `LabelDoStmt` converted to a `NonLabelDoStmt`.
See: [LabelResolution.md](LabelResolution.md).
**Entry points:** `semantics::ValidateLabels`, `parser::CanonicalizeDo`
## Resolve Names
**Input:** Parse tree (without `LabelDoStmt`) and `.mod` files from compilation
of USEd modules.
**Output:** **Output:**
- Tree of scopes populated with symbols and types * a symbol table
- Parse tree with some refinements: * modified parse tree
- each `parser::Name::symbol` field points to one of the symbols * module files, (see: [ModFiles.md](ModFiles.md))
- each `parser::TypeSpec::declTypeSpec` field points to one of the types * the intrinsic procedure table
- array element references that were parsed as function references or * the target characteristics
statement functions are corrected * the runtime derived type derived type tables (see: [RuntimeTypeInfo.md](RuntimeTypeInfo.md))
**Entry points:** `semantics::ResolveNames`, `semantics::RewriteParseTree` **Entry point:** `semantics::Semantics::Perform`
**Command:** `flang-new -fc1 -fdebug-dump-symbols src.f90` dumps the For more detail on semantic analysis, see: [Semantics.md](Semantics.md).
tree of scopes and symbols in each scope Semantic processing performs several tasks:
* validates labels, see: [LabelResolution.md](LabelResolution.md).
* canonicalizes DO statements,
* canonicalizes OpenACC and OpenMP code
* resolves names, building a tree of scopes and symbols
* rewrites the parse tree to correct parsing mistakes (when needed) once semantic information is available to clarify the program's meaning
* checks the validity of declarations
* analyzes expressions and statements, emitting error messages where appropriate
* creates module files if the source code contains modules,
see [ModFiles.md](ModFiles.md).
## Check DO CONCURRENT Constraints In the course of semantic analysis, the compiler:
* creates the symbol table
* decorates the parse tree with semantic information (such as pointers into the symbol table)
* creates the intrinsic procedure table
* folds constant expressions
**Input:** Parse tree with names resolved. At the end of semantic processing, all validation of the user's program is complete. This is the last detailed phase of analysis processing.
**Output:** Parse tree with semantically correct DO CONCURRENT loops. **Commands:**
- `flang-new -fc1 -fdebug-dump-parse-tree src.f90` dumps the parse tree after semantic analysis
- `flang-new -fc1 -fdebug-dump-symbols src.f90` dumps the symbol table
- `flang-new -fc1 -fdebug-dump-all src.f90` dumps both the parse tree and the symbol table
## Write Module Files # Lowering
**Input:** Parse tree with names resolved. Lowering takes the parse tree and symbol table produced by analysis and
produces LLVM IR.
**Output:** For each module and submodule, a `.mod` file containing a minimal ## Create the lowering bridge
Fortran representation suitable for compiling program units that depend on it.
See [ModFiles.md](ModFiles.md).
## Analyze Expressions and Assignments **Inputs:**
- the parse tree
- the symbol table
- The default KINDs for intrinsic types (specified by default or command line option)
- The intrinsic procedure table (created in semantics processing)
- The target characteristics (created during semantics processing)
- The cooked character stream
- The target triple -- CPU type, vendor, operating system
- The mapping between Fortran KIND values to FIR KIND values
**Input:** Parse tree with names resolved. The lowering bridge is a container that holds all of the information needed for lowering.
**Output:** Parse tree with `parser::Expr::typedExpr` filled in and semantic **Output:** A container with all of the information needed for lowering
checks performed on all expressions and assignment statements.
**Entry points**: `semantics::AnalyzeExpressions`, `semantics::AnalyzeAssignments` **Entry point:** lower::LoweringBridge::create
## Produce the Intermediate Representation ## Initial lowering
**Input:** Parse tree with names and labels resolved. **Input:** the lowering bridge
**Output:** An intermediate representation of the executable program. **Output:** A Fortran IR (FIR) representation of the program.
See [FortranIR.md](FortranIR.md).
**Entry point:** `lower::LoweringBridge::lower`
The compiler then takes the information in the lowering bridge and creates a
pre-FIR tree or PFT. The PFT is a list of programs and modules. The programs
and modules contain lists of function-like units. The function-like units
contain a list of evaluations. All of these contain pointers back into the
parse tree. The compiler walks the PFT generating FIR.
**Commands:**
- `flang-new -fc1 -fdebug-dump-pft src.f90` dumps the pre-FIR tree
- `flang-new -fc1 -emit-mlir src.f90` dumps the FIR to the files src.mlir
## Transformation passes
**Input:** initial version of the FIR code
**Output:** An LLVM IR representation of the program
**Entry point:** `mlir::PassManager::run`
The compiler then runs a series of passes over the FIR code. The first is a
verification pass. It's followed by a series of transformation passes that
perform various optimizations and transformations. The final pass creates an
LLVM IR representation of the program.
**Commands:**
- `flang-new -mmlir --mlir-print-ir-after-all -S src.f90` dumps the FIR code after each pass to standard error
- `flang-new -fc1 -emit-llvm src.f90` dumps the LLVM IR to src.ll
# Object code generation and linking
After the LLVM IR is created, the flang driver invokes LLVM's existing
infrastructure to generate object code and invoke a linker to create the
executable file.