Both clang and gfortran support the -fopenmp-simd flag, which enables
OpenMP support only for simd constructs, while disabling the rest of
OpenMP.
Implement the appropriate parse tree rewriting to remove non-SIMD OpenMP
constructs at the parsing stage.
Add a new SimdOnly flang OpenMP IR pass which rewrites generated OpenMP
FIR to handle untangling composite simd constructs, and clean up OpenMP
operations leftover after the parse tree rewriting stage.
With this approach, the two parts of the logic required to make the flag
work can be self-contained within the parse tree rewriter and the MLIR
pass, respectively. It does not need to be implemented within the core
lowering logic itself.
The flag is expected to have no effect if -fopenmp is passed explicitly,
and is only expected to remove OpenMP constructs, not things like OpenMP
library functions calls. This matches the behaviour of other compilers.
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
This patch adds an option to select the method for computing complex
number division. It uses `LoweringOptions` to determine whether to lower
complex division to a runtime function call or to MLIR's `complex.div`,
and `CodeGenOptions` to select the computation algorithm for
`complex.div`. The available option values and their corresponding
algorithms are as follows:
- `full`: Lower to a runtime function call. (Default behavior)
- `improved`: Lower to `complex.div` and expand to Smith's algorithm.
- `basic`: Lower to `complex.div` and expand to the algebraic algorithm.
See also the discussion in the following discourse post:
https://discourse.llvm.org/t/optimization-of-complex-number-division/83468
---------
Co-authored-by: Tarun Prabhu <tarunprabhu@gmail.com>
## Purpose
This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch ensures a few `cl::opt` declarations
are properly annotated with `LLVM_ABI`. The annotations currently have
no meaningful impact on the LLVM build; however, they are a prerequisite
to support an LLVM Windows DLL (shared library) build.
## Background
This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
## Overview
- Remove local `extern` declarations of `llvm::PrintPipelinePasses`
because it is already correctly declared with an `LLVM_ABI` annotation
in `llvm\Passes\PassBuilder.h`. Leaving these declarations results in a
gcc compile warning unless they are also annotated with `LLVM_ABI`.
- Similarly, remove local `extern` declarations of
`ProfileSummaryCutoffHot` and `UseContextLessSummary` from
`llvm/tools/llvm-profgen/ProfileGenerator.cpp` since they are declared
with `LLVM_ABI` in `llvm\ProfileData\ProfileCommon.h`.
- Explicitly annotate the extern declaration of `ProfileCorrelate` in
`clang/lib/CodeGen/BackendUtil.cpp` since it is not declared in a
header. The definition of `ProfileCorrelate` in
`llvm\lib\Transforms\Instrumentation\InstrProfiling.cpp` is already
annotated with `LLVM_ABI`.
## Validation
Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:
- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
This PR resubmits the changes from #136098, which was previously
reverted due to a build failure during the linking stage:
```
undefined reference to `llvm::DebugInfoCorrelate'
undefined reference to `llvm::ProfileCorrelate'
```
The root cause was that `llvm/lib/Frontend/Driver/CodeGenOptions.cpp`
references symbols from the `Instrumentation` component, but the
`LINK_COMPONENTS` in the `llvm/lib/Frontend/CMakeLists.txt` for
`LLVMFrontendDriver` did not include it. As a result, linking failed in
configurations where these components were not transitively linked.
### Fix:
This updated patch explicitly adds `Instrumentation` to
`LINK_COMPONENTS` in the relevant `llvm/lib/Frontend/CMakeLists.txt`
file to ensure the required symbols are properly resolved.
---------
Co-authored-by: ict-ql <168183727+ict-ql@users.noreply.github.com>
Co-authored-by: Chyaka <52224511+liliumshade@users.noreply.github.com>
Co-authored-by: Tarun Prabhu <tarunprabhu@gmail.com>
This patch adds support for the -mrecip command line option. The parsing
of this options is equivalent to Clang's and it is implemented by
setting the "reciprocal-estimates" function attribute.
Also move the ParseMRecip(...) function to CommonArgs, so that Flang is
able to make use of it as well.
---------
Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
This patch implements IR-based Profile-Guided Optimization support in
Flang through the following flags:
- `-fprofile-generate` for instrumentation-based profile generation
- `-fprofile-use=<dir>/file` for profile-guided optimization
Resolves#74216 (implements IR PGO support phase)
**Key changes:**
- Frontend flag handling aligned with Clang/GCC semantics
- Instrumentation hooks into LLVM PGO infrastructure
- LIT tests verifying:
- Instrumentation metadata generation
- Profile loading from specified path
- Branch weight attribution (IR checks)
**Tests:**
- Added gcc-flag-compatibility.f90 test module verifying:
- Flag parsing boundary conditions
- IR-level profile annotation consistency
- Profile input path normalization rules
- SPEC2006 benchmark results will be shared in comments
For details on LLVM's PGO framework, refer to [Clang PGO
Documentation](https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization).
This implementation was developed by [XSCC Compiler
Team](https://github.com/orgs/OpenXiangShan/teams/xscc).
---------
Co-authored-by: ict-ql <168183727+ict-ql@users.noreply.github.com>
Co-authored-by: Tom Eccles <t@freedommail.info>
This patch adds support for the -mprefer-vector-width= command line
option. The parsing of this options is equivalent to Clang's and it is
implemented by setting the "prefer-vector-width" function attribute.
Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
This is a fix for the issue https://github.com/llvm/llvm-project/issues/137126
that turned out to be a driver issue.
FrontendActions has a loop to process multiple input files and `flang -fc1`
accept multiple files, but the semantic, lowering, and llvm codegen
actions were not re-entrant, and crash or weird behaviors occurred
when processing multiple files with `-fc1`.
This patch makes the actions reentrant by cleaning-up the contexts/modules
if needed on entry.
This patch defines `fir::SafeTempArrayCopyAttrInterface` and the
corresponding
OpenACC/OpenMP related attributes in FIR dialect. The actual
implementations
are just placeholders right now, and array repacking becomes a no-op
if `-fopenacc/-fopenmp` is used for the compilation.
This patch updates flang to follow clang's behavior when processing the
`-mcode-object-version` option.
It is now used to populate an LLVM module flag called
`amdhsa_code_object_version` expected by the backend and also updates
the driver to add the `--amdhsa-code-object-version` option to the
frontend invocation for device compilation of AMDGPU targets.
This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.
An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.
In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.
This looks like a huge PR but a lot of the added stuff is documentation.
It is also worth noting that the downstream pass has been validated on
https://github.com/BerkeleyLab/fiats. For the CPU mapping, this achived
performance speed-ups that match pure OpenMP, for GPU mapping we are
still working on extending our support for implicit memory mapping and
locality specifiers.
PR stack:
- https://github.com/llvm/llvm-project/pull/126026 (this PR)
- https://github.com/llvm/llvm-project/pull/127595
- https://github.com/llvm/llvm-project/pull/127633
- https://github.com/llvm/llvm-project/pull/127634
- https://github.com/llvm/llvm-project/pull/127635
Summary:
When we were first porting to COV5, this lead to some ABI issues due to
a change in how we looked up the work group size. Bitcode libraries
relied on the builtins to emit code, but this was changed between
versions. This prevented the bitcode libraries, like OpenMP or libc,
from being used for both COV4 and COV5. The solution was to have this
'none' functionality which effectively emitted code that branched off of
a global to resolve to either version.
This isn't a great solution because it forced every TU to have this
variable in it. The patch in
https://github.com/llvm/llvm-project/pull/131033 removed support for
COV4 from OpenMP, which was the only consumer of this functionality.
Other users like HIP and OpenCL did not use this because they linked the
ROCm Device Library directly which has its own handling (The name was
borrowed from it after all).
So, now that we don't need to worry about backward compatibility with
COV4, we can remove this special handling. Users can still emit COV4
code, this simply removes the special handling used to make the OpenMP
device runtime bitcode version agnostic.
Add -f[no-]slp-vectorize to the flang driver.
Add corresponding -fvectorize-slp to the flang frontend.
Enable -fslp-vectorize at -O2 and higher in flang to match the current
behaviour in clang.
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
FrontendActions.cpp is currently one of the biggest compilation units in
all of flang. Measuring its compilation gives the following metrics:
User time (seconds): 139.21
System time (seconds): 4.65
Maximum resident set size (kbytes): 5891440 (5.61 GB)
This commit separates out explicit invocations of the parser into a
separate compilation unit - ParserActions.cpp - through helper functions
in order to decrease the maximum compilation time and memory usage of a
single unit.
After the split, the measurements of FrontendActions.cpp are as follows:
User time (seconds): 70.08
System time (seconds): 3.16
Maximum resident set size (kbytes): 3961492 (3.7 GB)
While the ones for the newly created ParserActions.cpp as follows:
User time (seconds): 104.33
System time (seconds): 3.37
Maximum resident set size (kbytes): 4185600 (3.99 GB)
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.
For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibilty purses, and instead
write out needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.
The only interesting part in this patch is that the default triple is
Triple("") instead of Triple() to preserve existing behavior. The former
defaults to using the ELF object format instead of unknown object
format. We should fix that as well.
This patch adds the -fvectorize and -fno-vectorize flags to flang.
Note that this also changes the behaviour of `flang -fc1` to match that
of `clang -cc1`, which is that vectorization is only enabled in the
presence of the `-vectorize-loops` flag.
Additionally, this patch changes the behaviour of the default
optimisation levels to match clang, such that vectorization only happens
at the same levels as it does there.
This patch is in draft while I write an RFC to discuss the above two
changes.
This is needed to generate proper ABI flags in the ELF header for LTO
builds. If these flags aren't set correctly, we can't link with objects
that were built with the correct flags.
For non-LTO builds the mcpu/mattr in the TargetMachine will cause the
backend to infer an ABI. For LTO builds the mcpu/mattr aren't set.
I've only added lp64, lp64f, and lp64d ABIs. ilp32* requires riscv32
which is not yet supported in flang. lp64e requires a different
DataLayout string and would need additional plumbing.
Fixes#115679
The OpenACC type interfaces have been updated to require that a type
self-identify which type category it belongs to. Ensure that FIR types
are able to provide this self identification.
In addition to implementing the new API, the PointerLikeType interface
attachment was moved to FIROpenACCSupport library like MappableType to
ensure all type interfaces and their implementation are now in the same
spot.
Move non-common files from FortranCommon to FortranSupport (analogous to
LLVMSupport) such that
* declarations and definitions that are only used by the Flang compiler,
but not by the runtime, are moved to FortranSupport
* declarations and definitions that are used by both ("common"), the
compiler and the runtime, remain in FortranCommon
* generic STL-like/ADT/utility classes and algorithms remain in
FortranCommon
This allows a for cleaner separation between compiler and runtime
components, which are compiled differently. For instance, runtime
sources must not use STL's `<optional>` which causes problems with CUDA
support. Instead, the surrogate header `flang/Common/optional.h` must be
used. This PR fixes this for `fast-int-sel.h`.
Declarations in include/Runtime are also used by both, but are
header-only. `ISO_Fortran_binding_wrapper.h`, a header used by compiler
and runtime, is also moved into FortranCommon.
The behavior is not entirely consistent with that of clang for the
moment since detailed timing information on the LLVM IR optimization and
code generation passes is not provided. The -ftime-report= option is
also not enabled since that is only relevant for information about the
LLVM IR passes. However, some code to handle that option has been
included, to make it easier to support the option when the issues
blocking it are resolved. A FortranSupport library has been created that
is intended to mirror the LLVM and MLIR support libraries.
Based on @tarunprabhu's PR
https://github.com/llvm/llvm-project/pull/107270 with minor changes
addressing latest review feedback. He's busy and we'd like to get this
support in ASAP.
Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com>
The flang-new driver doesn't check for the case of the parser failing to
consume the entire input file. This is of course never an ideal outcome,
and usually signals a need to improve error recovery, but it is better
for the compiler to admit failure rather than to silently proceed with
compilation of what may well be an incomplete parse tree.
This commit fixes some but not all memory leaks in Flang. There are
still 91 tests that fail with ASAN.
- Use `mlir::OwningOpRef` instead of `std::unique_ptr`. The latter does
not free allocations of nested blocks.
- Pass `ModuleOp` as value instead of reference.
- Add few missing deallocations in test cases and other places.
Add a new pass that lowers an `omp.workshare` with its binding `omp.workshare.loop_wrapper` loop nests into other OpenMP constructs that can be lowered to LLVM.
More specifically, in order to preserve the sequential execution semantics of the code contained, it wraps portions that needs to be executed on a single thread in `omp.single` blocks, converts code that must be parallelized into `omp.wsloop` nests and inserts the appropriate synchronization.
nsw is now added to do-variable increment when -fno-wrapv is enabled as
GFortran seems to do.
That means the option introduced by #91579 isn't necessary any more.
Note that the feature of -flang-experimental-integer-overflow is enabled
by default.
The underlying issue was caused by a file included in two different
places which resulted in duplicate definition errors when linking
individual shared libraries. This was fixed in c3201ddaeac02a2c86a38b
[#109874].
This does a global rename from `flang-new` to `flang`. I also
removed/changed any TODOs that I found related to making this change.
---------
Co-authored-by: H. Vetinari <h.vetinari@gmx.com>
Co-authored-by: Andrzej Warzynski <andrzej.warzynski@arm.com>
Remove flang/include/flang/Tools/CLOptions.inc - which was included as
is in - several places. Move the code in it to header and source files
which are used used in the "standard" way. Some minor cleanup such as
removing trailing whitespace and excessive newlines and reordering
entries alphabetically for files that were modified along the way.
Update the documentation that referenced CLOptions.inc.
Add support for the -frecord-command-line option that will produce the
llvm.commandline metadata which will eventually be saved in the object
file. This behavior is also supported in clang. Some refactoring of the
code in flang to handle these command line options was carried out. The
corresponding -grecord-command-line option which saves the command line
in the debug information has not yet been enabled for flang.
As specified in the docs,
1) raw_string_ostream is always unbuffered and
2) the underlying buffer may be used directly
( 65b13610a5226b84889b923bae884ba395ad084d for further reference )
Avoid unneeded calls to raw_string_ostream::str(), to avoid excess indirection.
The behavior deliberately mimics that of clang. Ideally, -print-pipeline-passes
should be a first-class driver option. Notes to this effect have been added in
the appropriate places in both flang and clang.
---------
Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com>
The verification pass is run right after lowering with its own pass
manager by flang driver, but the mlir command line options were not
applied to this pass manager.
This prevented options like `-mmlir
--mlir-pass-pipeline-crash-reproducer="crash.fir"` or `-mmlir
--mlir-print-ir-after-failure` to work when a verifier error was hit
right after lowering, while these options are useful to
investigate/reproduce internal errors.
Note that the change in the pipeline tests is not showing a new pass
being run: the pass was already run, but `-mmlir --mlir-pass-statistics`
was not applied when the initial verification pass was run.
Note that when we deal with compiler performance, we will probably want
to run the verification pass only once after the initial lowering (this
patch shows that it is called twice in a raw: once after the initial
lowering, once at the beginning of FIR to LLVM IR lowering).
This patch implements the -mcmodel flag from clang, allowing the Code
Model to be changed for the LLVM module. The same set of mcmodel
flags are accepted as in clang and the same Code Model attributes are
added to the LLVM module for those flags.
Also add `-mlarge-data-threshold` for x86-64, which is automatically set
by the shared command-line code (see below). This is also added as an
attribute into the LLVM module and on the target machine.
A function is created for `addMCModel` that is copied out of clang's
argument handling so that it can be shared with flang.
---------
Co-authored-by: Mats Petersson <mats.petersson@arm.com>
This PR adds -mtune as a valid flang flag and passes the information
through to LLVM IR as an attribute on all functions. No specific
architecture optimizations are added at this time.
Fixed the link error that previously occurred on buildbots by adding
IRPrinter to the linked components of the Flang frontend.
This reverts commit 1d4523505e54415a270d7b13b6e203fc25585c5b.
The Flang frontend currently prints LLVM IR modules using
llvm::Module::print(); this works for default cases, but skips some of
the logic that IR printer passes use, specifically the use of the
--write-experimental-debuginfo flag to control debug info format. This
patch replaces the use of print() with the PrintModulePass, bringing the
printing behaviour to parity with clang's frontend.
Also reverts "[MLIR][Flang][DebugInfo] Convert debug format in MLIR translators"
The patch above introduces behaviour controlled by an LLVM flag into the
Flang driver, which is incorrect behaviour.
This reverts commits:
3cc2710e0dd53bb82742904fa13014018a1137ed.
460408f78b30720950040e336f7b566aa7203269.
Reapplies the original patch with some additional conversion layers added
to the MLIR translator, to ensure that we don't write the new debug info
format unless WriteNewDbgInfoFormat is set.
This reverts commit 8c5d9c79b96ed8297b381e00d3a706a432cd6c9d.