Flang currently lowers internal procedures passed as actual arguments
using LLVM's `llvm.init.trampoline` / `llvm.adjust.trampoline`
intrinsics, which require an executable stack. On modern Linux
toolchains and security-hardened kernels that enforce W^X (Write XOR
Execute), this causes link-time failures (`ld.lld: error: ... requires
an executable stack`) or runtime `SEGV` from NX violations.
This patch introduces a runtime trampoline pool that allocates
trampolines from a dedicated `mmap`'d region instead of the stack. The
pool toggles page permissions between writable (for patching) and
executable (for dispatch), so the stack stays non-executable throughout.
On macOS, MAP_JIT and `pthread_jit_write_protect_np` are used for the
same effect. An i-cache flush (`__builtin___clear_cache` on Linux,
`sys_icache_invalidate` on macOS) is performed after each write→exec
transition.
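The write-then-seal cycle described above can be sketched for Linux roughly as follows. This is a minimal illustration, not the pool's actual code. `WriteThenSeal` is a hypothetical helper. It maps one fresh page per call instead of managing a pool, and it seals only once, whereas the runtime toggles its region back to writable for patching. The macOS `MAP_JIT` path is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: copy `len` bytes into a fresh anonymous page while it is
// read/write, then flip it to read/execute. The page is never W and X at once.
void *WriteThenSeal(const unsigned char *code, std::size_t len) {
  long pageSize = sysconf(_SC_PAGESIZE);
  if (pageSize <= 0 || len > static_cast<std::size_t>(pageSize))
    return nullptr;
  std::size_t size = static_cast<std::size_t>(pageSize);
  void *page = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (page == MAP_FAILED)
    return nullptr;
  std::memcpy(page, code, len); // patch phase: writable, not executable
  if (mprotect(page, size, PROT_READ | PROT_EXEC) != 0) {
    munmap(page, size);
    return nullptr;
  }
  // After the write->exec transition, flush the i-cache for the patched range.
  char *begin = static_cast<char *>(page);
  __builtin___clear_cache(begin, begin + len);
  return page;
}
```

In this sketch, the returned page is readable and executable but no longer writable, and the caller releases it with `munmap`.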
The feature is gated behind a new driver flag, `-fsafe-trampoline` (off
by default), which threads through the frontend into the
`BoxedProcedurePass`. When enabled, the pass emits calls to
`_FortranATrampolineInit`, `_FortranATrampolineAdjust`, and
`_FortranATrampolineFree` instead of the legacy intrinsics. The legacy
path is completely untouched when the flag is off.
The pool is a singleton with a fixed capacity (default 1024 slots,
overridable via `FLANG_TRAMPOLINE_POOL_SIZE`). Slot size varies by
target (32 bytes on x86-64/AArch64, 48 on PPC64, 64 fallback). Each slot
holds a small architecture-specific stub, currently x86-64 (17 bytes,
using `r10` as the nest/static-chain register) and AArch64 (24 bytes,
using `x15`). The implementation compiles on all architectures but will
crash at runtime with a clear diagnostic if trampoline emission is
actually attempted on an unsupported target. This avoids breaking the
flang-rt build on e.g. RISC-V or PPC64.
Freed slots are poisoned (the callee pointer is overwritten with a
sentinel) and recycled into a freelist, so the pool can sustain
long-running programs that repeatedly create and destroy closures.
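The slot bookkeeping (fixed capacity, environment override, poison-on-free, freelist reuse) can be sketched as below. This models only the bookkeeping. It has no machine-code stubs, no page-permission toggling and no locking. `Slot`, `SlotPool`, `PoolCapacityFromEnv` and the sentinel value `kPoisonCallee` are hypothetical names and values chosen for illustration. The parsing rules for `FLANG_TRAMPOLINE_POOL_SIZE` are also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical sentinel written over a freed slot's callee pointer.
constexpr std::uintptr_t kPoisonCallee = 0xDEADBEEFu;

struct Slot {
  std::uintptr_t callee; // target function address
  std::uintptr_t chain;  // static-chain (host frame) pointer
  Slot *nextFree;        // freelist link, meaningful only while the slot is free
};

// Capacity from FLANG_TRAMPOLINE_POOL_SIZE, falling back to 1024 if unset or invalid.
std::size_t PoolCapacityFromEnv() {
  const char *env = std::getenv("FLANG_TRAMPOLINE_POOL_SIZE");
  if (env == nullptr)
    return 1024;
  char *end = nullptr;
  unsigned long value = std::strtoul(env, &end, 10);
  return (end != env && *end == '\0' && value > 0) ? value : 1024;
}

class SlotPool {
public:
  SlotPool(Slot *storage, std::size_t capacity) : slots_(storage) {
    // Thread every slot onto the freelist, initially poisoned.
    for (std::size_t i = 0; i < capacity; ++i)
      slots_[i] = Slot{kPoisonCallee, 0, i + 1 < capacity ? &slots_[i + 1] : nullptr};
    freeList_ = capacity ? &slots_[0] : nullptr;
  }

  // Pops a free slot; returns nullptr when exhausted (how the real pool
  // reports exhaustion is not modeled here).
  Slot *Allocate(std::uintptr_t callee, std::uintptr_t chain) {
    Slot *s = freeList_;
    if (s == nullptr)
      return nullptr;
    freeList_ = s->nextFree;
    *s = Slot{callee, chain, nullptr};
    return s;
  }

  // Poisons the callee so a stale call is recognizable, then recycles the slot.
  void Free(Slot *s) {
    s->callee = kPoisonCallee;
    s->nextFree = freeList_;
    freeList_ = s;
  }

private:
  Slot *slots_;
  Slot *freeList_ = nullptr;
};
```

Because freed slots go back on the freelist, a program that creates and destroys closures in a loop keeps reusing the same few slots instead of exhausting the pool.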
A few design choices worth calling out:
The runtime avoids all C++ runtime dependencies: no `std::mutex`, no
`operator new`, no function-local statics with hidden guard variables.
Locking is via flang-rt's own `Lock` / `CriticalSection`, memory is via
`AllocateMemoryOrCrash` / `FreeMemory`, and the singleton uses explicit
double-checked locking with a raw pointer. This was done so the
trampoline pool links cleanly in minimal / freestanding flang-rt
configurations.
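The double-checked singleton pattern can be sketched in portable C++ as follows. This is a sketch under stated assumptions, not the runtime's code. `std::mutex` stands in for flang-rt's `Lock`, which the real pool uses precisely to avoid `std::mutex`. `std::atomic` makes the pointer publication well-defined in standard C++, and how the runtime orders its raw-pointer accesses is not described here. `Pool` and `GetPool` are hypothetical. The instance lives in a static byte buffer and is built with placement new, which avoids both a heap allocation and a function-local static with a hidden guard.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <new>

struct Pool {
  int slotsInUse = 0; // placeholder state
};

namespace {
std::mutex poolLock;                     // stand-in for flang-rt's Lock
std::atomic<Pool *> poolInstance{nullptr};
alignas(Pool) unsigned char poolStorage[sizeof(Pool)]; // no heap, no guard variable
} // namespace

Pool &GetPool() {
  // First check without the lock: the fast path once the pool exists.
  Pool *p = poolInstance.load(std::memory_order_acquire);
  if (p == nullptr) {
    std::lock_guard<std::mutex> guard(poolLock);
    // Second check under the lock: another thread may have won the race.
    p = poolInstance.load(std::memory_order_relaxed);
    if (p == nullptr) {
      p = new (poolStorage) Pool();
      poolInstance.store(p, std::memory_order_release);
    }
  }
  return *p;
}
```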
`_FortranATrampolineFree` calls are inserted immediately before every
`func.return` in the enclosing host function. This is conservative but
correct: the trampoline handle cannot outlive the host's stack
frame, since the closure captures the host's local variables by
reference.
The GNU_STACK note is verified via a dedicated integration test
(`safe-trampoline-gnustack.f90`) that compiles and links a Fortran
program using the runtime path, then inspects the ELF with
`llvm-readelf` to confirm the stack segment is `RW` (not `RWE`).
**Test coverage:**
- `flang/test/Driver/fsafe-trampoline.f90` — flag forwarding (on, off,
default)
- `flang/test/Fir/boxproc-safe-trampoline.fir` — FIR-level FileCheck for
emitted runtime calls
- `flang/test/Lower/safe-trampoline.f90` — end-to-end lowering
- `flang-rt/test/Driver/safe-trampoline-gnustack.f90` — GNU_STACK ELF
verification
Closes #182813
Co-authored-by: Sairudra More <moresair@pe31.hpc.amslabs.hpecorp.net>
This patch adds `flang -fc1` option `-ffp-maxmin-behavior` and
propagates it throughout Flang, so that semantics context,
lowering and the pass pipeline builder can use it.
MAX/MIN intrinsic and OpenACC max/min reduction lowering
are now controlled by the option.
I kept the `Legacy` mode, which is the default and matches the current
behavior. I am going to test and merge a follow-up patch that
replaces `Legacy` with `Portable`.
RFC:
https://discourse.llvm.org/t/flang-canonical-and-optimizable-representation-for-min-max/90037
Enable Flang to match Clang behavior for command-line recording in DWARF
producer strings when using -grecord-command-line.
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
Driver/compiler option plumbing to get -f(no-)protect-parens supported
in flang. (This option was already supported in clang, so the option
config was extended to enable it in flang.)
In the compiler, support it in code gen options and in lowering options.
Hooked up lowering options with the code by @alexey-bataev that turns
off reassociation transformations.
Co-authored-by: Alexey Bataev <a.bataev@outlook.com>
We have a longstanding issue in debug info: the `use` statement is not
fully respected. The problem has been described in
https://github.com/llvm/llvm-project/issues/160923. This is the first part
of the effort to address this issue. This PR adds infrastructure to emit
`use` statement information in FIR, which will be used by subsequent
patches to generate DWARF debug information.
Information about `use` statements is collected during semantic
analysis and stored in `PreservedUseStmt` objects. During lowering,
`fir.use_stmt` operations are emitted for each `PreservedUseStmt`
object. The `fir.use_stmt` operation captures the module name, `only`
list symbols, and any renames specified in the use statement. The
`fir.use_stmt` is removed during `CodeGen`.
These options are supported by the Intel compiler. There are similar
options in the IBM compiler as well. Each of the options can be provided
more than once. In that case, the last occurrence of -fdefault-real-*
(respectively -fdefault-integer-*) is used; for example, if both -fdefault-real-4 and
-fdefault-real-8 are specified, whichever comes last wins. The fdefault.f90 test file was
extensively modified. Testing handling of the options by the driver was
removed since testing handling of the options by -fc1 is sufficient.
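The "last occurrence wins" rule can be illustrated with a small sketch. `DefaultRealKind` is a hypothetical helper, not Flang's option handling. Treating kind 4 as the baseline when no flag is given is an assumption of the sketch. The same shape applies to -fdefault-integer-*.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical: scan the arguments in order; each -fdefault-real-* overrides
// any earlier one, so the last occurrence determines the result.
int DefaultRealKind(const std::vector<std::string> &args) {
  int kind = 4; // assumed baseline when no -fdefault-real-* flag is present
  for (const std::string &arg : args) {
    if (arg == "-fdefault-real-4")
      kind = 4;
    else if (arg == "-fdefault-real-8")
      kind = 8;
  }
  return kind;
}
```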
Fixes #160252
Reapplication of #137828, changes:
* Workaround CMAKE_Fortran_PREPROCESS_SOURCE issue for CMake < 2.24: The
issue is that `try_compile` does not forward manually-defined compiler
flang variables to the test build environment; instead of just a
negative test result, it aborts the configuration step itself. To be
fair, manually defining these variables is deprecated since at least
CMake 3.6.
* Missing flang command-line flags for CMake < 3.28: `-target=`, `-O2`, `-O3`
* It is now possible to set FLANG_RT_ENABLE_STATIC=OFF and
FLANG_RT_ENABLE_SHARED=OFF at the same time, which is the default for amdgpu and
nvptx targets. In this mode, only the .mod files are compiled, which is
necessary for the module files in
lib/clang/22/finclude/flang/(nvptx64-nvidia-cuda|amdgpu-amd-amdhsa)/*.mod
to be available.
* For compiling omp_lib.mod for nvptx and amdgpu, the module build
functionality must be hoisted out of openmp's runtime/ directory, which
is only included for host targets. This PR now requires #169909.
Move building the .mod files from openmp/flang to openmp/flang-rt using
a shared mechanism. Motivations to do so are:
1. Most modules are target-dependent and need to be re-compiled for each
target separately, which is something the LLVM_ENABLE_RUNTIMES system
already does. Prime example is `iso_c_binding.mod` which encodes the
target's ABI. Constants such as [`c_long_double` also have different
values](d748c81218/flang-rt/lib/runtime/iso_c_binding.f90 (L77-L81)).
Most other modules have `#ifdef`-enclosed code as well. For instance
this caused offload targets nvptx64-nvidia-cuda/amdgpu-amd-amdhsa to use
the module files compiled for the host, which may contain uses of the
types REAL(10) or REAL(16) not available for nvptx/amdgpu.
#146876, #128015, #129742, #158790
3. CMake has support for Fortran that we should use. Among other things,
it automatically determines module dependencies so there is no need to
hardcode them in the CMakeLists.txt.
4. It allows using Fortran itself to implement Flang-RT. Currently, only
`iso_fortran_env_impl.f90` emits object files that are needed by Fortran
applications (#89403). The workaround of #95388 could be reverted (PR
#169525).
If using Flang for cross-compilation or target-offloading, flang-rt must
now be compiled for each target not only for the library, but also to
get the target-specific module files. For instance in a bootstrapping
runtime build, this can be done by adding:
`-DLLVM_RUNTIME_TARGETS=default;nvptx64-nvidia-cuda;amdgpu-amd-amdhsa`.
Some new dependencies come into play:
* openmp depends on flang-rt for building `omp_lib.mod` and
`omp_lib_kinds.mod`. Currently, if flang-rt is not found then the
modules are not built.
* check-flang depends on flang-rt: If not found, the majority of tests
are disabled. If not building in a bootstrapping build, the location of
the module files can be pointed to using
`-DFLANG_INTRINSIC_MODULES_DIR=<path>`, e.g. in a flang-standalone
build. Alternatively, tests needing any of the intrinsic modules
could be marked with `REQUIRES: flangrt-modules`.
* check-flang depends on openmp: Not a change; tests requiring
`omp_lib.mod` and `omp_lib_kinds.mod` are already marked with
`openmp_runtime`.
As intrinsic modules are now target-specific, their location is moved
from `include/flang` to `<resource-dir>/finclude/flang/<triple>`. The
mechanism to compute the location has been moved from flang-rt
(previously used to compute the location of `libflang_rt.*.a`) to common
locations in `cmake/GetToolchainDirs.cmake` and
`runtimes/CMakeLists.txt` so they can be used by both openmp and
flang-rt. Potentially the mechanism could also be shared by other
libraries such as compiler-rt.
`finclude` was chosen because `gfortran` uses it as well, and because it avoids
misuse such as `#include <flang/iso_c_binding.mod>`. The search location
is now determined by `ToolChain` in the driver, instead of by the
frontend. The additional `flang` subdirectory avoids accidental inclusion of
gfortran modules, which, because they are compressed, would result in
user-unfriendly errors. The driver now adds `-fintrinsic-module-path`
for that location to the frontend call (just like gfortran does).
`-fintrinsic-module-path` had to be fixed for this because ironically it
was only added to `searchDirectories`, but not
`intrinsicModuleDirectories_`. Since the driver determines the location,
tests invoking `flang -fc1` and `bbc` must also be passed the location
by llvm-lit. This works like llvm-lit does for finding the include dirs
for Clang using `-print-file-name=...`.
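The per-target layout can be illustrated with a small path helper. `IntrinsicModuleDir` is hypothetical. In the patch, the driver's `ToolChain` computes this location. The resource directory in the usage below is made up. The point is that each triple gets its own module directory.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper building <resource-dir>/finclude/flang/<triple>.
std::string IntrinsicModuleDir(const std::string &resourceDir,
                               const std::string &triple) {
  std::string dir = resourceDir;
  if (!dir.empty() && dir.back() != '/')
    dir += '/';
  return dir + "finclude/flang/" + triple;
}
```

For example, `IntrinsicModuleDir("/opt/llvm/lib/clang/22", "nvptx64-nvidia-cuda")` yields `/opt/llvm/lib/clang/22/finclude/flang/nvptx64-nvidia-cuda`, which is distinct from the host triple's directory.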
This relands #165277 by reverting #169397.
This also relands the corresponding Bazel port by reverting #169410.
The original revert was due to a report of a broken build, which was
later resolved by fully clearing the build directory.
Move building the .mod files from openmp/flang to openmp/flang-rt using
a shared mechanism. Motivations to do so are:
1. Most modules are target-dependent and need to be re-compiled for each
target separately, which is something the LLVM_ENABLE_RUNTIMES system
already does. Prime example is `iso_c_binding.mod` which encodes the
target's ABI. Most other modules have `#ifdef`-enclosed code as well.
2. CMake has support for Fortran that we should use. Among other things,
it automatically determines module dependencies so there is no need to
hardcode them in the CMakeLists.txt.
3. It allows using Fortran itself to implement Flang-RT. Currently, only
`iso_fortran_env_impl.f90` emits object files that are needed by Fortran
applications (#89403). The workaround of #95388 could be reverted.
Some new dependencies come into play:
* openmp depends on flang-rt for building `omp_lib.mod` and
`omp_lib_kinds.mod`. Currently, if flang-rt is not found then the
modules are not built.
* check-flang depends on flang-rt: If not found, the majority of tests
are disabled. If not building in a bootstrapping build, the location of
the module files can be pointed to using
`-DFLANG_INTRINSIC_MODULES_DIR=<path>`, e.g. in a flang-standalone
build. Alternatively, tests needing any of the intrinsic modules
could be marked with `REQUIRES: flangrt-modules`.
* check-flang depends on openmp: Not a change; tests requiring
`omp_lib.mod` and `omp_lib_kinds.mod` are already marked with
`openmp_runtime`.
As intrinsic modules are now target-specific, their location is moved
from `include/flang` to `<resource-dir>/finclude/flang/<triple>`. The
mechanism to compute the location has been moved from flang-rt
(previously used to compute the location of `libflang_rt.*.a`) to common
locations in `cmake/GetToolchainDirs.cmake` and
`runtimes/CMakeLists.txt` so they can be used by both openmp and
flang-rt. Potentially the mechanism could also be shared by other
libraries such as compiler-rt.
`finclude` was chosen because `gfortran` uses it as well, and because it avoids
misuse such as `#include <flang/iso_c_binding.mod>`. The search location
is now determined by `ToolChain` in the driver, instead of by the
frontend. The driver now adds `-fintrinsic-module-path` for that
location to the frontend call (just like gfortran does).
`-fintrinsic-module-path` had to be fixed for this because ironically it
was only added to `searchDirectories`, but not
`intrinsicModuleDirectories_`. Since the driver determines the location,
tests invoking `flang -fc1` and `bbc` must also be passed the location
by llvm-lit. This works like llvm-lit does for finding the include dirs
for Clang using `-print-file-name=...`.
This removes the dependency on clangDriver from clangFrontend and
flangFrontend.
This refactoring is part of a broader effort to support driver-managed
builds for compilations using C++ named modules and/or Clang modules.
It is required for linking the dependency scanning tooling against the
driver without introducing cyclic dependencies, which would otherwise
cause build failures when dynamic linking is enabled.
In particular, clangFrontend must no longer depend on clangDriver
for this to be possible.
This change was discussed in the following RFC:
https://discourse.llvm.org/t/rfc-new-clangoptions-library-remove-dependency-on-clangdriver-from-clangfrontend-and-flangfrontend/88773
It turns out that having `-ffast-math` as the only option to control
optimizations for MOD for REAL kinds (PR #160660) is too coarse-grained
for some applications. Thus, this PR adds back `-ffast-real-mod` to have
more control over the optimization. The `-ffast-math` flag will still
enable the optimization, and `-fno-fast-real-mod` allows one to disable
it.
This patch adds a new `FramePointerKind::NonLeafNoReserve` and makes it
the default for `-momit-leaf-frame-pointer`.
It also adds a new commandline option `-m[no-]reserve-frame-pointer-reg`.
This should fix #154379. The main impact of this patch can be found in
`clang/lib/Driver/ToolChains/CommonArgs.cpp`.
This relands #167348.
The original PR was reverted due to a reported build failure, which was
later diagnosed as a local issue in the developer’s checkout or build
state. See discussion here:
https://github.com/llvm/llvm-project/pull/163659#discussion_r2511546964
No additional changes have been made in this reland.
This change moves option-related code from clangDriver into a new
clangOptions library.
This refactoring is part of a broader effort to support driver-managed
builds for compilations using C++ named modules and/or Clang modules.
It is required for linking the dependency scanning tooling against the
driver without introducing cyclic dependencies, which would otherwise
cause build failures when dynamic linking is enabled.
In particular, clangFrontend must no longer depend on clangDriver
for this to be possible.
This PR is motivated by the following review comment:
https://github.com/llvm/llvm-project/pull/152770#discussion_r2430756918
This patch adds direct code-gen support for a faster MOD intrinsic for
REAL types. Flang has maintained, and will keep maintaining, a high-precision
implementation of the MOD intrinsic as part of the Fortran runtime. With
the -ffast-real-mod flag, users can opt to avoid calling into the
Fortran runtime and instead trigger code-gen that produces faster code,
at the expense of potential bit cancellation, by having the compiler use
the MOD formula as specified by ISO Fortran.
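The formula the standard gives for MOD, A - INT(A/P) * P, can be sketched in C++ as follows. `FastRealMod` is a hypothetical name, and this is not the code Flang emits. `INT` truncates toward zero, which `std::trunc` models. Because `a / p` is rounded before truncation and the subtraction can cancel leading bits, results can differ from an exactly computed remainder for some inputs. That is the trade-off the flag accepts.

```cpp
#include <cassert>
#include <cmath>

// Formula-based MOD: A - INT(A/P) * P, with INT truncating toward zero.
double FastRealMod(double a, double p) {
  return a - std::trunc(a / p) * p;
}
```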
This flag enables the compiler to generate most of the debug
information in a separate file which can be useful for executable size
and link times. Clang already supports this flag.
I have tried to follow the logic of the clang implementation where
possible. Some functions were moved where they could be used by both
clang and flang. The `addOtherOptions` was renamed to `addDebugOptions`
to better reflect its purpose.
Clang also sets the `splitDebugFilename` field of the `DICompileUnit` in
the IR when this option is present. That part is currently missing from
this patch and will come in a follow-up PR.
This PR builds on https://github.com/llvm/llvm-project/pull/158314
and adds the lowering support for `-gdwarf-N` flag. The changes to pass
the information to `AddDebugInfo` pass are mostly mechanical. The
`AddDebugInfo` pass adds `ModuleFlagsOp` in the module which gets
translated to correct llvm metadata during mlir->llvmir translation.
There is a minor correction where the version is set to 0 in case no
-debug-version flag is provided. Previously it was set to 2 in this case
due to misreading of clang code.
This reverts commit 895cda70a95529fd22aac05eee7c34f7624996af.
And fix attempt: 06f671e57a574ba1c5127038eff8e8773273790e.
Performance regressions and broken sanitizers, see #142686.
This PR adds support for the -gdwarf-N option, which allows the user to
choose the DWARF version. Currently N can be 2, 3, 4, or 5. This
is only the driver part of the change. Later PRs will propagate it to
the IR.
Fixes https://github.com/llvm/llvm-project/issues/112910.
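The accepted range (N from 2 to 5) and the lowering-side default of 0 when no version flag is given can be combined in a small sketch. `ParseGDwarf` and `DwarfVersion` are hypothetical helpers, not the driver's API. The assumption that the last valid flag wins follows the usual driver convention and is not stated in the text.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical: parse "-gdwarf-N", accepting only N = 2, 3, 4, or 5.
std::optional<unsigned> ParseGDwarf(const std::string &arg) {
  const std::string prefix = "-gdwarf-";
  if (arg.size() != prefix.size() + 1 ||
      arg.compare(0, prefix.size(), prefix) != 0)
    return std::nullopt;
  char digit = arg.back();
  if (digit < '2' || digit > '5')
    return std::nullopt;
  return static_cast<unsigned>(digit - '0');
}

// Version handed to debug-info generation: 0 when no -gdwarf-N was given,
// otherwise the last valid one (assumed last-one-wins).
unsigned DwarfVersion(const std::vector<std::string> &args) {
  unsigned version = 0;
  for (const std::string &arg : args)
    if (std::optional<unsigned> v = ParseGDwarf(arg))
      version = *v;
  return version;
}
```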
This patch adds the flag -fexperimental-loop-fuse to the clang and flang
drivers. This is primarily useful for experiments, as we envision
enabling the pass one day.
The options follow the same principles and rationale as
`-floop-interchange`.
---------
Co-authored-by: Madhur Amilkanthwar <madhura@nvidia.com>
From the CUDA Fortran programming guide:
> If CUDA Fortran is enabled in compilation, either by specifying -cuda
> on the command line, and pre-processing is enabled by either the
> -Mpreprocess compiler option or by using capital letters in the
> filename extension (.CUF, .F90, etc.) then the _CUDA macro is defined.
Move the definition of `_CUDA` to the compiler invocation.
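The quoted rule requires two conditions: CUDA Fortran must be enabled, and preprocessing must be on, either explicitly or through an uppercase extension. A minimal sketch of that condition follows. `HasUppercaseExtension` and `ShouldDefineCudaMacro` are hypothetical helpers, and detecting "capital letters in the extension" by its first character is a simplifying assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical: true if the extension starts with an uppercase letter (.CUF, .F90, ...).
bool HasUppercaseExtension(const std::string &fileName) {
  std::size_t dot = fileName.rfind('.');
  if (dot == std::string::npos || dot + 1 >= fileName.size())
    return false;
  return std::isupper(static_cast<unsigned char>(fileName[dot + 1])) != 0;
}

// _CUDA is defined only when CUDA Fortran is enabled AND preprocessing is on.
bool ShouldDefineCudaMacro(bool cudaEnabled, bool preprocessFlag,
                           const std::string &fileName) {
  return cudaEnabled && (preprocessFlag || HasUppercaseExtension(fileName));
}
```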
In relation to the approval and merge of the
[PRIF](https://github.com/llvm/llvm-project/pull/76088) specification
about multi-image features in Flang, here is a first PR to add support
for the `-fcoarray` compilation flag and the initialization of the PRIF
environment.
Follow-up PRs will add support for lowering to PRIF.
Predefine the `__pic__/__pie__/__PIC__/__PIE__` macros based on the
configured relocation level. This logic mirrors that of the clang
driver, where `__pic__/__PIC__` are defined for both PIC and PIE modes,
but `__pie__/__PIE__` are only defined for PIE mode.
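The macro selection rule can be sketched as a function from relocation settings to predefined macros. `PicPieMacros` is hypothetical. Using the PIC level (1 or 2) as the macro value mirrors Clang's convention as we understand it, and should be treated as an assumption here.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical: macros to predefine for a PIC level (0 = not PIC) and PIE mode.
std::map<std::string, std::string> PicPieMacros(unsigned picLevel, bool isPIE) {
  std::map<std::string, std::string> macros;
  if (picLevel == 0)
    return macros; // non-PIC code: none of the four macros
  const std::string level = std::to_string(picLevel);
  macros["__pic__"] = level; // defined for both PIC and PIE
  macros["__PIC__"] = level;
  if (isPIE) {
    macros["__pie__"] = level; // PIE only
    macros["__PIE__"] = level;
  }
  return macros;
}
```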
Fixes https://github.com/llvm/llvm-project/issues/135275
Both clang and gfortran support the -fopenmp-simd flag, which enables
OpenMP support only for simd constructs, while disabling the rest of
OpenMP.
Implement the appropriate parse tree rewriting to remove non-SIMD OpenMP
constructs at the parsing stage.
Add a new SimdOnly flang OpenMP IR pass which rewrites generated OpenMP
FIR to handle untangling composite simd constructs, and clean up OpenMP
operations leftover after the parse tree rewriting stage.
With this approach, the two parts of the logic required to make the flag
work can be self-contained within the parse tree rewriter and the MLIR
pass, respectively. It does not need to be implemented within the core
lowering logic itself.
The flag is expected to have no effect if -fopenmp is passed explicitly,
and is only expected to remove OpenMP constructs, not things like OpenMP
library function calls. This matches the behaviour of other compilers.
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
Atomic Control Options are used to specify architectural characteristics
to help lowering of atomic operations. The options used are:
`-f[no-]atomic-remote-memory`, `-f[no-]atomic-fine-grained-memory`,
`-f[no-]atomic-ignore-denormal-mode`.
Legacy option `-m[no-]unsafe-fp-atomics` is aliased to
`-f[no-]atomic-ignore-denormal-mode`.
More details can be found in
https://github.com/llvm/llvm-project/pull/102569. This PR implements the
frontend support for these options with OpenMP atomic in flang.
Backend changes are available in the draft PR:
https://github.com/llvm/llvm-project/pull/143769, which will be raised
after this one is merged.
This patch adds an option to select the method for computing complex
number division. It uses `LoweringOptions` to determine whether to lower
complex division to a runtime function call or to MLIR's `complex.div`,
and `CodeGenOptions` to select the computation algorithm for
`complex.div`. The available option values and their corresponding
algorithms are as follows:
- `full`: Lower to a runtime function call. (Default behavior)
- `improved`: Lower to `complex.div` and expand to Smith's algorithm.
- `basic`: Lower to `complex.div` and expand to the algebraic algorithm.
See also the discussion in the following discourse post:
https://discourse.llvm.org/t/optimization-of-complex-number-division/83468
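The difference between the `basic` and `improved` algorithms can be shown with textbook versions of each. `DivideBasic` and `DivideSmith` are hypothetical names. The MLIR expansion of `complex.div` may differ in details such as inf/NaN handling, and the `full` runtime path is not shown. The basic formula forms c^2 + d^2, which overflows for large operands. Smith's algorithm scales by the larger of |c| and |d| to avoid that.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// "basic": (a+bi)/(c+di) = ((ac+bd) + (bc-ad)i) / (c^2+d^2).
std::complex<double> DivideBasic(std::complex<double> x, std::complex<double> y) {
  double a = x.real(), b = x.imag(), c = y.real(), d = y.imag();
  double denom = c * c + d * d;
  return {(a * c + b * d) / denom, (b * c - a * d) / denom};
}

// "improved": Smith's algorithm, dividing through by the larger of |c|, |d|.
std::complex<double> DivideSmith(std::complex<double> x, std::complex<double> y) {
  double a = x.real(), b = x.imag(), c = y.real(), d = y.imag();
  if (std::fabs(c) >= std::fabs(d)) {
    double r = d / c;
    double t = c + d * r;
    return {(a + b * r) / t, (b - a * r) / t};
  }
  double r = c / d;
  double t = c * r + d;
  return {(a * r + b) / t, (b * r - a) / t};
}
```

For ordinary inputs such as (1+2i)/(3+4i), both give 0.44+0.08i. For x = y = 1e300+1e300i, the basic form computes inf/inf and returns NaN, while Smith's algorithm returns exactly 1+0i.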
---------
Co-authored-by: Tarun Prabhu <tarunprabhu@gmail.com>
Fixes `#146802`
#146582 started using the `Reserved` Frame Pointer kind for Arm64
Windows, but this revealed a bug in Flang where it copied the
`-mframe-pointer=reserved` flag from Clang, but didn't correctly handle
it in its own command line parser and subsequent compilation pipeline.
This change adds support for `-mframe-pointer=reserved` and adds a test
to make sure that functions are correctly marked when the flag is set.
Summary:
This patch is mostly an NFC that renames the existing `-fopenmp-targets`
into `--offload-targets`. Doing this early to simplify a follow-up patch
that will hopefully allow this syntax to be used more generically over
the existing `--offload` syntax (which I think is mostly unmaintained
now). This follows the well-trodden path of pulling language-specific
offload options into generic ones, but right now this is still
just OpenMP specific.
Reland #145901 with a fix for shared library builds.
So far flang generates runtime derived type info global definitions (as
opposed to declarations) for all the types used in the current
compilation unit, even when the derived types are defined in other
compilation units. It uses linkonce_odr to achieve the derived type
descriptor address "uniqueness" needed to match two derived types
inside the runtime.
This comes at a big compile time cost because of all the extra globals
and their definitions in apps with many and complex derived types.
This patch adds an experimental option to only generate the rtti
definitions for the types defined in the current compilation unit, and to
only generate external declarations for the derived type descriptor
objects of types defined elsewhere.
Note that objects compiled with this option are not compatible with
object files compiled without it, because files compiled without it may
drop the rtti for a type they define if that type is not used in the
compilation unit (a consequence of linkonce_odr).
I am adding the option so that we can better measure the extra cost of
the current approach on apps and allow speeding up some compilation
where devirtualization does not matter (and the build config links to
all module file objects anyway).
For historical versions that are unsupported, emit a warning and assume
the current default version.
For values of N that are not integers or that don't correspond to any
OpenMP version (old or newer), emit an error.
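The three outcomes (accept, warn and fall back, error) can be sketched as a small resolver. The version tables below are illustrative placeholders, not the compiler's actual lists, and `ResolveOpenMPVersion`, `Diag` and `OmpVersionResult` are hypothetical names. The default version is passed in rather than hard-coded.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

enum class Diag { None, Warning, Error };

struct OmpVersionResult {
  unsigned version;
  Diag diag;
};

// Illustrative tables only; the real lists are defined by the compiler.
const std::vector<unsigned> kHistorical = {11, 20, 25, 30};
const std::vector<unsigned> kSupported = {31, 40, 45, 50, 51, 52, 60};

static bool Contains(const std::vector<unsigned> &v, unsigned x) {
  return std::find(v.begin(), v.end(), x) != v.end();
}

OmpVersionResult ResolveOpenMPVersion(const std::string &text,
                                      unsigned defaultVersion) {
  // Accept only short all-digit strings; the length cap keeps stoul in range.
  bool isInteger = !text.empty() && text.size() <= 3 &&
                   std::all_of(text.begin(), text.end(), [](unsigned char c) {
                     return std::isdigit(c) != 0;
                   });
  if (!isInteger)
    return {defaultVersion, Diag::Error};
  unsigned n = static_cast<unsigned>(std::stoul(text));
  if (Contains(kSupported, n))
    return {n, Diag::None};
  if (Contains(kHistorical, n))
    return {defaultVersion, Diag::Warning}; // old and unsupported: fall back
  return {defaultVersion, Diag::Error};     // not any OpenMP version
}
```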
This just adds some convenience methods to feature control and rewrites
old code in terms of those methods. Also cleans up some names that I
just realized were overloads of another method.
This PR resubmits the changes from #136098, which was previously
reverted due to a build failure during the linking stage:
```
undefined reference to `llvm::DebugInfoCorrelate'
undefined reference to `llvm::ProfileCorrelate'
```
The root cause was that `llvm/lib/Frontend/Driver/CodeGenOptions.cpp`
references symbols from the `Instrumentation` component, but the
`LINK_COMPONENTS` in the `llvm/lib/Frontend/CMakeLists.txt` for
`LLVMFrontendDriver` did not include it. As a result, linking failed in
configurations where these components were not transitively linked.
### Fix:
This updated patch explicitly adds `Instrumentation` to
`LINK_COMPONENTS` in the relevant `llvm/lib/Frontend/CMakeLists.txt`
file to ensure the required symbols are properly resolved.
---------
Co-authored-by: ict-ql <168183727+ict-ql@users.noreply.github.com>
Co-authored-by: Chyaka <52224511+liliumshade@users.noreply.github.com>
Co-authored-by: Tarun Prabhu <tarunprabhu@gmail.com>
This patch adds support for the -mrecip command line option. The parsing
of this option is equivalent to Clang's, and it is implemented by
setting the "reciprocal-estimates" function attribute.
Also move the ParseMRecip(...) function to CommonArgs, so that Flang is
able to make use of it as well.
---------
Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
This change allows the flang CLI to accept `-W[no-]<feature>` flags matching the clang syntax, to enable and disable usage and language feature warnings.
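The `-W<name>` / `-Wno-<name>` toggle can be sketched as set updates applied in command-line order, so a later flag overrides an earlier one. `ApplyWarningFlag` and the feature name `example-feature` are hypothetical. Other `-W` forms (for example, `-Werror`) are not modeled.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical: apply one -W<name> or -Wno-<name> flag to the enabled set.
void ApplyWarningFlag(const std::string &arg, std::set<std::string> &enabled) {
  const std::string negPrefix = "-Wno-";
  if (arg.compare(0, negPrefix.size(), negPrefix) == 0)
    enabled.erase(arg.substr(negPrefix.size()));
  else if (arg.size() > 2 && arg.compare(0, 2, "-W") == 0)
    enabled.insert(arg.substr(2));
  // Anything else is not a warning flag and is ignored here.
}
```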
This patch moves the CommonArgs utilities into a location visible by the
Frontend Drivers, so that the Frontend Drivers may share option parsing
code with the Compiler Driver. This is useful when the Frontend Drivers
would like to verify that their incoming options are well-formed and
also not reinvent the option parsing wheel.
We already see code in the Clang/Flang Drivers that is parsing and
verifying its incoming options. E.g. OPT_ffp_contract. This option is
parsed in the Compiler Driver, Clang Driver, and Flang Driver, all with
slightly different parsing code. It would be nice if the Frontend
Drivers were not required to duplicate this Compiler Driver code. That
way there is no/low maintenance burden on keeping all these parsing
functions in sync.
Along those lines, the Frontend Drivers will now have a useful mechanism
to verify their incoming options are well-formed. Currently, the
Frontend Drivers trust that the Compiler Driver is not passing back junk
in some cases. The Language Drivers may even accept junk with no error
at all. E.g.:
`clang -cc1 -mprefer-vector-width=junk test.c`
With this patch, we'll now be able to tighten up incoming options to
the Frontend drivers in a lightweight way.
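A shared validator of the kind described could reject `-mprefer-vector-width=junk` instead of passing it through. `ParsePreferVectorWidth` is a hypothetical helper, not a CommonArgs function. The assumption that valid values are `none` or a positive decimal width is ours.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical: accept "none" or a positive decimal width; reject everything else.
std::optional<std::string> ParsePreferVectorWidth(const std::string &value) {
  if (value == "none")
    return value;
  if (value.empty())
    return std::nullopt;
  for (unsigned char c : value)
    if (!std::isdigit(c))
      return std::nullopt; // e.g. "junk"
  if (value.find_first_not_of('0') == std::string::npos)
    return std::nullopt; // all zeros: not a positive width
  return value;
}
```

Both the compiler driver and a frontend driver could call the same function, so they cannot drift apart in what they accept.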
---------
Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
Co-authored-by: Shafik Yaghmour <shafik.yaghmour@intel.com>