Predefine the `__pic__/__pie__/__PIC__/__PIE__` macros based on the
configured relocation level. This logic mirrors that of the clang
driver, where `__pic__/__PIC__` are defined for both PIC and PIE modes,
but `__pie__/__PIE__` are only defined for PIE mode.
Fixes https://github.com/llvm/llvm-project/issues/135275
Both clang and gfortran support the -fopenmp-simd flag, which enables
OpenMP support only for simd constructs, while disabling the rest of
OpenMP.
Implement the appropriate parse tree rewriting to remove non-SIMD OpenMP
constructs at the parsing stage.
Add a new SimdOnly flang OpenMP IR pass which rewrites generated OpenMP
FIR to handle untangling composite simd constructs, and clean up OpenMP
operations leftover after the parse tree rewriting stage.
With this approach, the two parts of the logic required to make the flag
work can be self-contained within the parse tree rewriter and the MLIR
pass, respectively. It does not need to be implemented within the core
lowering logic itself.
The flag is expected to have no effect if -fopenmp is passed explicitly,
and is only expected to remove OpenMP constructs, not things like OpenMP
library functions calls. This matches the behaviour of other compilers.
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
Atomic Control Options are used to specify architectural characteristics
to help lowering of atomic operations. The options used are:
`-f[no-]atomic-remote-memory`, `-f[no-]atomic-fine-grained-memory`,
`-f[no-]atomic-ignore-denormal-mode`.
Legacy option `-m[no-]unsafe-fp-atomics` is aliased to
`-f[no-]ignore-denormal-mode`.
More details can be found in
https://github.com/llvm/llvm-project/pull/102569. This PR implements the
frontend support for these options with OpenMP atomic in flang.
Backend changes are available in the draft PR:
https://github.com/llvm/llvm-project/pull/143769 which will be raised
after this merged.
This patch adds an option to select the method for computing complex
number division. It uses `LoweringOptions` to determine whether to lower
complex division to a runtime function call or to MLIR's `complex.div`,
and `CodeGenOptions` to select the computation algorithm for
`complex.div`. The available option values and their corresponding
algorithms are as follows:
- `full`: Lower to a runtime function call. (Default behavior)
- `improved`: Lower to `complex.div` and expand to Smith's algorithm.
- `basic`: Lower to `complex.div` and expand to the algebraic algorithm.
See also the discussion in the following discourse post:
https://discourse.llvm.org/t/optimization-of-complex-number-division/83468
---------
Co-authored-by: Tarun Prabhu <tarunprabhu@gmail.com>
Fixes `#146802`
#146582 started using the `Reserved` Frame Pointer kind for Arm64
Windows, but this revealed a bug in Flang where it copied the
`-mframe-pointer=reserved` flag from Clang, but didn't correctly handle
it in its own command line parser and subsequent compilation pipeline.
This change adds support for `-mframe-pointer=reserved` and adds a test
to make sure that functions are correctly marked when the flag is set.
Summary:
This patch is mostly an NFC that renames the existing `-fopenmp-targets`
into `--offload-targets`. Doing this early to simplify a follow-up patch
that will hopefully allow this syntax to be used more generically over
the existing `--offload` syntax (which I think is mostly unmaintained
now.). Following in the well-trodden path of trying to pull language
specific offload options into generic ones, but right now this is still
just OpenMP specific.
Reland #145901 with a fix for shared library builds.
So far flang generates runtime derived type info global definitions (as
opposed to declarations) for all the types used in the current
compilation unit even when the derived types are defined in other
compilation units. It is using linkonce_odr to achieve derived type
descriptor address "uniqueness" aspect needed to match two derived type
inside the runtime.
This comes at a big compile time cost because of all the extra globals
and their definitions in apps with many and complex derived types.
This patch adds and experimental option to only generate the rtti
definition for the types defined in the current compilation unit and to
only generate external declaration for the derived type descriptor
object of types defined elsewhere.
Note that objects compiled with this option are not compatible with
object files compiled without because files compiled without it may drop
the rtti for type they defined if it is not used in the compilation unit
because of the linkonce_odr aspect.
I am adding the option so that we can better measure the extra cost of
the current approach on apps and allow speeding up some compilation
where devirtualization does not matter (and the build config links to
all module file object anyway).
For historical versions that are unsupported, emit a warning and assume
the currently default version.
For values of N that are not integers or that don't correspond to any
OpenMP version (old or newer), emit an error.
This just adds some convenience methods to feature control and rewrites
old code in terms of those methods. Also cleans up some names that I
just realize were overloads of another method.
This PR resubmits the changes from #136098, which was previously
reverted due to a build failure during the linking stage:
```
undefined reference to `llvm::DebugInfoCorrelate'
undefined reference to `llvm::ProfileCorrelate'
```
The root cause was that `llvm/lib/Frontend/Driver/CodeGenOptions.cpp`
references symbols from the `Instrumentation` component, but the
`LINK_COMPONENTS` in the `llvm/lib/Frontend/CMakeLists.txt` for
`LLVMFrontendDriver` did not include it. As a result, linking failed in
configurations where these components were not transitively linked.
### Fix:
This updated patch explicitly adds `Instrumentation` to
`LINK_COMPONENTS` in the relevant `llvm/lib/Frontend/CMakeLists.txt`
file to ensure the required symbols are properly resolved.
---------
Co-authored-by: ict-ql <168183727+ict-ql@users.noreply.github.com>
Co-authored-by: Chyaka <52224511+liliumshade@users.noreply.github.com>
Co-authored-by: Tarun Prabhu <tarunprabhu@gmail.com>
This patch adds support for the -mrecip command line option. The parsing
of this options is equivalent to Clang's and it is implemented by
setting the "reciprocal-estimates" function attribute.
Also move the ParseMRecip(...) function to CommonArgs, so that Flang is
able to make use of it as well.
---------
Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
This change allows the flang CLI to accept `-W[no-]<feature>` flags matching the clang syntax and enable and disable usage and language feature warnings.
This patch moves the CommonArgs utilities into a location visible by the
Frontend Drivers, so that the Frontend Drivers may share option parsing
code with the Compiler Driver. This is useful when the Frontend Drivers
would like to verify that their incoming options are well-formed and
also not reinvent the option parsing wheel.
We already see code in the Clang/Flang Drivers that is parsing and
verifying its incoming options. E.g. OPT_ffp_contract. This option is
parsed in the Compiler Driver, Clang Driver, and Flang Driver, all with
slightly different parsing code. It would be nice if the Frontend
Drivers were not required to duplicate this Compiler Driver code. That
way there is no/low maintenance burden on keeping all these parsing
functions in sync.
Along those lines, the Frontend Drivers will now have a useful mechanism
to verify their incoming options are well-formed. Currently, the
Frontend Drivers trust that the Compiler Driver is not passing back junk
in some cases. The Language Drivers may even accept junk with no error
at all. E.g.:
`clang -cc1 -mprefer-vector-width=junk test.c'
With this patch, we'll now be able to tighten up incomming options to
the Frontend drivers in a lightweight way.
---------
Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
Co-authored-by: Shafik Yaghmour <shafik.yaghmour@intel.com>
This patch implements IR-based Profile-Guided Optimization support in
Flang through the following flags:
- `-fprofile-generate` for instrumentation-based profile generation
- `-fprofile-use=<dir>/file` for profile-guided optimization
Resolves#74216 (implements IR PGO support phase)
**Key changes:**
- Frontend flag handling aligned with Clang/GCC semantics
- Instrumentation hooks into LLVM PGO infrastructure
- LIT tests verifying:
- Instrumentation metadata generation
- Profile loading from specified path
- Branch weight attribution (IR checks)
**Tests:**
- Added gcc-flag-compatibility.f90 test module verifying:
- Flag parsing boundary conditions
- IR-level profile annotation consistency
- Profile input path normalization rules
- SPEC2006 benchmark results will be shared in comments
For details on LLVM's PGO framework, refer to [Clang PGO
Documentation](https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization).
This implementation was developed by [XSCC Compiler
Team](https://github.com/orgs/OpenXiangShan/teams/xscc).
---------
Co-authored-by: ict-ql <168183727+ict-ql@users.noreply.github.com>
Co-authored-by: Tom Eccles <t@freedommail.info>
This patch adds support for the -mprefer-vector-width= command line
option. The parsing of this options is equivalent to Clang's and it is
implemented by setting the "prefer-vector-width" function attribute.
Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
There are various places where the -fveclib option is parsed to
determine whether its value is correct for the target. Unfortunately
these places assume case-insensitivity and subsequently use "LIBMVEC"
where the driver mandates "libmvec", thus rendering the diagnosistic
useless.
This PR corrects the naming along with similar incorrect uses within the
test files.
This PR starts the effort to upstream AMD's internal implementation of
`do concurrent` to OpenMP mapping. This replaces #77285 since we
extended this WIP quite a bit on our fork over the past year.
An important part of this PR is a document that describes the current
status downstream, the upstreaming status, and next steps to make this
pass much more useful.
In addition to this document, this PR also contains the skeleton of the
pass (no useful transformations are done yet) and some testing for the
added command line options.
This looks like a huge PR but a lot of the added stuff is documentation.
It is also worth noting that the downstream pass has been validated on
https://github.com/BerkeleyLab/fiats. For the CPU mapping, this achived
performance speed-ups that match pure OpenMP, for GPU mapping we are
still working on extending our support for implicit memory mapping and
locality specifiers.
PR stack:
- https://github.com/llvm/llvm-project/pull/126026 (this PR)
- https://github.com/llvm/llvm-project/pull/127595
- https://github.com/llvm/llvm-project/pull/127633
- https://github.com/llvm/llvm-project/pull/127634
- https://github.com/llvm/llvm-project/pull/127635
Add -f[no-]slp-vectorize to the flang driver.
Add corresponding -fvectorize-slp to the flang frontend.
Enable -fslp-vectorize at -O2 and higher in flang to match the current
behaviour in clang.
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
This PR addresses some of the issues described in
https://github.com/llvm/llvm-project/issues/127617. Key changes:
- Stop assuming fixed-form for `-x f95` unless the input is a `.i` file.
This change ensures compatibility with `-save-temps` workflows while
preventing unintended fixed-form assumptions.
- Ensure `-x f95-cpp-input` enables `-cpp` by default, aligning Flang's
behavior with `gfortran`.
`-fd-lines-as-code` and `-fd-lines-as-comments` enables treatment for
lines beginning with `d` or `D` in fixed form sources.
Using these options in free form has no effect.
If the `-fd-lines-as-code` option is given they are treated as if the
first column contained a blank.
If the `-fd-lines-as-comments` option is given, they are treated as
comment lines.
This patch adds the -fvectorize and -fno-vectorize flags to flang.
Note that this also changes the behaviour of `flang -fc1` to match that
of `clang -cc1`, which is that vectorization is only enabled in the
presence of the `-vectorize-loops` flag.
Additionally, this patch changes the behaviour of the default
optimisation levels to match clang, such that vectorization only happens
at the same levels as it does there.
This patch is in draft while I write an RFC to discuss the above two
changes.
This is needed to generate proper ABI flags in the ELF header for LTO
builds. If these flags aren't set correctly, we can't link with objects
that were built with the correct flags.
For non-LTO builds the mcpu/mattr in the TargetMachine will cause the
backend to infer an ABI. For LTO builds the mcpu/mattr aren't set.
I've only added lp64, lp64f, and lp64d ABIs. ilp32* requires riscv32
which is not yet supported in flang. lp64e requires a different
DataLayout string and would need additional plumbing.
Fixes#115679
Move non-common files from FortranCommon to FortranSupport (analogous to
LLVMSupport) such that
* declarations and definitions that are only used by the Flang compiler,
but not by the runtime, are moved to FortranSupport
* declarations and definitions that are used by both ("common"), the
compiler and the runtime, remain in FortranCommon
* generic STL-like/ADT/utility classes and algorithms remain in
FortranCommon
This allows a for cleaner separation between compiler and runtime
components, which are compiled differently. For instance, runtime
sources must not use STL's `<optional>` which causes problems with CUDA
support. Instead, the surrogate header `flang/Common/optional.h` must be
used. This PR fixes this for `fast-int-sel.h`.
Declarations in include/Runtime are also used by both, but are
header-only. `ISO_Fortran_binding_wrapper.h`, a header used by compiler
and runtime, is also moved into FortranCommon.
When -fimplicit-none-ext is passed, flang behaves as if "implicit
none(external)" was specified for all relevant constructs in Fortran
source file.
Note: implicit17.f90 was based on implicit07.f90 with `implicit
none(external)` removed and `-fimplicit-none-ext` added.
The behavior is not entirely consistent with that of clang for the
moment since detailed timing information on the LLVM IR optimization and
code generation passes is not provided. The -ftime-report= option is
also not enabled since that is only relevant for information about the
LLVM IR passes. However, some code to handle that option has been
included, to make it easier to support the option when the issues
blocking it are resolved. A FortranSupport library has been created that
is intended to mirror the LLVM and MLIR support libraries.
Based on @tarunprabhu's PR
https://github.com/llvm/llvm-project/pull/107270 with minor changes
addressing latest review feedback. He's busy and we'd like to get this
support in ASAP.
Co-authored-by: Tarun Prabhu <tarun.prabhu@gmail.com>
Implement the UNSIGNED extension type and operations under control of a
language feature flag (-funsigned).
This is nearly identical to the UNSIGNED feature that has been available
in Sun Fortran for years, and now implemented in GNU Fortran for
gfortran 15, and proposed for ISO standardization in J3/24-116.txt.
See the new documentation for details; but in short, this is C's
unsigned type, with guaranteed modular arithmetic for +, -, and *, and
the related transformational intrinsic functions SUM & al.