This commit adds driver support for linking libclc OpenCL libraries. It
takes the form of a new optional flag: --libclc-lib=namespec. Nothing is
linked unless this flag is specified.
Not all libclc targets have corresponding clang targets. For this reason
it is desirable for users to be able to specify a libclc library name.
We support this by accepting either a library name (without the .bc suffix)
or a filename. Both are searched for in the clang resource directory.
Filenames are also checked as given, so that absolute paths can be provided. The
syntax for specifying filenames (as opposed to library names) uses a
leading colon (:), inspired by the -l option.
To accommodate this option, libclc libraries are now placed into clang's
resource directory in an in-tree configuration. The libraries are all
placed in <resource-dir>/lib/libclc and
are not grouped under host-specific directories as some other runtime
libraries are; it is not expected that OpenCL libraries will differ
depending on the host toolchain.
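The lookup described above can be modelled roughly as follows. This is only an illustrative sketch, not the driver's code: the function name `resolve_libclc` is hypothetical, and the order in which the resource directory and the literal filename are tried is an assumption, since the description does not state it.

```python
from pathlib import Path
from typing import Optional


def resolve_libclc(namespec: str, resource_dir: Path) -> Optional[Path]:
    """Model of the --libclc-lib=namespec lookup (hypothetical helper)."""
    libclc_dir = resource_dir / "lib" / "libclc"
    if namespec.startswith(":"):
        # ":file" form: the rest is a filename, used without adding a suffix.
        filename = namespec[1:]
        # Assumed order: resource directory first, then the name as given,
        # which is what lets absolute paths work.
        candidates = [libclc_dir / filename, Path(filename)]
    else:
        # Bare library name: the .bc suffix is appended.
        candidates = [libclc_dir / f"{namespec}.bc"]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Under these assumptions, `--libclc-lib=foo` and `--libclc-lib=:foo.bc` would both resolve to `<resource-dir>/lib/libclc/foo.bc` when that file exists.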
Currently only the AMDGPU toolchain supports this option as a proof of
concept. Other targets such as NVPTX or SPIR/SPIR-V could support it
too. We could optionally let target toolchains search for libclc
libraries themselves, possibly when passed an empty --libclc-lib.
Summary:
The new driver forwards all unrecognized command line
arguments to the host linker. It only knew `--compress`, so when
`-compress` was passed it didn't forward it correctly. This patch
changes the spelling because multi-word arguments should have two
dashes.
As discussed in PR #145603, the following command seems to fail to
produce a YAML remarks file for offload LTO passes and thus for
kernel-info:
```
clang -O2 -g -fopenmp --offload-arch=native test.c -foffload-lto \
-Rpass=kernel-info -fsave-optimization-record
```
The problem is that, in clang-linker-wrapper's clang call, clang names
the file based on clang's main output file (from `-o`). That is a
temporary file, so the YAML file becomes a temporary file, which the
user never sees.
This patch:
- Makes clang honor `-dumpdir` for the default YAML remarks file in the
case of LTO.
- Extends clang-linker-wrapper to specify that option to clang.
To demonstrate the appeal of the generality of `-dumpdir` (as opposed to
a one-off `-fsave-optimization-record` solution in
clang-linker-wrapper), this patch also fixes `-gsplit-dwarf`. Without
this patch, when using `-gsplit-dwarf` and later debugging using rocgdb,
the dwo directory for offload is a temporary file, so temporary file
cleanup causes rocgdb to lose debug symbols for offload code.
WARNING: The clang driver passes `-dumpdir` to various clang frontend
calls. For LTO, that was previously being ignored, and now it's not.
That changes some auxiliary file names, as revealed by changes in some
existing tests' expected output: `clang/test/Driver/opt-record.c` and
`clang/test/Driver/lto-dwo.c`. Hopefully this change does not introduce
a backward compatibility issue for users.
This patch moves `LazyDetector` and target specific (Cuda, Hip, SYCL)
installation detectors to clang's include directory. It was problematic
for downstream to use headers from clang's lib dir. The use of lib
headers could lead to subtle errors, as some of the symbols there are
annotated with `LLVM_LIBRARY_VISIBILITY`. For instance
[`ROCMToolChain::getCommonDeviceLibNames`](https://github.com/jchlanda/llvm-project/blob/jakub/installation_detectors/clang/lib/Driver/ToolChains/AMDGPU.h#L147)
is public in C++, but because of the annotation it ends up as a hidden ELF
symbol, which causes errors when accessed from another shared library.
I'm unsure if there is an official source for which targets use/support
which emulations, but for the baremetal GNU Arm/AArch64 toolchains or
binutils builds I've tried to use, GNU ld either did not support the
Linux emulations (resulting in errors unless overriding the emulation)
or the Linux emulations were supported but GCC passed the non-Linux
emulations by default.
These emulations all seem to be accepted by lld as well, so try to align
with what it seems GCC is doing and prefer the non-Linux emulations for
baremetal Arm/AArch64 targets.
This patch introduces support for Integrated Distributed ThinLTO (DTLTO)
in Clang.
DTLTO enables the distribution of ThinLTO backend compilations via
external distribution systems, such as Incredibuild, during the
traditional link step: https://llvm.org/docs/DTLTO.html.
Testing:
- `lit` test coverage has been added to Clang's Driver tests.
- The DTLTO cross-project tests will use this Clang support.
For the design discussion of the DTLTO feature, see:
https://github.com/llvm/llvm-project/pull/126654
This patch adds an option to select the method for computing complex
number division. It uses `LoweringOptions` to determine whether to lower
complex division to a runtime function call or to MLIR's `complex.div`,
and `CodeGenOptions` to select the computation algorithm for
`complex.div`. The available option values and their corresponding
algorithms are as follows:
- `full`: Lower to a runtime function call. (Default behavior)
- `improved`: Lower to `complex.div` and expand to Smith's algorithm.
- `basic`: Lower to `complex.div` and expand to the algebraic algorithm.
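To make the difference between the two `complex.div` expansions concrete, here is a small numerical sketch of both formulas for (a + bi) / (c + di). It is written in Python purely for illustration; the function names are hypothetical and this is not the MLIR lowering itself.

```python
def div_algebraic(a, b, c, d):
    """(a + bi) / (c + di) using the textbook algebraic formula."""
    denom = c * c + d * d  # can overflow or underflow for extreme c, d
    return (a * c + b * d) / denom, (b * c - a * d) / denom


def div_smith(a, b, c, d):
    """(a + bi) / (c + di) using Smith's algorithm.

    Scaling by the ratio of the smaller to the larger denominator
    component avoids forming c*c + d*d directly.
    """
    if abs(c) >= abs(d):
        r = d / c
        denom = c + d * r
        return (a + b * r) / denom, (b - a * r) / denom
    r = c / d
    denom = c * r + d
    return (a * r + b) / denom, (b * r - a) / denom
```

For ordinary inputs both agree, but with all four components equal to `1e200` the algebraic form overflows in the intermediate products and yields NaN, while Smith's algorithm still returns `(1.0, 0.0)`. Neither sketch handles a zero denominator.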
See also the discussion in the following discourse post:
https://discourse.llvm.org/t/optimization-of-complex-number-division/83468
---------
Co-authored-by: Tarun Prabhu <tarunprabhu@gmail.com>
Re-land #146582 now that the Flang bugs have been fixed.
There is no way in Arm64 Windows to indicate that a given function has
used the Frame Pointer as a General Purpose Register, as such stack
walks will always assume that the frame chain is valid and will follow
whatever value has been saved for the Frame Pointer (even if it is
pointing to data, etc.).
This change makes the Frame Pointer always reserved when building for
Arm64 Windows to avoid this issue.
We will be updating the official Windows ABI documentation to reflect
this requirement, and I will provide a link once it's available.
Summary:
This patch is mostly an NFC that renames the existing `-fopenmp-targets`
into `--offload-targets`. Doing this early to simplify a follow-up patch
that will hopefully allow this syntax to be used more generically over
the existing `--offload` syntax (which I think is mostly unmaintained
now). Following in the well-trodden path of trying to pull language
specific offload options into generic ones, but right now this is still
just OpenMP specific.
Reverts llvm/llvm-project#146582
Due to failures on many of Linaro's Linux flang bots:
https://lab.llvm.org/buildbot/#/builders/17/builds/9292
```
******************** TEST 'Flang :: Semantics/windows.f90' FAILED ********************
Exit Code: 1
Command Output (stdout):
--
---
+++
@@ -0,0 +1,2 @@
expect at 6: User IDs do not exist on Windows. This function will always return 1
expect at 11: Group IDs do not exist on Windows. This function will always return 1
FAIL
--
Command Output (stderr):
--
RUN: at line 1 has no command after substitutions
"/usr/bin/python3.10" /home/tcwg-buildbot/worker/clang-aarch64-sve-vla/llvm/flang/test/Semantics/test_errors.py /home/tcwg-buildbot/worker/clang-aarch64-sve-vla/llvm/flang/test/Semantics/windows.f90 /home/tcwg-buildbot/worker/clang-aarch64-sve-vla/stage1/bin/flang --target=aarch64-pc-windows-msvc -Werror # RUN: at line 2
+ /usr/bin/python3.10 /home/tcwg-buildbot/worker/clang-aarch64-sve-vla/llvm/flang/test/Semantics/test_errors.py /home/tcwg-buildbot/worker/clang-aarch64-sve-vla/llvm/flang/test/Semantics/windows.f90 /home/tcwg-buildbot/worker/clang-aarch64-sve-vla/stage1/bin/flang --target=aarch64-pc-windows-msvc -Werror
--
```
There is no way in Arm64 Windows to indicate that a given function has
used the Frame Pointer as a General Purpose Register, as such stack
walks will always assume that the frame chain is valid and will follow
whatever value has been saved for the Frame Pointer (even if it is
pointing to data, etc.).
This change makes the Frame Pointer always reserved when building for
Arm64 Windows to avoid this issue.
We will be updating the official Windows ABI documentation to reflect
this requirement, and I will provide a link once it's available.
This patch fixes an issue where the __trampoline_setup symbol is missing
with some programs compiled with flang. This symbol is present only in
compiler-rt and not in libgcc. This patch adds compiler-rt to the link
line after libgcc if libgcc is being used, so that only this symbol will
be picked from compiler-rt.
Fixes #141147
This patch adds support for the -mrecip command line option. The parsing
of this option is equivalent to Clang's and it is implemented by
setting the "reciprocal-estimates" function attribute.
Also move the ParseMRecip(...) function to CommonArgs, so that Flang is
able to make use of it as well.
---------
Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
This patch moves the CommonArgs utilities into a location visible by the
Frontend Drivers, so that the Frontend Drivers may share option parsing
code with the Compiler Driver. This is useful when the Frontend Drivers
would like to verify that their incoming options are well-formed and
also not reinvent the option parsing wheel.
We already see code in the Clang/Flang Drivers that is parsing and
verifying its incoming options. E.g. OPT_ffp_contract. This option is
parsed in the Compiler Driver, Clang Driver, and Flang Driver, all with
slightly different parsing code. It would be nice if the Frontend
Drivers were not required to duplicate this Compiler Driver code. That
way there is no/low maintenance burden on keeping all these parsing
functions in sync.
Along those lines, the Frontend Drivers will now have a useful mechanism
to verify their incoming options are well-formed. Currently, the
Frontend Drivers trust that the Compiler Driver is not passing back junk
in some cases. The Language Drivers may even accept junk with no error
at all. E.g.:
`clang -cc1 -mprefer-vector-width=junk test.c`
With this patch, we'll now be able to tighten up incoming options to
the Frontend Drivers in a lightweight way.
---------
Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
Co-authored-by: Shafik Yaghmour <shafik.yaghmour@intel.com>
The option name to specify the path is `--libomptarget-spirv-bc-path`, so
the existing error gives an invalid option name
(`--libomptarget-spirv64-bc-path`) when it can't find the file. The
expected file name is also odd: we expect the file to be named
`libomptarget-spirv64.bc` and use the same prefix `spirv64` to suggest
the option to the user.
Also the `nvptx` triple is `nvptx64` and the option/filename there is
just `nvptx`, so we should be consistent.
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
This PR addresses issue #140748 to support XRay instrumentation on the
host side when using offloading.
It makes the following changes:
- Initializes `XRayArgs` using the processed toolchain arguments instead
of the raw input.
- Removes the current caching mechanism of `XRayArgs` in the `ToolChain`
class, as this is error-prone and the potential benefits are questionable.
For reference, `SanitizerArgs`, which is constructed in a similar
manner but is much more complex, does not use any caching.
- Adds driver tests to verify that XRay flags are set correctly with
offloading and `-Xarch_host`.
The behavior of -fveclib=AMDLIBM should be similar to -fveclib=libmvec.
For example, the error message for use on an unsupported target should be
the same. This patch handles the missed cases for -fveclib=AMDLIBM and
aligns them with -fveclib=libmvec usage.
---------
Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal@amd.com>
There are various places where the -fveclib option is parsed to
determine whether its value is correct for the target. Unfortunately
these places assume case-insensitivity and subsequently use "LIBMVEC"
where the driver mandates "libmvec", thus rendering the diagnostic
useless.
This PR corrects the naming along with similar incorrect uses within the
test files.
`addArchSpecificRPath` should return immediately on AIX, as AIX doesn't
support the `rpath` option.
`getArchSpecificLibPaths` should return early as well, as we don't want
`-L/ArchSpecificLibPaths` sent to the linker on AIX.
Summary:
This was accidentally kept in the old location when we moved to the
new `lib/<triple>/` location for the DeviceRTL. Move this to reduce the
delta with https://github.com/llvm/llvm-project/pull/136729.
Summary:
Currently, we build a single `libomptarget.devicertl.a` which is a
fatbinary. It is a host object file that contains the embedded archive
files for both the NVIDIA and AMDGPU targets. This was done primarily as
a convenience due to naming conflicts. Now that the clang driver for the
GPU targets can appropriately link via the per-target runtime-dir, we can
just make two separate static libraries and remove the indirection.
This patch creates two new static libraries that get installed into
```
/lib/amdgcn-amd-amdhsa/libomp.a
/lib/nvptx64-nvidia-cuda/libomp.a
```
for AMDGPU and NVPTX respectively. The link job created by the linker
wrapper now simply needs to do `-lomp` and it will search those
directories and link those static libraries. This requires far less
special handling.
This patch is a precursor to changing the build system entirely to be a
runtimes based one. Soon this target will be a standard `add_library`
and done through the GPU runtime targets.
NOTE that this actually does remove an additional optimization step.
Previously we merged all of the files into a single bitcode object and
forcibly internalized some definitions. This, instead, just treats them
like a normal static library. This may possibly affect performance for
some files, but I think it's better overall to use static library
semantics because it allows us to have an 'include-what-you-use'
relationship with the library.
Performance testing will be required. If we really need the merged blob
then we can simply pack that into a new static library.
This PR improves the driver code that builds the `flang-rt` path by
reusing the logic and code of `compiler-rt`.
1. Moved `addFortranRuntimeLibraryPath` and `addFortranRuntimeLibs` to
`ToolChain.h` and made them virtual so that they can be overridden if
customization is needed. The current implementation of those two
procedures is moved to `ToolChain.cpp` as the base implementation to
default to.
2. Both AIX and PPCLinux now override `addFortranRuntimeLibs`.
The overriding function of `addFortranRuntimeLibs` for both AIX and
PPCLinux calls `getCompilerRTArgString` => `getCompilerRT` =>
`buildCompilerRTBasename` to get the path to `flang-rt`. This code
handles `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR` setting. As shown in
`PPCLinux.cpp`, `FT_static` is the default. If not found, it will search
and build for `FT_shared`. To differentiate `flang-rt` from `clang-rt`,
a boolean flag `IsFortran` is passed to the chain of functions in order
to reach `buildCompilerRTBasename`.
Although combining -fveclib=ArmPL with -nostdlib is a rare situation, it
should still be supported correctly and should result in libm not being
linked.
Summary:
The https://github.com/llvm/llvm-project/pull/128509 patch introduced
`--flto-partitions`. This was marked as a HIP only argument, and was
also spelled and handled incorrectly for an `-f` option. This patch
makes the handling generic for `ld.lld` consumers.
This also fixes some issues with emitting the flags being put after the
default arguments, preventing users from overriding them. Also, forwards
things properly for the new driver so we can test this.
Add -f[no-]slp-vectorize to the flang driver.
Add corresponding -fvectorize-slp to the flang frontend.
Enable -fslp-vectorize at -O2 and higher in flang to match the current
behaviour in clang.
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
The current parsing logic for the target string assumes it follows the
format `<kind>-<triple>-<target id>:<feature>`, such as
`hipv4-amdgcn-amd-amdhsa-gfx1030:+xnack`.
Specifically, it assumes that `<target id>` does not contain any `-`,
relying on `rsplit` for parsing.
However, this assumption breaks for AMDGPU's generic targets, which may
contain one or more `-`, such as `gfx10-3-generic` or `gfx12-generic`.
As a result, the existing approach using `rsplit` is no longer reliable.
This patch reworks the parsing logic to handle target strings more
robustly, including support for generic targets.
The bundler now strictly requires a 4-field target triple.
Additionally, a new Python helper function has been added to `config.py`
to normalize the target triple into the 4-field format when it is not,
ensuring tests pass reliably.
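The idea of parsing from the left against a fixed 4-field triple can be sketched as below. This is an illustrative model only: the names `normalize_triple`, `parse_bundle_target`, and `BundleTarget` are hypothetical (they are not the helper added to `config.py` or the bundler's C++ code), and it assumes a missing environment field is kept as an empty string between two dashes.

```python
from typing import NamedTuple


class BundleTarget(NamedTuple):
    kind: str
    triple: str
    target_id: str


def normalize_triple(triple: str) -> str:
    """Pad a triple with empty trailing fields until it has four."""
    fields = triple.split("-")
    fields += [""] * (4 - len(fields))
    return "-".join(fields)


def parse_bundle_target(entry: str) -> BundleTarget:
    """Split '<kind>-<4-field triple>-<target id>' from the left.

    Because the kind and triple have a fixed number of fields, everything
    after the fifth dash is the target id, dashes included.
    """
    parts = entry.split("-", 5)
    if len(parts) < 5:
        raise ValueError(f"expected a kind and a 4-field triple: {entry!r}")
    target_id = parts[5] if len(parts) == 6 else ""
    return BundleTarget(parts[0], "-".join(parts[1:5]), target_id)
```

With this scheme, `hipv4-amdgcn-amd-amdhsa--gfx10-3-generic` yields the target id `gfx10-3-generic`, whereas splitting the 3-field form from the right on the last `-` would yield only `generic`.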
It's possible to have an `ld-path` point to a linker that doesn't have
the `ld.lld` filename (e.g. linker wrapper that may emit telemetry
before invoking the linker). This was causing mis-compilations with
fatLTO since the check couldn't reliably detect that it was using lld.
Instead, rely on the value from `-fuse-ld` to determine whether lld is
enabled.
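The difference between the two detection strategies can be illustrated with a simplified model. Both function names are hypothetical, the exact form of the old filename check is an assumption, and the wrapper path used in the usage note is made up for the example.

```python
import os


def is_lld_by_filename(ld_path: str) -> bool:
    """Filename-based check: true only if the binary is named ld.lld."""
    return os.path.basename(ld_path) == "ld.lld"


def is_lld_by_flag(fuse_ld: str) -> bool:
    """Flag-based check: trust the value given to -fuse-ld."""
    return fuse_ld == "lld"
```

A wrapper such as `/opt/tools/telemetry-ld` fails the filename check even when it ultimately runs lld, while `-fuse-ld=lld` still identifies the linker correctly.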
For builds that cannot be easily modified and enabled with
`-ffat-lto-objects`, `-fno-fat-lto-objects` acts as an escape hatch to
disable this option (which is standard to how clang and lld flags are
used).
Add support for the existing -rpath option on AIX. Prior to this PR,
if -rpath is passed on AIX it gets passed to the linker, and the link
crashes because the linker on AIX cannot process it.
This patch adds the -fvectorize and -fno-vectorize flags to flang.
Note that this also changes the behaviour of `flang -fc1` to match that
of `clang -cc1`, which is that vectorization is only enabled in the
presence of the `-vectorize-loops` flag.
Additionally, this patch changes the behaviour of the default
optimisation levels to match clang, such that vectorization only happens
at the same levels as it does there.
This patch is in draft while I write an RFC to discuss the above two
changes.
Summary:
We support `nogpulib` to disable implicit libraries. In the future we
will want to change the default linking of these libraries based on the
user language. This patch just introduces a positive variant, so now we
can do `-nogpulib -gpulib` to undo the earlier `-nogpulib`.
A later patch will make the default a variable in the ROCmToolChain
depending on the target languages.
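A positive/negative flag pair like this is usually resolved by letting the last occurrence win, in the spirit of `llvm::opt::ArgList::hasFlag`. The sketch below models that rule in Python; the helper is hypothetical, and whether the patch resolves the pair exactly this way is an assumption.

```python
from typing import Iterable


def has_flag(args: Iterable[str], pos: str, neg: str, default: bool) -> bool:
    """Return the setting chosen by the last of `pos`/`neg` in `args`."""
    result = default
    for arg in args:
        if arg == pos:
            result = True
        elif arg == neg:
            result = False
    return result
```

Under this rule `-nogpulib -gpulib` links the GPU libraries again, `-gpulib -nogpulib` does not, and with neither flag present the toolchain default applies.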
Following the conclusion of the
[RFC](https://discourse.llvm.org/t/rfc-names-for-flang-rt-libraries/84321),
rename Flang's runtime libraries as follows:
* libFortranRuntime.(a|so) to libflang_rt.runtime.(a|so)
* libFortranFloat128Math.a to libflang_rt.quadmath.a
* libCufRuntime_cuda_${CUDAToolkit_VERSION_MAJOR}.(a|so) to
libflang_rt.cuda_${CUDAToolkit_VERSION_MAJOR}.(a|so)
This follows the same naming scheme as Compiler-RT libraries
(`libclang_rt.${component}.(a|so)`). It provides some consistency
between Flang's runtime libraries for current and potential future
library components.
Avoid using the same library for runtime and compiler. `FortranDecimal`
was used in two ways:
1. As an auxiliary library needed for `libFortranRuntime.a`. This patch
adds the two source files of FortranDecimal directly into
FortranRuntime, so the runtime no longer uses `FortranDecimal`.
2. As a library used by the Flang compiler. As the only remaining use of
the library, extra CMake code to make it compatible with the runtime can
be removed.
Before this PR, `enable_cuda_compilation` was applied to `FortranDecimal`,
which caused everything that links to it, including flang (the
compiler), to depend on libcudart when CUDA support is enabled.
Having two runtime libraries just makes everything more complicated, while
the user ideally should not be concerned with how the runtime is
structured internally. Some logic was copied for FortranDecimal because
of this, such as the ability to be compiled out-of-tree
(b75a3c9f31c1ffdc9856aee32991d8129b372ee7) which is undocumented, the
logic to link against the various versions of Microsoft's runtime library
(#70833), and avoiding dependency on the C++ runtime
(7783bba22c7add678d796741d30669c73159b3d8).