The clang/flang driver has two separate systems for finding the location of
clang_rt (simplified):
* `getCompilerRTPath()`, e.g. `../lib/clang/22/lib/windows`,
used when `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=0`
* `getRuntimePath()`, e.g. `../lib/clang/22/lib/x86_64-pc-windows-msvc`,
used when `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=1`
To simplify the search path, Flang-RT normally assumes only
`getRuntimePath()`, i.e. ignoring `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR`
and always using the `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=1` mechanism.
There is an exception for Apple Darwin triples where `getRuntimePath()`
returns nothing. The flang-rt/compiler-rt CMake code for library
location also ignores `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR` but uses the
`LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=0` path instead. Since only
`getRuntimePath()` is automatically added to the linker command line,
this patch explicitly adds `getCompilerRTPath()` to the path when
linking flang_rt.
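As a rough illustration of the two lookup schemes, here is a Python sketch (not the driver's C++). The function names and the search order are assumptions made for this example; only the directory layouts come from the description above.

```python
from pathlib import PurePosixPath


def compiler_rt_path(resource_dir: str, os_name: str) -> str:
    # Models the per-OS layout, e.g. ../lib/clang/22/lib/windows
    return str(PurePosixPath(resource_dir) / "lib" / os_name)


def runtime_path(resource_dir: str, triple: str) -> str:
    # Models the per-target layout, e.g. ../lib/clang/22/lib/x86_64-pc-windows-msvc
    return str(PurePosixPath(resource_dir) / "lib" / triple)


def flang_rt_library_dirs(resource_dir: str, triple: str, os_name: str) -> list:
    # Assumed order: per-target directory first, then the per-OS directory
    # that this patch adds explicitly when linking flang_rt.
    return [
        runtime_path(resource_dir, triple),
        compiler_rt_path(resource_dir, os_name),
    ]
```

With both directories on the linker search path, the library is found regardless of which layout the build used.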
Fixes #151031
(cherry picked from commit 8de481913353a1e37264687d5cc73db0de19e6cc)
Summary:
This patch is mostly an NFC that renames the existing `-fopenmp-targets`
into `--offload-targets`. Doing this early to simplify a follow-up patch
that will hopefully allow this syntax to be used more generically over
the existing `--offload` syntax (which I think is mostly unmaintained
now). Following in the well-trodden path of trying to pull language
specific offload options into generic ones, but right now this is still
just OpenMP specific.
This patch moves the CommonArgs utilities into a location visible by the
Frontend Drivers, so that the Frontend Drivers may share option parsing
code with the Compiler Driver. This is useful when the Frontend Drivers
would like to verify that their incoming options are well-formed and
also not reinvent the option parsing wheel.
We already see code in the Clang/Flang Drivers that is parsing and
verifying its incoming options. E.g. OPT_ffp_contract. This option is
parsed in the Compiler Driver, Clang Driver, and Flang Driver, all with
slightly different parsing code. It would be nice if the Frontend
Drivers were not required to duplicate this Compiler Driver code. That
way there is no/low maintenance burden on keeping all these parsing
functions in sync.
Along those lines, the Frontend Drivers will now have a useful mechanism
to verify their incoming options are well-formed. Currently, the
Frontend Drivers trust that the Compiler Driver is not passing back junk
in some cases. The Language Drivers may even accept junk with no error
at all. E.g.:
`clang -cc1 -mprefer-vector-width=junk test.c`
With this patch, we'll now be able to tighten up incoming options to
the Frontend drivers in a lightweight way.
---------
Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
Co-authored-by: Shafik Yaghmour <shafik.yaghmour@intel.com>
If the runtime path is not found (by getTargetSubDirPath()), since per
target runtime directory is enabled on AIX, we should fall back to the
target subdirectory rather than the OS subdirectory.
Previously, when the triple is `powerpc-ibm-aix-unknown`, the driver
fails to find subdirectory `lib/powerpc-ibm-aix`.
This ensures the correct runtime path is found if the triple has the
-unknown environment component attached.
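The fallback can be pictured with a small Python sketch. It is illustrative only: `candidate_runtime_dirs` is a hypothetical helper, not a driver function. The full triple is tried first, then the triple with a trailing `-unknown` environment component removed.

```python
def candidate_runtime_dirs(triple: str) -> list:
    """Return lib/ subdirectories to try, most specific first."""
    candidates = ["lib/" + triple]
    suffix = "-unknown"
    # Only strip a fourth (environment) component, never the vendor or OS.
    if triple.endswith(suffix) and triple.count("-") == 3:
        candidates.append("lib/" + triple[: -len(suffix)])
    return candidates
```

For `powerpc-ibm-aix-unknown` this yields the `lib/powerpc-ibm-aix` directory as the second candidate.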
This PR addressed issue #140748 to support XRay instrumentation on the
host side when using offloading.
It makes the following changes:
- Initializes `XRayArgs` using the processed toolchain arguments instead
of the raw input.
- Removes the current caching mechanism of `XRayArgs` in the `ToolChain`
class, as this is error-prone and potential benefits are questionable.
For reference, `SanitizerArgs`, which is constructed in a similar
manner but is much more complex, does not use any caching.
- Adds driver tests to verify that XRay flags are set correctly with
offloading and `-Xarch_host`.
lookupTarget takes StringRef and internally creates an instance of
std::string with the StringRef as part of constructing Triple, so we
don't need to create temporary instances of std::string on our own.
Move the Darwin framework search path logic from
InitHeaderSearch::AddDefaultIncludePaths to
DarwinClang::AddClangSystemIncludeArgs. Add a new -internal-iframework
cc1 argument to support the tool chain adding these paths.
Now that the tool chain is adding search paths via cc1 flag, they're
only added if they exist, so the Preprocessor/cuda-macos-includes.cu
test is no longer relevant.
Change Driver/driverkit-path.c and Driver/darwin-subframeworks.c to do
-### style testing similar to the darwin-header-search and
darwin-embedded-search-paths tests. Rename darwin-subframeworks.c to
darwin-framework-search-paths.c and have it test all framework search
paths, not just SubFrameworks.
Add a unit test to validate that the myriad of search path flags result
in the expected search path list.
Fixes https://github.com/llvm/llvm-project/issues/75638
This PR generalizes the reuse of the `compilerRT` code that adds the path
of `libflang_rt.runtime.a` (or `.so`), previously limited to AIX and LoP,
to all platforms via a new function `addFlangRTLibPath`.
It also adds the `-static-libflangrt` and `-shared-libflangrt` compiler
options so that users can choose which `flang-rt` to link against. The
default is the shared `flang-rt`, which is consistent with the linker
behavior, except on AIX, where the default is static.
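The default selection can be summarized in a short Python sketch. This is a model of the rule stated above, not the driver code. The `requested` parameter is a hypothetical stand-in for an explicit `-static-libflangrt` (`"static"`) or `-shared-libflangrt` (`"shared"`) on the command line.

```python
from typing import Optional


def flang_rt_link_kind(is_aix: bool, requested: Optional[str] = None) -> str:
    """Return "static" or "shared" for the flang-rt library to link."""
    if requested is not None:
        if requested not in ("static", "shared"):
            raise ValueError("unknown link kind: " + requested)
        return requested  # an explicit option overrides the platform default
    # Platform default: static on AIX, shared everywhere else.
    return "static" if is_aix else "shared"
```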
Also, PR #134320 exposed an issue in PR #131041: the overriding
`addFortranRuntimeLibs` is missing the link to `libquadmath`. This PR
also fixes that and restores the test case that PR #131041 broke.
This PR is to improve the driver code to build `flang-rt` path by
re-using the logic and code of `compiler-rt`.
1. Moved `addFortranRuntimeLibraryPath` and `addFortranRuntimeLibs` to
`ToolChain.h` and made them virtual so that they can be overridden if
customization is needed. The current implementation of those two
procedures is moved to `ToolChain.cpp` as the base implementation to
default to.
2. Both AIX and PPCLinux now override `addFortranRuntimeLibs`.
The overriding function of `addFortranRuntimeLibs` for both AIX and
PPCLinux calls `getCompilerRTArgString` => `getCompilerRT` =>
`buildCompilerRTBasename` to get the path to `flang-rt`. This code
handles `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR` setting. As shown in
`PPCLinux.cpp`, `FT_static` is the default. If not found, it will search
and build for `FT_shared`. To differentiate `flang-rt` from `clang-rt`,
a boolean flag `IsFortran` is passed to the chain of functions in order
to reach `buildCompilerRTBasename`.
Previously, alignment option was passed to multilib selection logic only
when -mno-unaligned-access was explicitly specified on the command line.
Now this change ensures both -mno-unaligned-access and -munaligned-access
are passed to the multilib selection logic, which now also considers the
target architecture when determining alignment access policy.
In the wake of the discussion in PR #131200 and internal discussion afterwards,
we will add support for `LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=ON` for AIX
instead of disabling it. I have already reverted the change in PR #131200.
The default value of the option is still OFF on AIX.
Add an option similar to the -qtarget option in XL to allow the user to
say they want to be able to run the generated program on an older
version of the LE environment. This option will do two things:
- set the `__TARGET_LIBS` macro so the system headers exclude newer
interfaces when targeting older environments
- set the arch level to match the minimum arch level for that older
version of LE. This has no effect right now, since all of the supported LE
versions have the same minimum arch level, so the option doesn't change
this yet.
The user can specify three different kinds of arguments:
1. -mzos-target=zosv*V*r*R* - where V & R are the version and release
2. -mzos-target=0x4vrrmmmm - v, r, m, p are the hex values for the
version, release, and modlevel
3. -mzos-target=current - uses the latest version of LE the system
headers have support for
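To make the three argument forms concrete, here is a hedged Python sketch of a parser. It is not the driver's implementation. Treating the named form as modlevel 0, accepting only lowercase `zosv...r...`, and returning `None` for `current` are all assumptions made for illustration.

```python
import re
from typing import Optional, Tuple


def parse_zos_target(value: str) -> Optional[Tuple[int, int, int]]:
    """Return (version, release, modlevel), or None for "current"."""
    if value == "current":
        return None  # the caller would pick the latest level the headers support
    named = re.fullmatch(r"zosv(\d+)r(\d+)", value)
    if named:
        # Assumption: the named form carries no modlevel, so use 0.
        return int(named.group(1)), int(named.group(2)), 0
    # 0x4vrrmmmm: one hex digit of version, two of release, four of modlevel.
    encoded = re.fullmatch(
        r"0x4([0-9a-fA-F])([0-9a-fA-F]{2})([0-9a-fA-F]{4})", value
    )
    if encoded:
        version, release, modlevel = (int(g, 16) for g in encoded.groups())
        return version, release, modlevel
    raise ValueError("unrecognized -mzos-target value: " + value)
```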
Summary:
Currently the `-Xarch` argument needs to re-parse the option, which goes
through every single registered argument. This causes errors when trying
to pass `-O1` through it because it thinks it's a DXC option. This patch
changes the behavior to only allow `clang` options. Conceivably we could
detect the driver mode to make this more robust, but I don't know if
there are other users for this.
Fixes: https://github.com/llvm/llvm-project/issues/110325
Summary:
Currently, `-Xarch_` is handled specially between different toolchains,
(i.e. Mach-O).
This patch unifies the handling so that it can be used generically.
The main benefit here is that we now have a more generic version of
`-Xopenmp-target=`, which should probably just be deprecated.
Additionally, it allows us to specially pass arguments to different
architectures for offloading.
This patch is done in preparation for making selecting offloading
toolchains more generic, this will be helpful while people are moving
toward compile jobs that include multiple toolchains (SPIR-V, AMDGCN,
NVPTX).
In the discussion around #116792, @rjmccall mentioned that ARCMigrate
has been obsoleted and that we could go ahead and remove it from Clang,
so this patch does just that.
This patch is the second step to extend the current multilib system to
support the selection of library variants which do not correspond to
existing command-line options.
Proposal can be found in
https://discourse.llvm.org/t/rfc-multilib-custom-flags/81058
The multilib mechanism supports libraries that target code generation or
language options such as --target, -mcpu, -mfpu, -mbranch-protection.
However, some library variants are particular to features that do not
correspond to any command-line options. Examples include variants for
multithreading and semihosting.
This work introduces a way to instruct the multilib system to consider
these features in library selection.
The driver must be informed about the multilib custom flags with a new
command-line option.
```
-fmultilib-flag=C
```
Where the grammar for C is:
```
C -> option
option -> multithreaded | no-multithreaded | io-none | io-semihosting | io-linux-syscalls | ...
```
There must be one option instance for each flag specified:
```
-fmultilib-flag=multithreaded -fmultilib-flag=io-semihosting
```
Contradictory options are untied by *last one wins*.
These options are to be used exclusively by the multilib mechanism in
the Clang driver. Hence they are not forwarded to the compiler frontend.
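The *last one wins* rule can be sketched in Python as follows. This is a simplified model: the grouping of values into `threading` and `io` groups, the `FLAG_GROUPS` table, and the rejection of unknown values are assumptions for this example, not part of the proposal text.

```python
# Hypothetical declaration: each value belongs to one mutually exclusive group.
FLAG_GROUPS = {
    "multithreaded": "threading",
    "no-multithreaded": "threading",
    "io-none": "io",
    "io-semihosting": "io",
    "io-linux-syscalls": "io",
}


def resolve_multilib_flags(args: list) -> dict:
    """Map each group to the last value given via -fmultilib-flag=."""
    prefix = "-fmultilib-flag="
    selected = {}
    for arg in args:
        if not arg.startswith(prefix):
            continue  # unrelated driver options are ignored here
        value = arg[len(prefix):]
        if value not in FLAG_GROUPS:
            raise ValueError("unknown multilib flag: " + value)
        # Later values overwrite earlier ones from the same group.
        selected[FLAG_GROUPS[value]] = value
    return selected
```

Passing `multithreaded` followed later by `no-multithreaded` therefore selects `no-multithreaded`, while an `io-*` choice made in between is unaffected.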
Introduces the SYCL based toolchain and initial toolchain construction
when using the '-fsycl' option. This option will enable SYCL based
offloading, creating a SPIR-V based IR file packaged into the compiled
host object.
This includes early support for creating the host/device object using
the new offloading model. The device object is created using the
spir64-unknown-unknown target triple.
New/Updated Options:
-fsycl Enables SYCL offloading for host and device
-fsycl-device-only
Enables device only compilation for SYCL
-fsycl-host-only
Enables host only compilation for SYCL
RFC Reference:
https://discourse.llvm.org/t/rfc-sycl-driver-enhancements/74092
This is a reland of: https://github.com/llvm/llvm-project/pull/107493
This removes the temporary ban on mixing AMDGCN flavoured SPIR-V and
concrete targets (e.g. `gfx900`) in the same HIPAMD compilation. This is
done primarily by tweaking the effective / observable triple when the
target is `amdgcnspirv`, which seamlessly composes with the existing
infra. The test is stolen from #75357.
https://github.com/llvm/llvm-project/pull/109837 set a global
variable (`PGOInstrumentColdFunctionOnly`) in PassBuilderPipelines.cpp,
which introduced a data race detected by TSan. To fix this, I decoupled
the flag setting: the flags are now set separately
(`instrument-cold-function-only-path` is required to be used
with `--pgo-instrument-cold-function-only`).
PR https://github.com/llvm/llvm-project/pull/111976 was enabling the
tests updated in the PR to run on all systems. We found a few didn't run
on z/OS. I tracked the problem down to:
1. the ExecuteToolChainProgram() function wasn't passing the executable
name as the first arg. That was causing exec on z/OS to fail.
2. the temp file needs to be a text file so codepage conversion happens.
This patch adds support for cold function coverage instrumentation based
on sampling PGO counts. The major motivation is to detect dead functions
for the services that are optimized with sampling PGO. If a function is
covered by sampling profile count (e.g., those with an entry count > 0),
we choose to skip instrumenting those functions, which significantly
reduces the instrumentation overhead.
More details about the implementation and flags:
- Added a flag `--pgo-instrument-cold-function-only` in
`PGOInstrumentation.cpp` as the main switch to control skipping the
instrumentation.
- Built the extra instrumentation passes (a bundle of passes in
`addPGOInstrPasses`) under sampling PGO pipeline. This is controlled by
`--instrument-cold-function-only-path` flag.
- Added a driver flag `-fprofile-generate-cold-function-coverage`:
- 1) Configures the flags in one place, i.e. adding
`--instrument-cold-function-only-path=<...>` and
`--pgo-function-entry-coverage`. Note that the instrumentation file path
is passed through `--instrument-sample-cold-function-path`, because we
cannot use the `PGOOptions.ProfileFile` as it's already used by
`-fprofile-sample-use=<...>`.
- 2) Makes the linker link the `compiler_rt.profile` lib (see
[ToolChain.cpp#L1125-L1131](https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChain.cpp#L1125-L1131)
).
- Added a flag (`--pgo-cold-instrument-entry-threshold`) to configure the
entry count used to determine whether a function is cold.
Overall, the full command is like:
```
clang++ -O2 -fprofile-generate-cold-function-coverage=<...> -fprofile-sample-use=<...> code.cc -o code
```
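The skip rule described above can be sketched in Python. This is a model, not the pass itself. Treating functions without sample data as having a zero count, and comparing with `<=` against the threshold, are assumptions of this example.

```python
def functions_to_instrument(entry_counts: dict, all_functions: list,
                            threshold: int = 0) -> list:
    """Pick functions whose sampled entry count is at or below the threshold."""
    cold = []
    for name in all_functions:
        # Assumption: a function with no sample data counts as zero.
        count = entry_counts.get(name, 0)
        if count <= threshold:
            cold.append(name)  # cold: gets coverage instrumentation
    return cold
```

Functions already covered by the sampling profile are skipped, which is where the reduction in instrumentation overhead comes from.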
This does a global rename from `flang-new` to `flang`. I also
removed/changed any TODOs that I found related to making this change.
---------
Co-authored-by: H. Vetinari <h.vetinari@gmx.com>
Co-authored-by: Andrzej Warzynski <andrzej.warzynski@arm.com>
PR #102521, which landed as 1ea0865dd6fa, implemented
`CLANG_TOOLCHAIN_PROGRAM_TIMEOUT`, but the logic is obviously wrong.
If the user-specified value is negative, it should become zero to mean
infinite. Otherwise, it should be left as is. Thus, use `std::max`
not `std::min`. This obvious fixup doesn't seem worth another pull
request.
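The difference between the two calls is easy to see in a tiny Python model of the clamping logic. The function names are made up for this illustration; the driver code is C++ and uses `std::max`.

```python
def effective_timeout(user_value: int) -> int:
    """Clamp negative timeouts to 0, which means "no timeout"."""
    return max(user_value, 0)


def buggy_timeout(user_value: int) -> int:
    # What min() does instead: every positive value collapses to 0
    # and negative values pass through unchanged.
    return min(user_value, 0)
```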
This change modifies -ffp-model=fast to select options that more closely
match -funsafe-math-optimizations, and introduces a new model,
-ffp-model=aggressive which matches the existing behavior (except for a
minor change in the fp-contract behavior).
The primary motivation for this change is to make -ffp-model=fast more
user friendly, particularly in light of LLVM's aggressive optimizations
when -fno-honor-nans and -fno-honor-infinities are used.
This was previously proposed here:
https://discourse.llvm.org/t/making-ffp-model-fast-more-user-friendly/78402
This changes the bare-metal driver logic such that it _always_ tries
multilib.yaml if it exists, and it falls back to the hardwired/default
RISC-V multilib selection only if a multilib.yaml doesn't exist. In
contrast, the current behavior is that RISC-V can never use
multilib.yaml, but other targets will try it if it exists.
The flags `-march=` and `-mabi=` are exposed for multilib.yaml to match
on. There is no attempt to help YAML file creators to duplicate the
existing hard-wired multilib reuse logic -- they will have to implement
it using `Mappings`.
This should be backwards-compatible with existing sysroots, as
multilib.yaml was previously never used for RISC-V, and the behavior
doesn't change after this PR if the file doesn't exist.
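The old and new selection order can be compared in a short Python sketch. The function names and the returned labels are hypothetical, and returning `"none"` for a non-RISC-V target without a multilib.yaml is an assumption.

```python
def multilib_source_after(yaml_exists: bool, is_riscv: bool) -> str:
    """New behavior: multilib.yaml always wins when it exists."""
    if yaml_exists:
        return "multilib.yaml"
    if is_riscv:
        return "hardwired-riscv"  # fallback only when no YAML file is present
    return "none"


def multilib_source_before(yaml_exists: bool, is_riscv: bool) -> str:
    """Old behavior: RISC-V never consulted multilib.yaml."""
    if is_riscv:
        return "hardwired-riscv"
    return "multilib.yaml" if yaml_exists else "none"
```

The two functions agree whenever `yaml_exists` is false, which is the backwards-compatibility argument made above.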
When working on very busy systems, check-offload frequently fails many
tests with this diagnostic:
```
clang: error: cannot determine amdgcn architecture: /tmp/llvm/build/bin/amdgpu-arch: Child timed out: ; consider passing it via '-march'
```
This patch accepts the environment variable
`CLANG_TOOLCHAIN_PROGRAM_TIMEOUT` to set the timeout. It also increases
the timeout from 10 to 60 seconds.
Support for this was added back in 2016
(https://reviews.llvm.org/D27499), but never enabled in the driver.
Since then, it's been possible to enable this with an arm triple and the
-mthumb option, but not with a thumb triple.
This also caused -fsanitize=cfi to enable cfi-icall for arm triple but
not thumb triples, which caused spurious sanitiser failures if mixing
the two ISAs in one program.
When `pauthtest` is either passed as the environment part of an AArch64 Linux
triple or passed via `-mabi=`, enable the following ptrauth flags:
- `intrinsics`;
- `calls`;
- `returns`;
- `auth-traps`;
- `vtable-pointer-address-discrimination`;
- `vtable-pointer-type-discrimination`;
- `init-fini`.
Some related stuff is still subject to change, and the ABI itself might
be changed, so end users are not expected to use this and the ABI name
has 'test' suffix.
If `-mabi=pauthtest` option is used, it's normalized to effective
triple.
When the environment part of the effective triple is `pauthtest`, try
to use `aarch64-linux-pauthtest` as multilib directory.
The following is not supported:
- combination of `pauthtest` ABI with any branch protection scheme
except BTI;
- explicit set of environment part of the triple to a value different
from `pauthtest` in combination with `-mabi=pauthtest`;
- usage on non-Linux OS.
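The rules above can be condensed into a Python sketch. It is a simplified model that assumes an AArch64 target. The function signature is hypothetical, and raising an error for each unsupported combination is an assumption about how "not supported" is reported.

```python
from typing import List, Optional

PAUTHTEST_FLAGS = [
    "intrinsics", "calls", "returns", "auth-traps",
    "vtable-pointer-address-discrimination",
    "vtable-pointer-type-discrimination", "init-fini",
]


def ptrauth_flags(os_name: str, environment: Optional[str],
                  mabi: Optional[str],
                  branch_protection: Optional[str] = None) -> List[str]:
    """Return the ptrauth flags to enable, or raise on unsupported combinations."""
    wants_pauthtest = environment == "pauthtest" or mabi == "pauthtest"
    if not wants_pauthtest:
        return []
    if os_name != "linux":
        raise ValueError("pauthtest is only supported on Linux")
    # -mabi=pauthtest with an explicit, different environment is rejected.
    if mabi == "pauthtest" and environment not in (None, "pauthtest"):
        raise ValueError("-mabi=pauthtest conflicts with the triple environment")
    if branch_protection not in (None, "bti"):
        raise ValueError("pauthtest only combines with BTI branch protection")
    return list(PAUTHTEST_FLAGS)
```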
---------
Co-authored-by: Anatoly Trosinenko <atrosinenko@accesssoftek.com>
In a multilib setting, if you compile with a command line such as `clang
--target=aarch64-none-elf -march=armv8.9-a+rcpc3`,
`getAArch64MultilibFlags` returns an ill-formed string containing two
consecutive `+` signs, of the form `...+rcpc++rcpc3+...`, causing later
stages of multilib selection to get confused.
The `++` arises from the entry in `AArch64::Extensions` for the
SubtargetFeature `rcpc-immo`, which is a dependency of the `rcpc3`
SubtargetFeature, but doesn't have an _extension_ name for the purposes
of the `-march=foo+bar` option. So its `UserVisibleName` field is the
empty string.
To fix this, I've excluded extensions from consideration in
`getAArch64MultilibFlags` if they have an empty `UserVisibleName`. Since
the input to this function is not derived from a completely general set
of SubtargetFeatures, but from a set that has only just been converted
_from_ a clang driver command line, the only extensions skipped by this
check should be cases like this one, where the anonymous extension was
only included because it was a dependency of one mentioned explicitly.
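A minimal Python model of the string construction shows where the `++` comes from and how the check removes it. The function names are made up for this sketch; the real code iterates over `AArch64::Extensions` in C++.

```python
def multilib_march_suffix(extension_names: list) -> str:
    """Join user-visible extension names as "+a+b", skipping anonymous ones."""
    return "".join("+" + name for name in extension_names if name)


def naive_march_suffix(extension_names: list) -> str:
    # Pre-fix behavior modelled here: an empty name still contributes a "+".
    return "".join("+" + name for name in extension_names)
```

Here the empty string stands in for the `rcpc-immo` entry, whose `UserVisibleName` is empty.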
I've also made the analogous change in `getARMMultilibFlags`. I don't
think it's necessary right now, because the architecture extensions for
ARM (defined in `ARMTargetParser.def` rather than Tablegen) don't
include any anonymous ones. But it seems sensible to add the check
anyway, in case future refactoring introduces anonymous array elements
in the same way that AArch64 did, and also in case someone writes a
function for another platform by using either of these as example code.
We want to support using a complete Clang/LLVM toolchain that includes
LLVM libc and libc++ for baremetal targets. To do so, we need the driver
to add the necessary include paths.
This introduces the new `--print-enabled-extensions` command line option
to AArch64, which prints the list of extensions that are enabled for the
target specified by the combination of `--target`/`-march`/`-mcpu`
values.
The goal of this option is both to enable the manual inspection of
the enabled extensions by users and to enhance the testability of
architecture versions and CPU targets implemented in the compiler.
As part of this change, a new field for `FEAT_*` architecture feature
names was added to the TableGen entries. The output of the existing
`--print-supported-extensions` option was updated accordingly to show
these in a separate column.
Summary:
The utilities `nvptx-arch` and `amdgpu-arch` are used to support
`--offload-arch=native` among other utilities in clang. However, these
rely on the GPU drivers to query the features. In certain cases these
drivers can become locked up, which will lead to indefinite hangs on any
compiler jobs running in the meantime.
This patch adds a ten second timeout period for these utilities before
it kills the job and errors out.
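The general technique (run a helper tool, kill it and report an error if it exceeds a deadline) can be sketched with Python's standard `subprocess` module. This is an analogy, not the driver's implementation. `query_arch` and its error message are hypothetical.

```python
import subprocess


def query_arch(command: list, timeout_seconds: float = 10.0) -> str:
    """Run an arch-detection tool, failing instead of hanging forever."""
    try:
        result = subprocess.run(command, capture_output=True, text=True,
                                timeout=timeout_seconds, check=True)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        raise RuntimeError("cannot determine architecture: child timed out")
    return result.stdout.strip()
```

A caller would pass the tool's command line, for example `["amdgpu-arch"]`, and treat the `RuntimeError` as a hard failure of the compile job.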
This change cleans up the clang driver handling of umbrella options like
-ffast-math, -funsafe-math-optimizations, and -ffp-model, and aligns the
behavior of -ffp-model=fast with -ffast-math with regard to the linking
of crtfastmath.o.
We agreed in a previous review that the fast-math options should not
attempt to change the -fdenormal-fp-math option, which is inherently
target-specific.
The clang user's manual claims that -ffp-model=fast "behaves identically
to specifying both -ffast-math and -ffp-contract=fast." Since
-ffast-math causes crtfastmath.o to be linked if it is available, that should
also happen with -ffp-model=fast.
I am going to be proposing further changes to -ffp-model=fast,
decoupling it from -ffast-math and introducing a new
-ffp-model=aggressive that matches the current behavior, but I wanted
to solidify the current behavior before I do that.