Close https://github.com/llvm/llvm-project/issues/185394
This is only for P1689 format as ClangScanDeps/optimize-vfs-pch.m will
check for warning message. I'll leave this to people who want to change
that.
This patch introduces a dependency from driver to dependency_scanning.
We need to tease out dependency_scanning's current dependency on driver
(just some headers that can be removed) and then add a depenency in
driver on dependency_scanning to make the patch work.
This PR changes how `ModuleManager` deduplicates module files.
Previously, `ModuleManager` used `FileEntry` for assigning unique
identity to module files. This works fine for explicitly-built modules
because they don't change during the lifetime of a single Clang
instance. For implicitly-built modules however, there are two issues:
1. The `FileEntry` objects are deduplicated by `FileManager` based on
the inode number. Some file systems reuse inode numbers of previously
removed files. Because implicitly-built module files are rapidly removed
and created, this deduplication breaks and compilations may fail
spuriously when inode numbers are recycled during the lifetime of a
single Clang instance.
2. The first thing `ModuleManager` does when loading a module file is
consulting the `FileManager` and checking the file size and modification
time match the expectation of the importer. This is done even when such
module file already lives in the `InMemoryModuleCache`. This introduces
racy behavior into the mechanism that explicitly tries to solve race
conditions, and may lead into spurious compilation failures.
This PR identifies implicitly-built module files by a pair of
`DirectoryEntry` of the module cache path and the path suffix
`<context-hash>/<module-name>-<module-map-path-hash>.pcm`. This gives us
canonicalization of the user-provided module cache path without turning
to `FileEntry` for the PCM file. The path suffix is Clang-generated and
is already canonical.
Some tests needed to be updated because the module cache path directory
was also used as an include directory. This PR relies on not caching the
non-existence of the module cache directory in the `FileManager`. When
other parts of Clang are trying to look up the same path and cache its
non-existence, things break. This is probably very specific to some of
our tests and not how users are setting up their compilations.
Previously, the normalized module cache path was only accessible via
`HeaderSearch::getSpecificModuleCachePath()` which may or may not also
contain the context hash. Clients would need to parse the result to
learn the normalized module cache path. What `ASTWriter` does instead is
normalize the as-written module cache path redundantly.
Instead, this PR exposes the normalized module cache path in the
`HeaderSearch` interface and moves the computation of specific module
cache path into the clangLex library.
This is motivated by another patch that would've needed to redundantly
perform the module cache path canonicalization or parse the specific
module cache path.
This PR upstreams a piece of infrastructure from Swift's LLVM fork. This
allows controlling behavior of the dependency scanner, e.g. modifying
the primary compiler invocation and modifying compiler instances (the
primary one and for imported modules). This will be used to implement
modules support in the CAS-based compilation caching mode.
The by-module-name scanning APIs are fairly spread out. There's the main
`CompilerInstanceWithContext` class that provides a constructor,
`initialize()` and `computeDependencies()`. Then there's the
`DependencyScanningWorker` that optionally owns
`CompilerInstanceWithContext` and re-exposes two `initialize()` and one
`computeDependencies()` functions. Lastly, there's
`DependencyScanningTool` that again re-exposes two variants of
`initialize()` and one `computeDependencies()`.
The current setup makes it unnecessarily difficult to make changes to
these APIs (as observed in
https://github.com/swiftlang/llvm-project/pull/12453).
This PR makes `CompilerInstanceWithContext` standalone, and hides the
construct + initialize pattern behind a static factory function. This
makes it harder to misuse the API (forgetting to call `initialize()`,
calling it twice, etc.) and means changes now need to only touch single
class instead of three classes spread over multiple files.
The `DiagnosticConsumer::finish()` API was removed in #183831. Since
that was the only thing the by-module-name `finalize()` API called, we
can safely remove that and simplify the scanner.
The `DiagnosticConsumer::finish()` API has historically been a source of
friction. Lots of different clients must manually ensure it gets called
for all consumers to work correctly. Idiomatic C++ uses destructors for
this. In Clang, there are some cases where destructors don't run
automatically, such as under `-disable-free` or some signal handling
code in `clang_main()`. This PR squeezes the complexity of ensuring
those destructors do run out of library code and into the tools that
already deal with the complexities of `-disable-free` and signal
handling.
* To avoid the build time overhead of checking for relocated modules,
only check it once per build session.
* Enable relocated module checks in the dependency scanner.
* Add remarks to know when this is happening with `-Rmodule-validation`
This check is necessary to be able to handle new libraries appearing in
earlier search paths. This is a valid scenario when dependency info
changes between incremental builds of the same scheme, thus new build
sessions.
It is still malformed to expect new versions of libraries to be added
within the same build session.
resolves: rdar://169174750
This patch essentially reverts 62fec3d2 which was an NFC, and replaces
the remaining check for the output format with more generic scanning
service option that controls whether we report absolute or relative file
paths. Along with #182063 this makes the scanner implementation entirely
independent of the desired output format.
This PR fixes a race condition discovered by thread sanitizer in the
asynchronous dependency scanner implementaion.
The implementation assumed that whenever a new thread is spawned to
compile a module, the primary scanning thread must wait for it to finish
to read the PCM it produces. This is not true - it's possible for the
implicit build on the primary thread to decide to compile the same
module too, leaving the asynchronous thread running without any kind of
synchronization. This means the TU scan may return, the service may get
destroyed, but the asynchronous thread continues running with the VFS
caches and module cache implementation destroyed, leading to crashes.
This PR fixes this by awaiting all asynchronous threads at the end of a
TU scan.
This PR makes it so that the base VFS used during a scan is provided to
the top-level service, not to each worker/tool individually. This
enables resolving a FIXME in the async-scan-modules mode of the scanner,
will clean up [downstream
code](0eb56baa1d/clang/tools/libclang/CDependencies.cpp (L619-L622)),
and will make it more difficult to have mismatching VFSs across workers,
which may cause non-deterministic poisoning of
`DependencyScanningFilesystemSharedCache`.
Adding new configuration knobs in the scanner is fairly painful now,
especially with a diverging downstream. This patch extracts what was
previously passed into the service constructor into a struct. This
encourages one knob customization per line, reduces difficult merge
conflicts, `/*ArgName=*/`-style comments with copy-pasted defaults, etc.
In a typical build, the build system schedules many TUs from the same
target to be scanned/compiled at the same time. These TUs tend to depend
on a similar set of modules, and they usually keep their imports
alphabetically sorted. The nature of implicit modules then means that
scanning these TUs reduces into a single-threaded computation, since
only one TU wins the race to compile the common dependency module, and
the same thread/process keeps being responsible for compiling all
transitive dependencies of such module.
This PR makes use of the single-module-parse-mode in a new scanning step
that runs at the start of each TU scan. In this step, the scanner
quickly discovers unconditional module dependencies of the TU without
blocking on its compile. This typically discovers plenty of work to keep
the available threads busy and compile modules in more parallel fashion.
Modules discovered here are compiled on separate threads right away in
the same two-step fashion.
The second step then performs the regular dependency scan of the TU
where each module import is a blocking operation. However, by this time,
the first scanning step most likely already compiled the majority of
modules, so there's actually little to no waiting happening here. The
compiler only deserializes previously-built modules.
This is a barebones implementation with some known quirks marked in
FIXME comments. I will continue working on performance and polish once
this lands.
The by-name lookup API uses the same diagnostics engine and consumer for
multiple lookups. When multiple lookups fail, the diagnostics could be
incorrect for all but the first failing lookup. All the subsequent
failing lookups inherit the diagnostics from the first failing lookup.
This PR resets the diagnostics consumer's buffer and the
CompilerInstance's diagnostics engine for each by-name lookup, so each
lookup can produce the correct diagnostics.
Part of work for rdar://136303612.
The `ModuleCache` class is currently reference-counted intrusively. As
explained in https://github.com/llvm/llvm-project/pull/139584, this is
problematic. This PR uses `std::shared_ptr` to reference-count
`ModuleCache` instead, which clarifies what happens to its lifetime when
constructing `CompilerInstance`, for example. This also makes the
reference in `ModuleManager` non-owning, simplifying the ownership
relationship further. The
`ASTUnit::transferASTDataFromCompilerInstance()` function now accounts
for that by taking care to keep it alive.
This is the final patch in a series that removes the dependency of
`clangDependencyScanning` on `clangDriver`, splitting the work from
#169964 into smaller changes (see comment linked below).
This patch updates the remaining parts of the scanning interface in
`DependencyScanningWorker` to only operate on `-cc1` / driver job
command lines.
This follows #171238, which applied this change to the by-name scanning
API.
This is part of a broader effort to support driver-managed builds for
compilations using C++ named modules and/or Clang modules. It is
required for linking the dependency scanning tooling against the driver
without introducing cyclic dependencies, which would otherwise cause
build failures when dynamic linking is enabled.
---------
Co-authored-by: Jan Svoboda <jan@svoboda.ai>
Calling `CompilerInstance::hasDiagnostics()` after
`CompilerInstance::createDiagnostics()` is pointless, since the creation
step cannot fail. This removes such calls.
This patch is part of a series of splitting the work from #169964 into
smaller changes (see review linked below).
As part of this work, the regular and by-name dependency scanning APIs
need to be extended to allow the VFS to be initialized via the
initVFS helpers before invoking the scanning API, with the resulting
overlay filesystem passed into the worker instead.
To make the newly added API changes consistent, this patch updates both
helpers to return an OverlayFileSystem.
https://github.com/llvm/llvm-project/pull/169964#pullrequestreview-3545879529
This is the second patch in a series that removes the dependency of
clangDependencyScanning on clangDriver, splitting the work from
#169964 into smaller changes (see comment linked below).
This patch updates the by-name scanning interface in
DependencyScanningWorker to accept only -cc1 command lines directly
and moves the logic for handling driver-style command lines into
DependencyScanningTool in clangTooling.
Support for -cc1 command lines in by-name scanning is introduced in
this patch.
The next patch will update the remaining parts of
DependencyScanningWorker to operate only on -cc1 command lines,
allowing its dependency on clangDriver to be removed.
https://github.com/llvm/llvm-project/pull/169964#pullrequestreview-3545879529
This updates the dependency-scanning tooling to consistently use
`ArrayRef<std::string>` rather than `const std::vector<std::string>&` in
function signatures.
This is done to help break PR #169964 into smaller, more manageable
pieces.
This patch is the first of two in refactoring Clang's dependency
scanning tooling to remove its dependency on clangDriver.
It separates Tooling/DependencyScanningTool.cpp from the rest of
clangDependencyScanning and moves clangDependencyScanning out of
clangTooling into its own library. No functional changes are
introduced.
The follow-up patch (#169964) will restrict clangDependencyScanning to
handling only -cc1 command line inputs and will move all functionality
related to handling driver commands into clangTooling.
(Tooling/DependencyScanningTool.cpp).
This is part of a broader effort to support driver-managed builds for
compilations using C++ named modules and/or Clang modules. It is
required for linking the dependency scanning tooling against the driver
without introducing cyclic dependencies, which would otherwise cause
build failures when dynamic linking is enabled.
The RFC for this change can be found here:
https://discourse.llvm.org/t/rfc-new-clangoptions-library-remove-dependency-on-clangdriver-from-clangfrontend-and-flangfrontend/88773?u=naveen-seth