This reverts commit 407a2f23 which stopped propagating the callback to module compiles, effectively disabling dependency directive scanning for all modular dependencies. Also added a regression test.
An instance of `PreprocessorOptions` is part of `CompilerInvocation`
which is supposed to be a value type. The `DependencyDirectivesForFile`
member is problematic, since it holds an owning reference of the
scanning VFS. This makes it not a true value type, and it can keep
potentially large chunk of memory (the local cache in the scanning VFS)
alive for longer than clients might expect. Let's move it into the
`Preprocessor` instead.
Since b4c83a13f664582015ea22924b9a0c6290d41f5b, `Preprocessor` and
`Lexer` are aware of the concept of scanning dependency directives. This
makes it possible to scan for them on-demand rather than eagerly on the
first filesystem operation (open, or even just stat).
This might improve performance, but is also necessary for the "PCH as
module" mode. Some precompiled header sources use the ".pch" file
extension, which means they were not getting scanned for dependency
directives. This was okay when the PCH was the main input file in a
separate scan step, because there we just lex the file in a
scanning-specific frontend action. But when such source gets treated as
a module implicitly loaded from a TU, it will get compiled as any other
module - with Sema - which will result in compilation errors. (See
attached test case.)
rdar://107663951
Stop overriding -working-directory to CWD during argument parsing, which
should no longer necessary after we set the VFS working directory, and
set FSOpts correctly after parsing arguments so that working-directory
behaves correctly.
It turns out it's not that uncommon for real code to pass a different
set of VFSs while building a PCH than while using the PCH. This can
cause problems as seen in `test/ClangScanDeps/optimize-vfs-pch.m`. If
you scan `compile-commands-tu-no-vfs-error.json` without -Werror and run
the resulting commands, Clang will emit a fatal error while trying to
emit a note saying that it can't find a remapped header.
This also adds textual tracking of VFSs for prebuilt modules that are
part of an included PCH, as the same issue can occur in a module we are
building if we drop VFSs. This has to be textual because we have no
guarantee the PCH had the same list of VFSs as the current TU.
This uses the `PrebuiltModuleListener` to collect `VFSOverlayFiles`
instead of trying to extract it out of a `serialization::ModuleFile`
each time it's needed. There's not a great way to just store a pointer
to the list of strings in the serialized AST.
Canonicalize `-D` and `-U` flags by sorting them and only keeping the
last instance of a given name.
This optimization will only fire if all `-D` and `-U` flags start with a
simple identifier that we can guarantee a simple analysis of can
determine if two flags refer to the same identifier or not. See the
comment on `getSimpleMacroName()` for details of what the issues are.
Previous version of this had issues with sed differences between macOS,
Linux, and Windows. This test doesn't check paths, so just don't run
sed.
Other tests should use `sed -E 's:\\\\?:/:g'` to get portable behavior.
Windows has different command line parsing behavior than Linux for
compilation databases, so the test has been adjusted to ignore that
difference.
Canonicalize `-D` and `-U` flags by sorting them and only keeping the
last instance of a given name.
This optimization will only fire if all `-D` and `-U` flags start with a
simple identifier that we can guarantee a simple analysis of can
determine if two flags refer to the same identifier or not. See the
comment on `getSimpleMacroName()` for details of what the issues are.
`-ivfsoverlay` files are unused when building most modules. Enable
removing them by,
* adding a way to visit the filesystem tree with extensible RTTI to
access each `RedirectingFileSystem`.
* Adding tracking to `RedirectingFileSystem` to record when it
actually redirects a file access.
* Storing this information in each PCM.
Usage tracking is only enabled when iterating over the source manager
and affecting modulemaps. Here each path is stated to cause an access.
During scanning these stats all hit the cache.
Following up on #69975, this patch skips writing `DIAG_PRAGMA_MAPPINGS`
as well. Deserialization of this PCM record is still showing up in
profiles, since it needs to be VBR-decoded for every transitively loaded
PCM file.
The scanner doesn't make any guarantees about diagnostic accuracy (and
it even disables all warnings), so skipping this record should be safe.
Make it easier to control which optimizations are enabled by making
OptimizeArgs a bit masked enum. There's currently only one such
optimization, but more will be added in followup commits.
Deserialization of the `DIAGNOSTIC_OPTIONS` and `HEADER_SEARCH_PATHS`
records is slow and done for every transitively loaded PCM.
Deserialization of these records cannot be skipped, because the words
are VBR6-encoded and we don't store the length of the entire record. We
could either turn them into binary blobs that can be skipped during
deserialization, or skip writing them altogether. This patch takes the
latter approach, since these records are not necessary in scanning PCMs.
The scanner doesn't make any guarantees about the accuracy of
diagnostics, and we always have the same header search paths due to
strict context hashing.
The commit that makes the `DIAGNOSTIC_OPTIONS` record skippable was
originally implemented by @benlangmuir in a downstream repo.
Allow users to run a dependency scan with a cc1 command line in addition to a driver command line. DependencyScanningAction was already being run with a cc1 command line, but DependencyScanningWorker::computeDependencies assumed that it was always provided a driver command line. Now DependencyScanningWorker::computeDependencies can handle cc1 command lines too.
Reviewed By: jansvoboda11
Differential Revision: https://reviews.llvm.org/D156234
Currently, `ASTReader` performs some checks to diagnose relocated modules. This can add quite a bit of overhead to the scanner: it requires looking up, parsing and resolving module maps for all transitively loaded module files (and all the module maps encountered in the search paths on the way). Most of those checks are not really useful in the scanner anyway, since it uses strict context hash and immutable filesystem, which prevent those scenarios in the first place.
This can speed up scanning by up to 30%.
Depends on D150292.
Reviewed By: benlangmuir
Differential Revision: https://reviews.llvm.org/D150320
We have no use for debug info for the scanner modules, and writing raw
ast files speeds up scanning ~15% in some cases. Note that the compile
commands produced by the scanner will still build the obj format (if
requested), and the scanner can *read* obj format pcms, e.g. from a PCH.
rdar://108807592
Differential Revision: https://reviews.llvm.org/D149693
Extract the code the driver uses to expand response files and reuse it
in the dependency scanner.
rdar://106155880
Differential Revision: https://reviews.llvm.org/D145838
The idea is to split the callbacks that are used to consume dependency
information (DependencyConsumer) from callbacks that modify the scan
behaviour itself in any way (DependencyActionController). Currently this
is just lookupModuleOutput, but we have additional callbacks related to
CAS support that we intend to upstream in the future.
Differential Revision: https://reviews.llvm.org/D144058
The forwarding header is left in place because of its use in
`polly/lib/External/isl/interface/extract_interface.cc`, but I have
added a GCC warning about the fact it is deprecated, because it is used
in `isl` from where it is included by Polly.
When scanning dependencies of a module, the command line we're given doesn't have an input file, which the driver needs to be happy. We've been creating a fake in-memory input file named after the module. However, this can hide files/directories on the actual filesystem, leading to errors.
This patch works around that issue by generating a unique file name, which won't collide with the actual file system.
We could also change the driver APIs so that we're able to specify an "assumed" input file. This would be more work, though, since the driver assumes the input name comes from the actual command-line.
Depends on D140176.
Reviewed By: artemcm
Differential Revision: https://reviews.llvm.org/D140177
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
This reverts commit 2f8ac1804827026b44f429dce02730da18a73c50.
After committing a fix for previous buildbot failures in D138970,
re-landing the original change.
Before the fix the scanning would fail with
`-Werror,-Wnon-modular-include-in-module` despite the warning being
suppressed in the source code.
Existing approach with `-Wno-error` is not sufficient because it negates
only general `-Werror` but not specific `-Werror=...` and some warnings
can still emitted as errors. Make the approach stricter by using `-w`
flag and ignore all warnings, including those upgraded to errors. This
approach is still valid as it doesn't affect the dependencies.
rdar://101588531
Differential Revision: https://reviews.llvm.org/D138252
When a pcm has already been loaded from disk, reuse it from the
InMemoryModuleCache in readASTFileControlBlock. This avoids potentially
reading it again.
As noted in the FIXME, ideally we would also add the module to the cache
if it will be used again later, but that could modify its build state
and we do not have enough context currenlty to know if it's correct.
Differential Revision: https://reviews.llvm.org/D138160
This is a fix related to D135414. The original intention was to keep `BaseFS` as a member of the worker and conditionally overlay it with local in-memory FS. The `move` of ref-counted `BaseFS` was not intended, and it's a bug.
Disabling parallelism in the "by-module-name" test reliably reproduces this, and the test itself doesn't *need* parallelism. (I think `-j 4` was cargo culted from another test.) Reusing that test to check for correct behavior...
Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D136124
The `-ivfsoverlay` flag was only being respected when the scanner was instructed to minimize the inputs. This patch respects that flag even in canonical preprocessing mode.
Depends on D135414.
Reviewed By: Bigcheese
Differential Revision: https://reviews.llvm.org/D135416
The dependency scanning worker may create an in-memory VFS and inject that into its `FileManager`. (Implemented in D109485.) This VFS currently persists between scans, which is not correct: in-memory files created in one scan could affect subsequent scans. This patch changes things so that the in-memory VFS is local to `DependencyScanningWorker::computeDependencies()`.
To test this, one would first scan for dependencies of a named module and then run second scan (on the same worker) for something unrelated, which would pick up the in-memory file created in the first scan. This set up is impossible to achieve with `clang-scan-deps`. We could set this up in `ToolingTests`, but the scanner needs to access the physical filesystem to store `.pcm` files. AFAIK we don't have a good way to propagate something like `%t` into unit tests to make that feasible. (This could be achieved with something like the virtualized `OutputManager`.)
Reviewed By: Bigcheese
Differential Revision: https://reviews.llvm.org/D135414
This patch provides `FileManager` with the CWD on construction in the worker, rather than later in the action.
Depends on D134976.
Reviewed By: benlangmuir
Differential Revision: https://reviews.llvm.org/D134977
This patch removes the ability of a dependency scanning worker to share a `FileManager` instance between individual scans. It's not sound and doesn't provide performance benefits (due to the underlying caching VFS).
Reviewed By: benlangmuir
Differential Revision: https://reviews.llvm.org/D134976
This patch adds new member function to `DependencyScanningWorker` that allows clients to pass custom `DiagnosticConsumer`, and returns `bool`.
This provides more flexibility compared to the existing version that automatically stringifies diagnostics and returns them in `llvm::Error`.
Reviewed By: benlangmuir
Differential Revision: https://reviews.llvm.org/D134838
The `ScanInstance` is a local variable in `DependencyScanningAction::runInvocation()` that is referenced by `ModuleDepCollector`. Since D132405, `ModuleDepCollector` can escape the function and can outlive its `ScanInstance`. This patch fixes that.
Reviewed By: benlangmuir
Differential Revision: https://reviews.llvm.org/D133988
Attempt to fix the test failures observed in CI:
* Add Option dependency, which caused BUILD_SHARED_LIBS builds to fail
* Adapt tests that accidentally depended on the host platform: platforms
that don't use an integrated assembler (e.g. AIX) get a different set
of commands from the driver. Most dependency scanner tests can use
-fsyntax-only or -E instead of -c to avoid this, and in the rare case
we want to check -c specifically, set an explicit target so the
behaviour is independent of the host.
Original commit message follows.
---
Instead of trying to "fix" the original driver invocation by appending
arguments to it, split it into multiple commands, and for each -cc1
command use a CompilerInvocation to give precise control over the
invocation.
This change should make it easier to (in the future) canonicalize the
command-line (e.g. to improve hits in something like ccache), apply
optimizations, or start supporting multi-arch builds, which would
require different modules for each arch.
In the long run it may make sense to treat the TU commands as a
dependency graph, each with their own dependencies on modules or earlier
TU commands, but for now they are simply a list that is executed in
order, and the dependencies are simply duplicated. Since we currently
only support single-arch builds, there is no parallelism available in
the execution.
Differential Revision: https://reviews.llvm.org/D132405
Instead of trying to "fix" the original driver invocation by appending
arguments to it, split it into multiple commands, and for each -cc1
command use a CompilerInvocation to give precise control over the
invocation.
This change should make it easier to (in the future) canonicalize the
command-line (e.g. to improve hits in something like ccache), apply
optimizations, or start supporting multi-arch builds, which would
require different modules for each arch.
In the long run it may make sense to treat the TU commands as a
dependency graph, each with their own dependencies on modules or earlier
TU commands, but for now they are simply a list that is executed in
order, and the dependencies are simply duplicated. Since we currently
only support single-arch builds, there is no parallelism available in
the execution.
Differential Revision: https://reviews.llvm.org/D132405
This patch introduces new option `-eager-load-pcm` to `clang-scan-deps`, which controls whether the resulting command-lines will load PCM files eagerly (at the start of compilation) or lazily (when handling import directive). This patch also switches the default from eager to lazy.
To reduce the potential for churn in LIT tests in the future, this patch also removes redundant checks of command-line arguments and introduces new test `modules-dep-args.c` as a substitute.
Reviewed By: benlangmuir
Differential Revision: https://reviews.llvm.org/D132066
Sharing the FileManager across implicit module builds currently leaks
paths looked up in an importer into the built module itself. This can
cause non-deterministic results across scans. It is especially bad for
modules since the path can be saved into the pcm file itself, leading to
stateful behaviour if the cache is shared.
This should not impact the number of real filesystem accesses in the
scanner, since it is already caching in the
DependencyScanningWorkerFilesystem.
Note: this change does not affect whether or not the FileManager is
shared across TUs in the scanner, which is a separate issue.
Differential Revision: https://reviews.llvm.org/D131412
It's an accident that we started return asbolute paths from
FileEntry::getName for all relative paths. Prepare for getName to get
(closer to) return the requested path. Note: conceptually it might make
sense for the dependency scanner to allow relative paths and have the
DependencyConsumer decide if it wants to make them absolute, but we
currently document that it's absolute and I didn't want to change
behaviour here.
Differential Revision: https://reviews.llvm.org/D130934
The command-line arguments for module builds are cc1 commands, so they
do not implicitly set -disable-free like a driver invocation, and
Tooling will disable it for the scanning instance itself. Set
-disable-free explicitly so that separate invocations for building
modules will not pay for freeing memory unnecessarily.
Differential Revision: https://reviews.llvm.org/D127229
This is a commit with the following changes:
* Remove `ExcludedPreprocessorDirectiveSkipMapping` and related functionality
Removes `ExcludedPreprocessorDirectiveSkipMapping`; its intended benefit for fast skipping of excluded directived blocks
will be superseded by a follow-up patch in the series that will use dependency scanning lexing for the same purpose.
* Refactor dependency scanning to produce pre-lexed preprocessor directive tokens, instead of minimized sources
Replaces the "source minimization" mechanism with a mechanism that produces lexed dependency directives tokens.
* Make the special lexing for dependency scanning a first-class feature of the `Preprocessor` and `Lexer`
This is bringing the following benefits:
* Full access to the preprocessor state during dependency scanning. E.g. a component can see what includes were taken and where they were located in the actual sources.
* Improved performance for dependency scanning. Measurements with a release+thin-LTO build shows ~ -11% reduction in wall time.
* Opportunity to use dependency scanning lexing to speed-up skipping of excluded conditional blocks during normal preprocessing (as follow-up, not part of this patch).
For normal preprocessing measurements show differences are below the noise level.
Since, after this change, we don't minimize sources and pass them in place of the real sources, `DependencyScanningFilesystem` is not technically necessary, but it has valuable performance benefits for caching file `stat`s along with the results of scanning the sources. So the setup of using the `DependencyScanningFilesystem` during a dependency scan remains.
Differential Revision: https://reviews.llvm.org/D125486
Differential Revision: https://reviews.llvm.org/D125487
Differential Revision: https://reviews.llvm.org/D125488