121 Commits

Author SHA1 Message Date
Qinkun Bao
18b885f66b
Revert "[clang][modules] Timestamp-less validation API" (#139987)
Reverts llvm/llvm-project#138983
2025-05-14 21:02:57 -04:00
Jan Svoboda
960afcc90e
[clang][modules] Timestamp-less validation API (#138983)
Timestamps are an implementation detail of the cross-process module
cache implementation. This PR hides it from the `ModuleCache` API, which
simplifies the in-process implementation.
2025-05-14 14:31:23 -07:00
Jan Svoboda
49c513844d
[clang][modules] Allow not forcing validation of user headers (#139091)
Force-validation of user headers was implemented in acb803e8 to deal
with files changing during build. The dependency scanner guarantees an
immutable file system during a single build session, so the validation is
unnecessary. (We don't hit the disk too often due to the caching VFS,
but even avoiding going to the cache and deserializing the input files
makes sense.)
2025-05-09 08:33:28 -07:00
Jan Svoboda
1698beb542
[clang][modules][deps] Optimize in-process timestamping of PCMs (#137363)
In the past, timestamps used for
`-fmodules-validate-once-per-build-session` were found to be a source of
contention in the dependency scanner
([D149802](https://reviews.llvm.org/D149802),
https://github.com/llvm/llvm-project/pull/112452). This PR is yet
another attempt to optimize these. We now make use of the new
`ModuleCache` interface to implement the in-process version in terms of
atomic `std::time_t` variables rather than the mtime attribute on
`.timestamp` files.
2025-05-07 14:02:40 -07:00
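The in-process timestamping described in the commit above can be pictured with a small sketch. `InProcessTimestamps`, `markValidated`, and `validatedInSession` are hypothetical names, not the actual `ModuleCache` interface. Each module file path maps to an atomic `std::time_t`, so checking whether a PCM was already validated in the current build session is an atomic load instead of reading the mtime of a `.timestamp` file. For brevity the map itself is protected by a plain mutex, which a tuned implementation would probably avoid on the hot path.

```cpp
#include <atomic>
#include <cassert>
#include <ctime>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical in-process stand-in for `.timestamp` files: one atomic
// "last validated" time per module file path. Not the real ModuleCache API.
class InProcessTimestamps {
public:
  // Record that the module file at Path was validated at time Now.
  void markValidated(const std::string &Path, std::time_t Now) {
    entry(Path).store(Now, std::memory_order_relaxed);
  }

  // True if the module was validated at or after the start of the current
  // build session, i.e. it does not need to be validated again.
  bool validatedInSession(const std::string &Path, std::time_t SessionStart) {
    return entry(Path).load(std::memory_order_relaxed) >= SessionStart;
  }

private:
  std::atomic<std::time_t> &entry(const std::string &Path) {
    std::lock_guard<std::mutex> Guard(MapMutex);
    // try_emplace leaves existing entries alone; new entries start at 0,
    // meaning "never validated". References to map elements stay valid.
    return Entries.try_emplace(Path, 0).first->second;
  }

  std::mutex MapMutex;
  std::unordered_map<std::string, std::atomic<std::time_t>> Entries;
};
```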
Jan Svoboda
b69dcb8734
[clang][frontend] Require invocation to construct CompilerInstance (#137668)
This PR makes it so that `CompilerInvocation` needs to be provided to
`CompilerInstance` on construction. There are a couple of benefits in my
view:
* This makes it impossible to misuse some `CompilerInstance` APIs. For
example, there were cases where `createDiagnostics()` was called before
`setInvocation()`, causing the `DiagnosticEngine` to use the
default-constructed `DiagnosticOptions` instead of the intended ones.
* This shrinks `CompilerInstance`'s state space.
* This makes it possible to access **the** invocation in
`CompilerInstance`'s constructor (to be used in a follow-up).
2025-05-01 07:31:30 -07:00
Jan Svoboda
060f3f0dd1
[clang][deps] Make dependency directives getter thread-safe (#136178)
This PR fixes two issues in one go:
1. The dependency directives getter (a `std::function`) was being stored
in `PreprocessorOptions`. This goes against the principle where the
options classes are supposed to be value-objects representing the `-cc1`
command line arguments. This is fixed by moving the getter directly to
`CompilerInstance` and propagating it explicitly.
2. The getter was capturing the `ScanInstance` VFS. That's fine in
synchronous implicit module builds where the same VFS instance is used
throughout, but breaks down once you try to build modules asynchronously
(which forces the use of separate VFS instances). This is fixed by
explicitly passing a `FileManager` into the getter and extracting the
right instance of the scanning VFS out of it.
2025-04-23 10:33:12 -07:00
Kazu Hirata
c2d6c7cea7
[clang] Use llvm::append_range (NFC) (#136448)
2025-04-19 12:21:14 -07:00
Cyndy Ishida
40050888a1
[clang][depscan] Centralize logic for populating StableDirs, NFC (#135704)
Pass a reference to `StableDirs` when creating ModuleDepCollector. This
avoids needing to create one from the same ScanInstance for each call to
`handleTopLevelModule` and reduces the amount of potential downstream
changes needed for handling StableDirs.
2025-04-15 09:59:23 -07:00
Cyndy Ishida
1365b5b1ad
[clang][DependencyScanning] Track dependencies from prebuilt modules to determine IsInStableDir (#132237)
When a module is being scanned, it can depend on modules that have
already been built from a PCH dependency, in which case those PCM files
are reused for the module dependencies. Because the scanner does not
have access to the full set of file dependencies of prebuilt modules,
check transitively whether the input files recorded in those PCMs come
from the provided stable directories.
2025-04-08 15:48:25 -07:00
Kazu Hirata
7cc17fb085
[ADT] Remove old range constructors of SmallSet and StringSet (#133205)
This patch removes the old range constructors of SmallSet and
StringSet that do not take the llvm::from_range tag.  Since there are
so few uses, this patch directly removes them without going through
the deprecation process.
2025-03-27 07:52:13 -07:00
Jan Svoboda
056264b838
[clang][deps] Implement efficient in-process ModuleCache (#129751)
The dependency scanner uses implicitly-built Clang modules under the
hood. This system was originally designed to handle multiple concurrent
processes working on the same module cache, and mutual exclusion was
implemented using file locks. The scanner, however, runs within a single
process, making file locks unnecessary. This patch virtualizes the
interface for module cache locking and provides an implementation based
on `std::shared_mutex`. This reduces `clang-scan-deps` runtime by ~17%
on my benchmark.

Note that even when multiple processes run a scan on the same module
cache (and therefore don't coordinate efficiently), this should still be
correct due to the strict context hash, the write-through
`InMemoryModuleCache` and the logic for rebuilding out-of-date or
incompatible modules.
2025-03-18 14:01:04 -07:00
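A rough sketch of the split between a locking interface and an in-process implementation, assuming hypothetical `ModuleCacheLock` and `InProcessModuleCacheLock` classes (not the real LLVM types). Writers take a module's mutex exclusively while building it; readers take it in shared mode, so concurrent readers of one module, and work on different modules, never serialize on each other.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical interface over module cache locking. A cross-process
// implementation could use file locks; the in-process one below does not.
class ModuleCacheLock {
public:
  virtual ~ModuleCacheLock() = default;
  virtual void lockForWrite(const std::string &ModulePath) = 0;
  virtual void unlockForWrite(const std::string &ModulePath) = 0;
  virtual void lockForRead(const std::string &ModulePath) = 0;
  virtual void unlockForRead(const std::string &ModulePath) = 0;
};

// In-process implementation: one std::shared_mutex per module file.
class InProcessModuleCacheLock : public ModuleCacheLock {
public:
  void lockForWrite(const std::string &P) override { mutexFor(P).lock(); }
  void unlockForWrite(const std::string &P) override { mutexFor(P).unlock(); }
  void lockForRead(const std::string &P) override { mutexFor(P).lock_shared(); }
  void unlockForRead(const std::string &P) override {
    mutexFor(P).unlock_shared();
  }

  std::size_t numTrackedModules() {
    std::lock_guard<std::mutex> Guard(MapMutex);
    return Mutexes.size();
  }

private:
  std::shared_mutex &mutexFor(const std::string &Path) {
    // Only the path-to-mutex map is guarded by MapMutex; the per-module
    // mutexes are locked after MapMutex has been released.
    std::lock_guard<std::mutex> Guard(MapMutex);
    return Mutexes[Path]; // std::map nodes never move, so this stays valid
  }

  std::mutex MapMutex;
  std::map<std::string, std::shared_mutex> Mutexes;
};
```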
Jan Svoboda
d2e66625bc
[clang][deps] Propagate the entire service (#128959)
Shared state between dependency scanning workers is managed by the
dependency scanning service.

Right now, the members are individually threaded through the worker,
action, and collector. This makes any change to the service and its
members a very laborious process. Moreover, this situation causes
frequent merge conflicts in our downstream repo where the service does
have some extra members that need to be passed around.

To ease the maintenance burden, this PR starts passing a reference to
the entire service.
2025-02-27 10:06:26 -08:00
Ben Langmuir
e3cab30ab9
[clang][deps] Ensure DiagnosticConsumer::finish is always called (#127110)
When using the clang dependency scanner with an arbitrary
DiagnosticConsumer, it is important that we always call finish().
Previously, if there was an error preventing us from reaching the
scanning action, or if the command line contained no scannable actions,
we would fail to call finish(), which would break some consumers (e.g.
the serialized diagnostics consumer).
2025-02-13 14:06:17 -08:00
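A scope guard is one common way to make `finish()` run on every exit path, including early error returns. This is a generic sketch with a hypothetical `Consumer` type standing in for `DiagnosticConsumer` and a hypothetical `scan` function; the patch itself may be structured differently.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a DiagnosticConsumer that must be finished.
struct Consumer {
  std::vector<std::string> Diags;
  bool Finished = false;
  void report(const std::string &D) { Diags.push_back(D); }
  void finish() { Finished = true; } // e.g. flush a serialized diag file
};

// Calls finish() when it goes out of scope, whichever path leads there.
class FinishOnExit {
public:
  explicit FinishOnExit(Consumer &Target) : C(Target) {}
  ~FinishOnExit() { C.finish(); }
  FinishOnExit(const FinishOnExit &) = delete;
  FinishOnExit &operator=(const FinishOnExit &) = delete;

private:
  Consumer &C;
};

// Hypothetical scan entry point with two early returns.
bool scan(const std::vector<std::string> &Args, Consumer &Diags) {
  FinishOnExit Guard(Diags);
  if (Args.empty()) {
    Diags.report("error: empty command line");
    return false; // setup error: the guard still calls finish()
  }
  if (Args.front() != "-cc1")
    return true; // nothing scannable: finish() is still called
  Diags.report("note: scanned");
  return true;
}
```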
Steven Wu
7a52b93837
[DependencyScanning] Add ability to scan TU with a buffer input (#125111)
Update the dependency scanner so it can scan the dependencies of a TU
from a provided buffer rather than relying on the on-disk file system to
provide the input file.
2025-02-04 16:37:29 -08:00
Kadir Cetinkaya
df9a14d7bb
Reapply "[NFC] Explicitly pass a VFS when creating DiagnosticsEngine (#115852)"
This reverts commit a1153cd6fedd4c906a9840987934ca4712e34cb2 with fixes
to lldb breakages.

Fixes https://github.com/llvm/llvm-project/issues/117145.
2024-11-21 14:55:30 +01:00
Sylvestre Ledru
a1153cd6fe
Revert "[NFC] Explicitly pass a VFS when creating DiagnosticsEngine (#115852)"
Reverted for causing:
https://github.com/llvm/llvm-project/issues/117145

This reverts commit bdd10d9d249bd1c2a45e3de56a5accd97e953458.
2024-11-21 13:04:30 +01:00
kadir çetinkaya
bdd10d9d24
[NFC] Explicitly pass a VFS when creating DiagnosticsEngine (#115852)
Starting with 41e3919ded78d8870f7c95e9181c7f7e29aa3cc4 DiagnosticsEngine
creation might perform IO. It was implicitly defaulting to
getRealFileSystem. This patch makes it explicit by pushing the decision
making to callers.

It uses the ambient VFS if one is available and falls back to
`getRealFileSystem` otherwise.
2024-11-21 12:11:41 +01:00
Jan Svoboda
25d1ac11d5
[clang][deps] Only write preprocessor info into PCMs (#115239)
This patch builds on top of
https://github.com/llvm/llvm-project/pull/115237 and
https://github.com/llvm/llvm-project/pull/115235, only passing the
`Preprocessor` object to `ASTWriter`. This reduces the size of scanning
PCM files by 1/3 and speeds up scans by 16%.
2024-11-11 13:07:08 -08:00
Jan Svoboda
a6637ae2cc
[clang][deps] Share FileManager between modules (#115065)
The `FileManager` sharing between module-building `CompilerInstance`s
was disabled a while ago due to `FileEntry::getName()` being unreliable.
Now that we use `FileEntryRef::getNameAsRequested()` in places where it
matters, re-enabling `FileManager` is sound and improves performance of
`clang-scan-deps` by ~6.2%.
2024-11-06 14:21:01 -08:00
Jan Svoboda
6e4dcbb21d
[clang][deps] Print tracing VFS data (#108056)
Clang's `-cc1 -print-stats` shows lots of useful internal data including
basic `FileManager` stats. Since this layer caches some results, it is
unclear how that information translates to actual filesystem accesses.
This PR uses `llvm::vfs::TracingFileSystem` to provide that missing
information.

A similar mechanism is implemented for `clang-scan-deps`'s verbose mode
(`-v`). IO contention proved to be a real bottleneck a couple of times
already and this new feature should make those easier to detect in the
future. The tracing VFS is inserted below the caching FS and above the
real FS.
2024-09-11 16:04:56 -07:00
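The layering can be illustrated with toy classes. `FileSystem`, `RealFS`, `TracingFS`, and `CachingFS` are hypothetical stand-ins, not the `llvm::vfs` types. Because the tracing layer sits below the cache, it only counts requests that miss the cache, which approximates the accesses that actually reach the disk.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>

struct FileSystem {
  virtual ~FileSystem() = default;
  virtual bool exists(const std::string &Path) = 0;
};

// Toy "real" file system backed by a fixed set of paths.
struct RealFS : FileSystem {
  std::set<std::string> Files;
  bool exists(const std::string &Path) override {
    return Files.count(Path) != 0;
  }
};

// Counts every request it forwards to the layer below.
struct TracingFS : FileSystem {
  std::shared_ptr<FileSystem> Under;
  unsigned NumExistsCalls = 0;
  explicit TracingFS(std::shared_ptr<FileSystem> U) : Under(std::move(U)) {}
  bool exists(const std::string &Path) override {
    ++NumExistsCalls;
    return Under->exists(Path);
  }
};

// Caches results, so repeated queries never reach the layers below.
struct CachingFS : FileSystem {
  std::shared_ptr<FileSystem> Under;
  std::map<std::string, bool> Cache;
  explicit CachingFS(std::shared_ptr<FileSystem> U) : Under(std::move(U)) {}
  bool exists(const std::string &Path) override {
    auto It = Cache.find(Path);
    if (It != Cache.end())
      return It->second;
    bool Result = Under->exists(Path);
    Cache.emplace(Path, Result);
    return Result;
  }
};
```

Stacked as caching, then tracing, then real, the counter reports 2 after querying `a.h` twice and `b.h` once.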
Chuanqi Xu
62fec3d23d
[NFCI] [ClangScanDeps] [P1689] Use PreprocessorOnly Action for P1689
Using the PreprocessorOnly action is sufficient for the P1689 format. We
don't need to read any PCH or module files.
2024-09-06 15:20:59 +08:00
Jan Svoboda
55323ca6c8
[clang][deps] Only bypass scanning VFS for the module cache (#88800)
The scanning VFS doesn't cache stat failures of paths with no extension.
This was originally implemented to avoid caching the non-existence of
the modules cache directory that the modular scanner will eventually
create if it does not exist.

However, this prevents caching of the non-existence of all directories
and notably also header files from the standard C++ library, which can
lead to sub-par performance.

This patch adds an API to the scanning VFS that allows clients to
configure a path prefix for which to bypass the scanning VFS and use the
underlying VFS directly.
2024-08-13 08:41:39 -07:00
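A minimal sketch of the bypass decision, assuming a hypothetical `shouldBypassScanningVFS` helper and `/`-separated paths. Only paths under the configured prefix (such as the module cache directory) skip the caching layer; everything else, including missing extension-less paths, can then have its stat failures cached. The check requires a separator at the prefix boundary so that `/tmp/mc-old` is not treated as being inside `/tmp/mc`.

```cpp
#include <cassert>
#include <string>

// Returns true if Path is the directory Prefix itself or lies inside it.
bool shouldBypassScanningVFS(const std::string &Path,
                             const std::string &Prefix) {
  if (Prefix.empty())
    return false; // nothing configured: never bypass
  if (Path.compare(0, Prefix.size(), Prefix) != 0)
    return false; // Path does not start with Prefix
  if (Path.size() == Prefix.size())
    return true; // Path is exactly the prefix directory
  // Require a separator at the boundary so "/tmp/mc-old" is not inside
  // "/tmp/mc".
  return Prefix.back() == '/' || Path[Prefix.size()] == '/';
}
```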
Chuanqi Xu
d64eccf433
[clang] Split ObjectFilePCHContainerReader from ObjectFilePCHContainerWriter (#99599)
Close https://github.com/llvm/llvm-project/issues/99479

See https://github.com/llvm/llvm-project/issues/99479 for details
2024-07-23 23:55:31 +08:00
Nishith Kumar M Shah
0559eaff5a
Revert "Pass LangOpts from CompilerInstance to DependencyScanningWorker (#93753)" (#94488)
This reverts commit 9862080b1cbf685c0d462b29596e3f7206d24aa2.
2024-06-05 11:42:13 -07:00
Nishith Kumar M Shah
9862080b1c
Pass LangOpts from CompilerInstance to DependencyScanningWorker (#93753)
This commit fixes https://github.com/llvm/llvm-project/issues/88896 by
passing LangOpts from the CompilerInstance to
DependencyScanningWorker so that the original LangOpts are
preserved/respected.
This makes for more accurate parsing/lexing when certain language
versions or features specific to versions are to be used.
2024-06-03 17:20:43 +02:00
Kazu Hirata
197c3a3efc
Use llvm::less_first (NFC) (#94136)
2024-06-02 07:45:50 -07:00
Alexandre Ganea
39ed3c68e5
[clang-scan-deps] Fix contention when updating TrackingStatistics in hot code paths in FileManager. (#88427)
`FileManager::getDirectoryRef()` and `FileManager::getFileRef()` are hot code paths in `clang-scan-deps`. On every call, these functions update a few atomics related to printing statistics, which causes contention on high-core-count machines.

![Screenshot 2024-04-10 214123](https://github.com/llvm/llvm-project/assets/37383324/5756b1bc-cab5-4612-8769-ee7e03a66479)

![Screenshot 2024-04-10 214246](https://github.com/llvm/llvm-project/assets/37383324/3d560e89-61c7-4fb9-9330-f9e660e8fc8b)

![Screenshot 2024-04-10 214315](https://github.com/llvm/llvm-project/assets/37383324/006341fc-49d4-4720-a348-7af435c21b17)

After this patch we make the variables local to the `FileManager`.

In our test case, this saves about 49 s of `clang-scan-deps` run time (1 min 47 s before, 58 s after). These figures were measured after applying my suggestion in https://github.com/llvm/llvm-project/pull/88152#issuecomment-2049803229, that is:
```
static bool shouldCacheStatFailures(StringRef Filename) {
  return true;
}
```
Without the above, there's just too much OS noise from the high volume of `status()` calls with regular non-modules C++ code. Tested on Windows with clang-cl.
2024-04-25 10:31:45 -04:00
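The difference between the two approaches can be sketched as follows, with hypothetical names rather than the actual `FileManager` members. A process-wide atomic counter is one cache line that every thread writes on every lookup. Plain counters owned by each manager are only touched by the thread using that manager (assuming, as in the scanner, that worker threads do not share one manager), so the hot path stops contending.

```cpp
#include <atomic>
#include <cassert>

// Before: one process-wide statistic, written by every thread.
std::atomic<unsigned> GlobalNumFileLookups{0};

// Old-style hot path: an atomic read-modify-write on every call.
void recordLookupGlobal() {
  GlobalNumFileLookups.fetch_add(1, std::memory_order_relaxed);
}

// After: each manager owns plain, non-atomic counters.
class FileManagerStats {
public:
  void recordLookup(bool CacheHit) {
    ++NumFileLookups;
    if (CacheHit)
      ++NumFileCacheHits;
  }
  unsigned lookups() const { return NumFileLookups; }
  unsigned cacheHits() const { return NumFileCacheHits; }

private:
  unsigned NumFileLookups = 0;
  unsigned NumFileCacheHits = 0;
};
```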
Jan Svoboda
2248164a9a
Revert "[clang] Move state out of PreprocessorOptions (1/n) (#86358)"
This reverts commit 407a2f23, which stopped propagating the callback to module compiles, effectively disabling dependency directive scanning for all modular dependencies. Also added a regression test.
2024-04-09 13:26:45 -07:00
Jan Svoboda
407a2f231a
[clang] Move state out of PreprocessorOptions (1/n) (#86358)
An instance of `PreprocessorOptions` is part of `CompilerInvocation`
which is supposed to be a value type. The `DependencyDirectivesForFile`
member is problematic, since it holds an owning reference of the
scanning VFS. This makes it not a true value type, and it can keep
potentially large chunk of memory (the local cache in the scanning VFS)
alive for longer than clients might expect. Let's move it into the
`Preprocessor` instead.
2024-03-29 11:20:55 -07:00
Jan Svoboda
b768a8c1db
[clang][deps] Lazy dependency directives (#86347)
Since b4c83a13f664582015ea22924b9a0c6290d41f5b, `Preprocessor` and
`Lexer` are aware of the concept of scanning dependency directives. This
makes it possible to scan for them on-demand rather than eagerly on the
first filesystem operation (open, or even just stat).

This might improve performance, but is also necessary for the "PCH as
module" mode. Some precompiled header sources use the ".pch" file
extension, which means they were not getting scanned for dependency
directives. This was okay when the PCH was the main input file in a
separate scan step, because there we just lex the file in a
scanning-specific frontend action. But when such a source gets treated
as a module implicitly loaded from a TU, it gets compiled like any other
module (with Sema), which results in compilation errors. (See the
attached test case.)

rdar://107663951
2024-03-22 16:09:34 -07:00
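On-demand scanning can be sketched as a cache entry that computes its directives only when first asked for them, rather than when the file is first opened or stat'ed. `CachedFile`, `Directive`, and `scanDirectives` are hypothetical, and the toy scanner simply keeps lines whose first non-blank character is `#`; the real dependency directives scanner is far more involved. A mutex makes the first computation safe when several threads request the same file.

```cpp
#include <cassert>
#include <mutex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Directive = std::string;

// Toy directive scanner: keep lines whose first non-blank character is '#'.
std::vector<Directive> scanDirectives(const std::string &Contents) {
  std::vector<Directive> Result;
  std::istringstream In(Contents);
  std::string Line;
  while (std::getline(In, Line)) {
    std::size_t First = Line.find_first_not_of(" \t");
    if (First != std::string::npos && Line[First] == '#')
      Result.push_back(Line.substr(First));
  }
  return Result;
}

// Creating the entry does no scanning; directives are computed on the
// first getDirectives() call and reused afterwards.
class CachedFile {
public:
  explicit CachedFile(std::string Contents) : Contents(std::move(Contents)) {}

  const std::vector<Directive> &getDirectives() {
    std::lock_guard<std::mutex> Guard(Mutex);
    if (!Scanned) {
      Directives = scanDirectives(Contents);
      Scanned = true;
      ++NumScans;
    }
    return Directives;
  }

  unsigned numScans() const {
    std::lock_guard<std::mutex> Guard(Mutex);
    return NumScans;
  }

private:
  std::string Contents;
  mutable std::mutex Mutex;
  bool Scanned = false;
  unsigned NumScans = 0;
  std::vector<Directive> Directives;
};
```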
Ben Langmuir
083da46ff0
[clang][deps] Fix dependency scanning with -working-directory (#84525)
Stop overriding -working-directory to CWD during argument parsing, which
should no longer be necessary now that we set the VFS working directory,
and set FSOpts correctly after parsing arguments so that
-working-directory behaves correctly.
2024-03-12 08:02:54 -07:00
Michael Spencer
de3b2c293b
[clang][ScanDeps] Allow PCHs to have different VFS overlays (#82294)
It turns out it's not that uncommon for real code to pass a different
set of VFSs while building a PCH than while using the PCH. This can
cause problems as seen in `test/ClangScanDeps/optimize-vfs-pch.m`. If
you scan `compile-commands-tu-no-vfs-error.json` without -Werror and run
the resulting commands, Clang will emit a fatal error while trying to
emit a note saying that it can't find a remapped header.

This also adds textual tracking of VFSs for prebuilt modules that are
part of an included PCH, as the same issue can occur in a module we are
building if we drop VFSs. This has to be textual because we have no
guarantee the PCH had the same list of VFSs as the current TU.

This uses the `PrebuiltModuleListener` to collect `VFSOverlayFiles`
instead of trying to extract it out of a `serialization::ModuleFile`
each time it's needed. There's not a great way to just store a pointer
to the list of strings in the serialized AST.
2024-02-23 17:48:58 -08:00
Michael Spencer
d42de86eb3
reland: [clang][ScanDeps] Canonicalize -D and -U flags (#82568)
Canonicalize `-D` and `-U` flags by sorting them and only keeping the
last instance of a given name.

This optimization only fires if all `-D` and `-U` flags start with a
simple identifier, for which a simple analysis can reliably determine
whether two flags refer to the same identifier. See the comment on
`getSimpleMacroName()` for details of what the issues are.

The previous version of this had issues with sed differences between macOS,
Linux, and Windows. This test doesn't check paths, so just don't run
sed.
Other tests should use `sed -E 's:\\\\?:/:g'` to get portable behavior.

Windows has different command line parsing behavior than Linux for
compilation databases, so the test has been adjusted to ignore that
difference.
2024-02-23 17:44:32 -08:00
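The canonicalization can be sketched as follows. `canonicalizeMacroFlags` and `getSimpleName` are hypothetical, and the notion of a "simple" flag used here (a plain identifier followed by end of string or `=`) is an assumption; the real `getSimpleMacroName()` rules may differ. If any flag is not simple, the list is returned unchanged.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Extracts NAME from "-DNAME", "-DNAME=VALUE" or "-UNAME"; returns
// std::nullopt if NAME is not a plain identifier (e.g. "-DF(x)=x").
std::optional<std::string> getSimpleName(const std::string &Flag) {
  if (Flag.size() < 3 || Flag[0] != '-' || (Flag[1] != 'D' && Flag[1] != 'U'))
    return std::nullopt;
  std::size_t Eq = Flag.find('=', 2);
  std::string Name =
      Flag.substr(2, Eq == std::string::npos ? std::string::npos : Eq - 2);
  if (Name.empty() ||
      !(std::isalpha(static_cast<unsigned char>(Name[0])) || Name[0] == '_'))
    return std::nullopt;
  for (char C : Name)
    if (!(std::isalnum(static_cast<unsigned char>(C)) || C == '_'))
      return std::nullopt;
  return Name;
}

std::vector<std::string>
canonicalizeMacroFlags(const std::vector<std::string> &Flags) {
  std::map<std::string, std::string> LastFlagForName; // ordered by name
  for (const std::string &Flag : Flags) {
    std::optional<std::string> Name = getSimpleName(Flag);
    if (!Name)
      return Flags; // not analyzable: leave the command line untouched
    LastFlagForName[*Name] = Flag; // a later flag for a name overrides
  }
  std::vector<std::string> Result;
  for (const auto &Entry : LastFlagForName)
    Result.push_back(Entry.second);
  return Result;
}
```

For example, `-DZ=1 -DA -UZ -DA=2` becomes `-DA=2 -UZ`: each name keeps only its last flag, and the survivors are sorted by name.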
Nico Weber
84ed55e11f
Revert "[clang][ScanDeps] Canonicalize -D and -U flags (#82298)"
This reverts commit 3ff805540173b83d73b673b39ac5760fc19bac15.

Test is failing on bots, see
https://github.com/llvm/llvm-project/pull/82298#issuecomment-1955664462
2024-02-20 20:24:32 -05:00
Michael Spencer
3ff8055401
[clang][ScanDeps] Canonicalize -D and -U flags (#82298)
Canonicalize `-D` and `-U` flags by sorting them and only keeping the
last instance of a given name.

This optimization only fires if all `-D` and `-U` flags start with a
simple identifier, for which a simple analysis can reliably determine
whether two flags refer to the same identifier. See the comment on
`getSimpleMacroName()` for details of what the issues are.
2024-02-20 15:20:40 -08:00
Michael Spencer
b21a2f9365
[clang][scan-deps] Stop scanning if any scanning setup emits an error.
Without this scanning will continue and later hit an assert that the
number of `RedirectingFileSystem`s matches the number of -ivfsoverlay
arguments.
2024-01-30 17:03:13 -08:00
Michael Spencer
7847e44594
[clang][DependencyScanner] Remove unused -ivfsoverlay files (#73734)
`-ivfsoverlay` files are unused when building most modules. Enable
removing them by:
* Adding a way to visit the filesystem tree with extensible RTTI to
  access each `RedirectingFileSystem`.
* Adding tracking to `RedirectingFileSystem` to record when it
  actually redirects a file access.
* Storing this information in each PCM.

Usage tracking is only enabled when iterating over the source manager
and affecting module maps. Here each path is stat'ed so that it registers as an access.
During scanning these stats all hit the cache.
2024-01-30 15:39:18 -08:00
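The usage tracking can be sketched with a toy overlay. `OverlayFS` and `usedOverlays` are hypothetical, not `llvm::vfs::RedirectingFileSystem`: the overlay records whether any lookup was actually redirected, and overlay files that never redirected anything while building a module are candidates for removal from that module's command line.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy overlay: maps virtual paths to real paths and remembers whether it
// ever redirected a lookup.
class OverlayFS {
public:
  OverlayFS(std::string File, std::map<std::string, std::string> Mapping)
      : OverlayFile(std::move(File)), Mapping(std::move(Mapping)) {}

  std::string resolve(const std::string &Path) {
    auto It = Mapping.find(Path);
    if (It == Mapping.end())
      return Path; // not redirected: usage flag unchanged
    HasBeenUsed = true;
    return It->second;
  }

  const std::string &file() const { return OverlayFile; }
  bool hasBeenUsed() const { return HasBeenUsed; }

private:
  std::string OverlayFile;
  std::map<std::string, std::string> Mapping;
  bool HasBeenUsed = false;
};

// Keep only the -ivfsoverlay files that redirected at least one access.
std::vector<std::string> usedOverlays(const std::vector<OverlayFS> &Overlays) {
  std::vector<std::string> Result;
  for (const OverlayFS &O : Overlays)
    if (O.hasBeenUsed())
      Result.push_back(O.file());
  return Result;
}
```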
Kazu Hirata
9b2c25c704
[clang] Use SmallString::operator std::string (NFC)
2024-01-20 18:57:30 -08:00
Jan Svoboda
22c68511ac
[clang][deps] Skip writing DIAG_PRAGMA_MAPPINGS record (#70874)
Following up on #69975, this patch skips writing `DIAG_PRAGMA_MAPPINGS`
as well. Deserialization of this PCM record is still showing up in
profiles, since it needs to be VBR-decoded for every transitively loaded
PCM file.

The scanner doesn't make any guarantees about diagnostic accuracy (and
it even disables all warnings), so skipping this record should be safe.
2023-11-10 07:04:43 -08:00
Michael Spencer
fb07d9cc09
[clang][DepScan] Make OptimizeArgs a bit mask enum and enable by default (#71588)
Make it easier to control which optimizations are enabled by making
OptimizeArgs a bitmask enum. There's currently only one such
optimization, but more will be added in followup commits.
2023-11-07 16:06:59 -08:00
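A bitmask enum can be sketched in plain C++ as below. `ScanOptimizations` and its flag names are hypothetical; `VFSOverlays` stands in for the follow-up optimizations the commit mentions, and `All` models "enabled by default". In LLVM code, `LLVM_MARK_AS_BITMASK_ENUM` from `llvm/ADT/BitmaskEnum.h` can generate these operators instead of writing them by hand.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical optimization flags; the names are illustrative only.
enum class ScanOptimizations : unsigned {
  None = 0,
  HeaderSearch = 1u << 0,      // e.g. prune unused header search paths
  VFSOverlays = 1u << 1,       // e.g. drop unused -ivfsoverlay files
  All = (1u << 0) | (1u << 1), // default: every optimization enabled
};

constexpr ScanOptimizations operator|(ScanOptimizations A, ScanOptimizations B) {
  using U = std::underlying_type_t<ScanOptimizations>;
  return static_cast<ScanOptimizations>(static_cast<U>(A) | static_cast<U>(B));
}

constexpr ScanOptimizations operator&(ScanOptimizations A, ScanOptimizations B) {
  using U = std::underlying_type_t<ScanOptimizations>;
  return static_cast<ScanOptimizations>(static_cast<U>(A) & static_cast<U>(B));
}

// True if at least one flag in S is set.
constexpr bool hasAny(ScanOptimizations S) {
  return S != ScanOptimizations::None;
}
```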
Jan Svoboda
6c465a201b
[clang][deps] Skip slow UNHASHED_CONTROL_BLOCK records (#69975)
Deserialization of the `DIAGNOSTIC_OPTIONS` and `HEADER_SEARCH_PATHS`
records is slow and done for every transitively loaded PCM.
Deserialization of these records cannot be skipped, because the words
are VBR6-encoded and we don't store the length of the entire record. We
could either turn them into binary blobs that can be skipped during
deserialization, or skip writing them altogether. This patch takes the
latter approach, since these records are not necessary in scanning PCMs.
The scanner doesn't make any guarantees about the accuracy of
diagnostics, and we always have the same header search paths due to
strict context hashing.

The commit that makes the `DIAGNOSTIC_OPTIONS` record skippable was
originally implemented by @benlangmuir in a downstream repo.
2023-11-02 15:07:58 -07:00
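To see why these records cannot be skipped cheaply, here is a minimal sketch of VBR6 encoding as used by the LLVM bitstream format: each 6-bit chunk carries 5 payload bits plus a continuation bit, so a reader only learns where a value ends by decoding it. The functions work on a vector of chunks rather than a packed bit stream, and `decodeVBR6` does no error checking; both are simplifications.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encodes Value as VBR6 chunks: 5 payload bits per chunk (low bits first),
// with bit 5 (0x20) set when more chunks follow.
std::vector<std::uint8_t> encodeVBR6(std::uint64_t Value) {
  std::vector<std::uint8_t> Chunks;
  while (Value >= 0x20) {
    Chunks.push_back(static_cast<std::uint8_t>((Value & 0x1F) | 0x20));
    Value >>= 5;
  }
  Chunks.push_back(static_cast<std::uint8_t>(Value));
  return Chunks;
}

// Decodes one VBR6 value starting at Pos and advances Pos past it. There is
// no way to jump over a value without inspecting each of its chunks.
std::uint64_t decodeVBR6(const std::vector<std::uint8_t> &Chunks,
                         std::size_t &Pos) {
  std::uint64_t Value = 0;
  unsigned Shift = 0;
  while (true) {
    std::uint8_t Chunk = Chunks[Pos++];
    Value |= static_cast<std::uint64_t>(Chunk & 0x1F) << Shift;
    if (!(Chunk & 0x20))
      return Value;
    Shift += 5;
  }
}
```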
Connor Sughrue
6b4de7b1c7
[clang][deps] add support for dependency scanning with cc1 command line
Allow users to run a dependency scan with a cc1 command line in addition to a driver command line. DependencyScanningAction was already being run with a cc1 command line, but DependencyScanningWorker::computeDependencies assumed that it was always provided a driver command line. Now DependencyScanningWorker::computeDependencies can handle cc1 command lines too.

Reviewed By: jansvoboda11

Differential Revision: https://reviews.llvm.org/D156234
2023-08-04 14:13:18 -07:00
Jan Svoboda
227f719958
[clang][modules][deps] Avoid checks for relocated modules
Currently, `ASTReader` performs some checks to diagnose relocated modules. This can add quite a bit of overhead to the scanner: it requires looking up, parsing and resolving module maps for all transitively loaded module files (and all the module maps encountered in the search paths on the way). Most of those checks are not really useful in the scanner anyway, since it uses strict context hash and immutable filesystem, which prevent those scenarios in the first place.

This can speed up scanning by up to 30%.

Depends on D150292.

Reviewed By: benlangmuir

Differential Revision: https://reviews.llvm.org/D150320
2023-07-17 13:50:24 -07:00
Ben Langmuir
8fe8d69ddf
[clang][deps] Make clang-scan-deps write modules in raw format
We have no use for debug info for the scanner modules, and writing raw
ast files speeds up scanning ~15% in some cases. Note that the compile
commands produced by the scanner will still build the obj format (if
requested), and the scanner can *read* obj format pcms, e.g. from a PCH.

rdar://108807592

Differential Revision: https://reviews.llvm.org/D149693
2023-05-03 12:07:46 -07:00
Jan Svoboda
34f143988f
[clang][deps] NFC: Don't collect PCH input files
Since b4c83a13, PCH input files are no longer necessary.
2023-04-05 12:29:03 -07:00
Ben Langmuir
fcab930cd3
[clang][deps] Handle response files in dep scanner
Extract the code the driver uses to expand response files and reuse it
in the dependency scanner.

rdar://106155880

Differential Revision: https://reviews.llvm.org/D145838
2023-03-13 15:47:35 -07:00
Ben Langmuir
296ba5bbd3
[clang][deps] Split lookupModuleOutput out of DependencyConsumer NFC
The idea is to split the callbacks that are used to consume dependency
information (DependencyConsumer) from callbacks that modify the scan
behaviour itself in any way (DependencyActionController). Currently this
is just lookupModuleOutput, but we have additional callbacks related to
CAS support that we intend to upstream in the future.

Differential Revision: https://reviews.llvm.org/D144058
2023-03-10 13:14:49 -08:00
Chuanqi Xu
eb70b38f83
Recommit [C++20] [Modules] [ClangScanDeps] Add ClangScanDeps support for C++20 Named Modules in P1689 format (2/4)
Close https://github.com/llvm/llvm-project/issues/51792
Close https://github.com/llvm/llvm-project/issues/56770

This patch adds ClangScanDeps support for C++20 Named Modules in P1689
format. We can find the P1689 format at:
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1689r5.html.
Once this patch lands, we will be able to compile C++20 Named
Modules with CMake! And although P1689 was written by Kitware people,
other build systems should be able to use the format to compile C++20
Named Modules too.

TODO: Support header units in P1689 Format.
TODO2: Support C++20 Modules in the full dependency format of
ClangScanDeps. We also want to support C++20 Modules and clang modules
together according to
https://discourse.llvm.org/t/how-should-we-support-dependency-scanner-for-c-20-modules/66027.
But the P1689 format only covers C++20 Modules for now, so let's focus
on C++20 Modules and the P1689 format and look at the full dependency
format later.

I'll add the release notes and documentation after the patch lands.

Reviewed By: jansvoboda11

Differential Revision: https://reviews.llvm.org/D137527
2023-02-13 10:42:35 +08:00
NAKAMURA Takumi
069dd8768a
Revert "[C++20] [Modules] [ClangScanDeps] Add ClangScanDeps support for C++20 Named Modules in P1689 format (2/4)"
This reverts commit de17c665e3f995c7f5a0e453461ce3a1b8aec196.

See also D137527
2023-02-12 18:38:25 +09:00
Archibald Elliott
d768bf994f
[NFC][TargetParser] Replace uses of llvm/Support/Host.h
The forwarding header is left in place because of its use in
`polly/lib/External/isl/interface/extract_interface.cc`, but I have
added a GCC warning about the fact it is deprecated, because it is used
in `isl` from where it is included by Polly.
2023-02-10 09:59:46 +00:00