152 Commits

Author SHA1 Message Date
Cyndy Ishida
15c3793cdf
[clang][scan-deps] Report a scanned TU's visible modules (#147969)
Clients of the dependency scanning service may need to add dependencies
based on the visibility of importing modules, for example, when
determining whether a Swift overlay dependency should be brought in
based on whether there's a corresponding **visible** clang module for
it.
This patch introduces a new field `VisibleModules` that contains all the
visible top-level modules in a given TU.
Because visibility is determined by which headers or (sub)modules were
imported, and not top-level module dependencies, the scanner now
performs a separate DFS starting from what was directly imported for
this computation.

In my local performance testing, there was no observable performance
impact.

resolves: rdar://151416358

---------

Co-authored-by: Jan Svoboda <jan@svoboda.ai>
2025-07-11 09:33:55 -07:00
Naveen Seth Hanig
52fbefb281
[clang][clang-scan-deps] Add named modules to format 'experimental-full' (#145221) 2025-06-24 16:21:19 -07:00
Qinkun Bao
43d042b350
Revert "[clang][scan-deps] Add option to disable caching stat failures" (#145528)
Reverts llvm/llvm-project#144000

First buildbot failure:
https://lab.llvm.org/buildbot/#/builders/164/builds/11064
2025-06-24 11:12:05 -04:00
Michael Spencer
6110dead89
[clang][scan-deps] Add option to disable caching stat failures (#144000)
While the source code isn't supposed to change during a build, in some
environments it does. This adds an option that disables caching of stat
failures, meaning that source files can be added to the build during
scanning.

This adds a `-no-cache-negative-stats` option to clang-scan-deps to
enable this behavior. There are no tests for clang-scan-deps as there's
no reliable way to do so from it. A unit test has been added that
modifies the filesystem between scans to test it.
2025-06-20 13:28:05 -07:00
Qiongsi Wu
8f4fd86403
Revert "[clang][Dependency Scanning] Report What a Module Exports during Scanning (#137421)" (#140820)
This reverts commit ea1bfbf3f6399b7d2d840722f0e87542d00f6a35.

The commit did not solve the fundamental issue we need to handle and is
no longer necessary.

rdar://144794793
2025-05-29 11:11:31 -07:00
Jan Svoboda
13e1a2cb22 Reapply "[clang] Remove intrusive reference count from DiagnosticOptions (#139584)"
This reverts commit e2a885537f11f8d9ced1c80c2c90069ab5adeb1d. Build failures were fixed right away and reverting the original commit without the fixes breaks the build again.
2025-05-22 12:52:03 -07:00
Kazu Hirata
e2a885537f Revert "[clang] Remove intrusive reference count from DiagnosticOptions (#139584)"
This reverts commit 9e306ad4600c4d3392c194a8be88919ee758425c.

Multiple builtbot failures have been reported:
https://github.com/llvm/llvm-project/pull/139584
2025-05-22 12:44:20 -07:00
Jan Svoboda
9e306ad460
[clang] Remove intrusive reference count from DiagnosticOptions (#139584)
The `DiagnosticOptions` class is currently intrusively
reference-counted, which makes reasoning about its lifetime very
difficult in some cases. For example, `CompilerInvocation` owns the
`DiagnosticOptions` instance (wrapped in `llvm::IntrusiveRefCntPtr`) and
only exposes an accessor returning `DiagnosticOptions &`. One would
think this gives `CompilerInvocation` exclusive ownership of the object,
but that's not the case:

```c++
void shareOwnership(CompilerInvocation &CI) {
  llvm::IntrusiveRefCntPtr<DiagnosticOptions> CoOwner = &CI.getDiagnosticOptions();
  // ...
}
```

This is a perfectly valid pattern that is being actually used in the
codebase.

I would like to ensure the ownership of `DiagnosticOptions` by
`CompilerInvocation` is guaranteed to be exclusive. This can be
leveraged for a copy-on-write optimization later on. This PR changes
usages of `DiagnosticOptions` across `clang`, `clang-tools-extra` and
`lldb` to not be intrusively reference-counted.
2025-05-22 12:33:52 -07:00
Jan Svoboda
989a40cba8
[clang][modules] Invalidate module cache when SDKSettings.json changes (#139751)
This PR adds the `%sdk/SDKSettings.json` file to the PCM input file
table, so that the PCM gets invalidated when the file changes. This is
necessary for availability checks to work correctly.
2025-05-13 14:00:31 -07:00
Qiongsi Wu
ea1bfbf3f6
[clang][Dependency Scanning] Report What a Module Exports during Scanning (#137421)
We would like to report, for a module, which direct dependencies it
exports during dependency scanning. This PR implements this reporting by
augmenting `ModuleDep`'s `ClangModuleDeps` variable. `ClangModuleDeps`
now contains instances of `DepInfo`, which is made of a `ModuleID` and a
boolean flag that indicates if a particular dependence is exported.

rdar://144794793
2025-04-30 14:08:02 -07:00
Jan Svoboda
506630d6db
[clang][deps] Avoid unchecked error assertion (#134284) 2025-04-03 14:57:06 -07:00
Cyndy Ishida
ad8f0e2760
[clang][DepScan] Pass references to ModuleDeps instead of ModuleID in lookupModuleOutput callbacks, NFCI (#131688)
This allows clients to reference more read-only attributes, like IsInStableDirectories.
2025-03-17 16:30:09 -07:00
Cyndy Ishida
584f8cc305
[clang][DependencyScanning] Track modules that resolve from "stable" locations (#130634)
That patch tracks whether all the file & module dependencies of a module
resolve to a stable location. This information will later be queried by
build systems for determining where to store the accompanying pcms.
2025-03-17 15:33:23 -07:00
Abhina Sree
3b6cc94e74
[SystemZ][z/OS] Mark text files as text in ClangScanDeps (#127514)
This patch continues the work that was started here
https://reviews.llvm.org/D99426 to correctly open text files in text
mode.
2025-02-18 09:09:10 -05:00
Kazu Hirata
d096f45322
[clang-scan-deps] Avoid repeated map lookups (NFC) (#127023) 2025-02-13 09:10:38 -08:00
Qiongsi Wu
54acda2e0e
[clang module] Current Working Directory Pruning (#124786)
When computing the context hash, `clang` always includes the compiler's
working directory. This can lead to situations when the only difference
between two compilations is the working directory, different module
variants are generated. These variants are redundant. This PR implements
an optimization that ignores the working directory when computing the
context hash when safe.

Specifically, `clang` checks if it is safe to ignore the working
directory in `isSafeToIgnoreCWD`. The check involves going through
compile command options to see if any paths specified are relative. The
definition of relative path used here is that the input path is not
empty, and `llvm::sys::path::is_absolute` is false. If all the paths
examined are not relative, `clang` considers it safe to ignore the
current working directory and does not consider the working directory
when computing the context hash.
2025-02-04 20:04:39 -08:00
Steven Wu
7a52b93837
[DependencyScanning] Add ability to scan TU with a buffer input (#125111)
Update Dependency scanner so it can scan the dependency of a TU with
a provided buffer rather than relying on the on disk file system to
provide the input file.
2025-02-04 16:37:29 -08:00
Kai Nacke
fdd7cafb90
[z/OS][SystemZ] Clang dependency files are text files (#121849)
The dependency file and the P1689 file are text files, but the
open call misses the OF_Text flag. This PR adds the flag.
Fixes regressions in test cases ClangScanDeps/modules-extern-unrelated.m
and ClangScanDeps/P1689.cppm.
2025-01-08 09:40:56 -05:00
Chandler Carruth
dd647e3e60
Rework the Option library to reduce dynamic relocations (#119198)
Apologies for the large change, I looked for ways to break this up and
all of the ones I saw added real complexity. This change focuses on the
option's prefixed names and the array of prefixes. These are present in
every option and the dominant source of dynamic relocations for PIE or
PIC users of LLVM and Clang tooling. In some cases, 100s or 1000s of
them for the Clang driver which has a huge number of options.

This PR addresses this by building a string table and a prefixes table
that can be referenced with indices rather than pointers that require
dynamic relocations. This removes almost 7k dynmaic relocations from the
`clang` binary, roughly 8% of the remaining dynmaic relocations outside
of vtables. For busy-boxing use cases where many different option tables
are linked into the same binary, the savings add up a bit more.

The string table is a straightforward mechanism, but the prefixes
required some subtlety. They are encoded in a Pascal-string fashion with
a size followed by a sequence of offsets. This works relatively well for
the small realistic prefixes arrays in use.

Lots of code has to change in order to land this though: both all the
option library code has to be updated to use the string table and
prefixes table, and all the users of the options library have to be
updated to correctly instantiate the objects.

Some follow-up patches in the works to provide an abstraction for this
style of code, and to start using the same technique for some of the
other strings here now that the infrastructure is in place.
2024-12-11 15:44:44 -08:00
Kadir Cetinkaya
df9a14d7bb
Reapply "[NFC] Explicitly pass a VFS when creating DiagnosticsEngine (#115852)"
This reverts commit a1153cd6fedd4c906a9840987934ca4712e34cb2 with fixes
to lldb breakages.

Fixes https://github.com/llvm/llvm-project/issues/117145.
2024-11-21 14:55:30 +01:00
Sylvestre Ledru
a1153cd6fe Revert "[NFC] Explicitly pass a VFS when creating DiagnosticsEngine (#115852)"
Reverted for causing:
https://github.com/llvm/llvm-project/issues/117145

This reverts commit bdd10d9d249bd1c2a45e3de56a5accd97e953458.
2024-11-21 13:04:30 +01:00
kadir çetinkaya
bdd10d9d24
[NFC] Explicitly pass a VFS when creating DiagnosticsEngine (#115852)
Starting with 41e3919ded78d8870f7c95e9181c7f7e29aa3cc4 DiagnosticsEngine
creation might perform IO. It was implicitly defaulting to
getRealFileSystem. This patch makes it explicit by pushing the decision
making to callers.

It uses ambient VFS if one is available, and keeps using
`getRealFileSystem` if there aren't any VFS.
2024-11-21 12:11:41 +01:00
Jan Svoboda
9d4837f47c
[clang][deps][modules] Allocate input file paths lazily (#114457)
This PR builds on top of #113984 and attempts to avoid allocating input
file paths eagerly. Instead, the `InputFileInfo` type used by
`ASTReader` now only holds `StringRef`s that point into the PCM file
buffer, and the full input file paths get resolved on demand.

The dependency scanner makes use of this in a bit of a roundabout way:
`ModuleDeps` now only holds (an owning copy of) the short unresolved
input file paths, which get resolved lazily. This can be a big win, I'm
seeing up to a 5% speedup.
2024-11-11 09:46:50 -08:00
Jan Svoboda
39303e24b6
[clang][deps] Improve timing output (#113726)
This patch adds the number of executed instructions into the timing
output, which provides more stable results compared to wall or process
time.

The format itself is also tweaked so that it's more amenable for direct
import into a spreadsheet editor.
2024-10-28 14:00:31 -07:00
Jan Svoboda
c55d68fcc6
[clang][deps] Serialize JSON without creating intermediate objects (#111734)
The dependency scanner uses the `llvm::json` library for outputting the
dependency information. Until now, it created an in-memory
representation of the dependency graph using the `llvm::json::Object`
hierarchy. This not only creates unnecessary copies of the data, but
also forces lexicographical ordering of attributes in the output, both
of which I'd like to avoid. This patch adopts the `llvm::json::OStream`
API instead and reorders the attribute printing logic such that the
existing lexicographical ordering is preserved (for now).
2024-10-09 14:49:09 -07:00
Martin Storsjö
cead9044a9
[clang-scan-deps] Don't inspect Args[0] as an option (#109050)
Since a26ec542371652e1d774696e90016fd5b0b1c191, we expand the executable
name to an absolute path, if it isn't already one, if found in path.

This broke a couple tests in some environments; when the clang workdir
resides in a path under e.g. /opt. Tests that only use a tool name like
"clang-cl" would get expanded to the absolute path in the build tree.
The loop for finding the last "-o" like option for clang-cl command
lines would inspect all arguments, including Args[0] which is the
executable name itself. As an /opt path matches Arg.starts_with("/o"),
this would get detected as an object file output name in cases where
there was no other explicit output argument.

Thus, this fixes those tests in workdirs under e.g. /opt.
2024-09-19 20:46:09 +03:00
Martin Storsjö
a26ec54237
[clang-scan-deps] Infer the tool locations from PATH (#108539)
This allows the clang driver to know which tool is meant to be executed,
which allows the clang driver to load the right clang config files, and
allows clang to find colocated sysroots.

This makes sure that doing `clang-scan-deps -- <tool> ...` looks up
things in the same way as if one just would execute `<tool> ...`, when
`<tool>` isn't an absolute or relative path.
2024-09-13 23:18:10 +03:00
Martin Storsjö
87e1104cf0
[clang-scan-deps] Infer the target from the executable name (#108189)
This allows clang-scan-deps to work correctly when using cross compilers
with names like <triple>-clang.
2024-09-12 22:20:14 +03:00
Jan Svoboda
6e4dcbb21d
[clang][deps] Print tracing VFS data (#108056)
Clang's `-cc1 -print-stats` shows lots of useful internal data including
basic `FileManager` stats. Since this layer caches some results, it is
unclear how that information translates to actual filesystem accesses.
This PR uses `llvm::vfs::TracingFileSystem` to provide that missing
information.

Similar mechanism is implemented for `clang-scan-deps`'s verbose mode
(`-v`). IO contention proved to be a real bottleneck a couple of times
already and this new feature should make those easier to detect in the
future. The tracing VFS is inserted below the caching FS and above the
real FS.
2024-09-11 16:04:56 -07:00
Artem Chikin
68eb3b202f
[clang][deps] Collect discovered module dependencies' Link Libraries (#93588)
This will allow scanner clients to be able to compute e.g. auto-linking
dependencies of the scanned translation unit.
2024-06-04 09:16:49 -07:00
Alexandre Ganea
90e33e20a5
[clang-scan-deps] Expand response files before the argument adjuster (#89950)
Previously, since response (.rsp) files weren't expanded at the very
beginning of clang-scan-deps, we only parsed the command-line as
provided in the Clang .cdb file. Unfortunately, when using Unreal
Engine, arguments are always generated in a .rsp file (ie.
`/path/to/clang-cl.exe @/path/to/filename_args.rsp`).

After this patch, `/Fo` can be parsed and added to the final
command-line. Without this option, the make targets that are emitted are
made up from the input file name alone. We have some cases where the
same input in the project generates several output files, so we end up
with duplicate make targets in the scan-deps emitted dependency file.
2024-05-24 17:20:08 -04:00
Jan Svoboda
6a4eaf9b33
[clang][deps] Add -o flag to specify output path (#88767)
This makes it possible to pass "-o /dev/null" to `clang-scan-deps` and
skip some potentially expensive work, making timings less noisy. Also
removes the need for stream redirection.
2024-04-16 19:49:07 -07:00
Jan Svoboda
eafd515eca
[clang][deps] Support single-file mode for all formats (#88764)
The `clang-scan-deps` tool can be used for fast scanning of batches of
compilation commands passed in via the `-compilation-database` option.
This gets awkward in our tests where we have to resort to using
`.in`/`.template` JSON files and running them through `sed` in order to
embed LIT's `%t` variable into them. However, most of our tests only
need to pass single compilation command, so this dance is entirely
unnecessary.

This patch makes sure the existing "per-file" mode (where the
compilation command is passed in-line after the `--` argument) works for
all output formats, not only `P1689`.
2024-04-16 19:47:52 -07:00
Chuanqi Xu
9d6c43b4ae
[NFC] [C++20] [Modules] [P1689] [Scanner] Don't use thread pool in P1689 per file mode (#84285)
I suddenly found that the clang scan deps may use all concurrent threads
to scan the files. It makes sense in the batch mode. But in P1689 per
file mode, it simply wastes times and resources.

This patch itself should be a NFC patch. It simply moves codes.
2024-03-13 09:26:53 +08:00
Mehdi Amini
716042a63f
Rename llvm::ThreadPool -> llvm::DefaultThreadPool (NFC) (#83702)
The base class llvm::ThreadPoolInterface will be renamed
llvm::ThreadPool in a subsequent commit.

This is a breaking change: clients who use to create a ThreadPool must
now create a DefaultThreadPool instead.
2024-03-05 18:00:46 -08:00
Michael Spencer
d42de86eb3
reland: [clang][ScanDeps] Canonicalize -D and -U flags (#82568)
Canonicalize `-D` and `-U` flags by sorting them and only keeping the
last instance of a given name.

This optimization will only fire if all `-D` and `-U` flags start with a
simple identifier that we can guarantee a simple analysis of can
determine if two flags refer to the same identifier or not. See the
comment on `getSimpleMacroName()` for details of what the issues are.

Previous version of this had issues with sed differences between macOS,
Linux, and Windows. This test doesn't check paths, so just don't run
sed.
Other tests should use `sed -E 's:\\\\?:/:g'` to get portable behavior.

Windows has different command line parsing behavior than Linux for
compilation databases, so the test has been adjusted to ignore that
difference.
2024-02-23 17:44:32 -08:00
Nico Weber
84ed55e11f Revert "[clang][ScanDeps] Canonicalize -D and -U flags (#82298)"
This reverts commit 3ff805540173b83d73b673b39ac5760fc19bac15.

Test is failing on bots, see
https://github.com/llvm/llvm-project/pull/82298#issuecomment-1955664462
2024-02-20 20:24:32 -05:00
Michael Spencer
3ff8055401
[clang][ScanDeps] Canonicalize -D and -U flags (#82298)
Canonicalize `-D` and `-U` flags by sorting them and only keeping the
last instance of a given name.

This optimization will only fire if all `-D` and `-U` flags start with a
simple identifier that we can guarantee a simple analysis of can
determine if two flags refer to the same identifier or not. See the
comment on `getSimpleMacroName()` for details of what the issues are.
2024-02-20 15:20:40 -08:00
Mehdi Amini
744616b3ae
Rename ThreadPool::getThreadCount() to getMaxConcurrency() (NFC) (#82296)
This is addressing a long-time TODO to rename this misleading API. The
old one is preserved for now but marked deprecated.
2024-02-19 18:07:12 -08:00
Yaraslau
d783933bc9
[clang-scan-deps] Fix check for empty Compilation (#75545)
Closes https://github.com/llvm/llvm-project/issues/64144

Instead of checking for `nullptr` we need to ensure that `JobList` is
not empty to proceed
2024-01-31 09:55:05 +08:00
Michael Spencer
7847e44594
[clang][DependencyScanner] Remove unused -ivfsoverlay files (#73734)
`-ivfsoverlay` files are unused when building most modules. Enable
removing them by,
* adding a way to visit the filesystem tree with extensible RTTI to
  access each `RedirectingFileSystem`.
* Adding tracking to `RedirectingFileSystem` to record when it
  actually redirects a file access.
* Storing this information in each PCM.

Usage tracking is only enabled when iterating over the source manager
and affecting modulemaps. Here each path is stated to cause an access.
During scanning these stats all hit the cache.
2024-01-30 15:39:18 -08:00
Alexandre Ganea
3c6f47d6b8
[llvm-driver] Fix usage of InitLLVM on Windows (#76306)
Previously, some tools such as `clang` or `lld` which require strict
order for certain command-line options, such as `clang -cc1` or `lld
-flavor`, would not longer work on Windows, when these tools were linked
as part of `llvm-driver`. This was caused by `InitLLVM` which was part
of the `*_main()` function of these tools, which in turn calls
`windows::GetCommandLineArguments`. That function completly replaces
argc/argv by new UTF-8 contents, so any ajustements to argc/argv made by
`llvm-driver` prior to calling these tools was reset.

`InitLLVM` is now called by the `llvm-driver`. Any tool that
participates in (or is part of) the `llvm-driver` doesn't call
`InitLLVM` anymore.
2024-01-11 19:08:28 -05:00
Kazu Hirata
f3dcc2351c
[clang] Use StringRef::{starts,ends}_with (NFC) (#75149)
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.

I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
2023-12-13 08:54:13 -08:00
Michael Spencer
731152e18f
[clang][DependencyScanner] Remove all warning flags when suppressing warnings (#71612)
Since system modules don't emit most warnings, remove the warning flags
to increase module reuse.
2023-11-13 16:45:38 -08:00
Michael Spencer
fb07d9cc09
[clang][DepScan] Make OptimizeArgs a bit mask enum and enable by default (#71588)
Make it easier to control which optimizations are enabled by making
OptimizeArgs a bit masked enum. There's currently only one such
optimization, but more will be added in followup commits.
2023-11-07 16:06:59 -08:00
Jan Svoboda
3b1a68655e
[clang][deps] Generate command lines lazily (#65691)
This patch makes the generation of command lines for modular
dependencies lazy/on-demand. That operation is somewhat expensive and
prior to this patch used to be performed multiple times for the
identical `ModuleDeps` (i.e. when they were imported from multiple
different TUs).
2023-09-07 17:45:53 -07:00
Takuya Shimizu
01b88dd66d [NFC] Remove unused variables declared in conditions
D152495 makes clang warn on unused variables that are declared in conditions like `if (int var = init) {}`
This patch is an NFC fix to suppress the new warning in llvm,clang,lld builds to pass CI in the above patch.

Differential Revision: https://reviews.llvm.org/D158016
2023-08-30 10:05:06 +09:00
Jan Svoboda
3f092f37b7 [llvm] Extract common OptTable bits into macros
All command-line tools using `llvm::opt` create an enum of option IDs and a table of `OptTable::Info` object. Most of the tools use the same ID (`OPT_##ID`), kind (`Option::KIND##Class`), group ID (`OPT_##GROUP`) and alias ID (`OPT_##ALIAS`). This patch extracts that common code into canonical macros. This results in fewer changes when tweaking the `OPTION` macros emitted by the TableGen backend.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D157028
2023-08-04 13:57:13 -07:00
Jan Svoboda
8a077cfe23 [clang][deps] Make the C++ API more type-safe
Scanner's C++ API accepts a set of modular dependencies the client has already seen and for which it doesn't need the full details. This is currently a set of strings, which somewhat implies that it should contain the set of module names. However, scanner internally expects the values to be in the format "{hash}{name}". Besides not being documented, this is very unintuitive. This patch makes this expectation explicit by changing the type to set of `ModuleID`.

Reviewed By: benlangmuir

Differential Revision: https://reviews.llvm.org/D156492
2023-07-28 12:03:54 -07:00
Jan Svoboda
6b80f00bac [clang][deps] NFC: Refactor and comment ModuleDeps sorting
I once again stumbled across `IndexedModuleID` and was confused by the mutable `InputIndex` field.

This patch adds comments around that piece of code and unifies/simplifies related functions.

Reviewed By: Bigcheese

Differential Revision: https://reviews.llvm.org/D145197
2023-04-21 10:56:42 -07:00