This patch tries to remove all the direct use of DeclID except the real
low level reading and writing. All the use of DeclID is converted to
the use of LocalDeclID or GlobalDeclID. This is helpful to increase the
readability and type safety.
Previously, the DeclID is defined in serialization/ASTBitCodes.h under
clang::serialization namespace. However, actually the DeclID is not
purely used in serialization part. The DeclID is already widely used in
AST and all around the clang project via classes like `LazyPtrDecl` or
calling `ExternalASTSource::getExernalDecl()`. All such uses are via the
raw underlying type of `DeclID` as `uint32_t`. This is not pretty good.
This patch moves the DeclID class family to a new header `AST/DeclID.h`
so that the whole project can use the wrapped class `DeclID`,
`GlobalDeclID` and `LocalDeclID` instead of the raw underlying type.
This can improve the readability and the type safety.
seperately
We can compile a module unit in 2 phase compilaton:
```
clang++ -std=c++20 a.cppm --precompile -o a.pcm
clang++ -std=c++20 a.pcm -c -o a.o
```
And it is a general requirement that we need to compile a translation
unit with and without -fPIC for static and shared libraries.
But for C++20 modules with 2 phase compilation, it may be waste of time
to compile them 2 times completely. It may be fine to generate one BMI
and compile it with and without -fPIC seperately.
e.g.,
```
clang++ -std=c++20 a.cppm --precompile -o a.pcm
clang++ -std=c++20 a.pcm -c -o a.o
clang++ -std=c++20 a.pcm -c -fPIC -o a-PIC.o
```
Then we can save the time to parse a.cppm repeatedly.
Close https://github.com/llvm/llvm-project/issues/71347
Previously I misread the concept of module purview. I thought if a
declaration attached to a unnamed module, it can't be part of the module
purview. But after the issue report, I recognized that module purview is
more of a concept about locations instead of semantics.
Concretely, the things in the language linkage after module declarations
can be exported.
This patch refactors `Module::isModulePurview()` and introduces some
possible code cleanups.
This reverts commit a6acf3fd49a20c570a390af2a3c84e10b9545b68 and
relands a50e63b38b931d945f97eac882278068221eca17. The original revert
was done by mistake.
The issue #53952 is reported indicating clang is giving a crashing pch
file, when hasErrors is been passed incorrectly to WriteAST method.
To fix the issue, the parameter has been removed and instead we're
relying on the results of `hasUncompilableErrorOccured()` instead of
letting the caller override it.
Fixes https://github.com/llvm/llvm-project/issues/53952
This reapplies ddbcc10b9e26b18f6a70e23d0611b9da75ffa52f, except for a tiny part that was reverted separately: 65331da0032ab4253a4bc0ddcb2da67664bd86a9. That will be reapplied later on, since it turned out to be more involved.
This commit is enabled by 5523fefb01c282c4cbcaf6314a9aaf658c6c145f and f0f548a65a215c450d956dbcedb03656449705b9, specifically the part that makes 'clang-tidy/checkers/misc/header-include-cycle.cpp' separator agnostic.
C++20 modules
Previously, we banned the check for input files from C++20 modules since
we thought the BMI from C++20 modules should be a standalone artifact.
However, during the recent experiment with clangd for modules, I find
it is necessary to tell whether or not a BMI is out-of-date by checking the
input files especially for language servers.
So this patch brings a header search option
ForceCheckCXX20ModulesInputFiles to allow the tools (concretly, clangd)
to check the input files from BMI.
This commit replaces some calls to the deprecated `FileEntry::getName()` with `FileEntryRef::getName()` by swapping current usages of `SourceManager::getFileEntryForID()` with `SourceManager::getFileEntryRefForID()`. This lowers the number of usages of the deprecated `FileEntry::getName()` from 95 to 50.
Original commit message:
"
This patch enabled code completion for ClangREPL. The feature was built upon
three existing Clang components: a list completer for LineEditor, a
CompletionConsumer from SemaCodeCompletion, and the ASTUnit::codeComplete method.
The first component serves as the main entry point of handling interactive inputs.
Because a completion point for a compiler instance has to be unchanged once it
is set, an incremental compiler instance is created for each code
completion. Such a compiler instance carries over AST context source from the
main interpreter compiler in order to obtain declarations or bindings from
previous input in the same REPL session.
The most important API codeComplete in Interpreter/CodeCompletion is a thin
wrapper that calls with ASTUnit::codeComplete with necessary arguments, such as
a code completion point and a ReplCompletionConsumer, which communicates
completion results from SemaCodeCompletion back to the list completer for the
REPL.
In addition, PCC_TopLevelOrExpression and CCC_TopLevelOrExpression` top levels
were added so that SemaCodeCompletion can treat top level statements like
expression statements at the REPL. For example,
clang-repl> int foo = 42;
clang-repl> f<tab>
From a parser's persective, the cursor is at a top level. If we used code
completion without any changes, PCC_Namespace would be supplied to
Sema::CodeCompleteOrdinaryName, and thus the completion results would not
include foo.
Currently, the way we use PCC_TopLevelOrExpression and
CCC_TopLevelOrExpression is no different from the way we use PCC_Statement
and CCC_Statement respectively.
Differential revision: https://reviews.llvm.org/D154382
"
The new patch also fixes clangd and several memory issues that the bots reported
and upload the missing files.
Original commit message:
"
This patch enabled code completion for ClangREPL. The feature was built upon
three existing Clang components: a list completer for LineEditor, a
CompletionConsumer from SemaCodeCompletion, and the ASTUnit::codeComplete method.
The first component serves as the main entry point of handling interactive inputs.
Because a completion point for a compiler instance has to be unchanged once it
is set, an incremental compiler instance is created for each code
completion. Such a compiler instance carries over AST context source from the
main interpreter compiler in order to obtain declarations or bindings from
previous input in the same REPL session.
The most important API codeComplete in Interpreter/CodeCompletion is a thin
wrapper that calls with ASTUnit::codeComplete with necessary arguments, such as
a code completion point and a ReplCompletionConsumer, which communicates
completion results from SemaCodeCompletion back to the list completer for the
REPL.
In addition, PCC_TopLevelOrExpression and CCC_TopLevelOrExpression` top levels
were added so that SemaCodeCompletion can treat top level statements like
expression statements at the REPL. For example,
clang-repl> int foo = 42;
clang-repl> f<tab>
From a parser's persective, the cursor is at a top level. If we used code
completion without any changes, PCC_Namespace would be supplied to
Sema::CodeCompleteOrdinaryName, and thus the completion results would not
include foo.
Currently, the way we use PCC_TopLevelOrExpression and
CCC_TopLevelOrExpression is no different from the way we use PCC_Statement
and CCC_Statement respectively.
Differential revision: https://reviews.llvm.org/D154382
"
The new patch also fixes clangd and several memory issues that the bots reported.
This patch enabled code completion for ClangREPL. The feature was built upon
three existing Clang components: a list completer for LineEditor, a
CompletionConsumer from SemaCodeCompletion, and the ASTUnit::codeComplete method.
The first component serves as the main entry point of handling interactive inputs.
Because a completion point for a compiler instance has to be unchanged once it
is set, an incremental compiler instance is created for each code
completion. Such a compiler instance carries over AST context source from the
main interpreter compiler in order to obtain declarations or bindings from
previous input in the same REPL session.
The most important API codeComplete in Interpreter/CodeCompletion is a thin
wrapper that calls with ASTUnit::codeComplete with necessary arguments, such as
a code completion point and a ReplCompletionConsumer, which communicates
completion results from SemaCodeCompletion back to the list completer for the
REPL.
In addition, PCC_TopLevelOrExpression and CCC_TopLevelOrExpression` top levels
were added so that SemaCodeCompletion can treat top level statements like
expression statements at the REPL. For example,
clang-repl> int foo = 42;
clang-repl> f<tab>
From a parser's persective, the cursor is at a top level. If we used code
completion without any changes, PCC_Namespace would be supplied to
Sema::CodeCompleteOrdinaryName, and thus the completion results would not
include foo.
Currently, the way we use PCC_TopLevelOrExpression and
CCC_TopLevelOrExpression is no different from the way we use PCC_Statement
and CCC_Statement respectively.
Differential revision: https://reviews.llvm.org/D154382
With implicit modules, it's impossible to load a PCM file that was built using different command-line macro definitions. This is guaranteed by the fact that they contribute to the context hash. This means that we don't need to store those macros into PCM files for validation purposes. This patch avoids serializing them in those circumstances, since there's no other use for command-line macro definitions (besides "-module-file-info").
For a typical Apple project, this speeds up the dependency scan by 5.6% and shrinks the cache with scanning PCMs by 26%.
Reviewed By: benlangmuir
Differential Revision: https://reviews.llvm.org/D158136
Change `ASTUnit::LoadFromCommandLine` to return a `std::unique_ptr` instead of a +1 pointer, fixing a leak in the unit test `LoadFromCommandLineWorkingDirectory`.
Reviewed By: bnbarham, benlangmuir
Differential Revision: https://reviews.llvm.org/D154257
Fix a couple of issues with the handling of the current working directory in ASTUnit:
- Use `createPhysicalFileSystem` instead of `getRealFileSystem` to avoid affecting the process' current working directory, and set it at the top of `ASTUnit::LoadFromCommandLine` such that the driver used for argument parsing and the ASTUnit share the same VFS. This ensures that '-working-directory' correctly sets the VFS working directory in addition to the FileManager working directory.
- Ensure we preserve the FileSystemOptions set on the FileManager when re-creating it (as `ASTUnit::Reparse` will clear the currently set FileManager).
rdar://110697657
Reviewed By: bnbarham, benlangmuir
Differential Revision: https://reviews.llvm.org/D154134
- Use this new context in Sema to limit completions to seen ObjC class
names
- Use this new context in clangd to disable include insertions when
completing ObjC forward decls
Reviewed By: kadircet
Differential Revision: https://reviews.llvm.org/D150978
This patch swaps out the `void *` behind `CXFile` from `FileEntry *` to `FileEntryRef::MapEntry *`. This allows us to remove some deprecated uses of `FileEntry::getName()`.
Depends on D151854.
Reviewed By: benlangmuir
Differential Revision: https://reviews.llvm.org/D151938
I assume it's optional and ReadAST does not have to set the
counter on success.
Without the patch msan complains that we pass
uninitialized value into noundef parameters of setCounterValue.
Reviewed By: bnbarham, barannikov88
Differential Revision: https://reviews.llvm.org/D150492
The original name "ASTContext::getNamedModuleForCodeGen" is not properly
reflecting the usage of the interface. This interface can be used to
judge the current module unit in both sema analysis and code generation.
So the original name was not so correct.
Decl::isInCurrentModuleUnit
Refactor `Sema::isModuleUnitOfCurrentTU` to `Decl::isInCurrentModuleUnit`
to make code simpler a little bit. Note that although this patch
introduces a FIXME, this is an existing issue and this patch just tries
to describe it explicitly.
We have no use for debug info for the scanner modules, and writing raw
ast files speeds up scanning ~15% in some cases. Note that the compile
commands produced by the scanner will still build the obj format (if
requested), and the scanner can *read* obj format pcms, e.g. from a PCH.
rdar://108807592
Differential Revision: https://reviews.llvm.org/D149693
This commit allows libclang API users to opt into storing PCH in memory
instead of temporary files. The option can be set only during CXIndex
construction to avoid multithreading issues and confusion or bugs if
some preambles are stored in temporary files and others - in memory.
The added API works as expected in KDevelop:
https://invent.kde.org/kdevelop/kdevelop/-/merge_requests/283
Differential Revision: https://reviews.llvm.org/D145974
TempPCHFile::create() calls llvm::sys::fs::createTemporaryFile() to
create a file named preamble-*.pch in a system temporary directory. This
commit allows overriding the directory where these often many and large
preamble-*.pch files are stored.
The referenced bug report requests the ability to override the temporary
directory path used by libclang. However, overriding the return value of
llvm::sys::path::system_temp_directory() was rejected during code review
as improper and because it would negatively affect multithreading
performance. Finding all places where libclang uses the temporary
directory is very difficult. Therefore this commit is limited to
override libclang's single known use of the temporary directory.
This commit allows to override the preamble storage path only during
CXIndex construction to avoid multithreading issues and ensure that all
preambles are stored in the same directory. For the same multithreading
and consistency reasons, this commit deprecates
clang_CXIndex_setGlobalOptions() and
clang_CXIndex_setInvocationEmissionPathOption() in favor of specifying
these options during CXIndex construction.
Adding a new CXIndex constructor function each time a new initialization
argument is needed leads to either a large number of function parameters
unneeded by most libclang users or to an exponential number of overloads
that support different usage requirements. Therefore this commit
introduces a new extensible struct CXIndexOptions and a general function
clang_createIndexWithOptions().
A libclang user passes a desired preamble storage path to
clang_createIndexWithOptions(), which stores it in
CIndexer::PreambleStoragePath. Whenever
clang_parseTranslationUnit_Impl() is called, it passes
CIndexer::PreambleStoragePath to ASTUnit::LoadFromCommandLine(), which
stores this argument in ASTUnit::PreambleStoragePath. Whenever
ASTUnit::getMainBufferWithPrecompiledPreamble() is called, it passes
ASTUnit::PreambleStoragePath to PrecompiledPreamble::Build().
PrecompiledPreamble::Build() forwards the corresponding StoragePath
argument to TempPCHFile::create(). If StoragePath is not empty,
TempPCHFile::create() stores the preamble-*.pch file in the directory at
the specified path rather than in the system temporary directory.
The analysis below proves that this passing around of the
PreambleStoragePath string is sufficient to guarantee that the libclang
user override is used in TempPCHFile::create(). The analysis ignores API
uses in test code.
TempPCHFile::create() is called only in PrecompiledPreamble::Build().
PrecompiledPreamble::Build() is called only in two places: one in
clangd, which is not used by libclang, and one in
ASTUnit::getMainBufferWithPrecompiledPreamble().
ASTUnit::getMainBufferWithPrecompiledPreamble() is called in 3 places:
ASTUnit::LoadFromCompilerInvocation() [analyzed below].
ASTUnit::Reparse(), which in turn is called only from
clang_reparseTranslationUnit_Impl(), which in turn is called only from
clang_reparseTranslationUnit(). clang_reparseTranslationUnit() is never
called in LLVM code, but is part of public libclang API. This function's
documentation requires its translation unit argument to have been built
with clang_createTranslationUnitFromSourceFile().
clang_createTranslationUnitFromSourceFile() delegates its work to
clang_parseTranslationUnit(), which delegates to
clang_parseTranslationUnit2(), which delegates to
clang_parseTranslationUnit2FullArgv(), which delegates to
clang_parseTranslationUnit_Impl(), which passes
CIndexer::PreambleStoragePath to the ASTUnit it creates.
ASTUnit::CodeComplete() passes AllowRebuild = false to
ASTUnit::getMainBufferWithPrecompiledPreamble(), which makes it return
nullptr before calling PrecompiledPreamble::Build().
Both ASTUnit::LoadFromCompilerInvocation() overloads (one of which
delegates its work to another) call
ASTUnit::getMainBufferWithPrecompiledPreamble() only if their argument
PrecompilePreambleAfterNParses > 0. LoadFromCompilerInvocation() is
called in:
ASTBuilderAction::runInvocation() keeps the default parameter value
of PrecompilePreambleAfterNParses = 0, meaning that the preamble file is
never created from here.
ASTUnit::LoadFromCommandLine().
ASTUnit::LoadFromCommandLine() is called in two places:
CrossTranslationUnitContext::ASTLoader::loadFromSource() keeps the
default parameter value of PrecompilePreambleAfterNParses = 0, meaning
that the preamble file is never created from here.
clang_parseTranslationUnit_Impl(), which passes
CIndexer::PreambleStoragePath to the ASTUnit it creates.
Therefore, the overridden preamble storage path is always used in
TempPCHFile::create().
TempPCHFile::create() uses PreambleStoragePath in the same way as
LibclangInvocationReporter() uses InvocationEmissionPath. The existing
documentation for clang_CXIndex_setInvocationEmissionPathOption() does
not specify ownership, encoding, separator or relative vs absolute path
requirements. So the documentation for
CXIndexOptions::PreambleStoragePath doesn't either. The assumptions are:
no ownership transfer;
UTF-8 encoding;
native separators.
Both relative and absolute paths are supported.
The added API works as expected in KDevelop:
https://invent.kde.org/kdevelop/kdevelop/-/merge_requests/283
Fixes: https://github.com/llvm/llvm-project/issues/51847
Differential Revision: https://reviews.llvm.org/D143418
Previously we'll set the named modules for ASTContext in ParseAST. But
this is not intuitive and we need comments to tell the intuition. This
patch moves the code the right the place, where the corrresponding
module is first created/loaded. Now it is more intuitive and we can use
the value in the earlier places.
This reverts commit c5abe893120b115907376359a5809229a9f9608a.
This reverts commit a033dbbe5c43247b60869b008e67ed86ed230eaa.
This broke the build with -DLLVM_LINK_LLVM_DYLIB=ON. Reverting while I
investigate.
Every Clang instance uses an internal FileSystemStatCache to avoid
stating the same content multiple times. However, different instances
of Clang will contend for filesystem access for their initial stats
during HeaderSearch or module validation.
On some workloads, the time spent in the kernel in these concurrent
stat calls has been measured to be over 20% of the overall compilation
time. This is extremly wassteful when most of the stat calls target
mostly immutable content like a SDK.
This commit introduces a new tool `clang-stat-cache` able to generate
an OnDiskHashmap containing the stat data for a given filesystem
hierarchy.
The driver part of this has been modeled after -ivfsoverlay given
the similarities with what it influences. It introduces a new
-ivfsstatcache driver option to instruct Clang to use a stat cache
generated by `clang-stat-cache`. These stat caches are inserted at
the bottom of the VFS stack (right above the real filesystem).
Differential Revision: https://reviews.llvm.org/D136651
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
With implicitly built modules, the importing `CompilerInstance` assumes PCMs were built in a "compatible way" (i.e. with similarly set up instance). Either because their context hash matches, or because this instance has just built them.
There are some use-cases, however, where this assumption doesn't hold, libclang/c-index-test being one of them. There, the importing instance (or `ASTUnit`) is being set up while the PCM file is being deserialized. Until now, we've assumed the serialized paths to input files are the actual on-disk files, meaning the default physical VFS was always able to resolve them. This won't be the case after D135636. Therefore, this patch makes sure `ASTUnit` is initialized with the same VFS as the PCM it's deserializing - by storing paths to the VFS overlay files into the PCM itself.
For the VFS overlay files to be adopted at the very start of PCM deserialization, they are stored in a new section in the unhashed control block, together with header search paths and system header prefixes. The move to the unhashed control block should be safe: if two modules were built with different header search paths and they produced different results, the hashed part of the PCM file will reflect that.
Reviewed By: akyrtzi, benlangmuir
Differential Revision: https://reviews.llvm.org/D135634
This is generally a better default for tools other than the compiler, which
shouldn't assume a PCH file on disk is something they can consume.
Preserve the old behavior in places associated with libclang/c-index-test
(including ASTUnit) as there are tests relying on it and most important
consumers are out-of-tree. It's unclear whether the tests are specifically
trying to test this functionality, and what the downstream implications of
removing it are. Hopefully someone more familiar can clean this up in future.
Differential Revision: https://reviews.llvm.org/D125149