Handles clang::DiagnosticsEngine and clang::DiagnosticIDs.
For DiagnosticIDs, this mostly migrates from `new DiagnosticIDs` to
convenience method `DiagnosticIDs::create()`.
Part of cleanup https://github.com/llvm/llvm-project/issues/151026
This switches to `makeIntrusiveRefCnt<FileSystem>` where creating a new
object, and to passing/returning by `IntrusiveRefCntPtr<FileSystem>`
instead of `FileSystem*` or `FileSystem&`, when dealing with existing
objects.
Part of cleanup #151026.
Revert llvm/llvm-project#143520 for now since it’s causing issues for
people who are using symlinks and prefer to preserve the original path
(i.e. looks like we’ll have to make this configurable after all; I just
need to figure out how to pass `-no-canonical-prefixes` down through the
driver); I’m planning to refactor this a bit and reland it in a few
days.
This can significantly shorten file paths to standard library headers,
e.g. on my system, `<ranges>` is currently printed as
```console
/usr/lib/gcc/x86_64-redhat-linux/15/../../../../include/c++/15/ranges
```
but with this change, we instead print
```console
/usr/include/c++/15/ranges
```
This is of course just a heuristic; there are paths that would get longer
as a result of this, so we use whichever path ends up being shorter.
@AaronBallman pointed out that this might be problematic for network
file systems since path resolution might take a while, so this is enabled
only for paths that are part of a local filesystem—though not on Windows
since there we noticed that the check itself is slow.
The file names are cached in `SourceManager`.
This patch adds back the needed AutoConvert.h header and removes the
unneeded include guard of MVS to prevent this header from being removed
in the future
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
Introduce a type alias for the commonly used `std::pair<FileID,
unsigned>` to improve code readability, and make it easier for future
updates (64-bit source locations).
When sharing same compiler instance for multiple compilations, we reset
source manager's file id tables in between runs. Diagnostics engine
keeps a cache based on these file ids, that became dangling references
across compilations.
This patch makes sure we reset those whenever sourcemanager is trashing
its FileIDs.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
This reverts commit e2a885537f11f8d9ced1c80c2c90069ab5adeb1d. Build failures were fixed right away and reverting the original commit without the fixes breaks the build again.
The `DiagnosticOptions` class is currently intrusively
reference-counted, which makes reasoning about its lifetime very
difficult in some cases. For example, `CompilerInvocation` owns the
`DiagnosticOptions` instance (wrapped in `llvm::IntrusiveRefCntPtr`) and
only exposes an accessor returning `DiagnosticOptions &`. One would
think this gives `CompilerInvocation` exclusive ownership of the object,
but that's not the case:
```c++
void shareOwnership(CompilerInvocation &CI) {
llvm::IntrusiveRefCntPtr<DiagnosticOptions> CoOwner = &CI.getDiagnosticOptions();
// ...
}
```
This is a perfectly valid pattern that is being actually used in the
codebase.
I would like to ensure the ownership of `DiagnosticOptions` by
`CompilerInvocation` is guaranteed to be exclusive. This can be
leveraged for a copy-on-write optimization later on. This PR changes
usages of `DiagnosticOptions` across `clang`, `clang-tools-extra` and
`lldb` to not be intrusively reference-counted.
We can simplify the code with *Map::try_emplace where we need
default-constructed values while avoding calling constructors when
keys are already present.
This allows formatting large integers in a human friendly way. Example:
"5321584" -> "5.32M".
Use it where such human numbers are generated manually today.
This patch adds an IsText parameter to the following functions
openFileForRead, getBufferForFile, getBufferForFileImpl and determines
whether a file is text by querying the file tag on z/OS. The default is
set to OF_Text instead of OF_None, this change in value does not affect
any other platforms other than z/OS.
Resolves: #70930 (and probably latest comments from clangd/clangd#251)
by fixing racing for the shared DiagStorage value which caused messing with args inside the storage and then formatting the following message with getArgSInt(1) == 2:
def err_module_odr_violation_function : Error<
"%q0 has different definitions in different modules; "
"%select{definition in module '%2'|defined here}1 "
"first difference is "
which causes HandleSelectModifier to go beyond the ArgumentLen so the recursive call to FormatDiagnostic was made with DiagStr > DiagEnd that leads to infinite while (DiagStr != DiagEnd).
The Main Idea:
Reuse the existing DiagStorageAllocator logic to make all DiagnosticBuilders having independent states.
Also, encapsulating the rest of state (e.g. ID and Loc) into DiagnosticBuilder.
The last attempt failed -
https://github.com/llvm/llvm-project/pull/108187#issuecomment-2353122096
so was reverted - #108838
Resolves: #70930 (and probably latest comments from
https://github.com/clangd/clangd/issues/251)
by fixing racing for the shared `DiagStorage` value which caused messing
with args inside the storage and then formatting the following message
with `getArgSInt(1)` == 2:
```
def err_module_odr_violation_function : Error<
"%q0 has different definitions in different modules; "
"%select{definition in module '%2'|defined here}1 "
"first difference is "
```
which causes `HandleSelectModifier` to go beyond the `ArgumentLen` so
the recursive call to `FormatDiagnostic` was made with `DiagStr` >
`DiagEnd` that leads to infinite `while (DiagStr != DiagEnd)`.
**The Main Idea:**
Reuse the existing `DiagStorageAllocator` logic to make all
`DiagnosticBuilder`s having independent states.
Also, encapsulating the rest of state (e.g. ID and Loc) into
`DiagnosticBuilder`.
**TODO (if it will be requested by reviewer):**
- [x] add a test (I have no idea how to turn a whole bunch of my
proprietary code which leads `clangd` to OOM into a small public
example.. probably I must try using
[this](https://github.com/llvm/llvm-project/issues/70930#issuecomment-2209872975)
instead)
- [x] [`Diag.CurDiagID !=
diag::fatal_too_many_errors`](https://github.com/llvm/llvm-project/pull/108187#pullrequestreview-2296395489)
- [ ] ? get rid of `DiagStorageAllocator` at all and make
`DiagnosticBuilder` having they own `DiagnosticStorage` coz it seems
pretty small so should fit the stack for short-living
`DiagnosticBuilder` instances
CompilerInstance can re-use same SourceManager across multiple
frontendactions. During this process it calls
`SourceManager::clearIDTables` to reset any caches based on FileIDs.
It didn't reset IncludeLocMap, resulting in wrong include locations for
workflows that triggered multiple frontend-actions through same
CompilerInstance.
Add libclang function `clang_isBeforeInTranslationUnit` to allow checking the order between two source locations.
Simplify the `SourceRange.__contains__` implementation using this new function.
Add tests for `SourceRange.__contains__` and the newly added functionality.
Fixes#22617Fixes#52827
We have been running into source location exhaustion recently and want
to use the statistics to monitor the usage in various files to be able
to anticipate where the next problem will happen.
I picked `Statistic` because it can be written into a structured JSON
file and is easier to consume by further automation.
This commit does not change any existing per-source-manager metrics
exposed via `SourceManager::PrintStats()`. This does create some
redundancy, but I also expect to be non-controversial because it aligns
with the intended use of `Statistic`.
The commit adds serialization and de-serialization implementations for
the stored regions. Basically, the serialized representation of the
regions of a PP is a (ordered) sequence of source location encodings.
For de-serialization, regions from loaded files are stored by their ASTs.
When later one queries if a loaded location L is in an opt-out
region, PP looks up the regions of the loaded AST where L is at.
(Background if helps: a pair of `#pragma clang unsafe_buffer_usage begin/end` pragmas marks a
warning-opt-out region. The begin and end locations (opt-out regions)
are stored in preprocessor instances (PP) and will be queried by the
`-Wunsafe-buffer-usage` analyzer.)
The reported issue at upstream: https://github.com/llvm/llvm-project/issues/90501
rdar://124035402
This patch adds several const-qualified variants of existing member
functions to `SourceManager`.
I started with removing const qualification from
`setNumCreatedFIDsForFileID`, and removing `const_cast` in the body of
this function, as I think it doesn't make sense to const-qualify
setters.
This fixes a typo ("SLocEntry's" -> "SLocEntries"), fixes capitalization ("Sloc" -> "SLoc") and adds extra information (capacity in bytes of `LoadedSLocEntryTable`).
`createExpansionLocImpl` has an assert that checks if we ran out of
source locations. We have observed this happening on real code and in
release builds the assertion does not fire and the compiler just keeps
running indefinitely without giving any indication that something went
wrong.
Diagnose this problem and reliably crash to make sure the problem is
easy to detect.
I have also tried:
- returning invalid source locations,
- reporting sloc address space usage on error.
Both caused the compiler to run indefinitely. It would be nice to dig
further why that happens, but until then crashing seems like a better
alternative.
Note that llvm::support::endianness has been renamed to
llvm::endianness while becoming an enum class as opposed to an enum.
This patch replaces llvm::support::{big,little,native} with
llvm::endianness::{big,little,native}.
In `SourceManager::getFileID()`, Clang performs binary search over its
buffer of `SLocEntries`. For modules, this binary search fully
deserializes the entire `SLocEntry` block for each visited entry. For
some entries, that includes decompressing the associated buffer (e.g.
the predefines buffer, macro expansion buffers, contents of volatile
files), which shows up in profiles of the dependency scanner.
This patch moves the binary search over loaded entries into `ASTReader`,
which can perform cheaper partial deserialization during the binary
search, reducing the wall time of dependency scans by ~3%. This also
reduces the number of retired instructions by ~1.4% on regular
(implicit) modules compilation.
Note that this patch drops the optimizations based on the last lookup ID
(pruning the search space and performing linear search before resorting
to the full binary search). Instead, it reduces the search space by
asking `ASTReader::GlobalSLocOffsetMap` for the containing `ModuleFile`
and only does binary search over entries of single module file.
This patch fixes:
clang/lib/Basic/SourceManager.cpp:1979:64: error: 'greater' may not
intend to support class template argument deduction
[-Werror,-Wctad-maybe-unsupported]
This commit removes the list of SLocEntry offsets to preload eagerly
from PCM files. Commit introducing this functionality (258ae54a) doesn't
clarify why this would be more performant than the lazy approach used
regularly.
Currently, the only SLocEntry the reader is supposed to preload is the
predefines buffer, but in my experience, it's not actually referenced in
most modules, so the time spent deserializing its SLocEntry is wasted.
This is especially noticeable in the dependency scanner, where this
change brings 4.56% speedup on my benchmark.
This patch removes the `SourceManager` APIs that create `FileID` from a
`const FileEntry *` in favor of APIs that take `FileEntryRef`. This also
removes a misleading documentation that claims `nullptr` file entry
represents stdin. I don't think that's right, since we just try to
dereference that pointer anyways.
The goal of the class is to be an (almost) drop in replacement for
SmallVector and std::vector when those are presized and filled later, as
it happens in SourceManager and ASTReader.
By doing so, sparsely accessed PagedVector can profit from reduced
memory footprint.